[HACKERS] Timeout for asynchronous replication Re: Timeout and wait-forever in sync rep
On Mon, Dec 6, 2010 at 3:42 PM, Fujii Masao wrote:
> On Fri, Oct 15, 2010 at 9:41 PM, Fujii Masao wrote:
>> The timeout doesn't oppose to 'wait-forever'. Even if you choose 'wait
>> -forever' (i.e., you set allow_standalone_master to false), the master
>> should detect the standby crash as soon as possible by using the
>> timeout. For example, imagine that max_wal_senders is set to one and
>> the master cannot detect the standby crash because of absence of the
>> timeout. In this case, even if you start new standby, it will not be
>> able to connect to the master since there is no free walsender slot.
>> As the result, the master actually waits forever.
>
> This occurred to me that the timeout would be required even for
> asynchronous streaming replication. So, how about implementing the
> replication timeout feature before synchronous replication itself?

Here is the patch. This is one of the features required for synchronous
replication, so I added it into the current CF as a part of synchronous
replication.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

replication_timeout_v1.patch
Description: Binary data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
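The attached patch itself is binary here, but the mechanism is easy to picture: the walsender keeps track of when it last heard from the standby and drops the connection once a configurable limit is exceeded, freeing the walsender slot. The fragment below is only an illustrative sketch of that check, written against the backend environment; the names replication_timeout (a GUC in milliseconds) and last_reply_timestamp are invented for the example and are not taken from the patch.

    #include "postgres.h"
    #include "storage/ipc.h"
    #include "utils/timestamp.h"

    extern int          replication_timeout;    /* hypothetical GUC, in ms */
    extern TimestampTz  last_reply_timestamp;   /* hypothetical bookkeeping */

    /* Sketch only: assumed to be called from the walsender main loop. */
    static void
    CheckReplicationTimeout(void)
    {
        TimestampTz now = GetCurrentTimestamp();

        if (replication_timeout > 0 &&
            TimestampDifferenceExceeds(last_reply_timestamp, now,
                                       replication_timeout))
        {
            /* Drop the connection so the walsender slot becomes free again. */
            ereport(COMMERROR,
                    (errmsg("terminating walsender process due to replication timeout")));
            proc_exit(0);
        }
    }

Releasing the slot promptly is what prevents the max_wal_senders exhaustion described above, whether replication is synchronous or not.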
Re: [HACKERS] Timeout and wait-forever in sync rep
On 06.12.2010 07:42, Fujii Masao wrote: On Fri, Oct 15, 2010 at 9:41 PM, Fujii Masao wrote: The timeout doesn't oppose to 'wait-forever'. Even if you choose 'wait -forever' (i.e., you set allow_standalone_master to false), the master should detect the standby crash as soon as possible by using the timeout. For example, imagine that max_wal_senders is set to one and the master cannot detect the standby crash because of absence of the timeout. In this case, even if you start new standby, it will not be able to connect to the master since there is no free walsender slot. As the result, the master actually waits forever. This occurred to me that the timeout would be required even for asynchronous streaming replication. So, how about implementing the replication timeout feature before synchronous replication itself? Sounds good to me. The more pieces we can nibble off the main patch the better. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch for parallel pg_dump
On 06.12.2010 02:55, Robert Haas wrote: On Sun, Dec 5, 2010 at 1:28 PM, Tom Lane wrote: I'm wondering if we should reconsider the pass-it-through-the-client approach, because if we could make that work it would be more general and it wouldn't need any special privileges. The trick seems to be to apply sufficient sanity testing to the snapshot proposed to be installed in the subsidiary transaction. I think the requirements would basically be (1) xmin<= any listed XIDs< xmax (2) xmin not so old as to cause GlobalXmin to decrease (3) xmax not beyond current XID counter (4) XID list includes all still-running XIDs in the given range Thoughts? I think this is too ugly to live. I really think it's a very bad idea for database clients to need to explicitly know anywhere near this many details about how the server represents snapshots. It's not impossible we might want to change this in the future, and even if we don't, it seems to me to be exposing a whole lot of unnecessary internal grottiness. The client doesn't need to know anything about the snapshot blob that the server gives it. It just needs to pass it back to the server through the other connection. To the client, it's just an opaque chunk of bytes. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
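To make the client-side flow concrete: under the pass-it-through-the-client scheme, the coordinating program reads the blob on one connection and feeds it back unchanged on another, never parsing it. The sketch below illustrates that with libpq; the two server-side functions are hypothetical placeholders for whatever export/install interface the server would expose.

    #include <libpq-fe.h>

    /* Sketch only: pg_export_snapshot_blob()/pg_install_snapshot_blob() are
     * hypothetical names, not an existing server API. */
    static void
    share_snapshot(PGconn *master, PGconn *worker)
    {
        PGresult   *res;
        const char *blob;

        PQclear(PQexec(master, "BEGIN ISOLATION LEVEL SERIALIZABLE"));
        res = PQexec(master, "SELECT pg_export_snapshot_blob()");
        blob = PQgetvalue(res, 0, 0);   /* opaque to the client: just bytes */

        PQclear(PQexec(worker, "BEGIN ISOLATION LEVEL SERIALIZABLE"));
        {
            const char *params[1] = { blob };

            PQclear(PQexecParams(worker,
                                 "SELECT pg_install_snapshot_blob($1)",
                                 1, NULL, params, NULL, NULL, 0));
        }
        PQclear(res);
        /* both connections now see the same snapshot */
    }

All the sanity checking Tom lists then lives on the server side, where the blob is validated before being installed; the client needs no knowledge of xmin, xmax or XID lists.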
Re: [HACKERS] Timeout and wait-forever in sync rep
On Fri, Oct 15, 2010 at 9:41 PM, Fujii Masao wrote: > The timeout doesn't oppose to 'wait-forever'. Even if you choose 'wait > -forever' (i.e., you set allow_standalone_master to false), the master > should detect the standby crash as soon as possible by using the > timeout. For example, imagine that max_wal_senders is set to one and > the master cannot detect the standby crash because of absence of the > timeout. In this case, even if you start new standby, it will not be > able to connect to the master since there is no free walsender slot. > As the result, the master actually waits forever. This occurred to me that the timeout would be required even for asynchronous streaming replication. So, how about implementing the replication timeout feature before synchronous replication itself? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Suggesting a libpq addition
Hey Andrew,

No, thanks. :-) And I don't think that libpq should follow it (libpqtypes).

2010/12/6 Andrew Chernow
>> A varargs version of PQexecParams() would be handy, though. Imagine being
>> able to do:
>>
>> PQexecVParams("SELECT * FROM mytable WHERE foo = $1 AND bar = $2", foovar,
>> barvar);
>>
>> instead of constructing an array for the variables.
>>
> http://libpqtypes.esilo.com/man3/PQexecf.html
>
> --
> Andrew Chernow
> eSilo, LLC
> every bit counts
> http://www.esilo.com/
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

--
// Dmitriy.
Re: [HACKERS] WIP patch for parallel pg_dump
Thank you Joachim; Yes, and the current patch requires the original (publisher) transaction is alive to prevent RecentXmin updated. I hope this restriction is acceptable if publishing/subscribing is provided via functions, not statements. Cheers; -- Koichi Suzuki 2010/12/6 Joachim Wieland : > On Sun, Dec 5, 2010 at 9:27 PM, Robert Haas wrote: >> On Sun, Dec 5, 2010 at 9:04 PM, Andrew Dunstan wrote: >>> Why not just say give me the snapshot currently held by process ? >>> >>> And please, not temp files if possible. >> >> As far as I'm aware, the full snapshot doesn't normally exist in >> shared memory, hence the need for publication of some sort. We could >> dedicate a shared memory region for publication but then you have to >> decide how many slots to allocate, and any number you pick will be too >> many for some people and not enough for others, not to mention that >> shared memory is a fairly precious resource. > > So here is a patch that I have been playing with in the past, I have > done it a while back and thanks go to Koichi Suzuki for his helpful > comments. I have not published it earlier because I haven't worked on > it recently and from the discussion that I brought up in march I got > the feeling that people are fine with having a first version of > parallel dump without synchronized snapshots. > > I am not really sure that what the patch does is sufficient nor if it > does it in the right way but I hope that it can serve as a basis to > collect ideas (and doubt). > > My idea is pretty much similar to Robert's about publishing snapshots > and subscribing to them, the patch even uses these words. > > Basically the idea is that a transaction in isolation level > serializable can publish a snapshot and as long as this transaction is > alive, its snapshot can be adopted by other transactions. Requiring > the publishing transaction to be serializable guarantees that the copy > of the snapshot in shared memory is always current. When the > transaction ends, the copy of the snapshot is also invalidated and > cannot be adopted anymore. So instead of doing explicit checks, the > patch aims at always having a reference transaction around that > guarantees validity of the snapshot information in shared memory. > > The patch currently creates a new area in shared memory to store > snapshot information but we can certainly discuss this... I had a GUC > in mind that can control the number of available "slots", similar to > max_prepared_transactions. Snapshot information can become quite > large, especially with a high number of max_connections. > > Known limitations: the patch is lacking awareness of prepared > transactions completely and doesn't check if both backends belong to > the same user. > > > Joachim > -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
On 6/12/2010 12:57 PM, Craig Ringer wrote: On 22/11/2010 7:37 PM, Magnus Hagander wrote: Finally getting to looking at this one - sorry about the very long delay. Ditto, I'm afraid. Oh, I forgot to mention in the patch email: I'm not sure I've taken the right approach in terms of how I've hooked the crash dump code into the build system. I'm fairly happy with when it's actually inited, and with the win32 portfile (ports/win32/crashdump.c) - what I'm not so sure about is how I've handled the conditional compilation by platform. Currently, src/backend/ports/win32/crashdump.c is compiled on win32. It exports a generic win32-garbage-free function: void installCrashDumpHandler(); I was just going to create a no-op version of the same function in a (new) file src/backend/ports/crashdump.c , but I was having problems figuring out how to sensibly express "compile this file for everything except these ports". In addition, doing it this way means there's still a pointless no-op function call in each backend start and some useless symbols - maybe not a big worry, but not ideal. What I ended up doing was putting a conditionally compiled switch in port.h that produces an inline no-op version of installCrashDumpHandler for unsupported platforms, and an extern for supported platforms (currently win32). That'll optimize out completely on unsupported platforms, which is nice if unimportant. I'm just not sure if port.h is really the right place and if it's only used by the backend. I considered a new header included by postgres.h (since it's really backend-only stuff) but as it seems Pg generally prefers bigger headers over more little headers, I popped it in port.h . Comments? -- Craig Ringer -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] libpq changes for synchronous replication
On Mon, Dec 6, 2010 at 3:07 AM, Greg Smith wrote: > The one time this year top-posting seems appropriate...this patch seems > stalled waiting for some sort of response to the concerns Alvaro raised > here. Sorry for the delay. I didn't have the time. > I gave this a look. It seems good, but I'm not sure about this bit: Thanks for the review! > I guess this was OK when this was conceived as CopyXlog, but since it's > now a generic mechanism, this seems a bit unwise. Should this be > reconsidered so that it's possible to change the format or number of > columns? I changed CopyBothResponse message so that it includes the format and number of columns of copy data. Please see the attached patch. > (The paragraph added to the docs is also a bit too specific about this > being used exclusively in streaming replication, ISTM) Yes. But it seems difficult to generalize the docs more because currently only SR uses Copy-both. So I had to write that, for example, the condition to get into the state is only "START REPLICATION" command. > While modifying the code, it occurred to me that we might have to add new > ExecStatusType like PGRES_COPY_BOTH and use that for CopyBoth mode, > for the sake of consistency. But since it's just alias of PGRES_COPY_BOTH > for now, i.e., there is no specific behavior for that ExecStatusType, I > don't > think that it's worth adding that yet. > > > I'm not so sure about this. If we think that it's worth adding a new > possible state, we should do so now; we will not be able to change this > behavior later. OK. I added that new state. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center libpqrcv_send_v4.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
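For libpq users, the visible effect of the new status is small: after sending START_REPLICATION on a replication connection, the client checks for the copy-both state and can then mix PQgetCopyData() with PQputCopyData() on the same connection. A rough sketch of the receiving side follows; error handling is trimmed and the start position is purely illustrative.

    #include <stdio.h>
    #include <libpq-fe.h>

    /* Sketch of a client entering copy-both mode; assumes "conn" is a
     * replication connection. */
    static void
    stream_wal(PGconn *conn)
    {
        PGresult *res = PQexec(conn, "START_REPLICATION 0/0");
        char     *buf;
        int       len;

        if (PQresultStatus(res) != PGRES_COPY_BOTH)
        {
            fprintf(stderr, "unexpected result: %s", PQerrorMessage(conn));
            PQclear(res);
            return;
        }
        PQclear(res);

        while ((len = PQgetCopyData(conn, &buf, 0)) > 0)
        {
            /* process len bytes of WAL data in buf; a real client would also
             * send status replies back with PQputCopyData() */
            PQfreemem(buf);
        }
    }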
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
On 22/11/2010 7:37 PM, Magnus Hagander wrote:
> Finally getting to looking at this one - sorry about the very long delay.

Ditto, I'm afraid.

> I agree with Heikki's earlier comment that it's better to have this
> included in the backend - but that's obviously not going to happen for
> already-released versions. I'd therefor advocate a contrib module for
> existing versions, and then in core for 9.1+.

I think that was the conclusion the following discussion came to, as well.
It's easy to distribute a binary .dll for older versions and if the EDB folks
found this facility really useful they could even bundle it in the installer.
There's no point backpatching it given how few win32 users will build from
source.

> We should then have an option to turn it off (on by default). But we
> don't need to pay the overhead on every backend startup - we could just
> map the value into the parameter shared memory block, and require a full
> postmaster restart in order to change it.

As discussed later, implemented by testing for a 'crashdumps' directory. If
the directory exists, a dump is written. If the directory is missing, a log
message is emitted to explain why no dump was written; as this'll only happen
when a backend dies, log noise isn't an issue and it'll help save sysadmin
head-scratching given the "magic" behaviour of turning on/off based on
directory existence.

> Do we want to backpatch it into contrib/? Adding a new module there seems
> kind of wrong - probably better to keep the source separate and just
> publish the DLL files for people who do debugging?

+1 to just publishing the DLLs.

> Looking at the code:
>
> * Why do we need to look at differnt versions of dbghelp.dll? Can't we
> always use the one with Windows? And in that case, can't we just use
> compile-time linking, do we need to bother with DLL loading at all?

Newer versions than those shipped with an OS release contain updated and
improved functionality as well as bug fixes. We *can* use the version shipped
with the OS, as even in XP dbghelp.dll contains the required functionality,
but it'd be better to bundle it. That's Microsoft's recommendation when you
use dbghelp.dll.

> * Per your comment about using elog() - a good idea in that case is to use
> write_stderr(). That will send the output to stderr if there is one, and
> otherwise the eventlog (when running as service).

Hm, ok. The attached patch doesn't do that yet. Given the following comments,
do you think the change is still necessary?

> * And yes, in the crash handler, we should *definitely* not use elog().

Currently I'm actually using it, but only after the dump has been written or
we've decided not to write a dump. At this point, if elog() fails we don't
really care, we'd just be returning control to the OS's fatal exception
cleanup anyway. It's IMO desirable for output to go to the same log as
everything else, so admins can find out what's going on and can associate the
crash dump files with events that happened around a certain time.

> * On Unix, the core file is dropped in the database directory, we don't
> have a separate directory for crashdumps. If we want to be consistent, we
> should do that here too. I do think that storing them in a directory like
> "crashdumps" is better, but I just wanted to raise the comment.

Given the recent discussion about using the dir as an on/off switch, that'll
be necessary.

> * However, when storing it in crashdumps, I think the code would need to
> create that directory if it does not exist, doesn't it?

Not now that we're using it as an on/off switch, but of course it would've
made sense otherwise.

> * Right now, we overwrite old crashdumps. It is well known that PIDs are
> recycled pretty quickly on Windows - should we perhaps dump as
> postgres--.mdmp when there is a filename conflict?

Good idea. Rather than try to test for existing files I've just taken the
simple route (always good in code run during process crash) and appended the
system ticks to the dump name. System ticks are a 32-bit quantity that wraps
about every 40 days, so they should be easily unique enough in combination
with the pid. The logs will report the crashdump filenames unless the backend
is too confused to be able to elog(), and all the crash dump files have file
system timestamps so there isn't much point trying to format a crash
date/time into their names.

I could use wall clock seconds (or ms) instead, but favour system ticks
because they can be read with a simple int-returning call that doesn't
require any more stack allocation. On Windows, getting wall clock time in
seconds seems to involve a call to GetSystemTimeAsFileTime
(http://msdn.microsoft.com/en-us/library/ms724397(v=VS.85).aspx) to populate
a struct containing a fake-64-bit int as two 32-bit ints. It's not clear that
it's free of timezone calculations and generally looks like something better
avoided in a crash handler if it's not necessary. As GetSystemTicks seems
quite sufficient, I thought
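For readers unfamiliar with the Windows side of this, the mechanism under discussion boils down to installing a top-level exception filter that writes a minidump via dbghelp.dll. The stripped-down sketch below (not the actual patch) shows the general shape, including the "crashdumps directory as on/off switch" and the pid-plus-ticks file naming discussed above; error reporting and the explicit dbghelp.dll loading from the real patch are omitted.

    #include <windows.h>
    #include <dbghelp.h>
    #include <stdio.h>

    /* Sketch only: write crashdumps/postgres-pid<pid>-<ticks>.mdmp when the
     * backend hits an unhandled exception and the directory exists. */
    static LONG WINAPI
    crashDumpHandler(EXCEPTION_POINTERS *exceptionInfo)
    {
        char        path[MAX_PATH];
        HANDLE      dumpFile;
        MINIDUMP_EXCEPTION_INFORMATION info;
        DWORD       attrs = GetFileAttributesA("crashdumps");

        /* Directory missing => feature is switched off. */
        if (attrs == INVALID_FILE_ATTRIBUTES || !(attrs & FILE_ATTRIBUTE_DIRECTORY))
            return EXCEPTION_CONTINUE_SEARCH;

        snprintf(path, sizeof(path), "crashdumps/postgres-pid%lu-%lu.mdmp",
                 (unsigned long) GetCurrentProcessId(),
                 (unsigned long) GetTickCount());

        dumpFile = CreateFileA(path, GENERIC_WRITE, 0, NULL, CREATE_ALWAYS,
                               FILE_ATTRIBUTE_NORMAL, NULL);
        if (dumpFile != INVALID_HANDLE_VALUE)
        {
            info.ThreadId = GetCurrentThreadId();
            info.ExceptionPointers = exceptionInfo;
            info.ClientPointers = FALSE;
            MiniDumpWriteDump(GetCurrentProcess(), GetCurrentProcessId(),
                              dumpFile, MiniDumpNormal, &info, NULL, NULL);
            CloseHandle(dumpFile);
        }
        return EXCEPTION_CONTINUE_SEARCH;   /* let normal crash handling proceed */
    }

    void
    installCrashDumpHandler(void)
    {
        SetUnhandledExceptionFilter(crashDumpHandler);
    }

Linking directly against dbghelp.lib keeps the sketch short; as discussed above, the patch loads dbghelp.dll dynamically so a newer redistributed copy can be preferred over the one shipped with the OS.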
Re: [HACKERS] Review: Extensions Patch
On Dec 4, 2010, at 6:14 AM, Dimitri Fontaine wrote:

> Hi,
>
> Thanks for the review, that's quite one! :)
>
> I'm not sure to follow you all along, it seems like the reading is "try
> it first then understand and comment again", so sometimes I'm not sure
> if you're saying that docs are missing the point or that the behaviour
> ain't right.

Overall I think the docs could use a lot more fleshing out. Some of the stuff
in the wiki would help a lot. At some point, though, I'll work over the docs
myself and either send a patch to you or to the list (if it has been
committed to core).

> "David E. Wheeler" writes:
>> `make installcheck` fails (regression.diffs attached).
>
> You forgot the attachment. I'd add mine if I didn't get "All 124 tests
> passed." result for the command :)

Sorry, attached now.

> (Note that I've been having problems too, but can't reproduce them
> easily, and they might have been related to forgotten maintainer-clean
> steps).

Yeah, I did that a bunch of times. This was the best run.

> Well, the exposed functionality aren't much (2 SQL commands and some
> support functions) and you need to have a contrib at hand to exercise
> them. So I'm not sure about how to add in regression tests for contrib
> infrastructure work and have them depend only on -core. I'll take ideas.

You write a very simple contrib module exclusively for testing. It doesn't
have to actually do anything other than create a couple of objects. A domain
perhaps.

> The other thing is that the very important feature here is dump and
> restore of a database containing extensions. You're not reporting about
> it, though. And I'm not sure about how to check for pg_dump content from
> pg_regress oriented tests.

Yeah, I haven't tried the dump and restore stuff. Are there no dump/restore
tests in core? If there are, I expect all you'd have to do is add an
extension or two to be dumped and restored.

>> Yes. Much of the execution is superuser-only, so we need to make sure
>> that such is actually the case (unit tests would help with that).
>
> How do you mean? Trying to subvert the features by being able to run
> them as a non-superuser? How do you install tests around that?

If the tests run as a super user (I think they usually do), then all you have
to do is create a new unprivileged user, switch to that user via SET ROLE,
and call the function in question.

>> The pg_extension catalog docs are a little thin, but probably
>> sufficient. They appear to be duplicated, though, with the entries for
>> catalog-pg-extension and view-pg-extensions looking identical. One is
>> apparently a list of installed extensions, and the other an SRF that
>> lists installed and available extensions. But I find the duplication a
>> bit confusing. (Might be easer if I wasn't reading SGML though.)
>
> Well yes the pg_extension catalog is filled at CREATE EXTENSION time so
> that we have an OID to register dependencies on. Then I added a SRF to
> also list available extensions so that it's easy to integrate the
> feature into pgadmin and the like.

Right.

> Now I'm open to suggestions as far as the names are involved, or maybe
> we should keep the SRF but remove the view. psql could easily enough use
> the function directly I think.

I think the catalog should be pg_created_extensions (or
pg_installed_extensions) and the view should be pg_available_extensions. Is
there no way to have a catalog table visible to *all* databases that contains
the list of available extensions?

>> I don't understand this paragraph at all:
>> This paragraph isn't very clear, either:
>> Examples might help.
>
> Well, I think native English skills are required here. And I'm not too
> sure about the example.

I recognize that some documentation pages have no (or very few) examples. But
oftentimes nothing makes the utility of a feature more apparent than an
example. I tend to use them liberally in my documentation. As I said, I'll
probably contribute a refactoring of the docs at some point.

>> Control file keys:
>>
>> * comment -- Should mention that it's used for CREATE COMMENT ON EXTENSION, no?
>> * version -- If it's free-form, how will you do dependency-checking? Or will you?
>> * script
>> * encoding -- Seems unnecessary; one can explicitly set it in the .sql file, no?
>> * custom_variable_classes
>
> Doc says:
> http://pgsql.tapoueh.org/extensions/doc/html/extend-extension.html
>
> This control file allows for registering the extension and should
> provide values for the keys described below.
>
> Then there's the detail about fields. I think the context is enough to
> understand, but I'd be happy to integrate some revisions of the current
> content.

Yes, I think if you just mentioned that the comment is used for an implicit
CREATE COMMENT that it would be useful. That's true, yes? The others are okay
as-is.

> For the version, it has been agreed before (Developer Meeting) that v1
> would not care about inter-extension
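The SET ROLE suggestion above translates into a very small regression-test sketch. This is only an illustration, not part of the patch; the role and extension names are made up, and the CREATE EXTENSION call is simply expected to fail with a permission error when run as the unprivileged role:

    -- run by the regression suite's superuser
    CREATE ROLE regress_unpriv;
    SET ROLE regress_unpriv;
    CREATE EXTENSION dummy_test_ext;   -- should fail: must be superuser
    RESET ROLE;
    DROP ROLE regress_unpriv;

A matching "dummy" contrib module used only by the tests could likewise be tiny: a control file plus a script that creates a single domain or function, just enough for CREATE EXTENSION, pg_dump and DROP EXTENSION to have something to act on.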
Re: [HACKERS] WIP patch for parallel pg_dump
On Sun, Dec 5, 2010 at 9:27 PM, Robert Haas wrote:
> On Sun, Dec 5, 2010 at 9:04 PM, Andrew Dunstan wrote:
>> Why not just say give me the snapshot currently held by process ?
>>
>> And please, not temp files if possible.
>
> As far as I'm aware, the full snapshot doesn't normally exist in
> shared memory, hence the need for publication of some sort. We could
> dedicate a shared memory region for publication but then you have to
> decide how many slots to allocate, and any number you pick will be too
> many for some people and not enough for others, not to mention that
> shared memory is a fairly precious resource.

So here is a patch that I have been playing with in the past, I have done it
a while back and thanks go to Koichi Suzuki for his helpful comments. I have
not published it earlier because I haven't worked on it recently and from the
discussion that I brought up in march I got the feeling that people are fine
with having a first version of parallel dump without synchronized snapshots.

I am not really sure that what the patch does is sufficient nor if it does it
in the right way but I hope that it can serve as a basis to collect ideas
(and doubt).

My idea is pretty much similar to Robert's about publishing snapshots and
subscribing to them, the patch even uses these words.

Basically the idea is that a transaction in isolation level serializable can
publish a snapshot and as long as this transaction is alive, its snapshot can
be adopted by other transactions. Requiring the publishing transaction to be
serializable guarantees that the copy of the snapshot in shared memory is
always current. When the transaction ends, the copy of the snapshot is also
invalidated and cannot be adopted anymore. So instead of doing explicit
checks, the patch aims at always having a reference transaction around that
guarantees validity of the snapshot information in shared memory.

The patch currently creates a new area in shared memory to store snapshot
information but we can certainly discuss this... I had a GUC in mind that can
control the number of available "slots", similar to
max_prepared_transactions. Snapshot information can become quite large,
especially with a high number of max_connections.

Known limitations: the patch is lacking awareness of prepared transactions
completely and doesn't check if both backends belong to the same user.


Joachim


diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 95beba8..c24150f 100644
*** a/src/backend/storage/ipc/ipci.c
--- b/src/backend/storage/ipc/ipci.c
*** CreateSharedMemoryAndSemaphores(bool mak
*** 124,129
--- 124,130
  size = add_size(size, BTreeShmemSize());
  size = add_size(size, SyncScanShmemSize());
  size = add_size(size, AsyncShmemSize());
+ size = add_size(size, SyncSnapshotShmemSize());
  #ifdef EXEC_BACKEND
  size = add_size(size, ShmemBackendArraySize());
  #endif
*** CreateSharedMemoryAndSemaphores(bool mak
*** 228,233
--- 229,235
  BTreeShmemInit();
  SyncScanShmemInit();
  AsyncShmemInit();
+ SyncSnapshotInit();
  #ifdef EXEC_BACKEND
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 6e7a6db..00522fb 100644
*** a/src/backend/storage/ipc/procarray.c
--- b/src/backend/storage/ipc/procarray.c
*** typedef struct ProcArrayStruct
*** 91,96
--- 91,111
  static ProcArrayStruct *procArray;
+
+ /* this should be a GUC later... */
+ #define MAX_SYNC_SNAPSHOT_SETS 4
+ typedef struct
+ {
+     SnapshotData ssd;
+     char         name[NAMEDATALEN];
+     BackendId    backendId;
+     Oid          databaseId;
+ } NamedSnapshotData;
+
+ typedef NamedSnapshotData* NamedSnapshot;
+
+ static NamedSnapshot syncSnapshots;
+
  /*
   * Bookkeeping for tracking emulated transactions in recovery
   */
*** static int KnownAssignedXidsGetAndSetXmi
*** 159,164
--- 174,182
  static TransactionId KnownAssignedXidsGetOldestXmin(void);
  static void KnownAssignedXidsDisplay(int trace_level);
+ static bool DeleteSyncSnapshot(const char *name);
+ static bool snapshotPublished = false; /* true if we have published at least one snapshot */
+
  /*
   * Report shared-memory space needed by CreateSharedProcArray.
   */
*** ProcArrayRemove(PGPROC *proc, Transactio
*** 350,355
--- 368,379
  void
  ProcArrayEndTransaction(PGPROC *proc, TransactionId latestXid)
  {
+     if (snapshotPublished)
+     {
+         DeleteSyncSnapshot(NULL);
+         snapshotPublished = false;
+     }
+
      if (TransactionIdIsValid(latestXid))
      {
          /*
*** KnownAssignedXidsDisplay(int trace_level
*** 3104,3106
--- 3132,3374
      pfree(buf.data);
  }
+
+
+ /*
+  * Report space needed for our shared memory area.
+  *
+  * Memory is structured as follows:
+  *
+  * NamedSnapshotData[0]
+  * NamedSnapshotData[1]
+  * NamedSnapshotData[2]
+  * Xids for NamedSnapshotData[0]
+  * Sub-Xids for NamedSnapshotData[
Re: [HACKERS] Suggesting a libpq addition
A varargs version of PQexecParams() would be handy, though. Imagine being able to do: PQexecVParams("SELECT * FROM mytable WHERE foo = $1 AND bar = $2", foovar, barvar); instead of constructing an array for the variables. http://libpqtypes.esilo.com/man3/PQexecf.html -- Andrew Chernow eSilo, LLC every bit counts http://www.esilo.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Suggesting a libpq addition
On 12/5/2010 4:22 AM, Marc Balmer wrote: I am suggesting adding a function to libpq: PGresult *PQvexec(PGconn *conn, const char *fmt, ...); It behaves similar to PQexec, but it allows for printf style varargs and does connection re-establishing if the connection fails (it can be discussed if this already to much magic, maybe remove this part). It has been carefully designed to handle memory the right way. We use this since a long time. What do you think? I think it is a wonderful idea. Check out libpqtypes. It has a PQexecf, PQexecvf, PQsendf and PQsendvf. But that is just the beginning http://libpqtypes.esilo.com http://pgfoundry.org/projects/libpqtypes/ -- Andrew Chernow eSilo, LLC every bit counts http://www.esilo.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
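To make the shape of such an addition concrete, here is a minimal sketch of a variadic convenience wrapper built on the existing PQexecParams(); it borrows the PQexecVParams name suggested upthread. It treats every argument as a text parameter and, purely as an assumption for this sketch, ends the list with a NULL sentinel (a printf-style PQvexec as Marc proposes, or libpqtypes' PQexecf, would instead derive the parameters from a format string).

    #include <stdarg.h>
    #include <libpq-fe.h>

    /* Sketch only: every variadic argument is a nul-terminated string used as
     * a text parameter; the argument list ends with a NULL sentinel. */
    PGresult *
    PQexecVParams(PGconn *conn, const char *query, ...)
    {
        const char *values[16];     /* arbitrary limit for the sketch */
        int         nparams = 0;
        const char *arg;
        va_list     ap;

        va_start(ap, query);
        while ((arg = va_arg(ap, const char *)) != NULL && nparams < 16)
            values[nparams++] = arg;
        va_end(ap);

        return PQexecParams(conn, query, nparams,
                            NULL,       /* let the server infer parameter types */
                            values, NULL, NULL,
                            0);         /* text-format results */
    }

    /* usage:
     *   PQexecVParams(conn, "SELECT * FROM mytable WHERE foo = $1 AND bar = $2",
     *                 foovar, barvar, NULL);
     */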
Re: [HACKERS] profiling connection overhead
On Sun, Dec 5, 2010 at 9:35 PM, Rob Wultsch wrote: > On Sun, Dec 5, 2010 at 6:59 PM, Robert Haas wrote: >> On Sun, Dec 5, 2010 at 2:45 PM, Rob Wultsch wrote: >>> I think you have read a bit more into what I have said than is >>> correct. MySQL can deal with thousands of users and separate schemas >>> on commodity hardware. There are many design decisions (some >>> questionable) that have made MySQL much better in a shared hosting >>> environment than pg and I don't know where the grants system falls >>> into that. >> >> Objection: Vague. > > I retract the remark, your honor. Clarifying it would be fine, too... :-) > At some point Hackers should look at pg vs MySQL multi tenantry but it > is way tangential today. My understanding is that our schemas work like MySQL databases; and our databases are an even higher level of isolation. No? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] profiling connection overhead
On Sun, Dec 5, 2010 at 6:59 PM, Robert Haas wrote: > On Sun, Dec 5, 2010 at 2:45 PM, Rob Wultsch wrote: >> I think you have read a bit more into what I have said than is >> correct. MySQL can deal with thousands of users and separate schemas >> on commodity hardware. There are many design decisions (some >> questionable) that have made MySQL much better in a shared hosting >> environment than pg and I don't know where the grants system falls >> into that. > > Objection: Vague. > I retract the remark, your honor. At some point Hackers should look at pg vs MySQL multi tenantry but it is way tangential today. -- Rob Wultsch wult...@gmail.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch for parallel pg_dump
On Sun, Dec 5, 2010 at 9:04 PM, Andrew Dunstan wrote: > Why not just say give me the snapshot currently held by process ? > > And please, not temp files if possible. As far as I'm aware, the full snapshot doesn't normally exist in shared memory, hence the need for publication of some sort. We could dedicate a shared memory region for publication but then you have to decide how many slots to allocate, and any number you pick will be too many for some people and not enough for others, not to mention that shared memory is a fairly precious resource. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_execute_from_file review
On Sun, Dec 5, 2010 at 6:01 PM, Tom Lane wrote: > Itagaki Takahiro writes: >> On Fri, Dec 3, 2010 at 18:02, Dimitri Fontaine >> wrote: >>> My understanding is that the variadic form shadows the other one in a >>> way that it's now impossible to call it from SQL level. That's the >>> reason why I did the (text, text, text, VARIADIC text) version before, >>> but is it true? > >> The VARIADIC version doesn't hide the 3-args version. I tested the >> behavior by printf-debug. The planner seems to think the non VARIADIC >> version is the best-matched one when 3 arguments are passed. > > Why is there a variadic replace() in this patch at all? It seems just > about entirely unrelated to the stated purpose of the patch, as well > as being of dubious usefulness. When would it be superior to > > replace(replace(orig, from1, to1), from2, to2), ... > > The implementation doesn't appear to offer any material speed > improvement over nested calls of that sort, and I'm finding it hard to > visualize when it would be more useful than nested calls. The > documentation is entirely inadequate as well. An iterated replacement has different semantics from a simultaneous replace - replacing N placeholders with values simultaneously means you don't need to worry about the case where one of the replacement strings contains something that looks like a placeholder. I actually think a simultaneous replacement feature would be quite handy but I make no comment on whether it belongs as part of this patch. The discussion on this patch has been rather wide-ranging, and it's not clear to me that there's really consensus on what this patch needs to - or should - do. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
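The semantic difference is easy to see with a concrete case: when one replacement value happens to contain text that looks like another placeholder, iterated calls rewrite it again, while a one-pass simultaneous replacement would not.

    -- iterated (nested) replacement: the second pass also rewrites the '$2'
    -- that was introduced by the first pass
    SELECT replace(replace('$1 $2', '$1', 'uses $2'), '$2', 'twice');
    -- result: 'uses twice twice'

    -- a hypothetical simultaneous form, e.g.
    --   replace('$1 $2', '$1', 'uses $2', '$2', 'twice')
    -- would substitute both placeholders against the original string in one
    -- pass and return 'uses $2 twice'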
Re: [HACKERS] profiling connection overhead
On Sat, Dec 4, 2010 at 8:04 PM, Jeff Janes wrote: > But who would be doing the passing? For the postmaster to be doing > that would probably go against the minimalist design. It would have > to keep track of which backend is available, and which db and user it > is primed for. Perhaps a feature could be added to the backend to > allow it to get passed a FD from pgbouncer or pgpool-II and then hand > control back to the pooler upon "close" of the connection, as they > already have the infrastructure to keep pools around while the > postmaster does not. Are pgbouncer and pgpool close enough to "core" > to make such intimate collaboration with the backend OK? I am not sure. I'm afraid that might be adding complexity without really solving anything, but maybe I'm a pessimist. One possible way to do make an improvement in this area would be to move the responsibility for accepting connections out of the postmaster. Instead, you'd have a group of children that would all call accept() on the socket, and the OS would arbitrarily pick one to receive each new incoming connection. The postmaster would just be responsible for making sure that there were enough children hanging around. You could in fact make this change without doing anything else, in which case it wouldn't save any work but would possibly reduce connection latency a bit since more of the work could be done before the connection actually arrived. From there, you could go two ways. One option would be to have backends that would otherwise terminate normally instead do the equivalent of DISCARD ALL and then go back around and try to accept() another incoming connection. If they get a guy who wants the database to which they previously connected, profit. If not, laboriously flush every cache in sight and rebind to the new database. Another option would be to have backends that would otherwise terminate normally instead do the equivalent of DISCARD ALL and then mark themselves as able to accept a new connection to the same database to which they are already connected (but not any other database). Following authentication, a backend that accepted a new incoming connection looks through the pool of such backends and, if it finds one, hands off the connection using file-descriptor passing and then loops back around to accept() again. Otherwise it handles the connection itself. This wouldn't offer much of an advantage over the first option for a cluster that basically has just one database, or for a cluster that has 1000 actively used databases. But it would be much better for a system with three databases. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
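The "group of children all calling accept() on the socket" idea is the classic pre-forking server pattern; the kernel wakes one idle child per incoming connection. A deliberately minimal, non-PostgreSQL sketch of just the process structure (no authentication, no DISCARD ALL-style reset, fixed pool size and port) looks like this:

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <sys/socket.h>
    #include <unistd.h>

    static void
    child_main(int listen_fd)
    {
        for (;;)
        {
            int conn_fd = accept(listen_fd, NULL, NULL);

            if (conn_fd < 0)
                continue;
            /* authenticate, serve the session, reset state, then loop back
             * and wait for the next connection */
            close(conn_fd);
        }
    }

    int
    main(void)
    {
        int                 listen_fd = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in  addr = {0};

        addr.sin_family = AF_INET;
        addr.sin_port = htons(5433);            /* arbitrary port for the sketch */
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        if (bind(listen_fd, (struct sockaddr *) &addr, sizeof(addr)) < 0 ||
            listen(listen_fd, 64) < 0)
        {
            perror("bind/listen");
            return 1;
        }

        for (int i = 0; i < 4; i++)             /* fixed pool of children */
            if (fork() == 0)
            {
                child_main(listen_fd);
                _exit(0);
            }
        for (;;)
            pause();                            /* parent only supervises */
    }

The interesting PostgreSQL-specific parts are exactly the two options described above: what a child does after finishing a session, and whether it may hand an accepted descriptor to a pooled backend already bound to the right database.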
Re: [HACKERS] WIP patch for parallel pg_dump
On 12/05/2010 08:55 PM, Robert Haas wrote: On Sun, Dec 5, 2010 at 1:28 PM, Tom Lane wrote: I'm wondering if we should reconsider the pass-it-through-the-client approach, because if we could make that work it would be more general and it wouldn't need any special privileges. The trick seems to be to apply sufficient sanity testing to the snapshot proposed to be installed in the subsidiary transaction. I think the requirements would basically be (1) xmin<= any listed XIDs< xmax (2) xmin not so old as to cause GlobalXmin to decrease (3) xmax not beyond current XID counter (4) XID list includes all still-running XIDs in the given range Thoughts? I think this is too ugly to live. I really think it's a very bad idea for database clients to need to explicitly know anywhere near this many details about how the server represents snapshots. It's not impossible we might want to change this in the future, and even if we don't, it seems to me to be exposing a whole lot of unnecessary internal grottiness. How about just pg_publish_snapshot(), returning a token that is only valid until the end of the transaction in which it was called, and pg_subscribe_snapshot(token)? The implementation can be that the publisher writes its snapshot to a temp file and returns the name of the temp file, setting an at-commit hook to remove the temp file. The subscriber reads the temp file and sets the contents as its transaction snapshot. If security is a concern, one could also save the publisher's role OID to the file and require the subscriber's to match. Why not just say give me the snapshot currently held by process ? And please, not temp files if possible. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] profiling connection overhead
On Sun, Dec 5, 2010 at 2:45 PM, Rob Wultsch wrote: > I think you have read a bit more into what I have said than is > correct. MySQL can deal with thousands of users and separate schemas > on commodity hardware. There are many design decisions (some > questionable) that have made MySQL much better in a shared hosting > environment than pg and I don't know where the grants system falls > into that. Objection: Vague. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] profiling connection overhead
On Sun, Dec 5, 2010 at 3:17 PM, Rob Wultsch wrote: > On Sun, Dec 5, 2010 at 12:45 PM, Rob Wultsch wrote: >> One thing I would suggest that the PG community keeps in mind while >> talking about built in connection process caching, is that it is very >> nice feature for memory leaks caused by a connection to not exist for >> and continue growing forever. > > s/not exist for/not exist/ > > I have had issues with very slow leaks in MySQL building up over > months. It really sucks to have to go to management to ask for > downtime because of a slow memory leak. Apache has a very simple and effective solution to this problem - they have a configuration option controlling the number of connections a child process handles before it dies and a new one is spawned. I've found that setting this to 1000 works excellently. Process startup overhead decreases by three orders of magnitude, and only egregiously bad leaks add up to enough to matter. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch for parallel pg_dump
On Sun, Dec 5, 2010 at 1:28 PM, Tom Lane wrote: > I'm wondering if we should reconsider the pass-it-through-the-client > approach, because if we could make that work it would be more general and > it wouldn't need any special privileges. The trick seems to be to apply > sufficient sanity testing to the snapshot proposed to be installed in > the subsidiary transaction. I think the requirements would basically be > (1) xmin <= any listed XIDs < xmax > (2) xmin not so old as to cause GlobalXmin to decrease > (3) xmax not beyond current XID counter > (4) XID list includes all still-running XIDs in the given range > > Thoughts? I think this is too ugly to live. I really think it's a very bad idea for database clients to need to explicitly know anywhere near this many details about how the server represents snapshots. It's not impossible we might want to change this in the future, and even if we don't, it seems to me to be exposing a whole lot of unnecessary internal grottiness. How about just pg_publish_snapshot(), returning a token that is only valid until the end of the transaction in which it was called, and pg_subscribe_snapshot(token)? The implementation can be that the publisher writes its snapshot to a temp file and returns the name of the temp file, setting an at-commit hook to remove the temp file. The subscriber reads the temp file and sets the contents as its transaction snapshot. If security is a concern, one could also save the publisher's role OID to the file and require the subscriber's to match. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
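Usage of the proposed pair would look roughly like the following; both function names come from the proposal above and do not exist in the server today, and the token is deliberately opaque to the client:

    -- session 1 (e.g. the pg_dump master), kept open for the duration:
    BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    SELECT pg_publish_snapshot();          -- returns an opaque token

    -- session 2 (a worker), using the token returned by session 1:
    BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    SELECT pg_subscribe_snapshot('<token from session 1>');
    -- both transactions now work against the same snapshot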
Re: [HACKERS] FK's to refer to rows in inheritance child
On Sun, Dec 5, 2010 at 12:41 PM, Andrew Dunstan wrote: > Well, ISTM that amounts to not having "official topic branches" :-) I agree > that this is supposed to be one of git's strengths (or more exactly a > strength of distributed SCM's generally). I don't really see any great > value in sanctifying a particular topic branch with some official status. I think the value in an official topic branch would be to allow formal incremental commit of large patches. In other words, we could decide that a commit to the official topic branch must meet the same standards of quality normally applied to a commit to the master branch, and must go through the same process. It would be understood that the topic branch would eventually be merged (with or without squash) back into the master branch, but that objections were to be raised as pieces were committed to the topic branch, not at merge time. The merge itself would require consensus as to timing, but we'd agree to take a dim view of "I haven't reviewed anything that's been going on here for the last six months but now hate all of it". I think that this sort of approach could be useful either for really big projects, like perhaps SQL/MED; or possibly for certain categories of changes (things we want to include the next time we do a libpq version bump). Although, the trouble with the latter is that we do compatibility breaks so rarely that the effort involved in maintaining the branch might well exceed the value we'd get out of having it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_execute_from_file review
On Mon, Dec 6, 2010 at 08:01, Tom Lane wrote:
> Why is there a variadic replace() in this patch at all? It seems just
> about entirely unrelated to the stated purpose of the patch, as well
> as being of dubious usefulness.

As I wrote in the previous mail, the most important part of the patch for
CREATE EXTENSION is pg_read_binary_file(). We can rewrite not only
replace(VARIADIC) but also other functions in the patch with existing
functions. However, the author wanted simple-case user APIs, and I also
agreed to export each step of the complex pg_execute_sql_file(). But I have
no objections to hide some of the subroutines if there are any problems.

| $sql := replace(
|          convert_from(
|            pg_read_binary_file($path, 0),
|            $encoding),
|          '@extschema@', $schema));
| EXECUTE $sql;

--
Itagaki Takahiro

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
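Spelled out as something runnable (assuming the patch's pg_read_binary_file() with the two-argument form shown above is installed), the same pipeline could be exercised like this; the path, encoding and schema values are made up for the example:

    DO $$
    DECLARE
        sql text;
    BEGIN
        sql := replace(
                 convert_from(pg_read_binary_file('extension/foo.sql', 0),
                              'UTF8'),
                 '@extschema@', 'myschema');
        EXECUTE sql;
    END
    $$;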
Re: [HACKERS] Spread checkpoint sync
Rob Wultsch wrote:
> Forgive me, but is all of this a step on the slippery slope to direct io?
> And is this a bad thing

I don't really think so. There's an important difference in my head between
direct I/O, where the kernel is told "write this immediately!", and what I'm
trying to achieve. I want to give the kernel an opportunity to write blocks
out in an efficient way, so that it can take advantage of elevator sorting,
write combining, and similar tricks. But, eventually, those writes have to
make it out to disk. Linux claims to have concepts like a "deadline" for I/O
to happen, but they turn out to not be so effective once the system gets
backed up with enough writes. Since fsync time is the only effective
deadline, I'm progressing from the standpoint that adjusting when it happens
relative to the write will help, while still allowing the kernel an
opportunity to get the writes out on its own schedule.

What ends up happening if you push toward fully sync I/O is the design you
see in some other databases, where you need multiple writer processes. Then
requests for new pages can continue to allocate as needed, while keeping any
one write from blocking things. That's one sort of a way to simulate
asynchronous I/O, and you can substitute true async I/O instead in many of
those implementations. We didn't have much luck with portability on async I/O
when that was last experimented with, and having multiple background writer
processes seems like overkill; that whole direction worries me.

--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services and Support   www.2ndQuadrant.us

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_execute_from_file review
Itagaki Takahiro writes: > On Fri, Dec 3, 2010 at 18:02, Dimitri Fontaine wrote: >> My understanding is that the variadic form shadows the other one in a >> way that it's now impossible to call it from SQL level. That's the >> reason why I did the (text, text, text, VARIADIC text) version before, >> but is it true? > The VARIADIC version doesn't hide the 3-args version. I tested the > behavior by printf-debug. The planner seems to think the non VARIADIC > version is the best-matched one when 3 arguments are passed. Why is there a variadic replace() in this patch at all? It seems just about entirely unrelated to the stated purpose of the patch, as well as being of dubious usefulness. When would it be superior to replace(replace(orig, from1, to1), from2, to2), ... The implementation doesn't appear to offer any material speed improvement over nested calls of that sort, and I'm finding it hard to visualize when it would be more useful than nested calls. The documentation is entirely inadequate as well. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] CommitFest 2010-11: Status report at 2/3 of scheduled time
After an unfortunate two weeks where I was lost in Nyquil-land and not paying
as much attention as I should have, I just finished a pass checking over all
26 open CommitFest items. Just under 50% of the patches that were open at the
start of the CF are still hanging around, with 6 in "Ready for Committer" and
the rest split evenly between waiting for reviewers and waiting for the
author. A few larger patches or patch sets deserve some commentary.

The multi-part Extensions patch continues to chug along. A committer needs to
pick up "pg_execute_from_file()" next to keep progress on that series going,
as the other two remaining patches depend on it (and may need to be rebased
after it goes in). Help is also needed on coding review of the main
"Extensions" patch. The as yet untouched "ALTER EXTENSION ... SET SCHEMA ..."
is the last patch in the set to apply, and while it needs a reviewer
eventually I don't think that's the blocking point on the general extension
work yet.

As for the also large SQL/MED set, I'm not sure whether it's possible to get
review work started on its two dependent components--"file_fdw" and
"postgresql_fdw"--until the main patch is finished off. Given the size of
code involved, it wouldn't surprise me if everyone is still chewing on that
series into the next CF. Glad to see it all submitted for this one though.

Most of the remaining patches that are in "Needs Review" have already gone
through one round of review. There are two notable exceptions. While some
libpq work has been getting finished for Sync Rep, neither of the two main
patches for that have received any love for almost two months now. This is
the spot where I start to get worried about seeing this finish for 9.1. I'm
now trying to find time to work on those; anybody else need this feature who
can help?

--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services and Support   www.2ndQuadrant.us

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Spread checkpoint sync
On Sun, Dec 5, 2010 at 2:53 PM, Greg Smith wrote:
> Heikki Linnakangas wrote:
>> If you fsync() a file with one dirty page in it, it's going to return very
>> quickly, but a 1GB file will take a while. That could be problematic if you
>> have a thousand small files and a couple of big ones, as you would want to
>> reserve more time for the big ones. I'm not sure what to do about it, maybe
>> it's not a problem in practice.
>
> It's a problem in practice allright, with the bulk-loading situation being
> the main one you'll hit it. If somebody is running a giant COPY to populate
> a table at the time the checkpoint starts, there's probably a 1GB file of
> dirty data that's unsynced around there somewhere. I think doing anything
> about that situation requires an additional leap in thinking about buffer
> cache evicition and fsync absorption though. Ultimately I think we'll end
> up doing sync calls for relations that have gone "cold" for a while all the
> time as part of BGW activity, not just at checkpoint time, to try and avoid
> this whole area better. That's a lot more than I'm trying to do in my first
> pass of improvements though.
>
> In the interest of cutting the number of messy items left in the official
> CommitFest, I'm going to mark my patch here "Returned with Feedback" and
> continue working in the general direction I was already going. Concept
> shared, underlying patches continue to advance, good discussion around it;
> those were my goals for this CF and I think we're there.
>
> I have a good idea how to autotune the sync spread that's hardcoded in the
> current patch. I'll work on finishing that up and organizing some more
> extensive performance tests. Right now I'm more concerned about finishing
> the tests around the wal_sync_method issues, which are related to this and
> need to get sorted out a bit more urgently.
>
> --
> Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
> PostgreSQL Training, Services and Support   www.2ndQuadrant.us

Forgive me, but is all of this a step on the slippery slope to direct io? And
is this a bad thing?

--
Rob Wultsch
wult...@gmail.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Revert default wal_sync_method to fdatasync on Linux 2.6.33+
Josh Berkus wrote: I modified test_fsync in two ways to run this; first, to make it support O_DIRECT, and second to make it run in the *current* directory. Patch please? I agree with the latter change; what test_fsync does is surprising. I suggested a while ago that we refactor test_fsync to use a common set of source code as the database itself for detecting things related to wal_sync_method, perhaps just extract that whole set of DEFINE macro logic to somewhere else. That happened at a bad time in the development cycle (right before a freeze) and nobody ever got back to the idea afterwards. If this code is getting touched, and it's clear it is in some direction, I'd like to see things change so it's not possible for the two to diverge again afterwards. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services and Supportwww.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Spread checkpoint sync
Heikki Linnakangas wrote: If you fsync() a file with one dirty page in it, it's going to return very quickly, but a 1GB file will take a while. That could be problematic if you have a thousand small files and a couple of big ones, as you would want to reserve more time for the big ones. I'm not sure what to do about it, maybe it's not a problem in practice. It's a problem in practice allright, with the bulk-loading situation being the main one you'll hit it. If somebody is running a giant COPY to populate a table at the time the checkpoint starts, there's probably a 1GB file of dirty data that's unsynced around there somewhere. I think doing anything about that situation requires an additional leap in thinking about buffer cache evicition and fsync absorption though. Ultimately I think we'll end up doing sync calls for relations that have gone "cold" for a while all the time as part of BGW activity, not just at checkpoint time, to try and avoid this whole area better. That's a lot more than I'm trying to do in my first pass of improvements though. In the interest of cutting the number of messy items left in the official CommitFest, I'm going to mark my patch here "Returned with Feedback" and continue working in the general direction I was already going. Concept shared, underlying patches continue to advance, good discussion around it; those were my goals for this CF and I think we're there. I have a good idea how to autotune the sync spread that's hardcoded in the current patch. I'll work on finishing that up and organizing some more extensive performance tests. Right now I'm more concerned about finishing the tests around the wal_sync_method issues, which are related to this and need to get sorted out a bit more urgently. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services and Supportwww.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
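The "sync spread" being tuned here amounts to a small change in the checkpoint's sync phase: instead of issuing the queued fsync() calls back to back, pause between them so the kernel can keep draining dirty data on its own schedule. The following is a simplified sketch of that loop, not the patch; the function and parameter names are invented, and the pause would be the autotuned value mentioned above:

    #include "postgres.h"
    #include "miscadmin.h"
    #include <unistd.h>

    /* Sketch only: space out the checkpoint's pending fsync() calls. */
    static void
    SpreadCheckpointSyncs(int npending, int pending_fds[], int pause_ms)
    {
        for (int i = 0; i < npending; i++)
        {
            (void) fsync(pending_fds[i]);
            if (i + 1 < npending)
                pg_usleep((long) pause_ms * 1000L);   /* let the kernel catch up */
        }
    }

The bulk-loading case described above is what makes a fixed pause insufficient: a single fsync() of a 1GB file can eat the whole budget, which is why the spacing needs to adapt to how much remains to be synced.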
Re: [HACKERS] Instrument checkpoint sync calls
Jeff Janes wrote:
> I've attached a tiny patch to apply over yours, to deal with this and with the case where no files are synced.

Thanks for that. That obvious error eluded me because in most of the patch update testing I was doing (on ext3), the longest sync was always about the same length as the total sync time.

Attached patch (in correct diff form this time!) collects up all changes. That includes elimination of a potential race condition if someone changes log_checkpoints while a long sync phase is executing. I don't know whether that can happen, and it obviously won't give accurate stats going back to the beginning of the checkpoint in that case, but it tries to defend against producing complete garbage if that value changes out from under it.

This is the first version of this patch I feel fairly good about; no open concerns left on my side. Jeff, since you're now the de-facto credited reviewer of this one by virtue of suggesting bug fixes, could you take this update out for a spin too?

> Combining this instrumentation patch with the backend sync one already committed, the net result under log_min_messages=debug1 is somewhat undesirable in that I can now see the individual sync times for the syncs done by the checkpoint writer, but I do not get times for the syncs done by the backend (I only get informed of their existence).

On a properly working system, backend syncs shouldn't be happening. So if you see them, I think the question shouldn't be "how long are they taking?", it's "how do I get rid of them?" From that perspective, knowing of their existence is sufficient to suggest the necessary tuning changes, such as dropping bgwriter_delay.

When you get into a situation where they are showing up, a whole lot of them can happen in a very brief period; enough that I'd actually be concerned about the added timing overhead, something I normally don't care about very much. That's the main reason I didn't bother instrumenting those too--the situations where they happen are ones already running low on resources.

Big writes for things that can only be written out at checkpoint time, on the other hand, are unavoidable in the current design. Providing detail on them is going to be relevant unless there's a major refactoring of how those happen.

-- Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services and Support  www.2ndQuadrant.us

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index ae9ed8f..6adc139 100644
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
*************** LogCheckpointEnd(bool restartpoint)
*** 6955,6964 ****
  {
  	long		write_secs,
  				sync_secs,
! 				total_secs;
  	int			write_usecs,
  				sync_usecs,
! 				total_usecs;
  
  	CheckpointStats.ckpt_end_t = GetCurrentTimestamp();
--- 6955,6969 ----
  {
  	long		write_secs,
  				sync_secs,
! 				total_secs,
! 				longest_secs,
! 				average_secs;
  	int			write_usecs,
  				sync_usecs,
! 				total_usecs,
! 				longest_usecs,
! 				average_usecs;
! 	double		average_sync_time;
  
  	CheckpointStats.ckpt_end_t = GetCurrentTimestamp();
*************** LogCheckpointEnd(bool restartpoint)
*** 6974,6991 ****
  				CheckpointStats.ckpt_sync_end_t,
  				&sync_secs, &sync_usecs);
  
  	if (restartpoint)
  		elog(LOG, "restartpoint complete: wrote %d buffers (%.1f%%); "
! 			 "write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s",
  			 CheckpointStats.ckpt_bufs_written,
  			 (double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers,
  			 write_secs, write_usecs / 1000,
  			 sync_secs, sync_usecs / 1000,
! 			 total_secs, total_usecs / 1000);
  	else
  		elog(LOG, "checkpoint complete: wrote %d buffers (%.1f%%); "
  			 "%d transaction log file(s) added, %d removed, %d recycled; "
! 			 "write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s",
  			 CheckpointStats.ckpt_bufs_written,
  			 (double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers,
  			 CheckpointStats.ckpt_segs_added,
--- 6979,7017 ----
  				CheckpointStats.ckpt_sync_end_t,
  				&sync_secs, &sync_usecs);
  
+ 	/*
+ 	 * Timing values returned from CheckpointStats are in milliseconds.
+ 	 * Convert to the second plus microsecond form that TimestampDifference
+ 	 * returns for homogeneous printing.
+ 	 */
+ 	longest_secs = (long) (CheckpointStats.ckpt_longest_sync / 1000);
+ 	longest_usecs = 1000 * (CheckpointStats.ckpt_longest_sync -
+ 							longest_secs * 1000);
+ 
+ 	average_sync_time = 0;
+ 	if (CheckpointStats.ckpt_sync_rels > 0)
+ 		average_sync_time = CheckpointStats.ckpt_agg_sync_time /
+ 							CheckpointStats.ckpt_sync_rels;
+ 	average_secs = (long) (average_sync_time / 1000);
+ 	average_usecs = 1000 * (average_sync_time - average_secs * 1000);
+ 
  	if (restartpoint)
  		elog(LOG, "restartpoint complete: wrote %d buffers (%.1f%%); "
! 			 "write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s; "
!
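The tail of that elog format string is cut off in the archive, so the exact wording of the new fields is an assumption, but given the longest/average computations above, a checkpoint completion line would come out something like this (all values invented for illustration):

LOG:  checkpoint complete: wrote 3412 buffers (20.8%); 0 transaction log file(s) added, 0 removed, 2 recycled; write=0.289 s, sync=3.408 s, total=3.787 s; sync files=24, longest=1.201 s, average=0.142 s

The interesting diagnostic is the gap between "longest" and "average": a large spread is the thousand-small-files-plus-one-giant-one scenario from the fsync discussion above.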
Re: [HACKERS] V3: Idle in transaction cancellation
"Kevin Grittner" wrote: > Andres Freund wrote: >> On Thursday 02 December 2010 00:48:53 Kevin Grittner wrote: >> >>> Is there any provision for one backend to cause a *different* >>> backend which is idle in a transaction to terminate cleanly when >>> it attempts to process its next statement? >> >> You might want to check out SendProcSignal() et al. > > Yeah, that was the missing link for me. Thanks! OK, I have confirmed that the patch set Andres posted compiles and passes regression tests (both `make check` and `make installcheck-world`), and that the API provided works for at least one purpose -- allowing SSI conflict detection to cancel an a transaction other than the one on which the failure is detected, to allow a guarantee that there will be no rollbacks without first having a successful commit of a transaction which wrote data. I created a "skip" branch in my git repo to hold the work for this test. A single commit there which includes all three of Andres's patches plus the code needed to implement the SSI feature is at: http://git.postgresql.org/gitweb?p=users/kgrittn/postgres.git;a= commitdiff;h=0e1e86830b49dab51ff80619a580527260f4e186 I now wish I'd committed Andres's patches separately first. If anyone likes, I could try to put something together to separate things out like that. I did notice one picky thing while working on this -- the left brace for some if statements wasn't placed consistently with surrounding code. Rather than: if (x) { a(); b(); } the patch tends to do this: if (x){ a(); b(); } I agree with Andres that the API to use it is a little bit spread out and non-obvious, but I don't see a nicer way to do it. Maybe someone else can think of something. I didn't test the patch in relation to the original purpose (for which only the first two of the three patch files should be needed) -- error handling with hot standby. I'm hoping that someone familiar with the issues there can test that part of it. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] profiling connection overhead
On Sun, Dec 5, 2010 at 12:45 PM, Rob Wultsch wrote:
> One thing I would suggest that the PG community keeps in mind while
> talking about built in connection process caching, is that it is a very
> nice feature for memory leaks caused by a connection to not exist for
> and continue growing forever.

s/not exist for/not exist/

I have had issues with very slow leaks in MySQL building up over months. It really sucks to have to go to management to ask for downtime because of a slow memory leak.

-- Rob Wultsch wult...@gmail.com

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] profiling connection overhead
On Sun, Dec 5, 2010 at 11:59 AM, Josh Berkus wrote:
>
>> * no coordination of restarts/configuration changes between the cluster
>> and the pooler
>> * you have two pieces of config files to configure your pooling settings
>> (having all that available say in a catalog in pg would be awesome)
>> * you lose all of the advanced authentication features of pg (because
>> all connections need to go through the pooler) and also ip-based stuff
>> * no SSL support (in the case of pgbouncer)
>> * complexity in resetting backend state (we added some support for that
>> through explicit SQL level commands in recent releases but it still is a
>> hazard)
>
> More:
>
> * pooler logs to separate file, for which there are (currently) no analysis tools
> * pooling is incompatible with the use of ROLES for data security
>
> The last is a major issue, and not one I think we can easily resolve. MySQL has a pooling-friendly user system, because when you connect to MySQL you basically always connect as the superuser and on connection it switches you to your chosen login role. This, per Rob Wultsch, is one of the things at the heart of allowing MySQL to support 100,000 low frequency users per cheap hosting system.
>
> As you might imagine, this behavior is also the source of a lot of MySQL's security bugs. I don't see how we could imitate it without getting the bugs as well.

I think you have read a bit more into what I have said than is correct. MySQL can deal with thousands of users and separate schemas on commodity hardware. There are many design decisions (some questionable) that have made MySQL much better in a shared hosting environment than pg, and I don't know where the grants system falls into that.

MySQL does not have that many security problems because of how grants are stored. Most MySQL security issues are DOS sort of stuff based on an authenticated user being able to cause a crash. The decoupled backend storage and a less than awesome parser share most of the blame for these issues.

One thing I would suggest that the PG community keeps in mind while talking about built in connection process caching, is that it is a very nice feature for memory leaks caused by a connection to not exist for and continue growing forever.

NOTE: 100k is not a number that I would put much stock in. I don't recall ever mentioning that number and it is not a number that would be truthful for me to throw out.

-- Rob Wultsch wult...@gmail.com

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch to add a primary key using an existing index
On fre, 2010-12-03 at 15:27 -0500, Robert Haas wrote: > On Fri, Dec 3, 2010 at 2:56 PM, r t wrote: > > What exactly was the objection to the following --> > > ALTER TABLE table_name ADD PRIMARY KEY (column_list) USING index_name; > > Is the objection that you might have been trying to specify a constraint > > named "using" ? I'm willing to make that option more difficult. :-) > > I think it's that someone might expect the word after USING to be the > name of an index AM. That could be avoided by writing USING INDEX -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] profiling connection overhead
> * no coordination of restarts/configuration changes between the cluster
> and the pooler
> * you have two pieces of config files to configure your pooling settings
> (having all that available say in a catalog in pg would be awesome)
> * you lose all of the advanced authentication features of pg (because
> all connections need to go through the pooler) and also ip-based stuff
> * no SSL support (in the case of pgbouncer)
> * complexity in resetting backend state (we added some support for that
> through explicit SQL level commands in recent releases but it still is a
> hazard)

More:

* pooler logs to separate file, for which there are (currently) no analysis tools
* pooling is incompatible with the use of ROLES for data security

The last is a major issue, and not one I think we can easily resolve. MySQL has a pooling-friendly user system, because when you connect to MySQL you basically always connect as the superuser and on connection it switches you to your chosen login role. This, per Rob Wultsch, is one of the things at the heart of allowing MySQL to support 100,000 low frequency users per cheap hosting system.

As you might imagine, this behavior is also the source of a lot of MySQL's security bugs. I don't see how we could imitate it without getting the bugs as well.

--
-- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] wCTE behaviour
Marko Tiikkaja wrote:
> This is almost exactly the patch from 2010-02 without CommandCounterIncrement()s. It's still a bit rough around the edges and needs some more comments, but I'm posting it here anyway. This patch passes all regression tests, but feel free to try to break it, there are probably ways to do that. This one also has the "always run DMLs to completion, and exactly once" behaviour.

So this patch was marked "Ready for Committer", but a) no committer has picked it up yet and b) Marko has made changes here that nobody else has tested out yet that I've seen on the list. Accordingly, that classification may have been optimistic. It seems to me that another testing run-through from someone like David might be appropriate to build some confidence that this latest patch should be a commit candidate. If there is a committer intending to work on this as-is, they haven't identified themselves.

-- Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services and Support  www.2ndQuadrant.us

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch for parallel pg_dump
Greg Smith writes:
> In addition, Joachim submitted a synchronized snapshot patch that looks
> to me like it slipped through the cracks without being fully explored.
> ...
> The way I read that thread, there were two objections:
> 1) This mechanism isn't general enough for all use-cases outside of
> pg_dump, which doesn't make it wrong when the question is how to get
> parallel pg_dump running
> 2) Running as superuser is excessive. Running as the database owner was
> suggested as likely to be good enough for pg_dump purposes.

IIRC, in old discussions of this problem we first considered allowing clients to pull down an explicit representation of their snapshot (which actually is an existing feature now, txid_current_snapshot()) and then upload that again to become the active snapshot in another connection. That was rejected on the grounds that you could cause all kinds of mischief by uploading a bad snapshot; so we decided to think about providing a server-side-only means to clone another backend's current snapshot. Which is essentially what Joachim's above-mentioned patch provides. However, as was discussed in that thread, that approach is far from being ideal either.

I'm wondering if we should reconsider the pass-it-through-the-client approach, because if we could make that work it would be more general and it wouldn't need any special privileges. The trick seems to be to apply sufficient sanity testing to the snapshot proposed to be installed in the subsidiary transaction. I think the requirements would basically be

(1) xmin <= any listed XIDs < xmax
(2) xmin not so old as to cause GlobalXmin to decrease
(3) xmax not beyond current XID counter
(4) XID list includes all still-running XIDs in the given range

One tricky part would be ensuring GlobalXmin doesn't decrease when the snap is installed, but I think that could be made to work if we take ProcArrayLock exclusively and insist on observing some other running transaction with xmin <= proposed xmin. For the pg_dump case this would certainly hold since xmin would be the parent pg_dump's xmin.

Given the checks stated above, it would be possible for someone to install a snapshot that corresponds to no actual state of the database, eg it shows some T1 as running and T2 as committed when actually T1 committed before T2. I don't see any simple way for the installation function to detect that, but I'm not sure whether it matters. The user might see inconsistent data, but do we care? Perhaps as a safety measure we should only allow snapshot installation in read-only transactions, so that even if the xact does observe inconsistent data it can't possibly corrupt the database state thereby. This'd be no skin off pg_dump's nose, obviously. Or compromise on "only superusers can do it in non-read-only transactions".

Thoughts?

regards, tom lane

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
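To make the proposed sanity tests concrete, here is a standalone sketch of checks (1) through (4). All struct and helper names are invented for illustration; the real backend types differ, and XID wraparound comparisons (TransactionIdPrecedes and friends) are deliberately ignored for simplicity:

#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;	/* simplified; no wraparound handling */

/* A client-supplied snapshot, reduced to the parts the checks need. */
typedef struct ProposedSnapshot
{
	TransactionId xmin;			/* all XIDs below this are settled */
	TransactionId xmax;			/* first XID not yet assigned */
	int			nxip;			/* number of listed in-progress XIDs */
	const TransactionId *xip;	/* the in-progress XIDs */
} ProposedSnapshot;

static bool
xid_is_listed(const ProposedSnapshot *snap, TransactionId xid)
{
	int			i;

	for (i = 0; i < snap->nxip; i++)
		if (snap->xip[i] == xid)
			return true;
	return false;
}

/*
 * "global_xmin" and "next_xid" stand in for GlobalXmin and the current
 * XID counter; "xid_is_running" stands in for a ProcArray lookup done
 * while holding ProcArrayLock.
 */
static bool
snapshot_is_sane(const ProposedSnapshot *snap,
				 TransactionId global_xmin,
				 TransactionId next_xid,
				 bool (*xid_is_running) (TransactionId))
{
	int			i;
	TransactionId xid;

	/* (2) xmin must not be so old that GlobalXmin would decrease */
	if (snap->xmin < global_xmin)
		return false;

	/* (3) xmax must not be beyond the current XID counter */
	if (snap->xmax > next_xid)
		return false;

	/* (1) every listed XID must satisfy xmin <= xid < xmax */
	for (i = 0; i < snap->nxip; i++)
		if (snap->xip[i] < snap->xmin || snap->xip[i] >= snap->xmax)
			return false;

	/* (4) every still-running XID in [xmin, xmax) must be listed */
	for (xid = snap->xmin; xid < snap->xmax; xid++)
		if (xid_is_running(xid) && !xid_is_listed(snap, xid))
			return false;

	return true;
}

Note that these checks catch impossible snapshots but not, as the mail points out, snapshots that are internally consistent yet correspond to no state the database ever passed through.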
[HACKERS] Re: new patch of MERGE (merge_204) & a question about duplicated ctid
Boxuan Zhai wrote:
> I have updated the MERGE patch for two main problems.

The patch inside the .tar.gz file you attached isn't right; that extracts to a tiny file of junk characters.

-- Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services and Support  www.2ndQuadrant.us

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] serializable read only deferrable
Tom Lane wrote: > "Kevin Grittner" writes: >> I'm reviving the discussion on the subject topic because I just >> had an epiphany which makes it seem simple to implement. The >> concept of this is that if you start a SERIALIZABLE READ ONLY >> transaction in an SSI environment when certain conditions are >> true, it doesn't need to acquire predicate locks or test for >> rw-conflicts. > > I assume this would have to be a "hard" definition of READ ONLY, > not the rather squishy definition we use now? How would we manage > the compatibility implications?

If there are issues with whether READ ONLY covers the right ground, it's likely to affect more than this particular issue -- I've taken advantage of the READ ONLY property of transactions to allow quite a few novel optimizations to SSI beyond what is presented in Cahill's doctoral work.

I'm excluding temporary tables from SSI on the grounds that they are only read and written by a single transaction and therefore can't be a source of rw-dependencies, and I'm excluding system tables on the grounds that they don't follow normal snapshot isolation rules. Hint bit rewrites are not an issue for SSI. Are there any other squishy aspects I might need to consider?

-Kevin

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] libpq changes for synchronous replication
The one time this year top-posting seems appropriate... this patch seems stalled waiting for some sort of response to the concerns Alvaro raised here.

Alvaro Herrera wrote:

Excerpts from Fujii Masao's message of jue nov 25 10:47:12 -0300 2010:

The attached patch s/CopyXLog/CopyBoth/g and adds the description about CopyBoth into the COPY section.

I gave this a look. It seems good, but I'm not sure about this bit:

+			case 'W':			/* Start Copy Both */
+				/*
+				 * We don't need to use getCopyStart here since CopyBothResponse
+				 * specifies neither the copy format nor the number of columns in
+				 * the Copy data. They should be always zero.
+				 */
+				conn->result = PQmakeEmptyPGresult(conn, PGRES_COPY_OUT);
+				if (!conn->result)
+					return;
+				conn->asyncStatus = PGASYNC_COPY_BOTH;
+				conn->copy_already_done = 0;
+				break;

I guess this was OK when this was conceived as CopyXlog, but since it's now a generic mechanism, this seems a bit unwise. Should this be reconsidered so that it's possible to change the format or number of columns?

(The paragraph added to the docs is also a bit too specific about this being used exclusively in streaming replication, ISTM)

While modifying the code, it occurred to me that we might have to add a new ExecStatusType like PGRES_COPY_BOTH and use that for CopyBoth mode, for the sake of consistency. But since it would be just an alias of PGRES_COPY_OUT for now, i.e., there is no specific behavior for that ExecStatusType, I don't think that it's worth adding that yet.

I'm not so sure about this. If we think that it's worth adding a new possible state, we should do so now; we will not be able to change this behavior later.
Re: [HACKERS] WIP patch for parallel pg_dump
Joachim Wieland wrote:
> Regarding snapshot cloning and dump consistency, I brought this up already several months ago and asked if the feature is considered useful even without snapshot cloning.

In addition, Joachim submitted a synchronized snapshot patch that looks to me like it slipped through the cracks without being fully explored. Since it's split in the official archives, the easiest way to read the thread is at http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg143866.html

Or you can use these two:
http://archives.postgresql.org/pgsql-hackers/2010-01/msg00916.php
http://archives.postgresql.org/pgsql-hackers/2010-02/msg00363.php

That never made it into a CommitFest proper that I can see; it just picked up review, mainly from Markus. The way I read that thread, there were two objections:

1) This mechanism isn't general enough for all use-cases outside of pg_dump, which doesn't make it wrong when the question is how to get parallel pg_dump running
2) Running as superuser is excessive. Running as the database owner was suggested as likely to be good enough for pg_dump purposes.

Ultimately I think that stalled because without a client that needed it the code wasn't so interesting yet. But now there is one; should that get revived again? It seems like all of the pieces needed to build what's really desired here are available; it's just the always non-trivial task of integrating them together the right way that's needed.

-- Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services and Support  www.2ndQuadrant.us

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] allow COPY routines to read arbitrary numbers of fields
Attached is a patch that allows CopyReadAttributesText() and CopyReadAttributesCSV() to read arbitrary numbers of attributes. Underflowing attributes are recorded as null, and space is made for overflowing attributes on a line.

This patch doesn't result in any user-visible behavior change. The current calling code will fail if the number of attributes read is not what is expected, as happens now. But it will allow the API to be used (when exposed) by a foreign data wrapper that can accept arbitrary numbers of attributes. My aim here is to get to something like:

CREATE FOREIGN TABLE my_csv (
    t text[]
) SERVER file_server
OPTIONS (format 'csv', filename '/path/to/my/data.csv', textarray 'true',
         header 'true', delimiter ';', quote '@', escape '"', null '');

SELECT t[3] as f1, t[1] as f2, t[] as probably_null FROM my_csv;

It would probably be nice to apply this before we start exposing the COPY API to FDW routines, as discussed earlier today.

cheers

andrew

*** a/src/backend/commands/copy.c
--- b/src/backend/commands/copy.c
*************** typedef struct CopyStateData
*** 141,146 ****
--- 141,151 ----
  	 */
  	StringInfoData attribute_buf;
  
+ 	/* field raw data pointers found by COPY FROM */
+ 
+ 	int			max_fields;
+ 	char	  **raw_fields;
+ 
  	/*
  	 * Similarly, line_buf holds the whole input line being processed. The
  	 * input cycle is first to read the whole line into line_buf, convert it
*************** static void CopyOneRowTo(CopyState cstate, Oid tupleOid,
*** 250,259 ****
  static void CopyFrom(CopyState cstate);
  static bool CopyReadLine(CopyState cstate);
  static bool CopyReadLineText(CopyState cstate);
! static int	CopyReadAttributesText(CopyState cstate, int maxfields,
! 					   char **fieldvals);
! static int	CopyReadAttributesCSV(CopyState cstate, int maxfields,
! 					   char **fieldvals);
  static Datum CopyReadBinaryAttribute(CopyState cstate,
  						int column_no, FmgrInfo *flinfo,
  						Oid typioparam, int32 typmod,
--- 255,262 ----
  static void CopyFrom(CopyState cstate);
  static bool CopyReadLine(CopyState cstate);
  static bool CopyReadLineText(CopyState cstate);
! static int	CopyReadAttributesText(CopyState cstate, int maxfields);
! static int	CopyReadAttributesCSV(CopyState cstate, int maxfields);
  static Datum CopyReadBinaryAttribute(CopyState cstate,
  						int column_no, FmgrInfo *flinfo,
  						Oid typioparam, int32 typmod,
*************** CopyFrom(CopyState cstate)
*** 1679,1685 ****
  	Oid			in_func_oid;
  	Datum	   *values;
  	bool	   *nulls;
! 	int			nfields;
  	char	  **field_strings;
  	bool		done = false;
  	bool		isnull;
--- 1682,1688 ----
  	Oid			in_func_oid;
  	Datum	   *values;
  	bool	   *nulls;
! 	int			nfields = 0;
  	char	  **field_strings;
  	bool		done = false;
  	bool		isnull;
*************** CopyFrom(CopyState cstate)
*** 1920,1927 ****
  	nulls = (bool *) palloc(num_phys_attrs * sizeof(bool));
  
  	/* create workspace for CopyReadAttributes results */
! 	nfields = file_has_oids ? (attr_count + 1) : attr_count;
! 	field_strings = (char **) palloc(nfields * sizeof(char *));
  
  	/* Initialize state variables */
  	cstate->fe_eof = false;
--- 1923,1934 ----
  	nulls = (bool *) palloc(num_phys_attrs * sizeof(bool));
  
  	/* create workspace for CopyReadAttributes results */
! 	if (! cstate->binary)
! 	{
! 		nfields = file_has_oids ? (attr_count + 1) : attr_count;
! 		cstate->max_fields = nfields;
! 		cstate->raw_fields = (char **) palloc(nfields * sizeof(char *));
! 	}
  
  	/* Initialize state variables */
  	cstate->fe_eof = false;
*************** CopyFrom(CopyState cstate)
*** 1985,1994 ****
  			/* Parse the line into de-escaped field values */
  			if (cstate->csv_mode)
! 				fldct = CopyReadAttributesCSV(cstate, nfields, field_strings);
  			else
! 				fldct = CopyReadAttributesText(cstate, nfields, field_strings);
  			fieldno = 0;
  
  			/* Read the OID field if present */
  			if (file_has_oids)
--- 1992,2009 ----
  			/* Parse the line into de-escaped field values */
  			if (cstate->csv_mode)
! 				fldct = CopyReadAttributesCSV(cstate, nfields);
  			else
! 				fldct = CopyReadAttributesText(cstate, nfields);
! 
! 			/* check for overflowing fields */
! 			if (nfields > 0 && fldct > nfields)
! 				ereport(ERROR,
! 						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
! 						 errmsg("extra data after last expected column")));
! 
  			fieldno = 0;
+ 			field_strings = cstate->raw_fields;
  
  			/* Read the OID field if present */
  			if (file_has_oids)
*************** CopyFrom(CopyState cstate)
*** 2218,2224 ****
  	pfree(values);
  	pfree(nulls);
! 	pfree(field_strings);
  
  	pfree(in_functions);
  	pfree(typioparams);
--- 2233,2240 ----
  	pfree(values);
  	pfree(nulls);
! 	if (! cstate->binary)
! 		pfree(cstate->raw_fields);
  
  	pfree(in_functions);
  	pfree(typioparams);
*************** GetDecimalFromHex(char hex)
*** 2717,2737 ****
   * performing de-escaping as needed
Re: [HACKERS] FK's to refer to rows in inheritance child
On 12/05/2010 12:10 PM, Tom Lane wrote:
> Andrew Dunstan writes:
>> On 12/04/2010 07:12 PM, Robert Haas wrote:
>>> I wouldn't necessarily be opposed to official topic branches at some point in the future, but I think it's premature to speculate about whether it'd be useful here.
>> I'd need a lot of convincing if it imposed an extra burden on people like Tom. The only way I could see it working is if some committer took ownership of the topic branch and guaranteed to keep it pretty much in sync with the master branch.
> Well, allegedly this is one of the reasons we moved to git. Anybody can do that in their own repository, just as easily as a core committer could. AFAICS it's not necessary for the core repo to contain the branch, up until the point where it's ready to merge into master.

Well, ISTM that amounts to not having "official topic branches" :-)

I agree that this is supposed to be one of git's strengths (or more exactly a strength of distributed SCMs generally). I don't really see any great value in sanctifying a particular topic branch with some official status. What I would like to see is people publishing the locations of development repos so that they can be pulled from or merged, especially for any large patch.

cheers

andrew

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] serializable read only deferrable
"Kevin Grittner" writes: > I'm reviving the discussion on the subject topic because I just had > an epiphany which makes it seem simple to implement. The concept of > this is that if you start a SERIALIZABLE READ ONLY transaction in an > SSI environment when certain conditions are true, it doesn't need to > acquire predicate locks or test for rw-conflicts. I assume this would have to be a "hard" definition of READ ONLY, not the rather squishy definition we use now? How would we manage the compatibility implications? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FK's to refer to rows in inheritance child
Andrew Dunstan writes: > On 12/04/2010 07:12 PM, Robert Haas wrote: >> I wouldn't necessarily be opposed to official topic branches at some point >> in the future, but I think it's premature to speculate about whether it'd be >> useful here. > I'd need a lot of convincing if it imposed an extra burden on people > like Tom. The only way I could see it working is if some committer took > ownership of the topic branch and guaranteed to keep it pretty much in > sync with the master branch.

Well, allegedly this is one of the reasons we moved to git. Anybody can do that in their own repository, just as easily as a core committer could. AFAICS it's not necessary for the core repo to contain the branch, up until the point where it's ready to merge into master.

>> What is needed right now is design work, not code. > Indeed. In this case I don't think we even have agreement on the > features let alone how they might work.

Yeah. But it's fair to look ahead to how development might proceed.

regards, tom lane

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Using different semaphore for locking
flyusa2010 fly writes:
> I found that postgres uses different semaphore system calls on different operating systems.
> For example, I found that on Linux, System V semaphores (semop etc.) are used to implement locking, while on Darwin, POSIX semaphores (sem_wait, sem_post etc.) are used.
> Linux and Darwin support both System V and POSIX semaphores, so I'm wondering why postgres implements locking using different types of semaphore system calls. Is there any performance or semantic issue on different OSes?

I think your question is answered in src/template/darwin:

# Select appropriate semaphore support. Darwin 6.0 (Mac OS X 10.2) and up
# support System V semaphores; before that we have to use POSIX semaphores,
# which are less good for our purposes because they eat a file descriptor
# per backend per max_connection slot.
case $host_os in
  darwin[015].*) USE_NAMED_POSIX_SEMAPHORES=1 ;;
  *) USE_SYSV_SEMAPHORES=1 ;;
esac

regards, tom lane

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
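For anyone unfamiliar with the two flavors, here is a minimal standalone illustration of a named POSIX semaphore (nothing PostgreSQL-specific; the semaphore name is arbitrary). Per the comment quoted above, each such handle costs the process a file descriptor for its lifetime, which is why System V semaphores are preferred where the OS supports them:

#include <fcntl.h>
#include <semaphore.h>
#include <stdio.h>

int
main(void)
{
	/* create-or-open a named semaphore with initial count 1 */
	sem_t	   *s = sem_open("/pg_demo_sem", O_CREAT, 0600, 1);

	if (s == SEM_FAILED)
	{
		perror("sem_open");
		return 1;
	}

	sem_wait(s);				/* P: the "lock" operation */
	sem_post(s);				/* V: the "unlock" operation */

	sem_close(s);
	sem_unlink("/pg_demo_sem");
	return 0;
}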
[HACKERS] serializable read only deferrable
I'm reviving the discussion on the subject topic because I just had an epiphany which makes it seem simple to implement. The concept of this is that if you start a SERIALIZABLE READ ONLY transaction in an SSI environment when certain conditions are true, it doesn't need to acquire predicate locks or test for rw-conflicts. This would be particularly useful for pg_dump or large reports, as it would allow them to read data which was guaranteed to be consistent with later states of the database without risking serialization failure or contributing to the failure of other transactions. They should also run a bit faster without the overhead of locking and checking.

Having completed the switch from a pair of rw-conflict pointers per serializable transaction to a list of rw-conflicts, I'm working through the more aggressive transaction clean-up strategies thereby allowed in preparation for the graceful degradation code. Along the way, I noticed how easy it is to allow a READ ONLY transaction to opt out of predicate locking and conflict detection when it starts with no concurrent non READ ONLY transactions active, or even to remove READ ONLY transactions from those activities when such a state is reached during the execution of READ ONLY transactions; while properly recognizing the *additional* conditions under which this would be valid is rather painful. (Those additional conditions being that no concurrent non-read-only transaction may overlap a committed non-read-only transaction which wrote data and committed before the read-only transaction acquired its snapshot.)

The simple way to implement SERIALIZABLE READ ONLY DEFERRABLE under SSI would be to have each non-read-only serializable transaction acquire a heavyweight lock which can coexist with other locks at the same level (SHARE looks good) on some common object and hold that for the duration of the transaction, while a SERIALIZABLE READ ONLY DEFERRABLE transaction would need to acquire a conflicting lock (EXCLUSIVE looks good) before it could acquire a snapshot, and release the lock immediately after acquiring the snapshot.

For these purposes, it appears that advisory locks could work, as long as the lock release does not wait for the end of the transaction (which it doesn't, if I'm reading the docs right) and as long as I can pick a lock ID which won't conflict with other uses. That latter part is the only iffy aspect of the whole thing that I can see. Of course, I could add a third lock method, but that seems like overkill to be able to get one single lock.

Since I'm already allowing a transaction to opt out of predicate locking and conflict detection if there are no non-read-only transactions active when it acquires its snapshot, the work needed within the SSI code is pretty trivial; it's all in adding the DEFERRABLE word as a non-standard extension to SET TRANSACTION et al, and finding a heavyweight lock to use.

Thoughts?

-Kevin

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
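For what it's worth, the locking dance can be sketched from the client side with today's advisory lock functions. This is an illustration of the idea only: key 42 is an arbitrary stand-in for the yet-to-be-chosen lock ID, the connection string is elided, and the real implementation would of course live in the server, not in libpq:

#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

static void
run(PGconn *conn, const char *sql)
{
	PGresult   *res = PQexec(conn, sql);

	if (PQresultStatus(res) != PGRES_COMMAND_OK &&
		PQresultStatus(res) != PGRES_TUPLES_OK)
	{
		fprintf(stderr, "%s failed: %s", sql, PQerrorMessage(conn));
		PQclear(res);
		PQfinish(conn);
		exit(1);
	}
	PQclear(res);
}

int
main(void)
{
	PGconn	   *conn = PQconnectdb("");	/* connection parameters elided */

	if (PQstatus(conn) != CONNECTION_OK)
	{
		fprintf(stderr, "%s", PQerrorMessage(conn));
		return 1;
	}

	/*
	 * Writers would bracket their transactions with
	 * pg_advisory_lock_shared(42) / pg_advisory_unlock_shared(42).  The
	 * DEFERRABLE reader takes the conflicting exclusive lock, so it
	 * blocks until no such writer is active, grabs its snapshot, and
	 * releases the lock immediately.  Session-level advisory locks do
	 * not wait for end of transaction, which is the property needed.
	 */
	run(conn, "SELECT pg_advisory_lock(42)");	/* waits out writers */
	run(conn, "BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE READ ONLY");
	run(conn, "SELECT 1");		/* first statement acquires the snapshot */
	run(conn, "SELECT pg_advisory_unlock(42)");

	/* ... run the long report or dump here ... */

	run(conn, "COMMIT");
	PQfinish(conn);
	return 0;
}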
Re: [HACKERS] FK's to refer to rows in inheritance child
On 12/04/2010 07:12 PM, Robert Haas wrote:
> I wouldn't necessarily be opposed to official topic branches at some point in the future, but I think it's premature to speculate about whether it'd be useful here.

I'd need a lot of convincing if it imposed an extra burden on people like Tom. The only way I could see it working is if some committer took ownership of the topic branch and guaranteed to keep it pretty much in sync with the master branch.

> What is needed right now is design work, not code.

Indeed. In this case I don't think we even have agreement on the features, let alone how they might work.

cheers

andrew

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] what are clog and xlog?
flyusa2010 fly wrote: > clog is for commit log (very unclear for me ...) It is an important part of tracking tuple visibility. You might want to start with this page: http://wiki.postgresql.org/wiki/Hint_Bits -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] new patch of MERGE (merge_204) & a question about duplicated ctid
On Sat, Dec 04, 2010 at 09:27:52PM +0800, Boxuan Zhai wrote: > Dear Greg, > > I have updated the MERGE patch for two main problems. Please attach the actual patch :) Cheers, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Using different semaphore for locking
I found that postgres uses different semaphore system calls on different operating systems. For example, I found that on Linux, System V semaphores (semop etc.) are used to implement locking, while on Darwin, POSIX semaphores (sem_wait, sem_post etc.) are used. Linux and Darwin support both System V and POSIX semaphores, so I'm wondering why postgres implements locking using different types of semaphore system calls. Is there any performance or semantic issue on different OSes? Thanks!
[HACKERS] what are clog and xlog?
Hi, folks, I'm a newbie to postgres. I'm confused about xlog and clog. To my initial understanding, xlog is the periodic checkpoint log for data, while clog is the commit log (very unclear to me ...). Thanks!
Re: [HACKERS] disk caching for writing log
Thanks for your reply. Yes, I mean the disk may lie to the OS.

On Fri, Dec 3, 2010 at 12:14 PM, Stefan Kaltenbrunner wrote:
> On 12/03/2010 06:43 PM, Heikki Linnakangas wrote:
>> On 03.12.2010 13:49, flyusa2010 fly wrote:
>>> When writing log, dbms should synchronously flush log to disk. I'm wondering, if it is possible that the logs are in disk cache, while the control is returned to dbms again, so dbms thinks logs are persistent on disk. In this case, if the disk fails, then there's incorrectness for dbms log writing, because the log is not persistent, but dbms considers it is persistent!
>>
>> I have no idea what you mean. The method we use to flush the WAL to disk should not be fallible to such failures; we wait for fsync() or fdatasync() to return before we assume the logs are safely on disk. If you can elaborate what you mean by "control is returned to dbms", maybe someone can explain why in more detail.
>
> I think he is referring to the plain old "the disk/os is lying about whether the data really made it to stable storage" issue (especially with the huge local caches on modern disks) - if you have such a disk and/or an OS with broken barrier support you are doomed.
>
> Stefan
Re: [HACKERS] SQL/MED - file_fdw
On 12/04/2010 11:11 PM, Itagaki Takahiro wrote: On Sun, Dec 5, 2010 at 07:24, Andrew Dunstan wrote: Looking at file_parser.c, it seems to be largely taken from copy.c. Wouldn't it be better to call those functions, or refactor them so they are callable if necessary? We could export private functions and structs in copy.c, though details of the implementation should be kept in copy.c. How about splitting the file_fdw patch into two pieces? One exports the copy functions from the core, and another implements file_fdw using the infrastructure. Yes please. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Suggesting a libpq addition
Hey hackers,

A varargs exec is useful only when the programmer calls it directly. It is useless when libpq is used to create a more flexible high-level library (e.g., for C++). PQexecParams (PQexecPrepared) are good for that.

What about auto-reconnect? There is PQreset already, and PG_CONNECTION_OK (_BAD) to manage the reconnection strategy. If the connection became BAD, how many times would the suggested function try to reconnect? IMO, auto-reconnection is magic...

2010/12/5 Magnus Hagander
> On Sun, Dec 5, 2010 at 11:57, Heikki Linnakangas
> wrote:
>> On 05.12.2010 12:10, Magnus Hagander wrote:
>>>
>>> On Sun, Dec 5, 2010 at 10:22, Marc Balmer wrote:
>>>>
>>>> I am suggesting adding a function to libpq:
>>>>
>>>> PGresult *PQvexec(PGconn *conn, const char *fmt, ...);
>>>>
>>>> It behaves similar to PQexec, but it allows for printf style varargs and
>>>
>>> How is that not a horrible idea, compared to using PQexecParams()? You
>>> have to remember to do all your escaping and things manually, whereas
>>> PQexecParams() does it automatically.
>>
>> A varargs version of PQexecParams() would be handy, though. Imagine being
>> able to do:
>>
>> PQexecVParams("SELECT * FROM mytable WHERE foo = $1 AND bar = $2",
>> foovar, barvar);
>>
>> instead of constructing an array for the variables.
>
> I agree, that sounds a lot more useful. And it definitely needs to be
> split off from the auto-reconnection stuff.
>
> --
> Magnus Hagander
> Me: http://www.hagander.net/
> Work: http://www.redpill-linpro.com/
>
> -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

-- // Dmitriy.
Re: [HACKERS] Suggesting a libpq addition
On Sun, Dec 5, 2010 at 11:57, Heikki Linnakangas wrote:
> On 05.12.2010 12:10, Magnus Hagander wrote:
>>
>> On Sun, Dec 5, 2010 at 10:22, Marc Balmer wrote:
>>>
>>> I am suggesting adding a function to libpq:
>>>
>>> PGresult *PQvexec(PGconn *conn, const char *fmt, ...);
>>>
>>> It behaves similar to PQexec, but it allows for printf style varargs and
>>
>> How is that not a horrible idea, compared to using PQexecParams()? You
>> have to remember to do all your escaping and things manually, whereas
>> PQexecParams() does it automatically.
>
> A varargs version of PQexecParams() would be handy, though. Imagine being
> able to do:
>
> PQexecVParams("SELECT * FROM mytable WHERE foo = $1 AND bar = $2", foovar,
> barvar);
>
> instead of constructing an array for the variables.

I agree, that sounds a lot more useful. And it definitely needs to be split off from the auto-reconnection stuff.

-- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Suggesting a libpq addition
Am 05.12.2010 um 11:57 schrieb Heikki Linnakangas :
> On 05.12.2010 12:10, Magnus Hagander wrote:
>> On Sun, Dec 5, 2010 at 10:22, Marc Balmer wrote:
>>> I am suggesting adding a function to libpq:
>>>
>>> PGresult *PQvexec(PGconn *conn, const char *fmt, ...);
>>>
>>> It behaves similar to PQexec, but it allows for printf style varargs and
>>
>> How is that not a horrible idea, compared to using PQexecParams()? You
>> have to remember to do all your escaping and things manually, whereas
>> PQexecParams() does it automatically.
>
> A varargs version of PQexecParams() would be handy, though. Imagine being
> able to do:
>
> PQexecVParams("SELECT * FROM mytable WHERE foo = $1 AND bar = $2", foovar,
> barvar);
>
> instead of constructing an array for the variables.

Yes, indeed. While the suggested implementation relies on the caller to do all escaping (a bad idea...), the ease of use of a printf-like function with the security of PQexecParams would be nice.

I'd say forget about my first suggestion, I will spend a bit more time on a better approach. (And at the same time remove the connection reset code.)

> -- Heikki Linnakangas
> EnterpriseDB http://www.enterprisedb.com
>
> -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Suggesting a libpq addition
On 05.12.2010 12:10, Magnus Hagander wrote: On Sun, Dec 5, 2010 at 10:22, Marc Balmer wrote: I am suggesting adding a function to libpq: PGresult *PQvexec(PGconn *conn, const char *fmt, ...); It behaves similar to PQexec, but it allows for printf style varargs and How is that not a horrible idea, compared to using PQexecParams()? You have to remember to do all your escaping and things manually, whereas PQexecParams() does it automatically. A varargs version of PQexecParams() would be handy, though. Imagine being able to do: PQexecVParams("SELECT * FROM mytable WHERE foo = $1 AND bar = $2", foovar, barvar); instead of constructing an array for the variables. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
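Such a wrapper could be prototyped outside libpq along these lines. This is a sketch only: it assumes every parameter is passed as a text string and that the caller terminates the argument list with NULL (no function of this name exists in libpq today):

#include <stdarg.h>
#include <stdlib.h>
#include <libpq-fe.h>

/*
 * Hypothetical varargs front end for PQexecParams().  Escaping is
 * unnecessary because the values travel out-of-line as parameters,
 * which is the whole point of building on PQexecParams().
 */
PGresult *
PQexecVParams(PGconn *conn, const char *query, ...)
{
	va_list		ap,
				ap2;
	int			nparams = 0;
	int			i;
	const char **values;
	PGresult   *res;

	/* first pass: count the parameters */
	va_start(ap, query);
	va_copy(ap2, ap);
	while (va_arg(ap2, const char *) != NULL)
		nparams++;
	va_end(ap2);

	/* second pass: collect them */
	values = malloc((nparams > 0 ? nparams : 1) * sizeof(const char *));
	if (values == NULL)
	{
		va_end(ap);
		return NULL;
	}
	for (i = 0; i < nparams; i++)
		values[i] = va_arg(ap, const char *);
	va_end(ap);

	/* all values sent as text; the server infers the parameter types */
	res = PQexecParams(conn, query, nparams, NULL, values, NULL, NULL, 0);
	free(values);
	return res;
}

Usage would look like PQexecVParams(conn, "SELECT * FROM mytable WHERE foo = $1 AND bar = $2", foovar, barvar, NULL); the trailing NULL is the price of varargs in C, since there is no portable way to count the arguments otherwise.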
Re: [HACKERS] Suggesting a libpq addition
On Sun, Dec 5, 2010 at 10:22, Marc Balmer wrote:
> I am suggesting adding a function to libpq:
>
> PGresult *PQvexec(PGconn *conn, const char *fmt, ...);
>
> It behaves similar to PQexec, but it allows for printf style varargs and

How is that not a horrible idea, compared to using PQexecParams()? You have to remember to do all your escaping and things manually, whereas PQexecParams() does it automatically.

> does connection re-establishing if the connection fails (it can be
> discussed if this is already too much magic, maybe remove this part). It
> has been carefully designed to handle memory the right way. We use this
> since a long time.

It certainly doesn't belong in a function like that - and in fact, I think reconnection has to be handled at a different layer anyway. What if the connection was in the middle of a transaction? Silently roll it back without letting the app know, since it switched to a new one?

-- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Suggesting a libpq addition
2010/12/5 Marc Balmer :
> I am suggesting adding a function to libpq:
>
> PGresult *PQvexec(PGconn *conn, const char *fmt, ...);
>
> It behaves similar to PQexec, but it allows for printf style varargs and
> does connection re-establishing if the connection fails (it can be
> discussed if this is already too much magic, maybe remove this part). It
> has been carefully designed to handle memory the right way. We use this
> since a long time.
>
> What do you think?

It depends on the implementation, but it can be a great security hole - see SQL injection.

Regards

Pavel Stehule

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Suggesting a libpq addition
I am suggesting adding a function to libpq:

PGresult *PQvexec(PGconn *conn, const char *fmt, ...);

It behaves similar to PQexec, but it allows for printf style varargs and does connection re-establishing if the connection fails (it can be discussed if this is already too much magic, maybe remove this part). It has been carefully designed to handle memory the right way. We use this since a long time.

What do you think?

/*
 * Copyright (c) 2008, 2009, 2010 Marc Balmer
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 *
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Author's name may not be used to endorse or promote products derived
 *    from this software without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
 * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
 * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
 * DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT,
 * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
 * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
 * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
 * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
 * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
 * POSSIBILITY OF SUCH DAMAGE.
 */

/* PostgreSQL utility functions */

#include <stdio.h>
#include <stdarg.h>
#include <stdlib.h>
#include <unistd.h>

#include <libpq-fe.h>

PGresult *
PQvexec(PGconn *conn, const char *fmt, ...)
{
	va_list ap;
	PGresult *res;
	int len, ntries;
	char *sql;

	/* try to re-establish a lost connection before doing anything */
	for (ntries = 0; PQstatus(conn) == CONNECTION_BAD && ntries < 10;
	    ntries++) {
		PQreset(conn);
		if (PQstatus(conn) == CONNECTION_BAD)
			sleep(1);
	}
	if (PQstatus(conn) == CONNECTION_BAD) {
		res = PQmakeEmptyPGresult(conn, PGRES_FATAL_ERROR);
		goto finish;
	}

	va_start(ap, fmt);
	len = vasprintf(&sql, fmt, ap);
	va_end(ap);

	if (len == -1) {
		res = PQmakeEmptyPGresult(conn, PGRES_FATAL_ERROR);
		goto finish;
	}

	res = PQexec(conn, sql);

	/* if the connection went away mid-query, reconnect and retry */
	for (ntries = 0; PQstatus(conn) == CONNECTION_BAD && ntries < 10;
	    ntries++) {
		PQreset(conn);
		if (PQstatus(conn) == CONNECTION_BAD)
			sleep(1);
		else {
			PQclear(res);
			res = PQexec(conn, sql);
		}
	}
	free(sql);
finish:
	return res;
}

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] profiling connection overhead
On 12/01/2010 05:32 AM, Jeff Janes wrote:

On 11/28/10, Robert Haas wrote:

In a close race, I don't think we should get bogged down in micro-optimization here, both because micro-optimizations may not gain much and because what works well on one platform may not do much at all on another. The more general issue here is what to do about our high backend startup costs. Beyond trying to recycle backends for new connections, as I've previously proposed and with all the problems it entails,

Is there a particular discussion of that matter you could point me to?

the only thing that looks promising here is to try to somehow cut down on the cost of populating the catcache and relcache, not that I have a very clear idea how to do that. This has to be a soluble problem because other people have solved it.

Oracle's backend start-up time seems to be way higher than PG's. Their main solution is something that is fundamentally a built-in connection pooler with some bells and whistles. I'm not sure which "other people" you had in mind--Oracle is generally the one that pops to my mind.

To some degree we're a victim of our own flexible and extensible architecture here, but I find it pretty unsatisfying to just say, OK, well, we're slow.

What about "well OK, we have PGbouncer"? Are there fixable shortcomings that it has which could make the issue less of an issue?

well I would very much like to see an integrated pooler in postgresql - pgbouncer is a very nice piece of software (and might even be a base for an integrated bouncer), but being not closely tied to the backend you lose a lot. One of the more obvious examples is that now that we have no flatfile copy of pg_authid you have to use cruel hacks like:

http://www.depesz.com/index.php/2010/12/04/auto-refreshing-password-file-for-pgbouncer/

to get "automatic" management of roles. There are some other drawbacks as well:

* no coordination of restarts/configuration changes between the cluster and the pooler
* you have two pieces of config files to configure your pooling settings (having all that available say in a catalog in pg would be awesome)
* you lose all of the advanced authentication features of pg (because all connections need to go through the pooler) and also ip-based stuff
* no SSL support (in the case of pgbouncer)
* complexity in resetting backend state (we added some support for that through explicit SQL level commands in recent releases but it still is a hazard)

Stefan

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers