Re: [HACKERS] WAL Re-Writes
On 27/01/2016 13:30, Amit Kapila wrote:
> Thoughts?

Are the decreases observed with SSD as well as spinning rust? I might imagine that decreasing the wear would be advantageous, especially if the performance decrease is less with low read latency.

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Securing "make check" (CVE-2014-0067)
On 02/03/2014 15:30, Magnus Hagander wrote:
> Terminal Services have definitely become more common over time, but with
> faster and cheaper virtualization, a lot of people have switched to that
> instead, which would remove the problem of course. I wonder how common it
> actually is, though, to *build postgres* on a terminal services machine
> with other users on it...

Well, the banks I've contracted at recently are all rather keen on virtual desktops for developers, and some of those are terminal services. We're a headache, and packaging up all the things we need is a pain, so there is some mileage in buying grunty servers and doing specific installs that are then shared, rather than making an MSI generally available. Also I have experience of being given accounts for Jenkins etc. that are essentially terminal services logins, and having these things unable to maintain a software stack can effectively disqualify tech we would otherwise use.
[HACKERS] swapcache-style cache?
Has anyone considered managing a system like the DragonFly swapcache for a DBMS like PostgreSQL? i.e. where the admin can assign drives with good random read behaviour (but perhaps also-ran random write), such as SSDs, to provide a cache for blocks that were dirtied, with async write that hopefully writes them out before they are forcibly discarded. And where a cache fail (whether by timeout, hard fail, or CRC fail) just means having to go back to the real transactional storage. I'd been thinking that swapcache would help where the working set won't fit in RAM, as would L2ARC on Solaris - but it seems to me that there is no reason not to allow the DBMS to manage the set-aside area itself where it is given either access to the raw device or to a pre-sized file on the device it can map in segments. While L2ARC is obviously very heavyweight and entwined in ZFS, DragonFly's swapcache seems to me remarkably elegant and, it would seem, very effective.

James
Re: [HACKERS] Parallel Sort
> Have you considered GPU-based sorting? I know there's been discussion in
> the past.

If you use OpenCL, then you can use a CPU driver if there is no GPU, and that can allow you to leverage all the CPU cores without having to do the multi-thread stuff in the backend. While the compilation of a specific kernel can be quite expensive, it also has the effect of a JIT compiler in terms of system independence.
Re: [HACKERS] C++ compiler
On 25/06/2013 05:16, Tom Lane wrote:
> It might be time to reconsider whether we should move the baseline
> portability requirement up to C99.

My understanding was that you picked up a lot of users when the Win32 port became useful. While you can build with msys, I would think that leaving Microsoft's tooling behind would be a mistake, and as far as I am aware they have said that they are supporting C++11 but not bothering with C99.

> I'm really not in favor of moving to C++ though, as the
> portability-vs-usefulness tradeoffs seem pretty unattractive there.

As a long-time C++ programmer I don't see what the problem would be beyond (some) existing contributors being wary of the unknown. It's not as if any platform developed enough to be a sane db server has not got a decent C++ compiler or two. Portability is only really a problem with a subset of new C++11 features.
Re: [HACKERS] plpython implementation
On 01/07/2013 02:43, Claudio Freire wrote:
> In essence, you'd have to use another implementation. CPython guys have
> left it very clear they don't intend to "fix" that, as they don't
> consider it a bug. It's just how it is.

Given how useful it is to have a scripting language that can be used outside of the database as well as inside it, would it be reasonable to consider 'promoting' pllua? My understanding is that it (lua) is much cleaner under the hood (than CPython). Although I do recognise that Python as a whole has always had more traction.
Re: [HACKERS] Improvement of checkpoint IO scheduler for stable transaction responses
On 14/07/2013 20:13, Greg Smith wrote:
> The most efficient way to write things out is to delay those writes as
> long as possible.

That doesn't smell right to me. It might be that delaying allows more combining and allows the kernel to see more at once and optimise it, but I think the counter-argument is that it is an efficiency loss to have either CPU or disk idle waiting on the other. It cannot make sense from a throughput point of view to have disks doing nothing and then become overloaded so they are a bottleneck (primarily seeking) while the CPU does nothing. Now, I have NOT measured behaviour, but I'd observe that we see disks that can stream 100MB/s yet do only 5% of that if they are doing random IO. Some random seeks during sync can't be helped, but if they are done when we aren't waiting for sync completion then they are in effect free. The flip side is that we can't really know whether they will get merged with adjacent writes later, so it's hard to schedule them early. But we can observe that if we have a bunch of writes to adjacent data, then a seek to do the write is effectively amortised across them. So it occurs to me that perhaps we can watch for patterns where we have groups of adjacent writes that might stream, and when they form we might schedule them to be pushed out early (if not immediately), ideally out as far as the drive (but not flushed from its cache) and without forcing all other data to be flushed too. And perhaps we should always look to be getting drives dedicated to the dbms to do something, even if it turns out to have been redundant in the end. That's not necessarily easy on Linux without using direct unbuffered IO, but to me that is Linux's problem. For a start, it's not the only target system; feedback on what 'we need' from db and mail system groups to the NT kernel devs hasn't hurt, and it never hurt Solaris to hear what Oracle and Sybase devs felt they needed either.
Re: [HACKERS] How to make ResourceOwnerForgetBuffer() O(1), instead of O(N^2) scale
On 03/10/2014 05:53, Kouhei Kaigai wrote:
> Yep, that's my pain. Even though usual query does not take many buffers
> pinned, my use case needs to fetch megabytes scale data at once because
> of performance reason; page-by-page synchronous scan makes GPU being idle.

Doesn't your GPU have an async queue and exec mechanism? Then you could do an async DMA to the GPU with an event, use that event in the GPU to start the kernel, and in the DB to release the pin?
Re: [HACKERS] json api WIP patch
> The processing functions have been extended to provide populate_record()
> and populate_recordset() functions. The latter in particular could be
> useful in decomposing a piece of JSON representing an array of flat
> objects (a fairly common pattern) into a set of Postgres records in a
> single pass.

So this would allow an 'insert into ... select ... from (...)'? I had been wondering how to do such an insertion efficiently in the context of SPI, but it seems that there is no SPI_copy equiv that would allow a query parse and plan to be avoided. Is this mechanism likely to be as fast as we can get at the moment in contexts where copy is not feasible?
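For instance, the single-pass decomposition described above might look like this (the table and the JSON literal are invented for illustration, assuming the populate_recordset() signature discussed in the thread):

```sql
-- hypothetical target table
CREATE TABLE page_hit (ts timestamptz, url text, hits int);

-- decompose a JSON array of flat objects into records in one pass
INSERT INTO page_hit (ts, url, hits)
SELECT ts, url, hits
FROM json_populate_recordset(NULL::page_hit,
       '[{"ts":"2013-01-07 10:00:00+00","url":"/a","hits":3},
         {"ts":"2013-01-07 10:05:00+00","url":"/b","hits":1}]');
```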
Re: [HACKERS] json api WIP patch
>> I had been wondering how to do such an insertion efficiently in the
>> context of SPI, but it seems that there is no SPI_copy equiv that would
>> allow a query parse and plan to be avoided.
>
> Your query above would need to be planned too, although the plan will be
> trivial.

Ah yes, I meant that I had not found a way to avoid it (for multi-row inserts etc.) from a stored proc context where I have SPI functions available.

> You should not try to use it as a general bulk load facility. And it
> will not be as fast as COPY for several reasons, including that the Json
> parsing routines are necessarily much heavier than the COPY parse
> routines, which have in any case been optimized over quite a long
> period. Also, a single json datum is limited to no more than 1Gb. If you
> have such a datum, parsing it involves having it in memory and then
> taking a copy (I wonder if we could avoid that step - will take a look).
> Then each object is decomposed into a hash table of key value pairs,
> which is then used to construct the record datum. Each field name in the
> result record is used to look up the value in the hash table - this
> happens once in the case of populate_record() and once per object in the
> array in the case of populate_recordset(). In the latter case the
> resulting records are put into a tuplestore structure (which spills to
> disk if necessary) which is then returned to the caller when all the
> objects in the JSON array are processed. COPY doesn't have these sorts
> of issues. It knows without having to look things up where each datum is
> in each record, and it stashes the result straight into the target
> table. It can read and insert huge numbers of rows without significant
> memory implications.

Yes - but I don't think I can use COPY from a stored proc context, can I? If I could use binary COPY from a stored proc that has received a binary param and unpacked the data, it would be handy.
If SPI provided a way to perform a copy to a temp table and then some callback on an iterator that yields rows to it, that would do the trick I guess.

> Perhaps if you give us a higher level view of what you're trying to
> achieve we can help you better.

I had been trying to identify a way to work with record sets where the records might be used for insert, or for updates or deletion statements, preferably without forming a large custom SQL statement that must then be parsed and planned (and which would be a PITA if I wanted to use the SQL-C preprocessor or some language bindings that like to prepare a statement and execute with params). The data I work with has a master-detail structure and insertion performance matters, so I'm trying to limit manipulations to one statement per table per logical operation even where there are multiple detail rows. Sometimes the network latency can be a pain too, and that also suggests an RPC with unpack and insert locally.

Cheers

James
Re: [HACKERS] json api WIP patch
> You can use COPY from a stored procedure, but only to and from files.

I think that's in the chocolate fireguard realm though as far as efficiency for this sort of scenario goes, even if it's handled by retaining an mmap'd file as workspace. If SPI provided a way to perform a copy to a temp table and then some callback on an iterator that yields rows to it, that would do the trick I guess.

> SPI is useful, but it's certainly possible to avoid its use. After all,
> that's what almost the whole backend does, including the COPY code. Of
> course, it's a lot harder to write that way, which is part of why SPI
> exists. Efficiency has its price.

So it is possible to use a lower level interface from a C stored proc? SPI is the (only) documented direct function extension API, isn't it? Is the issue with using the JSON data-to-record set that the parsing can be costly? Perhaps it can be achieved with B64 of compressed protobuf, or such. I don't mind if it seems a bit messy - the code can be generated from the table easily enough, especially if I can use C++. I guess an allocator that uses SPI_palloc would solve issues with memory management on error?
Re: [HACKERS] Strange Windows problem, lock_timeout test request
> So, while no native 64-bit compilers are available for free as part of
> Visual Studio Express 2012 (11), it doesn't matter much. The issue is
> more that it's very much Microsoft's whim whether they release compilers
> at all and if so, which ones, when and how.

I think I have a pretty vanilla Visual Studio Express 2012 for Desktop and:

C:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\bin\x86_amd64>.\cl
Microsoft (R) C/C++ Optimizing Compiler Version 17.00.51106.1 for x64
Copyright (C) Microsoft Corporation. All rights reserved.

usage: cl [ option... ] filename... [ /link linkoption... ]

C:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\bin\x86_amd64>

Am I misunderstanding the discussion here? Isn't that the 64-bit tool suite? Does anyone care if the compiler is a 64 bit binary itself?
Re: [HACKERS] Visual Studio 2012 RC
> Anyway, this is getting way off track. The point is that the MS SDKs and
> compilers are a bit of a mess and that MinGW support is useful because
> we can't rely on them continuing to offer free SDKs and compilers in
> future.

Well, more compilers are always useful, but complaining that Microsoft might withdraw their working compilers smacks of 'what if?' paranoia. What if MinGW support for Win64 was (sometimes/often/always/still) a bit rubbish? Oh wait ...
Re: [HACKERS] Visual Studio 2012 RC
> On the contrary, only a few months ago there was a far from groundless
> fear that Microsoft would do just that. Following considerable outcry
> they changed their mind. But this is definitely not just paranoia. As
> for w64 support, the mingw-64 project exists more or less explicitly to
> produce 64 bit compilers, including those hosted on mingw/msys.

Huh. The only reason we have to use mingw64 or one of the assorted personal builds is because 'mingw support' doesn't deliver on its own, and last I looked there was a confusing variety of personal builds with various strengths and weaknesses. I managed to make some progress, but we seem to be a ways off having a reference download (and ideally one with clang too, I guess). I'd very much like there to be a good reference implementation, but the whole mingw/mingw64 thing is indicative of some problems, and reminds me of egcs. Do you have references to back up your statements and demonstrate that it wasn't primarily FUD? FWIW I think the higher entry price of pay-for VStudio almost guarantees continued availability of a free compiler, though it might end up slightly crippled, but I'm not a product planner for MS any more than you are.
Re: [HACKERS] Have we out-grown Flex?
I haven't tried quex, but I have tried lemon (which can be broken out of SQLite) and re2c and ragel. I like ragel and lemon because the combination supports a push-parser style (from memory); many tools are inconvenient unless you are prepared to suck in a whole message before parsing, or let the parser drive a pull loop, or use a coroutine structure. Could go all trendy and use a PEG tool like, er, peg (http://piumarta.com/software/peg/). (I haven't tried them tho')

James
Re: [HACKERS] Have we out-grown Flex?
Doesn't that imply that a plan cache might be worthwhile? But no matter: didn't the OP really have an issue with packaging and Windows support? And there are a lot of Windows users, and in general there are many Windows devs: making it easier for them to contribute has to be good, doesn't it?
Re: [HACKERS] Have we out-grown Flex?
> I believe there are tools that are significantly faster than flex.

I believe re2c generates code that is faster. But the key thing is to test, probably, or perhaps ask around. I'm out of touch, but from memory flex wasn't the be-all and end-all. Lemon is definitely easy to maintain/port and the result is pretty nice, too (I know Bison/Yacc wasn't the focus here).
Re: [HACKERS] Bugs in our Windows socket code
> That is, if you request FD_WRITE events for a pre-existing socket with
> WSAEventSelect, you will not get one until the outbound network buffer
> has been filled and then has partially emptied. (This is incredibly
> broken, but Microsoft evidently has no intention of fixing it.)

I think you should distinguish between 'broken' and 'not what I am used to'. Microsoft have always had edge-triggered delivery here, rather than the level-triggered delivery on BSD. But WinSock is not BSD sockets. If you want to remember what the state is, associate a flag with the socket. Microsoft are very good at retaining compatibility; 'fixing it' in your terms would be crass breakage.
[HACKERS] Fake async rep target
How easy would it be to implement a fake async rep target? Perhaps even as something that a server could allow a connection to request? (i.e. a suitably permissioned connection could convert itself to receive an async replication stream, rather than being statically configured?) I know that it sounds a bit bonkers, but a while back I worked on a system where we configured a rep target (using OpenServer) so we could observe changes to tables and enqueue secondary processing. Rather painful in that case because of the way that repserver is configured, and I'm not sure it was worth the pain when configuring test and dev environments. However, in principle, it seems that this is quite an elegant stand-in for a whole raft of trigger functions - and probably a lot cheaper to execute. The key, I think, is to be able to allow dynamic attachment of such a 'change feed' by an account that has god-like read access. Is the existing async rep code amenable to this sort of abuse?
Re: [HACKERS] Fake async rep target
Well, I was assuming that there was some intelligence in the receiver that could effectively parse this for the application; are you suggesting that it is effectively binary deltas to apply to raw pages? Certainly, Sybase rep server works by creating function calls or SQL updates (depending on how you set it all up), and a fairly simple parser can be used to process the received stream. I was hoping that the receiver code could be used to at least identify which tuples are affected and their before/after primary keys. Ideally the result would be a packet per transaction, listing the impacted tables and, for each table, a list of affected primary keys and whether they were inserted, deleted or updated.

James
Re: [HACKERS] [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL
On 05/01/2014 16:50, Robert Haas wrote:
> But on Windows, segments are *automatically* destroyed *by the operating
> system* when the last process unmaps them, so it's not quite so clear to
> me how we can allow it there. The main shared memory segment is no
> problem because the postmaster always has it mapped, even if no one else
> does, but that doesn't help for dynamic shared memory segments.

Surely you just need to DuplicateHandle into the parent process? If you want to (tidily) dispose of it at some time, then you'll need to tell the postmaster that you have done so and what the handle is in its process, but if you just want it to stick around, then you can just pass it up.
Re: [HACKERS] [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL
On 05/01/2014 18:02, Robert Haas wrote:
> On Sun, Jan 5, 2014 at 12:34 PM, james wrote:
>> On 05/01/2014 16:50, Robert Haas wrote:
>>> But on Windows, segments are *automatically* destroyed *by the
>>> operating system* when the last process unmaps them, so it's not quite
>>> so clear to me how we can allow it there. The main shared memory
>>> segment is no problem because the postmaster always has it mapped,
>>> even if no one else does, but that doesn't help for dynamic shared
>>> memory segments.
>>
>> Surely you just need to DuplicateHandle into the parent process? If you
>> want to (tidily) dispose of it at some time, then you'll need to tell
>> the postmaster that you have done so and what the handle is in its
>> process, but if you just want it to stick around, then you can just
>> pass it up.
>
> Uh, I don't know, maybe? Does the postmaster have to do something to
> receive the duplicated handle

In principle, no, so long as the child has a handle to the parent process that has the appropriate permissions. Given that these processes have a parent/child relationship, that shouldn't be too hard to arrange.

> , or can the child just throw it over the wall to the parent and let it
> rot until the postmaster finally exits?

Yes. Though it might be a good idea to record the handle somewhere (perhaps in a table) so that any potential issues from an insane system spamming the postmaster with handles are apparent. I'm intrigued - how are the handles shared between children that are peers in the current scheme? Some handle transfer must already be in place. Could you share the handles to an immortal worker if you want to reduce any potential impact on the postmaster?

> The latter would be nicer for our purposes, perhaps, as running more
> code from within the postmaster is risky for us. If a regular backend
> process dies, the postmaster will restart everything and the database
> will come back on line, but if the postmaster itself dies, we're hard
> down.
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
Re: [HACKERS] [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL
On 06/01/2014 03:14, Robert Haas wrote:
> That's up to the application. After calling dsm_create(), you call
> dsm_segment_handle() to get the 32-bit integer handle for that segment.
> Then you have to get that to the other process(es) somehow. If you're
> trying to share a handle with a background worker, you can stuff it in
> bgw_main_arg. Otherwise, you'll probably need to store it in the main
> shared memory segment, or a file, or whatever.

Well, that works for sysv shm, sure. But I was interested (possibly from Konstantin) how the handle transfer takes place at the moment, particularly if it is possible to create additional segments dynamically. I haven't looked at the extension at all.
Re: [HACKERS] [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL
On 06/01/2014 04:20, Amit Kapila wrote:
> Duplicate handle should work, but we need to communicate the handle to
> other process using IPC.

Only if the other process needs to use it. The IPC is not to transfer the handle to the other process, just to tell it which slot in its handle table contains the handle. If you just want to ensure that its use-count never goes to zero, the receiver does not need to know what the handle is. However, the point remains that you need to duplicate it into every process that might want to use it subsequently, so it makes sense to DuplicateHandle into the parent, and then to advertise that handle value publicly so that other child processes can DuplicateHandle it back into their own process. The handle value can change, so you also need to refer to the handle in the parent and map it in each child to the local equivalent.
Re: [HACKERS] One process per session lack of sharing
On 15/07/2016 09:28, Craig Ringer wrote:
> I don't think anyone's considering moving from multi-processing to
> multi-threading in PostgreSQL. I really, really like the protection that
> the shared-nothing-by-default process model gives us, among other
> things.

As I understand it, the main issue is that it is hard to integrate extensions that use heavyweight runtimes and are focussed on isolation within a virtual machine. Perhaps it would be possible for the postmaster (or a delegate process) to host such a runtime, and find a way for a user process that wants to use such a runtime to communicate with it, whether by copying function parameters over RPC or by sharing some of its address space explicitly to the runtime to operate on directly. Such a host delegate process could be explicitly built with multithread support and not 'infect' the rest of the code with its requirements. Using granular RPC is nice for isolation, but I am concerned that the latencies might be high.
Re: [HACKERS] todo - I want the elog() thingy
> > * Allow elog() to return error codes, module name, file name, line
> > number, not just messages [elog]

I bags this one. A nice relatively easy place for me to start hacken' the Postgres. Which source tree do I diff and patch against? Er, I have no idea how to use these diff and patch things, but I know that a manual exists. How do I get the CVS source tree? Surely I don't have to download the whole thing every day? I only have 1KB/sec of connectivity and it's extremely expensive ($300/month). Can I just download the files for elog() and do it that way? I'll write some driver function to unit test it, and send the patch when I'm done to the patches list. Any developers got some tips for me?

--- James

- Original Message -
From: "Oleg Bartunov" <[EMAIL PROTECTED]>
To: "Bruce Momjian" <[EMAIL PROTECTED]>
Cc: "PostgreSQL-development" <[EMAIL PROTECTED]>
Sent: Sunday, May 13, 2001 9:35 PM
Subject: Re: [HACKERS] 7.2 items

> I'd like to have partial sorting implemented in 7.2.
> While it's rather narrow optimization for case ORDER BY ... LIMIT ...
> it has big (in my opinion) impact to Web application.
> We get up to 6x performance improvement in our experiments with our very
> crude patch for 7.1. The idea is very simple - stop sorting when we get
> requested rows. Unfortunately, our knowledge of internals is poor and
> we need some help.
>
> Regards,
> Oleg
>
> On Thu, 10 May 2001, Bruce Momjian wrote:
>
> > Here is a small list of big TODO items. I was wondering which ones
> > people were thinking about for 7.2?
> > * Add replication of distributed databases [replication]
> >     o automatic fallover
> >     o load balancing
> >     o master/slave replication
> >     o multi-master replication
> >     o partition data across servers
> >     o sample implementation in contrib/rserv
> >     o queries across databases or servers (two-phase commit)
> > * Point-in-time data recovery using backup and write-ahead log
> > * Allow row re-use without vacuum (Vadim)
> > * Add the concept of dataspaces/tablespaces [tablespaces]
> > * Allow better control over user privileges [privileges]
> > * Allow elog() to return error codes, module name, file name, line
> >   number, not just messages [elog]
> > * Allow international error message support and add error codes [elog]
> > * Make binary/file in/out interface for TOAST columns
> > * Large object interface improvements
> > * Allow inherited tables to inherit index, UNIQUE constraint, and
> >   primary key [inheritance]
> > * Add ALTER TABLE DROP COLUMN feature [drop]
> > * Add ALTER TABLE ... DROP CONSTRAINT
> > * Automatically drop constraints/functions when object is dropped
>
> Regards,
> Oleg
>
> Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
> Sternberg Astronomical Institute, Moscow University (Russia)
> Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
> phone: +007(095)939-16-83, +007(095)939-23-83
Re: [HACKERS] Parallel Aggregate
Hi,

I've done some testing with one of my data sets in an 8VPU virtual environment and this is looking really, really good.

My test query is:

SELECT pageview, sum(pageview_count)
FROM fact_agg_2015_12
GROUP BY date_trunc('DAY'::text, pageview);

The query returns 15 rows. The fact_agg table is 5398MB and holds around 25 million records.

Explain with a max_parallel_degree of 8 tells me that the query will only use 6 background workers. I have no indexes on the table currently.

Finalize HashAggregate  (cost=810142.42..810882.62 rows=59216 width=16)
  Group Key: (date_trunc('DAY'::text, pageview))
  ->  Gather  (cost=765878.46..808069.86 rows=414512 width=16)
        Number of Workers: 6
        ->  Partial HashAggregate  (cost=764878.46..765618.66 rows=59216 width=16)
              Group Key: date_trunc('DAY'::text, pageview)
              ->  Parallel Seq Scan on fact_agg_2015_12  (cost=0.00..743769.76 rows=4221741 width=12)

I am getting the following timings (everything was cached before I started testing). I didn't average the runtime, but I ran each one three times and took the middle value.

max_parallel_degree    runtime
0                      11693.537 ms
1                       6387.937 ms
2                       4328.629 ms
3                       3292.376 ms
4                       2743.148 ms
5                       2278.449 ms
6                       2000.599 ms

I'm pretty happy!

Cheers,

James Sewell, PostgreSQL Team Lead / Solutions Architect
Level 2, 50 Queen St, Melbourne VIC 3000
P (+61) 3 8370 8000  W www.lisasoft.com  F (+61) 3 8370 8099

On Mon, Mar 14, 2016 at 8:44 AM, David Rowley wrote:
> On 12 March 2016 at 16:31, David Rowley wrote:
> > I've attached an updated patch which is based on commit 7087166,
> > things are really changing fast in the grouping path area at the
> > moment, but hopefully the dust is starting to settle now.
>
> The attached patch fixes a harmless compiler warning about a possible
> uninitialised variable.
> > -- > David Rowley http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Training & Services > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers > > -- -- The contents of this email are confidential and may be subject to legal or professional privilege and copyright. No representation is made that this email is free of viruses or other defects. If you have received this communication in error, you may not copy or distribute any part of it or otherwise disclose its contents to anyone. Please advise the sender of your incorrect receipt of this correspondence.
Re: [HACKERS] Parallel Aggregate
Hi,

Happy to test, really looking forward to seeing this stuff in core. The explain analyze is below:

Finalize HashAggregate  (cost=810142.42..810882.62 rows=59216 width=16) (actual time=2282.092..2282.202 rows=15 loops=1)
  Group Key: (date_trunc('DAY'::text, pageview_start_tstamp))
  ->  Gather  (cost=765878.46..808069.86 rows=414512 width=16) (actual time=2281.749..2282.060 rows=105 loops=1)
        Number of Workers: 6
        ->  Partial HashAggregate  (cost=764878.46..765618.66 rows=59216 width=16) (actual time=2276.879..2277.030 rows=15 loops=7)
              Group Key: date_trunc('DAY'::text, pageview_start_tstamp)
              ->  Parallel Seq Scan on celebrus_fact_agg_1_p2015_12  (cost=0.00..743769.76 rows=4221741 width=12) (actual time=0.066..1631.650 rows=3618887 loops=7)

One question - how is the upper limit of workers chosen?

James Sewell, Solutions Architect
Level 2, 50 Queen St, Melbourne VIC 3000
P (+61) 3 8370 8000  W www.lisasoft.com  F (+61) 3 8370 8099

On Mon, Mar 14, 2016 at 12:30 PM, David Rowley wrote:
> On 14 March 2016 at 14:16, James Sewell wrote:
>
>> I've done some testing with one of my data sets in an 8VPU virtual
>> environment and this is looking really, really good.
>>
>> My test query is:
>>
>> SELECT pageview, sum(pageview_count)
>> FROM fact_agg_2015_12
>> GROUP BY date_trunc('DAY'::text, pageview);
>>
>> The query returns 15 rows. The fact_agg table is 5398MB and holds around
>> 25 million records.
>>
>> Explain with a max_parallel_degree of 8 tells me that the query will
>> only use 6 background workers. I have no indexes on the table currently.
>>
>> Finalize HashAggregate  (cost=810142.42..810882.62 rows=59216 width=16)
>>   Group Key: (date_trunc('DAY'::text, pageview))
>>   ->  Gather  (cost=765878.46..808069.86 rows=414512 width=16)
>>         Number of Workers: 6
>>         ->  Partial HashAggregate  (cost=764878.46..765618.66 rows=59216 width=16)
>>               Group Key: date_trunc('DAY'::text, pageview)
>>               ->  Parallel Seq Scan on fact_agg_2015_12  (cost=0.00..743769.76 rows=4221741 width=12)
>>
>
> Great! Thanks for testing this.
>
> If you run EXPLAIN ANALYZE on this with the 6 workers, does the actual
> number of Gather rows come out at 105? I'd just like to get an idea of
> whether my cost estimates for the Gather are going to be accurate for
> real world data sets.
>
> --
> David Rowley    http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
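[Editor's note: David's 105-row question checks out against the EXPLAIN ANALYZE posted earlier in the thread. With 6 workers plus the leader participating, each process emits one partial aggregate row per group, and there are 15 groups. The arithmetic below is plain bookkeeping, not PostgreSQL code:]

```python
# 6 workers + the leader each emit one partial aggregate row per group;
# the Gather node therefore receives groups * processes rows in total.
groups = 15
processes = 6 + 1  # workers plus the leader (hence loops=7 in the plan)
gather_rows = groups * processes
print(gather_rows)  # 105, matching "rows=105" on the Gather node
```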
Re: [HACKERS] Parallel Aggregate
Cool, I've been testing how this works with partitioning (which seems to be strange, but I'll post separately about that) and something odd seems to be going on now with the parallel triggering: postgres=# create table a as select * from base_p2015_11; SELECT 2000 postgres=# select * from a limit 1; ts | count | a | b | c | d | e +---+-+--+--+--+--- 2015-11-26 21:10:04.856828 | 860 | 946 | 1032 | 1118 | 1204 | (1 row) postgres-# \d a Table "datamart_owner.a" Column |Type | Modifiers +-+--- ts | timestamp without time zone | count | integer | a | integer | b | integer | c | integer | d | integer | e | integer | postgres=# select pg_size_pretty(pg_relation_size('a')); pg_size_pretty 1149 MB postgres=# explain select sum(count) from a group by date_trunc('DAY',ts); QUERY PLAN -- Finalize GroupAggregate (cost=218242.96..218254.46 rows=200 width=16) Group Key: (date_trunc('DAY'::text, ts)) -> Sort (cost=218242.96..218245.96 rows=1200 width=16) Sort Key: (date_trunc('DAY'::text, ts)) -> Gather (cost=218059.08..218181.58 rows=1200 width=16) Number of Workers: 5 -> Partial HashAggregate (cost=217059.08..217061.58 rows=200 width=16) Group Key: date_trunc('DAY'::text, ts) -> Parallel Seq Scan on a (cost=0.00..197059.06 rows=405 width=12) (9 rows) postgres=# analyze a; postgres=# explain select sum(count) from a group by date_trunc('DAY',ts); QUERY PLAN -- GroupAggregate (cost=3164211.55..3564212.03 rows=2024 width=16) Group Key: (date_trunc('DAY'::text, ts)) -> Sort (cost=3164211.55..3214211.61 rows=2024 width=12) Sort Key: (date_trunc('DAY'::text, ts)) -> Seq Scan on a (cost=0.00..397059.30 rows=2024 width=12) (5 rows) Unsure what's happening here. James Sewell, PostgreSQL Team Lead / Solutions Architect __ Level 2, 50 Queen St, Melbourne VIC 3000 *P *(+61) 3 8370 8000 *W* www.lisasoft.com *F *(+61) 3 8370 8099 On Mon, Mar 14, 2016 at 1:31 PM, David Rowley wrote: > On 14 March 2016 at 14:52, James Sewell wrote: > > One question - how is the upper limit of workers chosen? 
>
> See create_parallel_paths() in allpaths.c. Basically the bigger the
> relation (in pages) the more workers will be allocated, up until
> max_parallel_degree.
>
> There is also a comment in that function which states:
> /*
>  * Limit the degree of parallelism logarithmically based on the size of the
>  * relation. This probably needs to be a good deal more sophisticated, but we
>  * need something here for now.
>  */
>
> So this will likely see some revision at some point, after 9.6.
>
> --
> David Rowley    http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Parallel Aggregate
Hi again, I've been playing around with inheritance combined with this patch. Currently it looks like you are taking max(parallel_degree) from all the child tables and using that for the number of workers. For large machines it makes much more sense to use sum(parallel_degree) - but I've just seen this comment in the code: /* * Decide what parallel degree to request for this append path. For * now, we just use the maximum parallel degree of any member. It * might be useful to use a higher number if the Append node were * smart enough to spread out the workers, but it currently isn't. */ Does this mean that even though we are aggregating in parallel, we are only operating on one child table at a time currently? Cheers, James Sewell, Solutions Architect __ Level 2, 50 Queen St, Melbourne VIC 3000 *P *(+61) 3 8370 8000 *W* www.lisasoft.com *F *(+61) 3 8370 8099 On Mon, Mar 14, 2016 at 2:39 PM, James Sewell wrote: > Cool, > > I've been testing how this works with partitioning (which seems to be > strange, but I'll post separately about that) and something odd seems to be > going on now with the parallel triggering: > > postgres=# create table a as select * from base_p2015_11; > SELECT 2000 > > postgres=# select * from a limit 1; > ts | count | a | b | c | d | e > +---+-+--+--+--+--- > 2015-11-26 21:10:04.856828 | 860 | 946 | 1032 | 1118 | 1204 | > (1 row) > > postgres-# \d a > Table "datamart_owner.a" > Column |Type | Modifiers > +-+--- > ts | timestamp without time zone | > count | integer | > a | integer | > b | integer | > c | integer | > d | integer | > e | integer | > > postgres=# select pg_size_pretty(pg_relation_size('a')); > pg_size_pretty > > 1149 MB > > postgres=# explain select sum(count) from a group by date_trunc('DAY',ts); > QUERY PLAN > > -- > Finalize GroupAggregate (cost=218242.96..218254.46 rows=200 width=16) >Group Key: (date_trunc('DAY'::text, ts)) >-> Sort (cost=218242.96..218245.96 rows=1200 width=16) > Sort Key: (date_trunc('DAY'::text, ts)) > -> 
Gather (cost=218059.08..218181.58 rows=1200 width=16) >Number of Workers: 5 >-> Partial HashAggregate (cost=217059.08..217061.58 > rows=200 width=16) > Group Key: date_trunc('DAY'::text, ts) > -> Parallel Seq Scan on a (cost=0.00..197059.06 > rows=405 width=12) > (9 rows) > > postgres=# analyze a; > > postgres=# explain select sum(count) from a group by date_trunc('DAY',ts); > QUERY PLAN > -- > GroupAggregate (cost=3164211.55..3564212.03 rows=2024 width=16) >Group Key: (date_trunc('DAY'::text, ts)) >-> Sort (cost=3164211.55..3214211.61 rows=2024 width=12) > Sort Key: (date_trunc('DAY'::text, ts)) > -> Seq Scan on a (cost=0.00..397059.30 rows=2024 width=12) > (5 rows) > > Unsure what's happening here. > > > > James Sewell, > PostgreSQL Team Lead / Solutions Architect > __ > > > Level 2, 50 Queen St, Melbourne VIC 3000 > > *P *(+61) 3 8370 8000 *W* www.lisasoft.com *F *(+61) 3 8370 8099 > > > On Mon, Mar 14, 2016 at 1:31 PM, David Rowley < > david.row...@2ndquadrant.com> wrote: > >> On 14 March 2016 at 14:52, James Sewell >> wrote: >> > One question - how is the upper limit of workers chosen? >> >> See create_parallel_paths() in allpaths.c. Basically the bigger the >> relation (in pages) the more workers will be allocated, up until >> max_parallel_degree. >> >> There is also a comment in that function which states: >> /* >> * Limit the degree of parallelism logarithmically based on the size of the >> * relation. This probably needs to be a good deal more sophisticated, >> but we >> * need something here for now. >> */ >> >> So this will likely see some revision at some point, after 9.6. >> >> -- >> D
Re: [HACKERS] Parallel Aggregate
On Mon, Mar 14, 2016 at 3:05 PM, David Rowley wrote:
>
> Things to try:
> 1. alter table a add column ts_date date; update a set ts_date =
> date_trunc('DAY',ts); vacuum full analyze ts;
> 2. or, create index on a (date_trunc('DAY',ts)); analyze a;
> 3. or for testing, set the work_mem higher.

Ah, that makes sense. Tried with a BTREE index, and it works perfectly, but the index is 428MB - which is a bit rough. Removed that and put on a BRIN index instead: same result from a 48kB index - perfect!

Thanks for the help,

James
Re: [HACKERS] Parallel Aggregate
On Tuesday, 15 March 2016, Robert Haas wrote:
>
> > Does the cost of the aggregate function come into this calculation at
> > all? In PostGIS land, much smaller numbers of rows can generate loads
> > that would be effective to parallelize (worker time much >> than
> > startup cost).
>
> Unfortunately, no - only the table size. This is a problem, and needs
> to be fixed. However, it's probably not going to get fixed for 9.6.
> :-(

Any chance of getting a GUC (say min_parallel_degree) added to allow setting the initial value of parallel_degree, then changing the small relation check to also pass if parallel_degree > 1?

That way you could set min_parallel_degree on a query by query basis if you are running aggregates which you know will take a lot of CPU.

I suppose it wouldn't make much sense at all to set globally though, so it could just confuse matters.

Cheers,
Re: [HACKERS] Parallel Aggregate
On Tue, Mar 15, 2016 at 9:32 AM, Robert Haas wrote:
>
> I kind of doubt this would work well, but somebody could write a patch
> for it and try it out.

OK I'll give this a go today and report back.

Would the eventual plan be to use pg_proc.procost for the functions from each aggregate concerned? If so I might have a peek at that too, although I imagine I won't get far.

Cheers,
Re: [HACKERS] Choosing parallel_degree
Thanks David, Eventually it would be great to take into account the cost of the function doing the agg (pg_proc.procost, which is a multiple of CPU units). This would allow people to mark specific aggregations as needing more CPU power, therefore needing more workers per page (or should it be tuple in this case?). In the meantime some way to manually influence this would be good. I just did some testing (on an 8VCPU machine) with a 139MB table, which gets 3 workers currently. For a count(*) I get a time of 131.754 ms. If I increase this to 8 workers I get around 86.193 ms. Obviously this doesn't mean much as YMMV - but it does show that the ability to manually adjust the scaling would be great, especially in data warehouse or reporting environments. I did want to test with some really slow aggs, but even when I take out the small table test in create_parallel_paths I can't seem to get a parallel plan for a tiny table. Any idea on why this would be David? Cheers, James Sewell, PostgreSQL Team Lead / Solutions Architect __ Level 2, 50 Queen St, Melbourne VIC 3000 *P *(+61) 3 8370 8000 *W* www.lisasoft.com *F *(+61) 3 8370 8099 On Tue, Mar 15, 2016 at 12:25 PM, David Rowley wrote: > Over in [1] James mentioned about wanting more to be able to have more > influence over the partial path's parallel_degree decision. At risk > of a discussion on that hijacking the parallel aggregate thread, I > thought I'd start this for anyone who would want to discuss making > changes to that. > > I've attached a simple C program which shows the parallel_degree which > will be chosen at the moment. For now it's based on the size of the > base relation. Perhaps that will need to be rethought later, perhaps > based on costs. But I just don't think it's something for 9.6. > > Here's the output of the C program. 
>
> For 1 pages there will be 1 workers (rel size 0 MB, 0 GB)
> For 3001 pages there will be 2 workers (rel size 23 MB, 0 GB)
> For 9001 pages there will be 3 workers (rel size 70 MB, 0 GB)
> For 27001 pages there will be 4 workers (rel size 210 MB, 0 GB)
> For 81001 pages there will be 5 workers (rel size 632 MB, 0 GB)
> For 243001 pages there will be 6 workers (rel size 1898 MB, 1 GB)
> For 729001 pages there will be 7 workers (rel size 5695 MB, 5 GB)
> For 2187001 pages there will be 8 workers (rel size 17085 MB, 16 GB)
> For 6561001 pages there will be 9 workers (rel size 51257 MB, 50 GB)
> For 19683001 pages there will be 10 workers (rel size 153773 MB, 150 GB)
> For 59049001 pages there will be 11 workers (rel size 461320 MB, 450 GB)
> For 177147001 pages there will be 12 workers (rel size 1383960 MB, 1351 GB)
> For 531441001 pages there will be 13 workers (rel size 4151882 MB, 4054 GB)
> For 1594323001 pages there will be 14 workers (rel size 12455648 MB, 12163 GB)
>
> [1]
> http://www.postgresql.org/message-id/CANkGpBtUvzpdvF2=_iq64ujmvrpycs6d4i9-wepbusq1sq+...@mail.gmail.com
>
> --
> David Rowley    http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
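[Editor's note: the progression in David's table can be reproduced with a small sketch. The 1000-page starting threshold and the tripling step are inferred from the numbers above and from the "logarithmically" comment quoted earlier in the thread, not copied from the PostgreSQL source, so treat the constants as assumptions:]

```python
def parallel_degree(pages, max_parallel_degree=14, threshold=1000):
    """Add one worker each time the relation triples in size past the
    initial page threshold (constants inferred from the table above)."""
    degree = 1
    while pages > threshold * 3 and degree < max_parallel_degree:
        degree += 1
        threshold *= 3
    return degree

for pages in (1, 3001, 9001, 27001, 81001):
    print(pages, parallel_degree(pages))
# prints 1, 2, 3, 4 and 5 workers respectively, matching the table
```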
Re: [HACKERS] Choosing parallel_degree
On Wed, Mar 16, 2016 at 11:26 AM, Julien Rouhaud wrote:
>
> I'm not too familiar with parallel planning, but I tried to implement
> both in attached patch. I didn't put much effort into the
> parallel_threshold GUC documentation, because I didn't really see a good
> way to explain it. I'd be happy to improve it if needed. Also, to make
> this parameter easier to tune for users, perhaps we could divide the
> default value by 3 and use it as is in the first iteration in
> create_parallel_path() ?
>
> Also, global max_parallel_degree still needs to be at least 1 for the
> per table value to be considered.

All applies and works from my end.

Is the max_parallel_degree per table of much use here? It allows the max number of workers per table to be set - but it's still bound by the same formula (now from the GUC). So in reality it's only really useful for limiting the number of workers, not raising it.

Would there be a common use case for limiting parallelism on a subset of tables in a database you've explicitly set to have a higher amount of parallel operations via the GUC? I struggle to think of one.

I think in practicality the reverse would be more common: you'd want to set certain tables to a starting point of a certain number of workers (and ramp up to more if the formula allowed it). You could set this to 0 for never use parallel agg on this table.

Another option is to allow access to the threshold multiplier (currently hard coded to 3) per table - but this might become pretty hard to explain succinctly in the documentation.

Cheers,

James
Re: [HACKERS] Choosing parallel_degree
Hey,

I think there are definitely use cases for using parallel agg on a small table when the time for each agg operation is very high. PostGIS can be used to create many examples of low row count and table size but high CPU operations.

This does bring up an interesting point I don't quite understand though. If I run parallel agg on a table with 4 rows with 2 workers will it run on two workers (2 rows each) or will the first one grab all 4 rows?

Cheers,

James Sewell, PostgreSQL Team Lead / Solutions Architect
Level 2, 50 Queen St, Melbourne VIC 3000
P (+61) 3 8370 8000  W www.lisasoft.com  F (+61) 3 8370 8099

On Thu, Mar 17, 2016 at 10:30 PM, Julien Rouhaud wrote:
> On 17/03/2016 12:21, David Rowley wrote:
> > On 18 March 2016 at 00:13, Julien Rouhaud wrote:
> >> With the current threshold, you need a table bigger than 8 MB to be able
> >> to force parallel workers. I'm not sure there'll be benefits for
> >> multiple workers on a table smaller than 8 MB, since setting up all the
> >> parallel stuff takes time.
> >
> > It would be really nice if it were possible to drop the setting really
> > low, so that combined with a low parallel_setup_cost we could enable
> > parallel query on small tables in the regression test suite.
>
> Indeed. That could also be a use case for moving parallel_threshold to a
> GUC, but not sure what'd be best.
>
> --
> Julien Rouhaud
> http://dalibo.com - http://dalibo.org
Re: [HACKERS] Parallel Aggregate
Hi again, This is probably me missing something, but is there a reason parallel aggregate doesn't seem to ever create append nodes containing Index scans? SET random_page_cost TO 0.2; SET max_parallel_degree TO 8; postgres=# explain SELECT sum(count_i) FROM base GROUP BY view_time_day; QUERY PLAN - Finalize GroupAggregate (cost=310596.32..310598.03 rows=31 width=16) Group Key: view_time_day -> Sort (cost=310596.32..310596.79 rows=186 width=16) Sort Key: view_time_day -> Gather (cost=310589.00..310589.31 rows=186 width=16) Number of Workers: 5 -> Partial HashAggregate (cost=310589.00..310589.31 rows=31 width=16) Group Key: view_time_day -> Parallel Seq Scan on base (cost=0.00..280589.00 rows=600 width=12) SET max_parallel_degree TO 0; postgres=# explain SELECT sum(count_i) FROM base GROUP BY view_time_day; QUERY PLAN --- GroupAggregate (cost=0.56..600085.92 rows=31 width=16) Group Key: view_time_day -> Index Only Scan using base_view_time_day_count_i_idx on base (cost=0.56..450085.61 rows=3000 width=12) (3 rows) Cheers, James Sewell, Solutions Architect __ Level 2, 50 Queen St, Melbourne VIC 3000 *P *(+61) 3 8370 8000 *W* www.lisasoft.com *F *(+61) 3 8370 8099 On Thu, Mar 17, 2016 at 8:08 AM, David Rowley wrote: > On 17 March 2016 at 01:29, Robert Haas wrote: > > On Wed, Mar 16, 2016 at 8:19 AM, Amit Kapila > wrote: > >> Isn't it better to call it as Parallel Aggregate instead of Partial > >> Aggregate. Initialy, we have kept Partial for seqscan, but later on we > >> changed to Parallel Seq Scan, so I am not able to think why it is > better to > >> call Partial incase of Aggregates. > > > > I think partial is the right terminology. Unlike a parallel > > sequential scan, a partial aggregate isn't parallel-aware and could be > > used in contexts having nothing to do with parallelism. It's just > > that it outputs transition values instead of a finalized value. 
>
> +1 the reason the partial aggregate patches have been kept separate
> from the parallel aggregate patches is that partial aggregate will
> serve for many other purposes. Parallel Aggregate is just one of many
> possible use cases for this, so it makes little sense to give it a
> name according to a single use case.
>
> --
> David Rowley    http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
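[Editor's note: Robert's distinction - partial aggregation emits transition states rather than finalized values - can be illustrated with a toy two-phase average, entirely outside PostgreSQL. The function names below are invented for the sketch; an average is used because its transition state (sum, count) carries more than the final answer:]

```python
# Each "worker" produces a transition state for its chunk of rows; only
# the finalize step turns the combined state into a finished value.
def partial_avg(rows):
    return (sum(rows), len(rows))      # transition state: (sum, count)

def combine(states):
    total = sum(s for s, _ in states)  # merge the partial states
    count = sum(n for _, n in states)
    return (total, count)

def finalize(state):
    total, count = state
    return total / count

chunks = [[1, 2, 3], [4, 5], [6]]      # rows as split across three workers
states = [partial_avg(c) for c in chunks]
print(finalize(combine(states)))       # 3.5, same as averaging all rows at once
```

Averaging each chunk and then averaging the averages would give the wrong answer; shipping transition states instead of finalized values is what makes the parallel plan correct.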
Re: [HACKERS] Choosing parallel_degree
On Thu, Mar 17, 2016 at 5:05 AM, Julien Rouhaud wrote:
>
> attached v3 drops the GUC part.

This looks good. I do think that some threshold control would be good in the long term - but you are right, Robert, it just feels strange. Maybe once the final formula is implemented in 9.7+ and this gets some real world use cases it can be revisited?

One thing I really, really like about the way the new patch works is that you can set parallel_degree on an inheritance parent, then that will set the minimum workers for all associated children (when accessing from the parent).

Currently this patch will not fire on small tables even when parallel_degree is set; can we fix this by adding a check for ref->parallel_degree to the table size condition?

Cheers,

James
Re: [HACKERS] Choosing parallel_degree
OK cool, thanks.

Can we remove the minimum size limit when the per table degree setting is applied? This would help for tables with 2 - 1000 pages combined with a high CPU cost aggregate.

Cheers,

James Sewell, PostgreSQL Team Lead / Solutions Architect
Level 2, 50 Queen St, Melbourne VIC 3000
P (+61) 3 8370 8000  W www.lisasoft.com  F (+61) 3 8370 8099

On Sun, Mar 20, 2016 at 11:23 PM, David Rowley wrote:
> On 18 March 2016 at 10:13, James Sewell wrote:
> > This does bring up an interesting point I don't quite understand though.
> > If I run parallel agg on a table with 4 rows with 2 workers will it run on
> > two workers (2 rows each) or will the first one grab all 4 rows?
>
> It works on a per page basis, workers just each grab the next page to
> be scanned from a page counter that sits in shared memory, the worker
> just increments the page number, releases the lock on the counter and
> scans that page.
>
> See heap_parallelscan_nextpage()
>
> So the answer to your question is probably no. At least not unless the
> page only contained 2 rows.
>
> --
> David Rowley    http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
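[Editor's note: the claim-a-page-at-a-time scheme David describes can be sketched with a shared counter guarded by a lock. The class and names below are illustrative, not PostgreSQL's; the real implementation lives in shared memory across processes, not Python threads:]

```python
import threading

class ParallelScan:
    """A shared page counter: each worker atomically claims the next
    page number until the relation runs out of pages."""
    def __init__(self, npages):
        self.npages = npages
        self.next_page = 0
        self.lock = threading.Lock()

    def next(self):
        with self.lock:
            if self.next_page >= self.npages:
                return None          # scan finished
            page = self.next_page
            self.next_page += 1      # increment counter, release lock, scan page
            return page

scan = ParallelScan(4)               # a tiny 4-page relation
claimed = []

def worker():
    while (page := scan.next()) is not None:
        claimed.append(page)

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(claimed))  # [0, 1, 2, 3] - every page claimed exactly once
```

Which worker gets which page is nondeterministic, which matches David's answer: with a 4-row table on a single page, one worker will likely claim the only page and do all the work.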
Re: [HACKERS] Parallel Aggregate
Good news!

On Tuesday, 22 March 2016, David Rowley wrote:
> On 22 March 2016 at 02:35, Robert Haas wrote:
> > I have committed this after changing some of the comments.
> >
> > There might still be bugs ... but I don't see them. And the speedups
> > look very impressive.
> >
> > Really nice work, David.
>
> Thanks for that, and thank you for taking the time to carefully review
> it and commit it.
>
> --
> David Rowley    http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services

--
James Sewell, PostgreSQL Team Lead / Solutions Architect
Level 2, 50 Queen St, Melbourne VIC 3000
P (+61) 3 8370 8000  W www.lisasoft.com  F (+61) 3 8370 8099
[HACKERS] Random note of encouragement
Hey All,

I've been doing some (futile) work trying to speed up aggregates with a group by in PostgreSQL 9.5. I installed PostgreSQL 9.6 on the same machine to see if I could get anything running in parallel when using partitioning - which didn't work.

But - I did find this. With the following setup:

CREATE TABLE base(
    view_time TIMESTAMP WITHOUT TIME ZONE,
    view_time_day TIMESTAMP WITHOUT TIME ZONE,
    count_n numeric);

INSERT INTO base
SELECT view_time, date_trunc('day', view_time), COUNT::numeric
FROM (SELECT TIMESTAMP '2015-12-01' + random() * interval '30 days' AS view_time,
             trunc(random() * 99 + 1) AS COUNT
      FROM generate_series(1,3000)) a;

analyze base;

Now when I run the following SQL (multiple times to allow for getting everything into shared buffers, which is 4GB on my machine):

select sum(count_n) from base group by view_time_day;

I get the following results:

PSQL 9.5 - ~21 seconds
PSQL 9.6 devel - ~8.5 seconds

I think that's pretty good! I know this is a devel release, things may change, blah blah. But still, something has changed for the better here!

I get the same plan on both nodes:

HashAggregate  (cost=670590.56..670590.95 rows=31 width=13)
  Group Key: view_time_day
  ->  Seq Scan on base  (cost=0.00..520590.04 rows=3104 width=13)

Cheers,

James Sewell, PostgreSQL Team Lead / Solutions Architect
Level 2, 50 Queen St, Melbourne VIC 3000
P (+61) 3 8370 8000  W www.lisasoft.com  F (+61) 3 8370 8099
Re: [HACKERS] Random note of encouragement
I've actually just tested this on 9.3 - and I get roughly the same as 9.6devel. Now going back to make sure my 9.5 environment is sane. Hopefully this isn't me jumping the gun.

Cheers,

James Sewell, PostgreSQL Team Lead / Solutions Architect
Level 2, 50 Queen St, Melbourne VIC 3000
P (+61) 3 8370 8000  W www.lisasoft.com  F (+61) 3 8370 8099

On Thu, Feb 25, 2016 at 10:50 AM, Thomas Munro <thomas.mu...@enterprisedb.com> wrote:
> On Thu, Feb 25, 2016 at 12:26 PM, Bruce Momjian wrote:
> > On Thu, Feb 25, 2016 at 10:06:34AM +1100, James Sewell wrote:
> >> Now when I run the following SQL (multiple times to allow for getting
> >> everything into shared buffers, which is 4GB on my machine):
> >>
> >> select sum(count_n) from base group by view_time_day;
> >>
> >> I get the following results:
> >>
> >> PSQL 9.5 - ~21 seconds
> >> PSQL 9.6 devel - ~8.5 seconds
> >>
> >> I think that's pretty good!
> >>
> >> I know this is a devel release, things may change, blah blah. But still,
> >> something has changed for the better here!
> >
> > Wow, that is cool. Can anyone suggest which commit improved this?
>
> Since it sums numerics, maybe integer transition functions from commit
> 959277a4f579da5243968c750069570a58e92b38 helped?
>
> --
> Thomas Munro
> http://www.enterprisedb.com
Re: [HACKERS] Random note of encouragement
Argh, seems like a false alarm for now. I installed 9.5 from RPM source (the other was one I had installed previously) and the performance matched 9.6.

Sorry about that, I must have *something* screwed up on the other one.

Cheers,

James Sewell, PostgreSQL Team Lead / Solutions Architect
Level 2, 50 Queen St, Melbourne VIC 3000
P (+61) 3 8370 8000  W www.lisasoft.com  F (+61) 3 8370 8099
Re: [HACKERS] Replication vs. float timestamps is a disaster
>>>>> "TL" == Tom Lane writes: TL> The question to be asked is whether there is still anybody out there TL> using float timestamps. Gentoo's ebuild includes: $(use_enable !pg_legacytimestamp integer-datetimes) \ meaning that by default --enable-integer-datetimes is passed to configure, but if the pg_legacytimestamp use flag is set, then --disable-integer-datetimes is passed instead. They document it as: Use double precision floating-point numbers instead of 64-bit integers for timestamp storage. Ie, w/o any kind of deprecation notice. I don't know how many (how few?) add pg_legacytimestamp to USE when merging postgresql. But it is still available as of 9.6 and also with their live build of git://git.postgresql.org/git/postgresql.git. -JimC -- James Cloos OpenPGP: 0x997A9F17ED7DAEA6 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostrgeSQL vs oracle doing 1 million sqrts am I doing it wrong?
>>>>> "ST" == Shaun Thomas writes: ST> That said, the documentation here says FLOAT4 is an alias for REAL, ST> so it's somewhat nonintuitive for FLOAT4 to be so much slower than ST> FLOAT8, which is an alias for DOUBLE PRECISION. There are some versions of glibc where doing certain math on double is faster than doing it on float, depending on how things are compiled. Maybe this is one of them? -JimC -- James Cloos OpenPGP: 0x997A9F17ED7DAEA6
Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance
On Mon, 2014-01-13 at 19:48 -0500, Trond Myklebust wrote: > On Jan 13, 2014, at 19:03, Hannu Krosing wrote: > > > On 01/13/2014 09:53 PM, Trond Myklebust wrote: > >> On Jan 13, 2014, at 15:40, Andres Freund wrote: > >> > >>> On 2014-01-13 15:15:16 -0500, Robert Haas wrote: > >>>> On Mon, Jan 13, 2014 at 1:51 PM, Kevin Grittner > >>>> wrote: > >>>>> I notice, Josh, that you didn't mention the problems many people > >>>>> have run into with Transparent Huge Page defrag and with NUMA > >>>>> access. > >>>> Amen to that. Actually, I think NUMA can be (mostly?) fixed by > >>>> setting zone_reclaim_mode; is there some other problem besides that? > >>> I think that fixes some of the worst instances, but I've seen machines > >>> spending horrible amounts of CPU (& BUS) time in page reclaim > >>> nonetheless. If I analyzed it correctly it's in RAM << working set > >>> workloads where RAM is pretty large and most of it is used as page > >>> cache. The kernel ends up spending a huge percentage of time finding and > >>> potentially defragmenting pages when looking for victim buffers. > >>> > >>>> On a related note, there's also the problem of double-buffering. When > >>>> we read a page into shared_buffers, we leave a copy behind in the OS > >>>> buffers, and similarly on write-out. It's very unclear what to do > >>>> about this, since the kernel and PostgreSQL don't have intimate > >>>> knowledge of what each other are doing, but it would be nice to solve > >>>> somehow. > >>> I've wondered before if there wouldn't be a chance for postgres to say > >>> "my dear OS, that the file range 0-8192 of file x contains y, no need to > >>> reread" and do that when we evict a page from s_b but I never dared to > >>> actually propose that to kernel people... > >> O_DIRECT was specifically designed to solve the problem of double > >> buffering > >> between applications and the kernel. Why are you not able to use that in > >> these situations? 
> > What is asked is the opposite of O_DIRECT - the write from a buffer inside > > postgresql to linux *buffercache* and telling linux that it is the same > > as what > > is currently on disk, so don't bother to write it back ever. > > I don’t understand. Are we talking about mmap()ed files here? Why > would the kernel be trying to write back pages that aren’t dirty? No ... if I have it right, it's pretty awful: they want to do a read of a file into a user-provided buffer, thus obtaining a page cache entry and a copy in their userspace buffer, then insert the page of the user buffer back into the page cache as the page cache page ... that's right, isn't it, postgres people? Effectively you end up with buffered read/write that's also mapped into the page cache. It's a pretty awful way to hack around mmap. James
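As an aside, the zone_reclaim_mode mitigation quoted earlier in this thread is normally a one-line sysctl; setting it to 0 disables zone reclaim, the usual recommendation for database servers (the file path is illustrative):

```
# /etc/sysctl.d/90-postgresql.conf (path is illustrative)
vm.zone_reclaim_mode = 0
```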
Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance
On Tue, 2014-01-14 at 15:39 +0100, Hannu Krosing wrote: > On 01/14/2014 09:39 AM, Claudio Freire wrote: > > On Tue, Jan 14, 2014 at 5:08 AM, Hannu Krosing > > wrote: > >> Again, as said above the linux file system is doing fine. What we > >> want is a few ways to interact with it to let it do even better when > >> working with postgresql by telling it some stuff it otherwise would > >> have to second guess and by sometimes giving it back some cache > >> pages which were copied away for potential modifying but ended > >> up clean in the end. > > You don't need new interfaces. Only a slight modification of what > > fadvise DONTNEED does. > > > > This insistence in injecting pages from postgres to kernel is just a > > bad idea. > Do you think it would be possible to map copy-on-write pages > from linux cache to postgresql cache ? > > this would be a step in direction of solving the double-ram-usage > of pages which have not been read from syscache to postgresql > cache without sacrificing linux read-ahead (which I assume does > not happen when reads bypass system cache). The current mechanism for coherency between a userspace cache and the in-kernel page cache is mmap ... that's the only way you get the same page in both currently. glibc used to have an implementation of read/write in terms of mmap, so it should be possible to insert it into your current implementation without a major rewrite. The problem I think this brings you is uncontrolled writeback: you don't want dirty pages to go to disk until you issue a write() I think we could fix this with another madvise(): something like MADV_WILLUPDATE telling the page cache we expect to alter the pages again, so don't be aggressive about cleaning them. Plus all the other issues with mmap() ... but if you can detail those, we might be able to fix them. > and we can write back the copy at the point when it is safe (from > postgresql perspective) to let the system write them back ? Using MADV_WILLUPDATE, possibly ... 
you're still not going to have absolute control. The kernel will write back the pages if the dirty limits are exceeded, for instance, but we could tune it to be useful. > Do you think it is possible to make it work with good performance > for a few million 8kb pages ? > > > At the very least, it still needs postgres to know too much > > of the filesystem (block layout) to properly work. Ie: pg must be > > required to put entire filesystem-level blocks into the page cache, > > since that's how the page cache works. > I was more thinking of a simple write() interface with extra > flags/sysctls to tell kernel that "we already have this on disk" > > At the very worst, it may > > introduce serious security and reliability implications, when > > applications can destroy the consistency of the page cache (even if > > full access rights are checked, there's still the possibility this > > inconsistency might be exploitable). > If you allow write() which just writes clean pages, I can not see > where the extra security concerns are beyond what normal > write can do. The problem is we can't give you absolute control of when pages are written back because that interface can be used to DoS the system: once we get too many dirty uncleanable pages, we'll thrash looking for memory and the system will livelock. James
Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance
On Tue, 2014-01-14 at 11:48 -0500, Robert Haas wrote: > On Tue, Jan 14, 2014 at 11:44 AM, James Bottomley > wrote: > > No, I'm sorry, that's never going to be possible. No user space > > application has all the facts. If we give you an interface to force > > unconditional holding of dirty pages in core you'll livelock the system > > eventually because you made a wrong decision to hold too many dirty > > pages. I don't understand why this has to be absolute: if you advise > > us to hold the pages dirty and we do up until it becomes a choice to > > hold on to the pages or to thrash the system into a livelock, why would > > you ever choose the latter? And if, as I'm assuming, you never would, > > why don't you want the kernel to make that choice for you? > > If you don't understand how write-ahead logging works, this > conversation is going nowhere. Suffice it to say that the word > "ahead" is not optional. No, I do ... you mean the order of write out, if we have to do it, is important. In the rest of the kernel, we do this with barriers, which cause ordered grouping of I/O chunks. If we could force a similar ordering in the writeout code, is that enough? James
Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance
On Tue, 2014-01-14 at 15:15 -0200, Claudio Freire wrote: > On Tue, Jan 14, 2014 at 2:12 PM, Robert Haas wrote: > > > > In terms of avoiding double-buffering, here's my thought after reading > > what's been written so far. Suppose we read a page into our buffer > > pool. Until the page is clean, it would be ideal for the mapping to > > be shared between the buffer cache and our pool, sort of like > > copy-on-write. That way, if we decide to evict the page, it will > > still be in the OS cache if we end up needing it again (remember, the > > OS cache is typically much larger than our buffer pool). But if the > > page is dirtied, then instead of copying it, just have the buffer pool > > forget about it, because at that point we know we're going to write > > the page back out anyway before evicting it. > > > > This would be pretty similar to copy-on-write, except without the > > copying. It would just be forget-from-the-buffer-pool-on-write. > > > But... either copy-on-write or forget-on-write needs a page fault, and > thus a page mapping. > > Is a page fault more expensive than copying 8k? > > (I really don't know). A page fault can be expensive, yes ... but perhaps you don't need one. What you want is a range of memory that's read from a file but treated as anonymous for writeout (i.e. written to swap if we need to reclaim it). Then at some time later, you want to designate it as written back to the file instead so you control the writeout order. I'm not sure we can do this: the separation between file backed and anonymous pages is pretty deeply ingrained into the OS, but if it were possible, is that what you want? James -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance
On Tue, 2014-01-14 at 10:39 -0500, Tom Lane wrote: > James Bottomley writes: > > The current mechanism for coherency between a userspace cache and the > > in-kernel page cache is mmap ... that's the only way you get the same > > page in both currently. > > Right. > > > glibc used to have an implementation of read/write in terms of mmap, so > > it should be possible to insert it into your current implementation > > without a major rewrite. The problem I think this brings you is > > uncontrolled writeback: you don't want dirty pages to go to disk until > > you issue a write() > > Exactly. > > > I think we could fix this with another madvise(): > > something like MADV_WILLUPDATE telling the page cache we expect to alter > > the pages again, so don't be aggressive about cleaning them. > > "Don't be aggressive" isn't good enough. The prohibition on early write > has to be absolute, because writing a dirty page before we've done > whatever else we need to do results in a corrupt database. It has to > be treated like a write barrier. > > > The problem is we can't give you absolute control of when pages are > > written back because that interface can be used to DoS the system: once > > we get too many dirty uncleanable pages, we'll thrash looking for memory > > and the system will livelock. > > Understood, but that makes this direction a dead end. We can't use > it if the kernel might decide to write anyway. No, I'm sorry, that's never going to be possible. No user space application has all the facts. If we give you an interface to force unconditional holding of dirty pages in core you'll livelock the system eventually because you made a wrong decision to hold too many dirty pages. I don't understand why this has to be absolute: if you advise us to hold the pages dirty and we do up until it becomes a choice to hold on to the pages or to thrash the system into a livelock, why would you ever choose the latter? 
And if, as I'm assuming, you never would, why don't you want the kernel to make that choice for you? James
Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance
On Tue, 2014-01-14 at 12:39 -0500, Robert Haas wrote: > On Tue, Jan 14, 2014 at 12:20 PM, James Bottomley > wrote: > > On Tue, 2014-01-14 at 15:15 -0200, Claudio Freire wrote: > >> On Tue, Jan 14, 2014 at 2:12 PM, Robert Haas wrote: > >> > In terms of avoiding double-buffering, here's my thought after reading > >> > what's been written so far. Suppose we read a page into our buffer > >> > pool. Until the page is clean, it would be ideal for the mapping to > >> > be shared between the buffer cache and our pool, sort of like > >> > copy-on-write. That way, if we decide to evict the page, it will > >> > still be in the OS cache if we end up needing it again (remember, the > >> > OS cache is typically much larger than our buffer pool). But if the > >> > page is dirtied, then instead of copying it, just have the buffer pool > >> > forget about it, because at that point we know we're going to write > >> > the page back out anyway before evicting it. > >> > > >> > This would be pretty similar to copy-on-write, except without the > >> > copying. It would just be forget-from-the-buffer-pool-on-write. > >> > >> But... either copy-on-write or forget-on-write needs a page fault, and > >> thus a page mapping. > >> > >> Is a page fault more expensive than copying 8k? > >> > >> (I really don't know). > > > > A page fault can be expensive, yes ... but perhaps you don't need one. > > > > What you want is a range of memory that's read from a file but treated > > as anonymous for writeout (i.e. written to swap if we need to reclaim > > it). Then at some time later, you want to designate it as written back > > to the file instead so you control the writeout order. I'm not sure we > > can do this: the separation between file backed and anonymous pages is > > pretty deeply ingrained into the OS, but if it were possible, is that > > what you want? > > Doesn't sound exactly like what I had in mind. 
What I was suggesting > is an analogue of read() that, if it reads full pages of data to a > page-aligned address, shares the data with the buffer cache until it's > first written instead of actually copying the data. The only way to make this happen is mmap the file to the buffer and use MADV_WILLNEED. > The pages are > write-protected so that an attempt to write the address range causes a > page fault. In response to such a fault, the pages become anonymous > memory and the buffer cache no longer holds a reference to the page. OK, so here I thought of another madvise() call to switch the region to anonymous memory. A page fault works too, of course, it's just that one per page in the mapping will be expensive. Do you care about handling aliases ... what happens if someone else reads from the file, or will that never occur? The reason for asking is that it's much easier if someone else mmapping the file gets your anonymous memory than we create an alias in the page cache. James -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance
On Tue, 2014-01-14 at 15:09 -0500, Robert Haas wrote: > On Tue, Jan 14, 2014 at 3:00 PM, James Bottomley > wrote: > >> Doesn't sound exactly like what I had in mind. What I was suggesting > >> is an analogue of read() that, if it reads full pages of data to a > >> page-aligned address, shares the data with the buffer cache until it's > >> first written instead of actually copying the data. > > > > The only way to make this happen is mmap the file to the buffer and use > > MADV_WILLNEED. > > > >> The pages are > >> write-protected so that an attempt to write the address range causes a > >> page fault. In response to such a fault, the pages become anonymous > >> memory and the buffer cache no longer holds a reference to the page. > > > > OK, so here I thought of another madvise() call to switch the region to > > anonymous memory. A page fault works too, of course, it's just that one > > per page in the mapping will be expensive. > > I don't think either of these ideas works for us. We start by > creating a chunk of shared memory that all processes (we do not use > threads) will have mapped at a common address, and we read() and > write() into that chunk. Yes, that's what I was thinking: it's a cache. About how many files comprise this cache? Are you thinking it's too difficult for every process to map the files? > > Do you care about handling aliases ... what happens if someone else > > reads from the file, or will that never occur? The reason for asking is > > that it's much easier if someone else mmapping the file gets your > > anonymous memory than we create an alias in the page cache. > > All reads and writes go through the buffer pool stored in shared > memory, but any of the processes that have that shared memory region > mapped could be responsible for any individual I/O request. That seems to be possible with the abstraction. 
The initial mapping gets the file-backed pages: you can do madvise to read them (using readahead), flush them (using wontneed) and flip them to anonymous (using something TBD). Since it's a shared mapping API based on the file, any of the mapping processes can do any operation. Future mappers of the file get the mix of real and anon memory, so it's truly shared. Given that you want to use this as a shared cache, it seems that the API to flip back from anon to file mapped is wontneed. That would also trigger writeback of any dirty pages in the previously anon region ... which you could force with msync. As far as I can see, this is identical to read/write on a shared region with the exception that you don't need to copy in and out of the page cache. From our point of view, the implementation is nice because the pages effectively never leave the page cache. We just use an extra per-page flag (which I'll get shot for suggesting) to alter the writeout path (which is where the complexity that may kill the implementation lies). James
[HACKERS] PostgreSQL Failback without rebuild
Hello All, I have been reading through some of the recent discussions about failback when in a streaming replication setup. I define failback as: 1. Node A is master, Node B is slave 2. Node A crashes || Node A is stopped || nothing happens 3. Promote Node B to Master 4. Attach Node A as slave My understanding is that currently, to achieve step four, you need to take a base backup of Node B and deploy it to Node A before starting streaming replication (or use rsync etc.). This is very undesirable for many users, especially if they have a very large database. From the discussions I can see that the problem is to do with Node A writing changes to disk that are not streamed to Node B before Node A crashes. Has there been any consensus on this issue? Are there any solutions which might make it into 9.4 or 9.5? I've seen some proposals and a tool (pg_rewind), but all seem to have drawbacks. I've been looking mainly at these threads: http://www.postgresql.org/message-id/CAF8Q-Gy7xa60HwXc0MKajjkWFEbFDWTG=ggyu1kmt+s2xcq...@mail.gmail.com http://www.postgresql.org/message-id/caf8q-gxg3pqtf71nvece-6ozraew5pwhk7yqtbjgwrfu513...@mail.gmail.com http://www.postgresql.org/message-id/519df910.4020...@vmware.com Cheers, James Sewell, PostgreSQL Team Lead / Solutions Architect
Re: [HACKERS] PostgreSQL Failback without rebuild
I've just noticed that on PostgreSQL 9.3 I can do the following with a master node A and a slave node B (as long as I have set recovery_target_timeline = 'latest'): 1. Stop Node A 2. Promote Node B 3. Attach Node A as slave This is sufficient for my needs (I know it doesn't cover a crash), can anyone see any potential problems with this approach? Cheers, James Sewell, PostgreSQL Team Lead / Solutions Architect __ Level 2, 50 Queen St, Melbourne VIC 3000 *P *(+61) 3 8370 8000 *W* www.lisasoft.com *F *(+61) 3 8370 8099 On Wed, Feb 5, 2014 at 6:03 PM, Michael Paquier wrote: > On Wed, Feb 5, 2014 at 3:14 PM, Amit Kapila > wrote: > > On Wed, Feb 5, 2014 at 10:30 AM, James Sewell > > >> I've seen some proposals and a tool (pg_rewind), but all seem to have > draw > >> backs. > > > > As far as I remember, one of the main drawbacks for pg_rewind was > related to > > hint bits which can be avoided by wal_log_hints. pg_rewind is not part of > > core > > PostgreSQL code, however if you wish, you can try that tool to see if > can it > > solve your purpose. > For 9.3, pg_rewind is only safe with page checksums enabled. For 9.4, > yes wal_log_hints or checksums is mandatory. The code contains as well > some safety checks as well to ensure that a node not using those > parameters cannot be rewinded. > Regards, > -- > Michael > -- -- The contents of this email are confidential and may be subject to legal or professional privilege and copyright. No representation is made that this email is free of viruses or other defects. If you have received this communication in error, you may not copy or distribute any part of it or otherwise disclose its contents to anyone. Please advise the sender of your incorrect receipt of this correspondence.
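Reattaching node A as a slave (step 3) amounts to giving it a recovery.conf that follows the new master's timeline; a minimal sketch (hostname and user are placeholders):

```
# recovery.conf on node A (9.3-era syntax; these settings moved
# into postgresql.conf in PostgreSQL 12)
standby_mode = 'on'
primary_conninfo = 'host=nodeB port=5432 user=replicator'
recovery_target_timeline = 'latest'
```

The recovery_target_timeline = 'latest' line is what lets node A follow the timeline switch that node B's promotion created.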
Re: [HACKERS] PostgreSQL Failback without rebuild
Node A could get ahead even if it has been shut down cleanly BEFORE the promotion? I'd always assumed that if I shut down the master, the slave would be at the same point after shutdown - is this incorrect? Cheers, James Sewell, PostgreSQL Team Lead / Solutions Architect On Fri, Feb 7, 2014 at 4:58 PM, Michael Paquier wrote: > > > > On Fri, Feb 7, 2014 at 1:57 PM, James Sewell wrote: > >> I've just noticed that on PostgreSQL 9.3 I can do the following with a >> master node A and a slave node B (as long as I have set >> recovery_target_timeline = 'latest'): >> >>1. Stop Node A >>2. Promote Node B >>3. Attach Node A as slave >> >> This is sufficient for my needs (I know it doesn't cover a crash), can >> anyone see any potential problems with this approach? >> > Yes, node A could get ahead of the point where WAL forked when promoting > B. In this case you cannot reconnect A to B, and need to actually recreate > a node from a fresh base backup, or rewind it. pg_rewind targets the > latter, postgres core is able to do the former, and depending on things > like your environment and/or the size of your server, you might prefer one > or the other. > Regards, > -- > Michael >
Re: [HACKERS] pg_dump and schema names
I was under the impression that every command that references a relation makes use of the search path, regardless of what it is *doing*. Maybe this is different in older versions though? I actually ran into this recently and had to remove all the xx. schema components using vi before I could run the dump to move to a different schema. Just to be sure: postgres=# create schema test; CREATE SCHEMA postgres=# set search_path to test; SET postgres=# create table test1(id serial); NOTICE: CREATE TABLE will create implicit sequence "test1_id_seq" for serial column "test1.id" CREATE TABLE postgres=# alter table test1 owner to postgres; ALTER TABLE Cheers, James Sewell Solutions Architect _ [image: http://www.lisasoft.com/sites/lisasoft/files/u1/2013hieghtslogan_0.png] Level 2, 50 Queen St, Melbourne, VIC, 3000 P: 03 8370 8000 F: 03 8370 8099 W: www.lisasoft.com On Fri, Aug 9, 2013 at 2:04 PM, Bruce Momjian wrote: > pg_dump goes to great lengths not to hard-code the schema name into > commands like CREATE TABLE, instead setting the search_path before > creating the table; these commands: > > CREATE SCHEMA xx; > CREATE TABLE xx.test(x int); > > generates this output: > > SET search_path = xx, pg_catalog; > CREATE TABLE test ( > x integer > ); > > If you dump a schema and want to reload it into another schema, you > should only need to update that one search_path line. However, later in > the dump file, we hardcode the schema name for setting the object owner: > > ALTER TABLE xx.test OWNER TO postgres; > > Could we use search_path here to avoid the schema designation? I am not > sure this possible because while CREATE defaults to the first existing > schema in the search_path, I am not sure ALTER has the same behavior > because you are not _creating_ something with ALTER. > > -- > Bruce Momjian http://momjian.us > EnterpriseDB http://enterprisedb.com > > + It's impossible for everything to be true. 
+ > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers >
[HACKERS] PSQL return coder
Hello, I am using psql to run SQL from a file with the -f flag as follows: BEGIN SQL SQL ... END This gives me rollback on error and a nicer output than -1. This works fine. My question is: in a rollback scenario, is it possible to get psql to return a non-zero exit status? Cheers, James -- James Sewell, PostgreSQL Team Lead / Solutions Architect
Re: [HACKERS] PSQL return coder
I was avoiding ON_ERROR_STOP because I was using ON_ERROR_ROLLBACK, but have just realised that if I encase my SQL in a transaction then rollback will still happen. Perfect! James Sewell, PostgreSQL Team Lead / Solutions Architect __ Level 2, 50 Queen St, Melbourne VIC 3000 *P *(+61) 3 8370 8000 * **W* www.lisasoft.com *F *(+61) 3 8370 8099 On Fri, Oct 11, 2013 at 12:25 AM, Merlin Moncure wrote: > On Thu, Oct 10, 2013 at 1:52 AM, Tom Lane wrote: > > James Sewell writes: > >> My question is in a rollback scenario is it possible to get PSQL to > return > >> a non 0 exit status? > > > > Maybe you could use -c instead of -f? > > > > $ psql -c 'select 1; select 1/0' regression > > ERROR: division by zero > > $ echo $? > > 1 > > > > You won't need explicit BEGIN/END because this is already a single > > transaction. > > According to the man page, > "EXIT STATUS >psql returns 0 to the shell if it finished normally, 1 if a fatal > error >of its own (out of memory, file not found) occurs, 2 if the > connection >to the server went bad and the session was not interactive, and 3 > if an >error occurred in a script and the variable ON_ERROR_STOP was set." > > So for a longer script ON_ERROR_STOP might be the ticket (which is > usually a good idea anyways). > > merlin > -- -- The contents of this email are confidential and may be subject to legal or professional privilege and copyright. No representation is made that this email is free of viruses or other defects. If you have received this communication in error, you may not copy or distribute any part of it or otherwise disclose its contents to anyone. Please advise the sender of your incorrect receipt of this correspondence.
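Combining the two suggestions from this thread, a single invocation can get both behaviours: --single-transaction wraps the file in BEGIN/COMMIT (so any error rolls everything back), and ON_ERROR_STOP makes psql exit with status 3 on a script error. An untested sketch (file and database names are placeholders):

```shell
# --single-transaction wraps script.sql in BEGIN/COMMIT, so an error
# rolls everything back; ON_ERROR_STOP makes psql exit non-zero.
psql --set ON_ERROR_STOP=on --single-transaction -f script.sql mydb
status=$?   # 0 on success, 3 on a script error per the psql man page
echo "psql exited with $status"
```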
Re: [HACKERS] [PATCH] Add an ldapoption to disable chasing LDAP referrals
Hey All, I had missed these emails, sorry. The search+bind mode issue is one of documentation location; I have fixed it by moving the section to the applied-to-both list. As the patch is to do with the post-auth response this is correct. As for the issue when something other than 0 or 1 is set, I am happy to throw an error (although this doesn't seem to be how options such as LDAPTLS work: 1 if 1 else 0). I assume I would use the ereport() function to do this (using the second example from this page http://www.postgresql.org/docs/9.2/static/error-message-reporting.html)? Cheers, James On Thu, Sep 19, 2013 at 12:56 AM, Peter Eisentraut wrote: > On 7/8/13 9:33 PM, James Sewell wrote: > > New patch attached. I've moved from using a boolean to an enum trivalue. > > When ldapreferrals is set to something other than "0" or "1" exactly, it > just ignores the option. That's not good, I think. It should probably > be an error. >
[HACKERS] Streaming Replication clusters and load balancing
Hello all, I have recently been working with PostgreSQL and HAProxy to provide seamless load balancing to a group of database servers. This on its own isn't a hard thing: I have an implementation finished and am now thinking about the best way to bring it to a production-ready state which could be used by others, and used in load-balancers other than HAProxy with minimal config changes. My initial requirements were: given a group of PostgreSQL servers, check each server every x seconds and:

- Allow read/write access only to the master server (via IPA / portA)
- Disallow access if there are multiple master servers
- Allow read access to all servers (via IPB / portB) as long as the following holds:
  - They are attached to the current master server via streaming replication (or they are the current master server)
  - They can currently contact the master server (safest option, disallow all access when master-less)
  - They are in the same timeline as the master server (do I need this check?)
  - The master server reports that they have less than x bytes of lag

HAProxy can talk to PostgreSQL for a health check via TCP or PSQL (connection check only). Neither of these allows the logic above - therefore this logic has to be hosted outside of HAProxy. This might change in the future if HAProxy gets the ability to send SQL statements (like an F5 can). Today the best way to provide this information to HAProxy (and many other load balancers, application frameworks, proxies, monitoring solutions) is via HTTP, where HTTP 200 is pass the check and HTTP 503 is fail the check (and don't load balance to this node). In my case I have a script which accepts HTTP requests to /read to check if this node is available for read only, and /write which checks if this node is available for read/writes.
The options as I see them are:

- Implement a script / small app which connects to PostgreSQL and executes these checks
  - Implemented and proven today at many independent sites
  - Should it run on HAProxy server or PSQL server?
  - Integrated HTTP server or x.inetd script?
  - Platform independence?
  - What if it dies?
- Implement a custom PostgreSQL BGworker which provides this information over HTTP
  - No outside of PostgreSQL config needed
  - No reliance on another daemon / interface being up
  - libmicrohttpd or similar should help with platform independence
  - Security / acceptance by community?
  - Only newer versions of PostgreSQL
- Spend the time working on getting SQL checks into HAProxy
  - What about other platforms which only support HTTP?

I think all of the options would benefit from a PSQL extension which does the following:

- Encapsulates the check logic (easier to upgrade, manipulate)
- Stores historic check data for a number of hours / days / months
- Stores defaults (override via HTTP could be possible for things like lag)

Does anyone else have any thoughts on this topic? Eventually many cool features could flow out of this kind of work:

- Integration with High Availability products - I have this working with EnterpriseDB EFM now.
  - Locate the current master using the HA product - more than one master doesn't cause loss of service as long as HA state is sane
  - Locate all clustered standby servers using the HA product
    - if a standby is removed from the HA cluster, it is removed from load balancing
    - if a standby is not part of the cluster, it is removed from load balancing (even if it is part of streaming replication)
- HTTP replication status requests which facilitate dynamically managing HAProxy (or other) PostgreSQL server pools
  - Add a node to streaming replication, it automatically shows up in the pool and starts being checked to see if it can service reads
  - Great for cloud scale out
- Allocation of additional load balancer groups based on some criteria (?), for example
  - read/write (as above)
  - read only (as above)
  - data warehouse (reporting reads only)
  - DR (replica with no reads or writes - until it becomes a master)

Keen to hear comments. Cheers, James Sewell, Solutions Architect __ Level 2, 50 Queen St, Melbourne VIC 3000 *P *(+61) 3 8370 8000 *W* www.lisasoft.com *F *(+61) 3 8370 8099
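The /read and /write decision rules described in this proposal can be sketched as a pure function of replication state. The field names, the `NodeState` type, and the 16 MB default lag threshold below are all illustrative assumptions, not part of any existing tool; a real checker would populate them from pg_stat_replication and pg_is_in_recovery():

```python
from dataclasses import dataclass

@dataclass
class NodeState:
    is_master: bool
    masters_visible: int      # how many masters the checker can currently see
    attached_to_master: bool  # streaming from (or is) the current master
    same_timeline: bool
    lag_bytes: int

def write_check(s: NodeState) -> int:
    """HTTP 200 only for the single master; 503 otherwise."""
    if s.is_master and s.masters_visible == 1:
        return 200
    return 503

def read_check(s: NodeState, max_lag: int = 16 * 1024 * 1024) -> int:
    """HTTP 200 for the master or any attached, same-timeline, low-lag standby."""
    if s.masters_visible != 1:
        return 503                      # master-less or multi-master: refuse all
    if not s.is_master:
        if not (s.attached_to_master and s.same_timeline):
            return 503
        if s.lag_bytes > max_lag:
            return 503
    return 200
```

An HTTP front end (script, inetd wrapper, or BGworker, per the options above) would simply map /write to write_check and /read to read_check and emit the returned status code.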
[HACKERS] [PATCH] Add an ldapoption to disable chasing LDAP referrals
Hey All, This patch request grew from this post (of mine) to pgsql-general: http://www.postgresql.org/message-id/cabuevezouae-g1_oejagujjmem675dnystwybp4d_wz6om+...@mail.gmail.com The patch adds another available LDAP option (ldapnochaseref) for search+bind mode in the pg_hba.conf file. If set to 1 (0 is the default) then it performs an ldap_set_option call which disables chasing of any LDAP references which are returned as part of the search LDIF. I can think of two use cases for this:

1. (the case which spawned my email) A valid search is performed, but for some reason a "ref:" with a non-responsive LDAP server is returned as well, which causes the authentication to time out (could be intermittent if DNS round robin or similar is used and some of the LDAP servers are not functioning / a packet-dropping firewall is in the way).

2. (a case I found when testing with AD) A valid search is performed and 6 "ref:" entries are returned, which all must be chased before authentication can succeed. Setting ldapnochaseref speeds up authentication with no negative cost (assuming you understand your LDAP schema).

I think it's worth noting that this setting seems to be the default for ldapsearch on Linux these days. Hopefully I found all the documentation that I was meant to update, let me know if not though. Cheers, James Sewell PostgreSQL Team Lead / Solutions Architect _ [image: http://www.lisasoft.com/sites/lisasoft/files/u1/2013hieghtslogan_0.png] Level 2, 50 Queen St, Melbourne, VIC, 3000 P: 03 8370 8000 F: 03 8370 8099 W: www.lisasoft.com
<> pgsql_ldapnochaseref_v1.diff Description: Binary data
Re: [HACKERS] [PATCH] Add an ldapoption to disable chasing LDAP referrals
Hey Peter, You are correct, it is the same as the referrals option in pam_ldap. It's also the -C (sometimes -R - it seems ldapsearch options are pretty non-standard) option in ldapsearch. As far as I'm aware you can't pass this in an LDAP URL, primarily because this never gets sent to the LDAP server. The server always returns an LDIF with inline references, this just determines if you chase them client side or just list them as is. I could be missing something here, but using: ldapreferrals={0|1} Would require a three state type, as we need a way of not interfering with the library defaults? To 'enable' the new behavior here using a boolean you would need to set ldapreferrals=false - which with the normal way of dealing with config booleans would alter the default behavior if the option was not specified. How do you feel about: ldapdisablereferrals=(0|1) Cheers, James Sewell James Sewell PostgreSQL Team Lead / Solutions Architect _ [image: http://www.lisasoft.com/sites/lisasoft/files/u1/2013hieghtslogan_0.png] Level 2, 50 Queen St, Melbourne, VIC, 3000 P: 03 8370 8000 F: 03 8370 8099 W: www.lisasoft.com On Tue, Jul 2, 2013 at 10:46 PM, Peter Eisentraut wrote: > On 7/2/13 12:20 AM, James Sewell wrote: > > Hey All, > > > > This patch request grew from this post (of mine) to pgsql-general: > > > > > http://www.postgresql.org/message-id/cabuevezouae-g1_oejagujjmem675dnystwybp4d_wz6om+...@mail.gmail.com > > > > The patch adds another available LDAP option (ldapnochaseref) for > > search+bind mode in the pg_hba.conf fil. If set to 1 (0 is default) then > > it performs a ldap_set_option which disables chasing of any LDAP > > references which are returned as part of the search LDIF. > > This appears to be the same as the "referrals" option in pam_ldap > (http://linux.die.net/man/5/pam_ldap). So it seems legitimate. > > For consistency, I would name the option ldapreferrals={0|1}. I prefer > avoiding double negatives. 
> > Do you know of a standard way to represent this option in an LDAP URL, > perhaps as an extension? > >
Re: [HACKERS] [PATCH] Add an ldapoption to disable chasing LDAP referrals
Heya, I see what you are saying, the problem as I see it is that the action we are taking here is "disable chasing ldap referrals". If the name is ldapreferrals and we use a boolean then setting it to 1 reads in a counter intuitive manner: "set ldapreferals=true to disable chasing LDAP referrals." Perhaps you are fine with this though if it's documented? It does work in the inverse way to pam_ldap, where setting to true enables referral chasing. pam_ldap works like so: not set : library default set to 0 : disable referral chasing set to 1 : enable referral chasing The other option would be to have the default value (of the parameter) be true and set the boolean to false to disable it. I can't find any other examples of this though - I assume having a one off like this in the code is a bad thing also? I'm happy to let you guys decide. Cheers, James James Sewell PostgreSQL Team Lead / Solutions Architect _ [image: http://www.lisasoft.com/sites/lisasoft/files/u1/2013hieghtslogan_0.png] Level 2, 50 Queen St, Melbourne, VIC, 3000 P: 03 8370 8000 F: 03 8370 8099 W: www.lisasoft.com On Wed, Jul 3, 2013 at 6:12 PM, Magnus Hagander wrote: > > On Wed, Jul 3, 2013 at 3:04 AM, James Sewell wrote: > >> Hey Peter, >> >> You are correct, it is the same as the referrals option in pam_ldap. >> It's also the -C (sometimes -R - it seems ldapsearch options are pretty >> non-standard) option in ldapsearch. >> >> As far as I'm aware you can't pass this in an LDAP URL, primarily because >> this never gets sent to the LDAP server. The server always returns an LDIF >> with inline references, this just determines if you chase them client side >> or just list them as is. >> >> I could be missing something here, but using: >> >> ldapreferrals={0|1} >> >> Would require a three state type, as we need a way of not interfering >> with the library defaults? 
To 'enable' the new behavior here using a >> boolean you would need to set ldapreferrals=false - which with the normal >> way of dealing with config booleans would alter the default behavior if the >> option was not specified. >> >> How do you feel about: >> >> ldapdisablereferrals=(0|1) >> >> > I agree with Peter that the negative thing is bad. l don't see the > problem, really. If you don't specify it, you rely on library defaults. If > you do specify it, we lock it to that setting. I don't see the need to > specifically have a setting to rely on library defaults - just remove it > from the line and you get that. > > -- > Magnus Hagander > Me: http://www.hagander.net/ > Work: http://www.redpill-linpro.com/ > -- -- The contents of this email are confidential and may be subject to legal or professional privilege and copyright. No representation is made that this email is free of viruses or other defects. If you have received this communication in error, you may not copy or distribute any part of it or otherwise disclose its contents to anyone. Please advise the sender of your incorrect receipt of this correspondence. <>
Re: [HACKERS] [PATCH] Add an ldapoption to disable chasing LDAP referrals
Hey, New patch attached. I've moved from using a boolean to an enum trivalue. Let me know what you think. Cheers, James James Sewell PostgreSQL Team Lead / Solutions Architect _ [image: http://www.lisasoft.com/sites/lisasoft/files/u1/2013hieghtslogan_0.png] Level 2, 50 Queen St, Melbourne, VIC, 3000 P: 03 8370 8000 F: 03 8370 8099 W: www.lisasoft.com On Thu, Jul 4, 2013 at 8:23 PM, Magnus Hagander wrote: > > On Thu, Jul 4, 2013 at 2:30 AM, James Sewell wrote: > >> Heya, >> >> I see what you are saying, the problem as I see it is that the action we >> are taking here is "disable chasing ldap referrals". If the name is >> ldapreferrals and we use a boolean then setting it to 1 reads in a counter >> intuitive manner: >> > > That assumes that the default in the ldap library is always going to be to > chase them. Does the standard somehow mandate that it should be? > > > "set ldapreferals=true to disable chasing LDAP referrals." >> > > You'd obviously reverse the meaning as well. "set ldapreferals=false to > disable chasing LDAP referrals." > > > Perhaps you are fine with this though if it's documented? It does work in >> the inverse way to pam_ldap, where setting to true enables referral >> chasing. pam_ldap works like so: >> >> not set : library default >> set to 0 : disable referral chasing >> set to 1 : enable referral chasing >> >> > That is exactly what I'm suggesting it should do, and I'm pretty sure > that's what Peter suggested as well. > > > > -- > Magnus Hagander > Me: http://www.hagander.net/ > Work: http://www.redpill-linpro.com/ > -- -- The contents of this email are confidential and may be subject to legal or professional privilege and copyright. No representation is made that this email is free of viruses or other defects. If you have received this communication in error, you may not copy or distribute any part of it or otherwise disclose its contents to anyone. Please advise the sender of your incorrect receipt of this correspondence. 
<> pgsql_ldapnochaseref_v1.2.diff Description: Binary data
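The tri-valued semantics the thread converged on — option absent means leave the LDAP library default alone, "0"/"1" force the behaviour, anything else is an error — can be sketched like this. The names below are illustrative, not the patch's actual C code:

```python
from enum import Enum

class Referrals(Enum):
    DEFAULT = "default"   # option not present: never call ldap_set_option
    OFF = "0"             # force referral chasing off
    ON = "1"              # force referral chasing on

def parse_ldapreferrals(value):
    """Parse the pg_hba.conf option value (None when the option is absent)."""
    if value is None:
        return Referrals.DEFAULT
    if value == "0":
        return Referrals.OFF
    if value == "1":
        return Referrals.ON
    # Per Peter's review, silently ignoring other values would be wrong.
    raise ValueError(f"invalid value for ldapreferrals: {value!r}")
```

The key design point is that DEFAULT is distinct from OFF: only the explicit values translate into an ldap_set_option(LDAP_OPT_REFERRALS, ...) call.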
Re: [HACKERS] Rounding to even for numeric data type
>>>>> "MP" == Michael Paquier writes: MP> So, attached is a patch that does 1) and 2) to make clear to the MP> user how numeric and double precision behave regarding rounding. MP> I am adding it to CF 2015-06 to keep track of it... Given that the examples show -2.5 rounds to -3, the IEEE term is roundTiesToAway, and the typical conversational English is round ties away from zero. RoundUp means towards +Infinity. 754 specifies that for decimal, either roundTiesToEven or roundTiesToAway are acceptable defaults, and which of the two applies is language dependent. Does ANSI SQL say anything about how numeric should round? In general, for decimals (or anything other than binary), there are twelve possible roundings: ToEven ToOdd AwayFromZero ToZero Up Down TiesToEven TiesToOdd TiesAwayFromZero TiesToZero TiesUp TiesDown (Up is the same as ceil(3), Down as floor(3).) -JimC -- James Cloos OpenPGP: 0x997A9F17ED7DAEA6
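The roundTiesToEven vs roundTiesToAway distinction above can be demonstrated with Python's decimal module, where ROUND_HALF_EVEN and ROUND_HALF_UP implement exactly those two tie-breaking rules (ROUND_HALF_UP sends ties away from zero, so -2.5 goes to -3):

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

def rnd(x, mode):
    """Round a decimal string to an integer under the given tie-breaking mode."""
    return Decimal(x).quantize(Decimal("1"), rounding=mode)

print(rnd("-2.5", ROUND_HALF_UP))    # -3 : ties away from zero (roundTiesToAway)
print(rnd("-2.5", ROUND_HALF_EVEN))  # -2 : ties to even (roundTiesToEven)
print(rnd("2.5", ROUND_HALF_UP))     # 3
print(rnd("2.5", ROUND_HALF_EVEN))   # 2  : 2 is the even neighbour
```

Non-tie cases round identically under both modes; only exact halves differ, which is why the -2.5 example in the patch pins down which rule numeric uses.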
[HACKERS] ILIKE vs indices
While tuning an application, I found the posts from 2003 recommending the use of LOWER() and LIKE in place of ILIKE to take advantage of indices. For this app, given the limitations of the upper-layer protocol it must support, that change replaced about 30 minutes of repeated seq scans with about 1 minute of repeated index scans! On a query-set often repeated several times per day. (Probably more times per day now.) Is there any contraindication to recasting: foo ILIKE 'bar' into: LOWER(foo) LIKE LOWER('bar') and documenting that an index has to be on LOWER(column) to benefit ILIKE? Perhaps the parser could read the former as the latter? -JimC -- James Cloos OpenPGP: 1024D/ED7DAEA6
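A rough model of the proposed equivalence, in Python terms (LIKE translated to an anchored regex; this deliberately ignores the locale issues raised later in the thread, where lowercasing both sides and case-insensitive matching can diverge):

```python
import re

def like_to_regex(pattern):
    """Translate a SQL LIKE pattern (% and _ wildcards) to an anchored regex."""
    out = []
    for ch in pattern:
        if ch == "%":
            out.append(".*")
        elif ch == "_":
            out.append(".")
        else:
            out.append(re.escape(ch))
    return "^" + "".join(out) + "$"

def ilike(text, pattern):
    """Model of: text ILIKE pattern."""
    return re.match(like_to_regex(pattern), text, re.IGNORECASE) is not None

def lower_like(text, pattern):
    """Model of: LOWER(text) LIKE LOWER(pattern)."""
    return re.match(like_to_regex(pattern.lower()), text.lower()) is not None

# For ASCII data the two forms agree; the rewrite only wins because an
# index on LOWER(column) can then serve the second form.
for t, p in [("FooBar", "foo%"), ("BAR", "b_r"), ("baz", "foo%")]:
    assert ilike(t, p) == lower_like(t, p)
```

The point of the rewrite is not semantics but plannability: LOWER(foo) is an indexable expression, while ILIKE's case folding happens inside the operator where an index on foo cannot help.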
Re: [HACKERS] ILIKE vs indices
>>>>> "TL" == Tom Lane writes: JC>> Is there any contraindication to recasting: JC>> foo ILIKE 'bar' JC>> into: JC>> LOWER(foo) LIKE LOWER('bar') TL> In some locales those are not equivalent, I believe, or at least TL> shouldn't be. (What the current code actually does is a separate TL> question.) I see. After determining indexing based on the existence of an initial fixed string, excluding anything matching isalpha(), it uses tolower(3) and friends to do the actual match. So my proposal wouldn't change what matches, but might make fixing any bugs in what *should* match more difficult? TL> In any case it's not obvious why LOWER rather than UPPER. I suggested lower() because that matched all of the suggestions I found. And as it turns out matches the current behaviour, too. The footnote about adding explicit mention to the docs was expressly because it is not otherwise obvious whether indices should use lower() or upper(). I'll ask on one of the unicode lists whether there are any locales where a case-insensitive match should be different than a case-preserving match of tolower() vs tolower(). -JimC -- James Cloos OpenPGP: 1024D/ED7DAEA6
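Tom's point that lowercase-both-sides matching and true case-insensitive matching can diverge is easy to demonstrate with Unicode full case folding, which lower() does not perform:

```python
# German sharp s: uppercasing 'ß' produces 'SS', but lowercasing 'SS'
# gives 'ss', not 'ß' -- so LOWER()-equality misses the pair while a
# case-insensitive comparison (full case folding) catches it.
s = "Straße"
t = "STRASSE"
assert s.lower() != t.lower()        # 'straße' vs 'strasse'
assert s.casefold() == t.casefold()  # both fold to 'strasse'

# Turkish dotted capital I: its lowercase form is 'i' plus a combining
# dot above, so it is not equal to a plain ASCII 'i' either.
assert "İ".lower() != "i"
```

So the rewrite is safe only where simple one-to-one case mapping holds (e.g. the C locale / ASCII data); in general, which transformation ILIKE "should" use is locale-dependent, which is exactly the concern above.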
XLog vs SSD [Was: Re: [HACKERS] random write in xlog?]
>>>>> "JJ" == Jeff Janes writes: JJ> Anyway, the writes are logically sequentially, but not physically. JJ> If I remember correctly, it always writes out full blocks, even if JJ> the last part of the block has not yet been filled with new data. JJ> When the remainder gets filled, it then writes out the full block JJ> again, both the already written and the new part. What does that mean for use of a flash SSD for the xlog dir? Does the block writing mesh up well with the usage pattern a flash SSD needs to maximize lifespan? I'd love a dram ssd for pg_xlog and the journals for the other filesystems, but they cost too much. -JimC -- James Cloos OpenPGP: 1024D/ED7DAEA6 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: XLog vs SSD [Was: Re: [HACKERS] random write in xlog?]
>>>>> "JJ" == Jeff Janes writes: JJ> So PG always writing 8K at a time is unlikely to make a difference JJ> than if it wrote a smaller amount. Ah. Somehow I was thinking of the xlog files' 16M filesize rather than the internal 8k block size. If it is only writing 8k blocks then there is probably little hope of making efficient use of flash. -JimC -- James Cloos OpenPGP: 1024D/ED7DAEA6
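The write pattern Jeff describes — full blocks only, with the partially filled tail block rewritten in place as more records arrive — can be shown with a toy model. This is an illustrative sketch of the pattern, not PostgreSQL's actual xlog code:

```python
import io

BLOCK = 8192  # WAL's internal block size

class WalWriter:
    """Toy model: records are buffered, and file writes always cover whole
    8 kB blocks. The partially filled tail block is written padded with
    zeros, then rewritten at the same offset once later records fill it --
    the repeated-overwrite pattern that worries flash wear levelling."""

    def __init__(self, f):
        self.f = f
        self.buf = bytearray()
        self.dirty_from = 0   # first block not yet written in final form

    def append(self, record):
        self.buf.extend(record)
        nblocks = (len(self.buf) + BLOCK - 1) // BLOCK
        start = self.dirty_from * BLOCK
        chunk = bytes(self.buf[start:]).ljust((nblocks - self.dirty_from) * BLOCK, b"\0")
        self.f.seek(start)
        self.f.write(chunk)            # includes rewriting the old tail block
        self.dirty_from = nblocks - 1  # tail will be rewritten on the next append
        return nblocks

w = WalWriter(io.BytesIO())
print(w.append(b"x" * 1000))   # 1: block 0 written, mostly zero padding
print(w.append(b"y" * 1000))   # 1: block 0 rewritten at the same offset
print(w.append(b"z" * 9000))   # 2: block 0 rewritten again, block 1 written
```

Each small commit thus re-dirties the same 8 kB region, which is why the physical write pattern is not purely sequential even though the log is logically append-only.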
Re: [HACKERS] [GENERAL] Floating point error
>>>>> "HL" == Heikki Linnakangas writes: HL> It would be nice to have a base-2 text format to represent floats. HL> It wouldn't be as human-friendly as base-10, but it could be used HL> when you don't want to lose precision. pg_dump in particular. Hexadecimal notation for floats exists. The printf format flag is %a for miniscule and %A for majuscule. The result of 1./3. is 0xa.aabp-5. This site has some info and a conversion demo: http://gregstoll.dyndns.org/~gregstoll/floattohex/ -JimC -- James Cloos OpenPGP: 1024D/ED7DAEA6
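Python exposes the same lossless base-2 text form as C's %a, via float.hex() and float.fromhex(), which makes the round-trip property easy to check:

```python
# The hex float form encodes the exact binary significand and exponent,
# so text -> float -> text loses nothing, unlike a short decimal rendering.
x = 1.0 / 3.0
s = x.hex()                          # '0x1.5555555555555p-2' for a double
assert float.fromhex(s) == x         # exact round trip

# The value quoted in the mail, 0xa.aabp-5, is a low-precision rendering
# of 1/3 (it differs from the double value in the 6th decimal place).
approx = float.fromhex("0xa.aabp-5")
print(s, approx)
```

This exactness is precisely what makes a hex i/o format attractive for dump/reload of float columns.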
[HACKERS] Testing 9.2 in ~production environment
I'm giving 9.2-beta2 a test simulating a production workflow. Everything looks OK except the speed. Most (all?) queries take about five to six times as long as they do with 9.1. The configurations are essentially the same, the query plans are the same. A (hot) example, pulled semi-randomly from a run, with the names mangled to protect the innocent:

= 9.1 =
Nested Loop (cost=0.00..26.92 rows=1 width=28) (actual time=0.114..0.514 rows=19 loops=1)
  -> Index Scan using ms_pkey on ms msg (cost=0.00..26.03 rows=1 width=20) (actual time=0.026..0.207 rows=19 loops=1)
       Index Cond: ((ms_id >= 407) AND (ms_id <= 435) AND (mb_id = 50222))
       Filter: (status = ANY ('{0,1,2}'::integer[]))
  -> Index Scan using ph_pkey on ph pm (cost=0.00..0.87 rows=1 width=16) (actual time=0.010..0.010 rows=1 loops=19)
       Index Cond: (id = msg.ph_id)
Total runtime: 0.605 ms

= 9.2 =
Nested Loop (cost=0.00..30.12 rows=1 width=28) (actual time=0.439..2.540 rows=19 loops=1)
  -> Index Scan using ms_pkey on ms msg (cost=0.00..29.18 rows=1 width=20) (actual time=0.155..1.157 rows=19 loops=1)
       Index Cond: ((ms_id >= 407) AND (ms_id <= 435) AND (mb_id = 50222))
       Filter: (status = ANY ('{0,1,2}'::integer[]))
  -> Index Scan using ph_pkey on ph pm (cost=0.00..0.93 rows=1 width=16) (actual time=0.053..0.054 rows=1 loops=19)
       Index Cond: (id = msg.ph_id)
Total runtime: 2.752 ms

All of the tables and indices for the run in question fit into RAM. The effective cache, work mem, costs, etc. were optimized in 9.0, and kept for 9.1 and the beta. That the plans are the same suggests that isn't the problem, yes? I think I recall mention from a previous beta (but goog isn't helping me confirm) that there is some extra debugging or such enabled in the betas. If so, and if turning that off would provide a better comparison, where in the src should I look?
-JimC -- James Cloos OpenPGP: 1024D/ED7DAEA6 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Testing 9.2 in ~production environment
>>>>> "AF" == Andres Freund writes: AF> Is it possible that you compiled with assertions enabled? That would roughly AF> fit that magnitude. SHOW debug_assertions; Should show you whether it was AF> enabled. Thanks, but SHOW debug_assertions reports off. -JimC -- James Cloos OpenPGP: 1024D/ED7DAEA6 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Testing 9.2 in ~production environment
>>>>> "PE" == Peter Eisentraut writes: PE> That depends on how you built it. Just being a beta by itself doesn't PE> turn on any extra debugging. OK. So either I misremembered or it was something no longer done. PE> That depends on how you built it. Its a Gentoo box; both were build from their ebuilds, with the same gcc, flags, etc. PE> Compare the output of pg_config --configure from both installations. The only differences are 9.1 vs 9.2 in the paths. Thanks, -JimC -- James Cloos OpenPGP: 1024D/ED7DAEA6 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Testing 9.2 in ~production environment
>>>>> "JB" == Josh Berkus writes: JB> Can you check the collations of the two databases? I'm wondering if 9.1 JB> is in "C" collation and 9.2 is something else. Thanks! pg_dump -C tells me these two differences: -SET client_encoding = 'SQL_ASCII'; +SET client_encoding = 'UTF8'; -CREATE DATABASE dbm WITH TEMPLATE = template0 ENCODING = 'SQL_ASCII' LC_COLLATE = 'C' LC_CTYPE = 'C'; +CREATE DATABASE dbm WITH TEMPLATE = template0 ENCODING = 'UTF8' LC_COLLATE = 'C' LC_CTYPE = 'en_US.UTF-8'; for every db in the clusters. I presume that lc_ctype is the significant difference? LC_CTYPE *is* specified as 'C' in the dump from which I created the 9.2 cluster, so it must have been overridden by pg_restore. I see that my dist's /etc rc script now sets LC_CTYPE. Would that explain why lc_ctype changed between the two clusters? Is there any way to alter a db's lc_ctype w/o dumping and restoring? I want to preserve some of the changes made since I copied the 9.1 cluster. Alter database reports that lc_ctype cannot be changed. -JimC -- James Cloos OpenPGP: 1024D/ED7DAEA6 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Testing 9.2 in ~production environment
Updating pg_database to set datctype='C' did solve the speed issues with the two large dbs. Presumably, since LC_CTYPE=en_US.UTF-8 was in the env when I ran pg_restore, it overrode the ctype setting in the dump files. Some of the slow selects do use ilike; even w/ datctype='C' the indices are skipped for at least this query:

# explain analyze SELECT mb_id FROM mb WHERE name ILIKE 'foo@bar' AND ownerid=7;

QUERY PLAN
-
Seq Scan on mb (cost=0.00..570.96 rows=3 width=4) (actual time=9.443..25.039 rows=1 loops=1)
  Filter: ((name ~~* 'foo@bar'::text) AND (ownerid = 7))
  Rows Removed by Filter: 34827
Total runtime: 25.071 ms
(4 rows)

The mb table has several indices, including separate ones on name and ownerid. (Not my design, btw. And I really do need to re-write the middleware.) Whether it is strcoll(3) (even though LC_COLLATE is explicitly C) or LIKE, it does make a significant difference for those two apps. -JimC -- James Cloos OpenPGP: 1024D/ED7DAEA6
Re: [HACKERS] Testing 9.2 in ~production environment
As a followup, I find that I can avoid the seq scan by adding an index to that table as: create index mb_name_own_idx on mb ( lower(name), ownerid ); and changing the query from using the idiom: WHERE name ILIKE 'foo@bar' AND ownerid=7; to using: WHERE lower(name) = lower('foo@bar') AND ownerid=7; which saves 20+ ms on each of the 30+ k such selects in a full run. I haven't tested how fast it would be with that change and a utf8 ctype. Because of how the middleware achieves its portability between pg, my et al, changing it to use lower and = will require significant surgery. Is there any way to specify the index such that the ILIKE query will use said index? -JimC -- James Cloos OpenPGP: 1024D/ED7DAEA6
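The shape of that fix can be sketched without a running PostgreSQL instance: SQLite also supports expression indexes, so the same lower(name) index and rewritten predicate can be demonstrated through Python's sqlite3 module. This is an analogue of the technique, not PostgreSQL's planner:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mb (mb_id INTEGER, name TEXT, ownerid INTEGER)")
# Expression index mirroring the mail's: create index ... on mb (lower(name), ownerid)
conn.execute("CREATE INDEX mb_name_own_idx ON mb (lower(name), ownerid)")
conn.execute("INSERT INTO mb VALUES (1, 'Foo@Bar', 7)")

# The rewritten predicate matches the indexed expression, so the plan
# can search the index instead of scanning the table.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT mb_id FROM mb "
    "WHERE lower(name) = lower('foo@bar') AND ownerid = 7"
).fetchall()
print(plan)
```

A bare ILIKE-style predicate (case-insensitive match on the raw column) has no matching indexed expression, which is why the original query fell back to a seq scan.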
[HACKERS] Postgres vs. intel ccNUMA on Linux
Hackers, Any tips / conventional wisdom regarding running postgres on large- ish memory ccNUMA intel machines, such as a 32G dual-quad-core, showing two NUMA nodes of 16G each? I expect each postgres backend's non-shared memory usage to remain nice and reasonably sized, hopefully staying within the confines of its processor's local memory region, but how will accesses to shared memory and / or buffer cache play out? Do people tune their backends via 'numactl' ? Furthermore, if one had more than one database being served by the machine, would it be advisable to do this via multiple clusters instead of a single cluster, tweaking the processor affinity of each postmaster accordingly, trying to ensure each cluster's shared memory segments and buffer cache pools remain local for the resulting backends? Thanks! James Robinson Socialserve.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] .gitignore files, take two
I'm reading this a bit late, but... We (Xorg) found that ignoring: *~ *.bak *.patch in addition to the files generated by building is very helpful. We do use git tag and git describe in the make dist process, as well as git log >ChangeLog. That may be relevant; avoiding git describe's dirty designation is important when using it in that fashion. But it helps to be able to run git add --all cleanly. I understand that other git users have similar experience. -JimC -- James Cloos OpenPGP: 1024D/ED7DAEA6 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Floating-point timestamps versus Range Types
>>>>> "JD" == Jeff Davis writes: JD> 2. Fix the input/output functions in a special mode for dump/reload, JD>to make them true inverses. That can be done by supporting the %A printf(3)/scanf(3) format. -JimC -- James Cloos OpenPGP: 1024D/ED7DAEA6 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Floating-point timestamps versus Range Types
>>>>> "JD" == Jeff Davis writes: JD> 2. Fix the input/output functions in a special mode for dump/reload, JD> to make them true inverses. JC> That can be done by supporting the %A printf(3)/scanf(3) format. JD> I don't happen to see a %A format in the man page, but I doubt the JD> output would look like a timestamp. Modern printf(1) also supports it, so an easy example: :; printf '%A\n' 3.1415926535897932384626433832795029 0XC.90FDAA22168C235P-2 %a is the same, but with miniscule letters. It is the hex format of the float types, and defaults to exactly enough precision. The length modifier L makes %a expect a long double. JD> And if it doesn't look like a timestamp, it violates the spirit of a JD> logical dump of the data. Point taken. Had I read the whole thread before replying I would have been reminded that the float timestamps were archaic; that avoids any need of %A for timestamps. That said, the possibility of hex i/o format for the float datatypes would be welcome. -JimC -- James Cloos OpenPGP: 1024D/ED7DAEA6
Re: [HACKERS] Floating-point timestamps versus Range Types
>>>>> "TL" == Tom Lane writes: JC> That said, the possibility of hex i/o format for the float datatypes JC> would be welcome. TL> It's unportable, for two different reasons: TL> 2. The printf specifiers you want us to rely on are not standard. They are in C99. TL> 1. pg_dump output would become platform-specific. This is highly TL> undesirable. It is true that pg would have to test for them in configure and supply alternative code wherever libc fails to support them. I can readily accept that there are many more pressing needs. But would such a patch for master be rejected? -JimC -- James Cloos OpenPGP: 1024D/ED7DAEA6
Re: [HACKERS] SSL: better default ciphersuite
>>>>> "MK" == Marko Kreen writes: >>>>> "PE" == Peter Eisentraut writes: MK>> Well, we should - the DEFAULT is clearly a client-side default MK>> for compatibility only. No server should ever run with it. PE> Any other opinions on this out there? For reference, see: https://wiki.mozilla.org/Security/Server_Side_TLS for the currently suggested suite for TLS servers. That is: ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256: ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-GCM-SHA384: DHE-RSA-AES128-GCM-SHA256:DHE-DSS-AES128-GCM-SHA256:kEDH+AESGCM: ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA: ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA384: ECDHE-RSA-AES256-SHA:ECDHE-ECDSA-AES256-SHA:DHE-RSA-AES128-SHA256: DHE-RSA-AES128-SHA:DHE-DSS-AES128-SHA256:DHE-RSA-AES256-SHA256: DHE-DSS-AES256-SHA:DHE-RSA-AES256-SHA:AES128-GCM-SHA256: AES256-GCM-SHA384:ECDHE-RSA-RC4-SHA:ECDHE-ECDSA-RC4-SHA: AES128:AES256:RC4-SHA:HIGH: !aNULL:!eNULL:!EXPORT:!DES:!3DES:!MD5:!PSK The page explains why. But for pgsql, I'd leave off the !PSK; pre-shared keys may prove useful for some. And RC4, perhaps, also should be !ed. And if anyone wants Kerberos tls-authentication, one could add KRB5-DES-CBC3-SHA, but that is ssl3-only. Once salsa20-poly1305 lands in openssl, that should be added to the start of the list. -JimC -- James Cloos OpenPGP: 1024D/ED7DAEA6 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
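As an illustration of how a cipher preference string like the one above is applied through OpenSSL, here is Python's ssl module doing the equivalent of setting a server-side cipher list. The string below is a shortened variant for demonstration, not the exact Mozilla list or any PostgreSQL default; an entirely invalid string raises an error rather than silently permitting everything:

```python
import ssl

ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
# Prefer ECDHE/DHE AES-GCM, fall back to HIGH, and exclude the weak
# families singled out above (anonymous, export, DES/3DES, MD5).
ctx.set_ciphers("ECDHE+AESGCM:DHE+AESGCM:HIGH:!aNULL:!eNULL:!EXPORT:!DES:!3DES:!MD5")
names = [c["name"] for c in ctx.get_ciphers()]
print(len(names), "ciphersuites enabled")
```

The same grammar (suite names, `+` conjunction, `!` exclusion, order = preference) is what a server's ssl_ciphers-style setting hands to OpenSSL.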
Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance
On Mon, 2014-01-13 at 14:32 -0600, Jim Nasby wrote: > On 1/13/14, 2:27 PM, Claudio Freire wrote: > > On Mon, Jan 13, 2014 at 5:23 PM, Jim Nasby wrote: > >> On 1/13/14, 2:19 PM, Claudio Freire wrote: > >>> > >>> On Mon, Jan 13, 2014 at 5:15 PM, Robert Haas > >>> wrote: > >>>> > >>>> On a related note, there's also the problem of double-buffering. When > >>>> we read a page into shared_buffers, we leave a copy behind in the OS > >>>> buffers, and similarly on write-out. It's very unclear what to do > >>>> about this, since the kernel and PostgreSQL don't have intimate > >>>> knowledge of what each other are doing, but it would be nice to solve > >>>> somehow. > >>> > >>> > >>> > >>> There you have a much harder algorithmic problem. > >>> > >>> You can basically control duplication with fadvise and WONTNEED. The > >>> problem here is not the kernel and whether or not it allows postgres > >>> to be smart about it. The problem is... what kind of smarts > >>> (algorithm) to use. > >> > >> > >> Isn't this a fairly simple matter of when we read a page into shared > >> buffers > >> tell the kernel do forget that page? And a corollary to that for when we > >> dump a page out of shared_buffers (here kernel, please put this back into > >> your cache). > > > > > > That's my point. In terms of kernel-postgres interaction, it's fairly > > simple. > > > > What's not so simple, is figuring out what policy to use. Remember, > > you cannot tell the kernel to put some page in its page cache without > > reading it or writing it. So, once you make the kernel forget a page, > > evicting it from shared buffers becomes quite expensive. > > Well, if we were to collaborate with the kernel community on this then > presumably we can do better than that for eviction... even to the > extent of "here's some data from this range in this file. It's (clean| > dirty). Put it in your cache. Just trust me on this." 
This should be covered by the madvise() interface (with MADV_WILLNEED and MADV_DONTNEED); is there something in that interface that is insufficient? James -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
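The madvise() hints named above can be exercised from Python's mmap wrapper (POSIX only; the constants are present where the OS supports them), which makes the semantics easy to demonstrate without writing C; a minimal sketch:

```python
import mmap
import os
import tempfile

# Sketch of the MADV_WILLNEED / MADV_DONTNEED hints discussed above.
# Both are advisory: WILLNEED asks the kernel to prefetch, DONTNEED says
# the cached pages may be reclaimed. File-backed data survives either way.
fd, path = tempfile.mkstemp()
os.write(fd, b"x" * mmap.PAGESIZE * 4)
m = mmap.mmap(fd, 0)

if hasattr(mmap, "MADV_WILLNEED"):
    m.madvise(mmap.MADV_WILLNEED)    # hint: prefetch these pages
if hasattr(mmap, "MADV_DONTNEED"):
    m.madvise(mmap.MADV_DONTNEED)    # hint: drop the cached copies

intact = (m[0:1] == b"x")            # a later access simply refaults from disk
assert intact

m.close()
os.close(fd)
os.unlink(path)
```

The PostgreSQL-side objection in the thread is not that these calls are wrong but that issuing mmap/madvise/munmap per 8kb block is too expensive, so any real interface would need to work on read()/write()-style file access.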
Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance
On Mon, 2014-01-13 at 22:12 +0100, Andres Freund wrote: > On 2014-01-13 12:34:35 -0800, James Bottomley wrote: > > On Mon, 2014-01-13 at 14:32 -0600, Jim Nasby wrote: > > > Well, if we were to collaborate with the kernel community on this then > > > presumably we can do better than that for eviction... even to the > > > extent of "here's some data from this range in this file. It's (clean| > > > dirty). Put it in your cache. Just trust me on this." > > > > This should be the madvise() interface (with MADV_WILLNEED and > > MADV_DONTNEED) is there something in that interface that is > > insufficient? > > For one, postgres doesn't use mmap for files (and can't without major > new interfaces). I understand, that's why you get double buffering: because we can't replace a page in the range you give us on read/write. However, you don't have to switch entirely to mmap: you can use mmap/madvise exclusively for cache control and still use read/write (and still pay the double buffer penalty, of course). It's only read/write with directio that would cause problems here (unless you're planning to switch to DIO?). > Frequently mmap()/madvise()/munmap()ing 8kb chunks has > horrible consequences for performance/scalability - very quickly you > contend on locks in the kernel. Is this because of problems in the mmap_sem? > Also, that will mark that page dirty, which isn't what we want in this > case. You mean madvise (page_addr)? It shouldn't ... the state of the dirty bit should only be updated by actual writes. Which MADV_ primitive is causing the dirty marking, because we might be able to fix it (unless there's some weird corner case I don't know about). > One major usecase is transplanting a page comming from postgres' > buffers into the kernel's buffercache because the latter has a much > better chance of properly allocating system resources across independent > applications running. 
If you want to share pages between the application and the page cache, the only known interface is mmap ... perhaps we can discuss how better to improve mmap for you? We also do have a way of transplanting pages: it's called splice. How do the semantics of splice differ from what you need? > Oh, and the kernel's page-cache management while far from perfect, > actually scales much better than postgres'. Well, then, it sounds like the best way forward would be to get postgres to use the kernel page cache more efficiently. James -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
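For readers unfamiliar with the splice() interface mentioned here, it moves page references between file descriptors inside the kernel rather than copying bytes through user space. A minimal sketch (Linux; Python exposes it from 3.10, so the call is guarded):

```python
import os

# Sketch of splice(): transplant bytes from one pipe to another without a
# user-space copy. At least one end of a splice must be a pipe.
r1, w1 = os.pipe()
r2, w2 = os.pipe()
os.write(w1, b"hello")

if hasattr(os, "splice"):
    n = os.splice(r1, w2, 5)     # kernel moves the page reference, no copy
    moved = os.read(r2, 5)
else:
    n, moved = 5, os.read(r1, 5)  # fallback where os.splice is unavailable

assert n == 5 and moved == b"hello"

for fd in (r1, w1, r2, w2):
    os.close(fd)
```

The open question in the thread is whether these semantics (one-way movement between descriptors) match what a buffer manager needs, which is closer to "adopt this page, with its clean/dirty state, into your cache".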
Re: [HACKERS] [Lsf-pc] Linux kernel impact on PostgreSQL performance
On Mon, 2014-01-13 at 21:29 +, Greg Stark wrote: > On Mon, Jan 13, 2014 at 9:12 PM, Andres Freund wrote: > > For one, postgres doesn't use mmap for files (and can't without major > > new interfaces). Frequently mmap()/madvise()/munmap()ing 8kb chunks has > > horrible consequences for performance/scalability - very quickly you > > contend on locks in the kernel. > > > I may as well dump this in this thread. We've discussed this in person > a few times, including at least once with Ted T'so when he visited > Dublin last year. > > The fundamental conflict is that the kernel understands better the > hardware and other software using the same resources, Postgres > understands better its own access patterns. We need to either add > interfaces so Postgres can teach the kernel what it needs about its > access patterns or add interfaces so Postgres can find out what it > needs to know about the hardware context. > > The more ambitious and interesting direction is to let Postgres tell > the kernel what it needs to know to manage everything. To do that we > would need the ability to control when pages are flushed out. This is > absolutely necessary to maintain consistency. Postgres would need to > be able to mark pages as unflushable until some point in time in the > future when the journal is flushed. We discussed various ways that > interface could work but it would be tricky to keep it low enough > overhead to be workable. So in this case, the question would be what additional information do we need to exchange that's not covered by the existing interfaces. Between madvise and splice, we seem to have most of what you want; what's missing? > The less exciting, more conservative option would be to add kernel > interfaces to teach Postgres about things like raid geometries. Then > Postgres could use directio and decide to do prefetching based on the > raid geometry, how much available i/o bandwidth and iops is available, > etc. 
> > Reimplementing i/o schedulers and all the rest of the work that the > kernel provides inside Postgres just seems like something outside our > competency and that none of us is really excited about doing. This would also be a well trodden path ... I believe that some large database company introduced Direct IO for roughly this purpose. James -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Multiple synchronous_standby_names rules
Hello, When working with a production (P) and a DR (D) environment it is often a requirement to be able to protect against data loss when promoting within a site, and also when losing P and promoting a master at D. The current synchronous_standby_names setting does not allow this. In a simple example we could take the following nodes: P1 (current master), P2, P3 D1, D2, D3 Where P1 is replicating to (P2, P3, D1, D2, D3). The closest synchronous_standby_names setting you could get to my use case would be: 1 (D1, D2, D3) This would allow the loss of either site without losing data - however it would not allow promotion within site P from P1 -> (P2 | P3) without the potential for data loss. What is needed to support this is the ability to configure Px with something like: 1 (P1, P2, P3), 1 (D1, D2, D3) Would there be any appetite for this - or would it be seen as over complication of the current rules? Cheers, James Sewell, PostgreSQL Team Lead / Solutions Architect Suite 112, Jones Bay Wharf, 26-32 Pirrama Road, Pyrmont NSW 2009 *P *(+61) 2 8099 9000 *W* www.jirotech.com *F * (+61) 2 8099 9099 -- -- The contents of this email are confidential and may be subject to legal or professional privilege and copyright. No representation is made that this email is free of viruses or other defects. If you have received this communication in error, you may not copy or distribute any part of it or otherwise disclose its contents to anyone. Please advise the sender of your incorrect receipt of this correspondence.
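The proposed semantics can be sketched as a quorum check per group: a commit waits until, for every group in the rule list, at least the required number of standbys in that group have acknowledged. This is a hypothetical illustration of the proposal, not actual syntax or server behaviour:

```python
# Hedged sketch: each rule is (required_acks, [standby names]); a commit
# is releasable only when every group independently meets its quorum.
def satisfied(rules, acked):
    return all(len(acked & set(group)) >= need for need, group in rules)

# One ack required among the local standbys, one among the DR standbys.
rules = [(1, ["P2", "P3"]), (1, ["D1", "D2", "D3"])]

assert not satisfied(rules, {"P2"})        # no DR ack yet: must keep waiting
assert satisfied(rules, {"P2", "D1"})      # one ack per group: safe to commit
```

Under these semantics either whole site can be lost, and a promotion within site P can also proceed, without the possibility of acknowledged-but-unreplicated commits.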
[HACKERS] ADD FOREIGN KEY locking
Hello all, When I add a FK with a statement like this: ALTER TABLE a ADD FOREIGN KEY (id) REFERENCES b(id); I see a lock on table b: select locktype,mode,granted from pg_locks, pg_stat_activity where relation::regclass::text = 'b' AND pg_locks.pid = pg_stat_activity.pid; locktype | relation mode | AccessShareLock granted | t query | SOME LONG RUNNING QUERY WHICH SELECTS FROM b locktype | relation mode | AccessExclusiveLock granted | f query | ALTER TABLE a ADD FOREIGN KEY (id) REFERENCES b(id); This means that my add key won't complete until my long running query does. That seems a bit odd to me? In this database there are lots of datawarehouse type queries running, which makes it a bit hard for me to schedule this operation. Is this just a manifestation of adding the key being in an ALTER TABLE, which always needs an AccessExclusiveLock? Or am I missing some edge case when this lock would be required in this circumstance? No real urgency on this question, I just found it a bit strange and thought someone might be able to shed some light. James Sewell, Solutions Architect __ Level 2, 50 Queen St, Melbourne VIC 3000 *P *(+61) 3 8370 8000 *W* www.lisasoft.com *F *(+61) 3 8370 8099 -- -- The contents of this email are confidential and may be subject to legal or professional privilege and copyright. No representation is made that this email is free of viruses or other defects. If you have received this communication in error, you may not copy or distribute any part of it or otherwise disclose its contents to anyone. Please advise the sender of your incorrect receipt of this correspondence.
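The waiting behaviour follows directly from PostgreSQL's table-lock conflict rules: AccessExclusiveLock conflicts with every other mode, including the AccessShareLock a plain SELECT holds for its whole duration. A tiny sketch of just the two modes involved (a deliberately partial conflict matrix, for illustration only):

```python
# Hedged sketch of the relevant corner of PostgreSQL's lock conflict table.
# True means the requested mode must queue behind the held mode.
CONFLICTS = {
    ("AccessExclusiveLock", "AccessShareLock"): True,
    ("AccessShareLock", "AccessShareLock"): False,
}

def must_wait(requested, held):
    return CONFLICTS.get((requested, held), False)

assert must_wait("AccessExclusiveLock", "AccessShareLock")   # ALTER queues behind SELECT
assert not must_wait("AccessShareLock", "AccessShareLock")   # readers coexist freely
```

Note also that once the ALTER is queued, later lock requests queue behind it in turn, so a long-running reader can effectively stall new readers as well.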
Re: [HACKERS] ADD FOREIGN KEY locking
Oh, I've just noticed something in the Commit fest post - Reducing lock strength of trigger and foreign key DDL Perhaps I just need to be more patient. Cheers, James Sewell, Solutions Architect __ Level 2, 50 Queen St, Melbourne VIC 3000 *P *(+61) 3 8370 8000 *W* www.lisasoft.com *F *(+61) 3 8370 8099 On Wed, Feb 18, 2015 at 10:57 AM, James Sewell wrote: > Hello all, > > When I add a FK with a statement like this: > > ALTER TABLE a ADD FOREIGN KEY (id) REFERENCES b(id); > > I see a lock on table b: > > select locktype,mode,granted from pg_locks, pg_stat_activity where > relation::regclass::text = 'b' AND pg_locks.pid = pg_stat_activity.pid; > > locktype | relation > mode | AccessShareLock > granted | t > query | SOME LONG RUNNING QUERY WHICH SELECTS FROM b > > locktype | relation > mode | AccessExclusiveLock > granted | f > query | ALTER TABLE a ADD FOREIGN KEY (id) REFERENCES b(id); > > > This means that my add key won't complete until my long running query > does. That seems a bit odd to me? In this database there are lots of > datawarehouse type queries running, which makes it a bit hard for me to > schedule this operation. > > Is this just a manifestation of adding the key being in an ALTER TABLE, > which always needs an AccessExclusiveLock? Or am I missing some edge case > when this lock would be required in this circumstance? > > No real urgency on this question, I just found it a bit strange and > thought someone might be able to shed some light. > > James Sewell, > Solutions Architect > __ > > > Level 2, 50 Queen St, Melbourne VIC 3000 > > *P *(+61) 3 8370 8000 *W* www.lisasoft.com *F *(+61) 3 8370 8099 > > -- -- The contents of this email are confidential and may be subject to legal or professional privilege and copyright. No representation is made that this email is free of viruses or other defects. If you have received this communication in error, you may not copy or distribute any part of it or otherwise disclose its contents to anyone. 
Please advise the sender of your incorrect receipt of this correspondence.
[HACKERS] Composite index and min()
Hello, I have the following table: \d a Table "phxconfig.a" Column | Type | Modifiers ---+-+--- phx_run_id| integer | cell_id | integer | Indexes: "a_phx_run_id_cell_id_idx" btree (phx_run_id, cell_id) When I use a min() query I get the following plans: test=# explain select min(phx_run_id) from a; QUERY PLAN --- Result (cost=0.22..0.23 rows=1 width=0) InitPlan 1 (returns $0) -> Limit (cost=0.14..0.22 rows=1 width=4) -> Index Only Scan using a_phx_run_id_cell_id_idx on a (cost=0.14..7.89 rows=100 width=4) Index Cond: (phx_run_id IS NOT NULL) test=# explain select min(cell_id) from a; QUERY PLAN - Aggregate (cost=2.25..2.26 rows=1 width=4) -> Seq Scan on a (cost=0.00..2.00 rows=100 width=4) Can anyone comment on why this happens? The index kicks in when I do an explicit cell_id comparison. These are large tables, and they are in a partition layout so it really hurts when I do the min call on the parent table. Cheers, James Sewell, PostgreSQL Team Lead / Solutions Architect __ Level 2, 50 Queen St, Melbourne VIC 3000 *P *(+61) 3 8370 8000 *W* www.lisasoft.com *F *(+61) 3 8370 8099 -- -- The contents of this email are confidential and may be subject to legal or professional privilege and copyright. No representation is made that this email is free of viruses or other defects. If you have received this communication in error, you may not copy or distribute any part of it or otherwise disclose its contents to anyone. Please advise the sender of your incorrect receipt of this correspondence.
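The asymmetry comes from btree ordering: an index on (phx_run_id, cell_id) stores entries in lexicographic order, so the minimum of the leading column is simply the first entry, while the minimum of the second column can sit anywhere in the index. A small sketch with a sorted tuple list standing in for the index:

```python
# Sketch: a composite btree is, conceptually, a list sorted lexicographically
# on (phx_run_id, cell_id). min(phx_run_id) is the first entry; min(cell_id)
# is not at any predictable position, so the planner falls back to a scan.
index = sorted([(1, 9), (1, 3), (2, 1), (3, 7)])   # -> [(1,3),(1,9),(2,1),(3,7)]

assert index[0][0] == min(r for r, c in index)   # leading column: O(1) peek
assert index[0][1] != min(c for r, c in index)   # second column: full scan needed
```

This is also why the index does kick in once an explicit phx_run_id comparison pins the leading column: within one run id, the cell_id values are contiguous and ordered.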
[HACKERS] ALTER TABLE ... DISABLE TRIGGER vs. AccessExclusiveLock
Hackers, Experience and a read through backend/commands/tablecmds.c's AlterTable() indicate that ALTER TABLE ... DISABLE TRIGGER obtains an exclusive lock on the table (as does any ALTER TABLE). Blocking other readers from a table when, within the body of a transaction, we're performing a bulk update operation where we don't want or need triggers to fire seems at first glance to be overkill. I can see how AlterTable()'s complex logic is made less complex through 'get and keep a big lock', since most of its operational modes really do need exclusive access, but is it strictly required for ... DISABLE / REENABLE TRIGGER? Could, say, RowExclusiveLock hypothetically provide adequate protection, allowing concurrent reads, but blocking out any other writers (for ENABLE / DISABLE TRIGGER) -- such as if driven through a new statement other than ALTER TABLE -- such as "DISABLE TRIGGER foo ON tbar" ? Thanks! James Robinson Socialserve.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Postgres / plpgsql equivalent to python's getattr() ?
Hackers, Python's getattr() allows for dynamic lookup of attributes on an object, as in: inst = MyClass(x=12, y=24) v = getattr(inst, 'x') assert v == 12 Oftentimes in writing data validating trigger functions, it'd be real handy to be able to do a similar thing in plpgsql against column values in a row or record type, such as making use of a trigger argument for a hint as to what column to consider in this table's case. Oh, to be able to do something like (toy example known to be equivalent to a check):

CREATE OR REPLACE FUNCTION must_be_positive() RETURNS TRIGGER AS $$
begin
    if getattr(NEW, TG_ARGV[0]) <= 0 then
        raise exception '% must be positive', TG_ARGV[0];
    end if;
    -- after trigger
    return null;
end;
$$ LANGUAGE PLPGSQL;

A function which takes a row + a text column name, and / or a peer function taking row + index within row would really open up plpgsql's expressivity in cases where you're writing mainly SQL stuff, not really wanting to go over to plpythonu or whatnot (whose description of rows are as dicts). Is there something in the internals which inherently prevents this? Or am I a fool and it already exists? Not having to defer to EXECUTE would be attractive. James Robinson Socialserve.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
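For comparison, PL/Python already delivers the dynamic lookup asked for here, because a trigger sees the new row as a dict (TD["new"]). A standalone sketch of the same validation logic, with the column name chosen at call time (names are illustrative):

```python
# Sketch of the desired trigger logic as plain Python: the row is a dict,
# so "getattr" is just subscripting with a runtime-chosen column name.
def must_be_positive(new_row, column):
    if new_row[column] <= 0:            # the dynamic lookup plpgsql lacks
        raise ValueError(f"{column} must be positive")

must_be_positive({"qty": 5, "price": 2}, "qty")    # passes silently

failed = False
try:
    must_be_positive({"qty": -1}, "qty")
except ValueError as e:
    failed = "must be positive" in str(e)
assert failed
```

In plpgsql itself the closest workaround remains converting the row and using EXECUTE (or, in later releases, casting the record through hstore/jsonb), which is exactly the indirection the post hopes to avoid.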
Re: [HACKERS] Postgres / plpgsql equivalent to python's getattr() ?
On Aug 4, 2011, at 4:55 AM, Florian Pflug wrote: @OP: Here's my implementation of the feature you desire as a set of C-language functions: https://github.com/fgp/pg_record_inspect. Other people did code up similar things in the past, but I currently cannot find any links to their work. But a little bit of digging in the mailing list archives should turn them up. Many thanks, Florian, we'll be checking that out. James James Robinson Socialserve.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Connection and transaction triggers
I've been looking at the new trigger features planned for Firebird 2.1 and I'm a bit jealous. I'm interested in 'stateful' proc packages that can benefit from accumulating data from updates during a transaction and then performing an action at the end - perhaps doing a bulk insert in a pre-commit trigger, or communicating with an external system in a post-commit trigger, and using a begin tran or rollback trigger to tidy up the accumulator. Would this be a difficult thing to add? I quite like their temporary table treatment too. James ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Another idea for index-only scans
Decibel! wrote: The advantage to Bruce's idea is that it sounds pretty simple to implement. While it wouldn't be of use for many general cases, it *would* be useful for read-only tables, ie: old partitions. Wouldn't the most common case be foreign key checks against tables that essentially map application enums to display strings? This is a rather common scenario. It would be nice if such tables (which are typically small) could be retained in each backend process with a simple check that the cached data is still valid. James ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] INSERT/SELECT and excessive foreign key checks
Andrew Dunstan wrote: Could we achieve the same thing in a more general way by having a per-FK tiny (say 10?) LRU cache of values checked. Then it wouldn't only be restricted to constant expressions. Of course, then the trigger would need to keep state, so it might well be too complex (e.g. what if there are concurrent inserts?) I was wondering whether one could try to identify what might be termed 'enum tables' that exist to provide lookups. There are perhaps three main types of table that are the target of a foreign key lookup: 1) tables that map to program language enumerations: typically small (less than a hundred rows) and changing very infrequently. 2) tables that hold quasi-static reference data where rows are 'never' deleted (they may be amended, perhaps to identify that they are logically inactivated, but still needed for reference lookup from existing rows elsewhere) - typically customer definitions, product definitions, site definitions and that sort of thing that is often regarded as 'static data' by a user application session but which may change. 3) master records in master/detail relationships such as order/orderline. If you can have mechanisms that reflect the likelihood of an update and optimise accordingly, then hopefully performance in real-world applications can be improved. In the case of 1) for example, we might reasonably have a single logical read/write lock that controls access to ALL such tables in a schema, and a single 'update generation count'. The lock would effectively provide repeatable read stability across all of the tables (a multi-table table lock) while in place, and the generation count (which can be a tran id) indicates to caching processes when the cache is stale. This means that unlike normal MVCC the readers will block a writer, but in this case we expect the write to happen only during application release. 
In the case of 2), we can't use the cross-table lock, and the tables may be large, so the suggested LRU cache per table (with a table-level read/write lock again) may be most effective, but we may elect to regard a read lock as allowing any operation that doesn't invalidate the primary key. And in the case of 3) we don't do anything special at all. I certainly think that anything which can materially reduce lookups in case 1) and hopefully 2) will encourage good database design and declarative referential integrity - in some classes of high performance application the cost is too high to be done inline with an update, which is a shame. James ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
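The small per-FK LRU cache of already-checked key values proposed at the top of this exchange is simple to sketch; a hit means the lookup on the referenced table is skipped entirely (names and the capacity of 10 are illustrative, and cache invalidation on parent-table changes is deliberately out of scope here):

```python
from collections import OrderedDict

# Hedged sketch of a tiny per-constraint LRU cache of FK values already
# verified to exist in the referenced table.
class FKCheckCache:
    def __init__(self, capacity=10):
        self.capacity = capacity
        self.seen = OrderedDict()

    def checked(self, key):
        """Return True if key was recently verified; record it otherwise."""
        if key in self.seen:
            self.seen.move_to_end(key)       # refresh LRU position
            return True
        self.seen[key] = True
        if len(self.seen) > self.capacity:
            self.seen.popitem(last=False)    # evict least recently used
        return False

cache = FKCheckCache(capacity=2)
assert not cache.checked("order-1")   # miss: would probe the parent table
assert cache.checked("order-1")       # hit: probe skipped
cache.checked("order-2")
cache.checked("order-3")
assert not cache.checked("order-1")   # evicted, must re-check
```

The hard part, as the thread notes, is exactly what such a sketch omits: keeping the cache correct under concurrent deletes from the referenced table.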
Re: [HACKERS] Proposal: real procedures again (8.4)
Josh Berkus wrote: Not only would they be generally useful for SP programming, but multisets would eliminate one of the big hurdles in re-writing T-SQL stored procedures in PG, and thus make it easier to port from SQL Server. You don't hear a lot of demand for multisets on the mailing lists because we're not getting those SQL Server / Sybase crossovers now. It's true that multiple result sets are a big deal with T-SQL programming: but I think you'll also need to provide a way for the locking model to behave in a similar way, and also, very importantly, to be able to emulate the after-statement triggers' view of new and old images. James ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Win32 shared memory speed
Magnus Hagander wrote: IIRC, there hasn't been any direct benchmark for it (though I've wanted to do that but had no time), but it's been the only real explanation put forward for the behaviour we've seen. And it does make sense given the thread-centric view of the windows mm. /Magnus How is it supposed to be slow, once it's mapped into your process? There's no OS interaction at all then. If you are suggesting that the inter-process synch objects are slow, then that may be so: just use interlocked increment and a spin lock in place of a mutex and use an associated event to wake up if necessary. You don't have to use a named kernel mutex, though it may be handy while setting up the shared memory. If you are repeatedly changing the mappings - well, that may be something that needs optimisation. James ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] .NET or Mono functions in PG
Magnus Hagander wrote: I did look at this at some earlier point as well. One big problem at that time was that once you embedded mono, you had all sorts of threads running in your backend ;-) Is that necessarily a problem? You have to compile with a thread-capable libc and take some care that the heap implementation is well tuned, but there's no reason why the mono housekeeping threads should cause you any problem, is there? It should be much like embedded Java. Another way to do it is "the PL/J" way (I think). Which is starting up a separate process with the VM in it and then do RPC of some kind to it. Which has more overhead per call, but lower per backend etc. And a lot less "dangerous". Given that one would want to embed to have very low latency both on trigger invocation and on calls back into the data engine, I don't really see the point personally. I'm not sure how important it is to make the embedded interface look like a standard interface (in the way that the embedded CLR in MSSQL does - mostly) or whether it can be a thin wrapper over the C API. I think there's good mileage in starting with the thin wrapper, then at least some common business logic code can be used. James ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] .NET or Mono functions in PG
Tom Lane wrote: It is also pretty well established that if pltcl or plperl cause the backend to become multithreaded, things break horribly. I strongly Isn't that something that needs to be fixed? It's one thing not to allow for multiple threads to be in your code, but not having a threaded library linked in is going to become more and more of a problem. James ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly