Re: [HACKERS] Improve lseek scalability v3
On Fri, Sep 16, 2011 at 04:16:49PM +0200, Andres Freund wrote:
> I sent an email containing benchmarks from Robert Haas regarding the Subject.
> Looking at lkml.org I can't see it right now, Will recheck when I am at home.
>
> He replaced lseek(SEEK_END) with fstat() and got speedups up to 8.7 times the
> lseek performance.
> The workload was 64 clients hammering postgres with a simple readonly
> workload (pgbench -S).

Yay! Data!

> For reference see the thread in the postgres archives which also links to
> performance data:
> http://archives.postgresql.org/message-id/CA+TgmoawRfpan35wzvgHkSJ0+i-W=vkjpknrxk2ktdr+hsa...@mail.gmail.com

So both fstat and lseek do more work than postgres wants. lseek modifies the file pointer while fstat copies all kinds of unnecessary information into userspace. I imagine this is the source of the slowdown seen in the 1-client case.

There have been various proposals to make the amount of information returned by fstat limited to the 'cheap' (for various definitions of 'cheap') fields.

I'd like to dig into the requirement for knowing the file size a little better. According to the blog entry it's used for "the query planner". Does the query planner need to know the exact number of bytes in the file, or is it after an order-of-magnitude? Or to-the-nearest-gigabyte?

--
Matthew Wilcox    Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step."

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
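For readers who want to see the two calls side by side, here is a minimal sketch using Python's `os` module, which wraps the same system calls. The helper names `size_via_lseek` and `size_via_fstat` are made up for illustration and are not PostgreSQL functions. It only shows the side effect mentioned above (lseek moves the file offset, fstat does not); it says nothing about relative performance.

```python
import os


def size_via_lseek(fd: int) -> int:
    """Return the file size by seeking to the end.

    Side effect: the descriptor's file offset now points at end of file.
    """
    return os.lseek(fd, 0, os.SEEK_END)


def size_via_fstat(fd: int) -> int:
    """Return the file size from fstat().

    The file offset is left alone, but the kernel fills in the whole
    stat structure (owner, timestamps, ...) just to deliver st_size.
    """
    return os.fstat(fd).st_size
```

Either helper returns the same number for a regular file; the difference is only in what else the kernel has to do.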
Re: [HACKERS] Improve lseek scalability v3
> One other thing we're interested in is portability. I mean, even if
> Linux were to introduce a new hypothetical syscall that was able to
> return the file size at a ridiculously low cost, we probably wouldn't
> use it because it'd be Linux-specific. So an improvement of lseek()
> seems to be the best option.

Fully agreed. It doesn't make any sense at all to implement special syscalls just to work around a basic system call not scaling.

-Andi
Re: [HACKERS] patch: plpgsql - remove unnecessary ccache search when a array variable is updated
Pavel Stehule writes:
> this patch significantly reduce a ccache searching.

I looked at this patch a little bit. It's got a very serious problem: it supposes that the parent of an ARRAYELEM datum must be a VAR datum, which is not so. As an example, it gets an Assert failure on this:

    create table rtype (id int, ar text[]);

    create or replace function foo() returns text[] language plpgsql as $$
    declare
      r record;
    begin
      r := row(12, '{foo,bar,baz}')::rtype;
      r.ar[2] := 'replace';
      return r.ar;
    end$$;

    select foo();

There is not any good place to keep the array element lookup data for the non-VAR cases that is comparable to what you did for VAR. I wasn't exactly thrilled about adding another field to PLpgSQL_var anyway, because it would go unused in the large majority of cases.

A possible solution is to use the ARRAYELEM datum itself to hold the cached lookup data. I'm not sure if it's worth having a level of indirection as you do here; you might as well just drop the fields right into PLpgSQL_arrayelem, because they'd be used in the vast majority of cases.

Also, in order to deal with subscripting record fields, you'd better be prepared for the possibility that the target array type changes from time to time. I'd envision this working similarly to what various array-manipulating functions do: you remember the last input OID you looked up, and whenever that changes, repeat the lookup steps.

regards, tom lane
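The "remember the last input OID" idea can be sketched as follows. This is an illustration of the caching pattern only, not the plpgsql code: `ArrayElemCache`, `get_array_info`, and the `lookup` callback are hypothetical names, and the real implementation would keep such fields in the C struct PLpgSQL_arrayelem.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ArrayElemCache:
    """Hypothetical stand-in for lookup fields kept in the ARRAYELEM datum."""
    cached_type_oid: Optional[int] = None
    cached_info: Optional[dict] = None


def get_array_info(cache: ArrayElemCache, type_oid: int,
                   lookup: Callable[[int], dict]) -> dict:
    """Return type info for type_oid, redoing the lookup only on a change.

    `lookup` stands for the expensive catalog-cache search; it runs again
    only when the array type differs from the one seen last time.
    """
    if cache.cached_type_oid != type_oid:
        cache.cached_info = lookup(type_oid)
        cache.cached_type_oid = type_oid
    return cache.cached_info
```

With a plain array variable the type never changes, so the lookup runs once; with a record field such as `r.ar` it reruns only when the record is assigned a row whose array type differs.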
Re: [HACKERS] Improve lseek scalability v3
On Fri, Sep 16, 2011 at 9:08 PM, Benjamin LaHaise wrote:
> For such tables, can't Postgres track the size of the file internally? I'm
> assuming it's keeping file descriptors open on the tables it manages, in
> which case when it writes to a file to extend it, the internally stored size
> could be updated. Not making a syscall at all would scale far better than
> even a modified lseek() will perform.

There's no hardwired limit on how many tables you can have in a database, and it's not limited by the number of file descriptors. Postgres would have to keep some kind of LRU for recently opened files and their sizes or something like that. There would probably still be a lot of lseeks/fstats going on.

More generally, keeping a Postgres-cached value for the size would have a reliability issue. It's much safer to have a single authoritative value -- the actual length of the file -- than to have the same value stored in two locations and then need to worry about them getting out of sync. For example, if a write fails when extending the file because the filesystem ran out of space, Postgres might not know how to update its internal cached state accurately.

There's no question it could be done, but it's not clear it would necessarily be much faster than a lock-free lseek/fstat.

On Fri, Sep 16, 2011 at 6:27 PM, Andres Freund wrote:
> It depends on where the information is used. For some of the uses it needs to
> be exact (the assumed size is rechecked after acquiring a lock preventing
> extension)

Fwiw this might give the wrong impression. I don't believe scans acquire a lock preventing extension -- that is, another process can be concurrently extending the file at the same time as the scan is proceeding. The scan only locks out truncation (vacuum). Any blocks added by another process are ignored by the scan because they can only contain records invisible to that transaction.

This does depend on the lseek/fstat being done after the transaction snapshot is taken, which is possibly "rechecking" the value taken by the query planner, but they're really two independent things.

--
greg
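To make the out-of-sync worry concrete, here is a small sketch of a per-process LRU of file sizes. `SizeCache` is a hypothetical class written for this illustration, not anything in PostgreSQL, and it deliberately has no invalidation so that the staleness is visible.

```python
import os
from collections import OrderedDict


class SizeCache:
    """Tiny LRU cache of file sizes, keyed by path (illustrative only)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._sizes = OrderedDict()

    def get(self, path: str) -> int:
        if path in self._sizes:
            # Cache hit: no syscall, but the value may be stale.
            self._sizes.move_to_end(path)
            return self._sizes[path]
        # Cache miss: ask the kernel for the authoritative size.
        size = os.stat(path).st_size
        self._sizes[path] = size
        if len(self._sizes) > self.capacity:
            self._sizes.popitem(last=False)  # evict least recently used
        return size
```

If anything extends the file without going through the cache (another backend, in the scenario discussed here), `get()` keeps returning the old size until the entry is evicted, which is exactly the two-copies-of-one-value problem described above.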
Re: [HACKERS] Improve lseek scalability v3
On Friday, September 16, 2011 11:02:38 PM Andres Freund wrote:
> Also with fstat() instead of lseek() there was no bottleneck anymore, so I
> don't think the benefits would warrant that.

At least that's what I observed on a 4 x 6 machine without the patch applied (can't reboot it). That shouldn't be concurrency relevant so...

Andres
Re: [HACKERS] Improve lseek scalability v3
On Friday, September 16, 2011 10:08:17 PM Benjamin LaHaise wrote:
> On Fri, Sep 16, 2011 at 07:27:33PM +0200, Andres Freund wrote:
> > many tuples does the table have. Those statistics are only updated every
> > now and then though.
> > So it uses those old stats to check how many tuples are normally stored
> > on a page and then uses that to extrapolate the number of tuples from
> > the current nr of pages (which is computed by lseek(SEEK_END) over the
> > 1GB segments of a table).
> >
> > I am not sure how interested you are on the relevant postgres internals?
>
> For such tables, can't Postgres track the size of the file internally? I'm
> assuming it's keeping file descriptors open on the tables it manages, in
> which case when it writes to a file to extend it, the internally stored
> size could be updated. Not making a syscall at all would scale far better
> than even a modified lseek() will perform.

Yes, it tracks the fds internally. The problem is that postgres is process based, so those tables are not reachable by all processes. It could start tracking those in shared memory, but the synchronization overhead for that would likely be more expensive than the syscall overhead. (Given that the fd sets are possibly, and realistically, disjoint between the individual backends, you would have to reserve enough shared memory for a fully separate set of fds for each process, which would complicate efficient lookup.)

Also with fstat() instead of lseek() there was no bottleneck anymore, so I don't think the benefits would warrant that.

Greetings,

Andres
Re: [HACKERS] force_not_null option support for file_fdw
Shigeru Hanada writes:
> (2011/09/09 0:47), Kohei Kaigai wrote:
>> makeString() does not copy the supplied string itself, so it is not
>> preferable to reference NameStr(attr->attname) across ReleaseSysCache().

> Oops, fixed.
> [ I should check some of my projects for this issue... ]

I've committed this with some mostly-cosmetic revisions, notably

* use defGetBoolean, since this ought to be a plain boolean option rather than having its own private idea of which spellings are accepted.

* get rid of the ORDER BY altogether in the regression test case --- it seems a lot safer to assume that COPY will read the data in the presented order than that text will be sorted in any particular way.

regards, tom lane
Re: [HACKERS] unite recovery.conf and postgresql.conf
> I'm in favor of defining a separate, content-free trigger file to enable
> archive recovery. Not sure about the name "recovery.ready", though ---
> that makes it look like one of the WAL archive transfer trigger files,
> which does not seem like a great analogy. The pg_standby documentation
> suggests names like "foo.trigger" for failover triggers, which is a bit
> better analogy because something external to the database creates the
> file. What about "recovery.trigger"?

Do we want a trigger file to enable recovery, or one to *disable* recovery? Or both?

Also, I might point out that we're really confusing our users by talking about "recovery" all the time, if they're just using streaming replication. Just sayin'

> * will seeing these values present in pg_settings confuse anybody?

No. pg_settings already has a couple dozen "developer" parameters which nobody not on this mailing list understands. Adding the recovery parameters to it wouldn't confuse anyone further, and would have the advantage of making the recovery parameters available by monitoring query on a hot standby.

For that matter, I'd suggest that we add a read-only setting called in_recovery.

> * can the values be changed when not in recovery, if so what happens,
> and again will that confuse anybody?

Yes, and no.

> * is there any security hazard from ordinary users being able to see
> what settings had been used?

primary_conninfo could be a problem, since it's possible to set a password there.

--Josh
Re: [HACKERS] SSI heap_insert and page-level predicate locks
Jeff Davis writes:
> On Wed, 2011-06-08 at 17:29 -0500, Kevin Grittner wrote:
>> Heikki Linnakangas wrote:
>>> AFAICS, the check for page lock is actually unnecessary.

>> Absolutely correct. Patch attached.

> I like the change, but the comment is slightly confusing.

I've committed this patch with comment rewording along the lines suggested by Jeff. I also moved the CheckForSerializableConflictIn call to just before, instead of just after, the RelationGetBufferForTuple call. We no longer have to do it after, since we don't need to know which buffer to pass, and it should buy some more low-level parallelism to run the SSI checks while not holding exclusive lock on the eventual target buffer.

regards, tom lane
Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels
On Fri, Sep 16, 2011 at 10:37, Tom Lane wrote:
> Greg Stark writes:
>> On Fri, Sep 16, 2011 at 3:57 PM, Tom Lane wrote:
>>> Does anyone want
>>> to argue for doing something more complicated, and if so what exactly?
>
>> Well there's no harm trying to write to oom_score_adj and if that
>> fails with EEXISTS trying to write to oom_adj.

Yeah, I don't really like the idea of a compile time option that is kernel version dependent... But I don't feel too strongly about it either (all my kernels are new enough that they support oom_score_adj).

I'll also note that on my system we are in the good company of sshd and chromium:

    sshd (978): /proc/978/oom_adj is deprecated, please use /proc/978/oom_score_adj instead.
    chromium-sandbo (1377): /proc/1375/oom_adj is deprecated, please use /proc/1375/oom_score_adj instead.

[ It's quite annoying that soon after we decided to stick -DLINUX_OOM_ADJ in they changed it. Whatever happened to a stable userspace API :-( ]
Re: [HACKERS] Improve lseek scalability v3
Excerpts from Andres Freund's message of Fri Sep 16 14:27:33 -0300 2011:
> Hi,
> On Friday 16 Sep 2011 17:36:20 Matthew Wilcox wrote:
> > Does the query planner need to know the exact number of bytes in the file,
> > or is it after an order-of-magnitude? Or to-the-nearest-gigabyte?
> It depends on where the information is used. For some of the uses it needs to
> be exact (the assumed size is rechecked after acquiring a lock preventing
> extension) at other places I guess it would be ok if the accuracy got lower
> with bigger files (those files won't ever get bigger than 1GB).

One other thing we're interested in is portability. I mean, even if Linux were to introduce a new hypothetical syscall that was able to return the file size at a ridiculously low cost, we probably wouldn't use it because it'd be Linux-specific. So an improvement of lseek() seems to be the best option.

--
Álvaro Herrera
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] Improve lseek scalability v3
Hi,

On Friday 16 Sep 2011 17:36:20 Matthew Wilcox wrote:
> On Fri, Sep 16, 2011 at 04:16:49PM +0200, Andres Freund wrote:
> > I sent an email containing benchmarks from Robert Haas regarding the
> > Subject. Looking at lkml.org I can't see it right now, Will recheck when
> > I am at home.
> >
> > He replaced lseek(SEEK_END) with fstat() and got speedups up to 8.7 times
> > the lseek performance.
> > The workload was 64 clients hammering postgres with a simple readonly
> > workload (pgbench -S).
> Yay! Data!
> > For reference see the thread in the postgres archives which also links to
> > performance data:
> > http://archives.postgresql.org/message-id/CA+TgmoawRfpan35wzvgHkSJ0+i-W=vkjpknrxk2ktdr+hsa...@mail.gmail.com
> So both fstat and lseek do more work than postgres wants. lseek modifies
> the file pointer while fstat copies all kinds of unnecessary information
> into userspace. I imagine this is the source of the slowdown seen in
> the 1-client case.

Yes, that was my theory as well.

> I'd like to dig into the requirement for knowing the file size a little
> better. According to the blog entry it's used for "the query planner".

It's used for multiple things, one of which is the query planner.

The query planner needs to know how many tuples a table has to produce a sensible plan. For that it has stats which tell

1. how big is the table
2. how many tuples does the table have.

Those statistics are only updated every now and then though. So it uses those old stats to check how many tuples are normally stored on a page and then uses that to extrapolate the number of tuples from the current nr of pages (which is computed by lseek(SEEK_END) over the 1GB segments of a table).

I am not sure how interested you are on the relevant postgres internals?

> Does the query planner need to know the exact number of bytes in the file,
> or is it after an order-of-magnitude? Or to-the-nearest-gigabyte?

It depends on where the information is used. For some of the uses it needs to be exact (the assumed size is rechecked after acquiring a lock preventing extension); at other places I guess it would be ok if the accuracy got lower with bigger files (those files won't ever get bigger than 1GB).

But I have a hard time seeing an implementation where the approximate size would be faster to get than just the filesize?

Andres
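The extrapolation described in this message comes down to simple arithmetic, sketched below. The function names are made up for illustration, the 8192-byte block size is PostgreSQL's default and is an assumption here, and the real planner handles the "no stats yet" case differently (the zero-page branch is a simplification).

```python
BLOCK_SIZE = 8192  # assumed: PostgreSQL's default page size in bytes


def current_pages(segment_sizes):
    """Sum the page counts of a table's segment files.

    Each entry is a segment's size in bytes, i.e. what lseek(SEEK_END)
    or fstat() would report for that file.
    """
    return sum(size // BLOCK_SIZE for size in segment_sizes)


def estimate_tuples(stat_tuples, stat_pages, cur_pages):
    """Scale the old tuples-per-page density to the current page count."""
    if stat_pages == 0:
        return 0.0  # simplification: no density known from old stats
    density = stat_tuples / stat_pages
    return density * cur_pages
```

For example, if the last statistics run saw 1000 tuples on 10 pages and the table now has 15 pages, the estimate is 100 tuples per page times 15 pages, that is 1500 tuples. This is why the planner needs a fresh page count on every planning cycle even though the density itself is allowed to be stale.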
Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels
Alvaro Herrera writes:
> Now the problem is that we have defined the LINUX_OOM_ADJ symbol as
> meaning the value we're going to write. Maybe this wasn't the best
> choice. I mean, it's very flexible, but it doesn't seem to offer any
> benefit over a plain boolean choice.

> Is your proposal to create a new LINUX_OOM_SCORE_ADJ cpp symbol with the
> same semantics?

Yes, that's what I was thinking. We could avoid that if we were going to hard-wire a decision that zero is the thing to write, but I see no reason to place such a restriction on users. Who's to say they might not want backends to adopt some other value?

regards, tom lane
Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels
Excerpts from Tom Lane's message of Fri Sep 16 13:37:46 -0300 2011:
> Greg Stark writes:
> > On Fri, Sep 16, 2011 at 3:57 PM, Tom Lane wrote:
> >> Does anyone want
> >> to argue for doing something more complicated, and if so what exactly?
>
> > Well there's no harm trying to write to oom_score_adj and if that
> > fails with EEXISTS trying to write to oom_adj.
>
> Well, (1) what value are you going to write (they need to be different
> for the two files), and (2) the main point of the exercise, at present,
> is to avoid kernel log messages. I'm not sure that trying to create
> random files under /proc isn't going to draw bleats in the kernel log.

I guess the question is what semantics the new code has. In the old badness() world, child processes inherited the oom_adj value of their parent. What the code in fork_process was used for was resetting the value back to 0 (meaning "kernel is allowed to kill this process on OOM"), so that you could set the oom_adj in the start script for postmaster (to a value meaning "never kill this one"), and the backends would see their values reset to zero.

The new oom_score_adj has a scale of -1000 to +1000, with zero being neutral and -1000 being "never kill". So what we want to do here in most cases, it seems, is set the value to zero whether it's oom_adj or oom_score_adj -- assuming the new code still causes child processes to inherit the "adj" value from the parent.

Now the problem is that we have defined the LINUX_OOM_ADJ symbol as meaning the value we're going to write. Maybe this wasn't the best choice. I mean, it's very flexible, but it doesn't seem to offer any benefit over a plain boolean choice.

Is your proposal to create a new LINUX_OOM_SCORE_ADJ cpp symbol with the same semantics?

The most thorough documentation on this option seems to be this commit:
https://github.com/mirrors/linux-2.6/commit/a63d83f427fbce97a6cea0db2e64b0eb8435cd10#include/linux/oom.h

--
Álvaro Herrera
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels
Greg Stark writes:
> On Fri, Sep 16, 2011 at 3:57 PM, Tom Lane wrote:
>> Does anyone want
>> to argue for doing something more complicated, and if so what exactly?

> Well there's no harm trying to write to oom_score_adj and if that
> fails with EEXISTS trying to write to oom_adj.

Well, (1) what value are you going to write (they need to be different for the two files), and (2) the main point of the exercise, at present, is to avoid kernel log messages. I'm not sure that trying to create random files under /proc isn't going to draw bleats in the kernel log.

regards, tom lane
Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels
On Fri, Sep 16, 2011 at 3:57 PM, Tom Lane wrote:
> Does anyone want
> to argue for doing something more complicated, and if so what exactly?

Well there's no harm trying to write to oom_score_adj and if that fails with EEXISTS trying to write to oom_adj.

--
greg
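A rough sketch of the try-the-new-file-then-fall-back idea, written in Python rather than the C of fork_process.c. The function name `reset_oom_adjustment` and its parameters are hypothetical, and `proc_dir` exists only so the logic can be exercised against an ordinary directory. Two caveats: opening a missing file without O_CREAT fails with ENOENT (FileNotFoundError in Python), not EEXIST, and whether such a probe draws kernel log messages is not something this sketch can answer.

```python
import os


def reset_oom_adjustment(proc_dir="/proc/self", score_adj="0", legacy_adj="0"):
    """Write an OOM adjustment, preferring oom_score_adj over oom_adj.

    Returns the name of the file that was written, or None if neither
    file exists. The two values are separate parameters because the
    files use different scales.
    """
    candidates = (("oom_score_adj", score_adj), ("oom_adj", legacy_adj))
    for name, value in candidates:
        path = os.path.join(proc_dir, name)
        try:
            # No O_CREAT: a missing file must fail, never be created.
            fd = os.open(path, os.O_WRONLY)
        except FileNotFoundError:
            continue  # kernel without this interface; try the next one
        try:
            os.write(fd, value.encode("ascii"))
        finally:
            os.close(fd)
        return name
    return None
```

On a kernel that has oom_score_adj the legacy file is never touched, so the deprecation message should not be triggered by this code path; on an older kernel the first open fails and oom_adj is used instead.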
Re: [HACKERS] Is there really no interest in SQL Standard?
On Fri, Sep 16, 2011 at 9:49 AM, Susanne Ebrecht wrote:
> On 16.09.2011 15:59, Dave Page wrote:
>>
>> other plans less than 2 years ago. For me, a representative would have
>> been reporting back to us after each meeting, and discussing points to
>> raise before each meeting - not working in isolation, without the
>> knowledge of others.
>
> Dave,
>
> I exactly did this with Peter.
> Afair, I once was told it is enough to report to Peter.
> And as I said - David showed interests and we sometimes talk about it too.
> I never wanted to bother hackers with all this stuff.

-hackers is exactly where we would discuss issues related to development and design of PostgreSQL.

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
[HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels
While testing 9.1 RPMs on Fedora 15 (2.6.40 kernel), I notice messages like these in the kernel log:

    Sep 11 13:38:56 rhl kernel: [ 415.308092] postgres (18040): /proc/18040/oom_adj is deprecated, please use /proc/18040/oom_score_adj instead.

These don't show up on every single PG process launch, but that probably just indicates there's a rate-limiter in the kernel reporting mechanism. So it looks like it behooves us to cater for oom_score_adj in the future.

The simplest, least risky change that I can think of is to copy-and-paste the relevant #ifdef code block in fork_process.c. If we do that, then it would be up to the packager whether to #define LINUX_OOM_ADJ or LINUX_OOM_SCORE_ADJ or both depending on the behavior he wants. That would be good enough for my own purposes in building Fedora/RHEL packages, since I can predict with confidence which kernel versions a given build is likely to be used with. I think probably the same would be true for most other distro-specific builds.

Does anyone want to argue for doing something more complicated, and if so what exactly?

regards, tom lane
Re: [HACKERS] Is there really no interest in SQL Standard?
On 16.09.2011 15:59, Dave Page wrote:
> other plans less than 2 years ago. For me, a representative would have
> been reporting back to us after each meeting, and discussing points to
> raise before each meeting - not working in isolation, without the
> knowledge of others.

Dave,

I exactly did this with Peter.
Afair, I once was told it is enough to report to Peter.
And as I said - David showed interests and we sometimes talk about it too.
I never wanted to bother hackers with all this stuff.

Susanne

--
Susanne Ebrecht - 2ndQuadrant
PostgreSQL Development, 24x7 Support, Training and Services
www.2ndQuadrant.com
Re: [HACKERS] Is there really no interest in SQL Standard?
On 16.09.2011 14:38, Merlin Moncure wrote:
> So, generally speaking, what kinds of things are going to be brought
> up at the ISO meeting? Is this an opportunity to get postgres special
> syntax drafted into the sql standard?

Yes and no.

You first need to convince your country and then - as country representative - you need to convince the other countries on ISO level.

There are country-based SQL standard groups in several countries. The well known groups are ANSI for USA, BSI for UK, DIN for Germany, JTC for Japan and so on.

Within the country you usually represent your own product. Usually the members of a country-based group are a mix of research folk (e.g. universities) and DB-system companies based in that country. Which experts they let in depends on the country-based group.

It is good here to be in a smaller country, because the groups are smaller and you can get more of your ideas up to ISO.

All the country-based groups together are ISO. In ISO every country has just a single vote.

This means that even when your country has put forward what you wanted, there may be enough countries against it that your idea won't get into the standard.

Susanne

--
Susanne Ebrecht - 2ndQuadrant
PostgreSQL Development, 24x7 Support, Training and Services
www.2ndQuadrant.com
Re: [HACKERS] Initialization of ResultTupleSlot in AppendNode
> That also holds the plan's output tuple descriptor. If you tried to
> remove it, I think the ExecAssignResultTypeFromTL call would crash.
> And if you removed *that*, upper nodes would get unhappy, cf
> ExecGetResultType.

Yes, this is true. However, upper nodes don't need it in all cases, so would it be possible for ExecGetResultType to check whether ResultTupleSlot is NULL and, if so, do something similar to ExecAssignResultTypeFromTL to return the tuple descriptor? This would save allocating a ResultTupleSlot every time for the Append node.

-----Original Message-----
From: Tom Lane [mailto:t...@sss.pgh.pa.us]
Sent: Friday, September 16, 2011 4:24 AM
To: Amit Kapila
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Initialization of ResultTupleSlot in AppendNode

Amit Kapila writes:
> I observed that during initialization of planstate for Append Node, we
> allocate ResulttupleSlot, however it is used only to send NULL slot indicate
> no more tuples.

> Is it right or there is any other purpose of it?

That also holds the plan's output tuple descriptor. If you tried to remove it, I think the ExecAssignResultTypeFromTL call would crash. And if you removed *that*, upper nodes would get unhappy, cf ExecGetResultType.

The use as an end-of-scan signal seems a bit vestigial, since we could just as well return NULL, but it doesn't really cost enough to be worth changing ...

regards, tom lane
Re: [HACKERS] Is there really no interest in SQL Standard?
On 16-09-2011 10:26, Susanne Ebrecht wrote:
> On 16.09.2011 08:49, Heikki Linnakangas wrote:
>> Even if you can't share drafts, would it be possible to give a summary
>> of things that are being discussed in the committee? That way if
>> there's people in the community with interests in particular topics,
>> they could contact you and get involved.
>
> Of course it is. I just not wanted to spam hackers.

But if it is community interest, of course it will bother no one here...

--
Euler Taveira de Oliveira - Timbira http://www.timbira.com.br/
PostgreSQL: Consulting, Development, 24x7 Support and Training
Re: [HACKERS] Is there really no interest in SQL Standard?
On Fri, Sep 16, 2011 at 8:47 AM, Susanne Ebrecht wrote:
> On 16.09.2011 14:47, Dave Page wrote:
>>
>> My point remains - Sun were never in a position to say who represents
>> PostgreSQL.
>
> Dave,
>
> the procedure works different. The country representation ask for you.
> Because you represent your product on one side - but you also represent your
> country.
> For example ANSI offered Sun to send some experts.
> If BSI wants your expertise then they would ask you or your company (don't
> know how BSI works internally).
> Germany ask for my PostgreSQL expertise.
>
> Of course Peter always was and still is in my background.
>
> Finland just has no active group yet - afair Peter is working on that
> problem.

You're missing my point completely. You say you represent PostgreSQL on the SQL Committee (or German working group, but that's not the point), yet the PostgreSQL hackers didn't know that, and were making other plans less than 2 years ago. For me, a representative would have been reporting back to us after each meeting, and discussing points to raise before each meeting - not working in isolation, without the knowledge of others.

I'd be glad to see us have representation, but I do not believe we have had any yet, and whatever you have done so far (which may or may not be good for us) really doesn't count because it hasn't involved the project in any way.

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Is there really no interest in SQL Standard?
On 16.09.2011 14:47, Dave Page wrote:
> My point remains - Sun were never in a position to say who represents
> PostgreSQL.

Dave,

the procedure works different. The country representation ask for you. Because you represent your product on one side - but you also represent your country.
For example ANSI offered Sun to send some experts.
If BSI wants your expertise then they would ask you or your company (don't know how BSI works internally).
Germany ask for my PostgreSQL expertise.

Of course Peter always was and still is in my background.

Finland just has no active group yet - afair Peter is working on that problem.

Susanne

-- 
Susanne Ebrecht - 2ndQuadrant
PostgreSQL Development, 24x7 Support, Training and Services
www.2ndQuadrant.com
Re: [HACKERS] unite recovery.conf and postgresql.conf
On 15-09-2011 23:54, Fujii Masao wrote:
> #1
> Use empty recovery.ready file to enter archive recovery. recovery.conf
> is not read automatically. All recovery parameters are expected to be
> specified in postgresql.conf. If you must specify them in recovery.conf,
> you need to add "include 'recovery.conf'" into postgresql.conf. But note
> that that recovery.conf will not be renamed to recovery.done at the
> end of recovery. This is what the patch I've posted does. This is
> simplest approach, but might confuse people who use the tools which
> depend on recovery.conf.

More or less +1. We don't need two config files; just one: postgresql.conf. Just turn all recovery.conf parameters into GUCs. As already said, the recovery.conf settings are not different from archive settings; we just need a way to trigger the recovery. And that trigger could be pulled by a GUC (standby_mode) or a file (say recovery -> recovery.done). Also, recovery.done could be filled with recovery information just for DBA record. standby_mode does not create any file, it just triggers the recovery (as it will be used mainly for replication purposes).

-- 
Euler Taveira de Oliveira - Timbira http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
Re: [HACKERS] fstat vs. lseek
On Friday 16 Sep 2011 15:19:07 Andrea Suisani wrote:
> hi
>
> On 08/08/2011 07:50 PM, Robert Haas wrote:
> > On Mon, Aug 8, 2011 at 1:31 PM, Andres Freund wrote:
> >> If its ok I will write a mail to lkml referencing this thread and your
> >> numbers inline (with attribution obviously).
> >
> > That would be great. Please go ahead.
>
> I've just stumbled across this thread on lkml [1]
> "Improve lseek scalability v3".
>
> and I thought to ping pgsql hackers list
> just in case, more to the point they're
> asking "are there any real workloads which care
> [Make generic lseek lockless safe]"

I wrote them a mail sometime ago (some weeks) regarding an earlier version of the patch... Can't find it right now though.

Andres
Re: [HACKERS] fstat vs. lseek
hi

On 08/08/2011 07:50 PM, Robert Haas wrote:
> On Mon, Aug 8, 2011 at 1:31 PM, Andres Freund wrote:
>> If its ok I will write a mail to lkml referencing this thread and your
>> numbers inline (with attribution obviously).
>
> That would be great. Please go ahead.

I've just stumbled across this thread on lkml [1] "Improve lseek scalability v3".

and I thought to ping pgsql hackers list just in case, more to the point they're asking "are there any real workloads which care [Make generic lseek lockless safe]"

maybe I've got it wrong but it seems somewhat related to what has been discussed here and also in Robert Haas's "Linux and glibc Scalability" blog post [2].

[cut]

Andrea

[1] https://lkml.org/lkml/2011/9/15/399
[2] http://rhaas.blogspot.com/2011/08/linux-and-glibc-scalability.html
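For readers who have not followed the earlier thread: the two ways of asking the kernel for a file's size that are being compared look roughly like the sketch below. It is only an illustration in Python, not PostgreSQL code; the helper names and the 8192-byte test file are made up for the example. It shows the behavioural difference mentioned on lkml (lseek moves the file offset, fstat does not) and says nothing about relative speed.

```python
import os
import tempfile


def size_via_lseek(fd):
    """Return the file size by seeking to the end (moves the file offset)."""
    return os.lseek(fd, 0, os.SEEK_END)


def size_via_fstat(fd):
    """Return the file size from fstat() (file offset is left alone)."""
    return os.fstat(fd).st_size


def probe_sizes(nbytes):
    """Hypothetical helper: write nbytes, then query the size both ways.

    Returns (fstat size, offset after fstat, lseek size, offset after lseek).
    """
    with tempfile.TemporaryFile() as f:
        f.write(b"x" * nbytes)
        f.flush()
        fd = f.fileno()
        os.lseek(fd, 0, os.SEEK_SET)  # start from a known offset of 0
        stat_size = size_via_fstat(fd)
        offset_after_fstat = os.lseek(fd, 0, os.SEEK_CUR)
        seek_size = size_via_lseek(fd)
        offset_after_lseek = os.lseek(fd, 0, os.SEEK_CUR)
        return stat_size, offset_after_fstat, seek_size, offset_after_lseek


if __name__ == "__main__":
    print(probe_sizes(8192))
```

Both calls report the same size; only the lseek variant leaves the descriptor positioned at end of file.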
Re: [HACKERS] Is there really no interest in SQL Standard?
Heikki,

On 16.09.2011 08:49, Heikki Linnakangas wrote:
> Even if you can't share drafts, would it be possible to give a summary of
> things that are being discussed in the committee? That way if there's people
> in the community with interests in particular topics, they could contact you
> and get involved.

Of course it is. I just not wanted to spam hackers.

> FWIW, I'm very glad you're on the committee! Even though I have no idea
> what's going on there, it gives me a warm feeling that there's someone on the
> committee who knows PostgreSQL. If someone proposes something that would hurt
> PostgreSQL, like syntax that we already use for something else, I know you're
> going to speak up.

Thanks for the bouquet. This comment let me feel better.

Susanne

-- 
Susanne Ebrecht - 2ndQuadrant
PostgreSQL Development, 24x7 Support, Training and Services
www.2ndQuadrant.com
Re: [HACKERS] Is there really no interest in SQL Standard?
Excerpts from Susanne Ebrecht's message of Fri Sep 16 03:24:58 -0300 2011:
> Isn't it possible to create a closed mailing list - a list that won't
> get published - on which
> I can discuss SQL Standard stuff with the folk who wants to support me?

It's certainly possible to create a private mailing list to support this idea. How the membership would be approved, however, is not clear to me. Would we let only well-known names from other pgsql lists into it?

(I, for one, had no idea you were in the SQL committee.)

-- 
Álvaro Herrera
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] Is there really no interest in SQL Standard?
On Fri, Sep 16, 2011 at 1:43 PM, Susanne Ebrecht wrote:
>
>>> Since 4 years I am PostgreSQL representative in SQL Standard committee.
>>
>> With respect, I believe you are on the committee as you were an
>> employee of MySQL.
>
> Nope. As Sun employee - I always was responsible for taking care of
> Postgresql - taking care of MySQL others did.

My point remains - Sun were never in a position to say who represents PostgreSQL.

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake
EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Is there really no interest in SQL Standard?
Dave,

On 16.09.2011 14:33, Dave Page wrote:
> On Fri, Sep 16, 2011 at 7:24 AM, Susanne Ebrecht wrote:
>> Since 4 years I am PostgreSQL representative in SQL Standard committee.
>
> With respect, I believe you are on the committee as you were an
> employee of MySQL.

Nope. As Sun employee - I always was responsible for taking care of Postgresql - taking care of MySQL others did.

> An event committee is not approving talks because the work is important to
> the community - they are approving talks that will be of interest to the
> conference audience.

You exactly hit the point here - where I had another opinion for what a community event also is. But doesn't matter. As I said - I just learned.

Susanne

-- 
Susanne Ebrecht - 2ndQuadrant
PostgreSQL Development, 24x7 Support, Training and Services
www.2ndQuadrant.com
Re: [HACKERS] Is there really no interest in SQL Standard?
On Fri, Sep 16, 2011 at 1:24 AM, Susanne Ebrecht wrote:
> The next ISO meeting will be soon - and of course there are lots of drafts
> that needs
> decisions.

So, generally speaking, what kinds of things are going to be brought up at the ISO meeting? Is this an opportunity to get postgres special syntax drafted into the sql standard?

merlin
Re: [HACKERS] Is there really no interest in SQL Standard?
On Fri, Sep 16, 2011 at 7:24 AM, Susanne Ebrecht wrote:
>
> Since 4 years I am PostgreSQL representative in SQL Standard committee.

With respect, I believe you are on the committee as you were an employee of MySQL. We've had a number of discussions both online and at one of the more recent developer meetings, and even approved funding around having (if I remember correctly) Peter or Simon represent us on the committee.

> Always, when I suggested to talk about my work in the SQL committee on
> community events, a committee rejected it. This just showed me that nobody
> really has interests in my work.
>
> I now learned that such a event committee not always is able to estimate
> interests correct.

An event committee is not approving talks because the work is important to the community - they are approving talks that will be of interest to the conference audience. In the case of PG Conference Europe which I suspect you are alluding to there were a significant number of talks submitted that would be of far more interest and benefit to our primary audience of end users.

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake
EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] unite recovery.conf and postgresql.conf
On Fri, 2011-09-16 at 11:54 +0900, Fujii Masao wrote:
> #1
> Use empty recovery.ready file to enter archive recovery. recovery.conf
> is not read automatically. All recovery parameters are expected to be
> specified in postgresql.conf. If you must specify them in recovery.conf,
> you need to add "include 'recovery.conf'" into postgresql.conf. But note
> that that recovery.conf will not be renamed to recovery.done at the
> end of recovery. This is what the patch I've posted does. This is
> simplest approach, but might confuse people who use the tools which
> depend on recovery.conf.

A small variant to this: When you are actually doing recovery from a backup, having a recovery trigger and a recovery done file is obviously quite helpful and necessary for safety. But when you're setting up a replication slave, it adds extra complexity for the user. The approximate goal ought to be to be able to do

    pg_basebackup -h master -D there
    postgres -D there --standby-mode=on --primary-conninfo=master

without the need to touch any obscure recovery trigger files. So perhaps recovery.{trigger,ready} should not be needed if, say, standby_mode=on. But then what impact should the presence of recovery.done have?
Re: [HACKERS] unite recovery.conf and postgresql.conf
On Fri, 2011-09-16 at 01:32 -0400, Tom Lane wrote:
> As far as the other issues go, I think there is actually a prerequisite
> discussion to be had here, which is whether we are turning the recovery
> parameters into plain old GUCs or not. If they are plain old GUCs, then
> they will presumably still have their values when we are *not* doing
> recovery. That leads to a couple of questions:
> * will seeing these values present in pg_settings confuse anybody?

How so? We add or change the available parameters all the time.

> * can the values be changed when not in recovery, if so what happens,
> and again will that confuse anybody?

Should be similar to archive_command and archive_mode. You can still see and change archive_command when archive_mode is off.

> * is there any security hazard from ordinary users being able to see
> what settings had been used?

Again, not much different from the archive_* settings. They are, after all, almost the same in the opposite direction.
Re: [HACKERS] Double sorting split patch
On 15.09.2011 21:46, Alexander Korotkov wrote:
> On Thu, Sep 15, 2011 at 7:27 PM, Heikki Linnakangas <heikki.linnakan...@enterprisedb.com> wrote:
>> I've looked at the patch, and took a brief look at the paper - but I still
>> don't understand the algorithm. I just can't get my head around the concepts
>> of split pairs and left/right groups. Can you explain that in layman's
>> terms? Perhaps an example would help?
>
> In short algorithm works as following:
> 1) Each box can be projected to the axis as an interval. Box (x1,y1)-(x2,y2)
> are projected to X axis as (x1,x2) interval and to the Y axis as (y1,y2)
> interval. At the first step we search for splits of those intervals and
> select the best one.
> 2) At the second step produced split are converting into terms of boxes and
> ambiguities are solving.
>
> Let's see a little deeper how intervals split search are performed by
> considering an example. We've intervals (0,1), (1,3), (2,3), (2,4). We assume
> intervals of the groups to be (0,a), (b,4). So, "a" can be some upper bound
> of interval: {1,3,4}, and "b" can be some lower bound of interval: {0,1,2}.
> We consider following splits: each "a" with greatest possible "b"
> (0,1), (1,4)
> (0,3), (2,4)
> (0,4), (2,4)
> and each "b" with least possible "a". In this example splits will be:
> (0,1), (0,4)
> (0,1), (1,4)
> (0,3), (2,4)
> By removing the duplicates we've following splits:
> (0,1), (0,4)
> (0,1), (1,4)
> (0,3), (2,4)
> (0,4), (2,4)

Ok, thanks, I understand that now.

> Proposed algorithm finds following splits by single pass on two arrays: one
> sorted by lower bound of interval and another sorted by upper bound of
> interval.

That looks awfully complicated. I don't understand how that works. I wonder if two passes would be simpler?

-- 
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
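To make the interval example above concrete, here is a small Python sketch that enumerates the candidate splits exactly as described: every upper bound "a" paired with the greatest possible "b", then every lower bound "b" paired with the least possible "a", with duplicates removed. It is a deliberately naive, quadratic enumeration, not the single-pass method from the patch, and the function name is made up for this illustration. It also assumes that when no interval is forced into a group, the extreme bound (greatest lower bound, or least upper bound) is chosen, which is what the example in the mail does for a=4 and b=0.

```python
def enumerate_splits(intervals):
    """Return the sorted set of candidate splits ((lo, a), (b, hi)).

    intervals: list of (lower, upper) pairs, e.g. box projections on one axis.
    """
    lo = min(lower for lower, _ in intervals)
    hi = max(upper for _, upper in intervals)
    lowers = sorted({lower for lower, _ in intervals})
    uppers = sorted({upper for _, upper in intervals})
    splits = set()

    # Each "a" with the greatest possible "b": intervals that do not fit
    # into the left group (lo, a) must all fit into the right group (b, hi).
    for a in uppers:
        forced_right = [lower for lower, upper in intervals if upper > a]
        b = min(forced_right) if forced_right else lowers[-1]
        splits.add(((lo, a), (b, hi)))

    # Each "b" with the least possible "a": intervals that do not fit
    # into the right group (b, hi) must all fit into the left group (lo, a).
    for b in lowers:
        forced_left = [upper for lower, upper in intervals if lower < b]
        a = max(forced_left) if forced_left else uppers[0]
        splits.add(((lo, a), (b, hi)))

    return sorted(splits)


if __name__ == "__main__":
    for left, right in enumerate_splits([(0, 1), (1, 3), (2, 3), (2, 4)]):
        print(left, right)
```

For the four intervals from the mail this yields the same four splits listed above: (0,1),(0,4); (0,1),(1,4); (0,3),(2,4); (0,4),(2,4).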
Re: [HACKERS] Is there really no interest in SQL Standard?
On 16.09.2011 09:24, Susanne Ebrecht wrote:
> The next ISO meeting will be soon - and of course there are lots of drafts
> that needs decisions.
>
> I am not allowed to share the drafts in public. Because the drafts are
> confidential.

I think that's a big part of the problem. It's hard to get excited about something if you don't know what's happening.

> But I am allowed to share the drafts with the group of ppl who are
> supporting me. Of course I am allowed to discuss the drafts with my folk
> before I will give my votes and comments.

Even if you can't share drafts, would it be possible to give a summary of things that are being discussed in the committee? That way if there's people in the community with interests in particular topics, they could contact you and get involved.

> Isn't it possible to create a closed mailing list - a list that won't get
> published - on which I can discuss SQL Standard stuff with the folk who
> wants to support me?

I could join such a mailing list if you create one, but it would probably be better if specific topics could be discussed on pgsql-hackers. Perhaps this is something you should bring up in the committee. I understand that the committee doesn't want to open its work to the whole world, but I also don't see why work-in-progress features couldn't be discussed in the public.

FWIW, I'm very glad you're on the committee! Even though I have no idea what's going on there, it gives me a warm feeling that there's someone on the committee who knows PostgreSQL. If someone proposes something that would hurt PostgreSQL, like syntax that we already use for something else, I know you're going to speak up.

-- 
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] unite recovery.conf and postgresql.conf
Fujii Masao writes:
> We have three choices. Which do you like the best?

I'm in favor of defining a separate, content-free trigger file to enable archive recovery. Not sure about the name "recovery.ready", though --- that makes it look like one of the WAL archive transfer trigger files, which does not seem like a great analogy. The pg_standby documentation suggests names like "foo.trigger" for failover triggers, which is a bit better analogy because something external to the database creates the file. What about "recovery.trigger"?

As far as the other issues go, I think there is actually a prerequisite discussion to be had here, which is whether we are turning the recovery parameters into plain old GUCs or not. If they are plain old GUCs, then they will presumably still have their values when we are *not* doing recovery. That leads to a couple of questions:

* will seeing these values present in pg_settings confuse anybody?
* can the values be changed when not in recovery, if so what happens, and again will that confuse anybody?
* is there any security hazard from ordinary users being able to see what settings had been used?

If these settings are to be plain old GUCs, then I think we should just stick them in postgresql.conf and forget about recovery.conf (although of course someone could "include recovery.conf" if he were bent on storing them separately). On the other hand, if we think they are *not* plain old GUCs, maybe they shouldn't be in postgresql.conf. But the source file isn't the first thing to worry about in that case.

regards, tom lane
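As a rough picture of the "plain old GUCs" arrangement being debated, a postgresql.conf under that proposal might look like the fragment below. This is only a sketch of the idea under discussion, not behaviour of any released server at the time of the thread: standby_mode and primary_conninfo are the existing recovery.conf parameter names mentioned in this thread, the connection string value is a placeholder, and whether the server would accept them here is exactly what the thread is deciding.

```
# postgresql.conf (sketch of the proposal only)

# Variant 1: recovery parameters set directly, like any other setting
standby_mode = on
primary_conninfo = 'host=master'     # placeholder value

# Variant 2: keep them in a separate file and pull it in explicitly
# include 'recovery.conf'
```

Either way, the open questions from the mail above remain: the values would still be visible in pg_settings when the server is not in recovery, and the trigger for entering archive recovery (a file such as "recovery.trigger") is a separate matter from where the parameters live.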