Re: [HACKERS] Bugs/slowness inserting and indexing cubes
On Wed, Feb 15, 2012 at 2:54 AM, Tom Lane t...@sss.pgh.pa.us wrote:

Alexander Korotkov aekorot...@gmail.com writes: ISTM I found the problem. This piece of code is triggering an error. It assumes each page has a corresponding initialized buffer. That should be true because we insert index tuples from the top down while splits propagate from the bottom up. But this assumption becomes false once we turn buffering off at the root page, so a root page split can produce pages without initialized buffers.

Hmm ... can we tighten the error check rather than just remove it? It feels less than safe to assume that a hash-entry-not-found condition *must* reflect a corner-case situation like that. At the very least I'd like to see it verify that we'd turned off buffering before deciding this is OK. Better, would it be practical to make dummy entries in the hash table even after turning buffers off, so that the logic here becomes

if (!found) error; else if (entry is dummy) return without doing anything; else proceed;

regards, tom lane

Ok, here is another patch that fixes this problem. Instead of removing the error check, it adds empty buffers after a root page split when needed.

-- With best regards, Alexander Korotkov.

*** a/src/backend/access/gist/gistbuild.c
--- b/src/backend/access/gist/gistbuild.c
***************
*** 668,677 **** gistbufferinginserttuples(GISTBuildState *buildstate, Buffer buffer,
  	if (is_split && BufferGetBlockNumber(buffer) == GIST_ROOT_BLKNO)
  	{
  		GISTBufferingInsertStack *oldroot = gfbb->rootitem;
! 		Page		page = BufferGetPage(buffer);
! 		ItemId		iid;
! 		IndexTuple	idxtuple;
! 		BlockNumber leftmostchild;
  
  		gfbb->rootitem = (GISTBufferingInsertStack *) MemoryContextAlloc(
  			gfbb->context, sizeof(GISTBufferingInsertStack));
--- 668,678 ----
  	if (is_split && BufferGetBlockNumber(buffer) == GIST_ROOT_BLKNO)
  	{
  		GISTBufferingInsertStack *oldroot = gfbb->rootitem;
! 		Page		page = BufferGetPage(buffer);
! 		ItemId		iid;
! 		IndexTuple	idxtuple;
! 		BlockNumber leftmostchild;
! 		OffsetNumber maxoff,
! 					i;
  
  		gfbb->rootitem = (GISTBufferingInsertStack *) MemoryContextAlloc(
  			gfbb->context, sizeof(GISTBufferingInsertStack));
***************
*** 694,699 **** gistbufferinginserttuples(GISTBuildState *buildstate, Buffer buffer,
--- 695,719 ----
  		oldroot->parent = gfbb->rootitem;
  		oldroot->blkno = leftmostchild;
  		oldroot->downlinkoffnum = InvalidOffsetNumber;
+ 
+ 		/*
+ 		 * If the root page split produced new pages on a level which has
+ 		 * buffers, initialize empty buffers there.
+ 		 */
+ 		if (LEVEL_HAS_BUFFERS(oldroot->level, gfbb))
+ 		{
+ 			maxoff = PageGetMaxOffsetNumber(page);
+ 			for (i = FirstOffsetNumber; i <= maxoff; i = OffsetNumberNext(i))
+ 			{
+ 				iid = PageGetItemId(page, i);
+ 				idxtuple = (IndexTuple) PageGetItem(page, iid);
+ 				gistGetNodeBuffer(gfbb,
+ 								  buildstate->giststate,
+ 								  ItemPointerGetBlockNumber(&(idxtuple->t_tid)),
+ 								  i,
+ 								  gfbb->rootitem);
+ 			}
+ 		}
  	}
  
  	if (splitinfo)

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_test_fsync performance
On Wed, Feb 15, 2012 at 02:23, Bruce Momjian br...@momjian.us wrote: On Wed, Feb 15, 2012 at 01:35:05AM +0200, Marko Kreen wrote: On Tue, Feb 14, 2012 at 05:59:06PM -0500, Tom Lane wrote: Bruce Momjian br...@momjian.us writes: On Mon, Feb 13, 2012 at 08:28:03PM -0500, Tom Lane wrote: +1, I was about to suggest the same thing. Running any of these tests for a fixed number of iterations will result in drastic degradation of accuracy as soon as the machine's behavior changes noticeably from what you were expecting. Run them for a fixed time period instead. Or maybe do a few, then check elapsed time and estimate a number of iterations to use, if you're worried about the cost of doing gettimeofday after each write. Good idea, and it worked out very well. I changed the -o loops parameter to -s seconds which calls alarm() after (default) 2 seconds, and then once the operation completes, computes a duration per operation. I was kind of wondering how portable alarm() is, and the answer according to the buildfarm is that it isn't. I'm using following simplistic alarm() implementation for win32: https://github.com/markokr/libusual/blob/master/usual/signal.c#L21 this works with fake sigaction()/SIGALARM hack below - to remember function to call. Good enough for simple stats printing, and avoids win32-specific code spreading around. Wow, I wasn't even aware this compiled in Win32; I thought it was ifdef'ed out. Anyway, I am looking at SetTimer as a way of making this work. (Me wonders if the GoGrid Windows images have compilers.) They don't, since most of the compilers people would ask for don't allow that kind of redistribution. Ping me on im if you need one preconfigured, though... -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Different gettext domain needed for error context
I just noticed that we use the same gettext domain for all messages attached to one error. That is wrong for context information, where you have a stack of context lines originating from different modules. The result is that context lines don't always get translated. For example:

postgres=# set lc_messages = 'de_DE.utf8';
SET
postgres=# do $$ begin select 1 / 0; end $$;
FEHLER:  Division durch Null
CONTEXT:  SQL-Anweisung »select 1 / 0«
	PL/pgSQL function inline_code_block line 3 at SQL-Anweisung

Notice how the string "PL/pgSQL function ..." is not translated. The ereport call that raises that error is in int4div, which is in the backend gettext domain, "postgres". But the errcontext() call comes from pl_exec.c. If the error originates from src/pl/plpgsql, then it works:

postgres=# do $$ begin raise; end $$;
FEHLER:  RAISE ohne Parameter kann nicht außerhalb einer Ausnahmebehandlung verwendet werden
CONTEXT:  PL/pgSQL-Funktion »inline_code_block« Zeile 3 bei RAISE

In theory, I guess the other fields like errhint() potentially have the same problem, but they're not currently used to provide extra information for messages originating from another module, so I'm inclined to leave them alone for now.

To fix this, we need to somehow pass the caller's text domain to errcontext(). The most straightforward way is to pass it as an extra argument. Ideally, errcontext() would be a macro that passes TEXTDOMAIN to the underlying function, so that you don't need to change all the callers of errcontext():

#define errcontext(...) errcontext_domain(TEXTDOMAIN, ...)

But that doesn't work, because it would require varargs macros. Anyone know a trick to make that work?

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [trivial patch] typo in doc/src/sgml/sepgsql.sgml
diff --git a/doc/src/sgml/sepgsql.sgml b/doc/src/sgml/sepgsql.sgml
index e45c258..ee0a255 100644
*** a/doc/src/sgml/sepgsql.sgml
--- b/doc/src/sgml/sepgsql.sgml
***************
*** 358,364 **** UPDATE t1 SET x = 2, y = md5sum(y) WHERE
  </synopsis>
    In this case we must have <literal>db_table:select</> in addition to
! <literal>db_table:update</>, because <literal>t1.a</> is
  referenced within the <literal>WHERE</> clause.  Column-level
  permissions will also be checked for each referenced column.
  </para>
--- 358,364 ----
  </synopsis>
    In this case we must have <literal>db_table:select</> in addition to
! <literal>db_table:update</>, because <literal>t1.z</> is
  referenced within the <literal>WHERE</> clause.  Column-level
  permissions will also be checked for each referenced column.
  </para>

(It is unclear to me why the same example is cited twice here, but the text around them is consistent with that.)

Christoph
-- c...@df7cb.de | http://www.df7cb.de/

signature.asc Description: Digital signature
Re: [v9.2] LEAKPROOF attribute of FUNCTION (Re: [HACKERS] [v9.2] Fix Leaky View Problem)
The attached patch adds regression tests of ALTER FUNCTION with LEAKPROOF based on your patch. It also moves create_function_3 into the group with create_aggregate and so on.

Thanks,

2012/2/14 Kohei KaiGai kai...@kaigai.gr.jp:

2012/2/14 Robert Haas robertmh...@gmail.com:

On Tue, Feb 14, 2012 at 4:55 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote: I could not find out where the grammar conflicts originate, although it does not conflict with any options within ALTER FUNCTION. Do you think the idea of ALTER ... NOT LEAKPROOF should be integrated within the v9.2 timeline also?

Yes. Did you notice that I attached a patch to make that work? I'll commit that today or tomorrow unless someone comes up with a better solution.

Yes. I'll be available to work on the feature based on this patch. It was a headache of mine to implement the ALTER statement to add/remove the leakproof attribute.

I also think we ought to stick create_function_3 into one of the parallel groups in the regression tests, if possible. Can you investigate that?

Not yet. This test has no dependencies on other tests, so I'm optimistic that create_function_3 can run concurrently.

Me, too.

I tried to move create_function_3 into the group of create_view and create_index, and it works correctly.

Thanks,
-- KaiGai Kohei kai...@kaigai.gr.jp

-- KaiGai Kohei kai...@kaigai.gr.jp

pgsql-v9.2-alter-function-leakproof-regtest.patch Description: Binary data

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] controlling the location of server-side SSL files
On Wed, 2012-02-08 at 09:16 +0100, Magnus Hagander wrote:

My best idea at the moment is that we should set these parameters to empty by default, and make users point them to existing files if they want to use that functionality. Comments?

+1. Anybody who actually cares about setting up security is likely not going to rely on defaults anyway - and is certainly going to review whatever they are. So there should be no big problem there.

Updated patch to reflect this.

*** i/doc/src/sgml/config.sgml
--- w/doc/src/sgml/config.sgml
***************
*** 668,673 **** SET ENABLE_SEQSCAN TO OFF;
--- 668,737 ----
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-ssl-ca-file" xreflabel="ssl_ca_file">
+      <term><varname>ssl_ca_file</varname> (<type>string</type>)</term>
+      <indexterm>
+       <primary><varname>ssl_ca_file</> configuration parameter</primary>
+      </indexterm>
+      <listitem>
+       <para>
+        Specifies the name of the file containing the SSL server certificate
+        authority (CA).  The default is empty, meaning no CA file is loaded,
+        and client certificate verification is not performed.  (In previous
+        releases of PostgreSQL, the name of this file was hard-coded
+        as <filename>root.crt</filename>.)  Relative paths are relative to the
+        data directory.  This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+ 
+     <varlistentry id="guc-ssl-cert-file" xreflabel="ssl_cert_file">
+      <term><varname>ssl_cert_file</varname> (<type>string</type>)</term>
+      <indexterm>
+       <primary><varname>ssl_cert_file</> configuration parameter</primary>
+      </indexterm>
+      <listitem>
+       <para>
+        Specifies the name of the file containing the SSL server certificate.
+        The default is <filename>server.crt</filename>.  Relative paths are
+        relative to the data directory.  This parameter can only be set at
+        server start.
+       </para>
+      </listitem>
+     </varlistentry>
+ 
+     <varlistentry id="guc-ssl-crl-file" xreflabel="ssl_crl_file">
+      <term><varname>ssl_crl_file</varname> (<type>string</type>)</term>
+      <indexterm>
+       <primary><varname>ssl_crl_file</> configuration parameter</primary>
+      </indexterm>
+      <listitem>
+       <para>
+        Specifies the name of the file containing the SSL server certificate
+        revocation list (CRL).  The default is empty, meaning no CRL file is
+        loaded.  (In previous releases of PostgreSQL, the name of this file was
+        hard-coded as <filename>root.crl</filename>.)  Relative paths are
+        relative to the data directory.  This parameter can only be set at
+        server start.
+       </para>
+      </listitem>
+     </varlistentry>
+ 
+     <varlistentry id="guc-ssl-key-file" xreflabel="ssl_key_file">
+      <term><varname>ssl_key_file</varname> (<type>string</type>)</term>
+      <indexterm>
+       <primary><varname>ssl_key_file</> configuration parameter</primary>
+      </indexterm>
+      <listitem>
+       <para>
+        Specifies the name of the file containing the SSL server private key.
+        The default is <filename>server.key</filename>.  Relative paths are
+        relative to the data directory.  This parameter can only be set at
+        server start.
+       </para>
+      </listitem>
+     </varlistentry>
+ 
      <varlistentry id="guc-ssl-renegotiation-limit" xreflabel="ssl_renegotiation_limit">
       <term><varname>ssl_renegotiation_limit</varname> (<type>integer</type>)</term>
       <indexterm>

*** i/doc/src/sgml/runtime.sgml
--- w/doc/src/sgml/runtime.sgml
***************
*** 1831,1840 **** pg_dumpall -p 5432 | psql -d postgres -p 5433
     SSL certificates and make sure that clients check the server's
     certificate.  To do that, the server must be configured to accept
     only <literal>hostssl</> connections (<xref
!    linkend="auth-pg-hba-conf">) and have SSL
!    <filename>server.key</filename> (key) and
!    <filename>server.crt</filename> (certificate) files (<xref
!    linkend="ssl-tcp">).  The TCP client must connect using
     <literal>sslmode=verify-ca</> or <literal>verify-full</> and have
     the appropriate root certificate file installed (<xref
     linkend="libpq-connect">).
--- 1831,1838 ----
     SSL certificates and make sure that clients check the server's
     certificate.  To do that, the server must be configured to accept
     only <literal>hostssl</> connections (<xref
!    linkend="auth-pg-hba-conf">) and have SSL key and certificate files
!    (<xref linkend="ssl-tcp">).  The TCP client must connect using
     <literal>sslmode=verify-ca</> or <literal>verify-full</> and have
     the appropriate root certificate file installed (<xref
     linkend="libpq-connect">).
***************
*** 2053,2062 **** pg_dumpall -p 5432 | psql -d postgres -p 5433
     </note>
 
     <para>
!    To start in <acronym>SSL</> mode, the files <filename>server.crt</>
!    and <filename>server.key</> must exist in the server's data directory.
!    These files should contain the server certificate and private key,
Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server
(2012/02/14 23:50), Tom Lane wrote:

Shigeru Hanada shigeru.han...@gmail.com writes: (2012/02/14 17:40), Etsuro Fujita wrote: As discussed in that thread, it would have to change the PlanForeignScan API to let the FDW generate multiple paths and dump them all to add_path instead of returning a FdwPlan struct.

Multiple paths for a scan of a foreign table would be valuable, but changing PlanForeignScan to return a list of FdwPlans in 9.2 seems too hasty.

I would really like to see that happen in 9.2, because the longer we let that mistake live, the harder it will be to change. More and more FDWs are getting written. I don't think it's that hard to do: we just have to agree that PlanForeignScan should return void and call add_path for itself, possibly more than once.

Agreed. I fixed the PlanForeignScan API. Please find attached a patch.

If we do that, I'm inclined to think we could get rid of the separate Node type FdwPlan, and just incorporate List *fdw_private into ForeignPath and ForeignScan.

+1 While the patch retains the struct FdwPlan, I would like to get rid of it in the next version of the patch.

Best regards, Etsuro Fujita

*** a/contrib/file_fdw/file_fdw.c
--- b/contrib/file_fdw/file_fdw.c
***************
*** 25,30 ****
--- 25,31 ----
  #include "miscadmin.h"
  #include "nodes/makefuncs.h"
  #include "optimizer/cost.h"
+ #include "optimizer/pathnode.h"
  #include "utils/rel.h"
  #include "utils/syscache.h"
***************
*** 93,99 **** PG_FUNCTION_INFO_V1(file_fdw_validator);
  /*
   * FDW callback routines
   */
! static FdwPlan *filePlanForeignScan(Oid foreigntableid,
  					PlannerInfo *root,
  					RelOptInfo *baserel);
  static void fileExplainForeignScan(ForeignScanState *node,
  					ExplainState *es);
--- 94,100 ----
  /*
   * FDW callback routines
   */
! static void filePlanForeignScan(Oid foreigntableid,
  					PlannerInfo *root,
  					RelOptInfo *baserel);
  static void fileExplainForeignScan(ForeignScanState *node,
  					ExplainState *es);
***************
*** 406,432 **** get_file_fdw_attribute_options(Oid relid)
  /*
   * filePlanForeignScan
!  *		Create a FdwPlan for a scan on the foreign table
   */
! static FdwPlan *
  filePlanForeignScan(Oid foreigntableid,
  					PlannerInfo *root,
  					RelOptInfo *baserel)
  {
  	FdwPlan	   *fdwplan;
  	char	   *filename;
  	List	   *options;
  
  	/* Fetch options --- we only need filename at this point */
  	fileGetOptions(foreigntableid, &filename, &options);
  
  	/* Construct FdwPlan with cost estimates */
  	fdwplan = makeNode(FdwPlan);
  	estimate_costs(root, baserel, filename,
  				   &fdwplan->startup_cost, &fdwplan->total_cost);
- 	fdwplan->fdw_private = NIL;	/* not used */
  
! 	return fdwplan;
  }
--- 407,447 ----
  /*
   * filePlanForeignScan
!  *		Create the (single) path for a scan on the foreign table
   */
! static void
  filePlanForeignScan(Oid foreigntableid,
  					PlannerInfo *root,
  					RelOptInfo *baserel)
  {
+ 	ForeignPath *pathnode = makeNode(ForeignPath);
  	FdwPlan	   *fdwplan;
  	char	   *filename;
  	List	   *options;
  
+ 	pathnode->path.pathtype = T_ForeignScan;
+ 	pathnode->path.parent = baserel;
+ 	pathnode->path.pathkeys = NIL;	/* result is always unordered */
+ 	pathnode->path.required_outer = NULL;
+ 	pathnode->path.param_clauses = NIL;
+ 
  	/* Fetch options --- we only need filename at this point */
  	fileGetOptions(foreigntableid, &filename, &options);
  
  	/* Construct FdwPlan with cost estimates */
  	fdwplan = makeNode(FdwPlan);
+ 	fdwplan->fdw_private = NIL;	/* not used */
  	estimate_costs(root, baserel, filename,
  				   &fdwplan->startup_cost, &fdwplan->total_cost);
! 	pathnode->fdwplan = fdwplan;
! 
! 	/* Use costs estimated by FDW */
! 	pathnode->path.rows = baserel->rows;
! 	pathnode->path.startup_cost = fdwplan->startup_cost;
! 	pathnode->path.total_cost = fdwplan->total_cost;
! 
! 	add_path(baserel, (Path *) pathnode);
  }

*** a/doc/src/sgml/fdwhandler.sgml
--- b/doc/src/sgml/fdwhandler.sgml
***************
*** 88,108 ****
   <para>
<programlisting>
! FdwPlan *
  PlanForeignScan (Oid foreigntableid,
                   PlannerInfo *root,
                   RelOptInfo *baserel);
</programlisting>
! 
!     Plan a scan on a foreign table.  This is called when a query is planned.
      <literal>foreigntableid</> is the <structname>pg_class</> OID of the
      foreign table.  <literal>root</> is the planner's global information
      about the query, and <literal>baserel</> is
Re: [HACKERS] Bugs/slowness inserting and indexing cubes
On 15.02.2012 10:18, Alexander Korotkov wrote:

On Wed, Feb 15, 2012 at 2:54 AM, Tom Lane t...@sss.pgh.pa.us wrote:

Alexander Korotkov aekorot...@gmail.com writes: ISTM I found the problem. This piece of code is triggering an error. It assumes each page has a corresponding initialized buffer. That should be true because we insert index tuples from the top down while splits propagate from the bottom up. But this assumption becomes false once we turn buffering off at the root page, so a root page split can produce pages without initialized buffers.

Hmm ... can we tighten the error check rather than just remove it? It feels less than safe to assume that a hash-entry-not-found condition *must* reflect a corner-case situation like that. At the very least I'd like to see it verify that we'd turned off buffering before deciding this is OK. Better, would it be practical to make dummy entries in the hash table even after turning buffers off, so that the logic here becomes

if (!found) error; else if (entry is dummy) return without doing anything; else proceed;

regards, tom lane

Ok, here is another patch that fixes this problem. Instead of removing the error check, it adds empty buffers after a root page split when needed.

Actually, I think it made sense to simply do nothing if the buffer doesn't exist. The algorithm doesn't require that all the buffers exist at all times. It is quite accidental that whenever we call gistRelocateBuildBuffersOnSplit(), the page must already have its buffer created (and as we found out, the assumption doesn't hold after a root split, anyway). Also, we talked earlier that it would be good to destroy buffers that become completely empty, to save memory. If we do that, we'd have to remove that check anyway.

So, I think we should go with your original fix and simply do nothing in gistRelocateBuildBuffersOnSplit() if the page doesn't have a buffer.
Moreover, if the page has a buffer but it's empty, gistRelocateBuildBuffersOnSplit() doesn't need to create buffers for the new sibling pages. In the final emptying phase, that's a waste of time, the buffers we create will never be used, and even before that I think it's better to create the buffers lazily. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Progress on fast path sorting, btree index creation time
On 15 February 2012 06:16, Robert Haas robertmh...@gmail.com wrote: On Fri, Feb 10, 2012 at 10:30 AM, Peter Geoghegan pe...@2ndquadrant.com wrote: [ new patch ] I spent quite a bit of time looking at this today - the patch specifically, and the issue of making quicksort fast more generally. It seemed to me that if we're going to have separate copies of the quicksort code for tuple sorting, we might as well go whole hog and specialize those copies to the particular needs of tuplesort.c as much as possible. Accordingly, I whacked the code around so that it knows that it is sorting SortTuple objects and need not conditionalize at runtime on the size of the objects being swapped. You suggested upthread that this might be worthwhile, and it seems that it is, so I think we should do it. Cool. I agree that we should do this. It doesn't need to be justified as a performance optimisation - it makes sense to refactor in this way. If that makes things faster, then so much the better. Your patch removes the CHECK_FOR_INTERRUPTS() call from comparetup_heap, which is no good. However, since I'd already decided to specialize the copies of quicksort intended for sorting as much as possible, it made sense to me to refactor things so that the qsort routine itself, rather than the comparator, is responsible for calling CHECK_FOR_INTERRUPTS(). This slightly reduces the number of times we CHECK_FOR_INTERRUPTS(), but never allows more than a few comparisons before doing it. Well, the idea of that was to have one and only CHECK_FOR_INTERRUPTS() call in the leading comparator. I should perhaps have further stressed that that patch was only intended to establish the idea of having that function pointer call within an inlined leading-key's comparator calling outer comparator. I find that your pg_always_inline macro is equivalent to just plain inline on my system (MacOS X v10.6.8, gcc 4.2.1). 
It seems to need something like this:

+#elif __GNUC__
+#define pg_always_inline inline __attribute__((always_inline))

...but I'm not very happy about relying on that, because I don't know that it will work on every gcc version (never mind non-gcc compilers), and I'm not convinced it's actually improving performance even on this one. The documentation seems to indicate that this is intended to force inlining even when not optimizing, which may have something to do with the lack of effect:

that's not really the point here anyway. There is a scant reference that suggests this, yes, but that's contradicted by other scanty sources. The macro was observed to be helpful in the multi-key case, and the effect was quite apparent and reproducible. At any rate its appearance in my most recent patch was vestigial - I think that there ought to be no need for it, now that the compiler's cost/benefit analysis probably has things right by inlining.

What I did instead is to replace template_qsort_arg.h with a script called gen_qsort_tuple.pl, which generates a file called qsort_tuple.c that tuplesort.c then #includes. This seems more flexible to me than the macro-based approach. In particular, it allows me to generate versions of qsort with different call signatures. The attached patch generates two:

static void qsort_tuple(SortTuple *a, size_t n,
						SortTupleComparator cmp_tuple,
						Tuplesortstate *state);

static void qsort_ssup(SortTuple *a, size_t n, SortSupport ssup);

The first of these is a drop-in replacement for qsort_arg() - any tuplesort can use it, not just heap sorts. But it is faster than qsort_arg() because of the specializations for the SortTuple data type. The second handles the special case where we are sorting by a single key that has an associated SortSupport object. In this case we don't need to carry the overhead of passing around the Tuplesortstate and dereferencing it, nor do we need the SortTupleComparator: we can just pass the SortSupport itself.
Maybe there's a way to get this effect using macros, but I couldn't figure it out.

I've seen the pattern where generic programming is aped with the preprocessor, in the style of my patch, in, of all things, wxWidgets, where it continues to be used as part of pgAdmin - or was when I worked on wxWidgets 3 support exactly one year ago. One thing that is particularly bizarre about that code is that clients actually #include a cpp file. This is, I'd guess, a legacy of having to target pre-C++98 compilers without decent template support; MSVC 6, in particular, was completely inadequate in this area. I am inclined to agree that, given that we already use Perl to generate source code like this, it seems natural to prefer that approach, if only to avoid paranoia about the inclusion of a dial-a-bloat knob. I am at a disadvantage here, since I've never written a line of Perl.

With this patch, I get the following results, as compared with your 2012-02-10 version and master, using the same test cases I tested before. select *
Re: [HACKERS] Bugs/slowness inserting and indexing cubes
On Wed, Feb 15, 2012 at 4:26 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote:

Actually, I think it made sense to simply do nothing if the buffer doesn't exist. The algorithm doesn't require that all the buffers must exist at all times. It is quite accidental that whenever we call gistRelocateBuildBuffersOnSplit(), the page must already have its buffer created (and as we found out, the assumption doesn't hold after a root split, anyway). Also, we talked earlier that it would be good to destroy buffers that become completely empty, to save memory. If we do that, we'd have to remove that check anyway. So, I think we should go with your original fix and simply do nothing in gistRelocateBuildBuffersOnSplit() if the page doesn't have a buffer.

Moreover, if the page has a buffer but it's empty, gistRelocateBuildBuffersOnSplit() doesn't need to create buffers for the new sibling pages. In the final emptying phase, that's a waste of time, the buffers we create will never be used, and even before that I think it's better to create the buffers lazily.

I agree.

-- With best regards, Alexander Korotkov.
Re: [HACKERS] [trivial patch] typo in doc/src/sgml/sepgsql.sgml
On Wed, Feb 15, 2012 at 5:38 AM, Christoph Berg c...@df7cb.de wrote:

diff --git a/doc/src/sgml/sepgsql.sgml b/doc/src/sgml/sepgsql.sgml
index e45c258..ee0a255 100644
*** a/doc/src/sgml/sepgsql.sgml
--- b/doc/src/sgml/sepgsql.sgml
***************
*** 358,364 **** UPDATE t1 SET x = 2, y = md5sum(y) WHERE
  </synopsis>
    In this case we must have <literal>db_table:select</> in addition to
! <literal>db_table:update</>, because <literal>t1.a</> is
  referenced within the <literal>WHERE</> clause.  Column-level
  permissions will also be checked for each referenced column.
  </para>
--- 358,364 ----
  </synopsis>
    In this case we must have <literal>db_table:select</> in addition to
! <literal>db_table:update</>, because <literal>t1.z</> is
  referenced within the <literal>WHERE</> clause.  Column-level
  permissions will also be checked for each referenced column.
  </para>

Fixed, but note that I had to recreate the patch by manual examination. Including it inline tends to garble things.

(It is unclear to me why the same example is cited twice here, but the text around them is consistent with that.)

Fixed this too, and did some related rewording and proofreading.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server
Harada-san,

I checked the v9 patch, but it still has some uncertain implementation details.

[memory context of tuple store]
It calls tuplestore_begin_heap() under the memory context of festate->scan_cxt in pgsqlBeginForeignScan. On the other hand, tuplestore_gettupleslot() is called under the memory context of festate->tuples. I could not find a callback function being invoked on errors, so I suspect the memory objects acquired within tuplestore_begin_heap() will be leaked, even though it was my suggestion to create a sub-context under the existing one.

In my opinion, it is a good choice to use es_query_cxt of the supplied EState. What prevents applying this per-query memory context? You mention PGresult being malloc()'ed. However, it seems to me fetch_result() and store_result() copy the contents of the malloc()'ed area to the palloc()'ed area, and the PGresult is released on an error using a PG_TRY() ... PG_CATCH() block.

[Minor comments]
Please initialize the sql variable to NULL at begin_remote_tx(). The compiler raises a warning due to a reference to an uninitialized variable, even though the code path never runs.

It potentially causes a problem when fetch_result() raises an error because of an unexpected status (!= PGRES_TUPLES_OK). One code path is not protected with PG_TRY(), and the other code path will call PQclear on an already-released PGresult.

Although it is just a preference of mine, is the exprFunction necessary? It seems to me the point of the push-down check is whether the supplied node is a built-in object or not. So, a sufficient check is is_builtin() on FuncExpr->funcid, OpExpr->opno, ScalarArrayOpExpr->opno and so on. It does not depend on whether the functions implementing these nodes are built-in or not.

Thanks,

On Feb 14, 2012 at 9:09, Shigeru Hanada shigeru.han...@gmail.com wrote:

(2012/02/14 15:15), Shigeru Hanada wrote: Good catch, thanks. I'll revise the pgsql_fdw tests a little more.

Here are the updated patches.
In addition to Fujita-san's comment, I moved the DROP OPERATOR statements to the clean-up section of the test script.

Regards,
-- Shigeru Hanada

-- KaiGai Kohei kai...@kaigai.gr.jp

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_test_fsync performance
On Wed, Feb 15, 2012 at 09:54:04AM +0100, Magnus Hagander wrote: On Wed, Feb 15, 2012 at 02:23, Bruce Momjian br...@momjian.us wrote: On Wed, Feb 15, 2012 at 01:35:05AM +0200, Marko Kreen wrote: On Tue, Feb 14, 2012 at 05:59:06PM -0500, Tom Lane wrote: Bruce Momjian br...@momjian.us writes: On Mon, Feb 13, 2012 at 08:28:03PM -0500, Tom Lane wrote: +1, I was about to suggest the same thing. Running any of these tests for a fixed number of iterations will result in drastic degradation of accuracy as soon as the machine's behavior changes noticeably from what you were expecting. Run them for a fixed time period instead. Or maybe do a few, then check elapsed time and estimate a number of iterations to use, if you're worried about the cost of doing gettimeofday after each write. Good idea, and it worked out very well. I changed the -o loops parameter to -s seconds which calls alarm() after (default) 2 seconds, and then once the operation completes, computes a duration per operation. I was kind of wondering how portable alarm() is, and the answer according to the buildfarm is that it isn't. I'm using following simplistic alarm() implementation for win32: https://github.com/markokr/libusual/blob/master/usual/signal.c#L21 this works with fake sigaction()/SIGALARM hack below - to remember function to call. Good enough for simple stats printing, and avoids win32-specific code spreading around. Wow, I wasn't even aware this compiled in Win32; I thought it was ifdef'ed out. Anyway, I am looking at SetTimer as a way of making this work. (Me wonders if the GoGrid Windows images have compilers.) They don't, since most of the compilers people would ask for don't allow that kind of redistribution. Shame. Ping me on im if you need one preconfigured, though... How do you do that? Also, once you create a Windows VM on a public cloud, how do you connect to it? SSH? 
-- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] [trivial patch] typo in doc/src/sgml/sepgsql.sgml
Re: Robert Haas 2012-02-15 ca+tgmozjutszhpxwwzcm4kjjwh__1admo3kosbbq+fhscqp...@mail.gmail.com Fixed, but note that I had to recreate the patch by manual examination. Including it inline tends to garble things. Hmm, I thought I had :set paste and everything... (It is unclear to me why the same example is cited twice here, but the text around them is consistent with that.) Fixed this too, and did some related rewording and proofreading. Makes much more sense now. Thanks! Christoph -- c...@df7cb.de | http://www.df7cb.de/
Re: [HACKERS] pg_test_fsync performance
On Tue, Feb 14, 2012 at 08:23:10PM -0500, Bruce Momjian wrote: On Wed, Feb 15, 2012 at 01:35:05AM +0200, Marko Kreen wrote: On Tue, Feb 14, 2012 at 05:59:06PM -0500, Tom Lane wrote: Bruce Momjian br...@momjian.us writes: On Mon, Feb 13, 2012 at 08:28:03PM -0500, Tom Lane wrote: +1, I was about to suggest the same thing. Running any of these tests for a fixed number of iterations will result in drastic degradation of accuracy as soon as the machine's behavior changes noticeably from what you were expecting. Run them for a fixed time period instead. Or maybe do a few, then check elapsed time and estimate a number of iterations to use, if you're worried about the cost of doing gettimeofday after each write. Good idea, and it worked out very well. I changed the -o loops parameter to -s seconds which calls alarm() after (default) 2 seconds, and then once the operation completes, computes a duration per operation. I was kind of wondering how portable alarm() is, and the answer according to the buildfarm is that it isn't. I'm using following simplistic alarm() implementation for win32: https://github.com/markokr/libusual/blob/master/usual/signal.c#L21 this works with fake sigaction()/SIGALARM hack below - to remember function to call. Good enough for simple stats printing, and avoids win32-specific code spreading around. Wow, I wasn't even aware this compiled in Win32; I thought it was ifdef'ed out. Anyway, I am looking at SetTimer as a way of making this work. (Me wonders if the GoGrid Windows images have compilers.) I see backend/port/win32/timer.c so I might go with a simple create a thread, sleep(2), set flag, exit solution. Yeah, two Windows buildfarm machines have now successfully compiled my patches, so I guess I fixed it; patch attached. The fix was surprisingly easy given the use of threads; scheduling the timeout in the operating system was just too invasive. I would like to eventually know if this fix actually produces the right output. How would I test that? 
Are the buildfarm output binaries available somewhere? Should I add this as a 9.2 TODO item? -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + diff --git a/contrib/pg_test_fsync/pg_test_fsync.c b/contrib/pg_test_fsync/pg_test_fsync.c new file mode 100644 index 02a9e21..7f92bc8 *** a/contrib/pg_test_fsync/pg_test_fsync.c --- b/contrib/pg_test_fsync/pg_test_fsync.c *** *** 28,39 --- 28,54 #define OPS_FORMAT %9.3f ops/sec /* These are macros to avoid timing the function call overhead. */ + #ifndef WIN32 #define START_TIMER \ do { \ alarm_triggered = false; \ alarm(secs_per_test); \ gettimeofday(start_t, NULL); \ } while (0) + #else + /* WIN32 doesn't support alarm, so we create a thread and sleep there */ + #define START_TIMER \ + do { \ + alarm_triggered = false; \ + if (CreateThread(NULL, 0, process_alarm, NULL, 0, NULL) == \ + INVALID_HANDLE_VALUE) \ + { \ + fprintf(stderr, Cannot create thread for alarm\n); \ + exit(1); \ + } \ + gettimeofday(start_t, NULL); \ + } while (0) + #endif #define STOP_TIMER \ do { \ *** static void test_sync(int writes_per_op) *** 62,68 --- 77,87 static void test_open_syncs(void); static void test_open_sync(const char *msg, int writes_size); static void test_file_descriptor_sync(void); + #ifndef WIN32 static void process_alarm(int sig); + #else + static DWORD WINAPI process_alarm(LPVOID param); + #endif static void signal_cleanup(int sig); #ifdef HAVE_FSYNC_WRITETHROUGH *** main(int argc, char *argv[]) *** 82,88 --- 101,109 /* Prevent leaving behind the test file */ signal(SIGINT, signal_cleanup); signal(SIGTERM, signal_cleanup); + #ifndef WIN32 signal(SIGALRM, process_alarm); + #endif #ifdef SIGHUP /* Not defined on win32 */ signal(SIGHUP, signal_cleanup); *** print_elapse(struct timeval start_t, str *** 550,560 --- 571,592 printf(OPS_FORMAT \n, per_second); } + #ifndef WIN32 static void process_alarm(int sig) { alarm_triggered = true; } + #else + 
static DWORD WINAPI + process_alarm(LPVOID param) + { + /* WIN32 doesn't support alarm, so we create a thread and sleep here */ + Sleep(secs_per_test * 1000); + alarm_triggered = true; + ExitThread(0); + } + #endif static void die(const char *str)
Re: [HACKERS] Progress on fast path sorting, btree index creation time
On Wed, Feb 15, 2012 at 8:29 AM, Peter Geoghegan pe...@2ndquadrant.com wrote: Cool. I agree that we should do this. It doesn't need to be justified as a performance optimisation - it makes sense to refactor in this way. If that makes things faster, then so much the better. Well, maybe so, but I think if the performance had been the same I might not have messed with it. I am inclined to agree that given that we already use Perl to generate source code like this, it seems natural that we should prefer to do that, if only to avoid paranoia about the inclusion of a dial-a-bloat knob. I am at a disadvantage here, since I've never written a line of Perl. I think it's still dial-a-bloat, but I feel pretty comfortable about how we've got that knob adjusted in this version. It's almost as much improvement as any previous version, it applies to more cases, and the code footprint is the least of any version I've measured. Hmm. I'll run some tests myself when I get a chance, and attempt to determine what's going on here. I don't have immediate access to a good benchmarking server, which would be preferable, and I'd like to post a revision of pg_stat_statements normalisation today, as it's been too long. Nice work though. Thanks. It strikes me that if we wanted to take this further, we could look at squeezing out ApplySortComparator. For example, suppose that, upon discovering that we can do an in-memory quicksort on a single sort key, we make an initial pass over the data where we check whether it's sorted and, as we go, swap all the entries with isnull1 = true to the end of the memtuples array. We then sort the isnull1 = true entries with the standard comparator, and the isnull1 = false entries with an optimized comparator that elides most of ApplySortComparator and instead just calls the comparison function directly. 
We then decide on whether to emit the isnull1 = true entries first or last based on NULLS FIRST/LAST, and decide whether to emit the remaining entries in forward or reverse order based on ASC/DESC. Or maybe not exactly that thing, but something like that, so that we sort the null and non-null entries separately. The additional complexity in the read-out logic would probably be more than offset by being able to use a simpler comparator. While it's really easy to be wrong about these things, I'd venture to guess that branch prediction makes this a less than compelling win in practice. That said, if you want to know how good a win it might be, you need only knock out a quick-and-dirty prototype and see how fast a sort is - however well this does will be pretty close to your best case, but not quite, since presumably such a prototype wouldn't worry about the applicability of the optimisation. You might be right. But note that the swapcode improvements were also vulnerable to branch prediction, too, though there could also be other effects there. One effect that should perhaps be considered is the cost of saving and restoring registers. Even if the branch itself doesn't cost much because we know what the result will be, the operands still have to be loaded into registers, which isn't free of itself and also potentially forces those registers to be saved and restored across function calls. Still, I'm inclined not to poke at this right now: there are a lot of patches that still need to be looked at, and at the rate we're currently disposing of patches this CommitFest will still be ongoing come Easter. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bugs/slowness inserting and indexing cubes
Alexander Korotkov aekorot...@gmail.com writes: On Wed, Feb 15, 2012 at 4:26 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: So, I think we should go with your original fix and simply do nothing in gistRelocateBuildBuffersOnSplit() if the page doesn't have a buffer. I agree. OK, I won't object. regards, tom lane
[HACKERS] [Bug fix] postmaster always crashes on a debug version of Windows
Hello I encountered a bug which always causes PostgreSQL to crash on Windows. Attached is a patch that fixes it. Please review it and include it in the upcoming minor releases of supported versions. The following is a bug report. Your name :MauMau Your email address :maumau...@gmail.com System Configuration: - Architecture (example: Intel Pentium) : Intel Xeon Operating System (example: Linux 2.4.18) : I found the bug on Windows 2008 R2 Enterprise x64 Edition, but it should apply to all versions of Windows. PostgreSQL version (example: PostgreSQL 9.1.1): PostgreSQL 8.3.12 However, the problem should happen on all versions of PostgreSQL. Compiler used (example: gcc 3.3.5) : Visual Studio 2005 Professional Edition Please enter a FULL description of your problem: On a debug version of Windows, PostgreSQL server fails to start. When I run pg_ctl start on the command prompt, pg_ctl displays server starting, but the PostgreSQL server does not start. No message is left which tells the cause. pg_ctl does not show any error messages on the command prompt. Even when I include 'eventlog' in log_destination parameter, no error message is recorded in the application log nor the system log of Windows event log. This problem does not occur on a release version of Windows. Please describe a way to repeat the problem. Please try to provide a concise reproducible example, if at all possible: -- On a debug version of Windows, do the following: 1. Run initdb. initdb's option is not relevant. 2. Set logging_collector to on in postgresql.conf. 3. Run pg_ctl start on the command prompt. The problem does not happen when logging_collector is off. If you know how this problem might be fixed, list the solution below: - [Cause] postmaster crashes in the following CloseHandle() call, which is at line 591 in src/backend/postmaster/syslogger.c (in case of PostgreSQL 9.1.) /* Now we are done with the write end of the pipe. 
*/ CloseHandle(syslogPipe[1]); The CloseHandle() crashes because it receives an already closed handle. According the following MSDN article, CloseHandle() causes a crash under a debugger. Though I'm not exactly sure, I guess running on the debug version of Windows is equivalent to running under a debugger. http://msdn.microsoft.com/en-us/library/windows/desktop/ms724211(v=vs.85).aspx [Excerpt] If the application is running under a debugger, the function will throw an exception if it receives either a handle value that is not valid or a pseudo-handle value. This can happen if you close a handle twice, or if you call CloseHandle on a handle returned by the FindFirstFile function instead of calling the FindClose function. Then, where is the handle closed? It was closed in close() at three lines above. The pipe handle is associated with a file descriptor by _open_osfhandle(). Calling close() for the file descriptor also closes the underlying handle. See the MSDN article: http://msdn.microsoft.com/en-us/library/bdts1c9x(v=vs.80).aspx [Excerpt] To close a file opened with _open_osfhandle, call _close. The underlying handle is also closed by a call to _close, so it is not necessary to call the Win32 function CloseHandle on the original handle. [Fix] Remove the unnecessary CloseHandle() call. I sought similar extra closing by checking places where _open_osfhandle() or _get_osfhandle() is used, but there was nothing elsewhere. I attached a patch against the master HEAD. Is it OK with a patch that has CR-LF line endings? Regards postmaster_crash_on_debug_windows.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [v9.2] LEAKPROOF attribute of FUNCTION (Re: [HACKERS] [v9.2] Fix Leaky View Problem)
On Wed, Feb 15, 2012 at 6:14 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote: The attached patch contains additional regression tests of ALTER FUNCTION with LEAKPROOF based on your patch. It also moves create_function_3 into the group with create_aggregate and so on. Committed, thanks. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Scaling XLog insertion (was Re: [HACKERS] Moving more work outside WALInsertLock)
On 13.02.2012 19:13, Fujii Masao wrote: On Mon, Feb 13, 2012 at 8:37 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 13.02.2012 01:04, Jeff Janes wrote: Attached is my quick and dirty attempt to set XLP_FIRST_IS_CONTRECORD. I have no idea if I did it correctly, in particular if calling GetXLogBuffer(CurrPos) twice is OK or if GetXLogBuffer has side effects that make that a bad thing to do. I'm not proposing it as the real fix, I just wanted to get around this problem in order to do more testing. Thanks. That's basically the right approach. Attached patch contains a cleaned up version of that. It does get rid of the there is no contrecord flag errors, but recover still does not work. Now the count of tuples in the table is always correct (I never provoke a crash during the initial table load), but sometimes updates to those tuples that were reported to have been committed are lost. This is more subtle, it does not happen on every crash. It seems that when recovery ends on record with zero length at..., that recovery is correct. But when it ends on invalid magic number in log file.. then the recovery is screwed up. Can you write a self-contained test case for that? I've been trying to reproduce that by running the regression tests and pgbench with a streaming replication standby, which should be pretty much the same as crash recovery. No luck this far. Probably I could reproduce the same problem as Jeff got. 
Here is the test case: $ initdb -D data $ pg_ctl -D data start $ psql -c create table t (i int); insert into t values(generate_series(1,1)); delete from t $ pg_ctl -D data stop -m i $ pg_ctl -D data start The crash recovery emitted the following server logs: LOG: database system was interrupted; last known up at 2012-02-14 02:07:01 JST LOG: database system was not properly shut down; automatic recovery in progress LOG: redo starts at 0/179CC90 LOG: invalid magic number in log file 0, segment 1, offset 8060928 LOG: redo done at 0/17AD858 LOG: database system is ready to accept connections LOG: autovacuum launcher started After recovery, I could not see the table t which I created before: $ psql -c select count(*) from t ERROR: relation t does not exist Are you still seeing this failure with the latest patch I posted (http://archives.postgresql.org/message-id/4f38f5e5.8050...@enterprisedb.com)? That includes Jeff's fix for the original crash you and Jeff saw. With that version, I can't get a crash anymore. I also can't reproduce the inconsistency that Jeff still saw with his fix (http://archives.postgresql.org/message-id/CAMkU=1zGWp2QnTjiyFe0VMu4gc+MoEexXYaVC2u=+orfiyj...@mail.gmail.com). Jeff, can you clarify if you're still seeing an issue with the latest version of the patch? If so, can you give a self-contained test case for that? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_test_fsync performance
On Wed, Feb 15, 2012 at 16:14, Bruce Momjian br...@momjian.us wrote: On Wed, Feb 15, 2012 at 09:54:04AM +0100, Magnus Hagander wrote: On Wed, Feb 15, 2012 at 02:23, Bruce Momjian br...@momjian.us wrote: On Wed, Feb 15, 2012 at 01:35:05AM +0200, Marko Kreen wrote: On Tue, Feb 14, 2012 at 05:59:06PM -0500, Tom Lane wrote: Bruce Momjian br...@momjian.us writes: On Mon, Feb 13, 2012 at 08:28:03PM -0500, Tom Lane wrote: +1, I was about to suggest the same thing. Running any of these tests for a fixed number of iterations will result in drastic degradation of accuracy as soon as the machine's behavior changes noticeably from what you were expecting. Run them for a fixed time period instead. Or maybe do a few, then check elapsed time and estimate a number of iterations to use, if you're worried about the cost of doing gettimeofday after each write. Good idea, and it worked out very well. I changed the -o loops parameter to -s seconds which calls alarm() after (default) 2 seconds, and then once the operation completes, computes a duration per operation. I was kind of wondering how portable alarm() is, and the answer according to the buildfarm is that it isn't. I'm using following simplistic alarm() implementation for win32: https://github.com/markokr/libusual/blob/master/usual/signal.c#L21 this works with fake sigaction()/SIGALARM hack below - to remember function to call. Good enough for simple stats printing, and avoids win32-specific code spreading around. Wow, I wasn't even aware this compiled in Win32; I thought it was ifdef'ed out. Anyway, I am looking at SetTimer as a way of making this work. (Me wonders if the GoGrid Windows images have compilers.) They don't, since most of the compilers people would ask for don't allow that kind of redistribution. Shame. Ping me on im if you need one preconfigured, though... How do you do that? Also, once you create a Windows VM on a public cloud, how do you connect to it? SSH? rdesktop. 
-- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Command Triggers
On Tue, Feb 14, 2012 at 4:29 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: An ack about the way it's now implemented would be awesome I'm still missing that, which is only fair, just a ping from me here. I took a brief look at this just now, and in general I'm pleasantly surprised. But, as you might imagine, I have some reservations: 1. I fear that the collection of commands to which this applies is going to end up being kind of a random selection. I suggest that for the first version of the patch, just pick a couple of simple commands like VACUUM and ANALYZE, and do just those. We can add on more later easily enough once the basic patch is committed. Also, that way, if you don't get to absolutely everything, we'll have a coherent subset of the functionality, rather than a subset defined by what Dimitri got to. 2. Currently, we have some objects (views) that support INSTEAD OF triggers and others (tables) that support BEFORE and AFTER triggers. I don't see any advantage in supporting both. 3. I am not at all convinced that it's safe to allow command triggers to silently (i.e. without throwing an error) cancel the operation. I don't have very much confidence that that is in general a safe thing to do; there may be code calling this code that expects that such things will not happen. This diff hunk, for example, scares the crap out of me: - /* check that the locales can be loaded */ - CommandCounterIncrement(); - (void) pg_newlocale_from_collation(newoid); + /* before or instead of command trigger might have cancelled the comman + if (OidIsValid(newoid)) + { + /* check that the locales can be loaded */ + CommandCounterIncrement(); + (void) pg_newlocale_from_collation(newoid); + } I don't immediately understand why that's necessary, but I'm sure there's a good reason, and I bet a nickel that there are other places where similar adjustments are necessary but you haven't found them. I think we should rip out the option to silently cancel the command. 
We can always add that later, but if we add it here and in the first commit I think we are going to be chasing bugs for months. 4. This API forces all the work of setting up the CommandContext to be done regardless of whether any command triggers are present: + cmd-objectId = InvalidOid; + cmd-objectname = (char *)aggName; + cmd-schemaname = get_namespace_name(aggNamespace); I'll grant you that most people probably do not execute enough DDL for the cost of those extra get_namespace_name() calls to add up, but I'm not completely sure it's a negligible overhead in general, and in any case I think it's a safe bet that there will be continuing demand to add more information to the set of things that are supplied to the trigger. So I think that should be rethought. It'd be nice to have a cheap way to say if (AnyCommandTriggersFor(command_tag)) { ... do stuff ... }, emphasis on cheap. There's a definitional problem here, too, which is that you're supplying to the trigger the aggregate name *as supplied by the user* and the schema name as it exists at the time we do the reverse lookup. That reverse lookup isn't guaranteed to work at all, and it's definitely not guaranteed to produce an answer that's consistent with the aggName field. Maybe there's no better way, but it would at least be better to pass the namespace OID rather than the name. That way, the name lookup can be deferred until we are sure that we actually need to call something. 5. I'm not entirely convinced that it makes sense to support command triggers on commands that affect shared objects. It seems odd that such triggers will fire or not fire depending on which database is currently selected. I think that could lead to user confusion, too. 6. Why do we need all this extra copyfuncs/equalfuncs/outfuncs/readfuncs support? I thought we were dropping passing node trees for 9.2. 7. I don't have a strong feeling on what the psql command should be called, but \dcT seems odd. 
Why one lower-case letter and one upper-case letter? In general, I like the direction that this is going. But I think it will greatly improve its chances of being successful and relatively non-buggy if we strip it down to something very simple for an initial commit, and then add more stuff piece by piece. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] CUDA Sorting
On Mon, Feb 13, 2012 at 20:48, Greg Stark st...@mit.edu wrote: I don't think we should be looking at either CUDA or OpenCL directly. We should be looking for a generic library that can target either and is well maintained and actively developed. I understand your point about using some external library for the primitives, but I don't see why it needs to support both CUDA and OpenCL. Libraries for GPU-accelerated primitives generally target OpenCL *or* CUDA, not both. As far as I understand (and someone correct me if I'm wrong), the difference between them is mostly the API and the fact that CUDA had a head start, and thus a larger developer community around it. (All the early adopters went to CUDA) But OpenCL already acts as an abstraction layer. CUDA is NVIDIA-specific, but OpenCL is supported by AMD and Intel as well as NVIDIA. It's pretty rare for servers to have separate graphics cards, but recent Intel and AMD CPUs already have a GPU included on die, which is another bonus for OpenCL. So I'd say, the way things are heading, it's only a matter of time before OpenCL takes over and there will be little reason to look back. Regards, Marti
Re: Scaling XLog insertion (was Re: [HACKERS] Moving more work outside WALInsertLock)
On Thu, Feb 16, 2012 at 1:01 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 13.02.2012 19:13, Fujii Masao wrote: On Mon, Feb 13, 2012 at 8:37 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 13.02.2012 01:04, Jeff Janes wrote: Attached is my quick and dirty attempt to set XLP_FIRST_IS_CONTRECORD. I have no idea if I did it correctly, in particular if calling GetXLogBuffer(CurrPos) twice is OK or if GetXLogBuffer has side effects that make that a bad thing to do. I'm not proposing it as the real fix, I just wanted to get around this problem in order to do more testing. Thanks. That's basically the right approach. Attached patch contains a cleaned up version of that. It does get rid of the there is no contrecord flag errors, but recover still does not work. Now the count of tuples in the table is always correct (I never provoke a crash during the initial table load), but sometimes updates to those tuples that were reported to have been committed are lost. This is more subtle, it does not happen on every crash. It seems that when recovery ends on record with zero length at..., that recovery is correct. But when it ends on invalid magic number in log file.. then the recovery is screwed up. Can you write a self-contained test case for that? I've been trying to reproduce that by running the regression tests and pgbench with a streaming replication standby, which should be pretty much the same as crash recovery. No luck this far. Probably I could reproduce the same problem as Jeff got. 
Here is the test case: $ initdb -D data $ pg_ctl -D data start $ psql -c create table t (i int); insert into t values(generate_series(1,1)); delete from t $ pg_ctl -D data stop -m i $ pg_ctl -D data start The crash recovery emitted the following server logs: LOG: database system was interrupted; last known up at 2012-02-14 02:07:01 JST LOG: database system was not properly shut down; automatic recovery in progress LOG: redo starts at 0/179CC90 LOG: invalid magic number in log file 0, segment 1, offset 8060928 LOG: redo done at 0/17AD858 LOG: database system is ready to accept connections LOG: autovacuum launcher started After recovery, I could not see the table t which I created before: $ psql -c select count(*) from t ERROR: relation t does not exist Are you still seeing this failure with the latest patch I posted (http://archives.postgresql.org/message-id/4f38f5e5.8050...@enterprisedb.com)? Yes. Just to be safe, I again applied the latest patch to HEAD, compiled that and tried the same test. Then unfortunately I got the same failure again. I ran the configure with '--enable-debug' '--enable-cassert' 'CPPFLAGS=-DWAL_DEBUG', and make with -j 2 option. When I ran the test with wal_debug = on, I got the following assertion failure. 
LOG: INSERT @ 0/17B3F90: prev 0/17B3F10; xid 998; len 31: Heap - insert: rel 1663/12277/16384; tid 0/197 STATEMENT: create table t (i int); insert into t values(generate_series(1,1)); delete from t LOG: INSERT @ 0/17B3FD0: prev 0/17B3F50; xid 998; len 31: Heap - insert: rel 1663/12277/16384; tid 0/198 STATEMENT: create table t (i int); insert into t values(generate_series(1,1)); delete from t TRAP: FailedAssertion(!(((bool) (((void*)((target-tid)) != ((void *)0)) (((target-tid))-ip_posid != 0, File: heapam.c, Line: 5578) LOG: xlog bg flush request 0/17B4000; write 0/17A6000; flush 0/179D5C0 LOG: xlog bg flush request 0/17B4000; write 0/17B; flush 0/17B LOG: server process (PID 16806) was terminated by signal 6: Abort trap This might be related to the original problem which Jeff and I saw. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] bitfield and gcc
On Mon, Feb 13, 2012 at 5:38 AM, Marti Raudsepp ma...@juffo.org wrote: On Sat, Feb 11, 2012 at 01:54, Gaetano Mendola mend...@gmail.com wrote: I wonder if somewhere in Postgres source we are relying on the GCC correct behaviour regarding the read-modify-write of bitfield in structures. Probably not. I'm pretty sure that we don't have any bitfields, since not all compilers are happy with them. And it looks like this behavior doesn't affect other kinds of struct fields. We do, actually: see spgist_private.h, itemid.h, regis.h, spell.h, and ts_type.h. And maybe some others. I'm not aware, however, of any cases where we put a lock in the same structure as a bitfield, so I think we might be OK in that regard. But the bit about 64-bit spinlocks next to other stuff is a bit alarming. I continue to be astonished at the degree to which the gcc developers seem not to care about the POLA. Padding out all of our spinlocks to 64 bits would not be free: it would cost us significantly in memory usage, if nothing else. I understand that it's not possible to modify individual bits in a bitfield atomically, but generating a 64-bit-wide read-modify-write when the underlying base type is 4 bytes or less is almost pure evil, IMHO. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
[HACKERS] pg_upgrade message
pg_upgrade prints something like this: Restoring user relation files /var/lib/postgresql/8.4/main/base/35338/37229 But it's not actually restoring anything, is it? Maybe transferring would be better? (Or copying/linking, to be more precise.)
[HACKERS] REASSIGN OWNED lacks support for FDWs
As per http://archives.postgresql.org/pgsql-general/2012-02/msg00304.php there is no switch case in shdepReassignOwned for foreign data wrappers. The obvious short-term answer (and probably the only back-patchable one) is to add a case for that object type. But after all the refactoring that's been done in the general area of this type of command, I'm a bit surprised that shdepReassignOwned still looks like this. Can't we merge this knowledge into someplace where it doesn't have to be maintained separately? regards, tom lane
Re: [HACKERS] Different gettext domain needed for error context
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: To fix this, we need to somehow pass the caller's text domain to errcontext(). The most straightforward way is to pass it as an extra argument. Ideally, errcontext() would be a macro that passes TEXTDOMAIN to the underlying function, so that you don't need to change all the callers of errcontext(): #define errcontext(...) errcontext_domain(TEXTDOMAIN, ...) But that doesn't work, because it would require varargs macros. Anyone know a trick to make that work? This is pretty ugly, but AFAICS it should work: #define errcontext set_errcontext_domain(TEXTDOMAIN), real_errcontext But it might be better in the long run to bite the bullet and make people change all their errcontext calls. There aren't that many, especially not outside the core code. regards, tom lane
Re: [HACKERS] run GUC check hooks on RESET
On Sat, Feb 11, 2012 at 1:36 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Kevin Grittner wrote: Tom Lane wrote: I agree it's a bug that you can do what Kevin's example shows. I'll look at it and see if I can pull together a patch. Attached. Basically, if a GUC has a check function, this patch causes it to be run on a RESET just like it is on a SET, to make sure that the resulting value is valid to set within the context. Some messages needed adjustment. While I was there, I made the code a little more consistent among related GUCs. I also added a little to the regression tests to cover this. This patch makes me a little nervous, because the existing behavior seems to have been coded quite deliberately. Sadly, I don't see any comments explaining why the RESET case was excluded originally. On the other hand, I can't see what it would break, either. Have you gone through all the check hooks and verified that we're not violating any of their assumptions? I assume that you're thinking we'd only fix this in master? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [trivial patch] typo in doc/src/sgml/sepgsql.sgml
Excerpts from Christoph Berg's message of mié feb 15 12:16:52 -0300 2012: Re: Robert Haas 2012-02-15 ca+tgmozjutszhpxwwzcm4kjjwh__1admo3kosbbq+fhscqp...@mail.gmail.com Fixed, but note that I had to recreate the patch by manual examination. Including it inline tends to garble things. Hmm, I thought I had :set paste and everything... If you copied from a pager, those tend to expand tabs to spaces, so the patch gets mangled at that point. At least less does that. OTOH if you :r the patch file, it works fine. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] client performance v.s. server statistics
So, is it client interface (ODBC, libpq)'s cost mainly due to TCP? The difference you are seeing compared to your embedded DB seems to be mainly due to TCP. One optimization you can use is the Unix-domain socket mode of PostgreSQL. You can refer to the unix_socket_directory parameter in postgresql.conf and other related parameters. I am suggesting this because you were using an embedded DB earlier, so your client and server should be on the same machine. If that is not the case, it will not work. Can you please clarify some more things: 1. After the sequential scan, do you need all the scanned records in the client? If only a few, why have you not created an index? 2. What is the exact scenario for fetching records? From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Zhou Han Sent: Wednesday, February 15, 2012 9:30 AM To: pgsql-hackers@postgresql.org Subject: [HACKERS] client performance v.s. server statistics Hi, I am checking a performance problem encountered after porting an old embedded DB to PostgreSQL. Since the system is real-time sensitive, we are concerned about per-query cost. In our environment a sequential scan (select * from ...) of a table with tens of thousands of records costs 1 - 2 seconds, regardless of whether we use the ODBC driver or the timing result shown in the psql client (which in turn relies on libpq). However, according to EXPLAIN ANALYZE, or the statistics in the pg_stat_statement view, the query costs less than 100ms. So, is it client interface (ODBC, libpq)'s cost mainly due to TCP? Does pg_stat_statement or EXPLAIN ANALYZE include the cost of copying tuples from shared buffers to result sets? Could you experts share your views on this big gap? And any suggestions to optimise? P.S.
In our original embedded DB, a fastpath interface is provided to read records directly from shared memory, thus providing extremely fast real-time access (of course sacrificing some other features, such as consistency). Best regards, Han
Re: [HACKERS] client performance v.s. server statistics
So I want to know what operations exactly are included in the server side statistics in EXPLAIN ANALYZE It gives the time for execution of the query on the server. To my knowledge, it doesn't account for the data sent over TCP. From: Zhou Han [mailto:zhou...@gmail.com] Sent: Wednesday, February 15, 2012 12:32 PM To: Amit Kapila Cc: pgsql-hackers@postgresql.org Subject: Re: [HACKERS] client performance v.s. server statistics Hi, I have tried a unix domain socket and the performance is similar to the TCP socket. It is a MIPS architecture, so memory copy to/from the kernel can take much time, and apparently using a unix domain socket is no different from TCP in terms of memory copy. But the ten-fold gap between the client side and the server side statistics is still hard to believe. So I want to know what operations exactly are included in the server side statistics in EXPLAIN ANALYZE. I may check the code later when I get time. The query itself was just for performance comparison. There are other index-based queries, which are of course much faster, but they still show a similar ten-fold time gap between client side and server side statistics. I am thinking of a client interface that does not involve the kernel; is there such an option, or do I have to develop one from scratch? Best regards, Han On Wed, Feb 15, 2012 at 1:23 PM, Amit Kapila amit.kap...@huawei.com wrote: So, is it client interface (ODBC, libpq)'s cost mainly due to TCP? The difference you are seeing compared to your embedded DB seems to be mainly due to TCP. One optimization you can use is the Unix-domain socket mode of PostgreSQL. You can refer to the unix_socket_directory parameter in postgresql.conf and other related parameters. I am suggesting this because you were using an embedded DB earlier, so your client and server should be on the same machine. If that is not the case, it will not work. Can you please clarify some more things: 1.
After the sequential scan, do you need all the scanned records in the client? If only a few, why have you not created an index? 2. What is the exact scenario for fetching records? pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Zhou Han Sent: Wednesday, February 15, 2012 9:30 AM To: pgsql-hackers@postgresql.org Subject: [HACKERS] client performance v.s. server statistics Hi, I am checking a performance problem encountered after porting an old embedded DB to PostgreSQL. Since the system is real-time sensitive, we are concerned about per-query cost. In our environment a sequential scan (select * from ...) of a table with tens of thousands of records costs 1 - 2 seconds, regardless of whether we use the ODBC driver or the timing result shown in the psql client (which in turn relies on libpq). However, according to EXPLAIN ANALYZE, or the statistics in the pg_stat_statement view, the query costs less than 100ms. So, is it client interface (ODBC, libpq)'s cost mainly due to TCP? Does pg_stat_statement or EXPLAIN ANALYZE include the cost of copying tuples from shared buffers to result sets? Could you experts share your views on this big gap? And any suggestions to optimise? P.S. In our original embedded DB, a fastpath interface is provided to read records directly from shared memory, thus providing extremely fast real-time access (of course sacrificing some other features, such as consistency). Best regards, Han
Re: [HACKERS] [v9.2] Add GUC sepgsql.client_label
On Sun, Feb 5, 2012 at 4:09 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote: The attached part-1 patch moves related routines from hooks.c to label.c because of references to static variables. I have committed this part. Seems like a better location for that code, anyhow. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [trivial patch] typo in doc/src/sgml/sepgsql.sgml
On Wed, Feb 15, 2012 at 1:45 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: Excerpts from Christoph Berg's message of mié feb 15 12:16:52 -0300 2012: Re: Robert Haas 2012-02-15 ca+tgmozjutszhpxwwzcm4kjjwh__1admo3kosbbq+fhscqp...@mail.gmail.com Fixed, but note that I had to recreate the patch by manual examination. Including it inline tends to garble things. Hmm, I thought I had :set paste and everything... If you copied from a pager, those tend to expand tabs to spaces, so the patch gets mangled at that point. At least less does that. OTOH if you :r the patch file, it works fine. It is also possible that my email client is to blame. All I know for sure is I had to regenerate the patch. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Assertion failure in AtCleanup_Portals
Pavan Deolasee pavan.deola...@gmail.com writes: On Tue, Feb 7, 2012 at 3:03 AM, Tom Lane t...@sss.pgh.pa.us wrote: Hmm. It works fine if you issue an actual ROLLBACK command there, so my gut reaction is that AbortOutOfAnyTransaction isn't sufficiently duplicating the full-fledged ROLLBACK code path. No time to dig further right now though. It works OK for a ROLLBACK command because we create a new unnamed portal for the ROLLBACK command, silently dropping the old one if it already exists. Since the ROLLBACK command then runs successfully, we don't see the same assertion. Would it be safe to drop FAILED unnamed portals during AbortOutOfAnyTransaction? Maybe it is, if we can do that before creating a new portal for the ROLLBACK command itself. I poked at this some more, and noticed another dangerous-seeming issue: at the time PortalCleanup is called, if it does get called, the portal's stmts and sourceText are already pointing at garbage. The reason is that exec_simple_query blithely does this: /* * We don't have to copy anything into the portal, because everything * we are passing here is in MessageContext, which will outlive the * portal anyway. */ PortalDefineQuery(portal, NULL, query_string, commandTag, plantree_list, NULL); That comment is a true statement for the normal successful path of control, wherein we reach the PortalDrop a few dozen lines below. But it's not true if the command suffers an error. We will mark the portal PORTAL_FAILED right away (in one of various PG_CATCH blocks) but we don't clean it up until another command arrives, so the current contents of MessageContext are long gone when portal cleanup happens. It seems to me that the most reliable fix for these issues is to institute the same policy for transitioning a portal to FAILED state as we already have for transitions to DONE state, ie, we should run the cleanup hook immediately.
IOW it would be a good idea to create a function MarkPortalFailed that is just like MarkPortalDone except for the specific portal state transition, and use that instead of merely setting portal->status = PORTAL_FAILED in error cleanup situations. This would ensure that the cleanup hook gets to run before we possibly cut the portal's memory out from under it. It's more than is probably needed to resolve the specific crash you're reporting, but it seems likely to forestall future issues. I haven't actually tested such a fix yet, but will go do that now. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade message
On Wed, Feb 15, 2012 at 07:50:41PM +0200, Peter Eisentraut wrote: pg_upgrade prints something like this: Restoring user relation files /var/lib/postgresql/8.4/main/base/35338/37229 But it's not actually restoring anything, is it? Maybe transferring would be better? (Or copying/linking, to be more precise.) What an excellent idea; I changed it to say link/copy, with the attached, applied patch. The new output is: Creating databases in the new cluster ok Adding support functions to new cluster ok Restoring database schema to new cluster ok Removing support functions from new cluster ok -- Linking user relation files ok Setting next OID for new cluster ok Creating script to delete old cluster ok -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + diff --git a/contrib/pg_upgrade/relfilenode.c b/contrib/pg_upgrade/relfilenode.c new file mode 100644 index 54ee5f0..a1e30b1 *** a/contrib/pg_upgrade/relfilenode.c --- b/contrib/pg_upgrade/relfilenode.c *** transfer_all_new_dbs(DbInfoArr *old_db_a *** 37,43 int old_dbnum, new_dbnum; const char *msg = NULL; ! prep_status("Restoring user relation files\n"); /* Scan the old cluster databases and transfer their files */ for (old_dbnum = new_dbnum = 0; --- 37,44 int old_dbnum, new_dbnum; const char *msg = NULL; ! prep_status("%s user relation files\n", ! user_opts.transfer_mode == TRANSFER_MODE_LINK ? "Linking" : "Copying"); /* Scan the old cluster databases and transfer their files */ for (old_dbnum = new_dbnum = 0; -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Speed up in-memory tuplesorting.
Robert Haas rh...@postgresql.org writes: Speed up in-memory tuplesorting. This patch appears to have broken those members of the buildfarm that use VPATH builds. I assume you didn't think about the path to qsort_tuple.c very carefully. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Speed up in-memory tuplesorting.
On Wed, Feb 15, 2012 at 2:24 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas rh...@postgresql.org writes: Speed up in-memory tuplesorting. This patch appears to have broken those members of the buildfarm that use VPATH builds. I assume you didn't think about the path to qsort_tuple.c very carefully. So it appears. I copied the rule that modifies errcodes.h, but clearly that's not enough. I think I might need to steal and adapt the following logic from src/backend/catalog/Makefile: # locations of headers that genbki.pl needs to read pg_includes = -I$(top_srcdir)/src/include/catalog -I$(top_builddir)/src/include/catalog I'll go test that now. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Scaling XLog insertion (was Re: [HACKERS] Moving more work outside WALInsertLock)
On 15.02.2012 18:52, Fujii Masao wrote: On Thu, Feb 16, 2012 at 1:01 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Are you still seeing this failure with the latest patch I posted (http://archives.postgresql.org/message-id/4f38f5e5.8050...@enterprisedb.com)? Yes. Just to be safe, I again applied the latest patch to HEAD, compiled that and tried the same test. Then unfortunately I got the same failure again. Ok. I ran the configure with '--enable-debug' '--enable-cassert' 'CPPFLAGS=-DWAL_DEBUG', and make with -j 2 option. When I ran the test with wal_debug = on, I got the following assertion failure. LOG: INSERT @ 0/17B3F90: prev 0/17B3F10; xid 998; len 31: Heap - insert: rel 1663/12277/16384; tid 0/197 STATEMENT: create table t (i int); insert into t values(generate_series(1,1)); delete from t LOG: INSERT @ 0/17B3FD0: prev 0/17B3F50; xid 998; len 31: Heap - insert: rel 1663/12277/16384; tid 0/198 STATEMENT: create table t (i int); insert into t values(generate_series(1,1)); delete from t TRAP: FailedAssertion(!(((bool) (((void*)((target-tid)) != ((void *)0)) (((target-tid))-ip_posid != 0, File: heapam.c, Line: 5578) LOG: xlog bg flush request 0/17B4000; write 0/17A6000; flush 0/179D5C0 LOG: xlog bg flush request 0/17B4000; write 0/17B; flush 0/17B LOG: server process (PID 16806) was terminated by signal 6: Abort trap This might be related to the original problem which Jeff and I saw. That's strange. I made a fresh checkout, too, and applied the patch, but still can't reproduce. I used the attached script to test it. It's surprising that the crash happens when the records are inserted, not at recovery. I don't see anything obviously wrong there, so could you please take a look around in gdb and see if you can get a clue what's going on? What's the stack trace? 
-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com fujii-crash.sh Description: Bourne shell script -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CUDA Sorting
On 13/02/2012 19:48, Greg Stark wrote: I don't think we should be looking at either CUDA or OpenCL directly. We should be looking for a generic library that can target either and is well maintained and actively developed. Any GPU code we write ourselves would rapidly be overtaken by changes in the hardware and innovations in parallel algorithms. If we find a library that provides a sorting api and adapt our code to use it then we'll get the benefits of any new hardware feature as the library adds support for them. I think one option is to make the sort function pluggable via a shared library/dll. I see several benefits from this: - It could be in the interest of the hardware vendor to provide the most powerful sort implementation (I'm sure, for example, that the TBB sort implementation is faster than pg_sort) - It would permit people to play with it without being deeply involved in pg development. - It would relieve the postgres core group of the choice of the right language/tool/implementation to use. - It also allows people not willing (or not able, for that matter) to upgrade the postgres engine to instead change the sort function upon a hardware upgrade. Of course, if this happens, the postgres engine has to make some sort of sanity check (e.g. that the function actually sorts) before trusting the plugged sort. The engine could even have multiple sort implementations available and use the most proficient one (imagine some sorts act better on certain value ranges or element sizes). Regards Gaetano Mendola -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Command Triggers
Hi, Thanks for your time reviewing that patch! Robert Haas robertmh...@gmail.com writes: I took a brief look at this just now, and in general I'm pleasantly surprised. But, as you might imagine, I have some reservations: Good starting point, let's see about the details :) 1. I fear that the collection of commands to which this applies is going to end up being kind of a random selection. I suggest that for the first version of the patch, just pick a couple of simple commands like VACUUM and ANALYZE, and do just those. We can add on more later easily enough once the basic patch is committed. Also, that way, if you don't get to absolutely everything, we'll have a coherent subset of the functionality, rather than a subset defined by what Dimitri got to. I share that feeling here. Now, given the current scope of the patch, I think we can add in 9.2 all the commands that make sense to support (yes, including alter operator family). FWIW, I've been adding support for 16 forms of ALTER commands today, triple (implementation of alter rename, owner and namespace are separated). So while your reaction is perfectly understandable, I don't think that's the main thing here, you've just happened to see an intermediate state of things. 2. Currently, we have some objects (views) that support INSTEAD OF triggers and others (tables) that support BEFORE and AFTER triggers. I don't see any advantage in supporting both. That's because you can cancel an INSERT from a BEFORE trigger, that's how you can write an INSTEAD OF trigger on a TABLE. As a VIEW does not support INSERT (directly), an INSTEAD OF trigger is what makes sense here. I don't think it's possible to transpose that thinking to command triggers, so I've been providing for either INSTEAD OF triggers or BEFORE/AFTER triggers. Note that you can't have both with my current patch. 
It's a convenience-only thing, but I think it's worth having it: you don't need to read the trigger procedure to decide if that BEFORE command trigger is in fact an INSTEAD OF command trigger. Also, the code footprint for this feature is very small. 3. I am not at all convinced that it's safe to allow command triggers to silently (i.e. without throwing an error) cancel the operation. I don't have very much confidence that that is in general a safe thing I was trying to offer a parallel with returning NULL from a BEFORE INSERT trigger, which allows you to cancel it. If that turns out to be so bad an idea that we need to withdraw it, so be it. It's still possible to cancel a command by means of RAISE EXCEPTION in a BEFORE command trigger, or by means of an INSTEAD OF trigger that does nothing. to do; there may be code calling this code that expects that such things will not happen. This diff hunk, for example, scares the crap out of me: [...] I don't immediately understand why that's necessary, but I'm sure there's a good reason, and I bet a nickel that there are other places where similar adjustments are necessary but you haven't found them. Might be. That idea was sound to me when under the illusion that I wouldn't have to edit each and every command implementation in order to implement command triggers. I'm ok to step back now, but read on. I think we should rip out the option to silently cancel the command. We can always add that later, but if we add it here and in the first commit I think we are going to be chasing bugs for months. Ok, I'm going to remove that from the BEFORE command implementation. What about having both INSTEAD OF triggers (the command is not executed) and BEFORE triggers (you can cancel its execution only by raising an exception, and that cancels the whole transaction)? I'd like that we'd be able to keep that feature. 4.
This API forces all the work of setting up the CommandContext to be done regardless of whether any command triggers are present: + cmd->objectId = InvalidOid; + cmd->objectname = (char *) aggName; + cmd->schemaname = get_namespace_name(aggNamespace); At some point while developing the patch I had something way smarter than that, and centralized. I've not revised the API to get back the smartness when breaking out of the centralized implementation, because those elements can only be set from the innards of each command. The API to list which triggers to run already exists and only needs the command tag, which we might have earlier in the processing. Note that we have to deal with commands acting on more than one object, as in RemoveObjects() where we loop over a list and call triggers each time: https://github.com/dimitri/postgres/compare/master...command_triggers#diff-19 The best I can think of would be to build the list of triggers to call as soon as we have the command tag, and skip preparing the rest of the CommandContextData structure when that list comes back empty. Still no general stop-gap, but that would address your point I think. Now, about INSTEAD OF command triggers,
[HACKERS] Designing an extension for feature-space similarity search
[Preamble: I've been told that the hackers list is appropriate for extension-related topics like this, even if it's not about contributing to core. If I'm misappropriating, please let me know.] Goal: Personalized, context-relevant query results We are building a deeply personalized site; think OKCupid for product recommendations or Pinterest for people with your tastes. We use psych research to measure and predict your personality and traits along a number of scales (dimensions), and then we connect you with people, products and content we think you'll like. I won't go into the design history, but you can read a little here: http://parapoetica.wordpress.com/2012/02/15/feature-space-similarity-search-in-postgresql/ Suffice it to say, this ends up needing something like KNN-GiST cubes, only: - The overall concept is more like N-dimensional vectors than cubes - But a dimension might be in any domain, not just floats - All vectors have the same number of dimensions with the same meanings - The distance along each dimension is a domain-specific function - NULLs are allowed (the distance function will handle the semantics) - The distance between two vectors is a function that aggregates the distances of each dimension, along with arbitrary other arguments - for instance, it might take the weighted average of the dimensions That aggregation (which may not literally be an aggregate; I'm not sure yet) needs to happen in a SELECT list, which means it needs to be fast, which means all this (or at least much of it) has to be C. The simplest thing that works is probably to hack up the cube extension, declare that everything (except inner pages) must be a zero-volume cube (cube_is_point()), map our non-float features onto floats somehow, and hard-code all the distance functions and the aggregation function.
But I think this sort of similarity-search engine has general utility, and I also want to make it easy for us to add and subtract dimensions without too much pain; that should be DDL, not code. So thinking about how this might evolve... - I'm not sure how to represent arbitrary column-like features without reinventing the wheel and putting a database in the database. hstore only stores text, probably for this reason; I took a look at the earlier json patch and saw that it handled only a few core data types. Have there been any other PoCs that involved collections of heterogeneous data? I almost want an actual instance of an anyarray. - Alternatively, is there a way to index an entire, arbitrary row, rather than a column of that row? I'm fine with this extension requiring its own table, so I leave the data where it is in the row, and only worry about indexing it. I can't just use functional indexes, because I'll need to provide operators and support functions to GiST. Maybe I have a fake sentinel column, where all the operators use SPI to introspect the row, treat each column as a feature dimension, call the underlying operators on each column's data type, etc. - Can domains have operators, or are operators defined on types? - Does KNN-GiST run into problems when <-> returns values that don't make sense in the physical world? For instance, let's say NULL <-> NULL returns a distance of 1.0. That means that NULL1 <-> NULL2 = 1.0, and NULL2 <-> NULL3 = 1.0, but NULL1 <-> NULL3 = 1.0 as well. I think that's fine - that could even describe a triangle - but my spidey sense is tingling on this. - Are there previous discussions, patches, abandoned projects, etc. that this reminds you of and that I should go research? Thanks for any thoughts, and I'd love collaborators or even mentors - we plan to open source whatever we produce here, and I don't have quite the theoretical background it takes to do this properly.
Jay Levitt -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Speed up in-memory tuplesorting.
On Wed, Feb 15, 2012 at 2:54 PM, Robert Haas robertmh...@gmail.com wrote: On Wed, Feb 15, 2012 at 2:24 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas rh...@postgresql.org writes: Speed up in-memory tuplesorting. This patch appears to have broken those members of the buildfarm that use VPATH builds. I assume you didn't think about the path to qsort_tuple.c very carefully. So it appears. I copied the rule that modifies errcodes.h, but clearly that's not enough. I think I might need to steal and adapt the following logic from src/backend/catalog/Makefile: # locations of headers that genbki.pl needs to read pg_includes = -I$(top_srcdir)/src/include/catalog -I$(top_builddir)/src/include/catalog I'll go test that now. Nope, that's not it. Committed what seems to be the correct fix, after testing it locally. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Command Triggers
On Wed, Feb 15, 2012 at 3:25 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: 1. I fear that the collection of commands to which this applies is going to end up being kind of a random selection. I suggest that for the first version of the patch, just pick a couple of simple commands like VACUUM and ANALYZE, and do just those. We can add on more later easily enough once the basic patch is committed. Also, that way, if you don't get to absolutely everything, we'll have a coherent subset of the functionality, rather than a subset defined by what Dimitri got to. I share that feeling here. Now, given the current scope of the patch, I think we can add in 9.2 all the commands that make sense to support (yes, including alter operator family). FWIW, I've been adding support for 16 forms of ALTER commands today, triple (implementation of alter rename, owner and namespace are separated). So while your reaction is perfectly understandable, I don't think that's the main thing here, you've just happened to see an intermediate state of things. I'm just saying that nobody's realistically going to be able to verify a patch of this size. It's either going to get committed warts and all, or it's going to not get committed. Decomposing it into a series of patches would make it possible to actually verify the logic. I guess on reflection I don't really care whether you decompose it at this point; the parts are pretty independent and it's easy enough to revert pieces of it. But if I commit any of this it's certainly not going to be the whole thing in one go. It's still possible to cancel a command by means of RAISE EXCEPTION in a BEFORE command trigger, or by means of an INSTEAD OF trigger that does nothing. I think that there is no problem with cancelling a command via RAISE EXCEPTION. It's an established precedent that errors can be thrown anywhere, and any code that doesn't deal with that is flat broken. 
But I think letting either a BEFORE or INSTEAD trigger cancel the command is going to break things, and shouldn't be allowed without a lot of careful study. So -1 from me on supporting INSTEAD triggers in the first version of this. I'll grant you that most people probably do not execute enough DDL for the cost of those extra get_namespace_name() calls to add up, but I'm not completely sure it's a negligible overhead in general, and in any case I think it's a safe bet that there will be continuing demand to add more information to the set of things that are supplied to the trigger. So I think that should be rethought. It'd be nice to have a cheap way to say if (AnyCommandTriggersFor(command_tag)) { ... do stuff ... }, emphasis on cheap. It's as cheap as scanning a catalog, as of now. I didn't install a new catalog cache for command triggers, as I don't think this code path is performance critical enough to pay back for the maintenance cost. Also consider that a big user of command triggers might have as much as a couple of triggers (BEFORE/AFTER) per command, that's about 300 of them. Yowza. A catalog scan is WAY more expensive than a syscache lookup. I definitely don't think you can afford to have every command result in an extra index probe into pg_cmdtrigger. You definitely need some kind of caching there. Or at least, I think you do. You could try pgbench -f foo.sql, where foo.sql repeatedly creates and drops a function. See if there's a significant slowdown with your patch vs. HEAD. If there is, you need some caching. You might actually need some whole new type of sinval message to make this work efficiently. There's a definitional problem here, too, which is that you're supplying to the trigger the aggregate name *as supplied by the user* and the schema name as it exists at the time we do the reverse lookup. That reverse lookup isn't guaranteed to work at all, and it's definitely not guaranteed to produce an answer that's consistent with the aggName field. 
Maybe there's no better way, but it would at least be better to pass the namespace OID rather than the name. That way, the name lookup can be deferred until we are sure that we actually need to call something. I'm confused here, because all error messages that need to contain the namespace are doing exactly the same thing as I'm doing in my patch. Hmm. I wonder what happens if those errors fire after the schema has been dropped? I suppose the real answer here is probably to add enough locking that that can't happen in the first place... so maybe this isn't an issue for your patch to worry about. 5. I'm not entirely convinced that it makes sense to support command triggers on commands that affect shared objects. It seems odd that such triggers will fire or not fire depending on which database is currently selected. I think that could lead to user confusion, too. You mean tablespace here, I guess, what else? I don't think I've added other shared objects in there yet. I share your analysis, btw, and will remove support.
Re: [HACKERS] Command Triggers
Robert Haas robertmh...@gmail.com writes: I'm just saying that nobody's realistically going to be able to verify a patch of this size. It's either going to get committed warts and all, or it's going to not get committed. Decomposing it into a series of patches would make it possible to actually verify the logic. I guess on reflection I don't really care whether you decompose it at this point; the parts are pretty independent and it's easy enough to revert pieces of it. But if I commit any of this it's certainly not going to be the whole thing in one go. Ok, I can perfectly understand that. The principled implementation is not saving us here, we still need to review each call site. The way I read your comment, I continue working on my big patch and we'll see what pieces get in, right? I think that there is no problem with cancelling a command via RAISE EXCEPTION. It's an established precedent that errors can be thrown anywhere, and any code that doesn't deal with that is flat broken. Sure. But I think letting either a BEFORE or INSTEAD trigger cancel the command is going to break things, and shouldn't be allowed without a lot of careful study. So -1 from me on supporting INSTEAD triggers in the first version of this. I'm sad about that, but I hear you. Removing. Yowza. A catalog scan is WAY more expensive than a syscache lookup. I definitely don't think you can afford to have every command result in an extra index probe into pg_cmdtrigger. You definitely need some kind of caching there. Or at least, I think you do. You could try pgbench -f foo.sql, where foo.sql repeatedly creates and drops a function. See if there's a significant slowdown with your patch vs. HEAD. If there is, you need some caching. You might actually need some whole new type of sinval message to make this work efficiently. Ok, I will test that down the road (before the end of this week). 
I'm confused here, because all error messages that need to contain the namespace are doing exactly the same thing as I'm doing in my patch. Hmm. I wonder what happens if those errors fire after the schema has been dropped? I suppose the real answer here is probably to add enough locking that that can't happen in the first place... so maybe this isn't an issue for your patch to worry about. I get it that I continue doing things this way, get_namespace_name() and friends are trustworthy as far as I'm concerned. You mean tablespace here, I guess, what else? I don't think I've added other shared objects in there yet. I share your analysis, btw, will remove support. And databases. Right, on their way. something else, e.g. \dct would be your choice here? Yeah, probably. The next version will sport that. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] run GUC check hooks on RESET
Robert Haas robertmh...@gmail.com wrote: On Sat, Feb 11, 2012 at 1:36 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Kevin Grittner wrote: Tom Lane wrote: I agree it's a bug that you can do what Kevin's example shows. I'll look at it and see if I can pull together a patch. Attached. Basically, if a GUC has a check function, this patch causes it to be run on a RESET just like it is on a SET, to make sure that the resulting value is valid to set within the context. Some messages needed adjustment. While I was there, I made the code a little more consistent among related GUCs. I also added a little to the regression tests to cover this. This patch makes me a little nervous, because the existing behavior seems to have been coded for quite deliberately. It does, although I'm not clear *why* it was. I suspect it may have been based on an assumption that whatever value is in the reset_val field had to have been already determined to be good, so it was a waste of cycles to check it again -- without considering that the validity of making a change might depend on context. Sadly, I don't see any comments explaining why the RESET case was excluded originally. That is unfortunate. I guess it points out the value of adding a comment to point out why we would want to check these values even on a reset to a previously-used value. On the other hand, I can't see what it would break, either. Have you gone through all the check hooks and verified that we're not violating any of their assumptions? I studied the code enough to be convinced that the patch as it stands can't break a check hook which only validates the value and/or changes the value to a canonical form. There appear to be 34 check hooks, and I reviewed the 10 in the variable.c file, although not at great depth. I could set aside some time this weekend to look at all of them, in depth, if you think that is warranted. 
I do think that a check hook would have to be doing something which is probably more appropriate for an assign hook to cause trouble, but I can't swear that that isn't happening without spending about a full day in reviewing it. I assume that you're thinking we'd only fix this in master? Without this, I don't think it's possible for someone to enforce protection of their data through SSI in an ironclad way. So there is at least some case to be made to take it back as far as 9.1. I don't think it makes sense to take it further, because read-only was broken in other ways before 9.1, and I'm not aware of specific threats further back. On the other hand, it is a change in behavior with at least some chance to break code which is functioning as intended, so it's a pretty marginal candidate for back-patching from that point of view. I don't think a decision either way on that would be crazy. Personally I would hope to see it included in a 9.1 patch, perhaps after some settling time on master, but totally understand if the consensus is to just patch master. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] run GUC check hooks on RESET
Kevin Grittner kevin.gritt...@wicourts.gov writes: Robert Haas robertmh...@gmail.com wrote: This patch makes me a little nervous, because the existing behavior seems to have been coded for quite deliberately. It does, although I'm not clear *why* it was. I suspect it may have been based on an assumption that whatever value is in the reset_val field had to have been already determined to be good, so it was a waste of cycles to check it again -- without considering that the validity of making a change might depend on context. Yes, I'm inclined to think the same, although obviously we need to review the patch carefully. The GUC code is a bit ticklish. The main thing I would be worried about is whether you're sure that you have separated the RESET-as-a-command case from the cases where we actually are rolling back to a previous state. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] run GUC check hooks on RESET
Tom Lane t...@sss.pgh.pa.us wrote: The main thing I would be worried about is whether you're sure that you have separated the RESET-as-a-command case from the cases where we actually are rolling back to a previous state. I will double-check that, and make sure there is regression test coverage of that case. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CUDA Sorting
On 15 February 2012 20:00, Gaetano Mendola mend...@gmail.com wrote: On 13/02/2012 19:48, Greg Stark wrote: I don't think we should be looking at either CUDA or OpenCL directly. We should be looking for a generic library that can target either and is well maintained and actively developed. Any GPU code we write ourselves would rapidly be overtaken by changes in the hardware and innovations in parallel algorithms. If we find a library that provides a sorting api and adapt our code to use it then we'll get the benefits of any new hardware feature as the library adds support for them. I think one option is to make the sort function pluggable with a shared library/dll. I see several benefits from this: - It could be in the interest of the hardware vendor to provide the most powerful sort implementation (I'm sure for example that the TBB sort implementation is faster than pg_sort) - It can permit people to play with it without being deeply involved in pg development and such. Sorry, but I find it really hard to believe that the non-availability of pluggable sorting is what's holding people back here. Some vanguard needs to go and prove the idea by building a rough prototype before we can even really comment on what an API should look like. For example, I am given to understand that GPUs generally sort using radix sort - resolving the impedance mismatch that prevents someone from using a non-comparison based sort sure sounds like a lot of work for an entirely speculative reward. Someone who cannot understand tuplesort, which is not all that complicated, has no business trying to build GPU sorting into Postgres. I had a patch committed a few hours ago that almost included the capability of assigning an alternative sorting function, but only one with the exact same signature as my variant of qsort_arg. pg_qsort isn't used to sort tuples at all, by the way. 
Threading building blocks is not going to form the basis of any novel sorting implementation, because comparators in general are not thread safe, and it isn't available on all the platforms we support, and because of how longjmp interacts with C++ stack unwinding and so on and so on. Now, you could introduce some kind of parallelism into sorting integers and floats, but that's an awful lot of work for a marginal reward. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Google Summer of Code? Call for mentors.
Hackers, The call is now open for Google Summer of Code. If you are interested in being a GSoC mentor this summer, please reply to this email. I want to gauge whether or not we should participate this summer. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Command Triggers
On Wed, Feb 15, 2012 at 4:32 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Ok, I can perfectly understand that. The principled implementation is not saving us here, we still need to review each call site. The way I read your comment, I continue working on my big patch and we'll see what pieces get in, right? Yeah, I think that makes sense. I'll have a look at your next version when it shows up. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] run GUC check hooks on RESET
On Wed, Feb 15, 2012 at 4:47 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: That is unfortunate. I guess it points out the value of adding a comment to point out why we would want to check these values even on a reset to a previously-used value. +1 for such a comment. I assume that you're thinking we'd only fix this in master? Without this, I don't think it's possible for someone to enforce protection of their data through SSI in an ironclad way. So there is at least some case to be made to take it back as far as 9.1. I'm OK with that, but perhaps the only-tangentially-related changes where you swap the order of certain error messages ought to be separated out and committed only to master? That stuff doesn't seem like material for a back-patch. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Designing an extension for feature-space similarity search
Jay Levitt jay.lev...@gmail.com writes: - I'm not sure how to represent arbitrary column-like features without reinventing the wheel and putting a database in the database. ISTM you could define a composite type and then create operators and an operator class over that type. If you were trying to make a btree opclass there might be a conflict with the built-in record_ops opclass, but since you're only interested in GIST I don't see any real roadblocks. The main potential disadvantage of this is that you'd have the standard tuple header as overhead in index entries --- but maybe the entries are large enough that that doesn't matter, and in any case you could probably make use of the GIST compress method to get rid of most of the header. Maybe convert to MinimalTuple, for instance, if you want to still be able to leverage existing support code for field extraction. - Can domains have operators, or are operators defined on types? I think the current state of play is that you can have such things but the system will only consider them for exact type matches, so you might need more explicit casts than you ordinarily would. However, we only support domains over base types not composites, so this isn't really going to be a profitable direction for you anyway. - Does KNN-GiST run into problems when <-> returns values that don't make sense in the physical world? Wouldn't surprise me. In general, non-strict index operators are a bad idea. However, if the indexed entities are records, it would be entirely your own business how you handled individual fields being NULL. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] run GUC check hooks on RESET
Robert Haas robertmh...@gmail.com wrote: On Wed, Feb 15, 2012 at 4:47 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: That is unfortunate. I guess it points out the value of adding a comment to point out why we would want to check these values even on a reset to a previously-used value. +1 for such a comment. Will do. I assume that you're thinking we'd only fix this in master? Without this, I don't think it's possible for someone to enforce protection of their data through SSI in an ironclad way. So there is at least some case to be made to take it back as far as 9.1. I'm OK with that, but perhaps the only-tangentially-related changes where you swap the order of certain error messages ought to be separated out and committed only to master? That stuff doesn't seem like material for a back-patch. Agreed. I'm not sure we want to change the message text at all in 9.1. Translations and all that. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Notes about fixing regexes and UTF-8 (yet again)
In bug #6457 it's pointed out that we *still* don't have full functionality for locale-dependent regexp behavior with UTF8 encoding. The reason is that there's old crufty code in regc_locale.c that only considers character codes up to 255 when searching for characters that should be considered letters, digits, etc. We could fix that, for some value of fix, by iterating up to perhaps 0x10FFFF when dealing with UTF8 encoding, but the time that would take is unappealing. Especially so considering that this code is executed afresh anytime we compile a regex that requires locale knowledge. I looked into the upstream Tcl code and observed that they deal with this by having hard-wired tables of which Unicode code points are to be considered letters etc. The tables are directly traceable to the Unicode standard (they provide a script to regenerate them from files available from unicode.org). Nonetheless, I do not find that approach appealing, mainly because we'd be risking deviating from the libc locale code's behavior within regexes when we follow it everywhere else. It seems entirely likely to me that a particular locale setting might consider only some of what Unicode says are letters to be letters. However, we could possibly compromise by using Unicode-derived tables as a guide to which code points are worth probing libc for. That is, assume that a utf8-based locale will never claim that some code is a letter that unicode.org doesn't think is a letter. That would cut the number of required probes by a pretty large factor. The other thing that seems worth doing is to install some caching. We could presumably assume that the behavior of iswupper() et al are fixed for the duration of a database session, so that we only need to run the probe loop once when first asked to create a cvec for a particular category. Thoughts, better ideas? 
regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CUDA Sorting
On 15/02/2012 23:11, Peter Geoghegan wrote: On 15 February 2012 20:00, Gaetano Mendola mend...@gmail.com wrote: On 13/02/2012 19:48, Greg Stark wrote: I don't think we should be looking at either CUDA or OpenCL directly. We should be looking for a generic library that can target either and is well maintained and actively developed. Any GPU code we write ourselves would rapidly be overtaken by changes in the hardware and innovations in parallel algorithms. If we find a library that provides a sorting api and adapt our code to use it then we'll get the benefits of any new hardware feature as the library adds support for them. I think one option is to make the sort function pluggable with a shared library/dll. I see several benefits from this: - It could be in the interest of the hardware vendor to provide the most powerful sort implementation (I'm sure for example that the TBB sort implementation is faster than pg_sort) - It can permit people to play with it without being deeply involved in pg development and such. Sorry, but I find it really hard to believe that the non-availability of pluggable sorting is what's holding people back here. Some vanguard needs to go and prove the idea by building a rough prototype before we can even really comment on what an API should look like. For example, I am given to understand that GPUs generally sort using radix sort - resolving the impedance mismatch that prevents someone from using a non-comparison based sort sure sounds like a lot of work for an entirely speculative reward. AFAIK the thrust library uses the radix sort if the keys you are sorting are POD data comparable with a < operator, otherwise it does the comparison based sort using the operator provided. http://docs.thrust.googlecode.com/hg/modules.html I'm not saying that the non-availability of pluggable sort completely holds people back, I'm saying that it will simplify the process now and in the future, of course that's my opinion. 
Someone who cannot understand tuplesort, which is not all that complicated, has no business trying to build GPU sorting into Postgres. That sounds a bit harsh. I'm one of those indeed; I haven't looked into the details, not having enough time for it. At work we do GPU computing (not the sort type stuff) and given the fact I'm a Postgres enthusiast I asked myself: my server is able to sort around 500 million integers per second, if postgres was able to do that as well it would be very nice. What can I say? Sorry for my thoughts. I had a patch committed a few hours ago that almost included the capability of assigning an alternative sorting function, but only one with the exact same signature as my variant of qsort_arg. pg_qsort isn't used to sort tuples at all, by the way. Then I was looking in the wrong direction. Thank you for pointing that out. Threading building blocks is not going to form the basis of any novel sorting implementation, because comparators in general are not thread safe, and it isn't available on all the platforms we support, and because of how longjmp interacts with C++ stack unwinding and so on and so on. Now, you could introduce some kind of parallelism into sorting integers and floats, but that's an awful lot of work for a marginal reward. TBB was just an example that came to mind. What do you mean by "you could introduce some kind of parallelism"? As far as I know any divide-and-conquer algorithm can be parallelized. Regards Gaetano Mendola -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Progress on fast path sorting, btree index creation time
On 15 February 2012 15:27, Robert Haas robertmh...@gmail.com wrote: I am inclined to agree that given that we already use Perl to generate source code like this, it seems natural that we should prefer to do that, if only to avoid paranoia about the inclusion of a dial-a-bloat knob. I am at a disadvantage here, since I've never written a line of Perl. I think it's still dial-a-bloat, but I feel pretty comfortable about how we've got that knob adjusted in this version. It's almost as much improvement as any previous version, it applies to more cases, and the code footprint is the least of any version I've measured. I'm happy that the principle that a dial-a-bloat knob isn't necessarily a bad thing has been accepted, though that term is kind of pejorative in that it implies that the knob necessarily adds bloat to the binary. I define bloat here as the addition of dead instructions to the binary, or at least code that doesn't pull its weight. Clearly, that isn't the case here, and I suspect that we will find that it isn't the case in other places too. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CUDA Sorting
On 15 February 2012 22:54, Gaetano Mendola mend...@gmail.com wrote: That sounds a bit harsh. I'm one of those indeed, I haven't look in the details not having enough time for it. At work we do GPU computing (not the sort type stuff) and given the fact I'm a Postgres enthusiast I asked my self: my server is able to sort around 500 milions integer per seconds, if postgres was able to do that as well it would be very nice. What I have to say? Sorry for my thoughts. I'm not trying to sound harsh. The only reason that my patch *nearly* had support for this was because the implementation that we nearly went with would have only needed another couple of lines of code to support it. It very probably wouldn't have turned out to have been useful for any novel sorting idea, and was really only intended to be used to support user-defined full sorting specialisations. That didn't end up making the cut. My point is that whatever is holding back the development of a useful prototype here, it definitely isn't the lack of an existing API. We don't know what such an API should look like, and just how invasive it needs to be. More importantly, it remains to be seen how useful this idea is in the real world - we don't have so much as a synthetic test case with a single client, as far as I'm aware. I'd encourage the OP to share his work on github or something along those lines. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CUDA Sorting
-Original Message- From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Gaetano Mendola Sent: Wednesday, February 15, 2012 2:54 PM To: Peter Geoghegan; pgsql-hackers@postgresql.org Subject: Re: [HACKERS] CUDA Sorting On 15/02/2012 23:11, Peter Geoghegan wrote: On 15 February 2012 20:00, Gaetano Mendola mend...@gmail.com wrote: On 13/02/2012 19:48, Greg Stark wrote: I don't think we should be looking at either CUDA or OpenCL directly. We should be looking for a generic library that can target either and is well maintained and actively developed. Any GPU code we write ourselves would rapidly be overtaken by changes in the hardware and innovations in parallel algorithms. If we find a library that provides a sorting api and adapt our code to use it then we'll get the benefits of any new hardware feature as the library adds support for them. I think one option is to make the sort function pluggable with a shared library/dll. I see several benefits from this: - It could be in the interest of the hardware vendor to provide the most powerful sort implementation (I'm sure for example that the TBB sort implementation is faster than pg_sort) - It can permit people to play with it without being deeply involved in pg development and such. Sorry, but I find it really hard to believe that the non-availability of pluggable sorting is what's holding people back here. Some vanguard needs to go and prove the idea by building a rough prototype before we can even really comment on what an API should look like. For example, I am given to understand that GPUs generally sort using radix sort - resolving the impedance mismatch that prevents someone from using a non-comparison based sort sure sounds like a lot of work for an entirely speculative reward. AFAIK the thrust library uses the radix sort if the keys you are sorting are POD data comparable with a < operator, otherwise it does the comparison based sort using the operator provided. 
http://docs.thrust.googlecode.com/hg/modules.html

I'm not saying that the non-availability of pluggable sort completely holds
people back; I'm saying that it would simplify the process, now and in the
future. Of course, that's my opinion.

> Someone who cannot understand tuplesort, which is not all that
> complicated, has no business trying to build GPU sorting into Postgres.

That sounds a bit harsh. I'm one of those indeed; I haven't looked into the
details, not having enough time for it. At work we do GPU computing (not the
sort type of stuff), and given the fact that I'm a Postgres enthusiast, I
asked myself: my server is able to sort around 500 million integers per
second; if Postgres were able to do that as well, it would be very nice.
What can I say? Sorry for my thoughts.

> I had a patch committed a few hours ago that almost included the
> capability of assigning an alternative sorting function, but only one
> with the exact same signature as my variant of qsort_arg. pg_qsort isn't
> used to sort tuples at all, by the way.

Then I did look in the wrong direction. Thank you for pointing that out.

> Threading Building Blocks is not going to form the basis of any novel
> sorting implementation, because comparators in general are not thread
> safe, it isn't available on all the platforms we support, and because of
> how longjmp interacts with C++ stack unwinding, and so on and so on.
> Now, you could introduce some kind of parallelism into sorting integers
> and floats, but that's an awful lot of work for a marginal reward.

TBB was just an example that came to my mind. What do you mean by "you
could introduce some kind of parallelism"? As far as I know, any algorithm
based on divide and conquer can be parallelized.

Radix sorting can be used for any data type, if you create a callback that
provides the most significant bits in fixed-width buckets.
At any rate, I can't imagine why anyone would want to complain about sorting
40 times faster than before, considering the amount of time databases spend
ordering data. I have a CUDA card in this machine (NVIDIA GeForce GTX 460),
and I would not mind it a bit if my database's ORDER BY clauses suddenly
started running ten times faster than before when I am dealing with a huge
volume of data.

There have been other experiments along these lines, such as:

"GPU-based Sorting in PostgreSQL", Naju Mancheril, School of Computer
Science, Carnegie Mellon University

www.cs.virginia.edu/~skadron/Papers/bakkum_sqlite_gpgpu10.pdf
(This is for SQLite, but the grammar of SQLite is almost a pure subset of
PostgreSQL's, including things like VACUUM...)

http://wiki.postgresql.org/images/6/65/Pgopencl.pdf

http://dl.acm.org/citation.cfm?id=1807207

http://www.scribd.com/doc/51484335/PostgreSQL-OpenCL-Procedural-Language-pgEast-March-2011

See also http://highscalability.com/scaling-postgresql-using-cuda

--
Sent via pgsql-hackers mailing list
Re: [HACKERS] Speed dblink using alternate libpq tuple storage
Hello, sorry for the long absence.

As far as I can see, an out-of-memory condition in getAnotherTuple() sets
conn->result->resultStatus = PGRES_FATAL_ERROR, and pqParseInput[23]()
consequently skips the succeeding 'D' messages. When an exception is raised
within the row processor, pg_conn->inCursor is always left at a consistent
position and conn->result->resultStatus == PGRES_TUPLES_OK.

The choices of the libpq user at that point are:

- Continue to read the succeeding tuples: call PQgetResult() to read the
  'D' messages and hand them to the row processor one after another.

- Throw away the remaining results: call pqClearAsyncResult() and
  pqSaveErrorResult(), then call PQgetResult() to skip over the succeeding
  'D' messages. (Of course, the user can't do that with the current
  implementation.)

To let users take the second choice (which I think is the more common one),
we should simply provide and export a new PQ* function that does it, I
think:

  void
  PQskipRemainingResult(PGconn *conn)
  {
      pqClearAsyncResult(conn);

      /* conn->result is always NULL here */
      pqSaveErrorResult(conn);

      /* Skip over the remaining 'D' messages. */
      PQgetResult(conn);
  }

Users may write code with this function:

  ...
  PG_TRY();
  {
      ...
      res = PQexec(conn, ...);
      ...
  }
  PG_CATCH();
  {
      PQskipRemainingResult(conn);
      goto error;
  }
  PG_END_TRY();

Of course, this is applicable to a C++ client in the same manner:

  try {
      ...
      res = PQexec(conn, ...);
      ...
  } catch (const myexcep &ex) {
      PQskipRemainingResult(conn);
      throw ex;
  }

By the way, where should I insert this function?

regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: [HACKERS] Google Summer of Code? Call for mentors.
Hello, Josh. You wrote:

JB> Hackers,
JB>
JB> The call is now open for Google Summer of Code.
JB>
JB> If you are interested in being a GSoC mentor this summer, please reply
JB> to this email. I want to gauge whether or not we should participate
JB> this summer.
JB>
JB> --
JB> Josh Berkus
JB> PostgreSQL Experts Inc.
JB> http://pgexperts.com

What are the requirements for mentors?

--
With best wishes,
Pavel                          mailto:pa...@gf.microolap.com