Re: [HACKERS] gaussian distribution pgbench
Hi,

2014-07-04 19:05 GMT+09:00 Andres Freund:
> On 2014-07-04 11:59:23 +0200, Fabien COELHO wrote:
> > >Yea. I certainly disagree with the patch in its current state because it
> > >copies the same 15 lines several times with a two word difference.
> > >Independent of whether we want those options, I don't think that's going
> > >to fly.
> >
> > I liked a simple static string for the different variants, which means
> > replication. Factorizing out the (large) common part will mean malloc &
> > sprintf. Well, why not.
>
> It sucks from a maintenance POV. And I don't see the overhead of malloc
> being relevant here...
>
> > >>OTOH, we've almost reached the consensus that supporting gaussian
> > >>and exponential options in \setrandom. So I think that you should
> > >>separate those two features into two patches, and we should apply
> > >>the \setrandom one first. Then we can discuss whether the other patch
> > >>should be applied or not.
> >
> > >Sounds like a good plan.
> >
> > Sigh. I'll do that as it seems to be a blocker...

I still agree with Fabien-san. I cannot understand why our logical proposal
has not been accepted...

> I think we also need documentation about the actual mathematical
> behaviour of the randomness generators.
>
> > +
> > + With the gaussian option, the larger the threshold,
> > + the more frequently values close to the middle of the interval are drawn,
> > + and the less frequently values close to the min and
> > + max bounds.
> > + In other words, the larger the threshold,
> > + the narrower the access range around the middle.
> > + The smaller the threshold, the smoother the access pattern
> > + distribution. The minimum threshold is 2.0 for performance.
> > +
>
> The only way to actually understand the distribution here is to create a
> table, insert random values, and then look at the result. That's not a
> good thing.

That's right. Therefore, we added a command-line option that makes the
parameterized Gaussian distribution easy to understand.
When you want to know the parameters of a distribution, you can use the
command-line option like the following:

[nttcom@localhost postgresql]$ contrib/pgbench/pgbench --exponential=10
starting vacuum...end.
transaction type: Exponential distribution TPC-B (sort of)
scaling factor: 1
exponential threshold: 10.0
decile percents: 63.2% 23.3% 8.6% 3.1% 1.2% 0.4% 0.2% 0.1% 0.0% 0.0%
highest/lowest percent of the range: 9.5% 0.0%

[nttcom@localhost postgresql]$ contrib/pgbench/pgbench --exponential=5
starting vacuum...end.
transaction type: Exponential distribution TPC-B (sort of)
scaling factor: 1
exponential threshold: 5.0
decile percents: 39.6% 24.0% 14.6% 8.8% 5.4% 3.3% 2.0% 1.2% 0.7% 0.4%
highest/lowest percent of the range: 4.9% 0.0%

If you have a better method than ours, please share it with us.

> > The caveat that I have is that without these options there is:
> >
> > (1) no report about the actual distributions in the final summary, which
> > depend on the threshold value, and
> >
> > (2) no included means to test the feature, so the first patch is less
> > meaningful if the feature cannot be used simply and requires a custom
> > script.
>
> I personally agree that we likely want that as an additional
> feature. Even if just because it makes the results easier to compare.

If we can have a positive and logical discussion, I will agree with the
proposal to separate the patches. However, I think the strongest opponent
decided by feeling... Actually, he didn't answer our proposal about
understanding the parameterized distribution... So I also think it is a
blocker. The command-line feature is also needed. Besides, is there
another good method? Please share it with us.

Best regards,
--
Mitsumasa KONDO
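For what it's worth, the decile percents printed above can be reproduced analytically rather than by sampling. The sketch below is my own reconstruction of the truncated-exponential math implied by the output, not pgbench code, and the function name is hypothetical:

```python
import math

def exp_decile_percents(threshold, buckets=10):
    """Probability mass in each equal-width bucket of the value range for a
    pgbench-style --exponential=threshold draw, i.e. an exponential density
    exp(-threshold * x) truncated to [0, 1) and renormalized."""
    norm = 1.0 - math.exp(-threshold)
    return [(math.exp(-threshold * i / buckets) -
             math.exp(-threshold * (i + 1) / buckets)) / norm
            for i in range(buckets)]

# Reproduces the --exponential=10 run above:
# prints: 63.2% 23.3% 8.6% 3.1% 1.2% 0.4% 0.2% 0.1% 0.0% 0.0%
print(" ".join("%.1f%%" % (100 * p) for p in exp_decile_percents(10.0)))
```

With threshold 5, the same formula yields 39.6% 24.0% 14.6% ..., matching the second run, so the printed summary really does characterize the distribution without any table inserts.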
Re: [HACKERS] PoC: Partial sort
On Mon, Feb 10, 2014 at 10:59 AM, Alexander Korotkov wrote:
> Done. Patch is split.

I took a quick look at this. Have you thought about making your new
cmpSortSkipCols() function not use real comparisons? Since in the
circumstances in which this optimization is expected to be effective (e.g.
your original example) we can also expect a relatively low cardinality for
the first n indexed attributes (again, as in your original example), in
general when cmpSortSkipCols() is called there is a high chance that it
will return true.

If any pair of tuples (logically adjacent tuples fed into cmpSortSkipCols()
by an index scan in logical order) are not fully equal (i.e. their leading,
indexed attributes are not equal), then we don't care about the details --
we just know that a new sort grouping is required. The idea here is that
you can get away with simple binary equality comparisons, as we do when
considering HOT-safety. Of course, you might find that two bitwise-unequal
values are equal according to their ordinary B-Tree support function 1
comparator (e.g. two numerics that differ only in their display scale).
AFAICT this should be okay, since that just means that you have smaller
sort groupings than strictly necessary. I'm not sure if it's worth it to
more or less duplicate heap_tuple_attr_equals() to save a "mere" n
expensive comparisons, but it's something to think about (actually, there
are probably fewer than even n comparisons in practice because there'll be
a limit).

A similar idea appears in my SortSupport for text ("poor man's normalized
key"/strxfrm()) patch. A poor man's key comparison didn't work out, and
there may be further differences that aren't captured in the special
simple key representation, so we need to do a "proper comparison" to
figure it out for sure.
However, within the sortsupport routine comparator, we know that we're
being called in this context, as a tie-breaker for a poor man's normalized
key comparison that returned 0, and so are optimistic about the two datums
being fully equal. An optimistic memcmp() is attempted before a strcoll()
here if the lengths also match.

I have not actually added special hints so that we're optimistic about
keys being equal in other places (places that have nothing to do with the
general idea of poor man's normalized keys), but that might not be a bad
idea. Actually, it might not be a bad idea to just always have
varstr_cmp() attempt a memcmp() first when two texts have equal length, no
matter how it's called.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
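The "cheap equality check before the expensive comparator" pattern described above can be sketched in miniature. This is a hypothetical illustration, not the actual patch; `locale.strcoll` stands in for the C `strcoll()`:

```python
import locale

def optimistic_text_cmp(a: str, b: str) -> int:
    """Compare two strings, trying a cheap binary equality test first.

    Bitwise-equal strings are equal under any collation, so when the
    caller is likely to be comparing equal values (e.g. breaking a tie
    after a normalized-key comparison returned 0), the fast path usually
    wins and the expensive collation-aware compare is skipped."""
    if len(a) == len(b) and a == b:
        return 0  # fast path: memcmp()-style equality
    # Slow path: full locale-aware comparison (strcoll() in C).
    # Note the asymmetry: bitwise-unequal strings may still compare equal
    # under some collations, so only the equality case can short-circuit.
    return locale.strcoll(a, b)
```

The design point mirrors the email: the fast path is only safe in one direction, which is exactly why it costs nothing when the optimism turns out to be misplaced.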
Re: [HACKERS] Extending MSVC scripts to support --with-extra-version
On Sun, Jul 13, 2014 at 2:39 AM, Magnus Hagander wrote:
> Applied, thanks!

Thanks.
--
Michael
Re: [HACKERS] Selectivity estimation for inet operators
> I have one last comment; after clarifying this I can move it to "ready
> for committer".
>
> 1. In networkjoinsel, to avoid the case of huge statistics, only some of
> the values from the MCV list and histogram are used (calculated using
> SQRT). But in my opinion, if both the histogram and MCV list exist then
> that's fine, but if only MCVs are there, we can match the complete MCV
> list; it will give better accuracy. Other functions like eqjoinsel also
> match the complete MCV list.

I was not sure about reducing the statistics at all. I could not find any
other selectivity estimation function which does this. After testing it
some more, I reached the conclusion that it would be better to only reduce
the values of the outer loop on histogram match. Now it matches the
complete MCV lists to each other. I also switched back to log2() from
sqrt() to make the outer list smaller.

I reconsidered your previous advice to treat a histogram bucket as
partially matched when the constant matches its last boundary, and changed
it that way. It is better than using the selectivity for only one value.
Removing this part also makes the function simpler.

The new version of the patch is attached. While looking at it I found some
other small problems and fixed them. I also realized that I forgot to
support join types other than inner join. Currently, the default
estimation is used for anti joins. I think the patch will need more than a
trivial amount of change to support anti joins. I can work on it later.
While doing that, outer join selectivity estimation can also be improved.
I think the patch is better than nothing in its current state.
diff --git a/src/backend/utils/adt/network_selfuncs.c b/src/backend/utils/adt/network_selfuncs.c
index d0d806f..08ec945 100644
--- a/src/backend/utils/adt/network_selfuncs.c
+++ b/src/backend/utils/adt/network_selfuncs.c
@@ -1,32 +1,626 @@
 /*-
  *
  * network_selfuncs.c
  *	  Functions for selectivity estimation of inet/cidr operators
  *
- * Currently these are just stubs, but we hope to do better soon.
+ * Estimates are based on null fraction, distinct value count, most common
+ * values, and histogram of inet/cidr datatypes.
  *
  * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
  *
  *
  * IDENTIFICATION
  *	  src/backend/utils/adt/network_selfuncs.c
  *
  *-
  */
 #include "postgres.h"
 
+#include <math.h>
+
+#include "access/htup_details.h"
+#include "catalog/pg_collation.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_statistic.h"
+#include "utils/lsyscache.h"
 #include "utils/inet.h"
+#include "utils/selfuncs.h"
 
+/* Default selectivity constant for the inet overlap operator */
+#define DEFAULT_OVERLAP_SEL 0.01
+
+/* Default selectivity constant for the other operators */
+#define DEFAULT_INCLUSION_SEL 0.005
+
+/* Default selectivity for given operator */
+#define DEFAULT_SEL(operator) \
+	((operator) == OID_INET_OVERLAP_OP ? \
+	 DEFAULT_OVERLAP_SEL : DEFAULT_INCLUSION_SEL)
+
+static short int inet_opr_order(Oid operator, bool reversed);
+static Selectivity inet_his_inclusion_selec(Datum *values, int nvalues,
+			Datum *constvalue, short int opr_order);
+static Selectivity inet_mcv_join_selec(Datum *values1, float4 *numbers1,
+			int nvalues1, Datum *values2, float4 *numbers2,
+			int nvalues2, Oid operator, bool reversed);
+static Selectivity inet_mcv_his_selec(Datum *mcv_values, float4 *mcv_numbers,
+			int mcv_nvalues, Datum *his_values, int his_nvalues,
+			int red_nvalues, short int opr_order,
+			Selectivity *max_selec_pointer);
+static Selectivity inet_his_inclusion_join_selec(Datum *his1_values,
+			int his1_nvalues, Datum *his2_values, int his2_nvalues,
+			int red_nvalues, short int opr_order);
+static short int inet_inclusion_cmp(inet *left, inet *right,
+			short int opr_order);
+static short int inet_masklen_inclusion_cmp(inet *left, inet *right,
+			short int opr_order);
+static short int inet_his_match_divider(inet *boundary, inet *query,
+			short int opr_order);
+
+/*
+ * Selectivity estimation for the subnet inclusion operators
+ */
 Datum
 networksel(PG_FUNCTION_ARGS)
 {
-	PG_RETURN_FLOAT8(0.001);
+	PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0);
+
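The "reduce the outer list" idea in the message above — sampling only about log2(n) (previously sqrt(n)) of the outer statistics values so the nested matching loop stays cheap on large histograms — can be sketched as follows. This is a hypothetical illustration of the reduction strategy, not the patch's actual code:

```python
import math

def reduce_outer_values(values, use_log2=True):
    """Pick an evenly spaced subset of a sorted statistics list so that
    the nested matching loop costs O(k * m) instead of O(n * m), where
    k is ~log2(n) (or ~sqrt(n) for the earlier variant)."""
    n = len(values)
    if n == 0:
        return []
    k = max(1, round(math.log2(n)) if use_log2 else round(math.sqrt(n)))
    step = max(1, n // k)
    return values[::step][:k]
```

The trade-off the thread debates is visible here: for a 100-entry histogram, log2 keeps only 7 values where sqrt would keep 10, so the log2 variant is cheaper but coarser, which is why the patch stopped reducing the MCV lists entirely and reduces only the outer histogram side.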
Re: [HACKERS] Extending MSVC scripts to support --with-extra-version
On Mon, Jun 30, 2014 at 1:43 PM, Michael Paquier wrote:
> On Mon, Jun 30, 2014 at 7:05 PM, Asif Naeem wrote:
>> Patch looks good to me. I think it is ready for committer. Thanks.
> Thanks for the extra checks and for taking the time to review it.

Applied, thanks!

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: [HACKERS] Securing "make check" (CVE-2014-0067)
On Fri, Jul 11, 2014 at 12:40:09PM +0300, Christoph Berg wrote:
> > > > > > I believe pg_upgrade itself still needs a fix. While it's not a
> > > > > > security problem to put the socket in $CWD while upgrading (it is
> > > > > > using -c unix_socket_permissions=0700), this behavior is pretty
> > > > > > unexpected, and does fail if your $CWD is > 107 bytes.
>
> Here's the patch. Proposed commit message:
>
>     Create pg_upgrade sockets in temp directories
>
>     pg_upgrade used to use the current directory for UNIX sockets to
>     access the old/new cluster. This fails when the current path is
>     > 107 bytes. Fix by reusing the tempdir code from pg_regress
>     introduced in be76a6d39e2832d4b88c0e1cc381aa44a7f86881. For cleanup,
>     we need to remember up to two directories.

Thanks. Preliminary questions:

> +#ifdef HAVE_UNIX_SOCKETS
> +/* make_temp_sockdir() is invoked at most twice from pg_upgrade.c via
> + * get_sock_dir() */
> +#define MAX_TEMPDIRS 2
> +static int n_tempdirs = 0;	/* actual number of directories created */
> +static const char *temp_sockdir[MAX_TEMPDIRS];
> +#endif

get_sock_dir() currently returns the same directory, the CWD, for both
calls; can't it continue to do so? We already have good reason not to
start two postmasters simultaneously during pg_upgrade.

> +/*
> + * Remove the socket temporary directories. pg_ctl waits for postmaster
> + * shutdown, so we expect the directory to be empty, unless we are
> + * interrupted by a signal, in which case the postmaster will clean up
> + * the sockets, but there's a race condition with us removing the
> + * directory.

What's the reason for addressing that race condition in pg_regress and not
addressing it in pg_upgrade?

--
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com
Re: [HACKERS] SSL information view
On Sat, Jul 12, 2014 at 4:36 PM, Tom Lane wrote:
> Magnus Hagander writes:
>> As an administrator, I find that you fairly often want to know what
>> your current connections are actually using as SSL parameters, and
>> there is currently no other way than gdb to find that out - something
>> we definitely should fix.
>
> I'm wondering whether it's such a great idea that everybody can see
> everybody else's client DN. Other than that, no objection to the
> concept.

I was thinking that's mostly the equivalent of a username, which we do let
everybody see in pg_stat_activity.

>> Second, I was planning to implement it by adding fields to
>> PgBackendStatus and thus to BackendStatusArray, booleans directly in
>> the struct and strings similar to how we track for example hostnames.
>> Anybody see a problem with that?
>
> Space in that array is at a premium, and again the client DN seems
> problematic, in that it's not short and has no clear upper bound.
>
> If you were to drop the DN from the proposed view then I'd be fine
> with this.

The text fields, like hostname, are tracked in separate parts of shared
memory with just a pointer in the main array - I assume that's why, and I
was planning to do the same. We'd have to cap the length of the DN at
something, though (and document it as such), to make it fixed length.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: [HACKERS] SSL information view
Magnus Hagander writes:
> As an administrator, I find that you fairly often want to know what
> your current connections are actually using as SSL parameters, and
> there is currently no other way than gdb to find that out - something
> we definitely should fix.

I'm wondering whether it's such a great idea that everybody can see
everybody else's client DN. Other than that, no objection to the concept.

> Second, I was planning to implement it by adding fields to
> PgBackendStatus and thus to BackendStatusArray, booleans directly in
> the struct and strings similar to how we track for example hostnames.
> Anybody see a problem with that?

Space in that array is at a premium, and again the client DN seems
problematic, in that it's not short and has no clear upper bound.

If you were to drop the DN from the proposed view then I'd be fine
with this.

regards, tom lane
Re: [HACKERS] tab completion for setting search_path
Hi,

On 2014-06-23 19:57:21 -0700, Jeff Janes wrote:
> diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
> index be5c3c5..dcd1b7d
> *** a/src/bin/psql/tab-complete.c
> --- b/src/bin/psql/tab-complete.c
> *** psql_completion(const char *text, int st
> *** 3342,3347 ****
> --- 3342,3354 ----
>   		COMPLETE_WITH_LIST(my_list);
>   	}
> + 	else if (pg_strcasecmp(prev2_wd, "search_path") == 0)
> + 	{
> + 		COMPLETE_WITH_QUERY(Query_for_list_of_schemas
> + 							" AND nspname not like 'pg\\_toast%%' "
> + 							" AND nspname not like 'pg\\_temp%%' "
> + 							" UNION SELECT 'DEFAULT' ");
> + 	}
>   	else
>   	{
>   		static const char *const my_list[] =

I don't particularly like the explicit comparisons using LIKE, but we
can't really do better as we only have pg_my_temp_schema() and
pg_is_other_temp_schema() right now. I was tempted to just add
pg_is_temp_schema() and pg_is_toast_schema(), but we couldn't rely on them
for now anyway due to cross-version compatibility. We really should add
those functions independently of this, though.

I'm also not really happy with the fact that we only complete a single
search_path item. But it's not easy to do better, and looking around,
other places (e.g. DROP TABLE) don't support it either.

I've thought about adding "$user" to the set of completed things, as
Fujii wondered about it, but it turns out completions containing $ don't
work really well because $ is part of WORD_BREAKS. E.g. check out what
happens if you do:

CREATE TABLE "foo$01"();
CREATE TABLE "foo$02"();
DROP TABLE "foo$

which means that a single schema that requires quoting will break
completion of "$user".

Pushed.

Greetings,

Andres Freund

--
Andres Freund	                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] SSL information view
As an administrator, I find that you fairly often want to know what your
current connections are actually using as SSL parameters, and there is
currently no other way than gdb to find that out - something we
definitely should fix.

You can find it out today through libpq (the PQgetssl function), the psql
banner, or contrib/sslinfo. All of them are client side (the sslinfo
module runs on the server, but it's just SQL functions that can show
information about the current connection, nothing that can be used to
inspect other connections).

I recently put together a quick hack
(https://github.com/mhagander/pg_sslstatus) that exposes a view with this
information, but it's definitely hacky, and it really is functionality
that we should include in core. Thus, I'll provide a version of that hack
for 9.5. Before doing that, however, I'd like to ask for opinions :)

The hack currently exposes a separate view that you can join to
pg_stat_activity (or pg_stat_replication) on the pid -- this is sort of
the same way that pg_stat_replication works in the first place. Do we
want something similar to that for a builtin SSL view as well, or do we
want to include the fields directly in pg_stat_activity and
pg_stat_replication?

Second, I was planning to implement it by adding fields to
PgBackendStatus and thus to BackendStatusArray, booleans directly in the
struct and strings similar to how we track for example hostnames.
Anybody see a problem with that?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
[HACKERS] SSL compression info in psql header
Today it is really hard to figure out whether your SSL connection is
actually *using* SSL compression. This got extra hard when the default
value started being influenced by environment variables, at least on many
platforms, after the CRIME attack. ISTM we should be making this easier
for the user.

The attached patch adds compression info to the header of the psql
banner, as that's very non-intrusive. I think this is a small enough
change, yet very useful, that we should squeeze it into 9.4 before the
next beta. Not sure if it qualifies enough as a bug to backpatch further
than that, though.

As far as my research shows, the function SSL_get_current_compression()
which it uses was added in OpenSSL 0.9.6, which is a long time ago (that
branch stopped being maintained in 2004). AFAICT even RHEL *3* shipped
with 0.9.7. So I think we can safely rely on it, especially since we only
check whether it returns NULL or not.

Comments?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index cede72a..b8a8e35 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -1800,8 +1800,9 @@ printSSLInfo(void)
 		return;					/* no SSL */
 
 	SSL_get_cipher_bits(ssl, &sslbits);
-	printf(_("SSL connection (protocol: %s, cipher: %s, bits: %d)\n"),
-		   SSL_get_version(ssl), SSL_get_cipher(ssl), sslbits);
+	printf(_("SSL connection (protocol: %s, cipher: %s, bits: %d, compression: %s)\n"),
+		   SSL_get_version(ssl), SSL_get_cipher(ssl), sslbits,
+		   SSL_get_current_compression(ssl) ? gettext_noop("yes") : gettext_noop("no"));
 #else
Re: [HACKERS] Missing autocomplete for CREATE DATABASE
On Fri, Jul 11, 2014 at 1:10 PM, Magnus Hagander wrote:
> On Fri, Jul 11, 2014 at 12:01 AM, Vik Fearing wrote:
>> On 07/10/2014 09:32 PM, Magnus Hagander wrote:
>>> It seems psql is missing autocomplete entries for LC_COLLATE and
>>> LC_CTYPE for the CREATE DATABASE command. Attached patch adds that.
>>>
>>> I doubt this is important enough to backpatch - thoughts?
>>
>> It won't apply to current head, but otherwise I see no problem with it.
>
> Meh, thanks for pointing that out. I git-fetch:ed from the wrong
> repository :)

Applied and backpatched.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: [HACKERS] tweaking NTUP_PER_BUCKET
On 12.7.2014 11:39, Simon Riggs wrote:
> On 11 July 2014 18:25, Tomas Vondra wrote:
>
>> Turns out getting this working properly will be quite complicated.
>
> Let's keep this patch simple then. Later research can be another patch.

Well, the dense allocation is independent of the NTUP_PER_BUCKET changes,
and only happened to be discussed here because it's related to hash
joins. My plan was to keep it as a separate patch, thus not making the
NTUP patch any more complex.

> In terms of memory pressure, having larger joins go 4x faster has a
> much more significant reducing effect on memory pressure than
> anything else. So my earlier concerns seem less of a concern.

OK.

> So let's just get this change done and then do more later.

There's no way back, sadly. The dense allocation turned into a
challenge. I like challenges. I have to solve it or I won't be able to
sleep.

regards
Tomas
Re: [HACKERS] proposal: rounding up time value less than its unit.
Hi Robert,

Thank you for checking this! I've added it to the commitfest:
https://commitfest.postgresql.org/action/patch_view?id=1507

regards,
Tomonari Katsumata

2014-07-12 6:07 GMT+09:00 Robert Haas:
> On Thu, Jul 10, 2014 at 2:52 AM, Tomonari Katsumata wrote:
> > A couple of weeks ago, I posted a mail to pgsql-doc.
> > http://www.postgresql.org/message-id/53992ff8.2060...@po.ntts.co.jp
> >
> > In this thread, I concluded that it's better to round up a value which
> > is less than its unit. The current behavior (rounding down) carries an
> > undesirable risk, because some GUCs give 0 a special meaning.
> >
> > I then made a patch for this. Please check the attached patch.
>
> Thanks for the patch. Please add it here:
>
> https://commitfest.postgresql.org/action/commitfest_view/open
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
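The proposal boils down to a ceiling division when converting a raw setting into the parameter's own unit: any nonzero remainder bumps the result, so a small nonzero value never silently becomes 0. A minimal sketch under hypothetical names (not the actual guc.c code):

```python
def convert_to_unit(value, unit_size, round_up=True):
    """Convert a value given in base units (e.g. seconds) into a GUC's
    own unit (e.g. minutes, unit_size 60).

    Truncation can silently turn a small nonzero setting into 0, which
    many GUCs treat specially (often meaning 'disabled'); rounding up
    preserves the intent that the feature stays enabled."""
    quotient, remainder = divmod(value, unit_size)
    if round_up and remainder != 0:
        quotient += 1
    return quotient

convert_to_unit(30, 60)                  # 30 s becomes 1 min, not 0
convert_to_unit(30, 60, round_up=False)  # old behavior: truncates to 0
```

So a user who sets, say, a 30-second timeout on a minute-granularity GUC gets a 1-minute timeout under the proposal, instead of accidentally disabling the feature entirely.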
Re: [HACKERS] tweaking NTUP_PER_BUCKET
On 11 July 2014 18:25, Tomas Vondra wrote:
> Turns out getting this working properly will be quite complicated.

Let's keep this patch simple then. Later research can be another patch.

In terms of memory pressure, having larger joins go 4x faster has a much
more significant reducing effect on memory pressure than anything else.
So my earlier concerns seem less of a concern.

So let's just get this change done and then do more later.

--
Simon Riggs	               http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services