Re: [HACKERS] KNN searches support for SP-GiST [GSOC'14]
On 08/20/2014 03:35 AM, Vladislav Sterzhanov wrote: Hi there, pg-Hackers! Here I go with the patch which brings up the possibility to perform nearest-neighbour searches on SP-GiSTs (as of now it includes implementations for quad and kd trees). Pre-reviewed by my GSoC mentor Alexander Korotkov. Sample benchmarking script included in the attachment (dumps the current geonames archive and runs several searches on the (latitude, longitude) points), which demonstrates the dramatic improvements against plain searches and sorting. Cool! I think this needs a high-level description in access/spgist/README on how this works. You can probably copy-paste the similar description from gist's README, because it's the same design. If I understand correctly, the support functions return distances between the nodes and the query, and the SP-GiST code uses those distances to return the rows in order. An important detail there is that the distance returned for an inner node is a lower bound of the distance of any node in that branch. A binary heap would be a better data structure than a red-black tree for the queue of nodes to visit/return. I know that GiST also uses a red-black tree for the same thing, but that's no reason not to do better here. There is a binary heap implementation in the backend too (see src/backend/lib/binaryheap.c), so it's not any more difficult to use. Does the code handle ties correctly? The distance function is just an approximation, so it's possible that two items have the same distance but are not equal according to the ordering operator. As an extreme example, if you hack the distance function to always return 0.0, do you still get correct results (albeit slowly)? The new suppLen field in spgConfigOut is not mentioned in the docs. Why not support pass-by-value supplementary values? 
A couple of quick comments on the code: @@ -137,8 +138,8 @@ spg_quad_picksplit(PG_FUNCTION_ARGS) { spgPickSplitIn *in = (spgPickSplitIn *) PG_GETARG_POINTER(0); spgPickSplitOut *out = (spgPickSplitOut *) PG_GETARG_POINTER(1); - int i; - Point *centroid; + int i; + Point *centroid; #ifdef USE_MEDIAN /* Use the median values of x and y as the centroid point */ Please remove all unnecessary whitespace changes like this. @@ -213,14 +215,19 @@ spg_quad_inner_consistent(PG_FUNCTION_ARGS) } Assert(in->nNodes == 4); + + if (in->norderbys > 0) + { + out->distances = palloc(in->nNodes * sizeof (double *)); + } I think that should be "sizeof(double)". + *distances = (double *) malloc(norderbys * sizeof (double *)); No mallocs allowed in the backend, should be palloc. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
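The ordered-return design described above — inner nodes report a lower bound on the distance to anything in their branch, leaves report an exact distance, and a min-priority-queue pops entries in distance order — can be sketched standalone. This is a toy 1-D illustration in Python (not the SP-GiST C code); the point groups stand in for inner nodes, and all names are invented for the sketch:

```python
import heapq

def lower_bound(q, lo, hi):
    """Lower bound on |q - p| for any point p in the interval [lo, hi]."""
    if q < lo:
        return lo - q
    if q > hi:
        return q - hi
    return 0.0

def knn(q, groups, k):
    """Best-first search: expanding an inner entry can never reveal a point
    closer than that entry's lower bound, so leaves pop off the min-heap in
    nondecreasing exact distance."""
    heap = []
    for gid, pts in enumerate(groups):   # each group plays an "inner node"
        heapq.heappush(heap,
                       (lower_bound(q, min(pts), max(pts)), False, gid, pts))
    out = []
    while heap and len(out) < k:
        dist, is_leaf, _, item = heapq.heappop(heap)
        if is_leaf:
            out.append(item)
        else:
            for p in item:               # expand the inner node into leaves
                heapq.heappush(heap, (abs(q - p), True, 0, p))
    return out
```

With `groups = [[1, 5, 9], [12, 20, 3]]` and `q = 10`, the three nearest come out as `[9, 12, 5]` even though the second group's bounding interval contains the query. Hacking `lower_bound` to always return 0.0 (the extreme tie test suggested above) still yields the same answer, just with every inner node expanded up front.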
[HACKERS] August commitfest
Per the schedule agreed on in the Developer Meeting in Ottawa, the August commitfest began five days ago. We don't seem to have a commitfest manager, and the commitfest has so far failed to manage itself. To get things going, I'm picking up the Commitfest Manager Mace I found from behind the dumpster now. - Heikki
Re: [HACKERS] August commitfest
On Wed, Aug 20, 2014 at 7:17 PM, Heikki Linnakangas wrote: > Per the schedule agreed on in the Developer Meeting in Ottawa, the August > commitfest began five days ago. > > We don't seem to have a commitfest manager, and the commitfest has so far > failed to manage itself. To get things going, I'm picking up the Commitfest > Manager Mace I found from behind the dumpster now. Thanks Heikki, I was wondering if you knew what the plan is for the remaining ready-for-committer patches from the June commitfest? Regards David Rowley
Re: [HACKERS] New Model For Role Attributes and Fine Grained Permssions
On 08/19/2014 04:27 AM, Brightwell, Adam wrote: Hi All, This is a "proof-of-concept" patch for a new model around role attributes and fine grained permissions meant to alleviate the current over-dependence on superuser. Hmm. How does this get us any closer to fine-grained permissions? I guess we need this, so that you can grant/revoke the permissions, but I thought the hard part is defining what the fine-grained permissions are, in a way that you can't escalate yourself to full superuser through any of them. The new syntax could be useful even if we never get around to doing anything about current superuser checks, so I'm going to give this a quick review just on its own merits. Please add documentation. That's certainly required before this can be committed, but it'll make reviewing the syntax much easier too. This is not yet complete and only serves as a proof-of-concept at this point, but I wanted to share it in the hopes of receiving comments, suggestions and general feedback. The general gist of this patch is as follows: * New system catalog "pg_permission" that relates role IDs to permissions. * New syntax. - GRANT <permission> TO <role> - REVOKE <permission> FROM <role> where <permission> is one of an enumerated value, such as "CREATE ROLE" or "CREATE DATABASE". The syntax for GRANT/REVOKE is quite complicated already. Do we want to overload it even more? Also keep in mind that the SQL standard specifies GRANT/REVOKE, so we have to be careful not to clash with the SQL standard syntax, or any conceivable future syntax the SQL committee might add to it (although we have plenty of PostgreSQL extensions in it already). For example, this is currently legal: GRANT CREATE ALL ON TABLESPACE TO Is that close enough to the new syntax to cause confusion? Does the new syntax work for all the more fine-grained permissions you have in mind? I'm particularly worried about conflicts with this existing syntax: GRANT role_name [, ...] TO role_name [, ...] 
[ WITH ADMIN OPTION ] - Heikki
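The catalog-plus-check shape being proposed can be sketched minimally in Python (not the patch's actual C code or catalog layout; all names here are invented for illustration): a toy pg_permission mapping and a lookup that keeps superuser implying everything.

```python
# Toy stand-in for the proposed pg_permission catalog:
# role name -> set of granted permission attributes.
pg_permission = {}

def grant(role, permission):
    """GRANT <permission> TO <role>"""
    pg_permission.setdefault(role, set()).add(permission)

def revoke(role, permission):
    """REVOKE <permission> FROM <role>"""
    pg_permission.get(role, set()).discard(permission)

def has_permission(role, permission, is_superuser=False):
    """Superuser still implies every permission; otherwise consult the grants."""
    return is_superuser or permission in pg_permission.get(role, set())
```

The hard part the review raises lives outside this sketch: choosing an enumeration of permissions such that none of them can be parlayed into full superuser.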
Re: [HACKERS] August commitfest
On 08/20/2014 10:42 AM, David Rowley wrote: On Wed, Aug 20, 2014 at 7:17 PM, Heikki Linnakangas wrote: Per the schedule agreed on in the Developer Meeting in Ottawa, the August commitfest began five days ago. We don't seem to have a commitfest manager, and the commitfest has so far failed to manage itself. To get things going, I'm picking up the Commitfest Manager Mace I found from behind the dumpster now. Thanks Heikki, I was wondering if you knew what the plan is for the remaining ready for committer patches from the June commitfest? There is no plan. I just moved them to the August commitfest, hopefully they will actually get committed or rejected by a committer. I'm doing a quick pass through the list of patches now, and after that I'll start nagging people to get stuff reviewed/committed. (There are a bunch of patches that are in Waiting on Author state with no activity for months; I'm marking them directly as Returned with Feedback.) - Heikki
Re: [HACKERS] [PATCH] introduce XLogLockBlockRangeForCleanup()
On 07/07/2014 11:46 AM, Abhijit Menon-Sen wrote: At 2014-07-07 14:02:15 +0530, amit.khande...@enterprisedb.com wrote: Other than some minor comments as mentioned below, I don't have any more issues, it looks all good. Thank you. (Updated patch attached.) I don't think the new GetBufferWithoutRelcache function is in line with the existing ReadBuffer API. I think it would be better to add a new ReadBufferMode - RBM_CACHE_ONLY? - that only returns the buffer if it's already in cache, and InvalidBuffer otherwise. - Heikki
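The semantics of the suggested new mode can be sketched like this (toy Python model of a buffer cache, not the actual bufmgr API; `None` stands in for InvalidBuffer, and the mode names mirror the suggestion above):

```python
RBM_NORMAL = "RBM_NORMAL"
RBM_CACHE_ONLY = "RBM_CACHE_ONLY"   # the proposed new ReadBufferMode

def read_buffer(cache, blockno, mode, read_from_disk):
    """Return the cached buffer if present; in cache-only mode never do
    I/O and return None (InvalidBuffer) instead of reading the block."""
    if blockno in cache:
        return cache[blockno]
    if mode == RBM_CACHE_ONLY:
        return None
    buf = read_from_disk(blockno)   # normal modes read the block and cache it
    cache[blockno] = buf
    return buf
```

The appeal over a separate GetBufferWithoutRelcache entry point is that callers keep going through the one ReadBuffer-style interface and only the mode changes.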
Re: [HACKERS] August commitfest
I have cleaned up the Commitfest of patches that have been in Waiting for Author state for weeks or more. I believe the list is now an accurate representation of the actual state of the patches. Patch authors - Please check if a patch of yours is in Waiting for Author state. It means that you need to amend the patch and post a new version. If you do nothing, the patch will be marked returned with feedback. Also, please pick a patch or two, submitted by other people, and review them. Reviewers - Please review patches. Every little helps. - Heikki
Re: [HACKERS] [PATCH] Fix search_path default value separator.
On 08/15/2014 04:58 PM, Bruce Momjian wrote: On Fri, Aug 15, 2014 at 10:40:59PM +0900, Fujii Masao wrote: Heh. I'm not particularly averse to changing this, but I guess I don't see any particular benefit of changing it either. Either comma or comma-space is a legal separator, so why worry about it? This change might cause me to update the existing documents (which I need to maintain in my company) including the output example of default search_path. If the change is an improvement, I'd be happy to do that, but it seems not. Also there might be some PostgreSQL extensions whose regression tests show the default search_path. This patch would make their developers spend the time to update the test. I'm sure that they are fine with that if it's for an improvement. But not. Well, we rename GUCs often too for clarity, so I don't see adjusting white-space as something to avoid either. It is always about short-term adjustments vs. long-term clarity. I think this is an improvement, although a really minor one. Although Robert & Fujii questioned if this is worth it, I didn't hear anyone objecting strongly, so committed. - Heikki
Re: [HACKERS] better atomics - v0.5
Hi Andres, Are you planning to continue working on this? Summarizing the discussion so far: * Robert listed a bunch of little cleanup tasks (http://www.postgresql.org/message-id/ca+tgmozshvvqjakul6p3kdvzvpibtgkzoti3m+fvvjg5v+x...@mail.gmail.com). Amit posted yet more detailed comments. * We talked about changing the file layout. I think everyone is happy with your proposal here: http://www.postgresql.org/message-id/20140702131729.gp21...@awork2.anarazel.de, with an overview description of what goes where (http://www.postgresql.org/message-id/CA+TgmoYuxfsm2dSy48=jgutrh1mozrvmjlhgqrfku7_wpv-...@mail.gmail.com) * Talked about nested spinlocks. The consensus seems to be to (continue to) forbid nested spinlocks, but allow atomic operations while holding a spinlock. When atomics are emulated with spinlocks, it's OK to acquire the emulation spinlock while holding another spinlock. * Talked about whether emulating atomics with spinlocks is a good idea. You posted performance results showing that at least with the patch to use atomics to implement LWLocks, the emulation performs more or less the same as the current spinlock-based implementation. I believe everyone was more or less satisfied with that. So ISTM we have consensus that the approach to spinlock emulation and nesting stuff is OK. To finish the patch, you'll just need to address the file layout and the laundry list of other little details that have been raised. - Heikki
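The emulation rule summarized above — on platforms without native atomics, each atomic variable falls back to operations guarded by its own spinlock, and taking that emulation lock while holding an unrelated spinlock is permitted — can be sketched in toy form (Python, with a threading.Lock standing in for the spinlock; illustrative only, not the patch's C API):

```python
import threading

class EmulatedAtomicU32:
    """Fallback atomic uint32: every operation takes a per-variable lock
    (the 'emulation spinlock'), so each operation is indivisible even
    though the platform has no native atomic instructions."""
    def __init__(self, value=0):
        self._value = value % (1 << 32)
        self._lock = threading.Lock()

    def fetch_add(self, delta):
        """Atomically add delta (with uint32 wraparound); return the old value."""
        with self._lock:
            old = self._value
            self._value = (old + delta) % (1 << 32)
            return old

    def compare_exchange(self, expected, newval):
        """Atomically set to newval iff the current value equals expected."""
        with self._lock:
            if self._value == expected:
                self._value = newval % (1 << 32)
                return True
            return False
```

The nesting rule then reads naturally: user code may call `fetch_add` while holding some other spinlock, because the emulation lock is internal, is always the innermost lock taken, and is released before the operation returns.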
Re: [HACKERS] implement subject alternative names support for SSL connections
On 07/25/2014 07:10 PM, Alexey Klyukin wrote: Greetings, I'd like to propose a patch for checking subject alternative names entry in the SSL certificate for DNS names during SSL authentication. Thanks! I just ran into this missing feature last week, while working on my SSL test suite. So +1 for having the feature. This patch needs to be rebased over current master branch, thanks to my refactoring that moved all OpenSSL-specific stuff to be-secure-openssl.c. - Heikki
Re: Compute attr_needed for child relations (was Re: [HACKERS] inherit support for foreign tables)
Hi Ashutosh, (2014/08/14 22:30), Ashutosh Bapat wrote: On Thu, Aug 14, 2014 at 10:05 AM, Etsuro Fujita wrote: (2014/08/08 18:51), Etsuro Fujita wrote: (2014/06/30 22:48), Tom Lane wrote: I wonder whether it isn't time to change that. It was coded like that originally only because calculating the values would've been a waste of cycles at the time. But this is at least the third place where it'd be useful to have attr_needed for child rels. I've revised the patch. There was a problem with the previous patch, which will be described below. Attached is the updated version of the patch addressing that. Here are some more comments. Thank you for the further review! attr_needed also has the attributes used in the restriction clauses as seen in distribute_qual_to_rels(), so it looks unnecessary to call pull_varattnos() on the clauses in baserestrictinfo in the functions check_selective_binary_conversion(), remove_unused_subquery_outputs(), and check_index_only(). IIUC, I think it's *necessary* to do that in those functions, since the attributes used in the restriction clauses in baserestrictinfo are not added to attr_needed due to the following code in distribute_qual_to_rels. /* * If it's a join clause (either naturally, or because delayed by * outer-join rules), add vars used in the clause to targetlists of their * relations, so that they will be emitted by the plan nodes that scan * those relations (else they won't be available at the join node!). * * Note: if the clause gets absorbed into an EquivalenceClass then this * may be unnecessary, but for now we have to do it to cover the case * where the EC becomes ec_broken and we end up reinserting the original * clauses into the plan. 
*/ if (bms_membership(relids) == BMS_MULTIPLE) { List *vars = pull_var_clause(clause, PVC_RECURSE_AGGREGATES, PVC_INCLUDE_PLACEHOLDERS); add_vars_to_targetlist(root, vars, relids, false); list_free(vars); } Although in the case of RTE_RELATION, the 0th entry in attr_needed corresponds to FirstLowInvalidHeapAttributeNumber + 1, it's always safer to use RelOptInfo::min_attr, in case get_relation_info() wants to change that assumption or somewhere down the line some other part of the code wants to change the attr_needed information. It may be unlikely that it would change, but it is better to stick to the comments in RelOptInfo 443 AttrNumber min_attr; /* smallest attrno of rel (often <0) */ 444 AttrNumber max_attr; /* largest attrno of rel */ 445 Relids *attr_needed;/* array indexed [min_attr .. max_attr] */ Good point! Attached is the revised version of the patch. Thanks, Best regards, Etsuro Fujita *** a/contrib/file_fdw/file_fdw.c --- b/contrib/file_fdw/file_fdw.c *** *** 799,806 check_selective_binary_conversion(RelOptInfo *baserel, } /* Collect all the attributes needed for joins or final output. */ ! pull_varattnos((Node *) baserel->reltargetlist, baserel->relid, ! &attrs_used); /* Add all the attributes used by restriction clauses. */ foreach(lc, baserel->baserestrictinfo) --- 799,810 } /* Collect all the attributes needed for joins or final output. */ ! for (i = baserel->min_attr; i <= baserel->max_attr; i++) ! { ! if (!bms_is_empty(baserel->attr_needed[i - baserel->min_attr])) ! attrs_used = bms_add_member(attrs_used, ! i - FirstLowInvalidHeapAttributeNumber); ! } /* Add all the attributes used by restriction clauses. */ foreach(lc, baserel->baserestrictinfo) *** a/src/backend/optimizer/path/allpaths.c --- b/src/backend/optimizer/path/allpaths.c *** *** 577,588 set_append_rel_size(PlannerInfo *root, RelOptInfo *rel, childrel->has_eclass_joins = rel->has_eclass_joins; /* ! * Note: we could compute appropriate attr_needed data for the child's ! 
* variables, by transforming the parent's attr_needed through the ! * translated_vars mapping. However, currently there's no need ! * because attr_needed is only examined for base relations not ! * otherrels. So we just leave the child's attr_needed empty. */ /* * Compute the child's size. --- 577,585 childrel->has_eclass_joins = rel->has_eclass_joins; /* ! * Compute the child's attr_needed. */ + adjust_appendrel_attr_needed(rel, childrel, appinfo); /* * Compute the child's size. *** *** 2173,2178 remove_unused_subquery_outputs(Query *subquery, RelOptInfo *rel) --- 2170,2176 { Bitmapset *attrs_used = NULL; ListCell *lc; + int i; /* * Do nothing if subquery has UNION/INTERSECT/EXCEPT: in principle we *** **
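The check_selective_binary_conversion() hunk in the patch above replaces a pull_varattnos() call with a scan of the attr_needed array: any attribute whose relid set is non-empty is needed, and every attno is shifted by FirstLowInvalidHeapAttributeNumber so that negative system-attribute numbers index from zero. In toy Python form (the -8 constant is an assumption matching the backend macro of that era, and the function name is invented):

```python
FIRST_LOW_INVALID_HEAP_ATTNO = -8   # assumed value of FirstLowInvalidHeapAttributeNumber

def collect_used_attrs(attr_needed, min_attr):
    """Mirror the patched loop: an attribute is 'used' when its attr_needed
    entry (the set of relids that need it) is non-empty; the result set is
    offset so system attributes land at non-negative positions."""
    used = set()
    for i, needed_by in enumerate(attr_needed):
        if needed_by:                   # the bms_is_empty() test in the hunk
            attno = min_attr + i
            used.add(attno - FIRST_LOW_INVALID_HEAP_ATTNO)
    return used
```

This is exactly why Ashutosh's point about indexing via RelOptInfo::min_attr matters: the caller passes min_attr rather than hard-coding where the array starts.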
Re: [HACKERS] better atomics - v0.5
Hi, On 2014-08-20 12:43:05 +0300, Heikki Linnakangas wrote: > Are you planning to continue working on this? Yes, I am! I've recently been on holiday and now I'm busy with catching up with everything that has happened since. I hope to have a next version ready early next week. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] inherit support for foreign tables
Hi Noah, Thank you for the review! (2014/07/02 11:23), Noah Misch wrote: On Fri, Jun 20, 2014 at 05:04:06PM +0900, Kyotaro HORIGUCHI wrote: Attached is the rebased patch of v11 up to the current master. I've been studying this patch. SELECT FOR UPDATE on the inheritance parent fails with a can't-happen error condition, even when SELECT FOR UPDATE on the child foreign table alone would have succeeded. To fix this, I've modified the planner and executor so that the planner adds wholerow as well as ctid and tableoid as resjunk output columns to the plan for an inheritance tree that contains foreign tables, and that while the executor uses the ctid and tableoid in the EPQ processing for child regular tables as before, the executor uses the wholerow and tableoid for child foreign tables. Patch attached. This is based on the patch [1]. The patch adds zero tests. Add principal tests to postgres_fdw.sql. Also, create a child foreign table in foreign_data.sql; this will make dump/reload tests of the regression database exercise an inheritance tree that includes a foreign table. Done. I used your tests as reference. Thanks! The inheritance section of ddl.sgml should mention child foreign tables, at least briefly. Consider mentioning it in the partitioning section, too. Done. Your chosen ANALYZE behavior is fair, but the messaging from a database-wide ANALYZE VERBOSE needs work: INFO: analyzing "test_foreign_inherit.parent" INFO: "parent": scanned 1 of 1 pages, containing 1 live rows and 0 dead rows; 1 rows in sample, 1 estimated total rows INFO: analyzing "test_foreign_inherit.parent" inheritance tree WARNING: relcache reference leak: relation "child" not closed WARNING: relcache reference leak: relation "tchild" not closed WARNING: relcache reference leak: relation "parent" not closed Please arrange to omit the 'analyzing "tablename" inheritance tree' message, since this ANALYZE actually skipped that task. (The warnings obviously need a fix, too.) 
I do find it awkward that adding a foreign table to an inheritance tree will make autovacuum stop collecting statistics on that inheritance tree, but I can't think of a better policy. I think it would be better if this were handled in the same way as an inheritance tree that turns out to be a single table that doesn't have any descendants in acquire_inherited_sample_rows(). That would still output the message as shown below, but I think that that would be more consistent with the existing code. The patch now works that way. (The warnings are also fixed.) INFO: analyzing "public.parent" INFO: "parent": scanned 0 of 0 pages, containing 0 live rows and 0 dead rows; 0 rows in sample, 0 estimated total rows INFO: analyzing "public.parent" inheritance tree INFO: analyzing "pg_catalog.pg_authid" INFO: "pg_authid": scanned 1 of 1 pages, containing 1 live rows and 0 dead rows; 1 rows in sample, 1 estimated total rows The rest of these review comments are strictly observations; they're not requests for you to change the patch, but they may deserve more discussion. I'd like to give my opinions on those things later on. Sorry for the long delay. [1] http://www.postgresql.org/message-id/53f4707c.8030...@lab.ntt.co.jp Best regards, Etsuro Fujita *** a/contrib/file_fdw/file_fdw.c --- b/contrib/file_fdw/file_fdw.c *** *** 115,120 static void fileGetForeignRelSize(PlannerInfo *root, --- 115,125 static void fileGetForeignPaths(PlannerInfo *root, RelOptInfo *baserel, Oid foreigntableid); + static ForeignPath *fileReparameterizeForeignPath(PlannerInfo *root, + RelOptInfo *baserel, + Oid foreigntableid, + ForeignPath *path, + Relids required_outer); static ForeignScan *fileGetForeignPlan(PlannerInfo *root, RelOptInfo *baserel, Oid foreigntableid, *** *** 143,149 static bool check_selective_binary_conversion(RelOptInfo *baserel, static void estimate_size(PlannerInfo *root, RelOptInfo *baserel, FileFdwPlanState *fdw_private); static void estimate_costs(PlannerInfo *root, RelOptInfo *baserel, ! 
FileFdwPlanState *fdw_private, Cost *startup_cost, Cost *total_cost); static int file_acquire_sample_rows(Relation onerel, int elevel, HeapTuple *rows, int targrows, --- 148,154 static void estimate_size(PlannerInfo *root, RelOptInfo *baserel, FileFdwPlanState *fdw_private); static void estimate_costs(PlannerInfo *root, RelOptInfo *baserel, ! FileFdwPlanState *fdw_private, List *join_conds, Cost *startup_cost, Cost *total_cost); static int file_acquire_sample_rows(Relation onerel, int elevel, HeapTuple *rows, int targrows, *** *** 161,166 file_fdw_handler(PG_FUNCTION_ARGS) --- 166,172 fdwroutine->GetForeignRelSize = fileGetForeignRelSize; fdwroutine->GetForeignPaths = fileGetForeignPaths; + fdwroutine->Reparam
Re: [HACKERS] August commitfest
Hi Heikki, (2014/08/20 17:50), Heikki Linnakangas wrote: I have cleaned up the Commitfest of patches that have been in Waiting for Author state for weeks or more. I believe the list is now an accurate representation of the actual state of the patches. Thank you for the work! I changed the state of the following patch from "Returned with Feedback" to "Needs Review". https://commitfest.postgresql.org/action/patch_view?id=1386 Patch authors - Also, please pick a patch or two, submitted by other people, and review them. Will do. Thanks, Best regards, Etsuro Fujita
Re: [HACKERS] add line number as prompt option to psql
Hi, I have reviewed this: I have initialized cur_lineno to UINTMAX - 2 and then observed the following behaviour to check wrap-around. postgres=# \set PROMPT1 '%/[%l]%R%# ' postgres[18446744073709551613]=# \set PROMPT2 '%/[%l]%R%# ' postgres[18446744073709551613]=# select postgres[18446744073709551614]-# a postgres[18446744073709551615]-# , postgres[0]-# b postgres[1]-# from postgres[2]-# dual; It wraps to 0, whereas line numbers always start with 1. Any issues? I would like to ignore this, as UINTMAX lines are too much for an input buffer to hold. There is almost no chance of hitting this. However, I think you need to use UINT64_FORMAT while printing a uint64 number. Currently it is %u, which wraps around at UINT_MAX. See how pset.lineno is displayed. Apart from this, I didn't see any issues in my testing. Patch applies cleanly. make/make install/initdb/make check all run well. Thanks -- Jeevan B Chalke Principal Software Engineer, Product Development EnterpriseDB Corporation The Enterprise PostgreSQL Company
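The two behaviours in the review — the counter wrapping to 0 at its maximum, and %u effectively showing only the low 32 bits of a 64-bit counter — can be modelled in isolation (Python model of fixed-width unsigned counters; `printf_u` reflects what commonly happens on mainstream ABIs, though strictly passing a uint64 to %u is undefined in C):

```python
def next_lineno(lineno, bits=64):
    """Advance an unsigned counter of the given width, wrapping to 0 past
    the maximum -- the behaviour observed for cur_lineno above."""
    return (lineno + 1) % (1 << bits)

def printf_u(value):
    """Model of printing a uint64 with %u: only the low 32 bits survive,
    which is why UINT64_FORMAT is needed for pset.lineno."""
    return value & 0xFFFFFFFF
```

So the transcript's jump from 18446744073709551615 to 0 is the 64-bit wrap, while a %u format string would have misprinted long before that point.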
Re: [HACKERS] tweaking NTUP_PER_BUCKET
On 07/20/2014 07:17 PM, Tomas Vondra wrote: On 19.7.2014 20:24, Tomas Vondra wrote: On 13.7.2014 21:32, Tomas Vondra wrote: The current patch only implements this for tuples in the main hash table, not for skew buckets. I plan to do that, but it will require separate chunks for each skew bucket (so we can remove it without messing with all of them). The chunks for skew buckets should be smaller (I'm thinking about ~4kB), because there'll be more of them, but OTOH those buckets are for frequent values so the chunks should not remain empty. I've looked into extending the dense allocation to the skew buckets, and I think we shouldn't do that. I got about 50% of the changes and then just threw it out because it turned out quite pointless. The amount of memory for skew buckets is limited to 2% of work mem, so even with 100% overhead it'll use ~4% of work mem. So there's pretty much nothing to gain here. So the additional complexity introduced by the dense allocation seems pretty pointless. I'm not entirely happy with the code allocating some memory densely and some using traditional palloc, but it certainly seems cleaner than the code I had. So I think the patch is mostly OK as is. Attached is v4 of the patch, mostly with minor improvements and cleanups (removed MemoryContextStats, code related to skew buckets). Can you remind me what kind of queries this is supposed to help with? Could you do some performance testing to demonstrate the effect? And also a worst-case scenario. At a quick glance, I think you're missing a fairly obvious trick in ExecHashIncreaseNumBatches: if a chunk contains only a single tuple, you can avoid copying it to a new chunk and just link the old chunk to the new list instead. Not sure if ExecHashIncreaseNumBatches is called often enough for that to be noticeable, but at least it should help in extreme cases where you push around huge > 100MB tuples. 
- Heikki
Re: [HACKERS] Hokey wrong versions of libpq in apt.postgresql.org
On Sat, Aug 9, 2014 at 2:23 AM, Tom Lane wrote: > If the Debian guidelines think that only SO major version need > be considered, they're wrong, at least for the way we've been treating > that. The Debian approach is that you should have precisely one installed copy of a library for each soname. I guess there's no particular reason you can't have multiple versions in the repository (possibly built by different source packages) but users will only be able to install one of them. This follows from there being a single /usr/lib and ldconfig picking a single version for each soname. -- greg
Re: [HACKERS] [PATCH] Incremental backup: add backup profile to base backup
I think this has had enough review for a WIP patch. I'm marking this as Returned with Feedback in the commitfest because: * should use LSNs instead of an md5 checksum * this doesn't do anything useful on its own, hence would need to see the whole solution before committing * not clear how this would be extended if you wanted to do diffs more fine-grained than file-level. But please feel free to continue discussing those items. On 08/18/2014 03:04 AM, Marco Nenciarini wrote: Hi Hackers, This is the first piece of file level incremental backup support, as described on the wiki page https://wiki.postgresql.org/wiki/Incremental_backup It is not yet complete, but I wish to share it on the list to receive comments and suggestions. The point of the patch is adding an option to pg_basebackup and replication protocol BASE_BACKUP command to generate a backup_profile file. When taking a full backup with pg_basebackup, the user can request Postgres to generate a backup_profile file through the --profile option (-B short option, which I've arbitrarily picked because both -P and -p were already taken) At the moment the backup profile consists of a file with one line per file detailing modification time, md5, size, tablespace and path relative to tablespace root (PGDATA or the tablespace) To calculate the md5 checksum I've used the md5 code present in pgcrypto contrib as the code in src/include/libpq/md5.h is not suitable for large files. Since a core feature cannot depend on a piece of contrib, I've moved the files contrib/pgcrypto/md5.c contrib/pgcrypto/md5.h to src/backend/utils/hash/md5.c src/include/utils/md5.h changing the pgcrypto extension to use them. There are still some TODOs: * User documentation * Remove the pg_basebackup code duplication I've introduced with the ReceiveAndUnpackTarFileToDir function, which is almost the same as ReceiveAndUnpackTarFile but does not expect to handle a tablespace. It instead simply extracts a tar stream in a destination directory. 
The latter could probably be rewritten using the former as a component, but it needs some adjustment to the "progress reporting" part, which is not present at the moment in ReceiveAndUnpackTarFileToDir. * Add a header section to the backup_profile file, which at the moment contains only the body part. I'm thinking of changing the original design and putting the whole backup label as the header, which is IMHO clearer and well known. I would use something like: START WAL LOCATION: 0/E28 (file 0001000E) CHECKPOINT LOCATION: 0/E60 BACKUP METHOD: streamed BACKUP FROM: master START TIME: 2014-08-14 18:54:01 CEST LABEL: pg_basebackup base backup END LABEL I've attached the current patch based on master. Any comments will be appreciated. Regards, Marco -- - Heikki
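The "use LSNs instead of md5" review point boils down to: a file needs re-copying iff some page in it carries an LSN past the previous backup's start LSN, which avoids reading and hashing unchanged data entirely. A small sketch using PostgreSQL's textual hi/lo LSN notation, as seen in the backup label above (function names are invented for the sketch):

```python
def parse_lsn(text):
    """Parse the textual 'X/Y' LSN form (two hex halves) into one 64-bit
    integer, e.g. the '0/E28' in 'START WAL LOCATION: 0/E28'."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def file_changed_since(page_lsns, prev_backup_start_lsn):
    """File-level incremental decision: the file must be copied iff any of
    its pages was WAL-logged after the previous backup started."""
    return any(lsn > prev_backup_start_lsn for lsn in page_lsns)
```

Unlike an md5-based profile, this needs no full read of unmodified files at backup time, and it composes with finer granularities later: the same comparison per page instead of per file gives page-level diffs.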
Re: [HACKERS] [patch] pg_copy - a command for reliable WAL archiving
On Tue, Aug 19, 2014 at 10:42 PM, Alvaro Herrera wrote: > Is there a way to create a link to a file which only exists as an open > file descriptor? If there was, you could create a temp file, open an > fd, then delete the file. That would remove the issue with files being > leaked due to failures of various kinds. Sort of. On recent Linuxen you can create a file with open(O_TMPFILE) then use linkat(2) to create a link for it only once it's fully written. -- greg
Re: [HACKERS] [patch] pg_copy - a command for reliable WAL archiving
c.f.: O_TMPFILE (since Linux 3.11) Create an unnamed temporary file. The pathname argument specifies a directory; an unnamed inode will be created in that directory's filesystem. Anything written to the resulting file will be lost when the last file descriptor is closed, unless the file is given a name. O_TMPFILE must be specified with one of O_RDWR or O_WRONLY and, optionally, O_EXCL. If O_EXCL is not specified, then linkat(2) can be used to link the temporary file into the filesystem, making it permanent, using code like the following: char path[PATH_MAX]; fd = open("/path/to/dir", O_TMPFILE | O_RDWR, S_IRUSR | S_IWUSR); /* File I/O on 'fd'... */ snprintf(path, PATH_MAX, "/proc/self/fd/%d", fd); linkat(AT_FDCWD, path, AT_FDCWD, "/path/for/file", AT_SYMLINK_FOLLOW); In this case, the open() mode argument determines the file permission mode, as with O_CREAT. Specifying O_EXCL in conjunction with O_TMPFILE prevents a temporary file from being linked into the filesystem in the above manner. (Note that the meaning of O_EXCL in this case is different from the meaning of O_EXCL otherwise.) There are two main use cases for O_TMPFILE: * Improved tmpfile(3) functionality: race-free creation of temporary files that (1) are automatically deleted when closed; (2) can never be reached via any pathname; (3) are not subject to symlink attacks; and (4) do not require the caller to devise unique names. * Creating a file that is initially invisible, which is then populated with data and adjusted to have appropriate filesystem attributes (chown(2), chmod(2), fsetxattr(2), etc.) before being atomically linked into the filesystem in a fully formed state (using linkat(2) as described above). O_TMPFILE requires support by the underlying filesystem; only a subset of Linux filesystems provide that support. In the initial implementation, support was provided in the ext2, ext3, ext4, UDF, Minix, and shmem filesystems. XFS support was added in Linux 3.15. 
Re: [HACKERS] tweaking NTUP_PER_BUCKET
On 20 August 2014, 14:05, Heikki Linnakangas wrote:
> On 07/20/2014 07:17 PM, Tomas Vondra wrote:
>> On 19.7.2014 20:24, Tomas Vondra wrote:
>>> On 13.7.2014 21:32, Tomas Vondra wrote: The current patch only
>>> implements this for tuples in the main hash table, not for skew
>>> buckets. I plan to do that, but it will require separate chunks for
>>> each skew bucket (so we can remove it without messing with all of
>>> them). The chunks for skew buckets should be smaller (I'm thinking
>>> about ~4kB), because there'll be more of them, but OTOH those buckets
>>> are for frequent values so the chunks should not remain empty.
>>>
>>> I've looked into extending the dense allocation to the skew buckets,
>>> and I think we shouldn't do that. I got about 50% of the changes and
>>> then just threw it out because it turned out quite pointless.
>>>
>>> The amount of memory for skew buckets is limited to 2% of work mem,
>>> so even with 100% overhead it'll use ~4% of work mem. So there's
>>> pretty much nothing to gain here. So the additional complexity
>>> introduced by the dense allocation seems pretty pointless.
>>>
>>> I'm not entirely happy with the code allocating some memory densely
>>> and some using traditional palloc, but it certainly seems cleaner
>>> than the code I had.
>>>
>>> So I think the patch is mostly OK as is.
>>
>> Attached is v4 of the patch, mostly with minor improvements and cleanups
>> (removed MemoryContextStats, code related to skew buckets).
>
> Can you remind me what kind of queries this is supposed to help with?
> Could you do some performance testing to demonstrate the effect? And
> also a worst-case scenario.

The dense allocation? That was not really directed at any specific queries, it was about lowering hashjoin memory requirements in general. First to make it more correct with respect to work_mem (hashjoin does not account for the palloc overhead, so the actual memory consumption might be much higher than work_mem).
Second to compensate for the memory for more buckets due to NTUP_PER_BUCKET, which is tweaked in the 'hashjoin dynamic nbuckets' patch. There are some numbers / more detailed analysis in http://www.postgresql.org/message-id/a17d6217fe0c9e459cb45cb764ad727c.squir...@sq.gransy.com

> At a quick glance, I think you're missing a fairly obvious trick in
> ExecHashIncreaseNumBatches: if a chunk contains only a single tuple, you
> can avoid copying it to a new chunk and just link the old chunk to the
> new list instead. Not sure if ExecHashIncreaseNumBatches is called often
> enough for that to be noticeable, but at least it should help in extreme
> cases where you push around huge > 100MB tuples.

Yeah, I thought about that too, but it seemed like a rare corner case. But maybe you're right - it will also eliminate the 'fluctuation' (allocating a 100MB chunk, which might get us over work_mem, ...). Not sure if this is going to help, but it's easy enough to fix I guess.

The other optimization I'm thinking about is that when increasing the number of batches, the expectation is that only about half the chunks will be necessary. So instead of freeing the chunk, we might keep it and reuse it later. That might lower the palloc overhead a bit (but it's already very low, so I don't think that's a measurable difference).

regards
Tomas
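For readers following along, the dense allocation under discussion can be sketched as a simple bump allocator: tuples are packed back-to-back into large chunks, so the per-tuple palloc header overhead disappears, and a whole chunk can be freed (or, per Heikki's suggestion, relinked to the new batch) in one step. This is a minimal sketch with hypothetical names; the real implementation lives in nodeHash.c and uses palloc, not malloc:

```c
#include <assert.h>
#include <stdlib.h>

#define CHUNK_DATA_SIZE (32 * 1024)     /* hypothetical; the patch uses larger chunks */

typedef struct Chunk
{
    struct Chunk *next;                 /* chunks form a per-batch linked list */
    size_t        used;                 /* bytes handed out from data[] so far */
    char          data[CHUNK_DATA_SIZE];
} Chunk;

/*
 * Bump-allocate nbytes from the head chunk, starting a fresh chunk when the
 * current one cannot fit the request.  Tuples are never freed individually;
 * the whole chunk list is released at once when the batch is done, which is
 * what makes moving tuples between batches cheap.
 */
static void *
chunk_alloc(Chunk **head, size_t nbytes)
{
    if (*head == NULL || (*head)->used + nbytes > CHUNK_DATA_SIZE)
    {
        Chunk *c = malloc(sizeof(Chunk));

        c->next = *head;
        c->used = 0;
        *head = c;
    }

    {
        void *p = (*head)->data + (*head)->used;

        (*head)->used += nbytes;
        return p;
    }
}
```

Two consecutive allocations land back-to-back in the same chunk with no header between them; that gap is exactly the memory the dense-allocation patch reclaims relative to per-tuple palloc.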
Re: [HACKERS] [patch] pg_copy - a command for reliable WAL archiving
Greg Stark wrote:
> char path[PATH_MAX];
> fd = open("/path/to/dir", O_TMPFILE | O_RDWR,
> S_IRUSR | S_IWUSR);
>
> /* File I/O on 'fd'... */
>
> snprintf(path, PATH_MAX, "/proc/self/fd/%d", fd);
> linkat(AT_FDCWD, path, AT_FDCWD, "/path/for/file",
> AT_SYMLINK_FOLLOW);

Hmm, the real trick here is linkat(... "/proc/self/foobar"), not the O_TMPFILE: you can have an open file descriptor to an "invisible" file simply by creating a normal file and unlinking it. I looked at linkat() yesterday but the idea of using /proc/self didn't occur to me. Nasty trick :-( It seems linkat() is quite a bit more portable than O_TMPFILE, fortunately ...

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [patch] pg_copy - a command for reliable WAL archiving
On 2014-08-19 17:42:06 -0400, Alvaro Herrera wrote: > MauMau wrote: > > > With that said, copying to a temporary file like .tmp and > > renaming it to sounds worthwhile even as a basic copy > > utility. I want to avoid copying to a temporary file with a fixed > > name like _copy.tmp, because some advanced utility may want to run > > multiple instances of pg_copy to copy several files into the same > > directory simultaneously. However, I'm afraid multiple .tmp > > files might continue to occupy disk space after canceling copy or > > power failure in some use cases, where the copy of the same file > > won't be retried. That's also the reason why I chose to not use a > > temporary file like cp/copy. > > Is there a way to create a link to a file which only exists as an open > file descriptor? If there was, you could create a temp file, open an > fd, then delete the file. That would remove the issue with files being > leaked due to failures of various kinds. Isn't this a solution looking for a problem? We're using tempfiles in dozens of other places and I really don't see why this is the place to stop doing so. Just copy to .tmp and move it into place. If things crash during that, the command will be repeated shortly afterwards again *anyway*. Let's not get into platform specific games here. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [patch] pg_copy - a command for reliable WAL archiving
Andres Freund wrote:
> On 2014-08-19 17:42:06 -0400, Alvaro Herrera wrote:
> > MauMau wrote:
> > > With that said, copying to a temporary file like .tmp and
> > > renaming it to sounds worthwhile even as a basic copy
> > > utility. I want to avoid copying to a temporary file with a fixed
> > > name like _copy.tmp, because some advanced utility may want to run
> > > multiple instances of pg_copy to copy several files into the same
> > > directory simultaneously. However, I'm afraid multiple .tmp
> > > files might continue to occupy disk space after canceling copy or
> > > power failure in some use cases, where the copy of the same file
> > > won't be retried. That's also the reason why I chose to not use a
> > > temporary file like cp/copy.
> >
> > Is there a way to create a link to a file which only exists as an open
> > file descriptor? If there was, you could create a temp file, open an
> > fd, then delete the file. That would remove the issue with files being
> > leaked due to failures of various kinds.
>
> Isn't this a solution looking for a problem? We're using tempfiles in
> dozens of other places and I really don't see why this is the place to
> stop doing so. Just copy to .tmp and move it into place. If things
> crash during that, the command will be repeated shortly afterwards again
> *anyway*. Let's not get into platform specific games here.

The issue is what happens if there's a crash while the temp file is in the middle of being filled. I agree there are bigger problems to solve in this area though.

Yes, I agree that a fixed name such as _copy.tmp is a really bad choice for several reasons.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [patch] pg_copy - a command for reliable WAL archiving
On Wed, Aug 20, 2014 at 2:27 PM, Alvaro Herrera wrote: > Hmm, the real trick here is linkat(... "/proc/self/foobar"), not the > O_TMPFILE: you can have an open file descriptor to an "invisible" file > simply by creating a normal file and unlinking it. I looked at linkat() > yesterday but the idea of using /proc/self didn't occur to me. Nasty > trick :-( It seems linkat() is quite a bit more portable than > O_TMPFILE, fortunately ... Supposedly linkat(2) on Linux refuses to create a link to a file that was opened normally and had its last link removed with unlink(2) due to concerns that this would create security holes. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [patch] pg_copy - a command for reliable WAL archiving
On 2014-08-20 09:50:56 -0400, Alvaro Herrera wrote:
> Andres Freund wrote:
> > On 2014-08-19 17:42:06 -0400, Alvaro Herrera wrote:
> > > MauMau wrote:
> > > > With that said, copying to a temporary file like .tmp and
> > > > renaming it to sounds worthwhile even as a basic copy
> > > > utility. I want to avoid copying to a temporary file with a fixed
> > > > name like _copy.tmp, because some advanced utility may want to run
> > > > multiple instances of pg_copy to copy several files into the same
> > > > directory simultaneously. However, I'm afraid multiple .tmp
> > > > files might continue to occupy disk space after canceling copy or
> > > > power failure in some use cases, where the copy of the same file
> > > > won't be retried. That's also the reason why I chose to not use a
> > > > temporary file like cp/copy.
> > >
> > > Is there a way to create a link to a file which only exists as an open
> > > file descriptor? If there was, you could create a temp file, open an
> > > fd, then delete the file. That would remove the issue with files being
> > > leaked due to failures of various kinds.
> >
> > Isn't this a solution looking for a problem? We're using tempfiles in
> > dozens of other places and I really don't see why this is the place to
> > stop doing so. Just copy to .tmp and move it into place. If things
> > crash during that, the command will be repeated shortly afterwards again
> > *anyway*. Let's not get into platform specific games here.
>
> The issue is what happens if there's a crash while the temp file is in
> the middle of being filled.

The archive command will be run again in a couple of seconds and remove the half-filled temp file.

Greetings,

Andres Freund

--
Andres Freund                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [patch] pg_copy - a command for reliable WAL archiving
Greg Stark wrote:
> On Wed, Aug 20, 2014 at 2:27 PM, Alvaro Herrera
> wrote:
> > Hmm, the real trick here is linkat(... "/proc/self/foobar"), not the
> > O_TMPFILE: you can have an open file descriptor to an "invisible" file
> > simply by creating a normal file and unlinking it. I looked at linkat()
> > yesterday but the idea of using /proc/self didn't occur to me. Nasty
> > trick :-( It seems linkat() is quite a bit more portable than
> > O_TMPFILE, fortunately ...
>
> Supposedly linkat(2) on Linux refuses to create a link to a file that
> was opened normally and had its last link removed with unlink(2) due
> to concerns that this would create security holes.

Sigh.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [patch] pg_copy - a command for reliable WAL archiving
Andres Freund writes:
> On 2014-08-20 09:50:56 -0400, Alvaro Herrera wrote:
>> Andres Freund wrote:
>>> Isn't this a solution looking for a problem? We're using tempfiles in
>>> dozens of other places and I really don't see why this is the place to
>>> stop doing so. Just copy to .tmp and move it into place.
>> The issue is what happens if there's a crash while the temp file is in
>> the middle of being filled.
> The archive command will be run again a couple seconds and remove the
> half-filled temp file.

Alternatively, you could use the process PID as part of the temp file name, which is probably a good idea anyway.

regards, tom lane
Re: [HACKERS] [patch] pg_copy - a command for reliable WAL archiving
On 2014-08-20 10:19:33 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2014-08-20 09:50:56 -0400, Alvaro Herrera wrote: > >> Andres Freund wrote: > >>> Isn't this a solution looking for a problem? We're using tempfiles in > >>> dozens of other places and I really don't see why this is the place to > >>> stop doing so. Just copy to .tmp and move it into place. > > >> The issue is what happens if there's a crash while the temp file is in > >> the middle of being filled. > > > The archive command will be be run again a couple seconds and remove the > > half-filled temp file. > > Alternatively, you could use the process PID as part of the temp file > name; which is probably a good idea anyway. I think that's actually worse, because nothing will clean up those unless you explicitly scan for all .$pid files, and somehow kill them. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup vs. Windows and tablespaces
On Wed, Aug 20, 2014 at 12:12 PM, Dilip kumar wrote: > > I have reviewed the patch and did not find any major comments. Thanks for the review. > There are some comments I would like to share with you > > > > 1. Rebase the patch to current GIT head. Done. > > 2. + * Construct symlink file > > + */ > > + initStringInfo(&symlinkfbuf); > > I think declaration and initialization of symlinkfbuf string can be moved under #ifdef WIN32 compile time macro, > > for other platform it’s simply allocated and freed which can be avoided. Agreed, I have changed the patch as per your suggestion. > > 3. + /* > > + * native windows utilites are not able create symlinks while > > + * extracting files from tar. > > + */ > > > > Rephrase the above sentence and fix spelling mistake (utilities are not able to create) Done. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com extend_basebackup_to_include_symlink_v2.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [patch] pg_copy - a command for reliable WAL archiving
Andres Freund writes: > On 2014-08-20 10:19:33 -0400, Tom Lane wrote: >> Alternatively, you could use the process PID as part of the temp file >> name; which is probably a good idea anyway. > I think that's actually worse, because nothing will clean up those > unless you explicitly scan for all .$pid files, and somehow > kill them. True. As long as the copy command is prepared to get rid of a pre-existing target file, using a fixed .tmp extension should be fine. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
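The pattern the thread settles on, writing to a temporary name and renaming it into place while being prepared to overwrite a stale temp file from a crashed run, can be sketched as follows. This is a hypothetical helper, not the pg_copy code, and a durable implementation would also fsync the file and its directory around the rename:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Copy src to dst by writing "<dst>.tmp" first and renaming it into place.
 * fopen("wb") truncates any half-written temp file left over by a previous
 * crashed run, which is why a fixed .tmp suffix is safe when the archive
 * command is simply retried.  rename() replaces dst atomically on POSIX
 * filesystems, so readers never observe a partially written destination.
 */
static int
copy_via_tmp(const char *src, const char *dst)
{
    char   tmp[4096];
    char   buf[8192];
    size_t n;
    FILE  *in;
    FILE  *out;

    snprintf(tmp, sizeof(tmp), "%s.tmp", dst);

    in = fopen(src, "rb");
    if (in == NULL)
        return -1;
    out = fopen(tmp, "wb");     /* clobbers a stale temp file */
    if (out == NULL)
    {
        fclose(in);
        return -1;
    }

    while ((n = fread(buf, 1, sizeof(buf), in)) > 0)
        fwrite(buf, 1, n, out);

    fclose(in);
    if (fclose(out) != 0)
        return -1;

    return rename(tmp, dst);    /* the file appears under dst all at once */
}
```

The fixed suffix also means a crash can leak at most one temp file per destination, which the next retry reclaims automatically.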
Re: [HACKERS] jsonb format is pessimal for toast compression
Josh Berkus writes: > On 08/15/2014 04:19 PM, Tom Lane wrote: >> Personally I'd prefer to go to the all-lengths approach, but a large >> part of that comes from a subjective assessment that the hybrid approach >> is too messy. Others might well disagree. > ... So, that extraction test is about 1% *slower* than the basic Tom Lane > lengths-only patch, and still 80% slower than original JSONB. And it's > the same size as the lengths-only version. Since it's looking like this might be the direction we want to go, I took the time to flesh out my proof-of-concept patch. The attached version takes care of cosmetic issues (like fixing the comments), and includes code to avoid O(N^2) penalties in findJsonbValueFromContainer and JsonbIteratorNext. I'm not sure whether those changes will help noticeably on Josh's test case; for me, they seemed worth making, but they do not bring the code back to full speed parity with the all-offsets version. But as we've been discussing, it seems likely that those costs would be swamped by compression and I/O considerations in most scenarios with large documents; and of course for small documents it hardly matters. Even if we don't go this way, there are parts of this patch that would need to get committed. I found for instance that convertJsonbArray and convertJsonbObject have insufficient defenses against overflowing the overall length field for the array or object. For my own part, I'm satisfied with the patch as attached (modulo the need to teach pg_upgrade about the incompatibility). There remains the question of whether to take this opportunity to add a version ID to the binary format. I'm not as excited about that idea as I originally was; having now studied the code more carefully, I think that any expansion would likely happen by adding more type codes and/or commandeering the currently-unused high-order bit of JEntrys. We don't need a version ID in the header for that. 
Moreover, if we did have such an ID, it would be notationally painful to get it to most of the places that might need it. regards, tom lane diff --git a/src/backend/utils/adt/jsonb.c b/src/backend/utils/adt/jsonb.c index 2fd87fc..456011a 100644 *** a/src/backend/utils/adt/jsonb.c --- b/src/backend/utils/adt/jsonb.c *** jsonb_from_cstring(char *json, int len) *** 196,207 static size_t checkStringLen(size_t len) { ! if (len > JENTRY_POSMASK) ereport(ERROR, (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED), errmsg("string too long to represent as jsonb string"), errdetail("Due to an implementation restriction, jsonb strings cannot exceed %d bytes.", ! JENTRY_POSMASK))); return len; } --- 196,207 static size_t checkStringLen(size_t len) { ! if (len > JENTRY_LENMASK) ereport(ERROR, (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED), errmsg("string too long to represent as jsonb string"), errdetail("Due to an implementation restriction, jsonb strings cannot exceed %d bytes.", ! JENTRY_LENMASK))); return len; } diff --git a/src/backend/utils/adt/jsonb_util.c b/src/backend/utils/adt/jsonb_util.c index 04f35bf..e47eaea 100644 *** a/src/backend/utils/adt/jsonb_util.c --- b/src/backend/utils/adt/jsonb_util.c *** *** 26,40 * in MaxAllocSize, and the number of elements (or pairs) must fit in the bits * reserved for that in the JsonbContainer.header field. * ! * (the total size of an array's elements is also limited by JENTRY_POSMASK, * but we're not concerned about that here) */ #define JSONB_MAX_ELEMS (Min(MaxAllocSize / sizeof(JsonbValue), JB_CMASK)) #define JSONB_MAX_PAIRS (Min(MaxAllocSize / sizeof(JsonbPair), JB_CMASK)) ! static void fillJsonbValue(JEntry *array, int index, char *base_addr, JsonbValue *result); ! 
static bool equalsJsonbScalarValue(JsonbValue *a, JsonbValue *b); static int compareJsonbScalarValue(JsonbValue *a, JsonbValue *b); static Jsonb *convertToJsonb(JsonbValue *val); static void convertJsonbValue(StringInfo buffer, JEntry *header, JsonbValue *val, int level); --- 26,41 * in MaxAllocSize, and the number of elements (or pairs) must fit in the bits * reserved for that in the JsonbContainer.header field. * ! * (the total size of an array's elements is also limited by JENTRY_LENMASK, * but we're not concerned about that here) */ #define JSONB_MAX_ELEMS (Min(MaxAllocSize / sizeof(JsonbValue), JB_CMASK)) #define JSONB_MAX_PAIRS (Min(MaxAllocSize / sizeof(JsonbPair), JB_CMASK)) ! static void fillJsonbValue(JEntry *children, int index, ! char *base_addr, uint32 offset, JsonbValue *result); ! static bool equalsJsonbScalarValue(JsonbValue *a, JsonbValue *b); static int compareJsonbScalarValue(JsonbValue *a, JsonbValue *b); static Jsonb *convertToJsonb(JsonbValue *val); static void convertJsonbValue(StringInfo buffer, JEntry *header, JsonbValue *val, int level); *
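The performance issue Tom is patching around can be seen in miniature. With the all-lengths layout, the start of the i-th child is not stored anywhere and must be recomputed by summing the preceding lengths (this is what made findJsonbValueFromContainer and JsonbIteratorNext O(N^2) before the fix), while the all-offsets layout answers the same question in O(1) at the cost of entry words that differ everywhere and therefore compress poorly. A toy illustration with hypothetical names, not the real JEntry encoding:

```c
#include <assert.h>
#include <stddef.h>

/*
 * All-lengths layout: each entry stores only its child's length, so the
 * byte offset of child i is the sum of lengths 0..i-1 -- O(i) per lookup
 * unless the caller caches partial sums.
 */
static size_t
child_offset_from_lengths(const unsigned lengths[], int i)
{
    size_t off = 0;
    int    j;

    for (j = 0; j < i; j++)
        off += lengths[j];
    return off;
}

/*
 * All-offsets layout: each entry stores its child's end offset, so the
 * start of child i is just the previous entry's end -- O(1) per lookup.
 */
static size_t
child_offset_from_ends(const unsigned ends[], int i)
{
    return (i == 0) ? 0 : (size_t) ends[i - 1];
}
```

Both functions describe the same container; the difference is purely work per lookup versus how repetitive (and thus compressible) the stored words are across tuples.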
Re: [HACKERS] [GSoC2014] Patch ALTER TABLE ... SET LOGGED
On 17 August 2014 21:45, Fabrízio de Royes Mello wrote: > > On Mon, Jul 28, 2014 at 2:24 PM, Fabrízio de Royes Mello < > fabriziome...@gmail.com> wrote: > > > > > > On Mon, Jul 28, 2014 at 1:41 PM, Christoph Berg wrote: > > > > > > Re: Fabrízio de Royes Mello 2014-07-28 > > > > > There are something that should I do on this patch yet? > > > > > > I haven't got around to have a look at the newest incarnation yet, but > > > I plan to do that soonish. (Of course that shouldn't stop others from > > > doing that as well if they wish.) > > > > > > > Thanks! > > > > Updated version. > Hi Fabrizio, + This form changes the table persistence type from unlogged to permanent or + from unlogged to permanent (see ). Shouldn't this read "unlogged to permanent or from permanent to unlogged"? "ERROR: table test is not permanent" Perhaps this would be better as "table test is unlogged" as "permanent" doesn't match the term used in the DDL syntax. Gave the patch a quick test-drive on a replicated instance, and it appears to operate as described. -- Thom
[HACKERS] Patching for increasing the number of columns
Hello, I am trying to patch the server source to increase the number of columns above 1600. I'm not planning to commit this but as suggested elsewhere [1], someone might suggest a configure option based on this. I came up with a patch which seems to work (see below), but 3 of the 136 tests fail. I understand some will question db design, but, as written elsewhere, "What would be most helpful though is if the answer to this question stop being an attack on the business requirement analysis, database design skills, and/or sanity of the requester" [1]

I based my attempts on these discussions:
http://www.postgresql.org/message-id/200512221633.jbmgxwm13...@candle.pha.pa.us
http://www.postgresql.org/message-id/8914.1289620...@sss.pgh.pa.us
http://dba.stackexchange.com/questions/40137/in-postgresql-is-it-possible-to-change-the-maximum-number-of-columns-a-table-ca

I built this on Ubuntu 14.04, 64-bit. Bash session follows:
==
sudo apt-get install flex
sudo apt-get install bison build-essential
sudo apt-get install libreadline6-dev
sudo apt-get install zlib1g-dev
sudo apt-get install libossp-uuid-dev

version=3_64 # change this if you want to build several versions of postgres in parallel
             # see also "MODIFY THIS TOO" below
echo "current version is" $version
mkdir -p ~/bin/postgresql_9.3.4
cd ~/bin/postgresql_9.3.4
wget ftp://ftp.postgresql.org/pub/source/v9.3.4/postgresql-9.3.4.tar.bz2
mkdir -p ~/bin/postgresql_9.3.4/patched_$version
tar -xvf postgresql-9.3.*.tar.bz2 -C ~/bin/postgresql_9.3.4/patched_$version
cd patched_$version/postgresql-9.3.*

# use kate (KDE) or your preferred text editor:
kate src/include/access/htup_details.h
# See: http://dba.stackexchange.com/questions/40137/in-postgresql-is-it-possible-to-change-the-maximum-number-of-columns-a-table-ca
# Replace this:
#define MaxTupleAttributeNumber 1664    /* 8 * 208 */
# by this: (the '#' sign before 'define' should be included)
#define MaxTupleAttributeNumber 6656    /* 32 * 208 */
# or this:
#define MaxTupleAttributeNumber 13312   /* 64 * 208 */

# Replace this:
#define MaxHeapAttributeNumber  1600    /* 8 * 200 */
# by this: (the '#' sign before 'define' should be included)
#define MaxHeapAttributeNumber  6400    /* 32 * 200 */
# or this:
#define MaxHeapAttributeNumber  12800   /* 64 * 200 */

# See: http://www.postgresql.org/message-id/8914.1289620...@sss.pgh.pa.us which suggests this: uint16 t_hoff;
# Replace this: (in TWO PLACES, near lines 148 and 523. If you miss one, postgresql segfaults.)
uint8       t_hoff;     /* sizeof header incl. bitmap, padding */
# by this: (in TWO PLACES)
uint32      t_hoff;     /* sizeof header incl. bitmap, padding */
# or by this: (in TWO PLACES)
uint64      t_hoff;     /* sizeof header incl. bitmap, padding */
# Save and close htup_details.h
# (TODO: write the above as a command-line patch)

./configure --with-blocksize=32 --prefix=/usr/local/pgsql_patched_$version
make
make check
# join ... FAILED
# select_views ... FAILED
# without_oid ... FAILED
#
# 3 of 136 tests failed. FIXME
# (not sure whether I can attach the log and diff of the test here).

I launched the server anyway and logged in with pgadmin3. I created a few tables with 2000 integer fields or so. Performed a few insert, select, update and join without any issue. So at least basic join works. And in pgadmin3, the "has OIDs" property of the tables I created is not checked. Just to be sure, I performed again all the tests with 'make check' without any patch and without raising the blocksize (configure option), and this time all the tests passed (NO failure). Would anyone have some hint or advice? Thank you! Best regards, Mayeul

[1] http://www.postgresql.org/message-id/8914.1289620...@sss.pgh.pa.us

PS: and since it's my first post here: thank you all so much for this wonderful DBMS :-)
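The 1600-column ceiling and the need to widen t_hoff are connected by simple arithmetic: t_hoff records the tuple header size including the null bitmap (one bit per column), and as a uint8 it cannot represent more than 255 bytes. A sketch of that constraint, with illustrative constants restated here rather than taken verbatim from htup_details.h:

```c
#include <assert.h>

/*
 * Approximate tuple header size for a row of ncols columns when nulls are
 * present: 23 fixed header bytes plus a null bitmap of one bit per column,
 * rounded up to the 8-byte MAXALIGN boundary of a 64-bit build.
 */
static unsigned
header_size_with_nulls(unsigned ncols)
{
    unsigned raw = 23 + (ncols + 7) / 8;    /* header + null bitmap bytes */

    return (raw + 7) & ~7u;                 /* MAXALIGN to 8 bytes */
}
```

At 1600 columns the header is 224 bytes and still fits in a uint8 t_hoff; at 6400 columns it exceeds 255, which is why the patch has to widen t_hoff in both places or the server segfaults, exactly as the build notes warn.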
Re: [HACKERS] Patch status: delta relations in AFTER triggers
On Wed, Aug 20, 2014 at 07:29:39AM -0700, Kevin Grittner wrote:
> Heikki Linnakangas wrote:
>
> > What's the status of the "delta relations in AFTER triggers"
> > patch? I saw you had a lively discussion with Amit in the last
> > two weeks, but I couldn't tell if it's still "Needs review" or
> > should be "Waiting on Author"?
>
> The patch that David offered to use the tuplestores in C should
> probably be updated to show both direct use based on what is in
> TriggerData and SPI use. I will leave that to David to submit; it
> doesn't need to be in the same patch-set as the rest; it might, in
> fact, be better to offer it separately once the API is solid (i.e.,
> after commit of the patch-sets I've just described).

I'd be happy to. Is there a repo I can refer to, or should I wait to start until those patch sets are submitted to -hackers?

> Thanks for picking up the job of CF manager, BTW. It's a dirty
> job, but somebody's got to do it.

Indeed, and thanks for wrangling this, Heikki.

Cheers,
David.

--
David Fetter http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] Patching for increasing the number of columns
Mayeul Kauffmann writes: > I am trying to patch the server source to increase the number of columns > above 1600. I'm not planning to commit this but as suggested elsewhere > [1], someone might suggest a configure option based on this. > I came up with a patch which seems to work (see below), but 3 of the 136 > tests fail. You would have to show us the actual failure diffs to get much useful comment, but in general increasing the size of tuple headers could easily lead to changes in plan choices, which would affect output row ordering (not to mention EXPLAIN results). This is particularly the case if you're testing on a 64-bit machine, since the maxalign'd size of the header would go from 24 to 32 bytes ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GSoC2014] Patch ALTER TABLE ... SET LOGGED
On Wed, Aug 20, 2014 at 12:35 PM, Thom Brown wrote: > > Hi Fabrizio, > > + This form changes the table persistence type from unlogged to permanent or > + from unlogged to permanent (see ). > > Shouldn't this read "unlogged to permanent or from permanent to unlogged"? > Fixed. > "ERROR: table test is not permanent" > > Perhaps this would be better as "table test is unlogged" as "permanent" doesn't match the term used in the DDL syntax. > Fixed. > Gave the patch a quick test-drive on a replicated instance, and it appears to operate as described. > Thanks for your review. Att, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL >> Timbira: http://www.timbira.com.br >> Blog: http://fabriziomello.github.io >> Linkedin: http://br.linkedin.com/in/fabriziomello >> Twitter: http://twitter.com/fabriziomello >> Github: http://github.com/fabriziomello diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml index 69a1e14..397b4e6 100644 --- a/doc/src/sgml/ref/alter_table.sgml +++ b/doc/src/sgml/ref/alter_table.sgml @@ -59,16 +59,17 @@ ALTER TABLE [ IF EXISTS ] name ENABLE ALWAYS RULE rewrite_rule_name CLUSTER ON index_name SET WITHOUT CLUSTER +SET {LOGGED | UNLOGGED} SET WITH OIDS SET WITHOUT OIDS SET ( storage_parameter = value [, ... ] ) +SET TABLESPACE new_tablespace RESET ( storage_parameter [, ... ] ) INHERIT parent_table NO INHERIT parent_table OF type_name NOT OF OWNER TO new_owner -SET TABLESPACE new_tablespace REPLICA IDENTITY {DEFAULT | USING INDEX index_name | FULL | NOTHING} and table_constraint_using_index is: @@ -447,6 +448,20 @@ ALTER TABLE [ IF EXISTS ] name +SET {LOGGED | UNLOGGED} + + + This form changes the table persistence type from unlogged to permanent or + from permanent to unlogged (see ). + + + Changing the table persistence type acquires an ACCESS EXCLUSIVE lock + and rewrites the table contents and associated indexes into new disk files. 
+ + + + + SET WITH OIDS diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c index b1c411a..7f497c7 100644 --- a/src/backend/commands/cluster.c +++ b/src/backend/commands/cluster.c @@ -574,7 +574,8 @@ rebuild_relation(Relation OldHeap, Oid indexOid, bool verbose) heap_close(OldHeap, NoLock); /* Create the transient table that will receive the re-ordered data */ - OIDNewHeap = make_new_heap(tableOid, tableSpace, false, + OIDNewHeap = make_new_heap(tableOid, tableSpace, + OldHeap->rd_rel->relpersistence, AccessExclusiveLock); /* Copy the heap data into the new table in the desired order */ @@ -601,7 +602,7 @@ rebuild_relation(Relation OldHeap, Oid indexOid, bool verbose) * data, then call finish_heap_swap to complete the operation. */ Oid -make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, bool forcetemp, +make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, char relpersistence, LOCKMODE lockmode) { TupleDesc OldHeapDesc; @@ -613,7 +614,6 @@ make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, bool forcetemp, Datum reloptions; bool isNull; Oid namespaceid; - char relpersistence; OldHeap = heap_open(OIDOldHeap, lockmode); OldHeapDesc = RelationGetDescr(OldHeap); @@ -636,16 +636,10 @@ make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, bool forcetemp, if (isNull) reloptions = (Datum) 0; - if (forcetemp) - { + if (relpersistence == RELPERSISTENCE_TEMP) namespaceid = LookupCreationNamespace("pg_temp"); - relpersistence = RELPERSISTENCE_TEMP; - } else - { namespaceid = RelationGetNamespace(OldHeap); - relpersistence = OldHeap->rd_rel->relpersistence; - } /* * Create the new heap, using a temporary name in the same namespace as @@ -1146,6 +1140,7 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class, Oid relfilenode1, relfilenode2; Oid swaptemp; + char swaprelpersistence; CatalogIndexState indstate; /* We need writable copies of both pg_class tuples. 
*/ @@ -1177,6 +1172,10 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class, relform1->reltablespace = relform2->reltablespace; relform2->reltablespace = swaptemp; + swaprelpersistence = relform1->relpersistence; + relform1->relpersistence = relform2->relpersistence; + relform2->relpersistence = swaprelpersistence; + /* Also swap toast links, if we're swapping by links */ if (!swap_toast_by_content) { diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c index 5130d51..a49e66f 100644 --- a/src/backend/commands/matview.c +++ b/src/backend/commands/matview.c @@ -147,6 +147,7 @@ ExecRefreshMatView(RefreshMatViewStmt *stmt, const char *queryString, DestReceiver *dest; bool concurrent; LOCKMODE lockmode; + char relpersistence; /* Determine strength of lock needed. */ concurrent = stmt->concurrent; @@ -233,9 +234,15 @@
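For readers skimming the diff, the behavioural core of the patch is small: make_new_heap() now takes the target relpersistence instead of a "forcetemp" boolean, and swap_relation_files() swaps persistence along with the relfilenodes. A toy Python sketch of that control flow; the RELPERSISTENCE_* characters match PostgreSQL's, but the dict-based "catalog rows" are purely illustrative:

```python
RELPERSISTENCE_PERMANENT = "p"
RELPERSISTENCE_UNLOGGED = "u"
RELPERSISTENCE_TEMP = "t"

def creation_namespace(relpersistence, old_namespace):
    # mirrors the refactored make_new_heap(): the caller passes the
    # target persistence instead of a "forcetemp" boolean
    if relpersistence == RELPERSISTENCE_TEMP:
        return "pg_temp"
    return old_namespace

def swap_files(rel1, rel2):
    # mirrors swap_relation_files(): persistence now travels with the files
    for key in ("relfilenode", "reltablespace", "relpersistence"):
        rel1[key], rel2[key] = rel2[key], rel1[key]

old = {"relfilenode": 100, "reltablespace": 0,
       "relpersistence": RELPERSISTENCE_UNLOGGED}
new = {"relfilenode": 200, "reltablespace": 0,
       "relpersistence": RELPERSISTENCE_PERMANENT}
swap_files(old, new)
```

After the swap the old relation carries the new persistence, which is what lets ALTER TABLE ... SET LOGGED reuse the existing rewrite machinery.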
Re: [HACKERS] Proposal to add a QNX 6.5 port to PostgreSQL
On Mon, Aug 18, 2014 at 11:02 AM, Baker, Keith [OCDUS Non-J&J] wrote: > My proof of concept code (steps a through e below) avoided any reading or > writing to the pipe (and associated handling of SIGPIPE), it just relied on > postmaster open of PIPE with ENXIO to indicate all is clear. I'm not following. > Robert, Assuming an algorithm choice is agreed upon in the near future, would > you be the logical choice to implement the change? > I am happy to help, especially with any QNX-specific aspects, but don't want > to step on anyone's toes. I'm unlikely to have time to work on this in the immediate future, but I may be able to help review. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Patching for increasing the number of columns
Tom wrote: > You would have to show us the actual failure diffs to get much useful comment, but in general increasing the size of tuple headers could easily lead to > changes in plan choices Thank you Tom. So there is some hope! In effect the query plan is different for the join and the view tests. The result set is different only for the 'without_oid' test. A side question: Are these tests comprehensive, or should I run other tests just to be sure? Hints on where to find those tests are welcome. Thanks! (diff below) Mayeul *** ~/bin/postgresql_9.3.4/patched_3_64/postgresql-9.3.4/src/test/regress/expected/join.out 2014-03-17 19:35:47.0 + --- ~/bin/postgresql_9.3.4/patched_3_64/postgresql-9.3.4/src/test/regress/results/join.out 2014-08-20 15:40:56.248603754 +0100 *** *** 2791,2814 join int4_tbl i1 on b.thousand = f1 right join int4_tbl i2 on i2.f1 = b.tenthous order by 1; !QUERY PLAN ! - Sort Sort Key: b.unique1 !-> Nested Loop Left Join ! -> Seq Scan on int4_tbl i2 -> Nested Loop Left Join Join Filter: (b.unique1 = 42) -> Nested Loop -> Nested Loop -> Seq Scan on int4_tbl i1 !-> Index Scan using tenk1_thous_tenthous on tenk1 b ! Index Cond: ((thousand = i1.f1) AND (i2.f1 = tenthous)) -> Index Scan using tenk1_unique1 on tenk1 a Index Cond: (unique1 = b.unique2) -> Index Only Scan using tenk1_thous_tenthous on tenk1 c Index Cond: (thousand = a.thousand) ! (15 rows) select b.unique1 from tenk1 a join tenk1 b on a.unique1 = b.unique2 --- 2791,2818 join int4_tbl i1 on b.thousand = f1 right join int4_tbl i2 on i2.f1 = b.tenthous order by 1; ! QUERY PLAN ! --- Sort Sort Key: b.unique1 !-> Hash Right Join ! Hash Cond: (b.tenthous = i2.f1) -> Nested Loop Left Join Join Filter: (b.unique1 = 42) -> Nested Loop -> Nested Loop -> Seq Scan on int4_tbl i1 !-> Bitmap Heap Scan on tenk1 b ! Recheck Cond: (thousand = i1.f1) ! 
-> Bitmap Index Scan on tenk1_thous_tenthous !Index Cond: (thousand = i1.f1) -> Index Scan using tenk1_unique1 on tenk1 a Index Cond: (unique1 = b.unique2) -> Index Only Scan using tenk1_thous_tenthous on tenk1 c Index Cond: (thousand = a.thousand) ! -> Hash !-> Seq Scan on int4_tbl i2 ! (19 rows) select b.unique1 from tenk1 a join tenk1 b on a.unique1 = b.unique2 == *** ~/bin/postgresql_9.3.4/patched_3_64/postgresql-9.3.4/src/test/regress/expected/select_views_1.out 2014-03-17 19:35:47.0 + --- ~/bin/postgresql_9.3.4/patched_3_64/postgresql-9.3.4/src/test/regress/results/select_views.out 2014-08-20 15:41:01.212603532 +0100 *** *** 1413,1423 WHERE f_leak(cnum) AND ymd >= '2011-10-01' AND ymd < '2011-11-01'; QUERY PLAN -- ! Nested Loop !Join Filter: (l.cid = r.cid) -> Seq Scan on credit_usage r Filter: ((ymd >= '10-01-2011'::date) AND (ymd < '11-01-2011'::date)) !-> Materialize -> Subquery Scan on l Filter: f_leak(l.cnum) -> Hash Join --- 1413,1423 WHERE f_leak(cnum) AND ymd >= '2011-10-01' AND ymd < '2011-11-01'; QUERY PLAN -- ! Hash Join !Hash Cond: (r.cid = l.cid) -> Seq Scan on credit_usage r Filter: ((ymd >= '10-01-2011'::date) AND (ymd < '11-01-2011'::date)) !-> Hash -> Subquery Scan on l Filter: f_leak(l.cnum) -> Hash Join *** *** 1446,1456 Subquery Scan on my_credit_card_usage_secure Filter: f_leak(my_credit_card_usage_secure.cnum) !-> Nested Loop ! Join Filter: (l.cid = r.cid) -> Seq Scan on credit_usage r Filter: ((ymd >= '10-01-2011'::date) AND (ymd < '11-01-2011'::date)) !
Re: [HACKERS] Patch status: delta relations in AFTER triggers
David Fetter wrote: > On Wed, Aug 20, 2014 at 07:29:39AM -0700, Kevin Grittner wrote: >> The patch that David offered to use the tuplestores in C should >> probably be updated to show both direct use based on what is in >> TriggerData and SPI use. I will leave that to David to submit; it >> doesn't need to be in the same patch-set as the rest; it might, in >> fact, be better to offer it separately once the API is solid (i.e., >> after commit of the patch-sets I've just described.) > > I'd be happy to. Is there a repo I can refer to, or should I wait to > start until those patch sets are submitted to -hackers? It would be best to wait for the patches to be posted, or possibly for them to be committed. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Patch status: delta relations in AFTER triggers
On Wed, Aug 20, 2014 at 09:59:11AM -0700, Kevin Grittner wrote: > David Fetter wrote: > > On Wed, Aug 20, 2014 at 07:29:39AM -0700, Kevin Grittner wrote: > > >> The patch that David offered to use the tuplestores in C should > >> probably be updated to show both direct use based on what is in > >> TriggerData and SPI use. I will leave that to David to submit; it > >> doesn't need to be in the same patch-set as the rest; it might, > >> in fact, be better to offer it separately once the API is solid > >> (i.e., after commit of the patch-sets I've just described.) > > > > I'd be happy to. Is there a repo I can refer to, or should I wait > > to start until those patch sets are submitted to -hackers? > > It would be best to wait for the patches to be posted, or possibly > for them to be committed. Works for me. :) Cheers, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] Patching for increasing the number of columns
Mayeul Kauffmann writes: > Tom wrote: >> You would have to show us the actual failure diffs to get much useful >> comment, but in general increasing the size of tuple headers could >> easily lead to changes in plan choices > Thank you Tom. So there is some hope! In effect the query plan is > different for the join and the view tests. The result set is different > only for the 'without_oid' test. Hm. I think the without_oid test is not showing that anything is broken; what it's testing is whether a table with oids is physically bigger (more pages) than one without oids but the same data. It's not implausible that your change results in the same number of tuples fitting onto a page in both cases. It'd be worth doing the math to make sure that makes sense. Not sure if there's an easy way to change the table schema so that you get different physical sizes in both cases. The other tests aren't showing any functional issue either AFAICS. The change away from a nestloop plan in join.out is a bit annoying, because IIRC that test is specifically intended to check nestloop parameter management; but that just means the test is brittle. > A side question: Are these tests comprehensive, or should I run other > tests just to be sure? Hints on where to find those tests are welcome. No, they're not comprehensive, and no, we don't have more :-( regards, tom lane
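Tom's "worth doing the math" can be sketched quickly. The sizes below are illustrative stand-ins rather than exact PostgreSQL layout rules, but they show how a header padded to the next MAXALIGN boundary can make the with-OIDs and without-OIDs tables equally dense, which would explain the without_oid regression diff:

```python
# Back-of-envelope tuples-per-page arithmetic. Page and line-pointer
# sizes are realistic; the header and data sizes are illustrative only.
def maxalign(n, align=8):
    return (n + align - 1) // align * align

PAGE, PAGE_HEADER, LINE_POINTER = 8192, 24, 4

def tuples_per_page(header_size, data_size):
    item = maxalign(header_size) + maxalign(data_size)
    return (PAGE - PAGE_HEADER) // (item + LINE_POINTER)

# stock 23-byte header: adding a 4-byte OID crosses an alignment
# boundary, so the with-OIDs table is less dense (more pages)
stock = (tuples_per_page(23, 32), tuples_per_page(23 + 4, 32))

# a header grown by 4 bytes pads to the same size with or without the
# OID, so both tables end up the same physical size
patched = (tuples_per_page(27, 32), tuples_per_page(27 + 4, 32))
```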
Re: [HACKERS] After switching primary server while using replication slot.
On Tue, Aug 19, 2014 at 6:25 AM, Fujii Masao wrote: > On Mon, Aug 18, 2014 at 11:16 PM, Sawada Masahiko > wrote: >> Hi all, >> After switching primary server while using replication slot, the >> standby server will not be able to connect to the new primary server. >> Imagine this situation, if primary server has two ASYNC standby >> servers, also use each replication slots. >> And the one standby(A) apply WAL without problems. But another one >> standby(B) has stopped after connected to primary server. >> (or sending WAL is too delayed) >> >> In this situation, the standby(B) has not received WAL segment file >> while stopping itself. >> And the primary server can not remove WAL segments which has not been >> received to all standby. >> Therefore the primary server have to keep the WAL segment file which >> has not been received to all standby. >> But standby(A) can do checkpoint itself, and then it's possible to >> recycle WAL segments. >> The number of WAL segment of each server are different. >> ( The number of WAL files of standby(A) having smaller than primary server.) >> After the primary server is crashed, the standby(A) promote to primary, >> we can try to connect standby(B) to standby(A) as new standby server. >> But it will fail because the standby(A) server might not have WAL >> segment files that standby(B) required. > > This sounds like a valid concern. > >> To resolve this situation, I think that we should make master server >> to notify about removal of WAL segment to all standby servers. >> And the standby servers recycle WAL segments files base on that information. >> >> Thought? > > How does the server recycle WAL files after it's promoted from the > standby to master? > It does that as it likes? If yes, your approach would not be enough. > > The approach prevents unexpected removal of WAL files while the standby > is running. But after the standby is promoted to master, it might recycle > needed WAL files immediately.
So another standby may still fail to retrieve > the required WAL file after the promotion. > > ISTM that, in order to address this, we might need to log all the replication > slot activities and replicate them to the standby. I'm not sure if this > breaks the design of replication slot at all, though. Yuck. I believe that the reason why replication slots are not currently replicated is because we had the idea that the standby could have slots that don't exist on the master, for cascading replication. I'm not sure that works yet, but I think Andres definitely had it in mind in the original design. It seems to me that if every machine needs to keep not only the WAL it requires for itself, but also the WAL that any other machine in the replication hierarchy might need, that pretty much sucks. Suppose you have a master with 10 standbys, and each standby has 10 cascaded standbys. If one of those standbys goes down, do we really want all 100 other machines to keep copies of all the WAL? That seems rather unfortunate, since it's likely that only a few of those many standbys are machines to which we would consider failing over. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
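The failure mode under discussion condenses to a few lines. This is a deliberately naive model (LSNs as plain integers, no cascading), not how xlog.c actually computes its recycling point: the master's removal horizon honours every slot, but standby A recycles WAL based only on what it knows locally, so after promotion it may have dropped segments that the lagging standby B still needs:

```python
# Toy model of WAL recycling horizons across a failover.
def removal_horizon(local_checkpoint_lsn, slot_restart_lsns):
    # WAL strictly older than the returned LSN may be recycled
    return min([local_checkpoint_lsn] + slot_restart_lsns)

# Standby B's slot (stuck at LSN 400) exists only on the master;
# standby A has no copy of it, so A's horizon races ahead.
master_horizon = removal_horizon(1000, [400, 950])
standby_a_horizon = removal_horizon(1000, [])
```

After A is promoted, B asks for WAL from LSN 400, but A may only have segments newer than its own horizon of 1000.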
Re: [HACKERS] failures on barnacle (CLOBBER_CACHE_RECURSIVELY) because of memory leaks
Hi, On 13.8.2014 19:17, Tomas Vondra wrote: > On 13.8.2014 17:52, Tom Lane wrote: > >> * I'm a bit dubious about testing -DRANDOMIZE_ALLOCATED_MEMORY in the >> same build as -DCLOBBER_CACHE_RECURSIVELY, because each of these is >> darned expensive and it's not clear you'd learn anything by running >> them both together. I think you might be better advised to run two >> separate buildfarm critters with those two options, and thereby perhaps >> get turnaround in something less than 80 days. > > OK, I removed this for barnacle/addax/mite, let's see what's the impact. > > FWIW We have three other animals running with CLOBBER_CACHE_ALWAYS + > RANDOMIZE_ALLOCATED_MEMORY, and it takes ~20h per branch. But maybe the > price when combined with CLOBBER_CACHE_RECURSIVELY is much higher. > >> * It'd likely be a good idea to take out the TestUpgrade and TestDecoding >> modules from the config too. Otherwise, we won't be seeing barnacle's >> next report before 2015, judging from the runtime of the check step >> compared to some of the other slow buildfarm machines. (I wonder whether >> there's an easy way to skip the installcheck step, as that's going to >> require a much longer run than it can possibly be worth too.) > > OK, I did this too. > > I stopped the already running test on addax and started the test on > barnacle again. Let's see in a few days/weeks/months what is the result. It seems to be running much faster (probably after removing the randomization), and apparently it passed the create_view tests without crashing, but then crashed at 'constraints' (again, because of OOM). PortalMemory: 8192 total in 1 blocks; 7880 free (0 chunks); 312 used PortalHeapMemory: 1024 total in 1 blocks; 840 free (0 chunks); 184 used ExecutorState: 769654952 total in 103 blocks; 114984 free (296 chunks); 769539968 used ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used I suppose we don't expect 760MB ExecutorState here. Also, there's ~60MB MessageContext. 
It's still running, so I'm attaching the relevant part of log (again, with MemoryContextStats for backends with VSS >= 512MB . FWIW, it's running against a844c29966d7c0cd6a457e9324f175349bb12df0. regards Tomas [53f203cf.585e:35] LOG: statement: INSERT INTO CHECK_TBL VALUES (1); [53f203cf.5862:12] LOG: statement: CREATE VIEW ro_view8 AS SELECT a, b FROM base_tbl ORDER BY a OFFSET 1; [53f203cf.5862:13] LOG: statement: CREATE VIEW ro_view9 AS SELECT a, b FROM base_tbl ORDER BY a LIMIT 1; [53f203cf.5862:14] LOG: statement: CREATE VIEW ro_view10 AS SELECT 1 AS a; [53f203cf.5862:15] LOG: statement: CREATE VIEW ro_view11 AS SELECT b1.a, b2.b FROM base_tbl b1, base_tbl b2; [53f203cf.5862:16] LOG: statement: CREATE VIEW ro_view12 AS SELECT * FROM generate_series(1, 10) AS g(a); [53f203cf.5862:17] LOG: statement: CREATE VIEW ro_view13 AS SELECT a, b FROM (SELECT * FROM base_tbl) AS t; [53f203cf.5862:18] LOG: statement: CREATE VIEW rw_view14 AS SELECT ctid, a, b FROM base_tbl; TopMemoryContext: 61792 total in 9 blocks; 4264 free (12 chunks); 57528 used CFuncHash: 8192 total in 1 blocks; 1640 free (0 chunks); 6552 used TopTransactionContext: 8192 total in 1 blocks; 6208 free (23 chunks); 1984 used SPI Exec: 50323456 total in 15 blocks; 2568088 free (79 chunks); 47755368 used CachedPlanSource: 3072 total in 2 blocks; 672 free (0 chunks); 2400 used SPI Proc: 8192 total in 1 blocks; 8088 free (0 chunks); 104 used MessageContext: 4194304 total in 10 blocks; 1120840 free (130 chunks); 3073464 used Operator class cache: 8192 total in 1 blocks; 1640 free (0 chunks); 6552 used smgr relation table: 24576 total in 2 blocks; 13872 free (3 chunks); 10704 used TransactionAbortContext: 32768 total in 1 blocks; 32736 free (0 chunks); 32 used Portal hash: 8192 total in 1 blocks; 1640 free (0 chunks); 6552 used PortalMemory: 8192 total in 1 blocks; 7880 free (0 chunks); 312 used PortalHeapMemory: 1024 total in 1 blocks; 840 free (0 chunks); 184 used ExecutorState: 243261440 total in 38 blocks; 
3436120 free (250 chunks); 239825320 used ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used Relcache by OID: 8192 total in 1 blocks; 592 free (0 chunks); 7600 used CacheMemoryContext: 516096 total in 6 blocks; 260744 free (49 chunks); 255352 used fkeys2p_i: 1024 total in 1 blocks; 144 free (0 chunks); 880 used fkeys2_i: 1024 total in 1 blocks; 8 free (0 chunks); 1016 used EventTriggerCache: 8192 total in 1 blocks; 8160 free (114 chunks); 32 used Event Trigger Cache: 8192 total in 1 blocks; 3712 free (0 chunks); 4480 used pg_auth_members_member_role_index: 1024 total in 1 blocks; 8 free (0 chunks); 1016 used pg_authid_oid_index: 1024 total in 1 blocks; 88 free (0 chunks); 936 used pg_authid_rolname_index: 1024 total in 1 blocks; 144 free (0 chunks); 880 used pg_database_oid_index: 1024 total in 1 blocks; 88
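When staring at dumps like the one above, a throwaway parser helps rank contexts by bytes actually in use. This is an ad-hoc sketch matched to the textual MemoryContextStats format shown here, not a tool from the PostgreSQL tree:

```python
import re

# Match lines like:
#   ExecutorState: 769654952 total in 103 blocks; 114984 free (296 chunks); 769539968 used
STATS_RE = re.compile(
    r"([\w ]+): (\d+) total in \d+ blocks?; "
    r"(\d+) free \(\d+ chunks?\); (\d+) used")

def used_by_context(dump):
    # map context name -> "used" bytes
    return {m.group(1): int(m.group(4)) for m in STATS_RE.finditer(dump)}

dump = ("ExecutorState: 769654952 total in 103 blocks; "
        "114984 free (296 chunks); 769539968 used\n"
        "MessageContext: 4194304 total in 10 blocks; "
        "1120840 free (130 chunks); 3073464 used\n")
usage = used_by_context(dump)
worst = max(usage, key=usage.get)  # the leak suspect
```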
Re: [HACKERS] 9.5: Memory-bounded HashAgg
On Sun, Aug 17, 2014 at 1:17 PM, Tomas Vondra wrote: > Being able to batch inner and outer relations in a matching way is > certainly one of the reasons why hashjoin uses that particular scheme. > There are other reasons, though - for example being able to answer 'Does > this group belong to this batch?' quickly, and automatically update > number of batches. > > I'm not saying the lookup is extremely costly, but I'd be very surprised > if it was as cheap as modulo on a 32-bit integer. Not saying it's the > dominant cost here, but memory bandwidth is quickly becoming one of the > main bottlenecks. Well, I think you're certainly right that a hash table lookup is more expensive than modulo on a 32-bit integer; so much is obvious. But if the load factor is not too large, I think that it's not a *lot* more expensive, so it could be worth it if it gives us other advantages. As I see it, the advantage of Jeff's approach is that it doesn't really matter whether our estimates are accurate or not. We don't have to decide at the beginning how many batches to do, and then possibly end up using too much or too little memory per batch if we're wrong; we can let the amount of memory actually used during execution determine the number of batches. That seems good. Of course, a hash join can increase the number of batches on the fly, but only by doubling it, so you might go from 4 batches to 8 when 5 would really have been enough. And a hash join also can't *reduce* the number of batches on the fly, which might matter a lot. Getting the number of batches right avoids I/O, which is a lot more expensive than CPU. >> But the situation here isn't comparable, because there's only one >> input stream. 
I'm pretty sure we'll want to keep track of which >> transition states we've spilled due to lack of memory as opposed to >> those which were never present in the table at all, so that we can >> segregate the unprocessed tuples that pertain to spilled transition >> states from the ones that pertain to a group we haven't begun yet. > > Why would that be necessary or useful? I don't see a reason for tracking > that / segregating the tuples. Suppose there are going to be three groups: A, B, C. Each is an array_agg(), and they're big, so only one of them will fit in work_mem at a time. However, we don't know that at the beginning, either because we don't write the code to try or because we do write that code but our cardinality estimates are way off; instead, we're under the impression that all three will fit in work_mem. So we start reading tuples. We see values for A and B, but we don't see any values for C because those all occur later in the input. Eventually, we run short of memory and cut off creation of new groups. Any tuples for C are now going to get written to a tape from which we'll later reread them. After a while, even that proves insufficient and we spill the transition state for B to disk. Any further tuples that show up for C will need to be written to tape as well. We continue processing and finish group A. Now it's time to do batch #2. Presumably, we begin by reloading the serialized transition state for group B. To finish group B, we must look at all the tuples that might possibly fall in that group. If all of the remaining tuples are on a single tape, we'll have to read all the tuples in group B *and* all the tuples in group C; we'll presumably rewrite the tuples that are not part of this batch onto a new tape, which we'll then process in batch #3.
But if we took advantage of the first pass through the input to put the tuples for group B on one tape and the tuples for group C on another tape, we can be much more efficient - just read the remaining tuples for group B, not mixed with anything else, and then read a separate tape for group C. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
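Robert's aside that a hash join can only grow the batch count by doubling follows from how tuples are assigned to batches. Simplifying the real ExecHashGetBucketAndBatch() to a plain mask over the hash value (the names below are illustrative, not PostgreSQL's), doubling nbatch never scatters an existing batch: every tuple either stays put or moves to batch + old_nbatch, which is what makes on-the-fly splitting cheap but also why nbatch can overshoot (8 batches when 5 would have sufficed):

```python
# With nbatch a power of two, masking the hash is equivalent to modulo.
def batch_of(hashvalue, nbatch):
    # nbatch must be a power of two
    return hashvalue & (nbatch - 1)

# Going from 4 to 8 batches: only tuples whose newly-examined hash bit
# is set migrate, and they all land in batch + 4.
moved = [h for h in range(16) if batch_of(h, 8) != batch_of(h, 4)]
```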
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Tue, Aug 19, 2014 at 7:08 AM, Amit Kapila wrote: > On Fri, Aug 15, 2014 at 12:55 AM, Robert Haas wrote: >> >> On Mon, Aug 11, 2014 at 12:59 AM, Amit Kapila >> wrote: >> > 1. >> > +Number of parallel connections to perform the operation. This >> > option will enable the vacuum >> > +operation to run on parallel connections, at a time one table >> > will >> > be operated on one connection. >> > >> > a. How about describing w.r.t asynchronous connections >> > instead of parallel connections? >> >> I don't think "asynchronous" is a good choice of word. > > Agreed. > >>Maybe "simultaneous"? > > Not sure. How about *concurrent* or *multiple*? multiple isn't right, but we could say concurrent. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Proposal to add a QNX 6.5 port to PostgreSQL
Robert and Tom, Sorry for any confusion, I will try to clarify. Here is the progression of events as I recall them: - My initial QNX 6.5 port proposal lacked a robust replacement for the existing System V shared memory locking mechanism, a show stopper. - Robert proposed a nice set of possible alternatives for locking (to enable an all POSIX shared memory solution for future platforms). - Tom and Robert seemed to agree that a combination of file-based locking plus pipe-based locking should be sufficiently robust on platforms without Sys V shared memory (e.g., QNX). - I coded a proof-of-concept patch (fcntl + PIPE) which appeared to work on QNX (steps a through e). - Robert countered with an 11-step algorithm (all in the postmaster) - Tom suggested elimination of steps 5, 6, 7, 8, and 11 (and swapping order 9 and 10) I was just taking a step back to ask what gaps existed in the proof-of-concept patch (steps a through e). Is there a scenario it fails to cover, prompting the seemingly more complex 11-step algorithm (which added writing data to the pipe and handling of SIGPIPE)? I am willing to attempt coding of the set of changes for a QNX port (option for new locking and all POSIX shared memory, plus a few minor QNX-specific tweaks), provided you and Tom are satisfied that the show stoppers have been sufficiently addressed. Please let me know if more discussion is required, or if it would be reasonable for me (or someone else of your choosing) to work on the coding effort (perhaps targeted for 9.5?) If on the other hand it has been decided that a QNX port is not in the cards, I would like to know (I hope that is not the case given the progress made, but no point in wasting anyone's time). Thanks again for your time, effort, patience, and coaching.
Keith Baker > -Original Message- > From: Robert Haas [mailto:robertmh...@gmail.com] > Sent: Wednesday, August 20, 2014 12:26 PM > To: Baker, Keith [OCDUS Non-J&J] > Cc: Tom Lane; pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] Proposal to add a QNX 6.5 port to PostgreSQL > > On Mon, Aug 18, 2014 at 11:02 AM, Baker, Keith [OCDUS Non-J&J] > wrote: > > My proof of concept code (steps a through e below) avoided any reading or > writing to the pipe (and associated handling of SIGPIPE), it just relied on > postmaster open of PIPE with ENXIO to indicate all is clear. > > I'm not following. > > > Robert, Assuming an algorithm choice is agreed upon in the near future, > would you be the logical choice to implement the change? > > I am happy to help, especially with any QNX-specific aspects, but don't > want to step on anyone's toes. > > I'm unlikely to have time to work on this in the immediate future, but I may > be able to help review. > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL > Company
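The file-locking half of the proof of concept can be exercised from Python on any POSIX system, since fcntl record locks conflict only across processes (hence the fork below). The pipe/ENXIO half, the SIGPIPE question, and QNX itself are not modeled, and the well-known fcntl hazard (the lock is released if the process closes the file through any descriptor) is part of why a second mechanism is wanted at all:

```python
import fcntl
import os
import tempfile

# The "postmaster" takes an exclusive fcntl lock on a file in its data
# directory and holds it for life; a second postmaster pointed at the
# same directory must be refused at startup.
lockpath = os.path.join(tempfile.mkdtemp(), "data.lock")
fd = os.open(lockpath, os.O_RDWR | os.O_CREAT, 0o600)
fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)  # first postmaster wins

pid = os.fork()
if pid == 0:  # the competing postmaster
    fd2 = os.open(lockpath, os.O_RDWR)
    try:
        fcntl.lockf(fd2, fcntl.LOCK_EX | fcntl.LOCK_NB)
        os._exit(1)  # lock granted: the interlock failed
    except OSError:
        os._exit(0)  # refused: another postmaster is alive
_, status = os.waitpid(pid, 0)
second_start_blocked = (os.WEXITSTATUS(status) == 0)
```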
Re: [HACKERS] jsonb format is pessimal for toast compression
On 08/20/2014 08:29 AM, Tom Lane wrote: > Since it's looking like this might be the direction we want to go, I took > the time to flesh out my proof-of-concept patch. The attached version > takes care of cosmetic issues (like fixing the comments), and includes > code to avoid O(N^2) penalties in findJsonbValueFromContainer and > JsonbIteratorNext. OK, will test. This means we need a beta3, no? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] jsonb format is pessimal for toast compression
Josh Berkus writes: > This means we need a beta3, no? If we change the on-disk format, I'd say so. So we don't want to wait around too long before deciding. regards, tom lane
Re: [HACKERS] Patching for increasing the number of columns
On 20/08/14 18:17, Tom Lane wrote: > Hm. I think the without_oid test is not showing that anything is broken; > The other tests aren't showing any functional issue either AFAICS. Thanks a lot Tom! That's very helpful. I have written more details and some basic SQL tests in the wiki of the application (LimeSurvey) which requires this: http://manual.limesurvey.org/Instructions_for_increasing_the_maximum_number_of_columns_in_PostgreSQL_on_Linux I will give an update here or on that wiki (where most relevant) should I find issues while testing. Cheers, mayeulk
Re: [HACKERS] Verbose output of pg_dump not show schema name
On Wed, Aug 20, 2014 at 2:43 AM, Michael Paquier wrote: > > I had a look at this patch, and here are a couple of comments: > 1) Depending on how ArchiveEntry is called to register an object to > dump, namespace may be NULL, but it is not the case > namespace->dobj.name, so you could get the namespace name at the top > of the function that have their verbose output improved with something > like that: > const char *namespace = tbinfo->dobj.namespace ? >tbinfo->dobj.namespace->dobj.name : NULL; > And then simplify the message output as follows: > if (namespace) >write_msg("blah \"%s\".\"%s\" blah", namespace, classname); > else >write_msg("blah \"%s\" blah", classname); > You can as well safely remove the checks on namespace->dobj.name. Ok > 2) I don't think that this is correct: > - ahlog(AH, 1, "processing data > for table \"%s\"\n", > - te->tag); > + ahlog(AH, 1, "processing data > for table \"%s\".\"%s\"\n", > + AH->currSchema, te->tag); > There are some code paths where AH->currSchema is set to NULL, and I > think that you should use te->namespace instead. Yes, you are correct! > 3) Changing only this message is not enough. The following verbose > messages need to be changed too for consistency: > - pg_dump: creating $tag $object > - pg_dump: setting owner and privileges for [blah] > > I have been pondering as well about doing similar modifications to the > error message paths, but it did not seem worth it as this patch is > aimed only for the verbose output. Btw, I have basically fixed those > issues while doing the review, and finished with the attached patch. > Fabrizio, is this new version fine for you? It's fine to me. I just changed "if (tbinfo->dobj.namespace != NULL)" to "if (tbinfo->dobj.namespace)".
Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL >> Timbira: http://www.timbira.com.br >> Blog: http://fabriziomello.github.io >> Linkedin: http://br.linkedin.com/in/fabriziomello >> Twitter: http://twitter.com/fabriziomello >> Github: http://github.com/fabriziomello diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c index 3aebac8..07cc10e 100644 --- a/src/bin/pg_dump/pg_backup_archiver.c +++ b/src/bin/pg_dump/pg_backup_archiver.c @@ -546,8 +546,13 @@ RestoreArchive(Archive *AHX) /* Both schema and data objects might now have ownership/ACLs */ if ((te->reqs & (REQ_SCHEMA | REQ_DATA)) != 0) { - ahlog(AH, 1, "setting owner and privileges for %s %s\n", - te->desc, te->tag); + /* Show namespace if available */ + if (te->namespace) +ahlog(AH, 1, "setting owner and privileges for %s \"%s\".\"%s\"\n", + te->desc, te->namespace, te->tag); + else +ahlog(AH, 1, "setting owner and privileges for %s \"%s\"\n", + te->desc, te->tag); _printTocEntry(AH, te, ropt, false, true); } } @@ -621,7 +626,13 @@ restore_toc_entry(ArchiveHandle *AH, TocEntry *te, if ((reqs & REQ_SCHEMA) != 0) /* We want the schema */ { - ahlog(AH, 1, "creating %s %s\n", te->desc, te->tag); + /* Show namespace if available */ + if (te->namespace) + ahlog(AH, 1, "creating %s \"%s\".\"%s\"\n", + te->desc, te->namespace, te->tag); + else + ahlog(AH, 1, "creating %s \"%s\"\n", te->desc, te->tag); + _printTocEntry(AH, te, ropt, false, false); defnDumped = true; @@ -713,8 +724,13 @@ restore_toc_entry(ArchiveHandle *AH, TocEntry *te, _becomeOwner(AH, te); _selectOutputSchema(AH, te->namespace); - ahlog(AH, 1, "processing data for table \"%s\"\n", - te->tag); + /* Show namespace if available */ + if (te->namespace) + ahlog(AH, 1, "processing data for table \"%s\".\"%s\"\n", + te->namespace, te->tag); + else + ahlog(AH, 1, "processing data for table \"%s\"\n", + te->tag); /* * In parallel restore, if we created the table earlier in diff --git 
a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index 5c0f95f..ea32b42 100644 --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -1383,6 +1383,8 @@ dumpTableData_copy(Archive *fout, void *dcontext) { TableDataInfo *tdinfo = (TableDataInfo *) dcontext; TableInfo *tbinfo = tdinfo->tdtable; + const char *namespace = tbinfo->dobj.namespace ? + tbinfo->dobj.namespace->dobj.name : NULL; const char *classname = tbinfo->dobj.name; const bool hasoids = tbinfo->hasoids; const bool oids = tdinfo->oids; @@ -1400,7 +1402,16 @@ dumpTableData_copy(Archive *fout, void *dcontext) const char *column_list; if (g_verbose) - write_msg(NULL, "dumping contents of table %s\n", classname); + { + /* Print namespace information if available */ + if (namespace) + write_msg(NULL, "dumping contents of table \"%s\".\"%s\"\n", + namespace, + classname); + else + write_msg
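Michael's suggested pattern is easy to state in any language: qualify the name only when the namespace is known, exactly what the patched messages do. A transliterated sketch (the helper name is hypothetical, not pg_dump's):

```python
# Mirror of the review's write_msg() pattern: show a schema-qualified
# name only when the namespace is available, else the bare name.
def qualified_name(namespace, name):
    if namespace:
        return '"%s"."%s"' % (namespace, name)
    return '"%s"' % name
```

Centralizing the formatting like this is also why the review could drop the repeated NULL checks on namespace->dobj.name at each call site.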
[HACKERS] documentation update for doc/src/sgml/func.sgml
Hi, attached is a small patch which updates doc/src/sgml/func.sgml. The change explains that functions like round() and others might behave differently depending on your operating system (because of rint(3)) and that this is according to an IEEE standard. It also points out that #.5 is not always rounded up, as expected from a mathematical point of view. Regards, -- Andreas 'ads' Scherbaum German PostgreSQL User Group European PostgreSQL User Group - Board of Directors Volunteer Regional Contact, Germany - PostgreSQL Project diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 13c71af..da30991 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -924,6 +924,25 @@ + +For functions like round(), log() and sqrt() which run against +either fixed-precision (NUMERIC) or floating-point numbers (e.g. REAL), +note that the results of these operations will differ according +to the input type due to rounding. This is most observable with +round(), which can end up rounding down as well as up for +any #.5 value. We use the +http://en.wikipedia.org/w/index.php?title=IEEE_floating_point&oldid=622007055#Rounding_rules";>IEEE's rules +for rounding floating-point numbers which can be machine-specific. +The bitwise operators work only on integral data types, whereas +the others are available for all numeric data types. + + + +The bitwise operators are also available for the bit string types +bit and bit varying, as +shown in . + + shows functions for generating random numbers. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal to add a QNX 6.5 port to PostgreSQL
Baker, Keith [OCDUS Non-J&J] wrote: > Please let me know if more discussion is required, or if it would be > reasonable for me (or someone else of your choosing) to work on the > coding effort (perhaps targeted for 9.5?) > If on the other hand it has been decided that a QNX port is not in the > cards, I would like to know (I hope that is not the case given the > progress made, but no point in wasting anyone's time). As I recall, other than the postmaster startup interlock, the other major missing item you mentioned is SA_RESTART. That could well turn out to be a showstopper, so I suggest you study that in more depth. Are there other major items missing? Did you have to use configure --disable-spinlocks for instance? What's your compiler, and what are the underlying hardware platforms you want to support? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] jsonb format is pessimal for toast compression
On 08/20/2014 08:29 AM, Tom Lane wrote: > Josh Berkus writes: >> On 08/15/2014 04:19 PM, Tom Lane wrote: >>> Personally I'd prefer to go to the all-lengths approach, but a large >>> part of that comes from a subjective assessment that the hybrid approach >>> is too messy. Others might well disagree. > >> ... So, that extraction test is about 1% *slower* than the basic Tom Lane >> lengths-only patch, and still 80% slower than original JSONB. And it's >> the same size as the lengths-only version. > > Since it's looking like this might be the direction we want to go, I took > the time to flesh out my proof-of-concept patch. The attached version > takes care of cosmetic issues (like fixing the comments), and includes > code to avoid O(N^2) penalties in findJsonbValueFromContainer and > JsonbIteratorNext. I'm not sure whether those changes will help > noticeably on Josh's test case; for me, they seemed worth making, but > they do not bring the code back to full speed parity with the all-offsets > version. But as we've been discussing, it seems likely that those costs > would be swamped by compression and I/O considerations in most scenarios > with large documents; and of course for small documents it hardly matters. Table sizes and extraction times are unchanged from the prior patch based on my workload. We should be comparing all-lengths vs length-and-offset maybe using another workload as well ... -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
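The tradeoff being benchmarked in this thread, all-lengths versus stored offsets, comes down to random access: with only lengths stored, locating element i requires summing the i preceding lengths. A toy sketch of that cost (hypothetical helper, not jsonb's actual layout code):

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the all-lengths layout: element i's offset must be
 * recomputed by summing the i preceding lengths, O(i) per lookup,
 * where a stored-offsets layout would be O(1). In exchange, small
 * repetitive length values compress far better than monotonically
 * increasing offsets, which is the motivation for the change. */
size_t
length_layout_offset(const unsigned int *lengths, int i)
{
	size_t		off = 0;
	int			j;

	for (j = 0; j < i; j++)
		off += lengths[j];
	return off;
}
```

This naive per-lookup summation is exactly the O(N^2) pattern Tom's patch avoids in findJsonbValueFromContainer and JsonbIteratorNext by carrying a running offset through the scan.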
Re: [HACKERS] Proposal to add a QNX 6.5 port to PostgreSQL
Alvaro, Thanks for your interest and questions. At this point I have created a proof-of-concept QNX 6.5 port which appears to work on the surface (passes regression tests), but needs to be deemed "production-quality". To work around lack of SA_RESTART, I added QNX-specific retry macros to port.h. With these macros in place "make check" runs cleanly (fails in many places without them).
+#if defined(__QNX__)
+/* QNX does not support sigaction SA_RESTART. We must retry interrupted calls (EINTR) */
+/* Helper macros, used to build our retry macros */
+#define PG_RETRY_EINTR3(exp,val,type) ({ type _tmp_rc; do _tmp_rc = (exp); while (_tmp_rc == (val) && errno == EINTR); _tmp_rc; })
+#define PG_RETRY_EINTR(exp) PG_RETRY_EINTR3(exp,-1L,long int)
+#define PG_RETRY_EINTR_FILE(exp) PG_RETRY_EINTR3(exp,NULL,FILE *)
+/* override calls known to return EINTR when interrupted */
+#define close(a) PG_RETRY_EINTR(close(a))
+#define fclose(a) PG_RETRY_EINTR(fclose(a))
+#define fdopen(a,b) PG_RETRY_EINTR_FILE(fdopen(a,b))
+#define fopen(a,b) PG_RETRY_EINTR_FILE(fopen(a,b))
+#define freopen(a,b,c) PG_RETRY_EINTR_FILE(freopen(a,b,c))
+#define fseek(a,b,c) PG_RETRY_EINTR(fseek(a,b,c))
+#define fseeko(a,b,c) PG_RETRY_EINTR(fseeko(a,b,c))
+#define ftruncate(a,b) PG_RETRY_EINTR(ftruncate(a,b))
+#define lseek(a,b,c) PG_RETRY_EINTR(lseek(a,b,c))
+#define open(a,b,...) ({ int _tmp_rc; do _tmp_rc = open(a,b,##__VA_ARGS__); while (_tmp_rc == (-1) && errno == EINTR); _tmp_rc; })
+#define shm_open(a,b,c) PG_RETRY_EINTR(shm_open(a,b,c))
+#define stat(a,b) PG_RETRY_EINTR(stat(a,b))
+#define unlink(a) PG_RETRY_EINTR(unlink(a))
... (Macros for read and write are similar but slightly longer, so I omit them here)...
+#endif /* __QNX__ */
Here is what I used for configure; I am open to suggestions: ./configure --without-readline --disable-thread-safety I am targeting QNX 6.5 on x86, using gcc 4.4.2. Also, I have an issue to work out for locale support, but expect I can solve that. 
Keith Baker > -Original Message- > From: Alvaro Herrera [mailto:alvhe...@2ndquadrant.com] > Sent: Wednesday, August 20, 2014 4:16 PM > To: Baker, Keith [OCDUS Non-J&J] > Cc: Robert Haas; Tom Lane; pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] Proposal to add a QNX 6.5 port to PostgreSQL > > Baker, Keith [OCDUS Non-J&J] wrote: > > > Please let me know if more discussion is required, or if it would be > > reasonable for me (or someone else of your choosing) to work on the > > coding effort (perhaps targeted for 9.5?) If on the other hand it has > > been decided that a QNX port is not in the cards, I would like to know > > (I hope that is not the case given the progress made, but no point in > > wasting anyone's time). > > As I recall, other than the postmaster startup interlock, the other major > missing item you mentioned is SA_RESTART. That could well turn out to be a > showstopper, so I suggest you study that in more depth. > > Are there other major items missing? Did you have to use configure -- > disable-spinlocks for instance? > > What's your compiler, and what are the underlying hardware platforms you > want to support? > > -- > Álvaro Herrera http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Training & Services
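For readers following along, the retry pattern those port.h macros expand to looks like this when written out as a plain function for read(2) (an illustrative sketch, not part of the proposed patch):

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* What PG_RETRY_EINTR(read(...)) boils down to: repeat the call as
 * long as it fails with EINTR, i.e. a signal arrived before any
 * data was transferred. Without SA_RESTART the kernel will not
 * restart the call itself, so userspace has to loop. */
ssize_t
read_retry_eintr(int fd, void *buf, size_t count)
{
	ssize_t		rc;

	do
		rc = read(fd, buf, count);
	while (rc == -1 && errno == EINTR);
	return rc;
}
```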
Re: [HACKERS] jsonb format is pessimal for toast compression
What data are you using right now Josh? There's the github archive http://www.githubarchive.org/ Here's some sample data https://gist.github.com/igrigorik/2017462 -- Arthur Silva On Wed, Aug 20, 2014 at 6:09 PM, Josh Berkus wrote: > On 08/20/2014 08:29 AM, Tom Lane wrote: > > Josh Berkus writes: > >> On 08/15/2014 04:19 PM, Tom Lane wrote: > >>> Personally I'd prefer to go to the all-lengths approach, but a large > >>> part of that comes from a subjective assessment that the hybrid > approach > >>> is too messy. Others might well disagree. > > > >> ... So, that extraction test is about 1% *slower* than the basic Tom > Lane > >> lengths-only patch, and still 80% slower than original JSONB. And it's > >> the same size as the lengths-only version. > > > > Since it's looking like this might be the direction we want to go, I took > > the time to flesh out my proof-of-concept patch. The attached version > > takes care of cosmetic issues (like fixing the comments), and includes > > code to avoid O(N^2) penalties in findJsonbValueFromContainer and > > JsonbIteratorNext. I'm not sure whether those changes will help > > noticeably on Josh's test case; for me, they seemed worth making, but > > they do not bring the code back to full speed parity with the all-offsets > > version. But as we've been discussing, it seems likely that those costs > > would be swamped by compression and I/O considerations in most scenarios > > with large documents; and of course for small documents it hardly > matters. > > Table sizes and extraction times are unchanged from the prior patch > based on my workload. > > We should be comparing all-lengths vs length-and-offset maybe using > another workload as well ... > > -- > Josh Berkus > PostgreSQL Experts Inc. > http://pgexperts.com >
Re: [HACKERS] [patch] pg_copy - a command for reliable WAL archiving
On Wed, Aug 20, 2014 at 10:36:40AM -0400, Tom Lane wrote: > Andres Freund writes: > > On 2014-08-20 10:19:33 -0400, Tom Lane wrote: > >> Alternatively, you could use the process PID as part of the temp file > >> name; which is probably a good idea anyway. > > > I think that's actually worse, because nothing will clean up those > > unless you explicitly scan for all .$pid files, and somehow > > kill them. > > True. As long as the copy command is prepared to get rid of a > pre-existing target file, using a fixed .tmp extension should be fine. Well, then we are back to this comment by MauMau: > With that said, copying to a temporary file like .tmp and > renaming it to sounds worthwhile even as a basic copy utility. > I want to avoid copying to a temporary file with a fixed name like > _copy.tmp, because some advanced utility may want to run multiple > instances of pg_copy to copy several files into the same directory > simultaneously. However, I'm afraid multiple .tmp files might > continue to occupy disk space after canceling copy or power failure in > some use cases, where the copy of the same file won't be retried. > That's also the reason why I chose to not use a temporary file like > cp/copy. Do we want cases where the same directory is used by multiple pg_copy processes? I can't imagine how that setup would make sense. I am thinking pg_copy should emit a warning message when it removes an old temp file. This might alert people that something odd is happening if they see the message often. The pid-extension idea would work as pg_copy can test all pid extension files to see if the pid is still active. However, that assumes that the pid is running on the local machine and not on another machine that has NFS-mounted this directory, so maybe this is a bad idea, but again, we are back to the idea that only one process should be writing into this directory. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. 
+
Re: [HACKERS] [patch] pg_copy - a command for reliable WAL archiving
On 2014-08-20 18:58:05 -0400, Bruce Momjian wrote: > On Wed, Aug 20, 2014 at 10:36:40AM -0400, Tom Lane wrote: > > Andres Freund writes: > > > On 2014-08-20 10:19:33 -0400, Tom Lane wrote: > > >> Alternatively, you could use the process PID as part of the temp file > > >> name; which is probably a good idea anyway. > > > > > I think that's actually worse, because nothing will clean up those > > > unless you explicitly scan for all .$pid files, and somehow > > > kill them. > > > > True. As long as the copy command is prepared to get rid of a > > pre-existing target file, using a fixed .tmp extension should be fine. > > Well, then we are back to this comment by MauMau: > > With that said, copying to a temporary file like .tmp and > > renaming it to sounds worthwhile even as a basic copy utility. > > I want to avoid copying to a temporary file with a fixed name like > > _copy.tmp, because some advanced utility may want to run multiple > > instances of pg_copy to copy several files into the same directory > > simultaneously. However, I'm afraid multiple .tmp files might > > continue to occupy disk space after canceling copy or power failure in > > some use cases, where the copy of the same file won't be retried. > > That's also the reason why I chose to not use a temporary file like > > cp/copy. > > Do we want cases where the same directory is used multiple pg_copy > processes? I can't imagine how that setup would make sense. I don't think anybody is proposing the _copy.tmp proposal. We've just argued about the risk of .tmp. And I argued - and others seem to agree - the space usage problem isn't really relevant because archive commands and such are rerun after failure and can then clean up the temp file again. > I am thinking pg_copy should emit a warning message when it removes an > old temp file. This might alert people that something odd is happening > if they see the message often. Don't really see a point in this. 
If the archive command or such failed, that will already have been logged. I'd expect this to be implemented by passing O_CREAT | O_TRUNC to open(), nothing else. > The pid-extension idea would work as pg_copy can test all pid extension > files to see if the pid is still active. However, that assumes that the > pid is running on the local machine and not on another machines that has > NFS-mounted this directory, so maybe this is a bad idea, but again, we > are back to the idea that only one process should be writing into this > directory. I don't actually think we should assume that. There very well could be one process running an archive command, using differently prefixed file names or such. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
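The write-to-temp-then-rename scheme under discussion, with the O_CREAT | O_TRUNC handling Andres suggests for clobbering a leftover .tmp file, can be sketched as follows (an illustration of the pattern, not pg_copy's actual code; error handling abbreviated):

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Copy src to dst by writing dst.tmp first. O_TRUNC silently
 * clobbers a leftover temp file from an earlier failed run; fsync
 * makes the data durable before rename(2) atomically puts the file
 * in place, so a reader never observes a partially copied dst. */
int
copy_durable(const char *src, const char *dst)
{
	char		tmp[1024];
	char		buf[8192];
	ssize_t		n;
	int			in,
				out;

	snprintf(tmp, sizeof(tmp), "%s.tmp", dst);
	if ((in = open(src, O_RDONLY)) < 0)
		return -1;
	if ((out = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0600)) < 0)
	{
		close(in);
		return -1;
	}
	while ((n = read(in, buf, sizeof(buf))) > 0)
	{
		if (write(out, buf, n) != n)
		{
			n = -1;
			break;
		}
	}
	close(in);
	if (n < 0 || fsync(out) < 0 || close(out) < 0)
		return -1;
	return rename(tmp, dst);	/* atomic replacement */
}
```

Because rename(2) is atomic within a filesystem, a crash leaves either the old dst (plus possibly a stale .tmp) or the complete new dst, never a truncated archive file.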
Re: [HACKERS] Minmax indexes
Alvaro Herrera wrote: > So here's v16, rebased on top of 9bac66020. As far as I am concerned, > this is the last version before I start renaming everything to BRIN and > then commit. FWIW in case you or others have interest, here's the diff between your patch and v16. Also, for illustrative purposes, the diff between versions yours and mine of the code that got moved to mmpageops.c because it's difficult to see it from the partial patch. (There's nothing to do with that partial diff other than read it directly.) -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services *** a/contrib/pageinspect/mmfuncs.c --- b/contrib/pageinspect/mmfuncs.c *** *** 29,35 PG_FUNCTION_INFO_V1(minmax_page_type); PG_FUNCTION_INFO_V1(minmax_page_items); PG_FUNCTION_INFO_V1(minmax_metapage_info); - PG_FUNCTION_INFO_V1(minmax_revmap_array_data); PG_FUNCTION_INFO_V1(minmax_revmap_data); typedef struct mm_column_state --- 29,34 *** *** 388,394 minmax_revmap_data(PG_FUNCTION_ARGS) values[0] = Int64GetDatum((uint64) 0); /* Extract (possibly empty) list of TIDs in this page. */ ! for (i = 0; i < REGULAR_REVMAP_PAGE_MAXITEMS; i++) { ItemPointer tid; --- 387,393 values[0] = Int64GetDatum((uint64) 0); /* Extract (possibly empty) list of TIDs in this page. */ ! for (i = 0; i < REVMAP_PAGE_MAXITEMS; i++) { ItemPointer tid; *** a/src/backend/access/minmax/Makefile --- b/src/backend/access/minmax/Makefile *** *** 12,17 subdir = src/backend/access/minmax top_builddir = ../../../.. include $(top_builddir)/src/Makefile.global ! OBJS = minmax.o mmrevmap.o mmtuple.o mmxlog.o mmsortable.o include $(top_srcdir)/src/backend/common.mk --- 12,17 top_builddir = ../../../.. include $(top_builddir)/src/Makefile.global ! 
OBJS = minmax.o mmpageops.o mmrevmap.o mmtuple.o mmxlog.o mmsortable.o include $(top_srcdir)/src/backend/common.mk *** a/src/backend/access/minmax/minmax.c --- b/src/backend/access/minmax/minmax.c *** *** 15,45 */ #include "postgres.h" - #include "access/htup_details.h" #include "access/minmax.h" #include "access/minmax_internal.h" #include "access/minmax_page.h" ! #include "access/minmax_revmap.h" ! #include "access/minmax_tuple.h" #include "access/minmax_xlog.h" #include "access/reloptions.h" #include "access/relscan.h" - #include "access/xlogutils.h" #include "catalog/index.h" - #include "catalog/pg_operator.h" - #include "commands/vacuum.h" #include "miscadmin.h" #include "pgstat.h" #include "storage/bufmgr.h" #include "storage/freespace.h" - #include "storage/indexfsm.h" - #include "storage/lmgr.h" - #include "storage/smgr.h" - #include "utils/datum.h" - #include "utils/lsyscache.h" - #include "utils/memutils.h" #include "utils/rel.h" - #include "utils/syscache.h" /* --- 15,33 */ #include "postgres.h" #include "access/minmax.h" #include "access/minmax_internal.h" #include "access/minmax_page.h" ! 
#include "access/minmax_pageops.h" #include "access/minmax_xlog.h" #include "access/reloptions.h" #include "access/relscan.h" #include "catalog/index.h" #include "miscadmin.h" #include "pgstat.h" #include "storage/bufmgr.h" #include "storage/freespace.h" #include "utils/rel.h" /* *** *** 75,93 static MMBuildState *initialize_mm_buildstate(Relation idxRel, static bool terminate_mm_buildstate(MMBuildState *state); static void summarize_range(MMBuildState *mmstate, Relation heapRel, BlockNumber heapBlk); - static bool mm_doupdate(Relation idxrel, BlockNumber pagesPerRange, - mmRevmapAccess *rmAccess, BlockNumber heapBlk, - Buffer oldbuf, OffsetNumber oldoff, - const MMTuple *origtup, Size origsz, - const MMTuple *newtup, Size newsz, - bool samepage, bool *extended); - static void mm_doinsert(Relation idxrel, BlockNumber pagesPerRange, - mmRevmapAccess *rmAccess, Buffer *buffer, BlockNumber heapblkno, - MMTuple *tup, Size itemsz, bool *extended); - static Buffer mm_getinsertbuffer(Relation irel, Buffer oldbuf, Size itemsz, - bool *extended); static void form_and_insert_tuple(MMBuildState *mmstate); - static Size mm_page_get_freespace(Page page); /* --- 63,69 *** *** 123,128 mminsert(PG_FUNCTION_ARGS) --- 99,105 rmAccess = mmRevmapAccessInit(idxRel, &pagesPerRange); restart: + CHECK_FOR_INTERRUPTS(); heapBlk = ItemPointerGetBlockNumber(heaptid); /* normalize the block number to be the first block in the range */ heapBlk = (heapBlk / pagesPerRange) * pagesPerRange; *** *** 155,161 restart: addValue = index_getprocinfo(idxRel, keyno + 1, MINMAX_PROCNUM_ADDVALUE); - result = FunctionCall5Coll(addValue, idxRel->rd_indcollation[keyno], PointerGetDatum(mmdesc), --- 132,137
Re: [HACKERS] Verbose output of pg_dump not show schema name
On Thu, Aug 21, 2014 at 5:11 AM, Fabrízio de Royes Mello wrote: > I just changed "if (tbinfo->dobj.namespace != NULL)" to "if > (tbinfo->dobj.namespace)". Fine for me. I am marking this patch as ready for committer. -- Michael
Re: [HACKERS] [PATCH] Incremental backup: add backup profile to base backup
On Mon, Aug 18, 2014 at 04:05:07PM +0300, Heikki Linnakangas wrote: > But more to the point, I thought the consensus was to use the > highest LSN of all the blocks in the file, no? That's essentially > free to calculate (if you have to read all the data anyway), and > isn't vulnerable to collisions. The highest-LSN approach allows you to read only the tail part of each 8k block. Assuming 512-byte storage sector sizes, you only have to read 1/8 of the file. Now, the problem is that you lose kernel prefetch, but maybe posix_fadvise() would fix that problem. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
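A minimal sketch of the per-file highest-LSN scan (hypothetical helper, not the proposed patch). Note that pd_lsn is the first field of PostgreSQL's page header, so it is the leading bytes of each 8 kB block that carry the LSN; for simplicity the sketch decodes the two 32-bit halves (xlogid, xrecoff) little-endian, whereas the server stores them in native byte order via the header struct:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define BLCKSZ 8192

/* Return the highest page LSN in a relation file, reading only the
 * first 8 bytes (the pd_lsn field) of each BLCKSZ-sized block. */
uint64_t
highest_lsn(FILE *fp)
{
	unsigned char hdr[8];
	uint64_t	max_lsn = 0;
	long		blk = 0;

	while (fseek(fp, blk * (long) BLCKSZ, SEEK_SET) == 0 &&
		   fread(hdr, 1, sizeof(hdr), fp) == sizeof(hdr))
	{
		uint32_t	xlogid = hdr[0] | (hdr[1] << 8) | (hdr[2] << 16) |
			((uint32_t) hdr[3] << 24);
		uint32_t	xrecoff = hdr[4] | (hdr[5] << 8) | (hdr[6] << 16) |
			((uint32_t) hdr[7] << 24);
		uint64_t	lsn = ((uint64_t) xlogid << 32) | xrecoff;

		if (lsn > max_lsn)
			max_lsn = lsn;
		blk++;
	}
	return max_lsn;
}
```

The seek-heavy access pattern is what defeats kernel readahead; posix_fadvise(POSIX_FADV_SEQUENTIAL) or explicit WILLNEED hints on the upcoming headers are the mitigations suggested in the thread.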
Re: [HACKERS] Proposal to add a QNX 6.5 port to PostgreSQL
Hi, On 2014-08-20 21:21:41 +, Baker, Keith [OCDUS Non-J&J] wrote: > To work around lack of SA_RESTART, I added QNX-specific retry macros to port.h > With these macros in place "make check" runs cleanly (fails in many places > without them). > > +#if defined(__QNX__) > +/* QNX does not support sigaction SA_RESTART. We must retry interrupted > calls (EINTR) */ > +/* Helper macros, used to build our retry macros */ > +#define PG_RETRY_EINTR3(exp,val,type) ({ type _tmp_rc; do _tmp_rc = > (exp); while (_tmp_rc == (val) && errno == EINTR); _tmp_rc; }) > +#define PG_RETRY_EINTR(exp) PG_RETRY_EINTR3(exp,-1L,long int) > +#define PG_RETRY_EINTR_FILE(exp) PG_RETRY_EINTR3(exp,NULL,FILE *) > +/* override calls known to return EINTR when interrupted */ > +#define close(a) PG_RETRY_EINTR(close(a)) > +#define fclose(a) PG_RETRY_EINTR(fclose(a)) > +#define fdopen(a,b) PG_RETRY_EINTR_FILE(fdopen(a,b)) > +#define fopen(a,b) PG_RETRY_EINTR_FILE(fopen(a,b)) > +#define freopen(a,b,c) PG_RETRY_EINTR_FILE(freopen(a,b,c)) > +#define fseek(a,b,c) PG_RETRY_EINTR(fseek(a,b,c)) > +#define fseeko(a,b,c) PG_RETRY_EINTR(fseeko(a,b,c)) > +#define ftruncate(a,b) PG_RETRY_EINTR(ftruncate(a,b)) > +#define lseek(a,b,c) PG_RETRY_EINTR(lseek(a,b,c)) > +#define open(a,b,...) ({ int _tmp_rc; do _tmp_rc = > open(a,b,##__VA_ARGS__); while (_tmp_rc == (-1) && errno == EINTR); _tmp_rc; > }) > +#define shm_open(a,b,c) PG_RETRY_EINTR(shm_open(a,b,c)) > +#define stat(a,b) PG_RETRY_EINTR(stat(a,b)) > +#define unlink(a) PG_RETRY_EINTR(unlink(a)) > ... (Macros for read and write are similar but slightly longer, so I omit > them here)... > +#endif /* __QNX__ */ I think this is a horrible way to go and unlikely to succeed. You're surely going to miss calls and it's going to need to be maintained continuously. We'll miss adding things which will then only break under load, which most people won't be able to generate under QNX. The only reasonable way to fake kernel SA_RESTART support is to do so in $platform's libc. 
In the syscall wrapper. > Here is what I used for configure, I am open to suggestions: > ./configure --without-readline --disable-thread-safety Why is the --disable-thread-safety needed? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [PATCH] Incremental backup: add backup profile to base backup
On Wed, Aug 20, 2014 at 8:24 PM, Bruce Momjian wrote: > On Mon, Aug 18, 2014 at 04:05:07PM +0300, Heikki Linnakangas wrote: >> But more to the point, I thought the consensus was to use the >> highest LSN of all the blocks in the file, no? That's essentially >> free to calculate (if you have to read all the data anyway), and >> isn't vulnerable to collisions. > > The highest-LSN approach allows you to read only the tail part of each > 8k block. Assuming 512-byte storage sector sizes, you only have to read > 1/8 of the file. > > Now, the problem is that you lose kernel prefetch, but maybe > posix_fadvise() would fix that problem. Sequential read of 512-byte blocks or 8k blocks takes the same amount of time on rotating media (if they're scheduled right). Maybe not on SSD media. Moreover, the kernel will read in 4k blocks instead of 8k (at least on Linux). So, the benefit is dubious.
Re: [HACKERS] Proposal to add a QNX 6.5 port to PostgreSQL
On 2014-07-25 18:29:53 -0400, Tom Lane wrote: > > * QNX lacks sigaction SA_RESTART: I modified "src/include/port.h" > > to define macros to retry system calls upon EINTR (open,read,write,...) > > when compiled on QNX > > That's pretty scary too. For one thing, such macros would affect every > call site whether it's running with SA_RESTART or not. Do you really > need it? It looks to me like we just turn off HAVE_POSIX_SIGNALS if > you don't have SA_RESTART. Maybe that code has bit-rotted by now, but > it did work at one time. I have pretty much no trust that we're maintaining !HAVE_POSIX_SIGNAL, and none that we have the capability of doing so. I seriously doubt there are any !HAVE_POSIX_SIGNAL animals, and 873ab97219caabeb2f7b390268a4fe01e2b7518c makes it pretty darn unlikely that we have much chance of finding such mistakes during development. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Verbose output of pg_dump not show schema name
On Wed, Aug 20, 2014 at 8:24 PM, Michael Paquier wrote: > > On Thu, Aug 21, 2014 at 5:11 AM, Fabrízio de Royes Mello > wrote: > > I just change "if (tbinfo->dobj.namespace != NULL)" to "if > > (tbinfo->dobj.namespace)". > Fine for me. I am marking this patch as ready for committer. > Thanks! -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL >> Timbira: http://www.timbira.com.br >> Blog: http://fabriziomello.github.io >> Linkedin: http://br.linkedin.com/in/fabriziomello >> Twitter: http://twitter.com/fabriziomello >> Github: http://github.com/fabriziomello
Re: [HACKERS] PQgetssl() and alternative SSL implementations
On Tue, Aug 19, 2014 at 03:26:56PM -0400, Stephen Frost wrote: > I think there's been some improvement since I last had to go through the > pain of setting this all up, and some of it is undoubtably OpenSSL's > fault, but there's definitely quite a bit more we could be doing to make > SSL support easier. I'm hopeful that I'll be able to spend more time on > this in the future but it's not a priority currently. I know I updated the docs on this in 2013: commit fa4add50c4ea97c48881fa8cb3863df80141643c Author: Bruce Momjian Date: Fri Dec 6 09:42:08 2013 -0500 docs: clarify SSL certificate authority chain docs Previously, the requirements of how intermediate certificates were handled and their chain to root certificates was unclear. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Re: [HACKERS] PQgetssl() and alternative SSL implementations
On Tue, Aug 19, 2014 at 03:47:17PM -0400, Stephen Frost wrote: > * Robert Haas (robertmh...@gmail.com) wrote: > > BTW, if we're beating on libpq, I wonder if we shouldn't consider > > bumping the soversion at some point. I mean, I know that we > > technically don't need to do that if we're only *adding* functions and > > not changing any of the existing stuff in backward-incompatible ways, > > but we might *want* to make some backward-incompatible changes at some > > point, and I think there's a decent argument that any patch in this > > are is already doing that at least to PQgetSSL(). Maybe this would be > > a good time to think if there's anything else we want to do that > > would, either by itself or in combination, justify a bump. > > I'm not a big fan of doing it for this specific item, though it's > technically an API breakage (which means we should actually have > libpq2-dev packages, make everything that build-deps on libpq-dev > update to build-dep on libpq2-dev, have libpq6, etc..). If there are > other backwards-incompatible things we wish to do, then I agree that > it'd be good to do them all at the same time (perhaps in conjunction > with 10.0...). This is the part where I wish we had been keeping an > updated list of things we want to change (like on the wiki..). We have, called "Wire Protocol Changes": https://wiki.postgresql.org/wiki/Todo Unfortunately, the subsection link doesn't work on Firefox. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Re: [HACKERS] add line number as prompt option to psql
On Wed, Aug 20, 2014 at 9:00 PM, Jeevan Chalke wrote: > Hi, > > I have reviewed this: > > I have initialized cur_lineno to UINTMAX - 2. And then observed the following > behaviour to check wrap-around. > > postgres=# \set PROMPT1 '%/[%l]%R%# ' > postgres[18446744073709551613]=# \set PROMPT2 '%/[%l]%R%# ' > postgres[18446744073709551613]=# select > postgres[18446744073709551614]-# a > postgres[18446744073709551615]-# , > postgres[0]-# b > postgres[1]-# from > postgres[2]-# dual; > > It is wrapping to 0, whereas line numbers always start with 1. Any issues? > > I would like to ignore this as UINTMAX lines are too much for an input > buffer to hold. There is almost no chance of hitting this. > > > However, I think you need to use UINT64_FORMAT while printing a uint64 > number. Currently it is %u which wraps around at UINT_MAX. > See how pset.lineno is displayed. > > Apart from this, I didn't see any issues in my testing. > > Patch applies cleanly. make/make install/initdb/make check all are well. > Thank you for reviewing the patch! Attached is the latest version of the patch. I modified the output format of cur_lineno. Regards, --- Sawada Masahiko psql-line-number_v7.patch Description: Binary data
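Jeevan's point about the format string can be shown portably with <inttypes.h>: "%" PRIu64 plays the role of PostgreSQL's UINT64_FORMAT macro, while plain "%u" expects an unsigned int and would misprint a 64-bit counter (hypothetical helper for illustration):

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Format a 64-bit line counter portably. "%" PRIu64 is the
 * <inttypes.h> spelling of what UINT64_FORMAT provides in the
 * PostgreSQL tree; "%u" would only cover values up to UINT_MAX. */
void
format_lineno(uint64_t lineno, char *buf, size_t len)
{
	snprintf(buf, len, "%" PRIu64, lineno);
}
```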
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Thu, Aug 21, 2014 at 12:04 AM, Robert Haas wrote: > > On Tue, Aug 19, 2014 at 7:08 AM, Amit Kapila wrote: > > On Fri, Aug 15, 2014 at 12:55 AM, Robert Haas wrote: > >> > >> On Mon, Aug 11, 2014 at 12:59 AM, Amit Kapila > >> wrote: > >> > > >> > a. How about describing w.r.t asynchronous connections > >> > instead of parallel connections? > >> > >> I don't think "asynchronous" is a good choice of word. > > > > Agreed. > > > >>Maybe "simultaneous"? > > > > Not sure. How about *concurrent* or *multiple*? > > multiple isn't right, but we could say concurrent. I also find concurrent more appropriate. Dilip, could you please change it to concurrent in doc updates, variables, functions unless you see any objection for the same. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Compute attr_needed for child relations (was Re: [HACKERS] inherit support for foreign tables)
On Wed, Aug 20, 2014 at 3:25 PM, Etsuro Fujita wrote: > Hi Ashutosh, > > > (2014/08/14 22:30), Ashutosh Bapat wrote: > >> On Thu, Aug 14, 2014 at 10:05 AM, Etsuro Fujita >> mailto:fujita.ets...@lab.ntt.co.jp>> wrote: >> >> (2014/08/08 18:51), Etsuro Fujita wrote: >> >>> (2014/06/30 22:48), Tom Lane wrote: >> I wonder whether it isn't time to change that. It was coded >> like that >> originally only because calculating the values would've been a >> waste of >> cycles at the time. But this is at least the third place >> where it'd be >> useful to have attr_needed for child rels. >> >> > I've revised the patch. >> >> There was a problem with the previous patch, which will be described >> below. Attached is the updated version of the patch addressing that. >> > > Here are some more comments >> > > Thank you for the further review! > > > attr_needed also has the attributes used in the restriction clauses as >> seen in distribute_qual_to_rels(), so it looks unnecessary to call >> pull_varattnos() on the clauses in baserestrictinfo in functions >> check_selective_binary_conversion(), remove_unused_subquery_outputs(), >> check_index_only(). >> > > IIUC, I think it's *necessary* to do that in those functions since the > attributes used in the restriction clauses in baserestrictinfo are not > added to attr_needed due to the following code in distribute_qual_to_rels. > > That's right. Thanks for pointing that out. > /* > * If it's a join clause (either naturally, or because delayed by > * outer-join rules), add vars used in the clause to targetlists of > their > * relations, so that they will be emitted by the plan nodes that scan > * those relations (else they won't be available at the join node!). > * > * Note: if the clause gets absorbed into an EquivalenceClass then this > * may be unnecessary, but for now we have to do it to cover the case > * where the EC becomes ec_broken and we end up reinserting the > original > * clauses into the plan. 
> */ > if (bms_membership(relids) == BMS_MULTIPLE) > { > List *vars = pull_var_clause(clause, >PVC_RECURSE_AGGREGATES, >PVC_INCLUDE_PLACEHOLDERS); > > add_vars_to_targetlist(root, vars, relids, false); > list_free(vars); > > } > > Although in case of RTE_RELATION, the 0th entry in attr_needed >> corresponds to FirstLowInvalidHeapAttributeNumber + 1, it's always safer >> to use it is RelOptInfo::min_attr, in case get_relation_info() wants to >> change assumption or somewhere down the line some other part of code >> wants to change attr_needed information. It may be unlikely, that it >> would change, but it will be better to stick to comments in RelOptInfo >> 443 AttrNumber min_attr; /* smallest attrno of rel (often >> <0) */ >> 444 AttrNumber max_attr; /* largest attrno of rel */ >> 445 Relids *attr_needed;/* array indexed [min_attr .. >> max_attr] */ >> > > Good point! Attached is the revised version of the patch. > > If the patch is not in the commit-fest, can you please add it there? From my side, the review is done, it should be marked "ready for committer", unless somebody else wants to review. > > Thanks, > > Best regards, > Etsuro Fujita > -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
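The review point about indexing attr_needed through RelOptInfo::min_attr rather than a hard-coded base can be illustrated with a minimal, self-contained sketch. The struct and helper names below are hypothetical stand-ins, not the actual PostgreSQL definitions:

```c
#include <assert.h>

/*
 * Hypothetical stand-in for RelOptInfo's attr_needed layout: an array
 * logically indexed [min_attr .. max_attr], where min_attr is often
 * negative because of system attributes.
 */
typedef struct
{
    int  min_attr;     /* smallest attrno of rel (often < 0) */
    int  max_attr;     /* largest attrno of rel */
    int *attr_needed;  /* array indexed [min_attr .. max_attr] */
} RelAttrs;

/*
 * Mark an attribute as needed, translating the logical attno into a
 * physical array offset via min_attr instead of assuming a fixed base
 * such as FirstLowInvalidHeapAttributeNumber + 1.
 */
static void
mark_attr_needed(RelAttrs *rel, int attno)
{
    assert(attno >= rel->min_attr && attno <= rel->max_attr);
    rel->attr_needed[attno - rel->min_attr] = 1;
}

static int
attr_is_needed(const RelAttrs *rel, int attno)
{
    return rel->attr_needed[attno - rel->min_attr];
}
```

Because the logical index range often starts below zero, translating through min_attr keeps callers correct even if the lower bound of the array ever changes.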
Re: [HACKERS] Support for N synchronous standby servers
On Fri, Aug 15, 2014 at 9:28 PM, Fujii Masao wrote:
> You added check_synchronous_standby_num() as the GUC check function for
> synchronous_standby_num, and checked that there. But that seems to be wrong.
> You can easily see the following error messages even if synchronous_standby_num
> is smaller than max_wal_senders. The point is that synchronous_standby_num
> should be located before max_wal_senders in postgresql.conf.
>
> LOG: invalid value for parameter "synchronous_standby_num": 0
> DETAIL: synchronous_standby_num cannot be higher than max_wal_senders.

I am not sure what I can do here, so I am removing this check from the code and simply adding a note in the docs that a value of _num higher than max_wal_senders does not have much meaning.

> I still think that it's strange that replication can be async even when
> s_s_num is larger than zero. That is, I think that the transaction must
> wait for s_s_num sync standbys whether s_s_names is empty or not.
> OTOH, if s_s_num is zero, replication must be async whether s_s_names
> is empty or not. At least for me, it's intuitive to use s_s_num primarily
> to control the sync mode. Of course, other hackers may have different
> thoughts, so we need to keep our ears open for them.

Sure, the compromise looks to be what you propose, and I am fine with that.

> In the above design, one problem is that the number of parameters
> that those who want to set up only one sync standby need to change is
> incremented by one. That is, they need to change s_s_num additionally.
> If we are really concerned about this, we can treat a value of -1 in
> s_s_num as a special value, which allows us to control sync replication
> only by s_s_names as we do now. That is, if s_s_names is empty,
> replication would be async. Otherwise, only the one standby with the
> highest priority in s_s_names becomes the sync one. Probably the default of
> s_s_num should be -1. Thought?
Taking into account those comments, attached is a patch doing the following things depending on the values of _num and _names:
- If _num = -1 and _names is empty, all the standbys are considered async (same behavior as 9.1~, and the default).
- If _num = -1 and _names has at least one item, wait for one standby, even if it is not connected at the time of commit. If one node is found to be sync, other standbys listed in _names with higher priority than the sync one are in potential state (same as the existing behavior).
- If _num = 0, all the standbys are async, whatever the values in _names. Priority is enforced to 0 for all the standbys. SyncStandbysDefined is set to false in this case.
- If _num > 0, wait for _num standbys, whatever the values in _names.
The default value of _num is -1. Documentation has been updated accordingly.

> The source code comments at the top of syncrep.c need to be updated.
> It's worth checking whether there are other comments to be updated.

Done. I have updated some comments in places other than the header.

Regards,
--
Michael

*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
***************
*** 2586,2597 **** include_dir 'conf.d'
  Specifies a comma-separated list of standby names that can support
  synchronous replication, as described in .
! At any one time there will be at most one active synchronous standby;
! transactions waiting for commit will be allowed to proceed after
! this standby server confirms receipt of their data.
! The synchronous standby will be the first standby named in this list
! that is both currently connected and streaming data in real-time
! (as shown by a state of streaming in the
! pg_stat_replication view).
  Other standby servers appearing later in this list represent potential
--- 2586,2598 ----
  Specifies a comma-separated list of standby names that can support
  synchronous replication, as described in .
! At any one time there will be a number of active synchronous standbys
! defined by , transactions
! waiting for commit will be allowed to proceed after those standby
! servers confirm receipt of their data. The synchronous standbys will be
! the first entries named in this list that are both currently connected
! and streaming data in real-time (as shown by a state of
! streaming in the pg_stat_replication view).
  Other standby servers appearing later in this list represent potential
***************
*** 2627,2632 **** include_dir 'conf.d'
--- 2628,2685 ----
+
+ synchronous_standby_num (integer)
+
+ synchronous_standby_num configuration parameter
+
+ Specifies the number of standbys that support
+ synchronous replication.
+
+ Default value is -1. In this case, if
+ is empty all the
+ standby node
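The four cases in the list above can be condensed into a small decision helper. This is a hypothetical, self-contained sketch of the proposed behavior table, not code from the patch or from syncrep.c:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper summarizing the proposed semantics: given
 * synchronous_standby_num (s_s_num) and whether
 * synchronous_standby_names is empty, return how many synchronous
 * standbys a committing transaction must wait for (0 means async).
 */
static int
required_sync_standbys(int s_s_num, bool names_empty)
{
    if (s_s_num == -1)
        return names_empty ? 0 : 1;  /* 9.1-style legacy behavior */
    if (s_s_num == 0)
        return 0;                    /* forced async, whatever _names says */
    return s_s_num;                  /* wait for exactly s_s_num standbys */
}
```

Note how -1 acts purely as a compatibility value: with it, s_s_names alone controls the sync mode, exactly as today.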
Re: [HACKERS] pg_receivexlog --status-interval add fsync feedback
> When replication slot is not specified in pg_receivexlog, the flush
> location in the feedback message always indicates invalid. So there seems
> to be no need to send the feedback as soon as fsync is issued, in that
> case.
> How should this option work when replication slot is not specified?

Thanks for the review!

The current patch does not check whether -S is specified. In the case where no replication slot is specified, it still performs an fsync, which was not the original intention; that case corresponds to remote_write being set in synchronous_commit. To call the user's attention to this, the following was appended to the documentation: "If you want to report the flush position to the server, you should use the -S option."

Regards,
--
Furuya Osamu

pg_receivexlog-fsync-feedback-v4.patch
Description: pg_receivexlog-fsync-feedback-v4.patch

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Implementing parallel sequence doubts
Hi Hackers,

Implementation of "Parallel Sequence Scan"

Approach:
1. "Parallel Sequence Scan" can be achieved by using background workers to do the job of the actual sequence scan, including the qualification check.
2. The planner generates the parallel scan plan by checking the possible criteria of the table for a parallel scan and generates the tasks (ranges of blocks).
3. In the executor Init phase, copy the necessary data required by the workers and start the workers.
4. In the executor run phase, just get the tuples which are sent by the workers and process them further in the plan node execution.

Some of the problems I am thinking about:
1. The data structures that are required to be copied from the backend to the workers are CurrentTransactionState, Snapshot, GUCs, ComboCID, EState, etc. I see some problems in copying the "EState" data structure into shared memory because it contains so many pointers. There is a need for some infrastructure to copy these data structures into shared memory.

If the backend takes care of locking the relation and there are no subplans involved in the qual and targetlist, is the EState really required in the worker to achieve the parallel scan? Is it possible to initialize the qual, targetlist and projinfo with a local "EState" structure created in the worker with the help of the "plan" structure received from the backend? Or, in case "EState" is required in the worker for the processing, is there any better way to share backend data structures with the workers?

Any suggestions?

Regards,
Hari Babu
Fujitsu Australia
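Step 2's task generation, splitting the relation into block ranges with one range per worker, can be sketched as follows. This is a self-contained illustration under assumed names, not code from the proposal:

```c
#include <assert.h>

/* One scan task: a contiguous range of heap blocks [start, end). */
typedef struct
{
    unsigned start;  /* first block, inclusive */
    unsigned end;    /* one past the last block */
} BlockRange;

/*
 * Divide nblocks into nworkers contiguous ranges.  Earlier workers
 * absorb the remainder, so range sizes differ by at most one block.
 * Assumes nworkers > 0 and out[] has room for nworkers entries.
 */
static void
make_block_ranges(unsigned nblocks, int nworkers, BlockRange *out)
{
    unsigned chunk = nblocks / (unsigned) nworkers;
    unsigned extra = nblocks % (unsigned) nworkers;
    unsigned next = 0;

    for (int i = 0; i < nworkers; i++)
    {
        out[i].start = next;
        next += chunk + ((unsigned) i < extra ? 1 : 0);
        out[i].end = next;
    }
}
```

Each worker then scans only its own [start, end) range and applies the quals locally, so the ranges together cover every block exactly once.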
Re: [HACKERS] WIP Patch for GROUPING SETS phase 1
On 08/13/2014 09:43 PM, Atri Sharma wrote:
> Sorry, forgot to attach the patch for fixing cube in contrib, which breaks
> since we now reserve the "cube" keyword. Please find attached the same.

Ugh, that will make everyone using the cube extension unhappy. After this patch, they will have to quote contrib's cube type and functions every time.

I think we should bite the bullet and rename the extension, and its "cube" type and functions. For an application, having to suddenly quote it has the same effect as renaming it; you'll have to find all the callers and change them. And in the long run, it's clearly better to have an unambiguous name.

- Heikki
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On 21 August 2014 08:31, Amit Kapila wrote:
>>> Not sure. How about *concurrent* or *multiple*?
>>
>> multiple isn't right, but we could say concurrent.
>
> I also find concurrent more appropriate.
> Dilip, could you please change it to concurrent in the doc updates,
> variables, and functions, unless you see any objection to the same.

OK, I will take care of that along with the other comment fixes.

Regards,
Dilip Kumar