Re: [HACKERS] [PATCH] Support for foreign keys with arrays
Hi Noah,

Thanks for your invaluable review, rich in useful and thorough comments and notes. Marco and I will add your proposed tests as soon as possible (most likely after the Italian PGDay, which is this week). However, given the feedback received from other developers too (including Tom), I would first concentrate on defining the syntax and how referential integrity actions should work.

On 17/11/11 05:28, Noah Misch wrote:
> Removing values from the array seems best to me. There's no doubt about what
> ON UPDATE CASCADE should do, and having ON DELETE CASCADE excise individual
> array elements is consistent with that. It's less clear for SET NULL, but I'd
> continue with a per-element treatment. I'd continue to forbid SET DEFAULT.
> However, Jeff Davis did expect ON DELETE CASCADE to remove entire rows:
> http://archives.postgresql.org/message-id/1288119207.15279.24.camel@jdavis-ux.asterdata.local
> So, perhaps the behavior needs to be user-selectable.

I would agree with what Tom is saying here, given that the SQL specs do not say anything about this feature. We could leave the standard REFERENCES keyword handling the array value as it does now. If a user wants to take advantage of in-array referential integrity, we could implement the special keyword ARRAY REFERENCES, as Tom proposes (or a similar keyword).

Consequently, we need to agree on what the actions on delete and update operations are. In the case of ARRAY REFERENCES, I would be inclined to keep the same row-scoped meaning for the CASCADE and SET NULL actions, while disallowing the SET DEFAULT action (as Noah suggests too). At the same time, I would add two actions for ARRAY REFERENCES which process array elements:

* ARRAY CASCADE
* ARRAY SET NULL

(Of course, you are welcome to propose a better naming convention.)

This table summarises the scope of the actions:

                 |    ON    |    ON    |
 Action          |  DELETE  |  UPDATE  |
 ----------------+----------+----------+
 CASCADE         | Row      | Element  |
 SET NULL        | Row      | Row      |
 ARRAY CASCADE   | Element  | Element  |
 ARRAY SET NULL  | Element  | Element  |
 SET DEFAULT     | Error    | Error    |
 NO ACTION       | -        | -        |
 RESTRICT        | -        | -        |

For instance, with ARRAY REFERENCES ... ON DELETE CASCADE, I would expect the whole row to be deleted (as Jeff et al. say). However, if I specify ARRAY REFERENCES ... ON DELETE ARRAY CASCADE, I would expect the matching elements in the referencing array to be removed. Similarly, ARRAY REFERENCES ... ON DELETE SET NULL will set the whole referencing array to NULL, whereas ARRAY REFERENCES ... ON DELETE ARRAY SET NULL will set the individual elements in the referencing array to NULL.

In the case of updates, SET NULL and ARRAY SET NULL work the same way (updating the whole row or the single elements, respectively). CASCADE and ARRAY CASCADE are synonyms, as they would both work on individual elements (which is the action that makes more sense anyway).

I believe that, before we proceed with one implementation or another, it is important that we discuss this sort of thing and agree on a possible long-term path (so that we can organise intermediate deliverables).

Thanks,
Gabriele

-- 
Gabriele Bartolini - 2ndQuadrant Italia
PostgreSQL Training, Services and Support
gabriele.bartol...@2ndquadrant.it | www.2ndQuadrant.it
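[To make the proposal concrete, a minimal SQL sketch of what the suggested syntax could look like. ARRAY REFERENCES and the ARRAY actions are proposed keywords only, not implemented behaviour, and the table and column names are purely illustrative.]

-- Hypothetical syntax: ARRAY REFERENCES and ARRAY CASCADE are proposals only.
CREATE TABLE country (
    iso2 char(2) PRIMARY KEY
);

CREATE TABLE customer (
    id                integer PRIMARY KEY,
    visited_countries char(2)[] ARRAY REFERENCES country (iso2)
                      ON DELETE ARRAY CASCADE   -- remove the element from the array
                      ON UPDATE ARRAY CASCADE   -- update the element in place
);

-- With plain ON DELETE CASCADE instead, deleting a referenced country would
-- (per the table above) delete the whole referencing customer row.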
Re: [HACKERS] proposal: better support for debugging of overloaded functions
2011/11/18 Robert Haas <robertmh...@gmail.com>:
> On Fri, Nov 18, 2011 at 6:24 AM, Pavel Stehule <pavel.steh...@gmail.com> wrote:
>> CONTEXT: PL/pgSQL function assign_rslts line 50 at assignment (oid: 65903)
>>
>> \sf+ 65903
>
> I'm pretty unenthused by the idea of making OIDs more user-visible than they
> already are. If the message is ambiguous, we should include argument types
> and (if not the object that would be visible under the current search_path)
> a schema qualification. Spitting out a five (or six or seven or eight) digit
> number doesn't seem like a usability improvement.

Would it be possible to add a GUC variable plpgsql.log_function_signature (or maybe just log_function_signature, for all PLs)? I am not sure about the GUC name. When this variable is true, the CONTEXT line would contain the function's qualified signature instead of just the function name.

I would not want to check whether a function name is ambiguous after an exception is raised: that requires access to the system tables, and exception handling could then become slower. Using a qualified name is necessary, because the psql meta-commands are not smart - they are based on search_path, so the fact that a name is not ambiguous doesn't help there.

Regards

Pavel Stehule

p.s. Another issue is the missing CONTEXT line for RAISE EXCEPTION

> -- 
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
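[For readers following the thread, a minimal sketch of the ambiguity being discussed, using hypothetical overloads of the assign_rslts name from the example above; the exact CONTEXT wording is paraphrased.]

create function assign_rslts(a integer) returns integer as $$
begin
    return a / 0;          -- raises division_by_zero
end;
$$ language plpgsql;

create function assign_rslts(a text) returns integer as $$
begin
    return length(a) / 0;  -- raises division_by_zero too
end;
$$ language plpgsql;

-- select assign_rslts(1);
-- ERROR:  division by zero
-- CONTEXT:  PL/pgSQL function "assign_rslts" line 2 at RETURN
--
-- The CONTEXT line names only "assign_rslts", so it does not say which of the
-- two overloads raised the error; hence the proposals to add the OID or the
-- qualified signature.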
Re: [HACKERS] [PATCH] Support for foreign keys with arrays
Hi Gabriele and Marco,

On Sun, Nov 20, 2011 at 10:36:15AM +0100, Gabriele Bartolini wrote:
>                  |    ON    |    ON    |
>  Action          |  DELETE  |  UPDATE  |
>  ----------------+----------+----------+
>  CASCADE         | Row      | Element  |
>  SET NULL        | Row      | Row      |
>  ARRAY CASCADE   | Element  | Element  |
>  ARRAY SET NULL  | Element  | Element  |
>  SET DEFAULT     | Error    | Error    |
>  NO ACTION       | -        | -        |
>  RESTRICT        | -        | -        |

Thank you for this very clear and concise summary! I agree with your appeal for a broad discussion on the proposed syntax, and I will use the same language to express my proposal (for clarity and to simplify the discussion):

                     |    ON    |      ON       |
 Action              |  DELETE  |    UPDATE     |
 --------------------+----------+---------------+
 CASCADE             | Element  | Element       |
 SET NULL            | Element  | Element       |
 SET DEFAULT         | Error    | Error         |
 ARRAY CASCADE       | Row      | Element = Row |
 ARRAY SET NULL      | Row      | Row           |
 ARRAY SET DEFAULT   | Row      | Row           |
 NO ACTION           | -        | -             |
 RESTRICT            | -        | -             |

I have swapped your syntax in a way which looks cleaner to me: the ARRAY (CASCADE | SET NULL | SET DEFAULT) syntax denotes operations that happen on the whole array, while CASCADE | SET NULL | SET DEFAULT denote operations that happen on the elements of the array.

Associating the Element behaviour with the ON DELETE CASCADE syntax is also consistent with the case where the referencing table A is constructed as a GROUP BY of another table B, and the array reference on A is built by aggregating a non-array reference on B that uses ON DELETE CASCADE. In other words, the same syntax (ON DELETE CASCADE) would denote the same behaviour in both the aggregated case (one row per object, using array references) and the non-aggregated case (multiple rows per object, using equality references), which represent two distinct implementations of the same abstraction.

The Row behaviour would instead be associated with a new syntax (ON DELETE ARRAY CASCADE), which cannot be obtained via the existing syntax in the non-aggregated implementation, on the grounds that it might be useful for some semantics (for instance: if you remove a vertex from a polygon, you can either destroy the polygon or transform it into a polygon with fewer vertices).

The same principle of considering the two implementations as the same abstraction would also confirm your choice to raise an exception on ON (DELETE | UPDATE) SET DEFAULT. It would also suggest enabling ON (DELETE | UPDATE) ARRAY SET DEFAULT. The reasoning is that we can actually specify a default array for the referencing column, but we can't specify a default element. Earlier I briefly thought of using the referenced column's default as a default for the single element, but it seems a bad idea: a default is an expression (possibly non-constant) which is evaluated only when a new row is created in the referenced table, and using it outside of that context looks inappropriate.

Regarding ON UPDATE ARRAY CASCADE, I agree with making it a synonym, since updating the whole array to take the update on the referenced field into account is equivalent to updating the single element to take the same fact into account.

Finally, ON UPDATE ARRAY SET NULL would still have a use case as a behaviour distinct from ON UPDATE SET NULL, which we make available to the database designer: instead of replacing the updated element in the array with a NULL, we replace the whole array with a NULL. This is essentially the same difference that we have between ON DELETE ARRAY CASCADE and ON DELETE CASCADE.

Thanks,

Dr. Gianni Ciolli - 2ndQuadrant Italia
PostgreSQL Training, Services and Support
gianni.cio...@2ndquadrant.it | www.2ndquadrant.it
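[A sketch of the two implementations Gianni contrasts, using his polygon/vertex example; the ARRAY REFERENCES syntax is still hypothetical.]

-- Non-aggregated implementation: one row per (polygon, vertex) pair.
CREATE TABLE vertex (id integer PRIMARY KEY);
CREATE TABLE polygon_vertex (
    polygon_id integer,
    vertex_id  integer REFERENCES vertex (id) ON DELETE CASCADE
);

-- Aggregated implementation: one row per polygon, with the vertexes collapsed
-- into an array.  Under the table proposed above, ON DELETE CASCADE removes
-- the deleted vertex from the array, matching the effect of the schema above,
-- while ON DELETE ARRAY CASCADE would drop the whole polygon row instead.
CREATE TABLE polygon (
    id       integer PRIMARY KEY,
    vertexes integer[] ARRAY REFERENCES vertex (id) ON DELETE CASCADE
);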
Re: [HACKERS] [PATCH] Support for foreign keys with arrays
On Sun, Nov 20, 2011 at 10:36:15AM +0100, Gabriele Bartolini wrote:
> I would agree with what Tom is saying here, given that the SQL specs do not
> say anything about this feature. We could leave the standard REFERENCES
> keyword handling the array value as it does now. If a user wants to take
> advantage of in-array referential integrity, we could implement the special
> keyword ARRAY REFERENCES, as Tom proposes (or a similar keyword).

No objection to that.

>                  |    ON    |    ON    |
>  Action          |  DELETE  |  UPDATE  |
>  ----------------+----------+----------+
>  CASCADE         | Row      | Element  |
>  SET NULL        | Row      | Row      |
>  ARRAY CASCADE   | Element  | Element  |
>  ARRAY SET NULL  | Element  | Element  |
>  SET DEFAULT     | Error    | Error    |
>  NO ACTION       | -        | -        |
>  RESTRICT        | -        | -        |

I like this.

> CASCADE and ARRAY CASCADE are synonyms, as they would both work on individual
> elements (which is the action that makes more sense anyway).

What about making ON UPDATE CASCADE an error? That way, we can say that an ARRAY action always applies to array elements, and a plain action always applies to entire rows. SET DEFAULT should now be fine to allow. It's ARRAY SET DEFAULT, in your new terminology, that wouldn't make sense.

Thanks,
nm
Re: [HACKERS] psql setenv command
On Wed, Nov 2, 2011 at 5:36 PM, Andrew Dunstan <and...@dunslane.net> wrote:
> Updated patch is attached - adding to Nov commitfest.

Review of the v2 patch:

* Submission Review

Patch applies with some fuzz and builds without warnings. I noticed some tab characters being used in psql-ref.sgml where they shouldn't be.

* Usability Review

I sort of doubt I'll use the feature myself, but there was at least one +1 for the feature idea already, so it seems useful enough. That said, the stated motivation for the patch:

> First, it would be useful to be able to set pager options and possibly other
> settings, so my suggestion is for a \setenv command that could be put in a
> .psqlrc file, something like:

seems like it would be more suited to being set in the user's ~/.profile (perhaps via an 'alias' if you didn't want the changes to apply globally). So unless I miss something, the real niche for the patch seems to be for people to interactively change environment settings while within psql. Again, not something I'm keen on for my own use, but if others think it's useful, that's good enough for me.

* Coding Review / Nitpicks

The patch implements \setenv via calls to unsetenv() and putenv(). As the comment notes,

| That means we
| leak a bit of memory here, but not enough to worry about.

which seems true; the man page basically says there's nothing to be done about the situation.

The calls to putenv() and unsetenv() are done without any real input checking. So this admittedly-pathological case produces surprising results:

  test=# \setenv foo=bar baz
  test=# \! echo $foo
  bar=baz

Perhaps 'envvar' arguments with a '=' in the argument should just be disallowed. Also, should the malloc() of newval just use pg_malloc() instead?

* Reviewer Review

I haven't tested on Windows.

Josh
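[As a usage note, a sketch of the kind of .psqlrc entry the feature is aimed at. The two-argument "\setenv NAME VALUE" form follows the examples above; the bare form for unsetting a variable is an assumption based on the patch's use of unsetenv().]

-- in ~/.psqlrc
\setenv PAGER 'less -S'
\setenv LESS '-iMx4'

-- interactively, later:
\setenv PAGER          -- assumed to unset the variable via unsetenv()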
Re: [HACKERS] testing ProcArrayLock patches
Robert Haas <robertmh...@gmail.com> wrote:
> Hmm. There's obviously something that's different in your environment or
> configuration from what I tested, but I don't know what it is. The fact that
> your scale factor is larger than shared_buffers might matter; or Intel vs.
> AMD. Or maybe you're running with synchronous_commit=on?

Yes, I had synchronous_commit = on for these runs. Here are the settings:

cat >> $PGDATA/postgresql.conf <<EOM;
max_connections = 200
max_pred_locks_per_transaction = 256
shared_buffers = 10GB
maintenance_work_mem = 1GB
checkpoint_segments = 300
checkpoint_timeout = 15min
checkpoint_completion_target = 0.9
wal_writer_delay = 20ms
seq_page_cost = 0.1
random_page_cost = 0.1
cpu_tuple_cost = 0.05
effective_cache_size = 40GB
default_transaction_isolation = '$iso'
EOM

Is there any chance that having pg_xlog on a separate RAID 10 set of drives with its own BBU controller would explain anything? I mean, I always knew that was a good idea for a big, heavily-loaded box, but I remember being surprised at how *big* a difference that made when a box accidentally went into production without moving the pg_xlog directory there.

There is one other thing which might matter: I didn't use the -n pgbench option, and in the sample you showed, you were using it.

Here is the median of five from the latest runs. On these read/write tests there was very little spread within each set of five samples, with no extreme outliers like I had on the SELECT-only tests. In the first position "s" means simple protocol and "p" means prepared protocol. In the second position "m" means master, "f" means with the flexlock patch.

sm1   tps = 1092.269228 (including connections establishing)
sf1   tps = 1090.511552 (including connections establishing)
sm2   tps = 2171.867100 (including connections establishing)
sf2   tps = 2158.609189 (including connections establishing)
sm4   tps = 4278.541453 (including connections establishing)
sf4   tps = 4269.921594 (including connections establishing)
sm8   tps = 8472.257182 (including connections establishing)
sf8   tps = 8476.150588 (including connections establishing)
sm16  tps = 15905.074160 (including connections establishing)
sf16  tps = 15937.372689 (including connections establishing)
sm32  tps = 22331.817413 (including connections establishing)
sf32  tps = 22861.258757 (including connections establishing)
sm64  tps = 26388.391614 (including connections establishing)
sf64  tps = 26529.152361 (including connections establishing)
sm80  tps = 25617.651194 (including connections establishing)
sf80  tps = 26560.541237 (including connections establishing)
sm96  tps = 24105.455175 (including connections establishing)
sf96  tps = 26569.244384 (including connections establishing)
sm128 tps = 21467.530210 (including connections establishing)
sf128 tps = 25883.023093 (including connections establishing)

pm1   tps = 1629.265970 (including connections establishing)
pf1   tps = 1619.024905 (including connections establishing)
pm2   tps = 3164.061963 (including connections establishing)
pf2   tps = 3137.469377 (including connections establishing)
pm4   tps = 6114.787505 (including connections establishing)
pf4   tps = 6061.750200 (including connections establishing)
pm8   tps = 11884.534375 (including connections establishing)
pf8   tps = 11870.670086 (including connections establishing)
pm16  tps = 20575.737107 (including connections establishing)
pf16  tps = 20437.648809 (including connections establishing)
pm32  tps = 27664.381103 (including connections establishing)
pf32  tps = 28046.846479 (including connections establishing)
pm64  tps = 26764.294547 (including connections establishing)
pf64  tps = 26631.589294 (including connections establishing)
pm80  tps = 27716.198263 (including connections establishing)
pf80  tps = 28393.642871 (including connections establishing)
pm96  tps = 26616.076293 (including connections establishing)
pf96  tps = 28055.921427 (including connections establishing)
pm128 tps = 23282.912620 (including connections establishing)
pf128 tps = 23072.766829 (including connections establishing)

Note that on this 32-core box, performance on the read/write pgbench is peaking at 64 clients, but without a lot of variance between 32 and 96 clients. And with the patch, performance still hasn't fallen off too badly at 128 clients. This is good news in terms of not having to sweat connection pool sizing quite as much as in earlier releases.

Next I will get the profile for the SELECT-only runs. It seems to make sense to profile at the peak performance level, which was 64 clients. I can run one more set of tests tonight before I have to give it back to the guy who's putting it into production. It sounds like a set like the above except with synchronous_commit = off might be desirable?

-Kevin
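[For anyone trying to reproduce these numbers: the exact read/write invocations are not shown in this message. Judging from the SELECT-only run posted later in the thread, they were presumably along these lines, with the client/thread count varied from 1 to 128; the flags below are an assumption.]

pgbench -M simple   -T 300 -c 64 -j 64   # "sm"/"sf" runs
pgbench -M prepared -T 300 -c 64 -j 64   # "pm"/"pf" runs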
Re: [HACKERS] plpython SPI cursors
On 11-10-15 07:28 PM, Jan Urbański wrote:
> Hi,
>
> attached is a patch implementing the usage of SPI cursors in PL/Python.
> Currently when trying to process a large table in PL/Python you have to
> slurp it all into memory (that's what plpy.execute does).
>
> J

I found a few bugs (see my testing section below) that will need fixing, plus a few questions about the code.

Overview / Feature Review
-------------------------

This patch adds cursor support to plpython. SPI cursors will allow a plpython function to read and process a large result set without having to read it all into memory at once. This is a good thing. Without this patch I think you could accomplish the same with SQL DECLARE CURSOR and SQL FETCH. This feature allows you to use a python cursor as an iterator, resulting in much cleaner python code than the SQL FETCH approach. I think the feature is worth having.

Usability Review
----------------

The patch adds the user methods:

cursor = plpy.cursor(query_or_plan)
cursor.fetch(100)
cursor.close()

Do we like the name plpy.cursor, or would we rather call it something like plpy.execute_cursor(...) or plpy.cursor_open(...) or plpy.create_cursor(...)? Since we will be mostly stuck with the API once we release 9.2, this is worth some opinions on. I like cursor(), but if anyone disagrees now is the time.

This patch does not provide a wrapper around SPI_cursor_move. The patch is useful without that, and I don't see anything that precludes someone else adding that later if they see a need.

Documentation Review
--------------------

The patch includes documentation updates that describe the new feature. The Database Access page doesn't provide an API-style list of database access functions like the plperl page (http://www.postgresql.org/docs/9.1/interactive/plperl-builtins.html) does. I think the organization of the perl page is clearer than the python one, and we should think about doing some documentation refactoring. That should be done as a separate patch and shouldn't be a barrier to committing this one.

Code Review
-----------

In PLy_cursor_plan, line 4080:

+ PG_TRY();
+ {
+     Portal    portal;
+     char      *volatile nulls;
+     volatile int j;
+
+     if (nargs > 0)
+         nulls = palloc(nargs * sizeof(char));
+     else
+         nulls = NULL;
+
+     for (j = 0; j < nargs; j++)
+     {
+         PyObject *elem;

I am probably not seeing a code path or misunderstanding something about the setjmp/longjmp usage, but I don't see why nulls and j need to be volatile here.

Line 444, PLy_cursor(PyObject *self, PyObject *args):

+ {
+     char      *query;
+     PyObject  *plan;
+     PyObject  *planargs = NULL;
+
+     if (PyArg_ParseTuple(args, "s", &query))
+         return PLy_cursor_query(query);
+

Should query be freed with PyMem_free()?

Testing
-------

I tested both python 2.6 and 3 on a Linux system.

create or replace function x() returns text as $$
cur = None
try:
    with plpy.subtransaction():
        cur = plpy.cursor('select generate_series(1,1000)')
        rows = cur.fetch(10)
        plpy.execute('select f()')
except plpy.SPIError:
    rows = cur.fetch(10)
    return rows[0]['generate_series']
return 'none'
$$ language plpythonu;

select x();

crashes the backend:

test=# select x();
The connection to the server was lost. Attempting reset: LOG: server process (PID 3166) was terminated by signal 11: Segmentation fault

The test below gives me a strange error message:

create or replace function x1() returns text as $$
plan = None
try:
    with plpy.subtransaction():
        plpy.execute('CREATE TEMP TABLE z AS select generate_series(1,1000)')
        plan = plpy.prepare('select * FROM z')
        plpy.execute('select * FROM does_not_exist')
except plpy.SPIError, e:
    cur = plpy.cursor(plan)
    rows = cur.fetch(10)
    return rows[0]['generate_series']
return '1'
$$ language plpythonu;

select x1();

test=# select x1();
ERROR: TypeError: Expected sequence of 82187072 arguments, got 0: NULL
CONTEXT: Traceback (most recent call last):
  PL/Python function x1, line 9, in module
    cur = plpy.cursor(plan)
PL/Python function x1
STATEMENT: select x1()

I was expecting an error from the function, just a more useful one.

Performance
-----------

I did not do any specific performance testing, but I don't see this patch as having any impact on performance.
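[For context, a minimal sketch of how the proposed API might be used, based on the methods named in this review (plpy.cursor(), fetch(), close()) and on the claim that the cursor can be used as an iterator; details of the final API may differ.]

create or replace function sum_series() returns bigint as $$
# Sketch against the proposed API: plpy.cursor() opens an SPI cursor and the
# result rows are consumed incrementally instead of being slurped into memory.
total = 0
cur = plpy.cursor('select generate_series(1, 1000000) as n')
for row in cur:          # iterator usage, per the review
    total += row['n']
cur.close()
return total
$$ language plpythonu;

select sum_series();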
Re: [HACKERS] plpython SPI cursors
On 20/11/11 19:14, Steve Singer wrote:
> On 11-10-15 07:28 PM, Jan Urbański wrote:
>> Hi, attached is a patch implementing the usage of SPI cursors in PL/Python.
>
> I found a few bugs (see my testing section below) that will need fixing + a
> few questions about the code

Hi Steve,

thanks a lot for the review, I'll investigate the errors you were getting and post a follow-up. Good catch on trying cursors with explicit subtransactions, I didn't think about how they would interact.

Cheers,
Jan
Re: [HACKERS] testing ProcArrayLock patches
Robert Haas <robertmh...@gmail.com> wrote:
> I was actually thinking it would be interesting to oprofile the read-only
> test; see if we can figure out where those slowdowns are coming from.

CPU: Intel Core/i7, speed 2262 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 10

samples   %       image name  symbol name
3124242   5.7137  postgres    s_lock
254       4.6737  postgres    AllocSetAlloc
2403412   4.3954  postgres    GetSnapshotData
1967132   3.5975  postgres    SearchCatCache
1872176   3.4239  postgres    base_yyparse
1327256   2.4273  postgres    hash_search_with_hash_value
1040131   1.9022  postgres    _bt_compare
1038976   1.9001  postgres    LWLockAcquire
817122    1.4944  postgres    MemoryContextAllocZeroAligned
738321    1.3503  postgres    core_yylex
622613    1.1386  postgres    MemoryContextAlloc
597054    1.0919  postgres    PinBuffer
556138    1.0171  postgres    ScanKeywordLookup
552318    1.0101  postgres    expression_tree_walker
494279    0.9039  postgres    LWLockRelease
488628    0.8936  postgres    hash_any
472906    0.8649  postgres    nocachegetattr
396482    0.7251  postgres    grouping_planner
382974    0.7004  postgres    LockAcquireExtended
375186    0.6861  postgres    AllocSetFree
375072    0.6859  postgres    ProcArrayLockRelease
373668    0.6834  postgres    new_list
365917    0.6692  postgres    fmgr_info_cxt_security
301398    0.5512  postgres    ProcArrayLockAcquire
300647    0.5498  postgres    LockReleaseAll
292073    0.5341  postgres    DirectFunctionCall1Coll
285745    0.5226  postgres    MemoryContextAllocZero
284684    0.5206  postgres    FunctionCall2Coll
282701    0.5170  postgres    SearchSysCache

max_connections = 100
max_pred_locks_per_transaction = 64
shared_buffers = 8GB
maintenance_work_mem = 1GB
checkpoint_segments = 300
checkpoint_timeout = 15min
checkpoint_completion_target = 0.9
wal_writer_delay = 20ms
seq_page_cost = 0.1
random_page_cost = 0.1
cpu_tuple_cost = 0.05
effective_cache_size = 40GB
default_transaction_isolation = '$iso'

pgbench -i -s 100
pgbench -S -M simple -T 300 -c 80 -j 80

transaction type: SELECT only
scaling factor: 100
query mode: simple
number of clients: 80
number of threads: 80
duration: 300 s
number of transactions actually processed: 104391011
tps = 347964.636256 (including connections establishing)
tps = 347976.389034 (excluding connections establishing)

vmstat 1 showed differently this time -- no clue why.

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b  swpd    free      buff      cache   si  so  bi  bo    in      cs  us sy id wa st
91  0  8196 4189436 203925700 5231449200   0   0   0   0  32255 1522807 85 13  1  0  0
92  0  8196 4189404 203925700 5231449200   0   0   0   0  32796 1525463 85 14  1  0  0
67  0  8196 4189404 203925700 5231448800   0   0   0   0  32343 1527988 85 13  1  0  0
93  0  8196 4189404 203925700 5231448800   0   0   0   0  32701 1535827 85 13  1  0  0

-Kevin
[HACKERS] FOSDEM 2012 - PostgreSQL Devroom: Call for Speakers
Hi all,

FOSDEM 2012 - PostgreSQL Devroom: Call for Speakers

The PostgreSQL project will have a Devroom at FOSDEM 2012, which takes place on February 4-5 in Brussels, Belgium. The Devroom will mainly cover topics for PostgreSQL users, developers and contributors. For more information about the event itself, please see the website at http://www.fosdem.org/2012/ .

We are now accepting proposals for talks. Please note that we only accept talks in English. Each session will last 45 minutes (including discussion), and may be on any topic related to PostgreSQL. Suggested topic areas include:

* Developing applications for PostgreSQL
* Administering large scale PostgreSQL installations
* Case studies and/or success stories of PostgreSQL deployments
* PostgreSQL tools and utilities
* PostgreSQL hacking
* Community user groups
* Tuning the server
* Migrating from other systems
* Scaling/replication
* Benchmarking hardware
* PostgreSQL related products

Of course, we're happy to receive proposals for talks on other PostgreSQL related topics as well.

Please use our conference website to submit your proposal: https://www.postgresql.eu/events/callforpapers/fosdem2012/

The deadline for submissions is December 20th, 2011. The schedule will be published and speakers will be informed by the end of the year.

Please also note my email about hotel reservation: http://archives.postgresql.org/pgeu-general/2011-11/msg0.php

-- 
Andreas 'ads' Scherbaum
German PostgreSQL User Group
European PostgreSQL User Group - Board of Directors
Volunteer Regional Contact, Germany - PostgreSQL Project
Re: [HACKERS] Inlining comparators as a performance optimisation
Part of my problem with not producing specialisations, which I really neglected to complain about until now, is the inconsistency between types and the need to deal with that. We benefit from assuming in our specialisation that we're only dealing with POD types, simply not considering things that we would otherwise have had to for the benefit of types that have a pass-by-reference Datum representation, or have collations, or have multiple scankeys.

Leaving aside the question of straight compiler-optimisation benefits for the moment, the remaining benefit of what I've done comes not so much from avoiding the usual function call machinery per se, as from doing so *as well as* cutting down on what currently happens in comparetup_heap to handle every single compound and scalar type. Compare the function comparetup_heap with my meta-function TYPE##AppSort to see what I mean. The function comparetup_heap is the comparator directly used by qsort_arg when sorting heap tuples, and qsort_arg outsources to comparetup_heap some things that you might not expect it to (very little has changed about qsort_arg since we lifted it from NetBSD back in 2006). So while you might imagine that the loop in comparetup_heap and things like its use of the heap_getattr macros won't expand to that many instructions, you really don't want them in the inner loop of a long operation like qsorting, paid for up to O(n ^ 2) times. There are also the obvious implications for compiler optimisations, particularly relating to effective usage of CPU cache.

I'm trying to get you what you asked for: a straight choice between what Tom suggested and what I suggested, with perhaps some compromises between the two positions. That's sort of tricky though, especially considering the issues raised above.

-- 
Peter Geoghegan
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] Singleton range constructors versus functional coercion notation
On Sat, 2011-11-19 at 15:57 -0500, Tom Lane wrote:
>> I'm hesitant to remove them because the alternative is significantly more
>> verbose:
>>     numrange(1.0, 1.0, '[]');
>
> Right. The question is, does the case occur in practice often enough to
> justify a shorter notation? I'm not sure.

Well, if there were a good shorter notation, then probably so. But it doesn't look like we have a good idea, so I'm fine with dropping it.

> One thing I've been thinking for a bit is that for discrete ranges, I find
> the '[)' canonical form to be surprising/confusing. If we were to fix
> range_adjacent along the lines suggested by Florian, would it be practical
> to go over to '[]' as the canonical form? One good thing about that approach
> is that canonicalization wouldn't have to involve generating values that
> might overflow.

I think we had that discussion before, and Florian brought up some good points (both then and in a reply now). Also, Robert wasn't convinced that '[]' is necessarily better for discrete ranges:

http://archives.postgresql.org/pgsql-hackers/2011-10/msg00598.php

Regards,
Jeff Davis
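[To illustrate the two points under discussion, a small example against the range-types work being discussed; treat it as a sketch of the behaviour being described rather than final semantics.]

-- A singleton range currently needs the three-argument constructor form:
select numrange(1.0, 1.0, '[]');   -- a range containing exactly 1.0

-- For a discrete range type, the '[)' canonical form rewrites an inclusive
-- upper bound, which is the behaviour being called surprising here:
select int4range(1, 3, '[]');      -- displayed as [1,4)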
Re: [HACKERS] WIP: Collecting statistics on CSV file data
Hi Hanada-san,

Thank you for your valuable comments. I will improve the items pointed out by you in the next version of the patch, including documentation on the purpose of AnalyzeForeignTable, how to write it, and so on. Here I comment on only one point:

> - Why does file_fdw skip sample tuples which have a NULL value? AFAICS the
>   original acquire_sample_rows doesn't do so.

To be precise, I've implemented it to skip tuples that have NULL values in column(s) that are not allowed to contain NULL values because of a NOT NULL constraint. file_fdw's sample_row_acquirer considers those tuples as dead tuples. This is for consistency with the NOT NULL constraint. (But I don't know why the fileIterateForeignScan routine allows such dead tuples. I may have missed something.)

Best regards,
Etsuro Fujita

(2011/11/18 21:00), Shigeru Hanada wrote:
> (2011/11/18 16:25), Etsuro Fujita wrote:
>> Thank you for your testing. I updated the patch according to your comments.
>> Attached is the updated version of the patch.
>
> I'd like to share the result of my review even though it's not fully
> finished. So far I have looked from the viewpoint of API design, code
> formatting, and documentation. I'll examine the effectiveness of the patch
> and the details of the implementation next week, and hopefully try writing
> an ANALYZE handler for pgsql_fdw :)
>
> The new patch has the correct format, and it could be applied to the HEAD of
> the master branch cleanly. All regression tests, including those of contrib
> modules, have passed. It contains changes to code and regression tests
> related to the issue, and they have enough comments. IMO the documentation
> in this patch is not enough to show FDW authors how to write an analyze
> handler, but it can be enhanced afterwards.
>
> With this patch, an FDW author can provide an optional ANALYZE handler which
> updates the statistics of foreign tables. The planner would be able to
> generate better plans by using the statistics.
>
>> Yes. But in the updated version, I've refactored analyze.c a little bit to
>> allow FDW authors to simply call do_analyze_rel().
>> [snip]
>> The updated version enables FDW authors to just write their own
>> acquire_sample_rows(). On the other hand, by retaining the hook for the
>> AnalyzeForeignTable routine at analyze_rel(), a higher level than
>> acquire_sample_rows() as before, it allows FDW authors to write an
>> AnalyzeForeignTable routine that, for foreign tables on a remote server,
>> asks the server for its current stats instead, as pointed out earlier by
>> Tom Lane.
>
> IIUC, this patch offers three options to FDWs: a) set AnalyzeForeignTable to
> NULL to indicate lack of capability, b) provide AnalyzeForeignTable which
> calls do_analyze_rel with a custom sample_row_acquirer, and c) create
> statistics data from scratch in the FDW itself, by doing similar things to
> do_analyze_rel with the given argument or by copying statistics from a
> remote PostgreSQL server. ISTM that this design is well balanced between
> simplicity and flexibility. Maybe these three options would suit web-based
> wrappers, file-based or RDBMS wrappers, and pgsql_fdw respectively.
>
> I think that adding more details on FdwRoutine, such as the purpose of the
> new callback function and how it differs from the required ones, would help
> FDW authors, including me :)
>
> I have some random comments:
> - I think a separate typedef for sample_acquire_rows would make the code
>   more readable. In addition, the parameters of the function should be
>   described explicitly.
> - I couldn't see the reason why file_fdw sets the ctid of sample tuples,
>   though I guess it's for Vitter's random sampling algorithm. If every FDW
>   must set a valid ctid on sample tuples, it should be mentioned in the
>   documentation of AnalyzeForeignTable. Does exporting some functions from
>   analyze.c relate to this issue?
> - Why does file_fdw skip sample tuples which have a NULL value? AFAICS the
>   original acquire_sample_rows doesn't do so.
> - Some comment lines go past 80 columns.
> - Some headers included in file_fdw.c seem unnecessary.
>
> Regards,
Re: [HACKERS] WIP: Collecting statistics on CSV file data
(2011/11/19 0:54), Robert Haas wrote:
> 2011/11/18 Shigeru Hanada <shigeru.han...@gmail.com>:
>> - I couldn't see the reason why file_fdw sets the ctid of sample tuples,
>>   though I guess it's for Vitter's random sampling algorithm. If every FDW
>>   must set a valid ctid on sample tuples, it should be mentioned in the
>>   documentation of AnalyzeForeignTable. Does exporting some functions from
>>   analyze.c relate to this issue?
>
> If every FDW must set a valid ctid on sample tuples, it should be fixed so
> that they don't have to, I would think.

It's for neither Vitter's algorithm nor exporting functions from analyze.c. It's for the foreign index scan on CSV file data that I plan to propose in the next CF. So it is meaningless for now. I'm sorry. I will fix it in the next version of the patch so that they don't have to.

Best regards,
Etsuro Fujita