Re: [HACKERS] GIN improvements part2: fast scan
On Fri, Feb 7, 2014 at 5:33 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote:

> On 02/06/2014 01:22 PM, Alexander Korotkov wrote:
>> Difference is very small. For me, it looks ready for commit.
>
> Great, committed! Now, to review the catalog changes...

I've rebased the catalog changes onto the latest master. Patch is attached.
I've rerun my test suite with both the latest master ('committed') and the
attached patch ('ternary-consistent'):

         method         |     sum
------------------------+--------------
 committed              | 143491.71501
 fast-scan-11           | 126916.11199
 fast-scan-light        | 137321.211
 fast-scan-light-heikki | 138168.02801
 master                 | 446976.288
 ternary-consistent     | 125923.514

I attribute the regression in the latest master to the change of
MAX_MAYBE_ENTRIES from 8 to 4. However, I'm not sure why ternary-consistent
shows such good results. Probably it's because of some optimizations you did
in the committed version, which weren't visible before because of the change
of MAX_MAYBE_ENTRIES.

I'm not sure about the decision to reserve a separate procedure number for
the ternary consistent function. Probably it would be better to add a
ginConfig method instead; that would also be useful for post-9.4
improvements.

--
With best regards, Alexander Korotkov.

gin-ternary-consistent.patch
Description: Binary data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Small GIN optimizations (after 9.4)
On Thu, Feb 6, 2014 at 8:31 PM, PostgreSQL - Hans-Jürgen Schönig postg...@cybertec.at wrote:

> i think there is one more thing which would be really good in GIN and
> which would solve a ton of issues. atm GIN entries are sorted by item
> pointer. if we could sort them by a column it would fix a couple of
> real world issues such as ...
>
>     SELECT ... FROM foo WHERE tsearch_query ORDER BY price DESC LIMIT 10
>
> ... or so. in many cases you want to search for, say, a product and find
> the cheapest / most expensive one. if the tsearch_query yields a high
> number of rows (which it often does) the subsequent sort will kill you.

This is not intended to be a small change. However, some solution might be
possible in post-9.4 GIN improvements, or in the new secret indexing project
which will be presented at PGCon :-)

--
With best regards, Alexander Korotkov.
Re: [HACKERS] Small GIN optimizations (after 9.4)
On Thu, Feb 6, 2014 at 3:36 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote:

> While hacking on the GIN patches, I've come up with a few different ideas
> for improving performance. It's too late for 9.4, but I'll list them here
> if someone wants to work on them later:
>
> * Represent ItemPointers as uint64's, to speed up comparisons.
> ginCompareItemPointers is inlined into only a few instructions, but it's
> still more expensive than a single-instruction 64-bit comparison.
> ginCompareItemPointers is called very heavily in a GIN scan, so even a
> small improvement there would make for a noticeable speedup. It might be
> an improvement in code clarity, too.
>
> * Keep the entry streams of a GinScanKey in a binary heap, to quickly
> find the minimum curItem among them. I did this in various versions of
> the fast scan patch, but then I realized that the straightforward way of
> doing it is wrong, because a single GinScanEntry can be part of multiple
> GinScanKeys. If an entry's curItem is updated as part of advancing one
> key, and the entry is in a heap of another key, updating the curItem can
> violate the heap property of the other entry's heap.
>
> * Build a truth table (or cache) of the consistent function's results,
> and use that instead of calling consistent for every item.

Caching seems more appropriate to me. Intuition tells me that when the
number of entries is high, far from all consistent combinations will
actually be used. However, intuition might be false :-)

> * Deduce AND or OR logic from the consistent function. Or have the
> opclass provide a tree of AND/OR/NOT nodes directly, instead of a
> consistent function. For example, if the query is "foo & bar", we could
> avoid calling the consistent function altogether, and only return items
> that match both.

I also had this idea. But this solution doesn't cover similarity queries:
if you have 20 entries and the consistent function returns true when at
least 10 of the 20 are present, then a representation in AND/OR/NOT nodes
would be enormous, and therefore useless.
> * Delay decoding segments during a scan. Currently, we decode all
> segments of a posting tree page into a single array at once. But with
> fast scan, we might be able to skip over all entries in some of the
> segments. So it would be better to copy the segments into
> backend-private memory in compressed format, and decode them one segment
> at a time (or maybe even one item at a time), when needed. That would
> avoid the unnecessary decoding of segments that can be skipped over, and
> would also reduce the memory usage of a scan.

I would like to add another one as a continuation of fast scan:

* Skip scanning some entries entirely, forcing recheck instead. The
correct decision should be made based on costs. However, I'm not sure
about the design, because it's essentially a planning feature. How
appropriate is it to do this inside GIN?

--
With best regards, Alexander Korotkov.
Re: [HACKERS] [PATCH] pg_basebackup: progress report max once per second
On Mon, Feb 3, 2014 at 4:14 AM, Sawada Masahiko sawada.m...@gmail.com wrote:

> On Sat, Feb 1, 2014 at 8:29 PM, Oskari Saarenmaa o...@ohmu.fi wrote:
>> 31.01.2014 10:59, Sawada Masahiko kirjoitti:
>> I think the idea in the new progress_report() call (with force == true)
>> is to make sure that there is at least one progress_report call that
>> actually writes the progress report. Otherwise the final report may go
>> missing if it gets suppressed by the time-based check. The force
>> argument as used in the new call skips that check.
>
> I understood. I have two concerns, as follows:
>
> - I think it is possible that progress_report() is called frequently
>   (less than 1 second apart); that is, progress_report() is called with
>   force == true right after it was called with force == false, and both
>   write a report.
> - progress_report() is called even if the -P option is disabled. I'm
>   concerned that this could cause performance degradation.

I looked over the latest version, and the only real problem I see here is
your second point, the calling with -P not specified. I doubt it's going
to be much, but in theory I guess the call to time(NULL) many times could
have an effect. I've fixed that by just moving it to after a check for
showprogress.

As for the first one - I believe that's the point. progress_report should
be called with force == true after it was called with force == false;
that's the intended design.

I've applied the patch, with that minor adjustment and an added comment.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
[HACKERS] notify duplicate elimination
I have prepared a patch to backend/commands/async.c to speed up duplicate
elimination. rdtsc timing results are sent back via ereport.

*** a/src/backend/commands/async.c
--- b/src/backend/commands/async.c
***************
*** 326,337 ****
--- 326,353 ----
  typedef struct Notification
  {
  	char	   *channel;		/* channel name */
  	char	   *payload;		/* payload string (can be empty) */
+ 	uint32		hash;			/* speed up search for duplicates */
+ 	struct Notification *left;
+ 	struct Notification *right;
  } Notification;
  
  static List *pendingNotifies = NIL;		/* list of Notifications */
  
  static List *upperPendingNotifies = NIL;	/* list of upper-xact lists */
+ static Notification *treerootNotifies = NULL;	/* speed up search for duplicates */
  
+ #warning rdtsc just included for lack of a suitable reference
+ static uint64 rdtsc_total;
+ static int rdtsc_count;
+ static inline uint64 rdtsc(void)
+ {
+ 	uint64 x;
+ 	__asm__ volatile (".byte 0x0f, 0x31" : "=A" (x));
+ 	return x;
+ }
+ 
  /*
   * State for inbound notifications consists of two flags: one saying whether
   * the signal handler is currently allowed to call ProcessIncomingNotify
***************
*** 382,389 ****
  static void ProcessIncomingNotify(void);
  static void NotifyMyFrontEnd(const char *channel, const char *payload,
  				 int32 srcPid);
- static bool AsyncExistsPendingNotify(const char *channel, const char *payload);
  static void ClearPendingActionsAndNotifies(void);
  
  /*
   * We will work on the page range of 0..QUEUE_MAX_PAGE.
--- 398,408 ----
  static void NotifyMyFrontEnd(const char *channel, const char *payload,
  				 int32 srcPid);
  static void ClearPendingActionsAndNotifies(void);
  
+ static Notification **AsyncSearchPendingNotifies(const char *channel,
+ 				 const char *payload,
+ 				 uint32 *hash);
+ /* Does pendingNotifies include the given channel/payload? */
  
  /*
   * We will work on the page range of 0..QUEUE_MAX_PAGE.
***************
*** 533,538 **** Async_Notify(const char *channel, const char *payload)
--- 552,560 ----
  {
  	Notification *n;
  	MemoryContext oldcontext;
+ 	Notification **nn;
+ 	uint64		t1, t2;
+ 	uint32		hash;
  
  	if (Trace_notify)
  		elog(DEBUG1, "Async_Notify(%s)", channel);
***************
*** 557,564 **** Async_Notify(const char *channel, const char *payload)
  	}
  
  	/* no point in making duplicate entries in the list ... */
! 	if (AsyncExistsPendingNotify(channel, payload))
! 		return;
  
  	/*
  	 * The notification list needs to live until end of transaction, so store
--- 579,592 ----
  	}
  
  	/* no point in making duplicate entries in the list ... */
! 	t1 = rdtsc();
! 	nn = AsyncSearchPendingNotifies(channel, payload, &hash);
! 	t2 = rdtsc();
! 	rdtsc_total += t2 - t1;
! 	rdtsc_count += 1;
! 	if (!nn)		/* this was a duplicate entry */
! 		return;
  
  	/*
  	 * The notification list needs to live until end of transaction, so store
***************
*** 566,584 **** Async_Notify(const char *channel, const char *payload)
  	 */
  	oldcontext = MemoryContextSwitchTo(CurTransactionContext);
  
! 	n = (Notification *) palloc(sizeof(Notification));
  	n->channel = pstrdup(channel);
  	if (payload)
  		n->payload = pstrdup(payload);
  	else
  		n->payload = "";
  
! 	/*
! 	 * We want to preserve the order so we need to append every notification.
! 	 * See comments at AsyncExistsPendingNotify().
  	 */
  	pendingNotifies = lappend(pendingNotifies, n);
- 
  	MemoryContextSwitchTo(oldcontext);
  }
--- 594,625 ----
  	 */
  	oldcontext = MemoryContextSwitchTo(CurTransactionContext);
  
! 	n = (Notification *) palloc(sizeof(*n));
  	n->channel = pstrdup(channel);
  	if (payload)
  		n->payload = pstrdup(payload);
  	else
  		n->payload = "";
+ 
+ 	/* append to search tree */
+ 	n->left = NULL;
+ 	n->right = NULL;
+ 	n->hash = hash;
+ 	*nn = n;
  
! 	/* We want to preserve the order so we need to append every notification.
! 	 * As we are not checking our parents' lists, we can still get duplicates
! 	 * in combination with subtransactions, like in:
! 	 *
! 	 * begin;
! 	 * notify foo '1';
! 	 * savepoint foo;
! 	 * notify foo '1';
! 	 * commit;
! 	 *--
  	 */
+ 	pendingNotifies = lappend(pendingNotifies, n);
  	MemoryContextSwitchTo(oldcontext);
  }
***************
*** 2149,2156 **** NotifyMyFrontEnd(const char *channel, const char *payload, int32 srcPid)
  		elog(INFO, "NOTIFY for \"%s\" payload \"%s\"", channel, payload);
  }
  
! /* Does pendingNotifies include the given channel/payload? */
! static bool
  AsyncExistsPendingNotify(const char *channel, const char *payload)
  {
  	ListCell   *p;
--- 2190,2314 ----
  		elog(INFO, "NOTIFY for \"%s\" payload \"%s\"", channel, payload);
  }
  
! 
! static inline uint32 murmurhash2(const char *key, uint32 seed)
! #warning yet another copy of murmurhash2
! #warning should we use murmurhash3 or one of the hash functions from
[HACKERS] Terminating pg_basebackup background streamer
If an error occurs in the foreground (backup) process of pg_basebackup, and
we exit in a controlled way, the background process (streaming xlog process)
stays around and keeps streaming. This can happen, for example, if disk
space runs out and there is very low activity on the server. (If there is
activity on the server, the background streamer will also run out of disk
space and exit.)

The attached patch kills it off in disconnect_and_exit(), which seems like
the right thing to do to me. Any objections to applying and backpatching
that for the upcoming minor releases?

Should we perhaps also consider adding a SIGTERM handler to pg_basebackup,
so if you send it a SIGTERM it kills off the background process as well?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index a6e320c..c27c633 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -76,6 +76,7 @@ static PQExpBuffer recoveryconfcontents = NULL;
 
 /* Function headers */
 static void usage(void);
+static void disconnect_and_exit(int code);
 static void verify_dir_is_empty_or_create(char *dirname);
 static void progress_report(int tablespacenum, const char *filename, bool force);
 
@@ -88,6 +89,26 @@ static void BaseBackup(void);
 static bool reached_end_position(XLogRecPtr segendpos, uint32 timeline,
 					 bool segment_finished);
 
+
+static void
+disconnect_and_exit(int code)
+{
+	if (conn != NULL)
+		PQfinish(conn);
+
+#ifndef WIN32
+	/*
+	 * On windows, our background thread dies along with the process.
+	 * But on Unix, if we have started a subprocess, we want to kill
+	 * it off so it doesn't remain running trying to stream data.
+	 */
+	if (bgchild > 0)
+		kill(bgchild, SIGTERM);
+#endif
+
+	exit(code);
+}
+
+
 #ifdef HAVE_LIBZ
 static const char *
 get_gz_error(gzFile gzf)
diff --git a/src/bin/pg_basebackup/pg_receivexlog.c b/src/bin/pg_basebackup/pg_receivexlog.c
index 8a702e3..0f191ce 100644
--- a/src/bin/pg_basebackup/pg_receivexlog.c
+++ b/src/bin/pg_basebackup/pg_receivexlog.c
@@ -45,6 +45,13 @@ static void StreamLog();
 static bool stop_streaming(XLogRecPtr segendpos, uint32 timeline,
 			   bool segment_finished);
 
+#define disconnect_and_exit(code)			\
+	{										\
+	if (conn != NULL) PQfinish(conn);		\
+	exit(code);								\
+	}
+
+
 static void
 usage(void)
 {
diff --git a/src/bin/pg_basebackup/streamutil.h b/src/bin/pg_basebackup/streamutil.h
index bb3c34d..7c7d022 100644
--- a/src/bin/pg_basebackup/streamutil.h
+++ b/src/bin/pg_basebackup/streamutil.h
@@ -11,10 +11,4 @@ extern char *replication_slot;
 /* Connection kept global so we can disconnect easily */
 extern PGconn *conn;
 
-#define disconnect_and_exit(code)			\
-	{										\
-	if (conn != NULL) PQfinish(conn);		\
-	exit(code);								\
-	}
-
 extern PGconn *GetConnection(void);
Re: [HACKERS] gaussian distribution pgbench
Hello,

> I revise my gaussian pgbench patch which was requested from community.

With a lot of delay, for which I apologise, please find hereafter the
review.

Gaussian pgbench v3 patch by Mitsumasa KONDO: review

* Purpose

The purpose of the patch is to allow a pgbench script to draw from normally
distributed integer values instead of uniformly distributed ones. This is a
valuable contribution, enabling pgbench to generate more realistic loads,
which are seldom uniform in practice. However, ISTM that other
distributions, such as an exponential one, would also make sense, and the
values should be further randomized so that neighboring values are not more
likely to be drawn. The latter point is non-trivial.

* Compilation

The patch applies and compiles against current head. It works as expected,
although there is little feedback from the script to show that.

* Mathematical soundness

We want to derive a discrete normal distribution from a uniform one. Well,
normal distributions are for continuous variables... Anyway, this is done
by computing a continuous normal distribution which is then projected onto
integers. I'm basically fine with that.

The implementation uses a Box-Muller transform (1958) for this purpose. The
Ziggurat method seems to be preferred, *but* it would require precalculated
tables which depend on the target values. So I'm fine with the Box-Muller
transform for pgbench.

The BM method uses 2 uniformly distributed numbers to derive 2 normally
distributed numbers. The implementation computes one of these, and loops
until one matches a threshold criterion. More explanations, at least in
comments, are needed about this threshold and its meaning. It is required
to be more than 2. My guess is that it limits the number of iterations of
the while loop, but in what proportion is unclear. The documentation does
not help the user understand this value and its meaning either.
What I think it is: it is the deviation of the FURTHEST point around the
mean, that is, the actual deviation associated with the min and max target
values. The minimum value of 2 means that there are at least 4 stddev
lengths between min and max, with the mean most likely in the middle.

If the threshold test fails, one of the 2 uniform numbers is redrawn and a
new candidate value is tested. I'm not at ease about why only 1 value is
redrawn and not both; some explanation would be welcome. Also, on the other
hand, why not test the other possible value (with cos) if the first one
fails? And, as suggested above, I would like some explanation about how
much this while loop may iterate without success, say the expected average
number of iterations, with its explanation in a comment.

* Implementation

Random values:

    double rand1 = 1.0 - rand; // instead of the LONG_MAX computation limits.h

rand2 should be in (0, 1], but it is in [0, 1); use 1.0 - ... as well?!

What is called stdev* in getrand() is really the chosen deviation from the
target mean, so it would make more sense to name it dev.

I do not think that the getrand refactoring was such a good idea. I'm sorry
if I may have suggested that in a previous comment. The new getrand
possibly ignores its parameters. ISTM that it would be much simpler in the
code to have a separate and clean getrand_normal or getrand_gauss called
for \setgaussian, and that's it. This would allow getting rid of DistType
and all of the getrand changes in the code.

There are heavy constant computations (sqrt(log()) ...) within the while
loop which should be moved out of the loop.

ISTM that the while condition would be easier to read as:

    while (dev < -threshold || threshold < dev)

Maybe the \setgaussian argument handling could be turned into a function,
so that it could be reused easily later for some other distribution (say
some \setexp :-)

* Options

ISTM that the test options would be better if made orthogonal, i.e. not
having three --gaussian* options.
I would suggest having only one --gaussian=NUM option, which would trigger
gaussian tests with that threshold; --gaussian=3.5 --select-only would use
the select-only variant, and so on.

* Typos

gausian -> gaussian
patern -> pattern

* Conclusion

- This is a valuable patch to help create more realistic loads and make
  pgbench a more useful tool. I'm greatly in favor of having such
  functionality.

- It seems to me that the patch should be further improved before being
  committed. In particular I would suggest to:

  (1) improve the explanations in the code and in the documentation,
      especially about what the deviation threshold is and its precise
      link to the generated values;

  (2) simplify the code with a separate gaussian getrand, and simpler or
      more efficient code here and there, see comments above;

  (3) use only one option to trigger gaussian tests.

  (bonus) \setexp would be nice :-)

--
Re: [HACKERS] narwhal and PGDLLIMPORT
On 02/09/2014 01:12 AM, Marco Atzeri wrote:

> On 09/02/2014 00:06, Andrew Dunstan wrote:
>> On 02/08/2014 05:34 PM, Tom Lane wrote:
>>> Hiroshi Inoue in...@tpf.co.jp writes:
>>>> Though I'm not a MINGW expert at all, I know dllwrap is a deprecated
>>>> tool and dlltool is almost a deprecated tool. Cygwin port is removing
>>>> the use of dllwrap and dlltool now. Isn't it better for MINGW port to
>>>> follow it?
>>> Only way to make that happen is to prepare and test a patch ...
>> Yeah. Incidentally, we didn't quite get rid of dlltool for Cygwin. We
>> did get rid of dllwrap. But I agree this is worth trying for Mingw.
>
> we should have got rid of dlltool on cygwin.
> At least it is not used on my build
>
> Regards
> Marco

Then send in a patch. The patch you sent in previously did not totally
remove it, IIRC.

cheers

andrew
Re: [DOCS] [HACKERS] Viability of text HISTORY/INSTALL/regression README files (was Re: [COMMITTERS] pgsql: Document a few more regression test hazards.)
On Sat, Feb 8, 2014 at 4:41 PM, Tom Lane t...@sss.pgh.pa.us wrote:

> I wrote:
>> Gavin Flower gavinflo...@archidevsys.co.nz writes:
>>> How about adding URLs for the online versions of HISTORY README's (or
>>> their rough equivalents - perhaps the online version of the latest
>>> 'Appendix E. Release Notes' would be sufficient?) to the INSTALL file?
>> Actually, what I had in mind was to replace the dynamically-generated
>> HISTORY and README files with small text files that contain those URL
>> references.
>
> Here's a proposed patch against HEAD for this. It also gets rid of some
> rather quaint instructions for using Netscape to construct these files ;-)
>
> Barring objection, I'd like to update all the live branches this way
> before the upcoming releases. I'm tired of having to worry about whether
> the release notes will build as plain text; but that worry won't go away
> unless we nuke the text output in all the branches.

Sounds OK to me. If there's as many as two people using those files, I'll
be surprised. (Of course, I've been surprised before.)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Recovery inconsistencies, standby much larger than primary
On Thu, Feb 6, 2014 at 11:41 PM, Greg Stark st...@mit.edu wrote:

> That doesn't explain the other instance or the other copies of this
> database. I think the most productive thing I can do is switch my
> attention to the other database to see if it really looks like the same
> problem.

So here's an instance in the other database. This one is on a different
continent from the first one, so it's definitely a different physical
machine. I've had to copy the blocks over to another machine because the
database is down and still in standby mode anyway. I don't have the xlog
file yet.

Bad block's page header -- this is in the 56th relation segment:

=# select (page_header(E'\\x2005583b05aa050028001805002004201098e00f2090e00f088d24061885e00f')).*;
     lsn      | tli | flags | lower | upper | special | pagesize | version | prune_xid
--------------+-----+-------+-------+-------+---------+----------+---------+-----------
 520/AA053B58 |   5 |     0 |    40 |  1304 |    8192 |     8192 |       4 |         0
(1 row)

=# select (heap_page_items(E'\\x2005583b05aa050028001805002004201098e00f2090e00f088d24061885e00f000')).*;
 lp | lp_off | lp_flags | lp_len | t_xmin  | t_xmax | t_field3 |   t_ctid    | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid
----+--------+----------+--------+---------+--------+----------+-------------+-------------+------------+--------+--------+-------
  1 |   6160 |        1 |   2032 | 7635393 |      0 |        0 | (4773121,1) |           3 |       2306 |     24 |        |
  2 |   4128 |        1 |   2032 | 7635393 |      0 |        0 | (4773121,2) |           3 |       2306 |     24 |        |
  3 |   3336 |        1 |    786 | 7635393 |      0 |        0 | (4773121,3) |           3 |       2306 |     24 |        |
  4 |   1304 |        1 |   2032 | 7635428 |      0 |        0 | (4773121,4) |           3 |       2306 |     24 |        |
(4 rows)

Looking at the block at offset 4773121 (which is in the 36th segment):

=# select (heap_page_items(E'\\x2005a00a0bad05002c00a002002004201098e00f2090e00f088d24061885e00fa082ec04000.')).*;
 lp | lp_off | lp_flags | lp_len | t_xmin  | t_xmax | t_field3 |   t_ctid    | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid
----+--------+----------+--------+---------+--------+----------+-------------+-------------+------------+--------+--------+-------
  1 |   6160 |        1 |   2032 | 7635393 |      0 |        0 | (4773121,1) |           3 |       2306 |     24 |        |
  2 |   4128 |        1 |   2032 | 7635393 |      0 |        0 | (4773121,2) |           3 |       2306 |     24 |        |
  3 |   3336 |        1 |    786 | 7635393 |      0 |        0 | (4773121,3) |           3 |       2306 |     24 |        |
  4 |   1304 |        1 |   2032 | 7635428 |      0 |        0 | (4773121,4) |           3 |       2306 |     24 |        |
  5 |    672 |        1 |    630 | 7635580 |      0 |        0 | (4773121,5) |           3 |       2306 |     24 |        |
(5 rows)

d9de7pcqls4ib6=# select (page_header(E'\\x2005a00a0bad05002c00a002002004201098e00f2090e00f088d24061885e00fa082ec04')).*;
     lsn      | tli | flags | lower | upper | special | pagesize | version | prune_xid
--------------+-----+-------+-------+-------+---------+----------+---------+-----------
 520/AD0B0AA0 |   5 |     0 |    44 |   672 |    8192 |     8192 |       4 |         0
(1 row)

--
greg
Re: [HACKERS] jsonb and nested hstore
On Fri, February 7, 2014 00:47, Andrew Dunstan wrote:

> Attached are updated patches.
>
> jsonb-10.patch.gz
> nested-hstore-10.patch.gz

Small changes to the json documentation, mostly of typo caliber.

Thanks,

Erik Rijkers

--- doc/src/sgml/datatype.sgml.orig	2014-02-09 14:27:55.264512678 +0100
+++ doc/src/sgml/datatype.sgml	2014-02-09 14:36:47.758675826 +0100
@@ -4254,8 +4254,8 @@
  There are two JSON data types: <type>json</type> and <type>jsonb</type>.
  Both accept identical sets of values as input. The difference is
  primarily a matter of efficiency. The <type>json</type> data type stores an exact
-copy of the the input text, and the processing functions have to reparse
-it to precess it, while the <type>jsonb</type> is stored in a decomposed
+copy of the input text, and the processing functions have to reparse
+it to process it, while the <type>jsonb</type> is stored in a decomposed
 form that makes it slightly less efficient to input but very much faster
 to process, since it never needs reparsing.
 </para>
@@ -4275,7 +4275,7 @@
 <para>
  In general, most applications will find it advantageous to store JSON data
-as <type>jsonb</type>, unless they have quite specialised needs.
+as <type>jsonb</type>, unless they have quite specialized needs.
 </para>
 <para>
--- doc/src/sgml/func.sgml.orig	2014-02-09 14:28:12.888989051 +0100
+++ doc/src/sgml/func.sgml	2014-02-09 15:15:44.378580545 +0100
@@ -10157,13 +10157,8 @@
  <entry>
   Builds a heterogeneously-typed json array out of a variadic argument list.
  </entry>
- <entry><literal>SELECT json_build_array(1,2,'3',4,5);</literal></entry>
- <entry>
-<programlisting>
- json_build_array
- [1, 2, 3, 4, 5]
-</programlisting>
+ <entry><literal>json_build_array(1,2,'3',4,5);</literal></entry>
+ <entry><literal>[1, 2, 3, 4, 5]</literal>
  </entry>
 </row>
 <row>
@@ -10177,12 +10172,7 @@
   constructed out of alternating name/value arguments.
  </entry>
  <entry><literal>SELECT json_build_object('foo',1,'bar',2);</literal></entry>
- <entry>
-<programlisting>
- json_build_object
-
- {"foo" : 1, "bar" : 2}
-</programlisting>
+ <entry><literal>{"foo" : 1, "bar" : 2}</literal>
  </entry>
 </row>
 <row>
@@ -10197,7 +10187,7 @@
   such that each inner array has exactly two elements, which are taken as
   a name/value pair.
  </entry>
- <entry><literal>select * from json_object('{a, 1, b, def, c, 3.5}') or <literal>select json_object('{{a, 1},{b, def},{c, 3.5}}')</literal></literal></entry>
+ <entry><literal>SELECT * FROM json_object('{a, 1, b, def, c, 3.5}') or <literal>select json_object('{{a, 1},{b, def},{c, 3.5}}')</literal></literal></entry>
  <entry>
<programlisting>
 json_object
@@ -10215,13 +10205,8 @@
   The two-argument form of JSON object takes keys and values pairwise from
   two separate arrays. In all other respects it is identical to the
   one-argument form.
  </entry>
- <entry><literal>select json_object('{a, b}', '{1,2}');</literal></entry>
- <entry>
-<programlisting>
- json_object
-
- {"a" : 1, "b" : 2}
-</programlisting>
+ <entry><literal>json_object('{a, b}', '{1,2}');</literal></entry>
+ <entry><literal>{"a" : 1, "b" : 2}</literal>
  </entry>
 </row>
</tbody>
@@ -10419,12 +10404,12 @@
  <entry><type>anyelement</type></entry>
  <entry>
   Expands the object in <replaceable>from_json</replaceable> to a row whose columns match
-  the record type defined by base. Conversion will be best
+  the record type defined by <parameter>base</parameter>. Conversion will be best
   effort; columns in base with no corresponding key in
   <replaceable>from_json</replaceable> will be left null. When processing
   <type>json</type>, if a column is specified more than once, the last
   value is used.
  </entry>
- <entry><literal>select * from json_populate_record(null::x, '{"a":1,"b":2}')</literal></entry>
+ <entry><literal>SELECT * FROM json_populate_record(null::x, '{"a":1,"b":2}')</literal></entry>
  <entry>
<programlisting>
 a | b
@@ -10440,13 +10425,13 @@
  <entry><type>SETOF anyelement</type></entry>
  <entry>
   Expands the outermost set of objects in <replaceable>from_json</replaceable> to a set
-  whose columns match the record type defined by base.
-  Conversion will be best effort; columns in base with no
+  whose columns match the record type defined by <parameter>base</parameter>.
+  Conversion will be best effort; columns in <parameter>base</parameter> with no
   corresponding key in <replaceable>from_json</replaceable> will be left
   null. When processing <type>json</type>, if a column is specified more
   than once, the last value is used.
  </entry>
- <entry><literal>select * from json_populate_recordset(null::x, '[{"a":1,"b":2},{"a":3,"b":4}]')</literal></entry>
+ <entry><literal>SELECT * FROM
Re: [HACKERS] narwhal and PGDLLIMPORT
On 09/02/2014 14:10, Andrew Dunstan wrote:

> On 02/09/2014 01:12 AM, Marco Atzeri wrote:
>> we should have got rid of dlltool on cygwin.
>> At least it is not used on my build
>
> Then send in a patch. The patch you sent in previously did not totally
> remove it IIRC.

attached patch versus latest git.
except prepared_xacts test all others are fine

=======================
 All 140 tests passed.
=======================

Regards
Marco

diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index e0e9b79..eaf6ddf 100644
--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
@@ -518,6 +518,11 @@ ifeq ($(PORTNAME),win32)
 LIBS += -lws2_32 -lshfolder
 endif
 
+# missing for link on cygwin ?
+ifeq ($(PORTNAME),cygwin)
+libpq_pgport += $(LDAP_LIBS_FE)
+endif
+
 # Not really standard libc functions, used by the backend.
 TAS = @TAS@
diff --git a/src/Makefile.shlib b/src/Makefile.shlib
index a95e4d6..3d4b989 100644
--- a/src/Makefile.shlib
+++ b/src/Makefile.shlib
@@ -273,7 +273,7 @@ endif
 ifeq ($(PORTNAME), cygwin)
   LINK.shared		= $(CC) -shared
   ifdef SO_MAJOR_VERSION
-    shlib		= cyg$(NAME)$(DLSUFFIX)
+    shlib		= cyg$(NAME)-$(SO_MAJOR_VERSION)$(DLSUFFIX)
   endif
   haslibarule	= yes
 endif
diff --git a/src/backend/Makefile b/src/backend/Makefile
index 356890d..bc744b9 100644
--- a/src/backend/Makefile
+++ b/src/backend/Makefile
@@ -62,18 +62,8 @@ endif
 
 ifeq ($(PORTNAME), cygwin)
 
-postgres: $(OBJS) postgres.def libpostgres.a
-	$(DLLTOOL) --dllname $@$(X) --output-exp $@.exp --def postgres.def
-	$(CC) $(CFLAGS) $(LDFLAGS) $(LDFLAGS_EX) -o $@$(X) -Wl,--base-file,$@.base $@.exp $(call expand_subsys,$(OBJS)) $(LIBS)
-	$(DLLTOOL) --dllname $@$(X) --base-file $@.base --output-exp $@.exp --def postgres.def
-	$(CC) $(CFLAGS) $(LDFLAGS) $(LDFLAGS_EX) -Wl,--stack,$(WIN32_STACK_RLIMIT) -o $@$(X) $@.exp $(call expand_subsys,$(OBJS)) $(LIBS)
-	rm -f $@.exp $@.base
-
-postgres.def: $(OBJS)
-	$(DLLTOOL) --export-all --output-def $@ $(call expand_subsys,$^)
-
-libpostgres.a: postgres.def
-	$(DLLTOOL) --dllname postgres.exe --def postgres.def --output-lib $@
+postgres libpostgres.a: $(OBJS)
+	$(CC) $(CFLAGS) $(LDFLAGS) $(LDFLAGS_EX) $(export_dynamic) $(call expand_subsys,$^) $(LIBS) -o $@ -Wl,--stack,$(WIN32_STACK_RLIMIT) -Wl,--export-all-symbols -Wl,--out-implib=libpostgres.a
 
 endif # cygwin
diff --git a/src/interfaces/libpq/Makefile b/src/interfaces/libpq/Makefile
index 7f2d901..721e248 100644
--- a/src/interfaces/libpq/Makefile
+++ b/src/interfaces/libpq/Makefile
@@ -45,7 +45,7 @@ OBJS += ip.o md5.o
 OBJS += encnames.o wchar.o
 
 ifeq ($(PORTNAME), cygwin)
-override shlib = cyg$(NAME)$(DLSUFFIX)
+override shlib = cyg$(NAME)-$(SO_MAJOR_VERSION)$(DLSUFFIX)
 endif
 
 ifeq ($(PORTNAME), win32)
Re: [HACKERS] [PATCH] Store Extension Options
On Sat, Jan 11, 2014 at 2:47 AM, Peter Eisentraut pete...@gmx.net wrote: On Sat, 2014-01-11 at 00:48 -0200, Fabrízio de Royes Mello wrote: Now, if bdr is installed but the validation doesn't happen unless bdr is loaded in some sense, then that is an implementation deficiency that I think we can insist be rectified before this feature is accepted. Isn't checking whether the extension is already installed enough for the first version of this feature? Elsewhere it was argued that tying this to extensions is not appropriate. I agree. It depends on how this feature is supposed to be used exactly. A replication plugin might very well be loaded via session_preload_libraries and not appear in SQL at all. In that case you need some C-level hook. In another case, an extension might want to inspect relation options from user-space triggers. So you'd need to register some SQL-level function for option validation. This could end up being two separate but overlapping features. Hi all, I've taken this weekend to work on this patch and on Monday or Tuesday I'll send it. But I have some doubts: 1) I'm not convinced about tying this to extensions. I think this feature must simply enable us to store a custom GUC. We can set custom GUCs in a backend session using SET class.variable = value, and this feature could just enable us to store them for relations/attributes, without the complexity and overhead of registering a function to validate them. That way we can use this feature for extensions and other needs too. 2) If we implement Robert's idea of a function to validate the extension options, then we must think about how an extension developer will register this function. When we install an extension we must have some way to get the pg_proc OID and store it in pg_extension (or a different catalog). Or we'll implement some way to register this function at the SQL level, like ALTER EXTENSION bdr SET VALIDATE FUNCTION bdr_options_validate(); or another syntax, of course. 
I don't know if you guys understood my concerns!! :-) Comments? Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog sobre TI: http://fabriziomello.blogspot.com Perfil Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello
Re: [HACKERS] Minor performance improvement in transition to external sort
On Fri, Feb 7, 2014 at 4:28 PM, Jeremy Harris j...@wizmail.org wrote: On 06/02/14 22:12, Jeremy Harris wrote: Did you try sorting already-sorted, reverse sorted, or pipe-organ shaped data sets? Summary (low numbers better): Random ints: 83% compares, level on time. Sorted ints: level compares, 70% time. Reverse-sorted ints: 10% compares, 15% time (!) Constant ints: 200% compares, 360% time (ouch, and not O(n)) Pipe-organ ints: 80% compares, 107% time Random text: 83% compares, 106% time This is kind of what I expected to happen. The problem is that it's hard to make some cases better without making other cases worse. I suspect (hope?) there's some simple fix for the constant-int case. But the last two cases are trickier. Intuitively, it seems that reducing comparisons ought to reduce runtime, but if I'm reading the results correctly, the runtime actually went up even though the number of comparisons went down. This is hard to account for, but we probably need to at least understand why that's happening. I feel like this algorithm ought to be a win, but these data don't provide a compelling case for change. I wonder if it would be worth trying this test with text data as well. Text comparisons are much more expensive than integer comparisons, so the effect of saving comparisons ought to be more visible there. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
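The qualitative pattern in Jeremy's table, far fewer comparisons on presorted and reverse-sorted input, is easy to reproduce with any run-detecting sort. The sketch below uses Python's built-in Timsort purely as a stand-in illustration (it is not the merge sort under discussion): comparison counts drop to n - 1 on sorted and reverse-sorted input, while random input pays roughly n log n.

```python
import random

class Counted:
    """Wraps a value and counts how many times it is compared."""
    comparisons = 0

    def __init__(self, v):
        self.v = v

    def __lt__(self, other):
        Counted.comparisons += 1
        return self.v < other.v

def count_comparisons(values):
    """Sort a copy of `values` and report how many comparisons it took."""
    Counted.comparisons = 0
    sorted(Counted(v) for v in values)
    return Counted.comparisons

random.seed(42)
n = 1024
rand_cmp = count_comparisons([random.random() for _ in range(n)])
asc_cmp = count_comparisons(range(n))          # already sorted
desc_cmp = count_comparisons(range(n, 0, -1))  # reverse sorted
print(rand_cmp, asc_cmp, desc_cmp)
```

Of course this says nothing about the runtime anomaly Robert points out; it only illustrates why the comparison counts behave so differently across input shapes.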
Re: [HACKERS] specifying repeatable read in PGOPTIONS
On Fri, Feb 7, 2014 at 5:06 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-02-04 12:02:45 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-02-04 11:36:22 -0500, Tom Lane wrote: -1. This is not a general solution to the problem. There are other GUCs for which people might want spaces in the value. Sure, I didn't say it was. But I don't see any other values that are likely being passed via PGOPTIONS that frequently contain spaces. application_name --- weren't we just reading about people passing entire command lines there? (They must be using some other way of setting it currently, but PGOPTIONS doesn't seem like an implausible source.) You can't easily use PGOPTIONS to set application_name in many cases anyway, libpq's support for it gets in the way since it takes effect later. And I think libpq is a much more likely way to set it. Also you can simply circumvent the problem by using a different naming convention; that's not possible with repeatable read. So I still think we should add read_committed, repeatable_read as aliases. Like Tom, I'm -1 on this. This is fixing the problem from the wrong end. But introducing an escaping convention seems more than prudent. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] specifying repeatable read in PGOPTIONS
On 2014-02-09 12:00:02 -0500, Robert Haas wrote: On Fri, Feb 7, 2014 at 5:06 AM, Andres Freund and...@2ndquadrant.com wrote: So I still think we should add read_committed, repeatable_read as aliases. Like Tom, I'm -1 on this. This is fixing the problem from the wrong end. Why? We do have other options with aliases for option values, and every other enum option has taken care not to need spaces. But introducing an escaping convention seems more than prudent. I've attached a patch implementing \ escapes one mail up... But I don't really see that as prohibiting also adding sensible aliases. It's just annoying that it's currently not possible to run two pgbenches testing both without either restarting the server or ALTER ROLE/DATABASE. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] specifying repeatable read in PGOPTIONS
On Sun, Feb 9, 2014 at 12:10 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-02-09 12:00:02 -0500, Robert Haas wrote: On Fri, Feb 7, 2014 at 5:06 AM, Andres Freund and...@2ndquadrant.com wrote: So I still think we should add read_committed, repeatable_read as aliases. Like Tom, I'm -1 on this. This is fixing the problem from the wrong end. Why? We do have other options with aliases for option values, and every other enum option has taken care not to need spaces. I think that's probably mostly a happy coincidence; I'm not committed to a policy of ensuring that all GUCs can be set to whatever value you want without using the space character. Besides, what's so special about enum GUCs? There can certainly be spaces in string-valued GUCs, and you're not going to be able to get around the problem there with one-off kludges. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] PoC: Partial sort
On Thu, Feb 6, 2014 at 12:39 PM, Marti Raudsepp ma...@juffo.org wrote: On Thu, Feb 6, 2014 at 5:31 AM, Robert Haas robertmh...@gmail.com wrote: Hmm, sounds a little steep. Why is it so expensive? I'm probably missing something here, because I would have thought that planner support for partial sorts would consist mostly of considering the same sorts we consider today, but with the costs reduced by the batching. I guess it's because the patch undoes some optimizations in the mergejoin planner wrt caching merge clauses and adds a whole lot of code to find_mergeclauses_for_pathkeys. In other code paths the overhead does seem to be negligible. Notice the removal of: /* Select the right mergeclauses, if we didn't already */ /* * Avoid rebuilding clause list if we already made one; * saves memory in big join trees... */ This is not the only place that worries me about planning overhead. See get_cheapest_fractional_path_for_pathkeys. I had to estimate the number of groups for each sorting column in order to get the right fractional path. For a partial sort path, the cost of the first batch should be included in the initial cost. If we don't do so, the optimizer can pick strange plans based on the assumption that it needs only a few rows from the inner node. See an example. 
create table test1 as ( select id, (random()*100)::int as v1, (random()*1)::int as v2 from generate_series(1,100) id); create table test2 as ( select id, (random()*100)::int as v1, (random()*1)::int as v2 from generate_series(1,100) id); create index test1_v1_idx on test1 (v1); Plan without fraction estimation in get_cheapest_fractional_path_for_pathkeys: postgres=# explain select * from test1 t1 join test2 t2 on t1.v1 = t2.v1 order by t1.v1, t1.id limit 10; QUERY PLAN -- Limit (cost=198956893.20..198956913.33 rows=10 width=24) - Partial sort (cost=198956893.20..19909637942.82 rows=9791031169 width=24) Sort Key: t1.v1, t1.id Presorted Key: t1.v1 - Nested Loop (cost=0.42..19883065506.84 rows=9791031169 width=24) Join Filter: (t1.v1 = t2.v1) - Index Scan using test1_v1_idx on test1 t1 (cost=0.42..47600.84 rows=100 width=12) - Materialize (cost=0.00..25289.00 rows=100 width=12) - Seq Scan on test2 t2 (cost=0.00..15406.00 rows=100 width=12) (9 rows) Current version of patch: postgres=# explain select * from test1 t1 join test2 t2 on t1.v1 = t2.v1 order by t1.v1, t1.id limit 10; QUERY PLAN -- Limit (cost=3699913.43..3699913.60 rows=10 width=24) - Partial sort (cost=3699913.43..173638549.67 rows=9791031169 width=24) Sort Key: t1.v1, t1.id Presorted Key: t1.v1 - Merge Join (cost=150444.79..147066113.70 rows=9791031169 width=24) Merge Cond: (t1.v1 = t2.v1) - Index Scan using test1_v1_idx on test1 t1 (cost=0.42..47600.84 rows=100 width=12) - Materialize (cost=149244.84..154244.84 rows=100 width=12) - Sort (cost=149244.84..151744.84 rows=100 width=12) Sort Key: t2.v1 - Seq Scan on test2 t2 (cost=0.00..15406.00 rows=100 width=12) (11 rows) I don't compare actual execution times because I didn't wait until first plan execution ends up :-) But anyway costs are extraordinary and inner sequential scan of 100 rows is odd. -- With best regards, Alexander Korotkov.
Re: [HACKERS] specifying repeatable read in PGOPTIONS
Robert Haas robertmh...@gmail.com writes: On Sun, Feb 9, 2014 at 12:10 PM, Andres Freund and...@2ndquadrant.com wrote: Why? We do have other options with aliases for option values, and every other enum option has taken care not to need spaces. I think that's probably mostly a happy coincidence; I'm not committed to a policy of ensuring that all GUCs can be set to whatever value you want without using the space character. Besides, what's so special about enum GUCs? There can certainly be spaces in string-valued GUCs, and you're not going to be able to get around the problem there with one-off kludges. Pathname GUCs can have spaces in them (that's even pretty common, on certain platforms). Other GUCs contain SQL identifiers, which can legally have spaces in them too. So really this is a mechanism deficiency, not something we should work around by instituting a policy against spaces in GUC values. regards, tom lane
Re: [HACKERS] specifying repeatable read in PGOPTIONS
On 2014-02-09 12:38:18 -0500, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: On Sun, Feb 9, 2014 at 12:10 PM, Andres Freund and...@2ndquadrant.com wrote: Why? We do have other options with aliases for option values, and every other enum option has taken care not to need spaces. I think that's probably mostly a happy coincidence; I'm not committed to a policy of ensuring that all GUCs can be set to whatever value you want without using the space character. Besides, what's so special about enum GUCs? There can certainly be spaces in string-valued GUCs, and you're not going to be able to get around the problem there with one-off kludges. Pathname GUCs can have spaces in them (that's even pretty common, on certain platforms). Other GUCs contain SQL identifiers, which can legally have spaces in them too. But pretty much all of those GUCs are either PGC_SIGHUP or PGC_POSTMASTER and thus cannot be set via PGOPTIONS anyway. The two exceptions are application_name (which in many cases cannot be set because libpq overrides it with a SET) and search_path. Anybody setting the latter to schemas containing spaces deserves having to escape values. So really this is a mechanism deficiency, not something we should work around by instituting a policy against spaces in GUC values. Please note, I am absolutely *not* against such a mechanism, that's why I submitted a patch implementing one. But unnecessarily requiring escaping still annoys me. We should, imo, add the escaping mechanism to master and backpatch the aliases (read_committed, repeatable_read). There's already not enough benchmarking during beta/rc, we shouldn't make it unnecessarily hard. And there's sufficient reason to benchmark the difference between isolation modes, with mvcc catalog snapshots and such. 
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] narwhal and PGDLLIMPORT
Amit Kapila amit.kapil...@gmail.com writes: On Sun, Feb 9, 2014 at 4:04 AM, Tom Lane t...@sss.pgh.pa.us wrote: In the meantime, here's a short patch trying the #define extern approach. Anyone want to try this on a Windows machine or two? There are quite a few warnings and errors in build: Hmm. Looks like the #define is messing up the handling of externs appearing in system headers. There's likely no good way around that, at least none with the simplicity that's the only redeeming value of this approach. So this won't work, and we should be looking at whether it will help to not use dlltool anymore (per other emails in this thread). Thanks for doing the testing! regards, tom lane
Re: [HACKERS] trgm regex index peculiarity
On Thu, Jan 16, 2014 at 3:34 AM, Tom Lane t...@sss.pgh.pa.us wrote: Alexander Korotkov aekorot...@gmail.com writes: Revised version of patch with necessary comments. I looked at this patch a bit. It seems like this: + *BLANK_COLOR_SIZE - How much blank character is more frequent than + * other character in average + #define BLANK_COLOR_SIZE 32 is a drastic overestimate. Using the program Erik provided to generate some sample data, I find that blanks in trigrams are maybe about three times more common than other characters. Possibly Erik's data isn't very representative of real text, but if that's the case we'd better get a more representative sample case before we start optimizing ... Now maybe the above number is okay for this purpose anyway, but if so it needs some different justification; maybe call it a penalty? But nonetheless it seems like a darn large penalty, particularly for __x type trigrams where the value would effectively be squared. Compared to the proposed values of MAX_TRGM_SIZE and WISH_TRGM_SIZE, it seems like you might as well forget the sizing approach and just flat out reject trigrams containing COLOR_BLANK, because they'll never get through the size filter. That seems like the right decision to me when we have other trigrams that can reject them. But there are cases when we can't. I'm inclined to think you need a larger MAX_TRGM_SIZE; and WISH_TRGM_SIZE seems mighty small as well, compared to what the code would have done before. OK, let me give more detailed reasoning for the penalty. Let's consider a word of length n. 
It contains n+1 trigrams including: 1 __x trigram 1 _xx trigram 1 xx_ trigram n - 2 xxx trigrams Assuming an alphabet of size m, those trigrams have the following average frequencies: __x 1/((n+1)*m) _xx 1/((n+1)*m^2) xx_ 1/((n+1)*m^2) xxx (n-2)/((n+1)*m^3) The ratios of these blank-containing frequencies to the frequency of xxx are: __x m^2/(n-2) _xx and xx_ m/(n-2) In order to estimate n I've analyzed some classics: Ernest Hemingway, A Farewell to Arms - 3.8; Leo Tolstoy, War and Peace - 5.0. In English with alphabet size = 26 we have __x m^2/(n-2) = 375 _xx and xx_ m/(n-2) = 14.4 In Russian with alphabet size = 33 we have __x m^2/(n-2) = 363 _xx and xx_ m/(n-2) = 11 Given that these calculations are approximate, we can safely use the same values for both languages. Does such reasoning make sense? Also, surely this bit: ! trgmNFA->totalTrgmCount = (int) totalTrgmSize; is flat out wrong? expandColorTrigrams() expects that trgmNFA->totalTrgmCount is the truth, not some penalty-bloated abstraction. The fact that the patch hasn't failed on you completely demonstrates that you've not tested any cases where blank-containing trigrams got through the filter, reinforcing my feeling that it's probably too strict now. Oh, I wonder how I could leave such a weird bug in the patch :-( I wonder if it would be more appropriate to continue to measure the limit MAX_TRGM_COUNT in terms of actual trigram counts, and just use the penalized size as the sort key while determining which color trigrams to discard first. Agree. But I would like to save some equivalent of WISH_TRGM_SIZE. Another thought is that it's not clear that you should apply the same penalty to blanks in all three positions. Because of the padding settings, any one word will normally lead to padded strings "  a", " ab", "yz ", so that spaces in the first position are about twice as common as spaces in the other positions. 
(It's a little less than that because single-letter words produce only "  a" and " a "; but I'd think that such words aren't very common in text that trigrams are effective for.) So I'm thinking the penalty ought to take that into account. I'm also inclined to think that it might be worth adding a size field to ColorTrgmInfo rather than repeatedly calculating the size estimate. We only allow a thousand and some ColorTrgmInfos at most, so the extra space wouldn't be that large, and I'm concerned by the expense the current patch adds to the sort comparator. Agree. Another point is that this comment: * Note: per-color-trigram counts cannot overflow an int so long as * COLOR_COUNT_LIMIT is not more than the cube root of INT_MAX, ie about * 1290. However, the grand total totalTrgmCount might conceivably * overflow an int, so we use int64 for that within this routine. is no longer valid, or at least it fails to demonstrate that the result of getColorTrigramSize() can't overflow an int; indeed I rather fear it can. The patch failed to even update the variable name in this comment, let alone address the substantive question. There are some other cosmetic things I didn't like, for instance colorTrgmInfoCountCmp() is no longer comparing counts but you didn't rename it. That's about it for substantive comments though. Thanks, will be fixed. -- With best regards, Alexander Korotkov.
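The ratio arithmetic quoted above is easy to check numerically. A quick sketch (the alphabet sizes and the average word lengths 3.8/5.0 are simply the figures from the message, not independently measured):

```python
def blank_trigram_ratios(m, n):
    """Per the derivation above: relative to a plain xxx trigram, a __x
    trigram is m^2/(n-2) times more frequent, and _xx / xx_ trigrams are
    m/(n-2) times more frequent (alphabet size m, average word length n)."""
    return m * m / (n - 2), m / (n - 2)

# English: alphabet 26, average word length 3.8 (the Hemingway figure)
print(blank_trigram_ratios(26, 3.8))  # roughly (375.6, 14.4)
# Russian: alphabet 33, average word length 5.0 (the Tolstoy figure)
print(blank_trigram_ratios(33, 5.0))  # (363.0, 11.0)
```

This reproduces the 375/14.4 and 363/11 figures in the message, so the disagreement with Tom's numbers below comes from the uniform-alphabet assumption, not the arithmetic.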
Re: [HACKERS] trgm regex index peculiarity
Alexander Korotkov aekorot...@gmail.com writes: On Thu, Jan 16, 2014 at 3:34 AM, Tom Lane t...@sss.pgh.pa.us wrote: I looked at this patch a bit. It seems like this: + *BLANK_COLOR_SIZE - How much blank character is more frequent than + * other character in average + #define BLANK_COLOR_SIZE 32 is a drastic overestimate. OK, I would like to make more reasoning for penalty. Let's consider word of length n. It contains n+1 trigrams including: 1 __x trigram 1 _xx trigram 1 xx_ trigram n - 2 xxx trigrams Assume alphabet of size m those trigrams have following average frequencies: __x 1/((n+1)*m) _xx 1/((n+1)*m^2) xx_ 1/((n+1)*m^2) xxx (n-2)/((n+1)*m^3) Hmm, but you're assuming that the m letters are all equally frequent, which is surely not true in normal text. I didn't have a machine-readable copy of Hemingway or Tolstoy at hand, but I do have a text file with the collected works of H.P. Lovecraft, so I tried analyzing the trigrams in that. I find that * The average word length is 4.78 characters. * There are 5652 distinct trigrams (discounting some containing digits), the most common one (' t') occurring 81222 times; the average occurrence count is 500.0. * Considering only trigrams not containing any blanks, there are 4937 of them, the most common one ('the') occurring 45832 times, with an average count of 266.9. * There are (unsurprisingly) exactly 26 trigrams of the form ' x', with an average count of 19598.5. * There are 689 trigrams of the form ' xx' or 'xx ', the most common one (' th') occurring 58200 times, with an average count of 1450.1. So, we've rediscovered the fact that 'the' is the most common word in English text ;-). But I think the significant conclusion for our purposes here is that single-space trigrams are on average about 1450.1/266.9 = 5.4 times more common than the average space-free trigram, and two-space trigrams about 19598.5/266.9 = 73.4 times more common. 
So this leads me to the conclusion that we should be using a BLANK_COLOR_SIZE value around 5 or 6 (with maybe something other than a square-law rule for two-space trigrams). Even using your numbers, it shouldn't be 32. regards, tom lane
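Tom's corpus measurement is easy to reproduce on any text file. The sketch below is an approximation, not pg_trgm itself: it mimics only the padding rule (two leading blanks and one trailing blank per word, so an n-letter word yields n + 1 trigrams) and ignores pg_trgm details such as multibyte handling and per-document deduplication:

```python
import re
from collections import Counter

def word_trigrams(word):
    """pg_trgm-style extraction: pad with two leading and one trailing
    blank, so an n-letter word yields n + 1 trigrams."""
    padded = '  ' + word.lower() + ' '
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

def blank_vs_plain(text):
    """Average occurrence count of blank-containing vs blank-free trigrams."""
    counts = Counter()
    for word in re.findall(r'[a-z]+', text.lower()):
        counts.update(word_trigrams(word))
    blank = [c for t, c in counts.items() if ' ' in t]
    plain = [c for t, c in counts.items() if ' ' not in t]
    return sum(blank) / len(blank), sum(plain) / len(plain)

print(word_trigrams('cat'))  # ['  c', ' ca', 'cat', 'at ']
```

Run against a large corpus, the two averages returned by blank_vs_plain() correspond to the 1450.1-ish and 266.9-ish figures Tom reports (the exact values depend entirely on the input text).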
Re: [HACKERS] proposal, patch: allow multiple plpgsql plugins
Hello, updated patch - now plugin_info is per plpgsql_estate/plugin again. Regards Pavel 2014-01-17 20:26 GMT+01:00 Pavel Stehule pavel.steh...@gmail.com: 2014/1/16 Marko Tiikkaja ma...@joh.to Hi Pavel, First of all, thanks for working on this! On 1/12/14, 8:58 PM, Pavel Stehule wrote: I'm still not happy with plugin_info - it is only per plugin now and should be per plugin and per function. I'm not sure I understand the point of plugin_info in the first place, but what would having a separate info per (plugin, function) achieve that can't be done otherwise? First use case - I would like to protect against repeated calls of plpgsql_check_function in passive mode. Where do I have to store information about a successful first start? It is related to the function instance, so the function oid can be ambiguous (for functions with polymorphic parameters). When the function instance is destroyed, this information should be destroyed too. It is impossible to do this check from the plugin. Second use case - attaching session-lifecycle plugin data to some function - for example for coverage calculation. Inside a plugin without function-specific data you have to hold a hash of all used functions, and you have to search it again and again. When plpgsql holds this info in its internal function structures, you don't need to search at all. Regards Pavel As for the current patch, I'd like to see improvements on a few things: 1) It doesn't currently compile because of extra semicolons in the PLpgSQL_plugin struct. 2) The previous comment above the same struct still talks about the rendezvous variable which is now gone. The comment should be updated to reflect the new API. 3) The same comment talks about how important it is to unregister a plugin if its _PG_fini() is ever called, but the current API does not support unregistering. That should probably be added? I'm not sure when _PG_fini() would be called. 4) The comment /* reserved for use by optional plugin */ seems a bit weird in its new context. 
Regards, Marko Tiikkaja commit eecd714579b3683b02a814b685b1d559ba0e0da8 Author: Pavel Stehule pavel.steh...@gmail.com Date: Sun Feb 9 22:08:18 2014 +0100 initial diff --git a/src/pl/plpgsql/src/Makefile b/src/pl/plpgsql/src/Makefile index 852b0c7..37d17a8 100644 --- a/src/pl/plpgsql/src/Makefile +++ b/src/pl/plpgsql/src/Makefile @@ -19,7 +19,7 @@ rpath = OBJS = pl_gram.o pl_handler.o pl_comp.o pl_exec.o pl_funcs.o pl_scanner.o -DATA = plpgsql.control plpgsql--1.0.sql plpgsql--unpackaged--1.0.sql +DATA = plpgsql.control plpgsql--1.1.sql plpgsql--1.0--1.1.sql plpgsql--unpackaged--1.0.sql all: all-lib diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c index 3749fac..ed540a9 100644 --- a/src/pl/plpgsql/src/pl_exec.c +++ b/src/pl/plpgsql/src/pl_exec.c @@ -86,6 +86,22 @@ typedef struct SimpleEcontextStackEntry static EState *shared_simple_eval_estate = NULL; static SimpleEcontextStackEntry *simple_econtext_stack = NULL; +/* + * List of pointers and info of registered plugins. 
+ */ +typedef struct PluginPtrEntry +{ + PLpgSQL_plugin *plugin_ptr; + struct PluginPtrEntry *next; +} PluginPtrEntry; + +/* + * Allocated in TopMemoryContext + */ +static PluginPtrEntry *plugins = NULL; +static int nplugins = 0; +static int used_plugin_hook_types = 0; + /* Local function forward declarations */ @@ -235,6 +251,18 @@ static char *format_expr_params(PLpgSQL_execstate *estate, const PLpgSQL_expr *expr); static char *format_preparedparamsdata(PLpgSQL_execstate *estate, const PreparedParamsData *ppd); +static void **get_plugin_info(PLpgSQL_execstate *estate, int plugin_number); + + +/* Bits for used plugin callback types */ +#define PLUGIN_HOOK_TYPE_FUNC_SETUP (1 << 0) +#define PLUGIN_HOOK_TYPE_FUNC_BEG (1 << 2) +#define PLUGIN_HOOK_TYPE_FUNC_END (1 << 3) +#define PLUGIN_HOOK_TYPE_STMT_BEG (1 << 4) +#define PLUGIN_HOOK_TYPE_STMT_END (1 << 5) + +#define EXEC_PLUGIN_HOOK_TYPE(type) \ + ((used_plugin_hook_types & (PLUGIN_HOOK_TYPE_ ## type)) == (PLUGIN_HOOK_TYPE_ ## type)) /* -- @@ -331,10 +359,28 @@ plpgsql_exec_function(PLpgSQL_function *func, FunctionCallInfo fcinfo, exec_set_found(estate, false); /* - * Let the instrumentation plugin peek at this function + * Let the instrumentation plugins peek at this function */ - if (*plugin_ptr && (*plugin_ptr)->func_beg) - ((*plugin_ptr)->func_beg) (estate, func); + if (EXEC_PLUGIN_HOOK_TYPE(FUNC_BEG)) + { + PluginPtrEntry *pe; + int i; + + Assert(plugins != NULL); + + for (i = 0, pe = plugins; pe != NULL; pe = pe->next, i++) + { + PLpgSQL_plugin *pl_ptr = pe->plugin_ptr; + + if (pl_ptr->func_beg) + { +void *plugin_info; + +plugin_info = get_plugin_info(estate, i); +(pl_ptr->func_beg)
Re: [HACKERS] GIN improvements part2: fast scan
On 3.2.2014 07:53, Oleg Bartunov wrote: Tomas, it'd be nice if you use real data in your testing. One very good application of gin fast-scan is dramatic performance improvement of hstore/jsonb @ operator, see slides 57, 58 http://www.sai.msu.su/~megera/postgres/talks/hstore-dublin-2013.pdf. I'd like not to lose this benefit :) Oleg PS. I used data delicious-rss-1250k.gz from http://randomwalker.info/data/delicious/ Hi Oleg, I'm working on extending the GIN testing to include this test (and I'll use it to test both the GIN and hstore-v2 patches). I do have the dataset, but I need the queries too - how did you generate the queries for your benchmark? Do you have some query generator at hand? In your Dublin talk I see just this query type select count(*) from hs where h @ 'tags={{term=NYC}}'; but that seems inadequate for a representative benchmark. Are there other types of queries that need to be tested / might be interesting? E.g. queries with multiple search terms etc.? regards Tomas
Re: [HACKERS] GIN improvements part2: fast scan
On Sun, February 9, 2014 22:35, Tomas Vondra wrote: On 3.2.2014 07:53, Oleg Bartunov wrote: PS. I used data delicious-rss-1250k.gz from http://randomwalker.info/data/delicious/ I'm working on extending the GIN testing to include this test (and I'll use it to test both the GIN and hstore-v2 patches). I do have the dataset, but I need the queries too - how did you generate the queries for your benchmark? Do you have some query generator at hand? In your Dublin talk I see just this query type select count(*) from hs where h @ 'tags={{term=NYC}}'; but that seems inadequate for a representative benchmark. Are there other types of queries that need to be tested / might be interesting? E.g. queries with multiple search terms etc.? There is the hstore operators table (Table F-6) that you can perhaps use to generate queries? (I am working on this too.) At least it already contains a handful of queries. To get at the bits in that table, perhaps the perl program attached here helps: http://www.postgresql.org/message-id/f29d70631e2046655c40dfcfce6db3c3.squir...@webmail.xs4all.nl YMMV, I guess... Erik Rijkers
Re: [HACKERS] GIN improvements part2: fast scan
On 9.2.2014 22:51, Erik Rijkers wrote: On Sun, February 9, 2014 22:35, Tomas Vondra wrote: On 3.2.2014 07:53, Oleg Bartunov wrote: PS. I used data delicious-rss-1250k.gz from http://randomwalker.info/data/delicious/ I'm working on extending the GIN testing to include this test (and I'll use it to test both the GIN and hstore-v2 patches). I do have the dataset, but I need the queries too - how did you generate the queries for your benchmark? Do you have some query generator at hand? In your Dublin talk I see just this query type select count(*) from hs where h @ 'tags={{term=NYC}}'; but that seems inadequate for a representative benchmark. Are there other types of queries that need to be tested / might be interesting? E.g. queries with multiple search terms etc.? There is the hstore operators table (Table F-6) that you can perhaps use to generate queries? (I am working on this too.) At least it already contains a handful of queries. To get at the bits in that table, perhaps the perl program attached here helps: http://www.postgresql.org/message-id/f29d70631e2046655c40dfcfce6db3c3.squir...@webmail.xs4all.nl YMMV, I guess... It seems to me the purpose of the program is to demonstrate some differences between the expected and actual results of an operator. If that's true then it's not really of much use - it's not a query generator. I need a script that generates a large number of 'reasonable' queries, i.e. queries that make sense and resemble what the users actually do. Tomas
Re: [HACKERS] WIP patch for Todo Item : Provide fallback_application_name in contrib/pgbench, oid2name, and dblink
On Wed, Jul 4, 2012 at 12:41 PM, Robert Haas robertmh...@gmail.com wrote: On Tue, Jul 3, 2012 at 11:36 PM, Amit Kapila amit.kap...@huawei.com wrote: Hi Shigeru/Robert, The way of fixing oid2name and pgbench seems reasonable, so applying it to vacuumlo (as Peter mentioned) would be enough for this issue. Shall I consider the following 2 points to update the patch: 1. Apply changes similar to pgbench and oid2name for vacuumlo 2. Remove the modifications for dblink. I've done these two things and committed this. Along the way, I also fixed it to use a stack-allocated array instead of using malloc, since there's no need to malloc a fixed-size array with 7 elements. Thanks for the patch. Since this commit (17676c785a95b2598c573), pgbench no longer uses .pgpass to obtain passwords, but instead prompts for a password. This problem is in 9.3 and 9.4dev. According to strace, it is reading the .pgpass file; it just seems like it is not using it. Cheers, Jeff
Re: [HACKERS] WIP patch for Todo Item : Provide fallback_application_name in contrib/pgbench, oid2name, and dblink
On Sun, Feb 9, 2014 at 6:33 PM, Jeff Janes jeff.ja...@gmail.com wrote: On Wed, Jul 4, 2012 at 12:41 PM, Robert Haas robertmh...@gmail.com wrote: On Tue, Jul 3, 2012 at 11:36 PM, Amit Kapila amit.kap...@huawei.com wrote: Hi Shigeru/Robert, The way of fixing oid2name and pgbench seems reasonable, so applying it to vacuumlo (as Peter mentioned) would be enough for this issue. Shall I consider the following 2 points to update the patch: 1. Apply changes similar to pgbench and oid2name for vacuumlo 2. Remove the modifications for dblink. I've done these two things and committed this. Along the way, I also fixed it to use a stack-allocated array instead of using malloc, since there's no need to malloc a fixed-size array with 7 elements. Thanks for the patch. Since this commit (17676c785a95b2598c573), pgbench no longer uses .pgpass to obtain passwords, but instead prompts for a password. This problem is in 9.3 and 9.4dev. According to strace, it is reading the .pgpass file; it just seems like it is not using it. Hmm. I tried pgbench with the following .pgpass file and it worked OK. Removing the file caused pgbench to prompt for a password. *:*:*:*:foo Presumably whatever behavior difference you are seeing is somehow related to the use of PQconnectdbParams() rather than PQsetdbLogin(), but the fine manual does not appear to document a difference between those functions as regards password-prompting behavior or .pgpass usage. My guess is that it's getting as far as PasswordFromFile() and then judging whatever entry you have in the file not to match the connection attempt. If you're correct about this commit being to blame, then my guess is that one of hostname, port, dbname, and username must end up initialized differently when PQconnectdbParams() rather than PQsetdbLogin() is used...
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Breaking compile-time dependency cycles of Postgres subdirs?
On Fri, Feb 7, 2014 at 7:39 AM, Christian Convey christian.con...@gmail.com wrote: This question is mostly just curiosity... There are build-time dependency cycles between some of Postgres' code subdirectories. For example, storage and access have such a cycle: storage/buffpage.h #includes access/xlogdefs.h access/visibilitymap.h #includes storage/block.h Has there been any discussion about reorganizing these directories so that no such cycles exist? Not to my knowledge. As someone very new to this code base, I think these cycles make it a little harder to figure out the runtime and compile-time dependencies between the subsystems these directories seem to represent. I wonder if that's a problem others face as well? There are probably some cases that could be improved, but I have my doubts about whether eliminating cycles is a reasonable goal. Sometimes, two modules really do depend on each other. And, you're talking about this not just on the level of individual files but entire subtrees. There are 90,000 lines of code in src/backend/access (whose headers are in src/include/access) and more than 38,000 in src/backend/storage (whose headers are in src/include/storage); expecting all dependencies between those modules to go in one direction doesn't feel terribly reasonable. If it could be done at all, you'd probably end up separating code into lots of little tiny directories, splitting apart modules with logically related functionality into chunks living in entirely different parts of the source tree - and I don't think that would be an improvement. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] narwhal and PGDLLIMPORT
On 02/05/2014 01:52 PM, Tom Lane wrote: Craig Ringer cr...@2ndquadrant.com writes: On 02/05/2014 06:29 AM, Tom Lane wrote: I had been okay with the manual PGDLLIMPORT-sprinkling approach (not happy with it, of course, but prepared to tolerate it) as long as I believed the buildfarm would reliably tell us of the need for it. That assumption has now been conclusively disproven, though. I'm kind of horrified that the dynamic linker doesn't throw its toys when it sees this. Indeed :-(. The truly strange part of this is that it seems that the one Windows buildfarm member that's telling the truth (or most nearly so, anyway) is narwhal, which appears to have the oldest and cruftiest toolchain of the lot. I'd really like to come out the other end of this investigation with a clear understanding of why the newer toolchains are failing to report a link problem, and yet not building working executables. For MSVC, here's a patch that makes gendef.pl emit DATA annotations for global var exports. Unfortunately, my Windows test machine has been chewing up its file system with memory errors due to a hardware fault, so compilation attempts (of anything) are currently generating internal compiler errors. The output of the script looks correct, but I can't get a good build with or without the patch. I'll try to get my build box working for testing, but have to get on to other things now, so I won't be able to work further on it today. Also attached is a patch to make vcregress.pl produce a better error message when there's no build output, instead of just reporting that .. 
is not a recognized internal or external command, operable program, or batch file

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

From 8a30733daf62afbb4e01869be1032b80f10f5dae Mon Sep 17 00:00:00 2001
From: Craig Ringer cr...@2ndquadrant.com
Date: Sun, 9 Feb 2014 20:33:56 +0800
Subject: [PATCH 1/2] Emit DATA annotations for global vars in gendef.pl

---
 src/tools/msvc/gendef.pl | 206 +++
 1 file changed, 155 insertions(+), 51 deletions(-)

diff --git a/src/tools/msvc/gendef.pl b/src/tools/msvc/gendef.pl
index 8ef0422..c6710ec 100644
--- a/src/tools/msvc/gendef.pl
+++ b/src/tools/msvc/gendef.pl
@@ -1,72 +1,176 @@
 my @def;
+use warnings;
+use strict;
+use 5.8.0;
+use List::Util qw(max);
+
 #
 # Script that generates a .DEF file for all objects in a directory
 #
 # src/tools/msvc/gendef.pl
 #
-die "Usage: gendef.pl modulepath platform\n"
-  unless (($ARGV[0] =~ /\\([^\\]+$)/)
-    && ($ARGV[1] == 'Win32' || $ARGV[1] == 'x64'));
+sub dumpsyms
+{
+	my ($objfile, $symfile) = @_;
+	system("dumpbin /symbols /out:symbols.out $_ >NUL")
+	  && die "Could not call dumpbin";
+	rename("symbols.out", $symfile);
+}
+
+# Given a symbol file path, loops over its contents
+# and returns a list of symbols of interest as a dictionary
+# of 'symbolname' -> symtype, where symtype is:
+#
+#     0    a CODE symbol, left undecorated in the .DEF
+#     1    A DATA symbol, i.e. global var export
+#
+sub extract_syms
+{
+	my ($symfile, $def) = @_;
+	open(F, "<$symfile") || die "Could not open $symfile for $_\n";
+	while (<F>)
+	{
+		# Expected symbol lines look like:
+		#
+		# 0   1        2      3        4         5 6
+		# IDX SYMBOL   SECT   SYMTYPE  SYMSTATIC   SYMNAME
+		#
+		# 02E 0130 SECTA  notype       External  | _standbyState
+		# 02F 0009 SECT9  notype       Static    | _LocalRecoveryInProgress
+		# 064 0020 SECTC  notype ()    Static    | _XLogCheckBuffer
+		# 065      UNDEF  notype ()    External  | _BufferGetTag
+		#
+		# See http://msdn.microsoft.com/en-us/library/b842y285.aspx
+		#
+		# We're not interested in the symbol index or offset.
+		#
+		# SECT[ION] is only examined to see whether the symbol is defined in a
+		# COFF section of the local object file; if UNDEF, it's a symbol to be
+		# resolved at link time from another object so we can't export it.
+		#
+		# SYMTYPE is always notype for C symbols as there's no typeinfo and no
+		# way to get the symbol type from name (de)mangling. However, we care
+		# if notype is suffixed by () or not. The presence of () means the
+		# symbol is a function, the absence means it isn't.
+		#
+		# SYMSTATIC indicates whether it's a compilation-unit local static
+		# symbol (Static), or whether it's available for use from other
+		# compilation units (External). We export all symbols that aren't
+		# static as part of the whole program DLL interface to produce UNIX-like
+		# default linkage.
+		#
+		# SYMNAME is, obviously, the symbol name. The leading underscore indicates
+		#
Re: [HACKERS] Inconsistency between pg_stat_activity and log_duration
On Fri, Feb 7, 2014 at 2:44 PM, Tatsuo Ishii is...@postgresql.org wrote: One idea is calling pgstat_report_activity(STATE_IDLE) in exec_execute_message() of postgres.c. The function already calls pgstat_report_activity(STATE_RUNNING), which shows the active state in the pg_stat_activity view. So why can't we call pgstat_report_activity(STATE_IDLE) here? Somebody might claim that idle is a transaction state term. Idle means "The backend is waiting for a new client command.", which is certainly not true, especially in the case of an 'E' message, since sync message processing is still left. In that case, I propose to add a new state name, say "finished". So the above proposal would call pgstat_report_activity(STATE_FINISHED) instead. Okay, so by state "finished" it can mean "The backend has finished execution of a query.". In that case I think this might need to be called at the end of exec_simple_query() as well, but then there will be very little difference between idle and finished for a simple query. Of course. The argument here could be: do we really need a new state for such a short window between completion of the 'E' message and processing of the 'S' sync message, considering that updating the state is not a very cheap call to make between processing of two messages? It might make sense for cases where sync message processing can take a long time. Would it not be sufficient if we just explain this in the docs? Do users really face any inconvenience, or is it a matter of clear understanding for users? Well... maybe it's a matter of docs. Pgpool-II issues such SELECTs internally to retrieve system catalog info. The query is piggybacked on the same connection to PostgreSQL opened by the user (pgpool-II cannot issue sync because it closes the transaction, which in turn closes the user's unnamed portal). If the user's query is a SELECT, it can be sent to standbys because of load balancing. Such internal queries are sent to the master, which will remain active for a long time because sync is not issued.
In that case, will it not be better if pgpool-II starts a transaction explicitly (BEGIN/START TRANSACTION) rather than relying on automatic commit mode? If you're issuing a flush instead, maybe we could consider whether it's reasonable to do an extra pgstat_report_activity() upon receipt of a flush message. I think it might be reasonable to call pgstat_report_activity(), as sending a Flush message indicates the user got the complete response to the command sent to the backend, but I am not sure it is a good idea to set send_ready_for_query = true; and send a ReadyForQuery() message to the FE (frontend), as it is expected that the FE can send more messages like 'B' and 'E', so we might need to set the status based on the current state of the backend. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] New option for pg_basebackup, to specify a different directory for pg_xlog
On Sat, Feb 8, 2014 at 12:10 PM, Peter Eisentraut pete...@gmx.net wrote: On 1/29/14, 7:37 PM, Haribabu Kommi wrote: On Tue, Jan 28, 2014 at 1:17 PM, Peter Eisentraut wrote: On 11/30/13, 6:59 AM, Haribabu kommi wrote: To detect whether the provided data and xlog directories are the same or not, I reused the existing make_absolute_path() code as follows. I note that initdb does not detect whether the data and xlog directories are the same. I think there is no point in addressing this only in pg_basebackup. If we want to forbid it, it should be done in initdb foremost. Thanks for pointing it out. If the following approach is fine for identifying identical directories, then I will do the same for initdb also. I wouldn't bother. It's a lot of work for little benefit. Any mistake a user would make can easily be fixed. I also felt it is a lot of work for little benefit, but as of now I am not able to find an easier solution to handle this problem. Can we handle this later if it is really required? -- Regards, Hari Babu Fujitsu Australia Software Technology
Re: [HACKERS] narwhal and PGDLLIMPORT
On Mon, Feb 10, 2014 at 8:51 AM, Craig Ringer cr...@2ndquadrant.com wrote: On 02/05/2014 01:52 PM, Tom Lane wrote: Craig Ringer cr...@2ndquadrant.com writes: On 02/05/2014 06:29 AM, Tom Lane wrote: I had been okay with the manual PGDLLIMPORT-sprinkling approach (not happy with it, of course, but prepared to tolerate it) as long as I believed the buildfarm would reliably tell us of the need for it. That assumption has now been conclusively disproven, though. I'm kind of horrified that the dynamic linker doesn't throw its toys when it sees this. Indeed :-(. The truly strange part of this is that it seems that the one Windows buildfarm member that's telling the truth (or most nearly so, anyway) is narwhal, which appears to have the oldest and cruftiest toolchain of the lot. I'd really like to come out the other end of this investigation with a clear understanding of why the newer toolchains are failing to report a link problem, and yet not building working executables. For MSVC, here's a patch that makes gendef.pl emit DATA annotations for global var exports. Is this change intended to avoid the usage of __declspec(dllimport) for global variables? Won't it increase the build time for Windows, as it seems to me you are traversing each symbol file to find the global vars? With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] narwhal and PGDLLIMPORT
On 02/10/2014 01:59 PM, Amit Kapila wrote: On Mon, Feb 10, 2014 at 8:51 AM, Craig Ringer cr...@2ndquadrant.com wrote: On 02/05/2014 01:52 PM, Tom Lane wrote: Craig Ringer cr...@2ndquadrant.com writes: On 02/05/2014 06:29 AM, Tom Lane wrote: I had been okay with the manual PGDLLIMPORT-sprinkling approach (not happy with it, of course, but prepared to tolerate it) as long as I believed the buildfarm would reliably tell us of the need for it. That assumption has now been conclusively disproven, though. I'm kind of horrified that the dynamic linker doesn't throw its toys when it sees this. Indeed :-(. The truly strange part of this is that it seems that the one Windows buildfarm member that's telling the truth (or most nearly so, anyway) is narwhal, which appears to have the oldest and cruftiest toolchain of the lot. I'd really like to come out the other end of this investigation with a clear understanding of why the newer toolchains are failing to report a link problem, and yet not building working executables. For MSVC, here's a patch that makes gendef.pl emit DATA annotations for global var exports. Is this change intended to avoid the usage of __declspec(dllimport) for global variables? Won't it increase the build time for Windows, as it seems to me you are traversing each symbol file to find the global vars? gendef.pl does that anyway. This change just annotates emitted entries with DATA if they're exported vars. It takes a couple of seconds to run gendef.pl, so I don't think it's really a big concern. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] jsonb and nested hstore
On 02/06/2014 01:48 AM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: On 02/05/2014 11:40 AM, Tom Lane wrote: "switching to binary is the same as text" may well be the most prudent path here. If we do that we're going to have to live with that forever, aren't we? Yeah, but the other side of that coin is that we'll have to live forever with whatever binary format we pick, too. If it turns out to be badly designed, that could be much worse than eating some parsing costs during dump/restore. If we had infinite time/manpower, this wouldn't really be an issue. We don't, though, and so I suggest that this may be one of the better things to toss overboard. Can't we just reject attempts to transfer these via binary copy, allowing only the text format? So rather than sending text when binary is requested, we just require clients to use text for this type. That way it's possible to add the desired binary format later, without rushed decisions. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services