Re: [HACKERS] Vitesse DB call for testing
On 18/10/14 07:13, Josh Berkus wrote: CK, Before we go any further on this, how is Vitesse currently licensed? last time we talked it was still proprietary. If it's not being open-sourced, we likely need to take discussion off this list. +1 Guys, you need to 'fess up on the licensing! Regards Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Vitesse DB call for testing
Indeed! A big part of our implementation is based on the Neumann paper. There are also a few other papers that impacted our implementation:

A. Ailamaki, D. DeWitt, M. Hill, D. Wood. DBMSs On A Modern Processor: Where Does Time Go?
Peter Boncz, Marcin Zukowski, Niels Nes. MonetDB/X100: Hyper-Pipelining Query Execution
M. Zukowski et al. Super-Scalar RAM-CPU Cache Compression

Of course, we need to adapt a lot of the design to Postgres to make something that could stand up harmoniously with the Postgres system, and also to take care that we would be able to merge easily with future versions of Postgres -- the implementation needs to be as non-invasive as possible.

Regards,
-cktan

On Fri, Oct 17, 2014 at 8:40 PM, David Gould wrote: > On Fri, 17 Oct 2014 13:12:27 -0400 > Tom Lane wrote: > >> CK Tan writes: >> > The bigint sum,avg,count case in the example you tried has some >> > optimization. We use int128 to accumulate the bigint instead of numeric in >> > pg. Hence the big speed up. Try the same query on int4 for the improvement >> > where both pg and vitessedb are using int4 in the execution. >> >> Well, that's pretty much cheating: it's too hard to disentangle what's >> coming from JIT vs what's coming from using a different accumulator >> datatype. If we wanted to depend on having int128 available we could >> get that speedup with a couple hours' work. >> >> But what exactly are you "compiling" here? I trust not the actual data >> accesses; that seems far too complicated to try to inline. >> >> regards, tom lane >> >> > > I don't have any inside knowledge, but from the presentation given at the > recent SFPUG followed by a bit of google-fu I think these papers are > relevant: > > http://www.vldb.org/pvldb/vol4/p539-neumann.pdf > http://sites.computer.org/debull/A14mar/p3.pdf > > -dg > > -- > David Gould 510 282 0869 da...@sonic.net > If simplicity worked, the world would be overrun with insects. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Trailing comma support in SELECT statements
On Oct 17, 2014, at 3:18 PM, Tom Lane wrote: > Yeah, exactly. Personally I'm *not* for this, but if we do it we should > do it consistently: every comma-separated list in the SQL syntax should > work the same. PL/pgSQL, too, I presume. D
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
Hello,

As I said in the previous mail, I looked into the latest PCI DSS 3.0 to find out whether and how pgaudit fulfills the requirements related to auditing. I believe that even the initial version of pgaudit needs to have enough functionality to meet the requirements of some well-known standard, in order to demonstrate that PostgreSQL provides a really useful auditing feature. I chose PCI DSS because it seems popular worldwide.

Most requirement items are met, but some are not, or I'm not sure whether and how they are. I have only listed the items on which I'd like to ask for opinions. I'm sorry I don't have good suggestions, but I hope this report will lead to good discussion and a better auditing feature we can boast of. Of course, I'll suggest any idea when I think of something.

[requirement]
3.4 Render PAN unreadable anywhere it is stored (including on portable digital media, backup media, and in logs) by using any of the following approaches: ...
3.4.d Examine a sample of audit logs to confirm that the PAN is rendered unreadable or removed from the logs.

[my comment]
Do the audit log entries always hide the actual bind parameter values (with $1, $2, etc.) if the application uses parameterized SQL statements? Should we advise users to use parameterized statements in the pgaudit documentation? (I think so)

[requirement]
10.2.2 Verify all actions taken by any individual with root or administrative privileges are logged.
10.2.6 Verify the following are logged:
- Initialization of audit logs
- Stopping or pausing of audit logs.

[my comment]
The database superuser can hide his activity by "SET pgaudit.log = '';", but this SET is itself audit-logged. Therefore, I think these requirements are met, because the superuser's suspicious behavior (hiding activity) is recorded. Do you think this interpretation is valid?

[requirement]
10.2.3 Verify access to all audit trails is logged. Malicious users often attempt to alter audit logs to hide their actions, and a record of access allows an organization to trace any inconsistencies or potential tampering of the logs to an individual account. Having access to logs identifying changes, additions, and deletions can help retrace steps made by unauthorized personnel.

[my comment]
I'm totally unsure how this can be fulfilled.

[requirement]
10.2.4 Verify invalid logical access attempts are logged. Malicious individuals will often perform multiple access attempts on targeted systems. Multiple invalid login attempts may be an indication of an unauthorized user’s attempts to “brute force” or guess a password.

[my comment]
Login attempts also need to be audit-logged to meet this requirement.

[requirement]
10.2.5.a Verify use of identification and authentication mechanisms is logged. Without knowing who was logged on at the time of an incident, it is impossible to identify the accounts that may have been used. Additionally, malicious users may attempt to manipulate the authentication controls with the intent of bypassing them or impersonating a valid account.

[my comment]
Can we consider this met because pgaudit records the session user name?

[requirement]
10.3 Record at least the following audit trail entries for all system components for each event:
10.3.4 Verify success or failure indication is included in log entries.
10.3.5 Verify origination of event is included in log entries.

[my comment]
This doesn't seem to be fulfilled, because the failure of SQL statements and the client address are not part of the audit log entry.
[requirement]
10.5 Secure audit trails so they cannot be altered.
10.5 Interview system administrators and examine system configurations and permissions to verify that audit trails are secured so that they cannot be altered as follows:
10.5.1 Only individuals who have a job-related need can view audit trail files. Adequate protection of the audit logs includes strong access control (limit access to logs based on “need to know” only), and use of physical or network segregation to make the logs harder to find and modify. Promptly backing up the logs to a centralized log server or media that is difficult to alter keeps the logs protected even if the system generating the logs becomes compromised.
10.5.2 Protect audit trail files from unauthorized modifications.

[my comment]
I don't know how to achieve these, because the DBA (who starts/stops the server) can modify and delete server log files without any record.

[requirement]
10.6 Review logs and security events for all system components to identify anomalies or suspicious activity. Note: Log harvesting, parsing, and alerting tools may be used to meet this Requirement. The log review process does not have to be manual. The use of log harvesting, parsing, and alerting tools can help facilitate the process by identifying log events that need to be reviewed.

[my comment]
What commercial and open source products are well known as the "log harvesting, parsing, and
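To illustrate the parameterized-statement question under requirement 3.4 above, compare a literal statement with a parameterized one (table and column names here are hypothetical). Whether pgaudit records only the statement text with $1/$2 placeholders, or the bound values as well, is exactly the open question; note also that a SQL-level EXECUTE still embeds the literal values in a logged statement, so the real protection comes from protocol-level bind parameters sent by the client driver:

    -- Literal form: the PAN appears verbatim in the statement text, so any
    -- statement-level audit record would contain it.
    INSERT INTO payments (card_pan, amount) VALUES ('4111111111111111', 10.00);

    -- Parameterized form: the prepared statement's text contains only $1, $2.
    PREPARE ins_payment(text, numeric) AS
        INSERT INTO payments (card_pan, amount) VALUES ($1, $2);
    -- Caveat: this SQL-level EXECUTE still carries the literals in its own
    -- statement text; a driver-level bind (extended protocol) does not.
    EXECUTE ins_payment('4111111111111111', 10.00);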
Re: [HACKERS] Vitesse DB call for testing
On Fri, 17 Oct 2014 13:12:27 -0400 Tom Lane wrote: > CK Tan writes: > > The bigint sum,avg,count case in the example you tried has some > > optimization. We use int128 to accumulate the bigint instead of numeric in > > pg. Hence the big speed up. Try the same query on int4 for the improvement > > where both pg and vitessedb are using int4 in the execution. > > Well, that's pretty much cheating: it's too hard to disentangle what's > coming from JIT vs what's coming from using a different accumulator > datatype. If we wanted to depend on having int128 available we could > get that speedup with a couple hours' work. > > But what exactly are you "compiling" here? I trust not the actual data > accesses; that seems far too complicated to try to inline. > > regards, tom lane > > I don't have any inside knowledge, but from the presentation given at the recent SFPUG followed by a bit of google-fu I think these papers are relevant: http://www.vldb.org/pvldb/vol4/p539-neumann.pdf http://sites.computer.org/debull/A14mar/p3.pdf -dg -- David Gould 510 282 0869 da...@sonic.net If simplicity worked, the world would be overrun with insects. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] get_actual_variable_range vs idx_scan/idx_tup_fetch
I wrote: > Because it needs up-to-date min/max values in order to avoid being > seriously misled about selectivities of values near the endpoints. > See commit 40608e7f949fb7e4025c0ddd5be01939adc79eec. BTW, on re-reading that code I notice that it will happily seize upon the first suitable index ("first" in OID order), regardless of how many lower-order columns that index has got. This doesn't make any difference I think for get_actual_variable_range's own purposes, because it's only expecting to touch the endmost index page regardless. However, in light of Marko's complaint maybe we should teach it to check all the indexes and prefer the matching one with fewest columns? It would only take a couple extra lines of code, and probably not that many added cycles considering we're going to do an index access of some sort. But I'm not sure if it's worth any extra effort --- I think in his example case, there wasn't any narrower index anyway. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
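A minimal SQL sketch of the situation described above (table and index names are hypothetical; assume the wider index was created first and so has the lower OID). Either index can supply an up-to-date minimum or maximum for the leading column a, but only the narrower one is needed for that purpose:

    CREATE TABLE foo (a int, b int, c int);
    CREATE INDEX foo_a_b_c_idx ON foo (a, b, c);  -- wide, lower OID: chosen today
    CREATE INDEX foo_a_idx     ON foo (a);        -- narrow: the proposed preference

    -- Planning a query with a range predicate near the column's endpoints
    -- triggers the index probe in get_actual_variable_range():
    EXPLAIN SELECT * FROM foo WHERE a > 999000;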
Re: [HACKERS] get_actual_variable_range vs idx_scan/idx_tup_fetch
Bruce Momjian writes: > On Fri, Oct 17, 2014 at 06:15:37PM -0400, Tom Lane wrote: >> Those stats were perfectly valid: what the planner is looking for is >> accurate minimum and maximum values for the index's leading column, and >> that's what it got. You're correct that a narrower index could have given >> the same results with a smaller disk footprint, but the planner got the >> results it needed from the index you provided for it to work with. > Uh, why is the optimizer looking at the index on a,b,c and not just the > stats on column a, for example? I am missing something here. Because it needs up-to-date min/max values in order to avoid being seriously misled about selectivities of values near the endpoints. See commit 40608e7f949fb7e4025c0ddd5be01939adc79eec. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] get_actual_variable_range vs idx_scan/idx_tup_fetch
On Fri, Oct 17, 2014 at 06:15:37PM -0400, Tom Lane wrote: > Marko Tiikkaja writes: > > On 10/17/14, 11:59 PM, Tom Lane wrote: > >> Well, the index might've been getting used in queries too in a way that > >> really only involved the first column. I think you're solving the wrong > >> problem here. The right problem is how to identify indexes that are > >> being used in a way that doesn't exploit all the columns. > > > I'm not sure I agree with that. Even if there was some information the > > planner could have extracted out of the index by using all columns (thus > > appearing "fully used" in these hypothetical new statistics), I still > > would've wanted the index gone. But in this particular case, an index > > on foo(a) alone was not selective enough and it would have been a bad > > choice for practically every query, so I'm not sure what good those > > statistics were in the first place. > > Those stats were perfectly valid: what the planner is looking for is > accurate minimum and maximum values for the index's leading column, and > that's what it got. You're correct that a narrower index could have given > the same results with a smaller disk footprint, but the planner got the > results it needed from the index you provided for it to work with. Uh, why is the optimizer looking at the index on a,b,c and not just the stats on column a, for example? I am missing something here. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Directory/File Access Permissions for COPY and Generic File Access Functions
* Bruce Momjian (br...@momjian.us) wrote: > On Thu, Oct 16, 2014 at 12:01:28PM -0400, Stephen Frost wrote: > > This started out as a request for a non-superuser to be able to review > > the log files without needing access to the server. Now, things can > > certainly be set up on the server to import *all* logs and then grant > > access to a non-superuser, but generally it's "I need to review the log > > from X to Y" and not *all* logs need to be stored or kept in PG. > > Why is this patch showing up before being discussed? You are having to > back into the discusion because of this. For my part, I didn't actually see it as being a questionable use-case from the start.. That was obviously incorrect, though I didn't know that previously. The general idea has been discussed a couple of times before, at least as far back as 2005: http://www.postgresql.org/message-id/430f78e0.9020...@cs.concordia.ca It's also a feature available in other databases (at least MySQL and Oracle, but I'm pretty sure others also). I can also recall chatting with folks about it a couple of times over the years at various conferences. Still, perhaps it would have been better to post about the idea before the patch, but hindsight is often 20/20. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] Optimizer on sort aggregate
On Fri, Oct 17, 2014 at 6:25 PM, Feng Tian wrote: > I feel sorting string as if it is bytea is particularly interesting. I am > aware Peter G's patch and I think it is great, but for this sort agg case, > first, I believe it is still slower than sorting bytea, and second, Peter > G's patch depends on data. A common long prefix will make the patch less > effective, which is probably not so uncommon (for example, URL with a domain > prefix). I don't see any downside of sort bytea, other than lost the > interest ordering. FWIW, that's probably less true than you'd think. Using Robert's test program: pg@hamster:~/code$ ./strxfrm-binary en_US.UTF-8 "http://www.something"; "http://www.something"; -> 131f1f1b221e1a18101f131419120109090909090909090909090909090909010909090909090909090909090909090901053d014201420444 (59 bytes) pg@hamster:~/code$ ./strxfrm-binary en_US.UTF-8 "http://www.another"; "http://www.another"; -> 131f1f1b220c191a1f13101d01090909090909090909090909090901090909090909090909090909090901053d014201420444 (53 bytes) So the first eight bytes of the first string is 0x131F1F1B221E, and the second 0x131F1F1B220C. The last byte is different. That's because the way the Unicode algorithm [1] works, there is often a significantly greater concentration of entropy in the first 8 bytes as compared to raw C strings compared with memcmp() - punctuation characters and so on are not actually described at the primary weight level. If we can get even a single byte to somewhat differentiate each string, we can still win by a very significant amount - just not an enormous amount. The break even point is hard to characterize exactly, but I'm quite optimistic that a large majority of real-world text sorts will see at least some benefit, while a smaller majority will be much, much faster. [1] http://www.unicode.org/reports/tr10/#Notation -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Directory/File Access Permissions for COPY and Generic File Access Functions
On Thu, Oct 16, 2014 at 12:01:28PM -0400, Stephen Frost wrote: > This started out as a request for a non-superuser to be able to review > the log files without needing access to the server. Now, things can > certainly be set up on the server to import *all* logs and then grant > access to a non-superuser, but generally it's "I need to review the log > from X to Y" and not *all* logs need to be stored or kept in PG. Why is this patch showing up before being discussed? You are having to back into the discussion because of this. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
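For context on the use case quoted above: reading a server log file from SQL today requires superuser, for example via the generic file access functions (the file name below is hypothetical). The proposal is about granting a non-superuser a limited, directory-scoped equivalent:

    -- Superuser-only today; reads the first 10000 bytes of a log file,
    -- relative to the data directory.
    SELECT pg_read_file('pg_log/postgresql-2014-10-17.log', 0, 10000);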
Re: [HACKERS] Optimizer on sort aggregate
Hi, David,

Yes, switching the sorting order would lose an interesting order, so if the user dictates order by t, i, the planner needs to resort to its cost model. Estimating the cardinality of a group by is a much bigger topic than this thread.

I feel sorting strings as if they were bytea is particularly interesting. I am aware of Peter G's patch and I think it is great, but for this sort agg case, first, I believe it is still slower than sorting bytea, and second, Peter G's patch depends on the data. A common long prefix will make the patch less effective, which is probably not so uncommon (for example, URLs with a domain prefix). I don't see any downside of sorting as bytea, other than losing the interesting ordering.

Thanks,
Feng

On Fri, Oct 17, 2014 at 4:36 PM, David Rowley wrote: > On Sat, Oct 18, 2014 at 5:10 AM, Feng Tian wrote: > >> Hi, >> >> Consider the following queries. >> >> create table t(i int, j int, k int, t text); >> insert into t select i, i % 100, i % 1000, 'AABBCCDD' || i from >> generate_series(1, 100) i; >> >> ftian=# explain select count(distinct j) from t group by t, i; >>QUERY PLAN >> >> GroupAggregate (cost=158029.84..178029.84 rows=100 width=22) >>-> Sort (cost=158029.84..160529.84 rows=100 width=22) >> Sort Key: t, i >> -> Seq Scan on t (cost=0.00..17352.00 rows=100 width=22) >> (4 rows) >> >> >> The query, >> select count(distinct j) from t group by t, i; >> >> runs for 35 seconds. However, if I change the query to, >> select count(distinct j) from t group by i, t; -- note switching the >> ordering >> select count(distinct j) from t group by decode(t, 'escape'), i; -- >> convert t to bytea >> >> Run times are just about 5 and 6.5 seconds. The reason is clear, compare >> a string with collation is slow, which is well understood by pg hackers. >> However, here, the sorting order is forced by the planner, not user. >> Planner can do the following optimizations, >> >> 1. for the sort we generated for sort agg, planner can switch column >> ordering, put int before string, >> 2. for the sort we generated for sort agg, use bytea compare instead of >> string compare. >> >> They will bring big improvement to this common query. Is this something >> reasonable? >> >> > It seems like it might be worth looking into, but I think it's likely more > complex than sorting the group by order into narrowest column first. > > If the query was: > > select count(distinct j) from t group by t, i order by t, i; > > Then if that was rewritten to make the group by i,t then the planner > would then need to perform an additional sort on t,i to get the correct > order by for the final result, which may or may not be faster, it would > depend on how many groups there were to sort. Though I guess you could > possibly just skip the optimisation if the next node up didn't need any > specific ordering. > > I also wonder if taking into account the column's width is not quite > enough. For example if the 'i' column only had 1 distinct value, then > performing the group by on 'i' first wouldn't help at all. Keep in mind > that the columns could be much closer in width than in your example, e.g > int and bigint. Perhaps you'd need to include some sort of heuristics to > look at the number of distinct values and the width, and form some sort of > weighted estimates about which column to put first. > > Regards > > David Rowley >
Re: [HACKERS] Hide 'Execution time' in EXPLAIN (COSTS OFF)
David Rowley writes: > On Fri, Oct 17, 2014 at 3:34 AM, Tom Lane wrote: >> I don't want to go there. It would be a lot better to expend the effort >> on a better regression testing infrastructure that wouldn't *need* >> bitwise-identical output across platforms. (mysql is ahead of us in that >> department: they have some hacks for selective matching of the output.) > Perhaps we could introduce some sort of wildcard matching in the regression > tests. So that we could stick something like: > Execution time: * ms > Into the expected results, though, probably we'd need to come up with some > wildcard character which is a bit less common than * I was imagining that we might as well allow regexp matching, so you could be as specific as Execution time: \d+\.\d+ ms if you had a mind to. But with or without that, it would be difficult to pick a meta-character that never appears in expected-output files today. What we'd probably want to do (again, I'm stealing ideas from what I remember of the mysql regression tests) is add metasyntax to switch between literal and wildcard/regexp matching. So perhaps an expected file could contain something like -- !!match regexp ... expected output including regexps ... -- !!match literal ... normal expected output here Not sure how we get there without writing our own diff engine though :-(. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hide 'Execution time' in EXPLAIN (COSTS OFF)
On Fri, Oct 17, 2014 at 3:34 AM, Tom Lane wrote: > Andres Freund writes: > > On 2014-10-16 10:06:59 -0400, Tom Lane wrote: > >> No, it wasn't. I'm not convinced either that that patch will get in at > >> all, or that it has to have regression tests of that particular form, > >> or that such a switch would be sufficient to make such tests platform > >> independent. > > > Why should the EXPLAIN ANALYZE output without timing information be less > > consistent for sensibly selected cases than EXPLAIN itself? > > To take just one example, the performance numbers that get printed for > a sort, such as memory consumption, are undoubtedly platform-dependent. > Maybe your idea of "sensibly selected cases" excludes sorting, but > I don't find such an argument terribly convincing. I think if we go > down this road, we are going to end up with an EXPLAIN that has one > hundred parameters turning on and off tiny pieces of the output, none > of which are of any great use for anything except the regression tests. > I don't want to go there. It would be a lot better to expend the effort > on a better regression testing infrastructure that wouldn't *need* > bitwise-identical output across platforms. (mysql is ahead of us in that > department: they have some hacks for selective matching of the output.) > > I saw this, and I was about to ask the same question as Andres I think I now see what you're worried about. Next we'd need a flag to disable external disk sort sizes too... Perhaps we could introduce some sort of wildcard matching in the regression tests. So that we could stick something like: Execution time: * ms Into the expected results, though, probably we'd need to come up with some wildcard character which is a bit less common than * It might be hard to generate a useful diff with this for when a test fails, but maybe it'd be good enough to just run the 2 files through this wildcard matching programme and then just do a diff if it fails. Regards David Rowley
Re: [HACKERS] Materialized views don't show up in information_schema
* Peter Eisentraut (pete...@gmx.net) wrote: > On 10/16/14 9:45 AM, Stephen Frost wrote: > > Alright, coming back to this, I have to ask- how are matviews different > > from views from the SQL standard's perspective? I tried looking through > > the standard to figure it out (and I admit that I probably missed > > something), but the only thing appears to be a statement in the standard > > that (paraphrased) "functions are run with the view is queried" and that > > strikes me as a relatively minor point.. > > To me, the main criterion is that you cannot DROP VIEW a materialized view. That is an entirely correctable situation. We don't require 'DROP UNLOGGED TABLE'. > Generally, if the information schema claims that a > view/table/function/etc. named "foo" exists, then I should be able to > operate on "foo" using the basic operations for a > view/table/function/etc. of that name. I think think DROP VIEW is a > basic operation for a view. Others might disagree. This strikes me as a reason to allow DROP VIEW and perhaps other operations against a matview, not as a reason why matviews aren't views. > More subtly, if we claim that a materialized view is a view, then we > cannot have asynchronously updated materialized views, because then we > have different semantics. This is, at least, a reason I can understand, though I'm not sure I see it as sufficient to say matviews are so different from views that they shouldn't be listed as such. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] Materialized views don't show up in information_schema
* Nicolas Barbier (nicolas.barb...@gmail.com) wrote: > 2014-10-16 Stephen Frost : > > Alright, coming back to this, I have to ask- how are matviews different > > from views from the SQL standard's perspective? > > Matviews that are always up to date when you access them are > semantically exactly the same as normal views. Matviews that can get > out of date, however, are not. And when we have matviews which can be kept up to date..? Thanks, Stephen
Re: [HACKERS] Optimizer on sort aggregate
On Sat, Oct 18, 2014 at 12:35 PM, Tatsuo Ishii wrote: > > The query, > > select count(distinct j) from t group by t, i; > > > > runs for 35 seconds. However, if I change the query to, > > select count(distinct j) from t group by i, t; -- note switching the > > ordering > > select count(distinct j) from t group by decode(t, 'escape'), i; -- > convert > > t to bytea > > > > Run times are just about 5 and 6.5 seconds. The reason is clear, > compare a > > string with collation is slow, which is well understood by pg hackers. > > However, here, the sorting order is forced by the planner, not user. > > Planner can do the following optimizations, > > Interesting. I got following result: > > test=# explain analyze select count(distinct j) from t group by t, i; > QUERY PLAN > > -- > GroupAggregate (cost=137519.84..157519.84 rows=100 width=22) (actual > time=1332.937..2431.238 rows=100 loops=1) >Group Key: t, i >-> Sort (cost=137519.84..140019.84 rows=100 width=22) (actual > time=1332.922..1507.413 rows=100 loops=1) > Sort Key: t, i > Sort Method: external merge Disk: 33232kB > -> Seq Scan on t (cost=0.00..17352.00 rows=100 width=22) > (actual time=0.006..131.406 rows=100 loops=1) > Planning time: 0.031 ms > Execution time: 2484.271 ms > (8 rows) > > Time: 2484.520 ms > > test=# explain analyze select count(distinct j) from t group by i, t; > QUERY PLAN > > -- > GroupAggregate (cost=137519.84..157519.84 rows=100 width=22) (actual > time=602.510..1632.087 rows=100 loops=1) >Group Key: i, t >-> Sort (cost=137519.84..140019.84 rows=100 width=22) (actual > time=602.493..703.274 rows=100 loops=1) > Sort Key: i, t > Sort Method: external sort Disk: 33240kB > -> Seq Scan on t (cost=0.00..17352.00 rows=100 width=22) > (actual time=0.014..129.213 rows=100 loops=1) > Planning time: 0.176 ms > Execution time: 1685.575 ms > (8 rows) > > Time: 1687.641 ms > > Not so big difference here (maybe because I use SSD) but there is > still about 50% difference in execution time. Note that I disable > locale support. > > I think this is more likely your locale settings, as if I do: create table t(i int, j int, k int, t text collate "C"); The GROUP BY t,i runs about 25% faster. I've not looked at it yet, but Peter G's patch here https://commitfest.postgresql.org/action/patch_view?id=1462 will quite likely narrow the performance gap between the 2 queries. Regards David Rowley
Re: [HACKERS] Optimizer on sort aggregate
On Sat, Oct 18, 2014 at 5:10 AM, Feng Tian wrote: > Hi, > > Consider the following queries. > > create table t(i int, j int, k int, t text); > insert into t select i, i % 100, i % 1000, 'AABBCCDD' || i from > generate_series(1, 100) i; > > ftian=# explain select count(distinct j) from t group by t, i; >QUERY PLAN > > GroupAggregate (cost=158029.84..178029.84 rows=100 width=22) >-> Sort (cost=158029.84..160529.84 rows=100 width=22) > Sort Key: t, i > -> Seq Scan on t (cost=0.00..17352.00 rows=100 width=22) > (4 rows) > > > The query, > select count(distinct j) from t group by t, i; > > runs for 35 seconds. However, if I change the query to, > select count(distinct j) from t group by i, t; -- note switching the > ordering > select count(distinct j) from t group by decode(t, 'escape'), i; -- > convert t to bytea > > Run times are just about 5 and 6.5 seconds. The reason is clear, compare > a string with collation is slow, which is well understood by pg hackers. > However, here, the sorting order is forced by the planner, not user. > Planner can do the following optimizations, > > 1. for the sort we generated for sort agg, planner can switch column > ordering, put int before string, > 2. for the sort we generated for sort agg, use bytea compare instead of > string compare. > > They will bring big improvement to this common query. Is this something > reasonable? > > It seems like it might be worth looking into, but I think it's likely more complex than sorting the group by order into narrowest column first. If the query was: select count(distinct j) from t group by t, i order by t, i; Then if that was rewritten to make the group by i,t then the planner would then need to perform an additional sort on t,i to get the correct order by for the final result, which may or may not be faster, it would depend on how many groups there were to sort. Though I guess you could possibly just skip the optimisation if the next node up didn't need any specific ordering. I also wonder if taking into account the column's width is not quite enough. For example if the 'i' column only had 1 distinct value, then performing the group by on 'i' first wouldn't help at all. Keep in mind that the columns could be much closer in width than in your example, e.g int and bigint. Perhaps you'd need to include some sort of heuristics to look at the number of distinct values and the width, and form some sort of weighted estimates about which column to put first. Regards David Rowley
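A sketch of the plan shapes being described (hand-drawn, not actual EXPLAIN output). If the user also requests an ordering on (t, i), swapping the grouping columns forces a second sort over the grouped rows:

    -- Original: one sort satisfies both the grouping and the final ordering.
    SELECT count(DISTINCT j) FROM t GROUP BY t, i ORDER BY t, i;
    --   GroupAggregate  (group key: t, i)
    --     Sort  (sort key: t, i)
    --       Seq Scan on t

    -- Reordered grouping: cheaper sort keys, but an extra sort of one row
    -- per group is needed to restore the requested ordering.
    SELECT count(DISTINCT j) FROM t GROUP BY i, t ORDER BY t, i;
    --   Sort  (sort key: t, i)
    --     GroupAggregate  (group key: i, t)
    --       Sort  (sort key: i, t)
    --         Seq Scan on t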
Re: [HACKERS] Optimizer on sort aggregate
> The query, > select count(distinct j) from t group by t, i; > > runs for 35 seconds. However, if I change the query to, > select count(distinct j) from t group by i, t; -- note switching the > ordering > select count(distinct j) from t group by decode(t, 'escape'), i; -- convert > t to bytea > > Run times are just about 5 and 6.5 seconds. The reason is clear, compare a > string with collation is slow, which is well understood by pg hackers. > However, here, the sorting order is forced by the planner, not user. > Planner can do the following optimizations, Interesting. I got following result: test=# explain analyze select count(distinct j) from t group by t, i; QUERY PLAN -- GroupAggregate (cost=137519.84..157519.84 rows=100 width=22) (actual time=1332.937..2431.238 rows=100 loops=1) Group Key: t, i -> Sort (cost=137519.84..140019.84 rows=100 width=22) (actual time=1332.922..1507.413 rows=100 loops=1) Sort Key: t, i Sort Method: external merge Disk: 33232kB -> Seq Scan on t (cost=0.00..17352.00 rows=100 width=22) (actual time=0.006..131.406 rows=100 loops=1) Planning time: 0.031 ms Execution time: 2484.271 ms (8 rows) Time: 2484.520 ms test=# explain analyze select count(distinct j) from t group by i, t; QUERY PLAN -- GroupAggregate (cost=137519.84..157519.84 rows=100 width=22) (actual time=602.510..1632.087 rows=100 loops=1) Group Key: i, t -> Sort (cost=137519.84..140019.84 rows=100 width=22) (actual time=602.493..703.274 rows=100 loops=1) Sort Key: i, t Sort Method: external sort Disk: 33240kB -> Seq Scan on t (cost=0.00..17352.00 rows=100 width=22) (actual time=0.014..129.213 rows=100 loops=1) Planning time: 0.176 ms Execution time: 1685.575 ms (8 rows) Time: 1687.641 ms Not so big difference here (maybe because I use SSD) but there is still about 50% difference in execution time. Note that I disable locale support. > 1. for the sort we generated for sort agg, planner can switch column > ordering, put int before string, > 2. for the sort we generated for sort agg, use bytea compare instead of > string compare. > > They will bring big improvement to this common query. Is this something > reasonable? Not looking at sort and planner code closely, I guess planner does not recognize the cost difference between "external merge" and "external sort" because cost estimations for sort step in each plan are exactly same (137519.84..140019.84). However I am not sure if we should fix the planner or should fix our sort machinery since it is possible that the sort machinery should not expose such a big difference in the two sort methods. -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] get_actual_variable_range vs idx_scan/idx_tup_fetch
On 10/18/14, 12:15 AM, Tom Lane wrote: Marko Tiikkaja writes: I think there's a big difference between "this index was used to look up stuff for planning" and "this index was used to answer queries quickly". I think that's utter nonsense. Well you probably know a bit more about the optimizer than I do. But I can't see a case where the stats provided by the index would be useful for choosing between two (or more) plans that don't use the index in the actual query. If you're saying that there are such cases, then clearly I don't know something, and my thinking is in the wrong here. .marko -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allow format 0000-0000-0000 in postgresql MAC parser
> It has been registered now > (https://commitfest.postgresql.org/action/patch_view?id=1585). I've got > an updated version of the patch with the documentation fix. > Looks like the patch is all good. I'm marking it as ready for committer. On a side note, I'm noticing from http://en.wikipedia.org/wiki/MAC_address that there are three numbering namespaces for MAC addresses: MAC-48, EUI-48 and EUI-64. The last one is 64 bits long (8 bytes). Currently PostgreSQL's macaddr is only 6 bytes long. Should we change it to 8 bytes (not in this patch, of course)? Regards, -- Ali Akbar
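To make the size limitation concrete (behavior of existing releases; the second cast is expected to fail):

    -- MAC-48/EUI-48 input fits the existing 6-byte type:
    SELECT '08:00:2b:01:02:03'::macaddr;

    -- An 8-byte EUI-64 address has no corresponding type today and is
    -- rejected by the macaddr input function:
    SELECT '08:00:2b:01:02:03:04:05'::macaddr;
    -- ERROR:  invalid input syntax for type macaddr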
Re: [HACKERS] Superuser connect during smart shutdown
On 10/16/14, 11:46 PM, David G Johnston wrote: Tom Lane-2 wrote Something else mentioned was that once you start a smart shutdown you have no good way (other than limited ps output) to see what the shutdown is waiting on. I'd like to have some way to get back into the database to see what's going on. Perhaps we could allow superusers to connect while waiting for shutdown. I think this idea is going to founder on the fact that the postmaster has no way to tell whether an incoming connection is for a superuser. You don't find that out until you've connected to a database and run a transaction (so you can read pg_authid). And by that point, you've already had a catastrophic impact on any attempt to shut things down. This quote from the documentation seems suspect in light of your comment... "While backup mode is active, new connections will still be allowed, but only to superusers (this exception allows a superuser to connect to terminate online backup mode)." http://www.postgresql.org/docs/9.3/interactive/server-shutdown.html check_hba() does if (!check_role(port->user_name, roleid, hba->roles)) continue; And check_role(char **newval, void **extra, GucSource source) does is_superuser = ((Form_pg_authid) GETSTRUCT(roleTup))->rolsuper; ... myextra->roleid = roleid; myextra->is_superuser = is_superuser; *extra = (void *) myextra; So presumably with some changes to how we're calling check_role() we could determine if port->user_name is a superuser. I also like the idea of specifying that a connection should be terminated by a smart shutdown; I agree that'd be useful for monitoring tools and what-not. If folks agree with that I can take a stab at implementing it. Since I tend to be paranoid, I like smart being the default, but seems I'm in the minority there. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Trailing comma support in SELECT statements
Jim Nasby writes: > As I originally posted, if we're going to do this I think we should do it > *EVERYWHERE* commas are used as delimiters, save COPY input and output. Or we > should at least get close to doing it everywhere. I think the only way things > could get more annoying is if we accepted extra commas in SELECT but not in > CREATE TABLE (as one example). > To me completeness is more important than whether we do it or not; that said, > I like the idea (as well as supporting leading extra commas). Yeah, exactly. Personally I'm *not* for this, but if we do it we should do it consistently: every comma-separated list in the SQL syntax should work the same. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
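Concretely, the consistency argument means that if the first statement below were accepted, the others should be accepted too; today every one of them is a syntax error:

    SELECT a, b, FROM t;
    CREATE TABLE t2 (a int, b int,);
    SELECT * FROM t WHERE a IN (1, 2, 3,);
    INSERT INTO t2 (a, b,) VALUES (1, 2,);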
Re: [HACKERS] get_actual_variable_range vs idx_scan/idx_tup_fetch
Marko Tiikkaja writes: > On 10/17/14, 11:59 PM, Tom Lane wrote: >> Well, the index might've been getting used in queries too in a way that >> really only involved the first column. I think you're solving the wrong >> problem here. The right problem is how to identify indexes that are >> being used in a way that doesn't exploit all the columns. > I'm not sure I agree with that. Even if there was some information the > planner could have extracted out of the index by using all columns (thus > appearing "fully used" in these hypothetical new statistics), I still > would've wanted the index gone. But in this particular case, an index > on foo(a) alone was not selective enough and it would have been a bad > choice for practically every query, so I'm not sure what good those > statistics were in the first place. Those stats were perfectly valid: what the planner is looking for is accurate minimum and maximum values for the index's leading column, and that's what it got. You're correct that a narrower index could have given the same results with a smaller disk footprint, but the planner got the results it needed from the index you provided for it to work with. > I think there's a big difference between "this index was used to look up > stuff for planning" and "this index was used to answer queries quickly". I think that's utter nonsense. Even if there were any validity to the position, it wouldn't be enough to justify doubling the stats footprint in order to track system-driven accesses separately from query-driven ones. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Trailing comma support in SELECT statements
On 10/16/14, 11:48 PM, David Johnston wrote: We might as well allow a final trailing (or initial leading) comma on a values list at the same time: do you know, so this feature is a proprietary and it is not based on ANSI/SQL? Any user, that use this feature and will to port to other database will hate it. I've got no complaint if "at the same time" means that neither behavior is ever implemented... As I originally posted, if we're going to do this I think we should do it *EVERYWHERE* commas are used as delimiters, save COPY input and output. Or we should at least get close to doing it everywhere. I think the only way things could get more annoying is if we accepted extra commas in SELECT but not in CREATE TABLE (as one example). To me completeness is more important than whether we do it or not; that said, I like the idea (as well as supporting leading extra commas). -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] get_actual_variable_range vs idx_scan/idx_tup_fetch
On 10/17/14, 11:59 PM, Tom Lane wrote: Marko Tiikkaja writes: On 10/17/14, 11:47 PM, Tom Lane wrote: Marko Tiikkaja writes: So what I'd like to have is a way to be able to distinguish between indexes being used to answer queries, and ones being only used for stats lookups during planning. Why? Used is used. Because I don't need a 30GB index on foo(a,b,c) to look up statistics. If I ever have a problem, I can replace it with a 5GB one on foo(a). Well, the index might've been getting used in queries too in a way that really only involved the first column. I think you're solving the wrong problem here. The right problem is how to identify indexes that are being used in a way that doesn't exploit all the columns. I'm not sure I agree with that. Even if there was some information the planner could have extracted out of the index by using all columns (thus appearing "fully used" in these hypothetical new statistics), I still would've wanted the index gone. But in this particular case, an index on foo(a) alone was not selective enough and it would have been a bad choice for practically every query, so I'm not sure what good those statistics were in the first place. I think there's a big difference between "this index was used to look up stuff for planning" and "this index was used to answer queries quickly". In my mind the first one belongs to the category "this index was considered", and the latter is "this index was actually useful". But maybe I'm not seeing the big picture here. .marko -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Issue with mkdtemp() in port.h
Caleb Welton writes: > A little while back some users started complaining that the contrib module > I develop (MADlib) was failing to build with the following error: > /usr/include/postgresql/9.2/server/port.h:480:32: error: declaration of > 'char* mkdtemp(char*)' has a different exception specifier > /usr/include/stdlib.h:663:14: error: from previous declaration 'char* > mkdtemp(char*) throw ()' > After some research I've tracked this down to the following commit from ~4 > months ago: > https://github.com/postgres/postgres/commit/a919937f112eb2f548d5f9bd1b3a7298375e6380 Hm. Looks like the extern that added should have been protected by #ifndef HAVE_MKDTEMP similarly to the other cases where we conditionally provide a substitute function. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] get_actual_variable_range vs idx_scan/idx_tup_fetch
On 10/17/14, 4:49 PM, Marko Tiikkaja wrote: On 10/17/14, 11:47 PM, Tom Lane wrote: Marko Tiikkaja writes: This week we had one of the most annoying problems I've ever encountered with postgres. We had a big index on multiple columns, say, foo(a, b, c). According to pg_stat_all_indexes the index was being used *all the time*. However, after looking into our queries more closely, it turns out that it was only being used to look up statistics for the foo.a column to estimate merge scan viability during planning. But this took hours for two people to track down. So what I'd like to have is a way to be able to distinguish between indexes being used to answer queries, and ones being only used for stats lookups during planning. Why? Used is used. Because I don't need a 30GB index on foo(a,b,c) to look up statistics. If I ever have a problem, I can replace it with a 5GB one on foo(a). That problem can exist with user queries too. Perhaps it would be better to find a way to count scans that didn't use all the fields in the index. I do also see value in differentiating planning use from real query processing; not doing that can certainly cause confusion. What I don't know is if the added stats bloat is worth it. If we do go down that road, I think it'd be better to add an indicator to EState. Aside from allowing stats for all planning access, it should make it less likely that someone adds a new access path and forgets to mark it as internal (especially if the added field defaults to an invalid value). -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] get_actual_variable_range vs idx_scan/idx_tup_fetch
Marko Tiikkaja writes: > On 10/17/14, 11:47 PM, Tom Lane wrote: >> Marko Tiikkaja writes: >>> So what I'd like to have is a way to be able to distinguish between >>> indexes being used to answer queries, and ones being only used for stats >>> lookups during planning. >> Why? Used is used. > Because I don't need a 30GB index on foo(a,b,c) to look up statistics. > If I ever have a problem, I can replace it with a 5GB one on foo(a). Well, the index might've been getting used in queries too in a way that really only involved the first column. I think you're solving the wrong problem here. The right problem is how to identify indexes that are being used in a way that doesn't exploit all the columns. Which is not necessarily wrong in itself --- what you'd want is to figure out when the last column(s) are *never* used. The existing stats aren't terribly helpful for that, I agree. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] get_actual_variable_range vs idx_scan/idx_tup_fetch
On 10/17/14, 11:47 PM, Tom Lane wrote: Marko Tiikkaja writes: This week we had one of the most annoying problems I've ever encountered with postgres. We had a big index on multiple columns, say, foo(a, b, c). According to pg_stat_all_indexes the index was being used *all the time*. However, after looking into our queries more closely, it turns out that it was only being used to look up statistics for the foo.a column to estimate merge scan viability during planning. But this took hours for two people to track down. So what I'd like to have is a way to be able to distinguish between indexes being used to answer queries, and ones being only used for stats lookups during planning. Why? Used is used. Because I don't need a 30GB index on foo(a,b,c) to look up statistics. If I ever have a problem, I can replace it with a 5GB one on foo(a). .marko -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] get_actual_variable_range vs idx_scan/idx_tup_fetch
Marko Tiikkaja writes: > This week we had one of the most annoying problems I've ever encountered > with postgres. We had a big index on multiple columns, say, foo(a, b, > c). According to pg_stat_all_indexes the index was being used *all the > time*. However, after looking into our queries more closely, it turns > out that it was only being used to look up statistics for the foo.a > column to estimate merge scan viability during planning. But this took > hours for two people to track down. > So what I'd like to have is a way to be able to distinguish between > indexes being used to answer queries, and ones being only used for stats > lookups during planning. Why? Used is used. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] get_actual_variable_range vs idx_scan/idx_tup_fetch
Hi, This week we had one of the most annoying problems I've ever encountered with postgres. We had a big index on multiple columns, say, foo(a, b, c). According to pg_stat_all_indexes the index was being used *all the time*. However, after looking into our queries more closely, it turns out that it was only being used to look up statistics for the foo.a column to estimate merge scan viability during planning. But this took hours for two people to track down. So what I'd like to have is a way to be able to distinguish between indexes being used to answer queries, and ones being only used for stats lookups during planning. Perhaps the easiest way would be adding a new column or two into pg_stat_all_indexes, which we would increment in get_actual_variable_range() when fetching data. Any thoughts? .marko -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
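For reference, the view that made the index look busy; a query along these lines (the index name is hypothetical) reports only raw scan counts, with no way to tell planning-time probes apart from executor use:

    SELECT schemaname, relname, indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
    FROM pg_stat_all_indexes
    WHERE indexrelname = 'foo_a_b_c_idx';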
Re: [HACKERS] json function volatility
Alvaro Herrera writes: > Merlin Moncure wrote: >> On Fri, Oct 17, 2014 at 3:03 PM, Andrew Dunstan wrote: >>> Following up something Pavel wrote, I notice that json_agg() and >>> json_object_agg() are both marked as immutable, even though they invoke IO >>> functions, while json_object is marked stable, even though it does not, and >>> can probably be marked as immutable. Mea maxima culpa. >>> I'm not sure what we should do about these things now. Is it a tragedy if we >>> let these escape into the 9.4 release that way? >> Is it too late to change them? > One thing to consider is the catversion bump, which we don't want this > late in the cycle. Still, you could change the catalogs but not the > version, and advise those with the older definitions to tweak the > catalogs by hand if they need it. I think we did this once. I'm fairly sure that the system doesn't actually pay attention to the volatility marking of aggregates, so there's no huge harm done by the incorrect markings of those. The incorrect marking of json_object might be a small optimization block. +1 for changing these in 9.4 without bumping catversion. I don't think we need to give advice for manual corrections, either: the risk of doing that wrong probably outweighs the value of fixing it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views don't show up in information_schema
On 10/17/14, 4:31 AM, David G Johnston wrote: Since the standard doesn't distinguish between read and write aspects of the object types there isn't a safe way to add matviews to the information schema that doesn't violate the intent of the provided view. If the application/users wants to support/use PostgreSQL specific features it/they have to be ready and able to use the catalog. +1. If we add matviews to information_schema while they're not part of that standard then we're going to regret it at some point. Perhaps the answer to this problem is to restart the old pg_newsysviews project. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] json function volatility
Merlin Moncure wrote: > On Fri, Oct 17, 2014 at 3:03 PM, Andrew Dunstan wrote: > > Following up something Pavel wrote, I notice that json_agg() and > > json_object_agg() are both marked as immutable, even though they invoke IO > > functions, while json_object is marked stable, even though it does not, and > > can probably be marked as immutable. Mea maxima culpa. > > > > I'm not sure what we should do about these things now. Is it a tragedy if we > > let these escape into the 9.4 release that way? > > Is it too late to change them? One thing to consider is the catversion bump, which we don't want this late in the cycle. Still, you could change the catalogs but not the version, and advise those with the older definitions to tweak the catalogs by hand if they need it. I think we did this once. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] json function volatility
On Fri, Oct 17, 2014 at 1:44 PM, Merlin Moncure wrote: > Is it too late to change them? Either way, it seems fairly > implausible someone would come up with a case to stick json_agg(), or > any aggregate function really, inside of an index. So it's not exactly > the crime of the century. Indexes reject aggregates outright as acceptable expressions for indexing. That's not the only issue, of course. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
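A quick illustration of that rejection (error text approximate):

    CREATE TABLE j (x int);
    CREATE INDEX ON j ((json_agg(x)));
    -- ERROR:  aggregate functions are not allowed in index expressions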
[HACKERS] Optimizer on sort aggregate
Hi,

Consider the following queries.

create table t(i int, j int, k int, t text);
insert into t select i, i % 100, i % 1000, 'AABBCCDD' || i from generate_series(1, 100) i;

ftian=# explain select count(distinct j) from t group by t, i;
                              QUERY PLAN
------------------------------------------------------------------------
 GroupAggregate  (cost=158029.84..178029.84 rows=100 width=22)
   ->  Sort  (cost=158029.84..160529.84 rows=100 width=22)
         Sort Key: t, i
         ->  Seq Scan on t  (cost=0.00..17352.00 rows=100 width=22)
(4 rows)

The query,

select count(distinct j) from t group by t, i;

runs for 35 seconds. However, if I change the query to,

select count(distinct j) from t group by i, t; -- note switching the ordering
select count(distinct j) from t group by decode(t, 'escape'), i; -- convert t to bytea

the run times are just about 5 and 6.5 seconds. The reason is clear: comparing strings under a collation is slow, which is well understood by pg hackers. However, here the sorting order is forced by the planner, not the user. The planner could do the following optimizations:

1. for the sort generated for the sort agg, the planner can switch the column ordering, putting the int before the string;
2. for the sort generated for the sort agg, use a bytea comparison instead of a string comparison.

They would bring a big improvement to this common query. Is this something reasonable?

Thanks,
[HACKERS] Issue with mkdtemp() in port.h
A little while back some users started complaining that the contrib module I develop (MADlib) was failing to build with the following error:

--
/usr/include/postgresql/9.2/server/port.h:480:32: error: declaration of 'char* mkdtemp(char*)' has a different exception specifier
/usr/include/stdlib.h:663:14: error: from previous declaration 'char* mkdtemp(char*) throw ()'
--

After some research I've tracked this down to the following commit from ~4 months ago:

--
https://github.com/postgres/postgres/commit/a919937f112eb2f548d5f9bd1b3a7298375e6380
--

Which added a definition of mkdtemp into port.h that conflicts with the definition in the system header files. The following is a simple program that demonstrates the issue:

--
bash$ cat /tmp/foo.cpp
#include "postgres.h"

int main()
{
    return 0;
}

bash$ gcc -o foo foo.cpp -I`pg_config --includedir-server` -pedantic
In file included from /usr/pgsql-9.2/include/server/c.h:860,
                 from /usr/pgsql-9.2/include/server/postgres.h:47,
                 from foo.cpp:1:
/usr/pgsql-9.2/include/server/port.h:479: error: declaration of ‘char* mkdtemp(char*)’ throws different exceptions
/usr/include/stdlib.h:663: error: from previous declaration ‘char* mkdtemp(char*) throw ()’
--

Reproducible on ubuntu 14.04, centos6, and likely others.

Regards,
Caleb
Re: [HACKERS] json function volatility
On Fri, Oct 17, 2014 at 3:03 PM, Andrew Dunstan wrote: > Following up something Pavel wrote, I notice that json_agg() and > json_object_agg() are both marked as immutable, even though they invoke IO > functions, while json_object is marked stable, even though it does not, and > can probably be marked as immutable. Mea maxima culpa. > > I'm not sure what we should do about these things now. Is it a tragedy if we > let these escape into the 9.4 release that way? Is it too late to change them? Either way, it seems fairly implausible someone would come up with a case to stick json_agg(), or any aggregate function really, inside of an index. So it's not exactly the crime of the century. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views don't show up in information_schema
On 10/16/14 9:45 AM, Stephen Frost wrote:
> Alright, coming back to this, I have to ask- how are matviews different
> from views from the SQL standard's perspective?  I tried looking through
> the standard to figure it out (and I admit that I probably missed
> something), but the only thing appears to be a statement in the standard
> that (paraphrased) "functions are run when the view is queried" and that
> strikes me as a relatively minor point..

To me, the main criterion is that you cannot DROP VIEW a materialized view. Generally, if the information schema claims that a view/table/function/etc. named "foo" exists, then I should be able to operate on "foo" using the basic operations for a view/table/function/etc. of that name. I think DROP VIEW is a basic operation for a view. Others might disagree.

More subtly, if we claim that a materialized view is a view, then we cannot have asynchronously updated materialized views, because then we have different semantics.

All of this is a judgement call in corner cases. But I don't think this is a corner case at all. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
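A minimal illustration of the distinction being drawn (my own example, not from the thread): a materialized view cannot be operated on with the plain view commands.

-- names are made up for illustration
CREATE MATERIALIZED VIEW mv AS SELECT 1 AS x;
DROP VIEW mv;                -- fails: "mv" is not a view
DROP MATERIALIZED VIEW mv;   -- succeeds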
[HACKERS] json function volatility
Following up something Pavel wrote, I notice that json_agg() and json_object_agg() are both marked as immutable, even though they invoke IO functions, while json_object is marked stable, even though it does not, and can probably be marked as immutable. Mea maxima culpa. I'm not sure what we should do about these things now. Is it a tragedy if we let these escape into the 9.4 release that way? cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgcrypto: PGP signatures
On Mon, Sep 15, 2014 at 4:37 AM, Marko Tiikkaja wrote: > > I've changed the patch back to ignore signatures when not using the > decrypt_verify() functions in the attached. Hi Marko, This patch needs a rebase now that the armor header patch has been committed. Thanks, Jeff
Re: [HACKERS] 2014-10 CommitFest
Kevin Grittner wrote:
> Unless someone else wants to pick it up, I'll manage this one.
> Since there was no previous warning, I'll allow a grace day for the
> cut-off on submissions for this CF; if it isn't registered in the
> web application 24 hours after this email, I will move it to the
> next CF.  So look for those patches that "fell through the cracks"
> without getting registered, and post the ones you've got.

[ rings bell ]

So the 2014-10 CF is now In Progress and the 2014-12 CF is Open.

There are 86 patches listed at this time:
  Needs Review: 69
  Waiting on Author: 5
  Ready for Committer: 10
  Committed: 2

We've got four weeks if we're going to return or commit all of these by the scheduled end of the CF. Since many of these have already had significant review, my hope is that we can "hit the ground running" and wrap things up in a timely fashion so we have a break before the start of the *next* CF.

Please remember, if you are submitting patches, you should be reviewing patches, too. Otherwise development will hit a logjam.

-- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] documentation update for doc/src/sgml/func.sgml
On 09/14/2014 06:32 PM, Peter Eisentraut wrote: On 9/12/14 3:13 PM, Andreas 'ads' Scherbaum wrote: Of course a general rule how to link to WP would be nice ... I think Wikipedia links should be avoided altogether. We can assume that readers are technically proficient to look up general technical concepts on their own using a reference system of their choice. In cases where a link is warranted, it is better to construct a proper bibliographic citation to the primary source material, such as an IEEE standard or an academic paper, in a way that will stand the test of time. That's a clear statement, and makes sense. Should be written down somewhere, so it can be found again. Independent of that, it is actually not correct that "we use the IEEE's rules", because "we" don't use any rules, that is up to the operating system/platform. While most platforms indeed do use the IEEE floating-point standard more less, some don't. Section 8.1.3 tries to point that out. New version attached, WP link removed, wording changed. Regards, -- Andreas 'ads' Scherbaum German PostgreSQL User Group European PostgreSQL User Group - Board of Directors Volunteer Regional Contact, Germany - PostgreSQL Project diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 13c71af..d54cf58 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -924,6 +924,25 @@ + +For functions like round(), log() and sqrt() which run against +either fixed-precision (NUMERIC) or floating-point numbers (e.g. REAL), +note that the results of these operations will differ according +to the input type due to rounding. This is most observable with +round(), which can end up rounding down as well as up for +any #.5 value. PostgreSQL's handling +of floating-point values depends on the operating system, which +may or may not follow the IEEE floating-point standard. + + + +The bitwise operators work only on integral data types, whereas +the others are available for all numeric data types. The bitwise +operators are also available for the bit string types +bit and bit varying, as +shown in . + + shows functions for generating random numbers. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
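A small example of the rounding behavior the added paragraph describes (my own illustration, assuming a platform whose C library uses IEEE round-to-nearest-even for floating point):

-- round(numeric) rounds halves away from zero; round(float8) follows the platform,
-- which on IEEE systems typically rounds halves to the nearest even value.
SELECT round(2.5::numeric) AS numeric_round,   -- 3
       round(2.5::float8)  AS float8_round;    -- 2 on such platforms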
Re: [HACKERS] WIP: dynahash replacement for buffer table
On 2014-10-16 20:22:24 -0400, Robert Haas wrote: > On Thu, Oct 16, 2014 at 6:53 PM, Andres Freund wrote: > > When using shared_buffers = 96GB there's a performance benefit, but not > > huge: > > master (f630b0dd5ea2de52972d456f5978a012436115e): 153621.8 > > master + LW_SHARED + lockless StrategyGetBuffer(): 477118.4 > > master + LW_SHARED + lockless StrategyGetBuffer() + chash: 496788.6 > > master + LW_SHARED + lockless StrategyGetBuffer() + chash-nomb: 499562.7 > > > > But with shared_buffers = 16GB: > > master (f630b0dd5ea2de52972d456f5978a012436115e): 177302.9 > > master + LW_SHARED + lockless StrategyGetBuffer(): 206172.4 > > master + LW_SHARED + lockless StrategyGetBuffer() + chash: 413344.1 > > master + LW_SHARED + lockless StrategyGetBuffer() + chash-nomb: 426405.8 > > Very interesting. This doesn't show that chash is the right solution, > but it definitely shows that doing nothing is the wrong solution. Absolutely. The thing worrying me most (but not all that much on an absolute scale) about chash is that lots of the solutions to memory management it builds are specific to it... And generalizing afterwards will be hard because we'll have to prove that that general solution is as performant as the special case code... > It shows that, even with the recent bump to 128 lock manager > partitions, and LW_SHARED on top of that, workloads that actually > update the buffer mapping table still produce a lot of contention > there. FWIW, I couldn't see much of a benefit without LW_SHARED even though I a *few* things. The bottleneck simply is completely elsewhere. > This hasn't been obvious to me from profiling, but the numbers > above make it pretty clear. So I'm not super surprised about that. There very well might be cases where it's a bad bottleneck before, but I've not seen them. > It also seems to suggest that trying to get rid of the memory barriers > isn't a very useful optimization project. I'm not yet convinced of that. Yes, it's not showing up hugely in the numbers here, but that's simply because the workload is again completely dominated by the kernel copying data for the read()s, GetSnapshotData(), the buffer mapping cache misses and a few other familiar friends. > We might get a couple of > percent out of it, but it's pretty small potatoes, so unless it can be > done more easily than I suspect, it's probably not worth bothering > with. I still think that moving the barrier to the reading side would be simple (implementation wise) and correct for x86. If you think about it, otherwise our spinlock implementation for x86 would be broken. As a unlock is just a compiler barrier the full barrier on acquire better be a full synchronization point. Am I missing something? > An approach I think might have more promise is to have bufmgr.c > call the CHash stuff directly instead of going through buf_table.c. I don't see all that much point in buf_table.c currently - on the other hand it has lead to it replacing the buffer mapping being slightly easier... Maybe it should just go to a header... > Right now, for example, BufferAlloc() creates and initializes a > BufferTag and passes a pointer to that buffer tag to BufTableLookup, > which copies it into a BufferLookupEnt. But it would be just as easy > for BufferAlloc() to put the BufferLookupEnt on its own stack, and > then you wouldn't need to copy the data an extra time. Now a 20-byte > copy isn't a lot, but it's completely unnecessary and looks easy to > get rid of. Worthwile to try. 
> > I had to play with setting max_connections+1 sometimes to get halfway > > comparable results for master - unaligned data otherwise causes wierd > > results otherwise. Without doing that the performance gap between master > > 96/16G was even bigger. We really need to fix that... > > > > This is pretty awesome. > > Thanks. I wasn't quite sure how to test this or where the workloads > that it would benefit would be found, so I appreciate you putting time > into it. And I'm really glad to hear that it delivers good results. I wasn't sure either ;). Lucky that it played out so impressively. After the first results I was nearly ready to send out a 'Meh.' type of message ;) Btw, CHashTableGetNode isn't exactly cheap. It shows up noticeably in profiles. It results in several non-pipelineable instructions in a already penalized section of the code... Replacing arena_stride by a constant provided measurable improvements (no imul is required anymore, instead you get shr;lea or so). I'm not sure how to deal with that. If it shows up even on my quite new laptop, it'll be more so noticeable on older x86 platforms. > I think it would be useful to plumb the chash statistics into the > stats collector or at least a debugging dump of some kind for testing. I'm not sure it's solid enough at this point to be in the stats collector. But I surely would like to access it somehow. I'm e.g. absolutely not s
Re: [HACKERS] Vitesse DB call for testing
Happy to contribute to that decision :-) On Fri, Oct 17, 2014 at 11:35 AM, Tom Lane wrote: > Andres Freund writes: >> On 2014-10-17 13:12:27 -0400, Tom Lane wrote: >>> Well, that's pretty much cheating: it's too hard to disentangle what's >>> coming from JIT vs what's coming from using a different accumulator >>> datatype. If we wanted to depend on having int128 available we could >>> get that speedup with a couple hours' work. > >> I think doing that when configure detects int128 would make a great deal >> of sense. > > Yeah, I was wondering about that myself: use int128 if available, > else fall back on existing code path. > > regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash index creation warning
On Fri, Oct 17, 2014 at 12:56:52PM -0400, Tom Lane wrote: > David G Johnston writes: > > The question is whether we explain the implications of not being WAL-logged > > in an error message or simply state the fact and let the documentation > > explain the hazards - basically just output: > > "hash indexes are not WAL-logged and their use is discouraged" > > +1. The warning message is not the place to be trying to explain all the > details. OK, updated patch attached. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml new file mode 100644 index e469b17..43df32f *** a/doc/src/sgml/ref/create_index.sgml --- b/doc/src/sgml/ref/create_index.sgml *** Indexes: *** 474,480 Also, changes to hash indexes are not replicated over streaming or file-based replication after the initial base backup, so they give wrong answers to queries that subsequently use them. ! For these reasons, hash index use is presently discouraged. --- 474,481 Also, changes to hash indexes are not replicated over streaming or file-based replication after the initial base backup, so they give wrong answers to queries that subsequently use them. ! Hash indexes are also not properly restored during point-in-time ! recovery. For these reasons, hash index use is presently discouraged. diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c new file mode 100644 index 8a1cb4b..3c1e90e *** a/src/backend/commands/indexcmds.c --- b/src/backend/commands/indexcmds.c *** DefineIndex(Oid relationId, *** 491,497 if (strcmp(accessMethodName, "hash") == 0) ereport(WARNING, ! (errmsg("hash indexes are not WAL-logged and thus are not crash-safe and cannot be used on standby servers"))); if (stmt->unique && !accessMethodForm->amcanunique) ereport(ERROR, --- 491,497 if (strcmp(accessMethodName, "hash") == 0) ereport(WARNING, ! (errmsg("hash indexes are not WAL-logged and their use is discouraged"))); if (stmt->unique && !accessMethodForm->amcanunique) ereport(ERROR, diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out new file mode 100644 index a2bef7a..8326e94 *** a/src/test/regress/expected/create_index.out --- b/src/test/regress/expected/create_index.out *** DROP TABLE array_gin_test; *** 2238,2250 -- HASH -- CREATE INDEX hash_i4_index ON hash_i4_heap USING hash (random int4_ops); ! WARNING: hash indexes are not WAL-logged and thus are not crash-safe and cannot be used on standby servers CREATE INDEX hash_name_index ON hash_name_heap USING hash (random name_ops); ! WARNING: hash indexes are not WAL-logged and thus are not crash-safe and cannot be used on standby servers CREATE INDEX hash_txt_index ON hash_txt_heap USING hash (random text_ops); ! WARNING: hash indexes are not WAL-logged and thus are not crash-safe and cannot be used on standby servers CREATE INDEX hash_f8_index ON hash_f8_heap USING hash (random float8_ops); ! WARNING: hash indexes are not WAL-logged and thus are not crash-safe and cannot be used on standby servers -- CREATE INDEX hash_ovfl_index ON hash_ovfl_heap USING hash (x int4_ops); -- -- Test functional index --- 2238,2250 -- HASH -- CREATE INDEX hash_i4_index ON hash_i4_heap USING hash (random int4_ops); ! WARNING: hash indexes are not WAL-logged and their use is discouraged CREATE INDEX hash_name_index ON hash_name_heap USING hash (random name_ops); ! 
WARNING: hash indexes are not WAL-logged and their use is discouraged CREATE INDEX hash_txt_index ON hash_txt_heap USING hash (random text_ops); ! WARNING: hash indexes are not WAL-logged and their use is discouraged CREATE INDEX hash_f8_index ON hash_f8_heap USING hash (random float8_ops); ! WARNING: hash indexes are not WAL-logged and their use is discouraged -- CREATE INDEX hash_ovfl_index ON hash_ovfl_heap USING hash (x int4_ops); -- -- Test functional index diff --git a/src/test/regress/expected/enum.out b/src/test/regress/expected/enum.out new file mode 100644 index fa23b52..1a61a5b *** a/src/test/regress/expected/enum.out --- b/src/test/regress/expected/enum.out *** DROP INDEX enumtest_btree; *** 383,389 -- Hash index / opclass with the = operator -- CREATE INDEX enumtest_hash ON enumtest USING hash (col); ! WARNING: hash indexes are not WAL-logged and thus are not crash-safe and cannot be used on standby servers SELECT * FROM enumtest WHERE col = 'orange'; col --- 383,389 -- Hash index / opclass with the = operator -- CREATE INDEX enumtest_hash ON enumtest USING hash (col); ! WARNING: hash indexes are not WAL-logged and their use is discourag
Re: [HACKERS] Vitesse DB call for testing
Andres Freund writes: > On 2014-10-17 13:12:27 -0400, Tom Lane wrote: >> Well, that's pretty much cheating: it's too hard to disentangle what's >> coming from JIT vs what's coming from using a different accumulator >> datatype. If we wanted to depend on having int128 available we could >> get that speedup with a couple hours' work. > I think doing that when configure detects int128 would make a great deal > of sense. Yeah, I was wondering about that myself: use int128 if available, else fall back on existing code path. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Vitesse DB call for testing
On Fri, Oct 17, 2014 at 1:21 PM, Peter Geoghegan wrote:
> On Fri, Oct 17, 2014 at 11:08 AM, Feng Tian wrote:
>> I agree that using int128 in stock postgres will speed up things too.
>> On the other hand, that is only one part of the equation. For example, if
>> you look at TPCH Q1, the int128 "cheating" does not kick in at all, but we
>> are 8x faster.
>
> I'm curious about how the numbers look when stock Postgres is built
> with the same page size as your fork. You didn't mention whether or
> not your Postgres numbers came from a standard build.

I downloaded the 8kb variant.

vitesse (median of 3):
postgres=# select count(*), sum(i*i), avg(i) from t;
 count │ sum    │ avg
─┼┼─
 100 │ 338350 │ 50.5000
(1 row)
Time: 39.197 ms

stock (median of 3):
postgres=# select count(*), sum(i*i), avg(i) from t;
 count │ sum    │ avg
─┼┼─
 100 │ 338350 │ 50.5000
(1 row)
Time: 667.362 ms

(stock int4 ops)
postgres=# select sum(1::int4) from t;
 sum
─
 100
(1 row)
Time: 75.265 ms

What I'm wondering is how complex the hooks are that tie the technology in. Unless a bsd licensed patch materializes, the conversation (beyond the initial wow! factor) should probably focus on possible integration points and/or implementation of the technology into core in a general way.

merlin

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Vitesse DB call for testing
On 2014-10-17 13:12:27 -0400, Tom Lane wrote: > Well, that's pretty much cheating: it's too hard to disentangle what's > coming from JIT vs what's coming from using a different accumulator > datatype. If we wanted to depend on having int128 available we could > get that speedup with a couple hours' work. I think doing that when configure detects int128 would make a great deal of sense. It's not like we'd save a great deal of complicated code by removing the existing accumulator... We'd still have to return a numeric, but that's likely lost in the noise cost wise. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
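For context, a quick way to see why the stock code path is comparatively slow here (my own illustration, not from the thread): the bigint aggregates accumulate in, and return, numeric, because the running sum can overflow int8. A configure-detected int128 accumulator would keep the arithmetic in native integers and only convert to numeric at the end.

SELECT pg_typeof(sum(x)) AS sum_type,
       pg_typeof(avg(x)) AS avg_type
FROM (VALUES (1::bigint)) AS v(x);
-- both report "numeric" today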
Re: [HACKERS] Vitesse DB call for testing
On Fri, Oct 17, 2014 at 11:08 AM, Feng Tian wrote: > I agree using that using int128 in stock postgres will speed up things too. > On the other hand, that is only one part of the equation. For example, if > you look at TPCH Q1, the int128 "cheating" does not kick in at all, but we > are 8x faster. I'm curious about how the numbers look when stock Postgres is built with the same page size as your fork. You didn't mention whether or not your Postgres numbers came from a standard build. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
On 2014-10-17 17:14:16 +0530, Amit Kapila wrote:
> On Tue, Oct 14, 2014 at 11:34 AM, Amit Kapila wrote:
> >
> > I am not sure why we are seeing difference even though running
> > on same m/c with same configuration.
>
> I have tried many times, but I could not get the numbers you have
> posted above with HEAD, however now trying with the latest version
> [1] posted by you, everything seems to be fine at this workload.
> The data at higher client count is as below:

I'll try to reproduce it next week. But I don't think it matters all that much. Personally so far the performance numbers don't seem to indicate much reason to wait any further. We sure can improve further, but I don't see much reason to wait because of that.

> HEAD – commit 494affb
> Shared_buffers=8GB; Scale Factor = 3000
>
> Client Count/No. Of Runs (tps)    64        128
> Run-1                             271799    24
> Run-2                             274341    245207
> Run-3                             275019    252258
>
> HEAD – commit 494affb + wait free lw_shared_v2
> Shared_buffers=8GB; Scale Factor = 3000
>
> Client Count/No. Of Runs (tps)    64        128
> Run-1                             286209    274922
> Run-2                             289101    274495
> Run-3                             289639    273633

So here the results with LW_SHARED were consistently better, right? You saw performance degradations here earlier?

> So I am planning to proceed further with the review/test of your
> latest patch.
> According to me, below things are left from my side:
> a. do some basic tpc-b tests with patch
> b. re-review latest version posted by you

Cool!

> I know that you have posted optimization into StrategyGetBuffer() in
> this thread, however I feel we can evaluate it separately unless you
> are of opinion that both the patches should go together.
>
> [1] http://www.postgresql.org/message-id/20141010111027.gc6...@alap3.anarazel.de

No, I don't think they should go together - I wrote that patch because it was the bottleneck in the possibly regressing test and I wanted to see the full effect. Although I do think we should apply it ;)

Greetings,

Andres Freund

-- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Vitesse DB call for testing
CK, Before we go any further on this, how is Vitesse currently licensed? last time we talked it was still proprietary. If it's not being open-sourced, we likely need to take discussion off this list. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Fwd: [HACKERS] Vitesse DB call for testing
Hi, Tom,

Sorry for double post to you.

Feng

-- Forwarded message --
From: Feng Tian
Date: Fri, Oct 17, 2014 at 10:29 AM
Subject: Re: [HACKERS] Vitesse DB call for testing
To: Tom Lane

Hi, Tom,

I agree that using int128 in stock postgres will speed up things too. On the other hand, that is only one part of the equation. For example, if you look at TPCH Q1, the int128 "cheating" does not kick in at all, but we are 8x faster.

I am not sure what you mean by "actual data access". Data is still in stock postgres format on disk. We indeed jit-ed all data field access (deform tuple). To put things in perspective, I just timed select count(*) and select count(l_orderkey) from tpch1.lineitem; our code is bottlenecked by memory bandwidth and the difference is pretty much invisible.

Thanks,
Feng

ftian=# set vdb_jit = 0;
SET
Time: 0.155 ms
ftian=# select count(*) from tpch1.lineitem;
  count
-
 6001215
(1 row)
Time: 688.658 ms
ftian=# select count(*) from tpch1.lineitem;
  count
-
 6001215
(1 row)
Time: 690.753 ms
ftian=# select count(l_orderkey) from tpch1.lineitem;
  count
-
 6001215
(1 row)
Time: 819.452 ms
ftian=# set vdb_jit = 1;
SET
Time: 0.167 ms
ftian=# select count(*) from tpch1.lineitem;
  count
-
 6001215
(1 row)
Time: 203.543 ms
ftian=# select count(l_orderkey) from tpch1.lineitem;
  count
-
 6001215
(1 row)
Time: 202.253 ms
ftian=#

On Fri, Oct 17, 2014 at 10:12 AM, Tom Lane wrote:
> CK Tan writes:
> > The bigint sum,avg,count case in the example you tried has some
> > optimization. We use int128 to accumulate the bigint instead of numeric in
> > pg. Hence the big speed up. Try the same query on int4 for the improvement
> > where both pg and vitessedb are using int4 in the execution.
>
> Well, that's pretty much cheating: it's too hard to disentangle what's
> coming from JIT vs what's coming from using a different accumulator
> datatype. If we wanted to depend on having int128 available we could
> get that speedup with a couple hours' work.
>
> But what exactly are you "compiling" here? I trust not the actual data
> accesses; that seems far too complicated to try to inline.
>
> regards, tom lane
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
Re: [HACKERS] Vitesse DB call for testing
CK Tan writes: > The bigint sum,avg,count case in the example you tried has some optimization. > We use int128 to accumulate the bigint instead of numeric in pg. Hence the > big speed up. Try the same query on int4 for the improvement where both pg > and vitessedb are using int4 in the execution. Well, that's pretty much cheating: it's too hard to disentangle what's coming from JIT vs what's coming from using a different accumulator datatype. If we wanted to depend on having int128 available we could get that speedup with a couple hours' work. But what exactly are you "compiling" here? I trust not the actual data accesses; that seems far too complicated to try to inline. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] add ssl_protocols configuration option
Alvaro Herrera writes: > Dag-Erling Smørgrav wrote: >> I understand this policy. However, this new feature a) has absolutely >> no impact unless the admin makes a conscious decision to use it and b) >> will make life much easier for everyone if a POODLE-like vulnerability >> is discovered in TLS. > I see this as vaguely related to > http://www.postgresql.org/message-id/20131114202733.gb7...@eldon.alvh.no-ip.org > where we want to have SSL behavior configurable in the back branches due > to renegotiation issues: there was talk in that thread about introducing > new GUC variables in back branches to control the behavior. The current > patch really doesn't add much in the way of features (SSLv3 support and > so on already exist in back branches) --- what it does add is a way to > control whether these are used. This looks to me like re-fighting the last war. Such a GUC has zero value *unless* some situation exactly like the POODLE bug comes up again, and the odds of that are not high. Moreover, the GUC could easily be misused to decrease rather than increase one's security, if it's carelessly set. Remember that we only recently fixed bugs that prevented use of the latest TLS version. If we have a setting like this, I fully anticipate that people will set it to "only use TLS 1.2" and then forget that they ever did that; years from now they'll still be using 1.2 when it's deprecated. So I think the argument that this is a useful security contribution is pretty weak; and on the whole we don't need another marginally-useful GUC. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash index creation warning
David G Johnston writes: > The question is whether we explain the implications of not being WAL-logged > in an error message or simply state the fact and let the documentation > explain the hazards - basically just output: > "hash indexes are not WAL-logged and their use is discouraged" +1. The warning message is not the place to be trying to explain all the details. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [Segmentation fault] pg_dump binary-upgrade fail for type without element
Rushabh Lathia writes: > pg_dump binary-upgrade fail with segmentation fault for type without > element. Ooops. > Looking further into code I found that rather then fetching typrelid, we can > use the already stored typrelid from TypeInfo structure. Agreed. Patch committed, thanks! regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash index creation warning
Bruce Momjian wrote > Now that we have the create hash index warning in 9.5, I realized that > we don't warn about hash indexes with PITR, only crash recovery and > streaming. This patch fixes that. > > Is the wording "cannot be used" too vague. The CREATE INDEX manual > page has the words "give wrong answers to queries", which might be > better, but is kind of long for an error message. Suggestions? Something like the following is more specific without being more wordy: "hash indexes are not WAL-logged: they are corrupted during recovery and changes do not replicate to standby servers." The question is whether we explain the implications of not being WAL-logged in an error message or simply state the fact and let the documentation explain the hazards - basically just output: "hash indexes are not WAL-logged and their use is discouraged" David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Hash-index-creation-warning-tp5823443p5823445.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] add ssl_protocols configuration option
Dag-Erling Smørgrav wrote: > Michael Paquier writes: > > Please note that new features can only be added to the version > > currently in development, aka 9.5. > > I understand this policy. However, this new feature a) has absolutely > no impact unless the admin makes a conscious decision to use it and b) > will make life much easier for everyone if a POODLE-like vulnerability > is discovered in TLS. I see this as vaguely related to http://www.postgresql.org/message-id/20131114202733.gb7...@eldon.alvh.no-ip.org where we want to have SSL behavior configurable in the back branches due to renegotiation issues: there was talk in that thread about introducing new GUC variables in back branches to control the behavior. The current patch really doesn't add much in the way of features (SSLv3 support and so on already exist in back branches) --- what it does add is a way to control whether these are used. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Hash index creation warning
Now that we have the create hash index warning in 9.5, I realized that we don't warn about hash indexes with PITR, only crash recovery and streaming. This patch fixes that. Is the wording "cannot be used" too vague. The CREATE INDEX manual page has the words "give wrong answers to queries", which might be better, but is kind of long for an error message. Suggestions? -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml new file mode 100644 index e469b17..43df32f *** a/doc/src/sgml/ref/create_index.sgml --- b/doc/src/sgml/ref/create_index.sgml *** Indexes: *** 474,480 Also, changes to hash indexes are not replicated over streaming or file-based replication after the initial base backup, so they give wrong answers to queries that subsequently use them. ! For these reasons, hash index use is presently discouraged. --- 474,481 Also, changes to hash indexes are not replicated over streaming or file-based replication after the initial base backup, so they give wrong answers to queries that subsequently use them. ! Hash indexes are also not properly restored during point-in-time ! recovery. For these reasons, hash index use is presently discouraged. diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c new file mode 100644 index 8a1cb4b..03833d7 *** a/src/backend/commands/indexcmds.c --- b/src/backend/commands/indexcmds.c *** DefineIndex(Oid relationId, *** 491,497 if (strcmp(accessMethodName, "hash") == 0) ereport(WARNING, ! (errmsg("hash indexes are not WAL-logged and thus are not crash-safe and cannot be used on standby servers"))); if (stmt->unique && !accessMethodForm->amcanunique) ereport(ERROR, --- 491,497 if (strcmp(accessMethodName, "hash") == 0) ereport(WARNING, ! (errmsg("hash indexes are not WAL-logged and thus are not crash-safe and cannot be used for point-in-time recovery or on standby servers"))); if (stmt->unique && !accessMethodForm->amcanunique) ereport(ERROR, diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out new file mode 100644 index a2bef7a..11325e4 *** a/src/test/regress/expected/create_index.out --- b/src/test/regress/expected/create_index.out *** DROP TABLE array_gin_test; *** 2238,2250 -- HASH -- CREATE INDEX hash_i4_index ON hash_i4_heap USING hash (random int4_ops); ! WARNING: hash indexes are not WAL-logged and thus are not crash-safe and cannot be used on standby servers CREATE INDEX hash_name_index ON hash_name_heap USING hash (random name_ops); ! WARNING: hash indexes are not WAL-logged and thus are not crash-safe and cannot be used on standby servers CREATE INDEX hash_txt_index ON hash_txt_heap USING hash (random text_ops); ! WARNING: hash indexes are not WAL-logged and thus are not crash-safe and cannot be used on standby servers CREATE INDEX hash_f8_index ON hash_f8_heap USING hash (random float8_ops); ! WARNING: hash indexes are not WAL-logged and thus are not crash-safe and cannot be used on standby servers -- CREATE INDEX hash_ovfl_index ON hash_ovfl_heap USING hash (x int4_ops); -- -- Test functional index --- 2238,2250 -- HASH -- CREATE INDEX hash_i4_index ON hash_i4_heap USING hash (random int4_ops); ! WARNING: hash indexes are not WAL-logged and thus are not crash-safe and cannot be used for point-in-time recovery or on standby servers CREATE INDEX hash_name_index ON hash_name_heap USING hash (random name_ops); ! 
WARNING: hash indexes are not WAL-logged and thus are not crash-safe and cannot be used for point-in-time recovery or on standby servers CREATE INDEX hash_txt_index ON hash_txt_heap USING hash (random text_ops); ! WARNING: hash indexes are not WAL-logged and thus are not crash-safe and cannot be used for point-in-time recovery or on standby servers CREATE INDEX hash_f8_index ON hash_f8_heap USING hash (random float8_ops); ! WARNING: hash indexes are not WAL-logged and thus are not crash-safe and cannot be used for point-in-time recovery or on standby servers -- CREATE INDEX hash_ovfl_index ON hash_ovfl_heap USING hash (x int4_ops); -- -- Test functional index diff --git a/src/test/regress/expected/enum.out b/src/test/regress/expected/enum.out new file mode 100644 index fa23b52..47ac5a6 *** a/src/test/regress/expected/enum.out --- b/src/test/regress/expected/enum.out *** DROP INDEX enumtest_btree; *** 383,389 -- Hash index / opclass with the = operator -- CREATE INDEX enumtest_hash ON enumtest USING hash (col); ! WARNING: hash indexes are not WAL-logged and thus are not crash-safe and cannot be used on standby servers SELECT * FROM enumtest WHERE co
Re: [HACKERS] Support UPDATE table SET(*)=...
On Oct 17, 2014 6:16 PM, "Tom Lane" wrote: > A more realistic objection goes like this: > > create table foo(f int, g int); > update foo x set x = (1,2); -- works > alter table foo add column x int; > update foo x set x = (1,2,3); -- no longer works > > It's not a real good thing if a column addition or renaming can > so fundamentally change the nature of a query. I think a significant use case for this feature is when you already have a row-value and want to persist it in the database, like you can do with INSERT: insert into foo select * from populate_record_json(null::foo, '{...}'); In this case the opposite is true: requiring explicit column names would break the query if you add columns to the table. The fact that you can't reasonably use populate_record/_json with UPDATE is a significant omission. IMO this really speaks for supporting shorthand whole-row assignment, whatever the syntax. Regards, Marti
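A sketch of the use case described above (table and JSON values are made up; json_populate_record is the existing 9.3+ function): INSERT can consume a whole generated row today, while UPDATE has to spell out every column, which is what a whole-row shorthand would avoid.

-- works today: persist a row value produced from JSON
CREATE TABLE foo (a int PRIMARY KEY, b text);
INSERT INTO foo SELECT * FROM json_populate_record(null::foo, '{"a":1,"b":"x"}');

-- today's UPDATE equivalent must list every column explicitly:
UPDATE foo SET a = r.a, b = r.b
  FROM json_populate_record(null::foo, '{"a":1,"b":"y"}') r
 WHERE foo.a = r.a;
-- with the shorthand under discussion the column list could stay implicit,
-- e.g. something like UPDATE foo SET (*) = ... (exact syntax still being debated).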
Re: [HACKERS] Vitesse DB call for testing
On Fri, Oct 17, 2014 at 10:47 AM, CK Tan wrote: > Merlin, glad you tried it. > > We take the query plan exactly as given by the planner, decide whether to JIT > or to punt depending on the cost. If we punt, it goes back to pg executor. If > we JIT, and if we could not proceed (usually of some operators we haven't > implemented yet), we again punt. Once we were able to generate the code, > there is no going back; we call into LLVM to obtain the function entry point, > and run it to completion. The 3% improvement you see in OLTP tests is > definitely noise. > > The bigint sum,avg,count case in the example you tried has some optimization. > We use int128 to accumulate the bigint instead of numeric in pg. Hence the > big speed up. Try the same query on int4 for the improvement where both pg > and vitessedb are using int4 in the execution. > > The speed up is really noticeable when the data type is nonvarlena. In the > varlena cases, we still call into pg routines most of the times. Again, try > the sum,avg,count query on numeric, and you will see what I mean. > > Also, we don't support UDF at the moment. So all queries involving UDF gets > sent to pg executor. > > On your question of 32k page size, the rational is that some of our customers > could be interested in a data warehouse on pg. 32k page size is a big win > when all you do is seqscan all day long. > > We are looking for bug reports at these stage and some stress tests done > without our own prejudices. Some test on real data in non prod setting on > queries that are highly CPU bound would be ideal.

One thing that I noticed is that when slamming your benchmark query via pgbench, resident memory consumption was really aggressive and would have taken down the server had I not stopped the test. Memory consumption did return to baseline after that, so I figured some type of llvm memory management games were going on. This isn't really a problem for most OLAP workloads but it's something to be aware of.

Via 'perf top' on stock postgres, you see the usual suspects: palloc, hash_search, etc. On your build though HeapTupleSatisfiesMVCC zooms right to the top of the stack which is pretty interesting...the executor you've built is very lean and mean for sure. A drop-in optimization engine with little to no schema/sql changes is pretty neat -- your primary competitor here is going to be column-organized table solutions to OLAP-type problems.

merlin

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Support UPDATE table SET(*)=...
On Fri, Oct 17, 2014 at 10:30 AM, Tom Lane wrote: > Merlin Moncure writes: >> On Fri, Oct 17, 2014 at 10:16 AM, Tom Lane wrote: >>> I think it wouldn't; Merlin is proposing that f would be taken as the >>> field name. A more realistic objection goes like this: >>> >>> create table foo(f int, g int); >>> update foo x set x = (1,2); -- works >>> alter table foo add column x int; >>> update foo x set x = (1,2,3); -- no longer works >>> >>> It's not a real good thing if a column addition or renaming can >>> so fundamentally change the nature of a query. > >> That's exactly how SELECT works. I also dispute that the user should >> be surprised in such cases. > > Well, the reason we have a problem in SELECT is that we support an > unholy miscegenation of SQL-spec and PostQUEL syntax: the idea that > "SELECT foo FROM foo" could represent a whole-row selection is nowhere > to be found in the SQL standard, for good reason IMO. But we've never > had the courage to break cleanly with this Berkeley leftover and > insist that people spell it SQL-style as "SELECT ROW(foo.*) FROM foo". > I'd just as soon not introduce the same kind of ambiguity into UPDATE > if we have a reasonable alternative. Ah, interesting point (I happen to like the terse syntax and use it often). This is for posterity anyways since you guys seem to like Atri's proposal, which surprised me. However, I think you're over simplifying things here. Syntax aside: I think SELECT f FROM foo f; and a hypothetical SELECT row(f.*) FROM foo f; give different semantics. The former gives an object of type 'f' and the latter gives type 'row'. To get parity, you'd have to add an extra cast which means you'd have to play tricky games to avoid extra performance overhead besides being significantly more verbose. I'm aware some of the other QUELisms are pretty dodgy and have burned us in the past (like that whole function as record reference thing) but pulling a record as a field in select is pretty nice. It's also widely used and quite useful in json serialization. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
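The typing difference referred to above can be seen directly (my own illustration, not from the thread):

CREATE TABLE foo (f int, g int);
INSERT INTO foo VALUES (1, 2);
SELECT pg_typeof(foo) FROM foo;          -- foo, the table's composite type
SELECT pg_typeof(ROW(foo.*)) FROM foo;   -- record, so an extra cast is needed to get back to foo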
Re: [HACKERS] Vitesse DB call for testing
Merlin, glad you tried it.

We take the query plan exactly as given by the planner, decide whether to JIT or to punt depending on the cost. If we punt, it goes back to the pg executor. If we JIT, and if we could not proceed (usually because of some operators we haven't implemented yet), we again punt. Once we were able to generate the code, there is no going back; we call into LLVM to obtain the function entry point, and run it to completion. The 3% improvement you see in OLTP tests is definitely noise.

The bigint sum,avg,count case in the example you tried has some optimization. We use int128 to accumulate the bigint instead of numeric in pg. Hence the big speed up. Try the same query on int4 for the improvement where both pg and vitessedb are using int4 in the execution.

The speed up is really noticeable when the data type is non-varlena. In the varlena cases, we still call into pg routines most of the time. Again, try the sum,avg,count query on numeric, and you will see what I mean.

Also, we don't support UDFs at the moment. So all queries involving UDFs get sent to the pg executor.

On your question of 32k page size, the rationale is that some of our customers could be interested in a data warehouse on pg. 32k page size is a big win when all you do is seqscan all day long.

We are looking for bug reports at this stage and some stress tests done without our own prejudices. Some tests on real data in a non-prod setting on queries that are highly CPU bound would be ideal.

Thanks,
-cktan

> On Oct 17, 2014, at 6:43 AM, Merlin Moncure wrote: > >> On Fri, Oct 17, 2014 at 8:14 AM, Merlin Moncure wrote: >>> On Fri, Oct 17, 2014 at 7:32 AM, CK Tan wrote: >>> Hi everyone, >>> >>> Vitesse DB 9.3.5.S is Postgres 9.3.5 with a LLVM-JIT query executor >>> designed for compute intensive OLAP workload. We have gotten it to a >>> reasonable state and would like to open it up to the pg hackers >>> community for testing and suggestions. >>> >>> Vitesse DB offers >>> -- JIT Compilation for compute-intensive queries >>> -- CSV parsing with SSE instructions >>> -- 100% binary compatibility with PG9.3.5. >>> >>> Our results show CSV imports run up to 2X faster, and TPCH Q1 runs 8X >>> faster. >>> >>> Our TPCH 1GB benchmark results is also available at >>> http://vitessedata.com/benchmark/ . >>> >>> Please direct any questions by email to ck...@vitessedata.com . >> >> You offer a binary with 32k block size...what's the rationale for that? > > (sorry for the double post) > > OK, I downloaded the ubuntu binary and ran your benchmarks (after > making some minor .conf tweaks like disabling SSL). I then ran your > benchmark (after fixing the typo) of the count/sum/avg test -- *and > noticed a 95% reduction in runtime performance* which is really quite > amazing IMNSHO. I also ran a select only test on small scale factor > pgbench and didn't see any regression there -- in fact you beat stock > by ~ 3% (although this could be measurement noise). So now you've > got my attention. So, if you don't mind, quit being coy and explain > how the software works and all the neat things it does and doesn't do. > > merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
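A sketch of the int4 comparison suggested above (table name and row count are my own assumptions): sum(int4) accumulates in a plain int8 rather than numeric, so neither build benefits from the int128 shortcut and any remaining gap is attributable to the JIT executor itself.

CREATE TABLE t4 AS SELECT i::int4 AS i FROM generate_series(1, 1000000) i;
SELECT count(*), sum(i), avg(i) FROM t4;
-- sum(i*i) is avoided here because i*i can overflow int4 for large i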
Re: [HACKERS] Support UPDATE table SET(*)=...
Merlin Moncure writes: > On Fri, Oct 17, 2014 at 10:16 AM, Tom Lane wrote: >> I think it wouldn't; Merlin is proposing that f would be taken as the >> field name. A more realistic objection goes like this: >> >> create table foo(f int, g int); >> update foo x set x = (1,2); -- works >> alter table foo add column x int; >> update foo x set x = (1,2,3); -- no longer works >> >> It's not a real good thing if a column addition or renaming can >> so fundamentally change the nature of a query. > That's exactly how SELECT works. I also dispute that the user should > be surprised in such cases. Well, the reason we have a problem in SELECT is that we support an unholy miscegenation of SQL-spec and PostQUEL syntax: the idea that "SELECT foo FROM foo" could represent a whole-row selection is nowhere to be found in the SQL standard, for good reason IMO. But we've never had the courage to break cleanly with this Berkeley leftover and insist that people spell it SQL-style as "SELECT ROW(foo.*) FROM foo". I'd just as soon not introduce the same kind of ambiguity into UPDATE if we have a reasonable alternative. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Trailing comma support in SELECT statements
Pavel Stehule wrote: > do you know, so this feature is a proprietary and it is not based > on ANSI/SQL? Any user, that use this feature and will to port to > other database will hate it. I remember that Sybase ASE allowed a trailing comma within the parentheses of a table definition, which was handy. I checked on SQL Fiddle and found that MS SQL Server and MySQL both allow that, too; although Oracle does not. I'm not taking a position on whether we should allow this in PostgreSQL, but not having it is likely to annoy some users moving *to* PostgreSQL, while having it is likely to annoy some users moving *away* from PostgreSQL. None of the products I tried allowed a leading comma. I didn't test, and have no knowledge regarding, how other products treat extra commas elsewhere. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Support UPDATE table SET(*)=...
On Fri, Oct 17, 2014 at 10:16 AM, Tom Lane wrote: > Marko Tiikkaja writes: >> local:marko=#* create table foo(f int); >> CREATE TABLE >> local:marko=#* update foo f set f=1; >> UPDATE 0 > >> This query would change meaning with your suggestion. > > I think it wouldn't; Merlin is proposing that f would be taken as the > field name. A more realistic objection goes like this: > > create table foo(f int, g int); > update foo x set x = (1,2); -- works > alter table foo add column x int; > update foo x set x = (1,2,3); -- no longer works > > It's not a real good thing if a column addition or renaming can > so fundamentally change the nature of a query. That's exactly how SELECT works. I also dispute that the user should be surprised in such cases. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Support UPDATE table SET(*)=...
Marko Tiikkaja writes: > local:marko=#* create table foo(f int); > CREATE TABLE > local:marko=#* update foo f set f=1; > UPDATE 0 > This query would change meaning with your suggestion. I think it wouldn't; Merlin is proposing that f would be taken as the field name. A more realistic objection goes like this: create table foo(f int, g int); update foo x set x = (1,2); -- works alter table foo add column x int; update foo x set x = (1,2,3); -- no longer works It's not a real good thing if a column addition or renaming can so fundamentally change the nature of a query. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Support UPDATE table SET(*)=...
On Fri, Oct 17, 2014 at 10:10 AM, Tom Lane wrote: > Merlin Moncure writes: >> On Fri, Oct 17, 2014 at 9:55 AM, Marko Tiikkaja wrote: >>> I don't know about Tom, but I didn't like that: >>> http://www.postgresql.org/message-id/5364c982.7060...@joh.to > >> Hm, I didn't understand your objection: > >> >> So e.g.: >>UPDATE foo f SET f = ..; > >> would resolve to the table, despite there being a column called "f"? >> That would break backwards compatibility. >> > >> That's not correct: it should work exactly as 'select' does; given a >> conflict resolve the field name, so no backwards compatibility issue. > > The point is that it's fairly messy (and bug-prone) to have a syntax > where we have to make an arbitrary choice between two reasonable > interpretations. > > If you look back at the whole thread Marko's above-cited message is in, > we'd considered a bunch of different possible syntaxes for this, and > none of them had much support. The "(*)" idea actually is starting to > look pretty good to me. Hm, I'll take it then. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Support UPDATE table SET(*)=...
Merlin Moncure writes: > On Wed, Oct 15, 2014 at 3:48 AM, Atri Sharma wrote: >> Thanks for the links, but this patch only targets SET(*) case, which, if I >> understand correctly, the patch you mentioned doesn't directly handle (If I >> understand correctly, the target of the two patches is different). > Yeah -- in fact, there was some discussion about this exact case. > This patch solves a very important problem: when doing record > operations to move data between databases with identical schema > there's currently no way to 'update' in a generic way without building > out the entire field list via complicated and nasty dynamic SQL. I'm > not sure about the proposed syntax though; it seems a little weird to > me. Any particular reason why you couldn't have just done: > UPDATE table1 SET * = a,b,c, ... > also, > UPDATE table1 t SET t = (SELECT (a,b,c)::t FROM...); > seems cleaner than the proposed syntax for row assignment. Tom > objected though IIRC. That last proposal is no good because it would be ambiguous if the table contains a column by that name. The "(*)" idea actually seems not bad, since it's shorthand for a parenthesized column list. I'm not sure about the patch itself though --- in particular, it doesn't seem to me that it should be touching transformTargetList, since that doesn't have anything to do with expansion of multiassignments today. Probably this is a symptom of having chosen a poor representation of the construct in the raw grammar output. However, I've not exactly wrapped my head around what that representation is ... the lack of any comments explaining it doesn't help. A larger question is whether it's appropriate to do the *-expansion at parse time, or whether we'd need to postpone it to later in order to handle reasonable use-cases. Since we expand "SELECT *" at parse time (and are mandated to do so by the SQL spec, I believe), it seems consistent to do this at parse time as well; but perhaps there is a counter argument. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Support UPDATE table SET(*)=...
Merlin Moncure writes: > On Fri, Oct 17, 2014 at 9:55 AM, Marko Tiikkaja wrote: >> I don't know about Tom, but I didn't like that: >> http://www.postgresql.org/message-id/5364c982.7060...@joh.to > Hm, I didn't understand your objection: > > So e.g.: >UPDATE foo f SET f = ..; > would resolve to the table, despite there being a column called "f"? > That would break backwards compatibility. > > That's not correct: it should work exactly as 'select' does; given a > conflict resolve the field name, so no backwards compatibility issue. The point is that it's fairly messy (and bug-prone) to have a syntax where we have to make an arbitrary choice between two reasonable interpretations. If you look back at the whole thread Marko's above-cited message is in, we'd considered a bunch of different possible syntaxes for this, and none of them had much support. The "(*)" idea actually is starting to look pretty good to me. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On 17 October 2014 14:05, Alvaro Herrera wrote: > Of course, this is a task that requires much more thinking, design, and > discussion than just adding multi-process capability to vacuumdb ... Yes, please proceed with this patch as originally envisaged. No more comments from me. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Support UPDATE table SET(*)=...
On 10/17/14 5:03 PM, Merlin Moncure wrote: Hm, I didn't understand your objection: So e.g.: UPDATE foo f SET f = ..; would resolve to the table, despite there being a column called "f"? That would break backwards compatibility. That's not correct: it should work exactly as 'select' does; given a conflict resolve the field name, so no backwards compatibility issue. local:marko=# show server_version; server_version 9.1.13 (1 row) local:marko=#* create table foo(f int); CREATE TABLE local:marko=#* update foo f set f=1; UPDATE 0 This query would change meaning with your suggestion. I'm not saying it would be a massive problem in practice, but I think we should first consider options which don't break backwards compatibility, even if some consider them "less clean". .marko -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Support UPDATE table SET(*)=...
On Fri, Oct 17, 2014 at 9:55 AM, Marko Tiikkaja wrote: > On 10/17/14 4:15 PM, Merlin Moncure wrote: >> >> Any particular reason why you couldn't have just done: >> >> UPDATE table1 SET * = a,b,c, ... > > > That just looks wrong to me. I'd prefer (*) = .. over that any day. > >> UPDATE table1 t SET t = (SELECT (a,b,c)::t FROM...); >> >> seems cleaner than the proposed syntax for row assignment. Tom >> objected though IIRC. > > I don't know about Tom, but I didn't like that: > http://www.postgresql.org/message-id/5364c982.7060...@joh.to Hm, I didn't understand your objection: So e.g.: UPDATE foo f SET f = ..; would resolve to the table, despite there being a column called "f"? That would break backwards compatibility. That's not correct: it should work exactly as 'select' does; given a conflict resolve the field name, so no backwards compatibility issue. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Support UPDATE table SET(*)=...
On 10/17/14 4:15 PM, Merlin Moncure wrote: Any particular reason why you couldn't have just done: UPDATE table1 SET * = a,b,c, ... That just looks wrong to me. I'd prefer (*) = .. over that any day. UPDATE table1 t SET t = (SELECT (a,b,c)::t FROM...); seems cleaner than the proposed syntax for row assignment. Tom objected though IIRC. I don't know about Tom, but I didn't like that: http://www.postgresql.org/message-id/5364c982.7060...@joh.to .marko -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Support UPDATE table SET(*)=...
On Fri, Oct 17, 2014 at 7:45 PM, Merlin Moncure wrote: > On Wed, Oct 15, 2014 at 3:48 AM, Atri Sharma wrote: > > > > > > On Wednesday, October 15, 2014, Marti Raudsepp wrote: > >> > >> Hi > >> > >> On Wed, Oct 15, 2014 at 11:02 AM, Atri Sharma > wrote: > >> > Please find attached a patch which implements support for UPDATE > table1 > >> > SET(*)=... > >> > >> I presume you haven't read Tom Lane's proposal and discussion about > >> multiple column assignment in UPDATE: > >> http://www.postgresql.org/message-id/1783.1399054...@sss.pgh.pa.us > >> (Assigning all columns was also discussed there) > >> > >> And there's a WIP patch: > >> http://www.postgresql.org/message-id/20930.1402931...@sss.pgh.pa.us > > > > Thanks for the links, but this patch only targets SET(*) case, which, if > I > > understand correctly, the patch you mentioned doesn't directly handle > (If I > > understand correctly, the target of the two patches is different). > > Yeah -- in fact, there was some discussion about this exact case. > This patch solves a very important problem: when doing record > operations to move data between databases with identical schema > there's currently no way to 'update' in a generic way without building > out the entire field list via complicated and nasty dynamic SQL. Thanks! > I'm > not sure about the proposed syntax though; it seems a little weird to > me. Any particular reason why you couldn't have just done: > > UPDATE table1 SET * = a,b,c, ... > > also, > > UPDATE table1 t SET t = (SELECT (a,b,c)::t FROM...); > I honestly have not spent a lot of time thinking about the exact syntax that may be acceptable. If we have arguments for or against a specific syntax, I will be glad to incorporate them. > >
Re: [HACKERS] Support UPDATE table SET(*)=...
On Wed, Oct 15, 2014 at 3:48 AM, Atri Sharma wrote: > > > On Wednesday, October 15, 2014, Marti Raudsepp wrote: >> >> Hi >> >> On Wed, Oct 15, 2014 at 11:02 AM, Atri Sharma wrote: >> > Please find attached a patch which implements support for UPDATE table1 >> > SET(*)=... >> >> I presume you haven't read Tom Lane's proposal and discussion about >> multiple column assignment in UPDATE: >> http://www.postgresql.org/message-id/1783.1399054...@sss.pgh.pa.us >> (Assigning all columns was also discussed there) >> >> And there's a WIP patch: >> http://www.postgresql.org/message-id/20930.1402931...@sss.pgh.pa.us > > Thanks for the links, but this patch only targets SET(*) case, which, if I > understand correctly, the patch you mentioned doesn't directly handle (If I > understand correctly, the target of the two patches is different). Yeah -- in fact, there was some discussion about this exact case. This patch solves a very important problem: when doing record operations to move data between databases with identical schema there's currently no way to 'update' in a generic way without building out the entire field list via complicated and nasty dynamic SQL. I'm not sure about the proposed syntax though; it seems a little weird to me. Any particular reason why you couldn't have just done: UPDATE table1 SET * = a,b,c, ... also, UPDATE table1 t SET t = (SELECT (a,b,c)::t FROM...); seems cleaner than the proposed syntax for row assignment. Tom objected though IIRC. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
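As a concrete sketch of the problem being described: today a "copy everything across" update has to build its SET list dynamically, whereas the proposed form would spell it directly. The staging table, key column, and the exact SET (*) spelling below are illustrative assumptions, not anything taken from the patch itself:

    -- Current workaround: assemble the column list from the catalog and EXECUTE it.
    DO $$
    DECLARE
        set_list text;
    BEGIN
        SELECT string_agg(format('%I = s.%I', column_name, column_name), ', ')
          INTO set_list
          FROM information_schema.columns
         WHERE table_schema = 'public' AND table_name = 'table1';

        EXECUTE format(
            'UPDATE table1 t SET %s FROM staging s WHERE s.id = t.id', set_list);
    END
    $$;

    -- With the proposed patch, roughly:
    -- UPDATE table1 t SET (*) = ROW(s.*) FROM staging s WHERE s.id = t.id;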
Re: [HACKERS] Vitesse DB call for testing
On Fri, Oct 17, 2014 at 8:14 AM, Merlin Moncure wrote:
> On Fri, Oct 17, 2014 at 7:32 AM, CK Tan wrote:
>> Hi everyone,
>>
>> Vitesse DB 9.3.5.S is Postgres 9.3.5 with an LLVM-JIT query executor
>> designed for compute-intensive OLAP workloads. We have gotten it to a
>> reasonable state and would like to open it up to the pg hackers
>> community for testing and suggestions.
>>
>> Vitesse DB offers
>> -- JIT compilation for compute-intensive queries
>> -- CSV parsing with SSE instructions
>> -- 100% binary compatibility with PG9.3.5.
>>
>> Our results show CSV imports run up to 2X faster, and TPCH Q1 runs 8X faster.
>>
>> Our TPCH 1GB benchmark results are also available at
>> http://vitessedata.com/benchmark/ .
>>
>> Please direct any questions by email to ck...@vitessedata.com .
>
> You offer a binary with 32k block size...what's the rationale for that?

(sorry for the double post)

OK, I downloaded the Ubuntu binary and ran your benchmarks (after making some minor .conf tweaks like disabling SSL). I then ran your benchmark (after fixing the typo) of the count/sum/avg test -- *and saw a 95% reduction in runtime* -- which is really quite amazing IMNSHO. I also ran a select-only test on small scale factor pgbench and didn't see any regression there -- in fact you beat stock by ~3% (although this could be measurement noise).

So now you've got my attention. So, if you don't mind, quit being coy and explain how the software works and all the neat things it does and doesn't do.

merlin

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)
On Fri, Oct 17, 2014 at 9:23 PM, Fujii Masao wrote:
> In this case, the patch seems to make the restartpoint recycle even WAL files
> which have .ready files and will have to be archived later. Thought?

The real problem currently is that it is possible to have a segment file that is not marked as .done during recovery when the streaming connection is abruptly cut while this segment is being switched, marking it as .ready in archive_status and simply leaving this segment in pg_xlog, where it will neither be recycled nor removed. I have not been able to look much at this code these days, so I am not sure how invasive it would be in back-branches, but perhaps we should improve the code such that when a segment file is switched and the streaming connection is cut, we guarantee that this file is completed and marked as .done.
--
Michael
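For readers less familiar with this machinery, the state being discussed lives in the archive_status subdirectory of pg_xlog; roughly like the following (segment names made up for illustration):

    pg_xlog/
        000000010000000000000041              <- segment already archived
        000000010000000000000042              <- segment still awaiting archiving
        archive_status/
            000000010000000000000041.done     <- archiver is finished with 41
            000000010000000000000042.ready    <- archiver still has to copy 42

A .ready file means the archiver still has to copy the segment; once the copy succeeds (or archiving is not wanted for that segment), the status is switched to .done so a later checkpoint or restartpoint can recycle or remove it.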
Re: [HACKERS] Vitesse DB call for testing
On Fri, Oct 17, 2014 at 7:32 AM, CK Tan wrote: > Hi everyone, > > Vitesse DB 9.3.5.S is Postgres 9.3.5 with a LLVM-JIT query executor > designed for compute intensive OLAP workload. We have gotten it to a > reasonable state and would like to open it up to the pg hackers > community for testing and suggestions. > > Vitesse DB offers > -- JIT Compilation for compute-intensive queries > -- CSV parsing with SSE instructions > -- 100% binary compatibility with PG9.3.5. > > Our results show CSV imports run up to 2X faster, and TPCH Q1 runs 8X faster. > > Our TPCH 1GB benchmark results is also available at > http://vitessedata.com/benchmark/ . > > Please direct any questions by email to ck...@vitessedata.com . You offer a binary with 32k block size...what's the rationale for that? merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)
On Fri, Oct 17, 2014 at 9:23 PM, Fujii Masao wrote: > On Thu, Oct 9, 2014 at 3:26 PM, Michael Paquier > wrote: >> >> >> On Wed, Oct 8, 2014 at 10:00 PM, Michael Paquier >> wrote: >>> >>> The additional process at promotion sounds like a good idea, I'll try to >>> get a patch done tomorrow. This would result as well in removing the >>> XLogArchiveForceDone stuff. Either way, not that I have been able to >>> reproduce the problem manually, things can be clearly solved. >> >> Please find attached two patches aimed to fix this issue and to improve the >> situation: >> - 0001 prevents the apparition of those phantom WAL segment file by ensuring >> that when a node is in recovery it will remove it whatever its status in >> archive_status. This patch is the real fix, and should be applied down to >> 9.2. >> - 0002 is a patch implementing Heikki's idea of enforcing all the segment >> files present in pg_xlog to have their status to .done, marking them for >> removal. When looking at the code, I finally concluded that Fujii-san's >> point, about marking in all cases as .done segment files that have been >> fully streamed, actually makes more sense to not be backward. This patch >> would actually not be mandatory for back-patching, but it makes the process >> more robust IMO. > > Thanks for the patches! While reviewing the patch, I found another bug related to WAL file in recovery mode. The problem is that exitArchiveRecovery() always creates .ready file for the last WAL file of the old timeline even when it's restored from the archive and has .done file. So this causes the already-archived WAL file to be archived again Attached patch fixes this bug. Regards, -- Fujii Masao *** a/src/backend/access/transam/xlog.c --- b/src/backend/access/transam/xlog.c *** *** 5351,5368 exitArchiveRecovery(TimeLineID endTLI, XLogSegNo endLogSegNo) * for the new timeline. * * Notify the archiver that the last WAL segment of the old timeline is ! * ready to copy to archival storage. Otherwise, it is not archived for a ! * while. */ if (endTLI != ThisTimeLineID) { XLogFileCopy(endLogSegNo, endTLI, endLogSegNo); ! if (XLogArchivingActive()) ! { ! XLogFileName(xlogpath, endTLI, endLogSegNo); ! XLogArchiveNotify(xlogpath); ! } } /* --- 5351,5367 * for the new timeline. * * Notify the archiver that the last WAL segment of the old timeline is ! * ready to copy to archival storage if its .done file doesn't exist ! * (e.g., if it's the restored WAL file, it's expected to have .done file). ! * Otherwise, it is not archived for a while. */ if (endTLI != ThisTimeLineID) { XLogFileCopy(endLogSegNo, endTLI, endLogSegNo); ! /* Create .ready file only when neither .ready nor .done files exist */ ! XLogFileName(xlogpath, endTLI, endLogSegNo); ! XLogArchiveCheckDone(xlogpath); } /* *** a/src/backend/access/transam/xlogarchive.c --- b/src/backend/access/transam/xlogarchive.c *** *** 516,521 XLogArchiveNotify(const char *xlog) --- 516,527 char archiveStatusPath[MAXPGPATH]; FILE *fd; + /* + * We should not create .ready file for already archived WAL file to + * prevent it from being archived again. + */ + Assert(XLogArchiveIsBusy(xlog)); + /* insert an otherwise empty file called .ready */ StatusFilePath(archiveStatusPath, xlog, ".ready"); fd = AllocateFile(archiveStatusPath, "w"); -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
Amit Kapila wrote: > On Fri, Oct 17, 2014 at 1:31 AM, Simon Riggs wrote: > > > > On 16 October 2014 15:09, Amit Kapila wrote: > > c) seems like the only issue that needs any thought. I don't think its > > going to be that hard. > > > > I don't see any problems with the other points. You can make a > > function wait, if you wish. > > It is quite possible, but still I think to accomplish such a function, > we need to have some mechanism where it can inform auto vacuum > and then some changes in auto vacuum to receive/read that information > and reply back. I don't think any such mechanism exists. You're right, it doesn't. I think we have plenty more infrastructure for that than we had when autovacuum was initially developed. It shouldn't be that hard. Of course, this is a task that requires much more thinking, design, and discussion than just adding multi-process capability to vacuumdb ... -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
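For reference, the client-side capability the patch is after looks roughly like the following; the option name and database name here are assumptions, since the patch was still WIP at this point:

    # vacuum (and analyze) a database using several concurrent connections
    vacuumdb --jobs=4 --analyze mydb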
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On 17 October 2014 12:52, Amit Kapila wrote: > It is quite possible, but still I think to accomplish such a function, > we need to have some mechanism where it can inform auto vacuum > and then some changes in auto vacuum to receive/read that information > and reply back. I don't think any such mechanism exists. That's exactly how the CHECKPOINT command works, in conjunction with the checkpointer process. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Vitesse DB call for testing
Hi, On 2014-10-17 05:32:13 -0700, CK Tan wrote: > Vitesse DB 9.3.5.S is Postgres 9.3.5 with a LLVM-JIT query executor > designed for compute intensive OLAP workload. We have gotten it to a > reasonable state and would like to open it up to the pg hackers > community for testing and suggestions. > > Vitesse DB offers > -- JIT Compilation for compute-intensive queries > -- CSV parsing with SSE instructions > -- 100% binary compatibility with PG9.3.5. > > Our results show CSV imports run up to 2X faster, and TPCH Q1 runs 8X faster. How are these modifications licensed? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] add ssl_protocols configuration option
Michael Paquier writes: > Please note that new features can only be added to the version > currently in development, aka 9.5. I understand this policy. However, this new feature a) has absolutely no impact unless the admin makes a conscious decision to use it and b) will make life much easier for everyone if a POODLE-like vulnerability is discovered in TLS. > You should as well register your patch to the current commit fest, I > think you are still in time: > https://commitfest.postgresql.org/action/commitfest_view?id=24 Thanks for reminding me. DES -- Dag-Erling Smørgrav - d...@des.no -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
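As an illustration of what the option buys in the scenario described above, a hypothetical postgresql.conf excerpt using the proposed colon-separated syntax (the exact value is made up):

    # keep every protocol the crypto library supports, except all SSL versions
    # and TLS 1.0
    ssl_protocols = 'ALL:-SSL:-TLSv1'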
Re: [HACKERS] CREATE POLICY and RETURNING
On 2014-10-17 14:57:03 +0800, Craig Ringer wrote: > On 10/17/2014 02:49 AM, Robert Haas wrote: > > I think you could probably make the DELETE policy control what can get > > deleted, but then have the SELECT policy further filter what gets > > returned. > > That seems like the worst of both worlds to me. > > Suddenly DELETE ... RETURNING might delete more rows than it reports a > resultset for. As well as being potentially dangerous for people using > it in wCTEs, etc, to me that's the most astonishing possible outcome of all. > > I'd be much happier with even: > > ERROR: RETURNING not permitted with SELECT row-security policy FWIW, that doesn't sound acceptable to me. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
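To make the disagreement concrete, a sketch of the situation under discussion; the CREATE POLICY syntax here is the in-progress row-security feature and may differ from what finally ships, and the table and policy names are made up:

    CREATE TABLE accounts (id int, owner text, balance numeric);
    ALTER TABLE accounts ENABLE ROW LEVEL SECURITY;

    -- Users may only see their own rows ...
    CREATE POLICY accounts_sel ON accounts FOR SELECT
        USING (owner = current_user);

    -- ... but the DELETE policy is deliberately broader.
    CREATE POLICY accounts_del ON accounts FOR DELETE
        USING (true);

    -- The question at hand: does this return every deleted row (leaking rows the
    -- SELECT policy hides), return only the visible ones while deleting more, or
    -- raise an error?
    DELETE FROM accounts WHERE balance = 0 RETURNING *;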
[HACKERS] Vitesse DB call for testing
Hi everyone,

Vitesse DB 9.3.5.S is Postgres 9.3.5 with an LLVM-JIT query executor designed for compute-intensive OLAP workloads. We have gotten it to a reasonable state and would like to open it up to the pg hackers community for testing and suggestions.

Vitesse DB offers
-- JIT compilation for compute-intensive queries
-- CSV parsing with SSE instructions
-- 100% binary compatibility with PG9.3.5.

Our results show CSV imports run up to 2X faster, and TPCH Q1 runs 8X faster. Our TPCH 1GB benchmark results are also available at http://vitessedata.com/benchmark/ . Please direct any questions by email to ck...@vitessedata.com .

Thank you for your help.

--
CK Tan
Vitesse Data, Inc.

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
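For reference, TPC-H Q1, the query cited above, is essentially the following compute-bound aggregation (shown with the usual 90-day substitution), which is the kind of workload a JIT-compiled executor targets:

    SELECT l_returnflag,
           l_linestatus,
           sum(l_quantity)                                       AS sum_qty,
           sum(l_extendedprice)                                  AS sum_base_price,
           sum(l_extendedprice * (1 - l_discount))               AS sum_disc_price,
           sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge,
           avg(l_quantity)                                       AS avg_qty,
           avg(l_extendedprice)                                  AS avg_price,
           avg(l_discount)                                       AS avg_disc,
           count(*)                                              AS count_order
      FROM lineitem
     WHERE l_shipdate <= date '1998-12-01' - interval '90 days'
     GROUP BY l_returnflag, l_linestatus
     ORDER BY l_returnflag, l_linestatus;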
Re: [HACKERS] CREATE POLICY and RETURNING
On Fri, Oct 17, 2014 at 7:46 AM, Stephen Frost wrote: > Thoughts on 'WITH RETURNING' / 'WITHOUT RETURNING' and what the default > should be? That sounds like a horrendous pile of nasty complication for no good purpose. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)
On Thu, Oct 9, 2014 at 3:26 PM, Michael Paquier wrote:
> On Wed, Oct 8, 2014 at 10:00 PM, Michael Paquier wrote:
>> The additional process at promotion sounds like a good idea, I'll try to
>> get a patch done tomorrow. This would result as well in removing the
>> XLogArchiveForceDone stuff. Either way, not that I have been able to
>> reproduce the problem manually, things can be clearly solved.
>
> Please find attached two patches aimed to fix this issue and to improve the
> situation:
> - 0001 prevents the apparition of those phantom WAL segment file by ensuring
>   that when a node is in recovery it will remove it whatever its status in
>   archive_status. This patch is the real fix, and should be applied down to 9.2.
> - 0002 is a patch implementing Heikki's idea of enforcing all the segment
>   files present in pg_xlog to have their status to .done, marking them for
>   removal. When looking at the code, I finally concluded that Fujii-san's
>   point, about marking in all cases as .done segment files that have been
>   fully streamed, actually makes more sense to not be backward. This patch
>   would actually not be mandatory for back-patching, but it makes the process
>   more robust IMO.

Thanks for the patches!

I found one problem in the 0002 patch. The patch changes recovery so that it creates .done files for every WAL file which exists in the pg_xlog directory at the end of recovery. But even WAL files which will have to be archived later can exist in pg_xlog at that moment. For example, the latest, recycled, and fully-written-but-not-archived-yet (i.e., maybe having .ready files) WAL files. The patch wrongly prevents them from being archived at all.

ISTM that the 0001 patch has a similar problem. Please imagine the following scenario.

1. There are many unarchived WAL files in pg_xlog because of, for example, continuous failure of archive_command, and then the server unfortunately crashes because of corruption of the database itself.
2. The DBA restores the backup onto the server and copies all the WAL files from the old pg_xlog to the new one. Then he or she prepares for archive recovery.
3. The DBA starts the server and the archive recovery starts.
4. After all the archived WAL files are replayed, the server tries to replay the WAL files in pg_xlog. Since there are many WAL files in pg_xlog, more than one restartpoint happens while they are being replayed.

In this case, the patch seems to make the restartpoint recycle even WAL files which have .ready files and will have to be archived later. Thought?

Regards,

--
Fujii Masao

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CREATE POLICY and RETURNING
* David G Johnston (david.g.johns...@gmail.com) wrote: > How about returning a placeholder row but with all the values replaced with > NULL? I don't think that would be a good approach.. A user actually looking at those rows would be highly confused. > In the absence of returning does the delete count show the total number of > rows deleted or only the number of rows deleted that the user would be aware > of if they issued a select with the same criteria? Whatever the answer the > number of rows returned with returning should match the row count normally > noted. Today, everything matches up, yes. Having rows which are deleted but which don't show up in RETURNING could certainly surprise people and applications, which is why I tend to favor the 'all-or-error' approach that others have also suggested. Adding that wouldn't be difficult, though we'd need to decide which should be the default. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Fri, Oct 17, 2014 at 1:31 AM, Simon Riggs wrote: > > On 16 October 2014 15:09, Amit Kapila wrote: > > > I think doing anything on the server side can have higher complexity like: > > a. Does this function return immediately after sending request to > > autovacuum, if yes then the behaviour of this new functionality > > will be different as compare to vacuumdb which user might not > > expect? > > b. We need to have some way to handle errors that can occur in > > autovacuum (may be need to find a way to pass back to user), > > vacuumdb or Vacuum can report error to user. > > c. How does nworkers input relates to autovacuum_max_workers > > which is needed at start for shared memory initialization and in calc > > of maxbackends. > > d. How to handle database wide vacuum which is possible via vacuumdb > > e. What is the best UI (a command like above or via config parameters)? > > > c) seems like the only issue that needs any thought. I don't think its > going to be that hard. > > I don't see any problems with the other points. You can make a > function wait, if you wish. It is quite possible, but still I think to accomplish such a function, we need to have some mechanism where it can inform auto vacuum and then some changes in auto vacuum to receive/read that information and reply back. I don't think any such mechanism exists. > > I think we can find a way for above and may be if any other similar things > > needs to be taken care, but still I think it is better that we have this > > feature > > via vacuumdb rather than adding complexity in server code. Also the > > current algorithm used in patch is discussed and agreed upon in this > > thread and if now we want to go via some other method (auto vacuum), > > it might take much more time to build consensus on all the points. > > Well, I read Alvaro's point from earlier in the thread and agreed with > it. All we really need is an instruction to autovacuum to say "be > aggressive". > > Just because somebody added something to the TODO list doesn't make it > a good idea. I apologise to Dilip for saying this, it is not anything > against him, just the idea. > > Perhaps we just accept the patch and change AV in the future. So shall we move ahead with review of this patch and make a note of changing AV in future (may be TODO)? With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] CREATE POLICY and RETURNING
* Thom Brown (t...@linux.com) wrote: > On 17 October 2014 07:57, Craig Ringer wrote: > > On 10/17/2014 02:49 AM, Robert Haas wrote: > > > I think you could probably make the DELETE policy control what can get > > > deleted, but then have the SELECT policy further filter what gets > > > returned. > > > > That seems like the worst of both worlds to me. > > > > Suddenly DELETE ... RETURNING might delete more rows than it reports a > > resultset for. As well as being potentially dangerous for people using > > it in wCTEs, etc, to me that's the most astonishing possible outcome of > > all. > > > > I'd be much happier with even: > > > > ERROR: RETURNING not permitted with SELECT row-security policy > > > > than this. > > +1 > > This suggestion is most in line with what I would expect to occur. This was along the lines that I've been thinking for how to address this also and I think it's the least surprising- but I want it controllable.. Thoughts on 'WITH RETURNING' / 'WITHOUT RETURNING' and what the default should be? Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] CREATE POLICY and RETURNING
* Fujii Masao (masao.fu...@gmail.com) wrote: > Another minor problem that I found is that pg_dump always fails when > there is a row-level policy for update. I think that the attached patch > should be applied. Urgh. That was tested before but we switched the characters used and evidently missed that. :( Will fix, thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
On Tue, Oct 14, 2014 at 11:34 AM, Amit Kapila wrote:
> I am not sure why we are seeing difference even though running
> on same m/c with same configuration.

I have tried many times, but I could not get the numbers you have posted above with HEAD; however, now trying with the latest version [1] posted by you, everything seems to be fine at this workload. The data at higher client counts is as below:

HEAD – commit 494affb
Shared_buffers = 8GB; Scale Factor = 3000

  Client Count / No. of Runs (tps)    64        128
  Run-1                               271799    24
  Run-2                               274341    245207
  Run-3                               275019    252258

HEAD – commit 494affb + wait free lw_shared_v2
Shared_buffers = 8GB; Scale Factor = 3000

  Client Count / No. of Runs (tps)    64        128
  Run-1                               286209    274922
  Run-2                               289101    274495
  Run-3                               289639    273633

So I am planning to proceed further with the review/test of your latest patch. According to me, the following items are left from my side:
a. do some basic tpc-b tests with the patch
b. re-review the latest version posted by you

I know that you have posted an optimization to StrategyGetBuffer() in this thread; however, I feel we can evaluate it separately unless you are of the opinion that both patches should go together.

[1] http://www.postgresql.org/message-id/20141010111027.gc6...@alap3.anarazel.de

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
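For context, figures like these are typically produced with a read-only pgbench run along the following lines; the exact flags, run length, and database name are assumptions here, not taken from the thread:

    # initialize at scale factor 3000, then drive select-only workloads
    pgbench -i -s 3000 bench
    pgbench -M prepared -S -c 64  -j 64  -T 300 bench
    pgbench -M prepared -S -c 128 -j 128 -T 300 bench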
Re: [HACKERS] [PATCH] add ssl_protocols configuration option
On Fri, Oct 17, 2014 at 7:58 PM, Dag-Erling Smørgrav wrote: > The default is "ALL:-SSLv2" in 9.0-9.3 and "ALL:-SSL" in 9.4 and up. > This corresponds to the current hardcoded values, so the default > behavior is unchanged, but the admin now has the option to select a > different settings, e.g. if a serious vulnerability is found in TLS 1.0. > Please note that new features can only be added to the version currently in development, aka 9.5. You should as well register your patch to the current commit fest, I think you are still in time: https://commitfest.postgresql.org/action/commitfest_view?id=24 -- Michael
Re: [HACKERS] CREATE POLICY and RETURNING
On Fri, Oct 17, 2014 at 3:49 AM, Robert Haas wrote: >>> That's an argument in favour of only applying a read-filtering policy >>> where a RETURNING clause is present, but that introduces the "surprise! >>> the effects of your DELETE changed based on an unrelated clause!" issue. >> >> No- if we were going to do this, I wouldn't want to change the existing >> structure but rather provide either: >> >> a) a way to simply disable RETURNING if the policy is in effect and the >>policy creator doesn't wish to allow it >> b) allow the user to define another clause which would be applied to the >>rows in the RETURNING set > > I think you could probably make the DELETE policy control what can get > deleted, but then have the SELECT policy further filter what gets > returned. +1 That's more intuitive to me because another security feature "privilege" works in that way, i.e., SELECT privilege not DELETE controls RETURNING. Another minor problem that I found is that pg_dump always fails when there is a row-level policy for update. I think that the attached patch should be applied. Regards, -- Fujii Masao diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index c56a4cb..16ebc12 100644 --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -2939,7 +2939,7 @@ dumpRowSecurity(Archive *fout, DumpOptions *dopt, RowSecurityInfo *rsinfo) cmd = "SELECT"; else if (strcmp(rsinfo->rseccmd, "a") == 0) cmd = "INSERT"; - else if (strcmp(rsinfo->rseccmd, "u") == 0) + else if (strcmp(rsinfo->rseccmd, "w") == 0) cmd = "UPDATE"; else if (strcmp(rsinfo->rseccmd, "d") == 0) cmd = "DELETE"; -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: plpgsql - Assert statement
2014-10-17 9:14 GMT+02:00 Petr Jelinek : > On 16/10/14 13:29, Pavel Stehule wrote: > >> Hi >> >> 2014-10-14 22:57 GMT+02:00 Petr Jelinek > >> Short review of the patch. Note that this has nothing to do with >> actual assertions, at the moment it's just enhancement of RAISE >> statement to make it optionally conditional. As I was one of the >> people who voted for it I do think we want this and I like the syntax. >> >> Code applies cleanly, seems formatted according to project standards >> - there is unnecessary whitespace added in variable declaration part >> of exec_stmt_raise which should be removed. >> >> >> fixed >> >> >> Passes make check, I would prefer to have little more complex >> expression than just "true" in the new regression test added for >> this feature. >> >> >> fixed >> >> >> Did some manual testing, seems to work as advertised. >> >> There are no docs at the moment which is the only show-stopper that >> I can see so far. >> >> >> fixed >> >> > Great, looks good to me, marking as ready for committer. Thank you very much Pavel > > > -- > Petr Jelinek http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Training & Services >
Re: [HACKERS] pg_receivexlog --status-interval add fsync feedback
On 17 October 2014 09:55, wrote: >>A new parameter to send feedback should be called --feedback >>A second parameter to decide whether to fsync should be called --fsync > > I think keep using "--reply-fsync" and "--fsync-interval" is better than make > new options. > Thought? We already have hot_standby_feedback, so using the name feedback is best idea. I am suggesting that we send feedback even if we do not fsync, to allow the master to track our progress. Hence the name of the second parameter was just fsync. So both names were suggested because of links to those terms already being used for similar reasons elsewhere in Postgres. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
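Put together, the invocation being proposed in this subthread would look something like the following; --feedback and --fsync are only the names suggested above, not existing options:

    pg_receivexlog -D /path/to/walarchive --status-interval=10 --feedback --fsync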
[HACKERS] [PATCH] add ssl_protocols configuration option
The attached patches add an ssl_protocols configuration option which control which versions of SSL or TLS the server will use. The syntax is similar to Apache's SSLProtocols directive, except that the list is colon-separated instead of whitespace-separated, although that is easy to change if it proves unpopular. Summary of the patch: - In src/backend/libpq/be-secure.c: - Add an SSLProtocols variable for the option. - Add a function, parse_SSL_protocols(), that parses an ssl_protocols string and returns a bitmask suitable for SSL_CTX_set_options(). - Change initialize_SSL() to call parse_SSL_protocols() and pass the result to SSL_CTX_set_options(). - In src/backend/utils/misc/guc.c: - Add an extern declaration for SSLProtocols. - Add an entry in the ConfigureNamesString array for the ssl_protocols option. - In src/backend/utils/misc/postgresql.conf.sample: - Add a sample ssl_protocols line. - In doc/src/sgml/config.sgml: - Document the ssl_protocols option. The file names are slightly different in 9.5, since be-secure.c was split in two and the declaration was moved into libpq.h. The default is "ALL:-SSLv2" in 9.0-9.3 and "ALL:-SSL" in 9.4 and up. This corresponds to the current hardcoded values, so the default behavior is unchanged, but the admin now has the option to select a different settings, e.g. if a serious vulnerability is found in TLS 1.0. diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 6ee17d8..7233a73 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -1027,6 +1027,34 @@ include_dir 'conf.d' + + ssl_protocols (string) + + ssl_protocols configuration parameter + + + +Specifies a colon-separated list of SSL protocols that are +allowed to be used on secure connections. Each entry in the list can +be prefixed by a + (add to the current list) +or - (remove from the current list). If no prefix is used, +the list is replaced with the specified protocol. + + +The full list of supported protocols can be found in the +the openssl manual page. In addition to the names of +individual protocols, the following keywords can be +used: ALL (all protocols supported by the underlying +crypto library), SSL (all supported versions +of SSL) and TLS (all supported versions +of TLS). + + +The default is ALL:-SSL. + + + + ssl_ciphers (string) diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c index b05364c..f440b77 100644 --- a/src/backend/libpq/be-secure-openssl.c +++ b/src/backend/libpq/be-secure-openssl.c @@ -87,6 +87,7 @@ static int verify_cb(int, X509_STORE_CTX *); static void info_cb(const SSL *ssl, int type, int args); static void initialize_ecdh(void); static const char *SSLerrmessage(void); +static long parse_SSL_protocols(const char *str); /* are we in the middle of a renegotiation? 
*/ static bool in_ssl_renegotiation = false; @@ -245,15 +246,16 @@ be_tls_init(void) SSLerrmessage(; } - /* set up ephemeral DH keys, and disallow SSL v2/v3 while at it */ + /* set up ephemeral DH keys */ SSL_CTX_set_tmp_dh_callback(SSL_context, tmp_dh_cb); - SSL_CTX_set_options(SSL_context, - SSL_OP_SINGLE_DH_USE | - SSL_OP_NO_SSLv2 | SSL_OP_NO_SSLv3); + SSL_CTX_set_options(SSL_context, SSL_OP_SINGLE_DH_USE); /* set up ephemeral ECDH keys */ initialize_ecdh(); + /* set up the allowed protocol list */ + SSL_CTX_set_options(SSL_context, parse_SSL_protocols(SSLProtocols)); + /* set up the allowed cipher list */ if (SSL_CTX_set_cipher_list(SSL_context, SSLCipherSuites) != 1) elog(FATAL, "could not set the cipher list (no valid ciphers available)"); @@ -1053,3 +1055,106 @@ SSLerrmessage(void) snprintf(errbuf, sizeof(errbuf), _("SSL error code %lu"), errcode); return errbuf; } + + +/* + * Parse the SSL allowed protocol list + * + * The logic here is inverted. OpenSSL does not take a list of + * protocols to use, but a list of protocols to avoid. We use the + * same bits with the opposite meaning, then invert the result. + */ + +#define SSL_PROTO_SSLv2 SSL_OP_NO_SSLv2 +#define SSL_PROTO_SSLv3 SSL_OP_NO_SSLv3 +#define SSL_PROTO_SSL (SSL_PROTO_SSLv2 | SSL_PROTO_SSLv3) +#define SSL_PROTO_TLSv1 SSL_OP_NO_TLSv1 +#ifdef SSL_OP_NO_TLSv1_1 +#define SSL_PROTO_TLSv1_1 SSL_OP_NO_TLSv1_1 +#else +#define SSL_PROTO_TLSv1_1 0 +#endif +#ifdef SSL_OP_NO_TLSv1_2 +#define SSL_PROTO_TLSv1_2 SSL_OP_NO_TLSv1_2 +#else +#define SSL_PROTO_TLSv1_2 0 +#endif +#define SSL_PROTO_TLS (SSL_PROTO_TLSv1 | SSL_PROTO_TLSv1_1 | SSL_PROTO_TLSv1_2) +#define SSL_PROTO_ALL (SSL_PROTO_SSL | SSL_PROTO_TLS) +#define SSL_PROTO_NONE 0 + +#define str_is_token(str, tok, len) \ + (len == sizeof(tok) - 1 && pg_strncasecmp(str, tok, len) == 0) + +static long +parse_SSL_