Re: [HACKERS] unique index violation after pg_upgrade to PG10
On Tue, Oct 24, 2017 at 01:30:19PM -0500, Justin Pryzby wrote:
> On Tue, Oct 24, 2017 at 01:27:14PM -0500, Kenneth Marshall wrote:
> > On Tue, Oct 24, 2017 at 01:14:53PM -0500, Justin Pryzby wrote:
> > > Note:
> > > I run a script which does various combinations of ANALYZE/VACUUM (FULL/ANALYZE)
> > > following the upgrade, and a script runs nightly with REINDEX and pg_repack
> > > (and a couple of CLUSTER), so you should assume that any combination of those
> > > maintenance commands have been run.
> > >
> > > In our reindex/repack log I found the first error due to duplicates:
> > > Tue Oct 24 01:27:53 MDT 2017: sites: sites_idx(repack non-partitioned)...
> > > WARNING: Error creating index "public"."index_61764": ERROR: could not create unique index "index_61764"
> > > DETAIL: Key (site_office, site_location)=(CRCLMT-DOEMS0, 1120) is duplicated.
> > > WARNING: Skipping index swapping for "sites", since no new indexes built
> > > WARNING: repack failed for "sites_idx"
> > > reindex: warning, dropping invalid/unswapped index: index_61764
> >
> > Hi Justin,
> >
> > This sounds like a pg_repack bug and not a PostgreSQL bug. What version are
> > you running?
>
> Really? pg_repack "found" and was victim to the duplicate keys, and rolled
> back its work. The CSV logs clearly show that our application INSERTed rows
> which are duplicates.
>
> [pryzbyj@database ~]$ rpm -qa pg_repack10
> pg_repack10-1.4.2-1.rhel6.x86_64
>
> Justin

Hi Justin,

I just dealt with a similar problem with pg_repack and a PostgreSQL 9.5 DB,
the exact same error. It seemed to be caused by a tuple visibility issue that
allowed the "working" unique index to be built, even though a duplicate row
existed. Then the next pg_repack would fail with the error you got. In our
case I needed the locality of reference to keep the DB performance acceptable,
and it was not a critical issue if there was a duplicate. We would remove the
duplicates if we had a failure. We never had a problem with the non-pg_repack
scenarios.

Regards,
Ken

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] unique index violation after pg_upgrade to PG10
On Tue, Oct 24, 2017 at 01:14:53PM -0500, Justin Pryzby wrote:
> I upgraded another instance to PG10 yesterday and this AM found unique key
> violations.
>
> Our application is SELECTing FROM sites WHERE site_location=$1, and if it
> doesn't find one, INSERTs one (I know that's racy and not ideal). We ended up
> with duplicate sites, despite a unique index. We removed the duplicate rows
> and reindexed fine. This is just a heads up with all the detail I can fit in a
> mail (but there's more if needed).
> ...
> Note:
> I run a script which does various combinations of ANALYZE/VACUUM (FULL/ANALYZE)
> following the upgrade, and a script runs nightly with REINDEX and pg_repack
> (and a couple of CLUSTER), so you should assume that any combination of those
> maintenance commands have been run.
>
> In our reindex/repack log I found the first error due to duplicates:
> Tue Oct 24 01:27:53 MDT 2017: sites: sites_idx(repack non-partitioned)...
> WARNING: Error creating index "public"."index_61764": ERROR: could not create unique index "index_61764"
> DETAIL: Key (site_office, site_location)=(CRCLMT-DOEMS0, 1120) is duplicated.
> WARNING: Skipping index swapping for "sites", since no new indexes built
> WARNING: repack failed for "sites_idx"
> reindex: warning, dropping invalid/unswapped index: index_61764

Hi Justin,

This sounds like a pg_repack bug and not a PostgreSQL bug. What version are
you running?

Regards,
Ken
Re: [HACKERS] Hash Functions
On Wed, Aug 16, 2017 at 05:58:41PM -0400, Tom Lane wrote:
> Robert Haas writes:
> > Attached is a quick sketch of how this could perhaps be done (ignoring
> > for the moment the relatively-boring opclass pushups). It introduces
> > a new function hash_any_extended which differs from hash_any() in that
> > (a) it combines both b and c into the result and (b) it accepts a seed
> > which is mixed into the initial state if it's non-zero.
> >
> > Comments?
>
> Hm. Despite the comment at lines 302-304, I'm not sure that we ought
> to do this simply by using "b" as the high order bits. AFAICS that
> exposes little or no additional randomness; in particular it seems
> unlikely to meet Jenkins' original design goal that "every 1-bit and
> 2-bit delta achieves avalanche". There might be some simple way to
> extend the existing code to produce a mostly-independent set of 32 more
> bits, but I wonder if we wouldn't be better advised to just keep Jenkins'
> code as-is and use some other method entirely for producing the
> 32 new result bits.
>
> ... In fact, on perusing the linked-to page
> http://burtleburtle.net/bob/hash/doobs.html
> Bob says specifically that taking b and c from this hash does not
> produce a fully random 64-bit result. He has a new one that does,
> lookup3.c, but probably he hasn't tried to make that bit-compatible
> with the 1997 algorithm.

Hi,

The updated hash functions that we currently use are based on Bob Jenkins'
lookup3.c, and using b as the high-order bits is pretty darn good. I had
lobbied to present the 64-bit b+c hash in the original work for similar
reasons. We are definitely not using a lookup2.c version from 1997.

Regards,
Ken
Re: [HACKERS] WIP: Data at rest encryption
On Wed, Jun 14, 2017 at 12:04:26PM +0300, Aleksander Alekseev wrote:
> Hi Ants,
>
> On Tue, Jun 13, 2017 at 09:07:49AM -0400, Peter Eisentraut wrote:
> > On 6/12/17 17:11, Ants Aasma wrote:
> > > I'm curious if the community thinks this is a feature worth having?
> > > Even considering that security experts would classify this kind of
> > > encryption as a checkbox feature.
> >
> > File system encryption already exists and is well-tested. I don't see
> > any big advantages in re-implementing all of this one level up. You
> > would have to touch every single place in PostgreSQL backend and tool
> > code where a file is being read or written. Yikes.
>
> I appreciate your work, but unfortunately I must agree with Peter.
>
> On Linux you can configure full disk encryption using LUKS / dm-crypt
> in like 5 minutes [1]. On FreeBSD you can do the same using geli [2].
> In my personal opinion PostgreSQL is already complicated enough. A few
> companies that hired system administrators that are too lazy to read
> two or three man pages is not a reason to re-implement file system
> encryption (or compression, or mirroring if that matters) in any open
> source RDBMS.

Hi Aleksander,

While I agree that configuring full disk encryption is not technically
difficult, it requires much more privileged access to the system and
basically requires the support of a system administrator. In addition, if
a volume is not available for encryption, PostgreSQL support for encryption
would still allow its data to be encrypted and, as others have mentioned,
can be enabled by the DBA alone.

Regards,
Ken
Re: [HACKERS] Hash Functions
On Fri, May 12, 2017 at 02:23:14PM -0400, Robert Haas wrote:
> What about integers? I think we're already assuming two's-complement
> arithmetic, which I think means that the only problem with making the
> hash values portable for integers is big-endian vs. little-endian.
> That sounds solvable-ish.

xxHash produces identical hashes independent of big-endian versus
little-endian architecture.

https://github.com/Cyan4973/xxHash

Regards,
Ken
Re: [HACKERS] CTE inlining
On Wed, May 03, 2017 at 02:33:05PM -0300, Alvaro Herrera wrote:
> David Fetter wrote:
> > When we add a "temporary" GUC, we're taking on a gigantic burden.
> > Either we support it forever somehow, or we put it on a deprecation
> > schedule immediately and expect to be answering questions about it for
> > years after it's been removed.
> >
> > -1 for the GUC.
>
> Absolutely.
>
> So ISTM we have three choices:
>
> 1) We switch unmarked CTEs to inlineable by default in pg11. What seems
> likely to happen for a user that upgrades to pg11 is that 5 out of 10
> CTE-using queries are going to become faster than with pg10, and they
> are going to be happy; 4 out of 10 are going to see no difference, but
> they didn't have to do anything about it; and the remaining query is
> going to become slower, either indistinguishably so (in which case they
> don't care and they remain happy because of the other improvements) or
> notably so, in which case they can easily figure where to add the
> MATERIALIZED option and regain the original performance.
>
> 2) Unmarked CTEs continue to be an optimization barrier, but we add
> "WITH INLINED" so that they're inlineable. Some users may wonder about
> it and waste a lot of time trying to figure out which CTEs to add it to.
> They see a benefit in half the queries, which makes them happy, but they
> are angry that they had to waste all that time on the other queries.
>
> 3) We don't do anything, because we all agree that GUCs are not
> suitable. No progress. No anger, but nobody is happy either.

+1 for option 1. I just finished rewriting a well-written CTE query to
avoid the optimization fence and get reasonable performance.

Regards,
Ken
Re: [HACKERS] bytea_output output of base64
On Thu, Feb 23, 2017 at 03:52:46PM -0800, David Fetter wrote:
> On Thu, Feb 23, 2017 at 05:55:37PM -0500, Bruce Momjian wrote:
> > On Thu, Feb 23, 2017 at 04:08:58PM -0500, Tom Lane wrote:
> > > Bruce Momjian writes:
> > > > Is there a reason we don't support base64 as a bytea_output output
> > > > option, except that no one has implemented it?
> > >
> > > How about "we already have one too many bytea output formats"?
> > > I don't think forcing code to try to support still another one
> > > is a great thing ... especially not if it couldn't be reliably
> > > distinguished from the hex format.
> >
> > Is there a reason we chose hex over base64?
>
> Whether there was or not, there's not a compelling reason now to break
> people's software. When people want compression, methods a LOT more
> effective than base64 are common. Gzip, for example.
>
> Best,
> David.

First, hex encoding is very simple to perform. Second, most applications
have routines to handle it trivially. And third, base64 encoding has
padding constraints that can complicate its processing. As David suggests,
if you want compact, run it through lz4/gzip/lzop... for a much better
size return.

Regards,
Ken
Re: [HACKERS] Hash Indexes
On Mon, Sep 19, 2016 at 12:14:26PM +0530, Amit Kapila wrote:
> On Mon, Sep 19, 2016 at 11:20 AM, Mark Kirkwood wrote:
> > On 17/09/16 06:38, Andres Freund wrote:
> > > On 2016-09-16 09:12:22 -0700, Jeff Janes wrote:
> > > > On Thu, Sep 15, 2016 at 7:23 AM, Andres Freund wrote:
> > > > > One earlier question about this is whether that is actually a
> > > > > worthwhile goal. Are the speed and space benefits big enough in the
> > > > > general case? Could those benefits not be achieved in a more
> > > > > maintainable manner by adding a layer that uses a btree over
> > > > > hash(columns), and adds appropriate rechecks after heap scans?
> > > > >
> > > > > Note that I'm not saying that hash indexes are not worthwhile, I'm
> > > > > just doubtful that question has been explored sufficiently.
> > > >
> > > > I think that exploring it well requires good code. If the code is good,
> > > > why not commit it?
> > >
> > > Because getting there requires a lot of effort, debugging it afterwards
> > > would take effort, and maintaining it would also take a fair amount?
> > > Adding code isn't free.
> > >
> > > I'm rather unenthused about having a hash index implementation that's
> > > mildly better in some corner cases, but otherwise doesn't have much
> > > benefit. That'll mean we'll have to step up our user education a lot,
> > > and we'll have to maintain something for little benefit.
> >
> > While I see the point of what you are saying here, I recall all previous
> > discussions about hash indexes tended to go a bit like this:
> >
> > - until WAL logging of hash indexes is written it is not worthwhile trying
> >   to make improvements to them
> > - WAL logging will be a lot of work, patches 1st please
> >
> > Now someone has done that work, and we seem to be objecting that because
> > they are not improved then the patches are (maybe) not worthwhile.
>
> I think saying hash indexes are not improved after the proposed set of
> patches is an understatement. The read performance has improved by
> more than 80% as compared to HEAD [1] (refer data in Mithun's mail).
> Also, tests by Mithun and Jesper have indicated that in multiple
> workloads, they are better than BTREE by 30~60% (in fact Jesper
> mentioned that he is seeing 40~60% benefit on a production database,
> Jesper correct me if I am wrong.). I agree that when the index column is
> updated they are much worse than btree as of now, but no work has been
> done to improve it and I am sure that it can be improved for those cases
> as well.
>
> In general, I thought the tests done till now are sufficient to prove
> the importance of the work, but if Andres and others still have doubts
> and they want to test some specific cases, then sure we can do more
> performance benchmarking.
>
> Mark, thanks for supporting the case for improving Hash Indexes.
>
> [1] - https://www.postgresql.org/message-id/CAD__OugX0aOa7qopz3d-nbBAoVmvSmdFJOX4mv5tFRpijqH47A%40mail.gmail.com
>
> --
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com

+1

Throughout the years, I have seen benchmarks that demonstrated the
performance advantages of even the initial hash index (without WAL) over
the btree-over-hash variant. It is pretty hard to dismiss the O(1) versus
O(log(n)) difference. There are classes of problems for which a hash index
is the best solution. Lack of WAL has hamstrung development in those areas
for years.

Regards,
Ken
Re: [HACKERS] README of hash index
On Fri, Sep 16, 2016 at 04:50:53PM +0530, Amit Kapila wrote:
> Currently the README of the hash module contains algorithms written in
> the form below.
>
> The insertion algorithm is rather similar:
>
>     pin meta page and take buffer content lock in shared mode
>     loop:
>         compute bucket number for target hash key
>         release meta page buffer content lock
>         if (correct bucket page is already locked)
>             break
>         release any existing bucket page lock (if a concurrent split happened)
>         take heavyweight bucket lock in shared mode
>         retake meta page buffer content lock in shared mode
>     -- (so far same as reader)
>     release pin on metapage
>     ...
>
> I have mostly updated them in the patches I have proposed to improve
> hash indexes. However, each time I try to update them, I find that it
> is easier to follow the code than to read and understand the existing
> algorithm written in the above form from the README.
>
> Do others find it useful to maintain the algorithms in the above form?

Hi Amit,

Speaking for myself, I think it does help to have a text description of the
algorithm to use as a guide while trying to understand the code. For
beginners (me), it is not always obvious what a section of code is doing.

Regards,
Ken
[HACKERS] Hash index with larger hashes
Hello Developers,

I have been following the recent discussions on increasing the size of the
hash function used in PostgreSQL and the work to provide WAL logging and
performance improvements for hash indexes. I know it was mentioned when we
moved to the new hashing functions, but the existing functions do provide an
additional 32 bits of hash. We currently do not use them, but they are
already calculated.

It had occurred to me that one way to decrease the space used to store the
hash value would be to use information about the page number to determine
the actual value. For example, a hash index of 65k pages (540MB) would get
two additional bytes of hash with no associated storage cost. Also, if you
subdivided the hash page into, say, 128 sub-pages you would get the extra
2 bytes of hash in a 4MB index. In this way, the bigger the hash index is,
the more bits you get for free. Just wanted to throw it out there.

Regards,
Ken
Re: [HACKERS] pg_basebackup wish list
On Tue, Jul 12, 2016 at 11:06:39AM -0700, Jeff Janes wrote:
> On Tue, Jul 12, 2016 at 10:48 AM, Peter Eisentraut wrote:
> > On 7/12/16 12:53 PM, Jeff Janes wrote:
> > > The --help message for pg_basebackup says:
> > >
> > >     -Z, --compress=0-9     compress tar output with given compression level
> > >
> > > But -Z0 is then rejected as 'invalid compression level "0"'. The real
> > > docs do say 1-9, only the --help message has this bug. Trivial patch
> > > attached.
> >
> > pg_dump --help and man page say it supports 0..9. Maybe we should make
> > that more consistent.
>
> pg_dump actually does support -Z0, though. Well, sort of. It outputs
> plain text. Rather than plain text wrapped in some kind of dummy gzip
> header, which is what I had naively expected.
>
> Is that what -Z0 in pg_basebackup should do as well, just output
> uncompressed tar data, and not add the ".gz" to the "base.tar" file
> name?
>
> Cheers,
>
> Jeff

Hi,

Yes, please support the no-compression option. It can be useful in
situations where the bottleneck is the compression itself (quite easily
done with zlib-based options; another plug for a higher-performance
option).

Regards,
Ken
Re: EOL for 8.2 (was Re: [HACKERS] Formatting Curmudgeons WAS: MMAP Buffers)
On Thu, Apr 21, 2011 at 06:04:09PM +0100, Dave Page wrote:
> On Thu, Apr 21, 2011 at 5:59 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> > [ man, this thread has totally outlived its title, could we change
> >   that?  I'll start with this subtopic ]
> >
> > Robert Haas <robertmh...@gmail.com> writes:
> > > In fact, I've been wondering if we shouldn't consider extending the
> > > support window for 8.2 past the currently-planned December 2011.
> > > There seem to be quite a lot of people running that release precisely
> > > because the casting changes in 8.3 were so painful, and I think the
> > > incremental effort on our part to extend support for another year
> > > would be reasonably small.  I guess the brunt of the work would
> > > actually fall on the packagers.  It looks like we've done 5 point
> > > releases of 8.2.x in the last year, so presumably if we did decide to
> > > extend the EOL date by a year or so that's about how much incremental
> > > effort would be needed.
> >
> > I agree that the incremental effort would not be so large, but what
> > makes you think that the situation will change given another year?
> > My expectation is that'd just mean people will do nothing about
> > migrating for a year longer.
> >
> > More generally: it took a lot of argument to establish the current EOL
> > policy, and bending it the first time anyone feels any actual pain will
> > pretty much destroy the whole concept.
>
> It would also make at least one packager very unhappy as the 8.2 Windows
> build is by far the hardest and most time consuming to do and I happen
> to know he's been counting the days until it goes.
>
> More generally, keeping it for longer means we might end up supporting 6
> major releases at once. That may not be so much work on a day to day
> basis, but it adds up to a lot at release times, which was one of the
> reasons why we agreed on the 5 year support window.
>
> --
> Dave Page
> Blog: http://pgsnake.blogspot.com
> Twitter: @pgsnake
> EnterpriseDB UK: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

+1 for cutting the cord on 8.2. People still using it will need to use the
last release available, upgrade, or consult to have a back-port/build made.

Regards,
Ken
Re: [HACKERS] Correctly producing array literals for prepared statements
On Wed, Feb 23, 2011 at 09:34:06AM -0600, Merlin Moncure wrote:
> On Tue, Feb 22, 2011 at 10:16 PM, Peter Geoghegan <peter.geoghega...@gmail.com> wrote:
> > I'm investigating the possibility of developing a utility function for
> > our C++ client library, libpqxx, that produces array literals that can
> > be used in prepared statements. This problem appears to be a bit of a
> > tar pit, so I'm hoping that someone can help me out. My goal is to
> > produce a template function that accepts arbitrarily nested standard
> > library containers, that contain at the most nested level
> > constants/literals of some type that can be fed into a stream, such as
> > an int or a std::string.
> >
> > I'm aware that I cannot assume that types are delimited by a single
> > quote, even for built-in types. I thought that I would put the onus on
> > the client to specify the correct delimiter, by checking pg_type
> > themselves if necessary, but default to ','. Is this a reasonable
> > approach?
> >
> > Escaping/quoting individual elements seems tricky. I have produced a
> > generic and superficially well-behaved implementation by using double
> > quotes for constants. However, I have now opened the door to malicious
> > parties injecting multiple array elements where only one is allowed,
> > or causing malformed array literal errors by simply including a double
> > quote of their own. It's not clear where the responsibility should
> > rest for escaping constants/ensuring that constants don't contain
> > double quotes. Can someone suggest a better approach? I can't very
> > well use single quotes, because they are escaped/doubled up when we
> > pass the array literal to something similar to PQexecPrepared(), and
> > they shouldn't be -- strings end up looking like this: 'has errant
> > single quotes on either side'.
>
> You can send nested arrays safely. You just have to be very formal about
> escaping *everything* both as you get it and as it goes into the
> container. This is what postgres does on the backend as it sends arrays
> out the door in text. It might be instructive to see what the server
> does in terms of escaping. Note that the way this works it's not
> impossible to see 128+ consecutive backslashes when dealing with arrays
> of composites.
>
> > Since Postgres only supports encodings that are ASCII supersets, I
> > don't believe that I have to consider encoding -- only my clients do.
> > Can someone please point me in the direction of an established client
> > library/driver where all corner cases are covered, or at least enough
> > of them to produce a net gain in usefulness? There may well be
> > additional subtleties that have not occurred to me.
>
> yes: libpqtypes. it manages everything in binary. i've been thinking for
> a while that libpqtypes could be wrapped with variadic templates or
> other c++ trickery. Because libpqtypes does everything in binary, it
> completely sidesteps all the escaping nastiness.
>
> merlin

Avoiding the escaping by using binary parameter transmission is the best
method. Shameless plug: libpqtypes is great! I hope that it can eventually
be included in the core distribution. It is not uncommon to get an "It's
an add-on package???" reaction, and avoidance of pieces outside of the
standard distribution regardless of their value.

Regards,
Ken
Re: [HACKERS] Correctly producing array literals for prepared statements
On Wed, Feb 23, 2011 at 03:34:45PM -0500, Andrew Chernow wrote:
> On 2/23/2011 3:06 PM, Peter Geoghegan wrote:
> > On 23 February 2011 15:34, Merlin Moncure <mmonc...@gmail.com> wrote:
> > > You can send nested arrays safely. You just have to be very formal
> > > about escaping *everything* both as you get it and as it goes into
> > > the container. This is what postgres does on the backend as it sends
> > > arrays out the door in text. It might be instructive to see what the
> > > server does in terms of escaping. Note that the way this works it's
> > > not impossible to see 128+ consecutive backslashes when dealing with
> > > arrays of composites.
> >
> > Sounds tedious.
>
> It is tedious, which is one reason why libpqtypes went binary. There are
> some compelling performance reasons as well that affect both client and
> server. libpqtypes was originally developed to serve a very particular
> need and wasn't aiming to be general purpose. That came about along the
> way trying to solve the problem.
>
> Personally, PQexec is dead to me, as well as text results from a C/C++
> app. I see no advantage over libpqtypes in that context. Unless I am
> missing your ultimate goal, you'd probably get what you want by wrapping
> libpqtypes.

The performance is one of the big reasons to use binary parameters.
Converting/packing/transmitting/unpacking/converting uses a lot of CPU
resources on both the server and the client, in addition to the larger
communication resources needed by the text-based methods.

Ken
Re: [HACKERS] LIKE, CHAR(), and trailing spaces
On Wed, Feb 02, 2011 at 07:48:38PM -0500, Bruce Momjian wrote:
> Brendan Jurd wrote:
> > On 3 February 2011 10:54, Bruce Momjian <br...@momjian.us> wrote:
> > > It seems LIKE is considering the trailing CHAR(10) field spaces as
> > > significant, even though our documentation says:
> > > -- snip --
> > >
> > > It says trailing spaces are not significant for character comparisons
> > > --- the real question is whether LIKE is a comparison.  Obviously '='
> > > is a comparison, but the system does not treat LIKE as a comparison
> > > in terms of trailing spaces.  Is that desired behavior?
> >
> > Interesting. I would have to say that from the user point of view,
> > LIKE is definitely a comparison, and if the rest of the operators on
> > bpchar ignore whitespace then LIKE ought to as well.
> >
> > Is the situation the same for regex matches (~ operators)?
>
> Yes, I think so:
>
> test=> SELECT 'a'::char(10) ~ 'a$';
>  ?column?
> ----------
>  f
> (1 row)
>
> test=> SELECT 'a'::char(10) ~ 'a *$';
>  ?column?
> ----------
>  t
> (1 row)
>
> --
> Bruce Momjian <br...@momjian.us>  http://momjian.us
> EnterpriseDB                      http://enterprisedb.com

In my mind LIKE/~ are pattern matching operators and not simple comparison
operators. PostgreSQL is doing the right thing in restricting the somewhat
bizarre treatment of trailing spaces to the '=' comparison. I can only
imagine what would be needed to allow exceptions to the pattern matching
syntax to let you actually work with and match the trailing spaces
otherwise.

+10 for leaving the behavior as is.

Regards,
Ken
Re: [HACKERS] Why is sorting on two columns so slower than sorting on one column?
On Thu, Dec 23, 2010 at 02:33:12AM -0500, Jie Li wrote:
> Hi,
>
> Here is the test table,
>
> postgres=# \d big_wf
>      Table "public.big_wf"
>  Column |  Type   | Modifiers
> --------+---------+-----------
>  age    | integer |
>  id     | integer |
>
> postgres=# \dt+ big_wf
>                       List of relations
>  Schema |  Name  | Type  |  Owner   |  Size  | Description
> --------+--------+-------+----------+--------+-------------
>  public | big_wf | table | workshop | 142 MB |
>
> The first query sorting on one column:
>
> postgres=# explain analyze select * from big_wf order by age;
>  Sort  (cost=565525.45..575775.45 rows=410 width=8)
>        (actual time=11228.155..16427.149 rows=410 loops=1)
>    Sort Key: age
>    Sort Method: external sort  Disk: 72112kB
>    ->  Seq Scan on big_wf  (cost=0.00..59142.00 rows=410 width=8)
>        (actual time=6.196..4797.620 rows=410 loops=1)
>  Total runtime: 19530.452 ms
> (5 rows)
>
> The second query sorting on two columns:
>
> postgres=# explain analyze select * from big_wf order by age,id;
>  Sort  (cost=565525.45..575775.45 rows=410 width=8)
>        (actual time=37544.779..48206.702 rows=410 loops=1)
>    Sort Key: age, id
>    Sort Method: external merge  Disk: 72048kB
>    ->  Seq Scan on big_wf  (cost=0.00..59142.00 rows=410 width=8)
>        (actual time=6.796..5518.663 rows=410 loops=1)
>  Total runtime: 51258.000 ms
> (5 rows)
>
> The version is 9.0.1 and the work_mem is 20MB. One special thing is, the
> first column (age) of all the tuples has the same value, so the second
> column (id) is always needed for comparison. While the first sort takes
> only about 6 seconds, the second one takes over 30 seconds. Is this much
> more than expected? Is there any possible optimization?
>
> Thanks,
> Li Jie

Hi Li,

If I understand your description, in the first query the sort does not
actually have to do anything because the column values for age are all
degenerate. In the second query, you actually need to sort the values,
which is why it takes longer. If the first column values are the same,
then simply sorting by id alone would be faster. You could also bump up
work_mem for the query to perform the sort in memory.

Regards,
Ken
Re: [HACKERS] Why is sorting on two columns so slower than sorting on one column?
On Thu, Dec 23, 2010 at 10:19:46PM +0800, Li Jie wrote:
> Hi Ken,
>
> Thanks for your tips! Yes it is the case, and I ran another query
> sorting on the second column, whose values are random.
>
> postgres=# explain analyze select * from big_wf order by id;
>  Sort  (cost=565525.45..575775.45 rows=410 width=8)
>        (actual time=25681.875..36458.824 rows=410 loops=1)
>    Sort Key: id
>    Sort Method: external merge  Disk: 72048kB
>    ->  Seq Scan on big_wf  (cost=0.00..59142.00 rows=410 width=8)
>        (actual time=8.595..5569.500 rows=410 loops=1)
>
> Now the sorting takes about 20 seconds, so it seems reasonable compared
> to 30 seconds, right? But one thing I'm confused about is, why is the
> additional comparison really so expensive? Does it incur additional I/O?
> From the cost model, it seems not -- all the costs are the same
> (575775.45).
>
> Thanks,
> Li Jie

In the first query, the cost is basically the I/O cost to read the table
from disk. The actual sort does not do anything since the sort values are
the same. In the second query, the sort has to swap things in memory/disk
to get them in the correct order for the result. This actually takes CPU
and possibly additional I/O, which is why it is slower. In the case of
sorting by just the id column, the size of the sorted values is smaller,
which would need fewer batches to complete the sort since the sort is
bigger than work_mem.

Cheers,
Ken
Re: [HACKERS] Why is sorting on two columns so slower than sorting on one column?
On Thu, Dec 23, 2010 at 10:42:26PM +0800, Li Jie wrote:

----- Original Message -----
From: Kenneth Marshall k...@rice.edu
To: Li Jie jay23j...@gmail.com
Cc: pgsql-hackers pgsql-hackers@postgresql.org
Sent: Thursday, December 23, 2010 10:30 PM
Subject: Re: [HACKERS] Why is sorting on two columns so much slower than sorting on one column?

In the case of sorting by just the id column, the size of the sorted values is smaller, which would need fewer batches to complete the sort, since the sort is bigger than work_mem.

Cheers,
Ken

Hi Ken,

Thanks for your analysis. But in the last query that sorts on id, since the query selects all the columns for output, the actual sorted size is the same, and the only difference is the comparison cost. The query sorting on two columns needs to do twice the comparison. Am I right?
Thanks,
Li Jie

I think you are right. Sorry for the confusion.

Ken
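Li Jie's point — the sorted rows are the same size either way, and the extra cost is in the comparisons — can be made concrete by counting element comparisons during a sort. A toy model (a hypothetical Counted wrapper, not the server's tuplesort):

```python
import random

calls = 0  # global count of element-level comparisons

class Counted:
    """Wraps a value and counts comparisons made against it."""
    def __init__(self, v):
        self.v = v
    def __eq__(self, other):
        global calls
        calls += 1
        return self.v == other.v
    def __lt__(self, other):
        global calls
        calls += 1
        return self.v < other.v

random.seed(1)
ids = [random.randrange(10 ** 6) for _ in range(2000)]

calls = 0
sorted([(Counted(i),) for i in ids])                  # ORDER BY id
single = calls

calls = 0
sorted([(Counted(30), Counted(i)) for i in ids])      # ORDER BY age, id
double = calls

# The constant leading column forces an extra (always-equal) comparison
# before every id comparison, so the two-column sort does more work
# even though the output ordering is identical.
assert double > single
```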
Re: [HACKERS] [FeatureRequest] Base Convert Function
On Tue, Dec 21, 2010 at 11:28:17PM +0200, Pavel Golub wrote:

Hello, Pavel. You wrote:

PS Hello
PS On 21 December 2010 21:11, Tom Mudruňka to...@mudrunka.cz wrote:

Thx for your answers :-) Well... I know that I can write my own plugin and I am familiar with C, so this is not the problem, but I think that such a feature should be implemented directly in PgSQL, because there are already functions for converting to/from base 16, so why not make this more flexible and generalize it to any other radix? It's quite simple to do, and I don't see any reason why 16 should be there and 8, 32 or 36 shouldn't :-)

PS * It isn't a typical or frequent request,
PS * There are no hard barriers to a custom implementation,
PS * You can use plperl or plpython based solutions,
PS * It's not part of ANSI SQL

But MySQL has such a function. What's wrong with us? ;)

You are not really helping to make a good case... :)

Ken
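The generalization being requested — to_hex extended to any radix — really is a few lines in any language, which supports the "it's quite simple to do" claim. A hypothetical sketch (Python here, not the proposed SQL function):

```python
DIGITS = '0123456789abcdefghijklmnopqrstuvwxyz'

def to_base(n, base):
    """Render integer n in the given radix, 2..36 (to_hex, generalized)."""
    if not 2 <= base <= len(DIGITS):
        raise ValueError('base must be between 2 and 36')
    if n == 0:
        return '0'
    sign, n = ('-', -n) if n < 0 else ('', n)
    out = []
    while n:
        n, r = divmod(n, base)      # peel off the least significant digit
        out.append(DIGITS[r])
    return sign + ''.join(reversed(out))
```

For example, to_base(255, 16) gives 'ff', matching to_hex, while the same code handles base 8, 32, or 36.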
Re: [HACKERS] Extensions, patch v20 (bitrot fixes)
On Mon, Dec 20, 2010 at 02:10:39PM -0500, Tom Lane wrote:

David Fetter da...@fetter.org writes:
On Mon, Dec 20, 2010 at 08:01:42PM +0100, Martijn van Oosterhout wrote:
I think you mean Unicode is not a superset of all character sets. I've heard this before but never found what's missing. [citation needed]?

Windows-1252, ISO-2022-JP-2 and EUC-TW are such encodings.

[citation needed]? Exactly what characters are missing, and why would the Unicode people have chosen to leave them out? It's not like they've not heard of those encodings, I'm sure.

regards, tom lane

Here is an interesting description of some of the gotchas: http://en.wikipedia.org/wiki/Windows-1252

Regards,
Ken
Re: [HACKERS] Extensions, patch v20 (bitrot fixes)
On Mon, Dec 20, 2010 at 03:08:48PM -0500, Tom Lane wrote:

Kenneth Marshall k...@rice.edu writes:
On Mon, Dec 20, 2010 at 02:10:39PM -0500, Tom Lane wrote:
[citation needed]? Exactly what characters are missing, and why would the Unicode people have chosen to leave them out? It's not like they've not heard of those encodings, I'm sure.

Here is an interesting description of some of the gotchas: http://en.wikipedia.org/wiki/Windows-1252

Well, it's interesting, but I see no glyphs on that page that lack Unicode assignments.

regards, tom lane

You are correct. I misread the text.

Regards,
Ken
Re: [HACKERS] Default mode for shutdown
On Wed, Dec 15, 2010 at 09:39:12AM -0500, Tom Lane wrote:

Magnus Hagander mag...@hagander.net writes:
I'm sure this has been up before, but hey, let's take it another round. Why don't we change the default shutdown mode for pg_ctl from smart to fast? I've never come across a single use case where smart is what people *want*...

Really? Personally I'm quite happy with that default.

regards, tom lane

+1. I think the default is perfect. Even if fast is the mode people want, it should be requested explicitly each time, to verify that a more destructive shutdown is intended. If it is really an issue, a script or shell alias can be defined to perform the more aggressive shutdown.

Regards,
Ken
Re: [HACKERS] Why percent_rank is so slower than rank?
On Thu, Dec 09, 2010 at 05:18:57PM -0500, Tom Lane wrote:

I wrote:
We're throwing away one tuple at a time as we advance forward through the tuplestore, and moving 10+ tuple pointers each time. Ugh. This code was all right when written, because (IIRC) the mergejoin case was actually the only caller. But it's not all right for WindowAgg's less-predictable usage patterns.

I thought for a bit about changing things around so that the first-used tuple slot isn't necessarily state->memtuples[0], but just like the comment says, that complicates a lot of other logic. And there isn't any easy place to reclaim the wasted slots later. What seems like the best bet is to put in a heuristic to make tuplestore_trim simply not do anything until nremove reaches some reasonably large amount, perhaps 10% of the number of stored tuples. This wastes up to 10% of the allotted memory, but that seems tolerable.

On reflection I think just not doing anything isn't a very good idea. The problem with that is that a mis-coded caller could try to fetch tuples that it had already told the tuplestore could be trimmed away; and this would work, most of the time, until you got unlucky and the trim operation had actually deleted them. I think it's pretty important for bug-catching purposes that the tuplestore enforce that those tuples are not available anymore. Hence the attached patch, which combines the two ideas by recycling tuples immediately but not sliding the pointer array until a reasonable amount of movement has occurred. This fixes the complained-of performance problem AFAICT.

I'm not sure whether or not to back-patch this into 9.0 and 8.4. The code in tuplestore.c hasn't changed at all since 8.4, so there's not much risk of cross-version bugs, but if I did miss anything we could be shipping a buggy version next week. Thoughts?

regards, tom lane

+1 for back-patching.
Ken
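Tom's combined approach — recycle each trimmed tuple immediately (so stale reads fail fast) but pay the O(n) pointer-array slide only once enough slots accumulate — can be modeled roughly like this (a toy simplification of tuplestore.c, with invented names):

```python
class LazyTrimStore:
    """Toy model: free removed entries at once, slide the array lazily."""
    TRIM_FRACTION = 0.10  # slide only when >=10% of slots are reclaimable

    def __init__(self, tuples):
        self.memtuples = list(tuples)
        self.base = 0      # logical index of memtuples[0]
        self.freed = 0     # leading slots recycled but not yet slid away

    def trim(self, upto):
        """Logically discard all tuples with index < upto."""
        # Recycle tuples immediately, so a miscoded caller is caught...
        for i in range(self.base + self.freed, upto):
            self.memtuples[i - self.base] = None
        self.freed = max(self.freed, upto - self.base)
        # ...but pay the O(n) slide only once enough has accumulated.
        if self.freed >= self.TRIM_FRACTION * len(self.memtuples):
            del self.memtuples[:self.freed]
            self.base += self.freed
            self.freed = 0

    def get(self, i):
        if i < self.base:
            raise LookupError('tuple was trimmed')
        t = self.memtuples[i - self.base]
        if t is None:
            raise LookupError('tuple was trimmed')  # bug-catching behavior
        return t

store = LazyTrimStore(range(100))
store.trim(5)                 # 5 < 10%: recycled, but no slide yet
assert store.get(50) == 50
store.trim(20)                # 20 >= 10%: now the array slides
assert store.get(20) == 20
```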
Re: [HACKERS] Suggesting a libpq addition
On Mon, Dec 06, 2010 at 10:14:55AM -0500, Merlin Moncure wrote:

On Mon, Dec 6, 2010 at 9:55 AM, Marc Balmer m...@msys.ch wrote:
On 06.12.10 15:37, Merlin Moncure wrote:
On Sun, Dec 5, 2010 at 5:10 AM, Magnus Hagander mag...@hagander.net wrote:
On Sun, Dec 5, 2010 at 10:22, Marc Balmer m...@msys.ch wrote:

I am suggesting adding a function to libpq:

PGresult *PQvexec(PGconn *conn, const char *fmt, ...);

It behaves similar to PQexec, but it allows for printf style varargs and

How is that not a horrible idea, compared to using PQexecParams()? You have to remember to do all your escaping and things manually, whereas PQexecParams() does it automatically.

It's only horrible if you stick to printf style formatting and you are using string techniques to inject parameters into the query. Non-parameterized queries should obviously be discouraged. However, it's entirely possible to wrap the parameterized interfaces with a vararg interface (I should know, because we did exactly that) :-). This gives you the best of both worlds: easy coding without sacrificing safety. You might not remember the libpqtypes proposal, but libpq was specifically extended with callbacks so that libpqtypes could exist, after the community determined that libpqtypes was too big a change to the libpq library. I think ultimately this should be revisited, with libpqtypes going in core or something even richer... I've been thinking for a while that postgres types should be abstracted out of the backend into a library that both client and server depend on.

With libpqtypes, we decided to use postgres style format markers:

select PQexecf(conn, "select %int4 + %int8", an_int, a_bigint);

Everything is schema qualified, so that user types are supported (of course, this requires implementing handling on the client). Data is routed through the binary protocol, with all the byte swapping etc. handled by the library. No escaping necessary.
We also added full support for arrays and composites, which are a nightmare to deal with over straight libpq, and various other niceties like thread-safe error handling.

That would be a *HUGE* piece of software compared to the relatively small thing I am suggesting...

well, it's already written. All you would have to do is compile it.

As for escaping (or not escaping) of string arguments, that can be seen as a bug or a feature. I do not want automatic escaping of string arguments in all cases, e.g. I might construct an SQL statement with dynamic parts like "WHERE xy" or "AND a = b".

libpqtypes doesn't escape at all. It uses the internal parameterized interfaces that don't require it. For particular types, like bytea and timestamps, this is much faster because we use the binary wire format. Less load on the client and the server.

hypothetical example:

    filter = "WHERE name like 'Balmer%'";
    if (sort == SORT_DESC)
        sort = "ORDER BY name DESCENDING";
    PQvexec(conn, "SELECT name, nr, id FROM address %s%s", filter, sort);

So what I am aiming at right now is a PQvexec() function that basically has printf() like semantics, but adds an additional token to the format string (printf uses %s and %b to produce strings). I am thinking of adding %S and %B, which produce strings that are escaped. That would be a small function, and reasonably safe. Or rather, the safety is in the hands of the programmer.

What you are suggesting doesn't provide a lot of value over sprintf-ing the query first, then exec-ing it. You can do what you are suggesting yourself, wrapping PQexec. A hypothetical wrapper would be implemented something like:

    va_list ap;
    char buf[BUFSZ];

    va_start(ap, query);
    vsnprintf(buf, BUFSZ, query, ap);
    va_end(ap);
    return PQexec(conn, buf);

This is a bad idea (security, escaping, performance)... we wrote a faster, safer way to do it, with richer type support. Or you can do it yourself.

merlin

I have used the libpqtypes library and it is very easy to use.
+1 for adding it or something like it to the PostgreSQL core. I have people who will try to roll their own because it does not come with the core. While it is a hoot to see what reinventing the wheel produces, it is also prone to mistakes.

Regards,
Ken
Re: [HACKERS] Report: Linux huge pages with Postgres
On Sat, Nov 27, 2010 at 02:27:12PM -0500, Tom Lane wrote:

We've gotten a few inquiries about whether Postgres can use huge pages under Linux. In principle that should be more efficient for large shmem regions, since fewer TLB entries are needed to support the address space. I spent a bit of time today looking into what that would take. My testing was done with current Fedora 13, kernel version 2.6.34.7-61.fc13.x86_64 --- it's possible some of these details vary across other kernel versions.

You can test this with fairly minimal code changes, as illustrated in the attached not-production-grade patch. To select huge pages we have to include SHM_HUGETLB in the flags for shmget(), and we have to be prepared for failure (due to permissions or lack of allocated hugepages). I made the code just fall back to a normal shmget on failure. A bigger problem is that the shmem request size must be a multiple of the system's hugepage size, which is *not* a constant even though the test patch just uses 2MB as the assumed value. For a production-grade patch we'd have to scrounge the active value out of someplace in the /proc filesystem (ick).

I would expect that you can just iterate through the size possibilities pretty quickly and use the first one that works -- no /proc groveling.

In addition to the code changes there are a couple of sysadmin requirements to make huge pages available to Postgres:

1. You have to configure the Postgres user as a member of the group that's permitted to allocate hugepage shared memory. I did this:

    sudo sh -c "id -g postgres > /proc/sys/vm/hugetlb_shm_group"

For production use you'd need to put this in the PG initscript, probably, to ensure it gets re-set after every reboot and before PG is started.

Since it would take advantage of them automatically, this would be just a normal DBA/admin task.

2. You have to manually allocate some huge pages --- there doesn't seem to be any setting that says just give them out on demand.
I did this:

    sudo sh -c "echo 600 > /proc/sys/vm/nr_hugepages"

which gave me a bit over 1GB of space reserved as huge pages. Again, this'd have to be done over again at each system boot.

Same.

For testing purposes, I figured that what I wanted to stress was postgres process swapping and shmem access. I built current git HEAD with --enable-debug and no other options, and tested with these non-default settings:

    shared_buffers        1GB
    checkpoint_segments   50
    fsync                 off

(fsync intentionally off since I'm not trying to measure disk speed). The test machine has two dual-core Nehalem CPUs. Test case is pgbench at -s 25; I ran several iterations of "pgbench -c 10 -T 60 bench" in each configuration. And the bottom line is: if there's any performance benefit at all, it's on the order of 1%. The best result I got was about 3200 TPS with hugepages, and about 3160 without. The noise in these numbers is more than 1%, though. This is discouraging; it certainly doesn't make me want to expend the effort to develop a production patch. However, perhaps someone else can try to show a greater benefit under some other test conditions.

regards, tom lane

I would not really expect to see much benefit in the region that the normal TLB page size would cover with the typical number of TLB entries. 1GB of shared buffers would not be enough to cause TLB thrashing with most processors. Bump it to 8-32GB or more, and if the queries use up TLB entries with local work_mem, you should see more value in the patch.

Regards,
Ken
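The "scrounging" Tom mentions is just reading the Hugepagesize line out of /proc/meminfo. A sketch of that parsing (Python for illustration; a production patch would do this in C, and the sample text below is invented):

```python
SAMPLE_MEMINFO = """\
MemTotal:       16326496 kB
HugePages_Total:     600
Hugepagesize:       2048 kB
"""

def hugepage_size_kb(meminfo_text):
    """Return the active huge page size in kB, or None if not reported."""
    for line in meminfo_text.splitlines():
        if line.startswith('Hugepagesize:'):
            return int(line.split()[1])   # field layout: name, value, unit
    return None

# On a real Linux system: hugepage_size_kb(open('/proc/meminfo').read())
assert hugepage_size_kb(SAMPLE_MEMINFO) == 2048
```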
Re: [HACKERS] unlogged tables
On Wed, Nov 17, 2010 at 02:16:06PM -0500, Tom Lane wrote:

Robert Haas robertmh...@gmail.com writes:
On Wed, Nov 17, 2010 at 1:46 PM, Tom Lane t...@sss.pgh.pa.us wrote:
Robert is probably going to object that he wanted to prevent any fsyncing for unlogged tables, but the discussion over in pgsql-general is crystal clear that people do NOT want to lose unlogged data over a clean shutdown and restart. If all it takes to do that is to refrain from lobotomizing the checkpoint logic for unlogged tables, I say we should refrain.

I think that's absolutely a bad idea.

The customer is always right, and I think we are hearing loud and clear what the customers want. Please let's not go out of our way to create a feature that isn't what they want.

regards, tom lane

I would be fine with only having a safe shutdown with unlogged tables and skipping the checkpoint I/O at all other times.

Cheers,
Ken
Re: [HACKERS] Protecting against unexpected zero-pages: proposal
On Tue, Nov 09, 2010 at 02:05:57PM -0500, Robert Haas wrote:

On Tue, Nov 9, 2010 at 12:31 PM, Greg Stark gsst...@mit.edu wrote:
On Tue, Nov 9, 2010 at 5:06 PM, Aidan Van Dyk ai...@highrise.ca wrote:
So, for getting checksums, we have to offer up a few things:
1) zero-copy writes; we need to buffer the write to get a consistent checksum (or lock the buffer tight)
2) saving hint bits on an otherwise unchanged page. We either need to just not write that page, and lose the work the hint bits did, or do a full-page WAL of it, so the torn-page checksum is fixed

Actually the consensus the last go-around on this topic was to segregate the hint bits into a single area of the page and skip them in the checksum. That way we don't have to do any of the above. It's just that that's a lot of work.

And it still allows silent data corruption, because bogusly clearing a hint bit is, at the moment, harmless, but bogusly setting one is not. I really have to wonder how other products handle this. PostgreSQL isn't the only database product that uses MVCC - not by a long shot - and the problem of detecting whether an XID is visible to the current snapshot can't be ours alone. So what do other people do about this? Either they don't cache the information about whether the XID is committed in-page (in which case, are they just slower, or do they have some other means of avoiding the performance hit?) or they cache it in the page (in which case, they either WAL-log it or they don't checksum it). I mean, there aren't any other options, are there?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

That would imply that we need to have a CRC for just the hint-bit section, or some type of ECC calculation that can detect bad hint bits independent of the CRC for the rest of the page.

Regards,
Ken
Re: [HACKERS] Hash support for arrays
On Thu, Nov 04, 2010 at 10:00:40AM +0000, Dean Rasheed wrote:

On 3 November 2010 09:24, Nicolas Barbier nicolas.barb...@gmail.com wrote:
2010/11/2 Kenneth Marshall k...@rice.edu:
Given that our hash implementation mixes the input data well (It does. I tested it.), then a simple rotate-and-xor method is all that should be needed to maintain all of the needed information. The original hash function has done the heavy lifting in this case.

Even with the perfect hash function for the elements, certain combinations of elements could still lead to massive collisions. E.g., if repeated values are typical in the input data we are talking about, then the rotate-and-xor method would still lead to collisions between any arrays of the same values of certain lengths, regardless of the value. In Tom's implementation, as he mentioned before, those problematical lengths would be multiples of 32 (e.g., an array of 32 1s would collide with an array of 32 2s would collide with an array of 32 3s, etc.).

Yeah, rotate-and-xor is a pretty weak hashing algorithm, since any array of 32 identical elements will hash to either 0 or -1. Similarly, various permutations or multiples of that array length will cause it to perform badly. The multiply-by-m algorithm doesn't have that weakness, provided m is chosen carefully. There are a couple of qualities a good algorithm should possess:

1). The bits from the individual element hash values should be distributed evenly, so that no 2 different hash values would result in the same contribution to the final value. This is easy to achieve - just make sure that m is odd.

2). The way that each element's hash value bits are distributed should be different from the way that every other element's hash value bits are distributed. m=31 achieves this pretty well, although there are plenty of other equally valid choices.
Regards,
Dean

Hi Dean,

In my comment yesterday, I included a simple function that would allow us to leverage our current hash function's mixing to scramble the bits effectively and retain the maximum amount of information in the hash.

Regards,
Ken
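Dean's "0 or -1" observation is easy to verify: after 32 rotate-by-one steps, each bit of the result is the XOR of all 32 bits of the repeated element hash, i.e. its parity. A quick check of both combining schemes (hypothetical helper names, 32-bit arithmetic):

```python
MASK = 0xFFFFFFFF

def combine_rotxor(hashes):
    """Rotate the running value left by 1, then XOR in each element hash."""
    h = 0
    for x in hashes:
        h = (((h << 1) & MASK) | (h >> 31)) ^ x
    return h

def combine_mul31(hashes):
    """h = h*31 + x, the Java-style multiply-by-m scheme under discussion."""
    h = 0
    for x in hashes:
        h = (h * 31 + x) & MASK
    return h

# 32 copies of *any* value collapse to 0 or 0xFFFFFFFF under rotate-and-xor:
assert combine_rotxor([5] * 32) == 0       # popcount(5) is even -> all zeros
assert combine_rotxor([7] * 32) == MASK    # popcount(7) is odd  -> all ones
# ...while the multiplier keeps distinct repeated values apart:
assert combine_mul31([5] * 32) != combine_mul31([7] * 32)
```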
Re: [HACKERS] Hash support for arrays
On Wed, Nov 03, 2010 at 10:24:16AM +0100, Nicolas Barbier wrote:

2010/11/2 Kenneth Marshall k...@rice.edu:
Given that our hash implementation mixes the input data well (It does. I tested it.), then a simple rotate-and-xor method is all that should be needed to maintain all of the needed information. The original hash function has done the heavy lifting in this case.

Even with the perfect hash function for the elements, certain combinations of elements could still lead to massive collisions. E.g., if repeated values are typical in the input data we are talking about, then the rotate-and-xor method would still lead to collisions between any arrays of the same values of certain lengths, regardless of the value. In Tom's implementation, as he mentioned before, those problematical lengths would be multiples of 32 (e.g., an array of 32 1s would collide with an array of 32 2s would collide with an array of 32 3s, etc.).

Nicolas

True. I just took another look at our defined hash functions, and it looks like we can make a simple variant of hash_uint32() that we can use as a stream checksum. The only thing missing is the ability to pass in the current 32-bit hash value as a starting seed to add the next 32-bit value. Something like this should work:

Datum
hash_uint32(uint32 k, uint32 initval)
{
    register uint32 a, b, c;

    a = b = c = 0x9e3779b9 + (uint32) sizeof(uint32) + 3923095 + initval;
    a += k;

    final(a, b, c);

    /* report the result */
    return UInt32GetDatum(c);
}

Then if you pass in the current value as the initval, it should mix each additional 32-bit hash value well.

Regards,
Ken
Re: [HACKERS] Hash support for arrays
On Tue, Nov 02, 2010 at 04:42:19PM -0400, Tom Lane wrote:

Robert Haas robertmh...@gmail.com writes:
On Tue, Nov 2, 2010 at 11:21 AM, Tom Lane t...@sss.pgh.pa.us wrote:
Really? I think "I don't understand when this fails" isn't obviously better than being able to predict when it fails ...

Isn't that the whole point of hash functions? Collisions are inevitable, but you want them to be unpredictable. If you want a hash function with predictable collision behavior, just hash the first element. It'll be highly predictable AND wicked fast.

That seems like a rather poor straw man, since it suffers from exactly the defect I'm complaining about, namely failing to consider all the input values equally.

I'd be happier about this approach if there were some actual theory behind it ... maybe there's some analysis out there, but the one link that was given was just about entirely unconvincing.

I think it's from Knuth, though unfortunately I don't have a copy to check. There are probably better algorithms out there, but this one's pretty simple.

I don't see anything in Knuth suggesting a multiplier of 31. His recommendation for a multiplier, if you're going to use multiplicative hashing, is wordsize/phi (phi being the golden ratio) ... and he also wants you to keep the high-order, not the low-order, bits of the product. However, this is largely beside the point, because that theory, as well as the Java code you're arguing from, has to do with the initial hashing of a raw sequence of input items, not with combining some existing hash values. The rotate-and-xor method I suggested for that is borrowed exactly from section 6.4 of Knuth (page 512, in the first edition of volume 3).

regards, tom lane

Given that our hash implementation mixes the input data well (It does. I tested it.), then a simple rotate-and-xor method is all that should be needed to maintain all of the needed information. The original hash function has done the heavy lifting in this case.
Regards,
Ken
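Tom's description of Knuth's multiplicative scheme — multiply by wordsize/phi and keep the high-order bits — in executable form. Note that the multiplier works out to the same 0x9e3779b9 golden-ratio constant that appears in the hash_uint32() sketch earlier in this thread (illustrative code, not PostgreSQL's):

```python
# Knuth multiplicative hashing: h(k) = top bits of (k * A) mod 2^32,
# with A = floor(2^32 / phi), phi being the golden ratio.
PHI = (1 + 5 ** 0.5) / 2
A = int(2 ** 32 / PHI)              # 2654435769 == 0x9e3779b9

def mult_hash(k, bits):
    """Hash k into a table of 2**bits slots, keeping the HIGH-order bits."""
    return ((k * A) & 0xFFFFFFFF) >> (32 - bits)

assert A == 0x9e3779b9
# Consecutive keys scatter instead of landing in consecutive slots:
assert mult_hash(1, 8) != mult_hash(2, 8)
```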
Re: [HACKERS] PostgreSQL and HugePage
On Wed, Oct 20, 2010 at 10:10:00AM -0400, Tom Lane wrote:

Robert Haas robertmh...@gmail.com writes:
On Tue, Oct 19, 2010 at 11:30 PM, daveg da...@sonic.net wrote:
On Wed, Oct 20, 2010 at 04:08:37PM +1300, Mark Kirkwood wrote:
Heh - provided you specify SHM_HUGETLB in the relevant call, that is :-)

I had a patch for this against 8.3 that I could update if there is any interest. I suspect it is helpful.

I think it would be a good feature. Of course, we would need appropriate documentation, and some benchmarks showing that it really works.

I believe that for the equivalent Solaris option, we just automatically enable it when available. So there'd be no need for user documentation. However, I definitely *would* like to see some benchmarks proving that the change actually does something useful. I've always harbored the suspicion that this is just a knob to satisfy people who need knobs to frob.

regards, tom lane

Oracle apparently uses hugepages if they are available, by first trying the call with the SHM_HUGETLB option; if it fails, they reissue the call without it. This article mentions some of the benefits of larger page sizes with large shared memory regions: http://appcrawler.com/wordpress/?p=686

Regards,
Ken
Re: [HACKERS] Why do we have a database specification in .pgpass?
On Thu, Oct 14, 2010 at 06:09:35AM +0200, Dennis Björklund wrote:

We have a database specification in .pgpass:

hostname:port:database:username:password

What is the purpose of 'database', since username/password combinations are global, not per database? I would like to document its purpose. There is the GUC parameter db_user_namespace. Just for that you could probably use den...@foo as the username instead, so maybe that's not the purpose. But I can't think of any other reason.

/Dennis

This allows the same user to save, and automatically use, different passwords for each separate database.

Cheers,
Ken
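Ken's per-database use of the field might look like this in practice — a hypothetical .pgpass fragment (invented hostname and passwords; the `*` wildcard matches any database):

```
# hostname:port:database:username:password
db.example.com:5432:sales:appuser:sales-secret
db.example.com:5432:hr:appuser:hr-secret
db.example.com:5432:*:appuser:fallback-secret
```

libpq uses the first line whose fields all match the connection being attempted, so the more specific per-database lines must precede the wildcard.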
Re: [HACKERS] Another Modest Proposal: Platforms
On Wed, Sep 22, 2010 at 01:17:54PM -0700, David Fetter wrote:

Folks,

While it's interesting to note, in an historical sense, that a platform most recently updated when 1999 was still in the future, I think it's time we did a little pruning. We can start by supporting only platforms git runs on, this being the first in what I'd picture as a set of base requirements. What say?

Cheers,
David.

Given the amount of trouble I had getting git onto a Solaris 8 system, I am not too keen on this definition of "platform". PostgreSQL runs very well on the same system, along with SVN and CVS.

Cheers,
Ken
Re: [HACKERS] Why is time with timezone 12 bytes?
On Wed, Sep 22, 2010 at 10:54:53PM +0100, Thom Brown wrote:

On 22 September 2010 22:01, Josh Berkus j...@agliodbs.com wrote:
All,

I was just checking on our year-2027 compliance, and happened to notice that time with time zone takes up 12 bytes. This seems peculiar, given that timestamp with time zone is only 8 bytes, and by my count we only need 5 for the time with microsecond precision. What's up with that? Also, what is the real range of our 8-byte *integer* timestamp?

The time is 8 bytes (1,000,000 microseconds * 60 minutes * 24 hours = 1,440,000,000 microseconds = 31 bits = 8 bytes).

31 bits = approx. 4 bytes at 8 bits/byte, not 8 bytes.

The timezone displacement takes up to 12 bits, meaning 3 bytes (1460 + 1459 = 2919 = 12 bits = 3 bytes). So that's 11 bytes. Not sure where the extra 1 byte comes from.

This would yield 7 bytes.

Ken
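The arithmetic is easier to audit written out. Note the thread's 1,440,000,000 figure drops a factor of 60 (seconds per minute); and as I understand the on-disk layout, the 12 bytes come from the field widths, an int64 time plus an int32 zone offset, not from the minimum bit counts (a check, with the struct layout being my assumption):

```python
# Microseconds in a day: the time-of-day field must span this range.
us_per_day = 1_000_000 * 60 * 60 * 24
assert us_per_day == 86_400_000_000
assert us_per_day.bit_length() == 37      # 37 bits of payload, i.e. 5 bytes

# The zone-displacement range quoted in the thread:
zone_values = 1460 + 1459
assert zone_values == 2919
assert zone_values.bit_length() == 12     # fits in 12 bits, i.e. 2 bytes

# But the stored fields are a 64-bit time plus a 32-bit zone offset,
# so the datum is 8 + 4 = 12 bytes, matching what Josh observed.
assert 8 + 4 == 12
```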
Re: [HACKERS] beta3 the open items list
On Sun, Jun 20, 2010 at 03:01:04PM -0500, Kevin Grittner wrote:

Joshua D. Drake wrote:
Can someone tell me what we are going to do about firewalls that impose their own rules outside of the control of the DBA?

Has anyone actually seen a firewall configured for something so stupid as to allow *almost* all the various packets involved in using a TCP connection, but which suppressed just keepalive packets? That seems to be what you're suggesting is the risk; it's an outlandish enough suggestion that I think the burden of proof is on you to show that it happens often enough to make this a worthless change.

-Kevin

I have seen this sort of behavior, but in every case it has been the result of a myopic view of firewall/iptables solutions to perceived attacks. While I do agree that having a heartbeat within the replication process is worthwhile, it should be 9.1 material at best. For 9.0, such ill-behaved environments will need much more interaction by the DBA, with monitoring and triage of problems as they arise.

Regards,
Ken

P.S. My favorite example of odd behavior was preemptively dropping TCP packets in one direction only, at a single port. Many, many odd things happen when the kernel does not know that the packet would never make it to its destination. Services would sometimes run for weeks without a problem, depending on when the port ended up being used, invariably at night or on the weekend.
Re: [HACKERS] BYTEA / DBD::Pg change in 9.0 beta
On Wed, May 19, 2010 at 10:54:01AM -0400, Robert Haas wrote:

On Wed, May 19, 2010 at 10:17 AM, Stefan Kaltenbrunner ste...@kaltenbrunner.cc wrote:
On 05/19/2010 08:13 AM, Tom Lane wrote:
Bernd Helmle maili...@oopsware.de writes:
--On 18 May 2010 23:20:26 +0200 Jesper Krogh jes...@krogh.cc wrote:
May I ask what the reason is for breaking the compatibility?

Efficiency, if I am allowed to call it that. The new hex representation should be more efficient to retrieve and to handle than the old one. I think bytea_output was set to hex for testing purposes in the first place, but I am not sure whether there was a consensus to leave it there finally.

Yeah, we intentionally set it that way initially to help find stuff that needs to be updated (as DBD::Pg evidently does). It's still TBD whether 9.0.0 will ship with that default or not.

given how much faster the new format is (or rather how slow the old one was), and the number of people I have seen complaining about why bytea is so slow, I would like to see it stay turned on by default. However, this also depends on how quickly database driver developers can adapt.

I would favor waiting a release to turn it on by default, precisely to give driver developers time to adapt.

Changing something like that within the minor release arc is not a good idea. It would be better to have it on by default; if the driver developers are not ready for it, they can treat that as a setting they will need to change when going to 9.0. I would be very upset to have a minor upgrade break my database. At least the major upgrades have more testing.

Regards,
Ken
Re: [HACKERS] BYTEA / DBD::Pg change in 9.0 beta
On Tue, May 18, 2010 at 03:26:17PM -0600, Alex Hunsaker wrote: On Tue, May 18, 2010 at 15:20, Jesper Krogh jes...@krogh.cc wrote: On 2010-05-18 23:12, Alex Hunsaker wrote: set bytea_output 'escape'; That was it. Knowing what the problem was, I had no problem finding it in the release notes. May I ask what the reason is for breaking the compatibility? There were a couple IIRC, the big ones being speed and size. I'd look at the archives for more. I imagine at some point DBD::Pg will handle this transparently. I also imagine Greg would happily accept patches :-) Yes, the new format is much faster, more space efficient, and uses less CPU to do the encoding. The older format caused the COPY for bytea to be CPU limited in many more situations. Regards, Ken
Re: [HACKERS] Generating Lots of PKs with nextval(): A Feature Proposal
Hi Peter, All you need to do is define your own sequence with an increment of 500. Look at: http://www.postgresql.org/docs/8.4/static/sql-createsequence.html Regards, Ken On Fri, May 14, 2010 at 02:56:18PM -0400, Peter Crabtree wrote: Recently, in preparation for migrating an application to postgres, I got to this part of the manual (which is *excellent* so far, by the way): http://www.postgresql.org/docs/8.4/interactive/functions-sequence.html A quick check with the folks on #postgresql confirmed my understanding, which was that the locking semantics of setval() and nextval() make this unsafe: SELECT setval('my_seq', nextval('my_seq') + 500); Now, I was reminded that I could simply do this: SELECT nextval('my_seq') FROM generate_series(1, 500); But of course then I would have no guarantee that I would get a contiguous block of ids, which means if I'm using this to do a mass insert of records which refer to each others' ids (example: storing a directed, linear graph), I either have to do a correlated update on the client side, after transferring the keys (consider the cost of doing this for a few million records - 4 MB in keys per million records, for, in extreme cases, 12 MB of data to be inserted -- 33% overhead in the worst case, presuming symmetric bandwidth), or I have to insert into a temporary table, then have the db backend do the update, then insert from there to the real table. Both are imperfect options in terms of performance and complexity. Thus, before I start work on it, I propose an extension to the current nextval(): SELECT nextval('my_seq', 500); This would increment the my_seq sequence by its interval * 500, and return the first valid key. 
This both makes client code that needs a bunch of PKs simpler to implement, and saves in performance, since the client can just replace all its PKs (presuming they're currently a contiguous block from 1 to n) with my_starting_pk + current_pk, so this:

 pk | next_node
----+-----------
  0 |         1
  1 |         2
  2 |         0

can be easily updated like this:

SELECT nextval('my_seq', (SELECT count(*) FROM my_table));
UPDATE my_table SET pk = currval('my_seq') + pk, next_node = currval('my_seq') + next_node;

to something like this:

   pk   | next_node
--------+-----------
 521650 |    521651
 521651 |    521652
 521652 |    521650

This is a net gain of performance and ease of implementation in many cases where a large number of ids from a sequence are needed -- with a small added benefit of the keys being guaranteed to be contiguous. I don't see any technical problems with this; postgres already can pre-allocate more than one key, but the number is semi-static (the CACHE parameter to CREATE SEQUENCE). This might break existing user code if they've defined a nextval(regclass, integer), but I don't see any way to Finally, I've checked sequence.c -- this looks pretty straightforward to implement, but I figured checking with this list was wise before starting work. Apologies if I've been overly wordy. Peter
[HACKERS] construct_array() use with PQexec with binary data
Dear PostgreSQL development community, I am working on adapting a regular PQexec() call to use binary transmission of the parameters. One of the parameters is an array of BIGINT. Looking in include/utils/array.h, it appears that construct_array() will do exactly what I need to get an array to pass in with the PQexec() call. Is there a library that includes that functionality? Or do I need to cobble it together from the various pieces of code? Thank you for any help. Regards, Ken
Re: [HACKERS] Can we still trust plperl?
On Thu, Mar 11, 2010 at 09:31:46AM -0500, Andrew Dunstan wrote: Last night my attention was drawn to this: http://search.cpan.org/~timb/PostgreSQL-PLPerl-Injector-1.002/lib/PostgreSQL/PLPerl/Injector.pm I'm wondering if we can reasonably continue to support plperl as a trusted language, or at least redefine what trusted actually means. Does it mean "can't do untrusted operations" or does it mean "can't do untrusted operations unless the DBA and/or possibly the user decide to subvert the mechanism"? To me, the latter doesn't sound much like it's worth having. Is it? There are a few places where plperl has an advantage over plpgsql, e.g. code that uses lots of regexes and use of variables to access records dynamically, so losing it might be a bit of a pain. Of course, there would still be plperlu, with the downside that the functions have to be installed by a superuser. One of my PGExperts colleagues told me his reaction was "Well, I might just as well use plperlu", and that pretty well sums up my reaction. Of course, another thing is that it might spur either building of some of the missing stuff into plpgsql, or addition of another language that is both safe and which supports them, like say PL/JavaScript. Thoughts? cheers andrew The DBA can do whatever he wants to subvert the system, up to installing hacked versions of any other trusted language, so I do not see much of a distinction. We already provide many other foot-guns that may be used by the DBA. pl/perl is very useful as a trusted language, but I am certainly for fleshing out the features in other pl-s. Regards, Ken
Re: [HACKERS] pgbouncer + psql 9.0a4
Hi Garick, Add an ignore_startup_parameters entry to your pgbouncer.ini file with application_name. Cheers, Ken On Fri, Feb 26, 2010 at 11:26:23AM -0500, Garick Hamlin wrote: I was just trying out 9.0a4 and I noticed that I can't connect to pgbouncer with psql from 9.0a4 as a result of the set application_name changes to psql. I imagine this is really mostly pgbouncer's problem, but I couldn't figure out from the psql man page whether it was possible to make psql not set application_name. If not, at least from a user's point of view, psql 9.0a4 looks incompatible with existing versions of pgbouncer, at least in this configuration that works with older versions of psql. ... or, is that a crazy way to look at it? Garick Details are below.. My setup (Solaris 10 / sparc ) [ psql 9.0a4 ] -> [ pgbouncer 1.3.1 ] -> [ postgres 9.0a4 ] $ /usr/local/var/postgres/9.0a4/bin/psql -h 127.0.0.1 -p 6543 psql: ERROR: Unknown startup parameter ..pgbouncer's log.. 2010-02-25 21:56:29.721 6979 WARNING C-71d48: (nodb)/(nouser)@127.0.0.1:54000 unsupported startup parameter: application_name=psql 2010-02-25 21:56:29.721 6979 LOG C-71d48: (nodb)/(nouser)@127.0.0.1:54000 closing because: Unknown startup parameter (age=0) 2010-02-25 21:56:29.722 6979 LOG C-71d48: (nodb)/(nouser)@127.0.0.1:54000 Pooler Error: Unknown startup parameter
Re: [HACKERS] Avoiding bad prepared-statement plans.
On Thu, Feb 18, 2010 at 08:31:05PM -0600, David Christensen wrote: On Feb 18, 2010, at 2:19 PM, Pierre C wrote: What about catching the error in the application and INSERT'ing into the current preprepare.relation table? The aim would be to do that in dev or in pre-prod environments, then copy the table content in production. Yep, but it's a bit awkward and time-consuming, and not quite suited to ORM-generated requests since you've got to generate all the plan names, when the SQL query itself would be the most convenient unique identifier... A cool hack would be something like that : pg_execute( SELECT ..., arguments... ) By inserting a hook which calls a user-specified function on a non-existing plan instead of raising an error, this could work. However, this wouldn't work as-is since the plan name must be <= NAMEDATALEN, but you get the idea ;) How about the SHA1 hash of the query? Hey, it works for git... :-) Regards, David -- David Christensen End Point Corporation da...@endpoint.com Hi David, Not to beat our own drum, but we already include a hashing function that can be used for this purpose and is much faster than SHA-1. We would want to use all 64 bits for this use instead of just the 32 bits we currently use for the internal DB hashing. Here is an article comparing the Jenkins' Hash (the one we use) and SHA-1: http://home.comcast.net/~bretm/hash/ Regards, Ken
Re: [HACKERS] Union test case broken in make check?
Without an order by, the order is not defined. The answers are the same but the test gives a false failure because of the lack of ordering. Regards, Ken On Thu, Nov 19, 2009 at 07:54:30PM -0500, Emmanuel Cecchet wrote: Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: Emmanuel Cecchet wrote: Is it just me or the union test case fails in CVS head? The buildfarm is pretty much all green: http://www.pgbuildfarm.org/cgi-bin/show_status.pl So it looks like it's you :-) When in doubt, try make distclean and a full rebuild before assuming you've got a problem worth tracking down ... Well, I did: 1. make distclean 2. configure with CFLAGS=-O0 --enable-cassert --enable-debug --without-perl --without-python --without-tcl --without-openssl 3. make (everything normal) 4. make check And it still fails for me. I am attaching my regression.diffs if someone thinks it is worth tracking down ... Emmanuel *** /home/manu/workspace/PG-HEAD/src/test/regress/expected/union.out 2009-02-09 16:18:28.0 -0500 --- /home/manu/workspace/PG-HEAD/src/test/regress/results/union.out 2009-11-19 19:37:32.0 -0500 *** *** 198,208 WHERE f1 BETWEEN 0 AND 100; five --- --1004.3 - -34.84 - -1.2345678901234e-200 0 123456 (5 rows) SELECT CAST(f1 AS char(4)) AS three FROM VARCHAR_TBL --- 198,208 WHERE f1 BETWEEN 0 AND 100; five --- 0 123456 + -34.84 + -1.2345678901234e-200 +-1004.3 (5 rows) SELECT CAST(f1 AS char(4)) AS three FROM VARCHAR_TBL *** *** 263,278 SELECT q2 FROM int8_tbl INTERSECT SELECT q1 FROM int8_tbl; q2 -- - 4567890123456789 123 (2 rows) SELECT q2 FROM int8_tbl INTERSECT ALL SELECT q1 FROM int8_tbl; q2 -- 4567890123456789 4567890123456789 - 123 (3 rows) SELECT q2 FROM int8_tbl EXCEPT SELECT q1 FROM int8_tbl ORDER BY 1; --- 263,278 SELECT q2 FROM int8_tbl INTERSECT SELECT q1 FROM int8_tbl; q2 -- 123 + 4567890123456789 (2 rows) SELECT q2 FROM int8_tbl INTERSECT ALL SELECT q1 FROM int8_tbl; q2 -- + 123 4567890123456789 4567890123456789 (3 rows) SELECT q2 FROM int8_tbl EXCEPT SELECT q1 
FROM int8_tbl ORDER BY 1; *** *** 305,320 SELECT q1 FROM int8_tbl EXCEPT ALL SELECT q2 FROM int8_tbl; q1 -- - 4567890123456789 123 (2 rows) SELECT q1 FROM int8_tbl EXCEPT ALL SELECT DISTINCT q2 FROM int8_tbl; q1 -- 4567890123456789 4567890123456789 - 123 (3 rows) -- --- 305,320 SELECT q1 FROM int8_tbl EXCEPT ALL SELECT q2 FROM int8_tbl; q1 -- 123 + 4567890123456789 (2 rows) SELECT q1 FROM int8_tbl EXCEPT ALL SELECT DISTINCT q2 FROM int8_tbl; q1 -- + 123 4567890123456789 4567890123456789 (3 rows) -- *** *** 341,348 SELECT q1 FROM int8_tbl INTERSECT SELECT q2 FROM int8_tbl UNION ALL SELECT q2 FROM int8_tbl; q1 --- - 4567890123456789 123 456 4567890123456789 123 --- 341,348 SELECT q1 FROM int8_tbl INTERSECT SELECT q2 FROM int8_tbl UNION ALL SELECT q2 FROM int8_tbl; q1 --- 123 + 4567890123456789 456 4567890123456789 123 *** *** 353,367 SELECT q1 FROM int8_tbl INTERSECT (((SELECT q2 FROM int8_tbl UNION ALL SELECT q2 FROM int8_tbl))); q1 -- - 4567890123456789 123 (2 rows) (((SELECT q1 FROM int8_tbl INTERSECT SELECT q2 FROM int8_tbl))) UNION ALL SELECT q2 FROM int8_tbl; q1 --- - 4567890123456789 123 456 4567890123456789 123 --- 353,367 SELECT q1 FROM int8_tbl INTERSECT (((SELECT q2 FROM int8_tbl UNION ALL SELECT q2 FROM int8_tbl))); q1 -- 123 + 4567890123456789 (2 rows) (((SELECT q1 FROM int8_tbl INTERSECT SELECT q2 FROM int8_tbl))) UNION ALL SELECT q2 FROM int8_tbl; q1 --- 123 + 4567890123456789
Re: [HACKERS] [PATCH] tsearch parser inefficiency if text includes urls or emails - new version
On Sun, Nov 08, 2009 at 05:00:53PM +0100, Andres Freund wrote: On Sunday 01 November 2009 16:19:43 Andres Freund wrote: While playing around/evaluating tsearch I noticed that to_tsvector is obscenely slow for some files. After some profiling I found that this is due to using a separate TSParser in p_ishost/p_isURLPath in wparser_def.c. If a multibyte encoding is in use, TParserInit copies the whole remaining input and converts it to wchar_t or pg_wchar - for every email or protocol prefixed url in the document. Which obviously is bad. I solved the issue by having a separate TParserCopyInit/TParserCopyClose which reuses the already converted strings of the original TParser - only at different offsets. Another approach would be to get rid of the separate parser invocations - requiring a bunch of additional states. This seemed more complex to me, so I wanted to get some feedback first. Without patch: andres=# SELECT to_tsvector('english', document) FROM document WHERE filename = '/usr/share/doc/libdrm-nouveau1/changelog'; ... (1 row) Time: 5835.676 ms With patch: andres=# SELECT to_tsvector('english', document) FROM document WHERE filename = '/usr/share/doc/libdrm-nouveau1/changelog'; ... (1 row) Time: 395.341 ms I'll clean up the patch if it seems like a sensible solution... As nobody commented, here is a corrected (stupid thinko) and cleaned up version. Does anyone care to comment, or am I the only one thinking this is an issue? Andres +1 As a user of tsearch, I can certainly appreciate the speed-up in parsing -- more CPU for everyone else. Regards, Ken
Re: [HACKERS] [PATCHES] updated hash functions for postgresql v1
On Wed, Oct 28, 2009 at 03:31:17PM -0400, Tom Lane wrote: Hannu Krosing ha...@2ndquadrant.com writes: I had never checked the docs for hash functions, but I had assumed that internal functions are prefixed by pg_ and anything else is public, free-to-use functionality. Sure, it's free to use. It's not free to assume that we promise never to change it. Changing hash functions also makes in-place upgrades a lot harder, as they can't be done incrementally anymore for tables which use hash indexes. Hash indexes are so far from being production-grade that this argument is not significant. regards, tom lane In addition, the change from 8.3 to 8.4 to store only the hash and not the value in the index means that a reindex would be required in any event. Cheers, Ken
Re: [HACKERS] Proposal: String key space for advisory locks
On Mon, Oct 26, 2009 at 06:35:13PM -0700, Christophe Pettus wrote: On Oct 26, 2009, at 5:24 PM, Itagaki Takahiro wrote: Hmmm, hashtext() returns int32. Can you reduce the collision issue if we had hashtext64()? That would certainly reduce the chance of a collision considerably, assuming the right algorithm. -- -- Christophe Pettus x...@thebuild.com The current hash function can already support generating a 64-bit hash value by using both the b and c values. The function is called hashlittle2 and has this comment in the original Bob Jenkins 2006 code:

/*
 * hashlittle2: return 2 32-bit hash values
 *
 * This is identical to hashlittle(), except it returns two 32-bit hash
 * values instead of just one. This is good enough for hash table
 * lookup with 2^^64 buckets, or if you want a second hash if you're not
 * happy with the first, or if you want a probably-unique 64-bit ID for
 * the key. *pc is better mixed than *pb, so use *pc first. If you want
 * a 64-bit value do something like *pc + (((uint64_t)*pb)<<32).
 */

This should be a simple change. It would be worth running it by the developer community to figure out how to add this functionality in a way that will make 64-bit hashes available easily to other code in the DB, perhaps even using them in very large hash indexes. Regards, Ken
Re: [HACKERS] Rejecting weak passwords
On Thu, Oct 01, 2009 at 03:54:37PM +0200, Magnus Hagander wrote: On Thu, Oct 1, 2009 at 15:26, Albe Laurenz laurenz.a...@wien.gv.at wrote: Andrew Dunstan wrote: So here's the patch. I don't think there is documentation required; correct me if I am wrong. How will people know how to use it, or that it's even there without at least a note in the docs somewhere? I'd prefer to have an example as a contrib module, as well as docs. Quite apart from anything else, how the heck would we test it without such a thing? I was not sure because no other hooks were documented anywhere else than in the code. I could add a paragraph in the auth-password section of client-auth.sgml. Or is there a better place? I could easily write a simple contrib that adds a check for username = password if there is interest. I think it's better to have an actually *useful* contrib module for it, if there is one. Meaning perhaps something that links to that cracklib thing mentioned upthread. +1 for a sample module that will allow cracklib to drop in. Cheers, Ken
Re: [HACKERS] Rejecting weak passwords
On Thu, Oct 01, 2009 at 01:07:04PM -0400, Tom Lane wrote: Magnus Hagander mag...@hagander.net writes: On Thu, Oct 1, 2009 at 17:24, Tom Lane t...@sss.pgh.pa.us wrote: I agree with the subsequent comments suggesting a sample module that actually does something useful --- although if it's going to link to external code like cracklib, it probably is going to have to be on pgfoundry not in contrib. Why is that? we have plenty of other things in contrib that rely on external code, for example the uuid, xml or ssl stuff. Well, maybe. I was concerned about availability, portability, license compatibility, and so on. The bar's a lot lower for pgfoundry projects on all those points ... regards, tom lane It has been a while since I last used cracklib, but the interface is generic enough that the sample we would ship in contrib could be trivially adapted to use cracklib. The version we include could just have the (username = password) check or something similar. Regards, Ken
Re: [HACKERS] Another try at reducing repeated detoast work for PostGIS
On Sat, Aug 22, 2009 at 12:39:41PM -0400, Tom Lane wrote: Mark Cave-Ayland mark.cave-ayl...@siriusit.co.uk writes: So in conclusion, I think that patch looks good and that the extra time I was seeing was due to RECHECK being applied to the operator, and not the time being spent within the index scan itself. Thanks, I appreciate the followup. I plan to go ahead and apply the patch to HEAD --- it doesn't conflict with Heikki's pending patch AFAICS, and no one has suggested an alternative that seems likely to get implemented soon. I am a bit tempted to apply it to 8.4 as well; otherwise the PostGIS people are likely to start cluttering their code with this add-a-dummy-function workaround, which would be unproductive in the long run. Comments? regards, tom lane +1 for applying it to 8.4 as well. Cheers, Ken
Re: [HACKERS] Fixing geometic calculation
On Fri, Aug 07, 2009 at 11:29:47PM +1000, Paul Matthews wrote: Let us consider the ordering of real numbers in postgres. As you can see from the results below it has clearly returned the correct results. select( 1. = 1.0002 ); = f select( 1. < 1.0002 ); = t select( 1. > 1.0002 ); = f Imagine the situation however where postgres returned the following values to simple numerical inequalities. In such a case postgresql would be clearly defective and unfit for purpose. select( 1.00 = 1.01 ); = f select( 1.00 < 1.01 ); = f select( 1.00 > 1.01 ); = f If such a situation is unacceptable for the real number line, then in what way can it be acceptable for the real number plane. select( point(1.0,0) ~= point(1.1,0) ); = f select( point(1.0,0) << point(1.1,0) ); = f select( point(1.0,0) >> point(1.1,0) ); = f select( point(1.0,0) <-> point(1.1,0) ); = 1.000655e-05 We have two points with a finite separation in the x axis. Postgres thinks they are not the same point, nor one left of the other, nor to the right. This is clearly both a physical and logical impossibility. The cause of this is the ill-conceived FP* macros. They seem to represent a solution to a problem that simply does not exist. The first effect of these macros is to reduce the accuracy of all geometric comparisons from double precision to less than single precision. The following program correctly prints the correct answer. Whereas, as we have seen above, postgres falls in a heap. int main() { float f = 1.0; float g = 1.1; if( f==g ) { printf( "f=g\n" ); } if( f<g ) { printf( "f<g\n" ); } if( f>g ) { printf( "f>g\n" ); } return 0; } The second effect is to take operations that would have worked correctly even in single precision, and to cause them to produce nonsensical results. For example points that can be both inside and outside a polygon at the same time.
Simple analysis of the postgres source code shows that the only places where the FPzero, FPeq, FPne, FPlt, FPle, FPgt and FPge macros are defined and used are in the src/backend/utils/adt/geo_ops.c and src/include/utils/geo_decls.h files. What is the justification for these macros? Why do they only affect geometric calculations, and not all numeric calculations? Why should these macros not be abandoned? Does anyone have any objections to me: 1) removing these macros, or at least disabling EPSILON by default. 2) adding in the obviously missing operators (ie: box @> point) Hi Paul, Floating point calculations always have a bit of inaccuracy because at the very minimum some values do not have exact floating point representations, and the results can be implementation dependent. I think disabling EPSILON by default is a bad idea. In my work with numeric methods, we actually calculated EPSILON for the system we were using at runtime. Maybe postgresql could do the same on startup. Regards, Ken
Re: [HACKERS] Fixing geometic calculation
On Fri, Aug 07, 2009 at 09:12:34AM -0500, Kenneth Marshall wrote: On Fri, Aug 07, 2009 at 11:29:47PM +1000, Paul Matthews wrote: Let us consider the ordering of real numbers in postgres. As you can see from the results below it has clearly returned the correct results. select( 1. = 1.0002 ); = f select( 1. < 1.0002 ); = t select( 1. > 1.0002 ); = f Imagine the situation however where postgres returned the following values to simple numerical inequalities. In such a case postgresql would be clearly defective and unfit for purpose. select( 1.00 = 1.01 ); = f select( 1.00 < 1.01 ); = f select( 1.00 > 1.01 ); = f If such a situation is unacceptable for the real number line, then in what way can it be acceptable for the real number plane. select( point(1.0,0) ~= point(1.1,0) ); = f select( point(1.0,0) << point(1.1,0) ); = f select( point(1.0,0) >> point(1.1,0) ); = f select( point(1.0,0) <-> point(1.1,0) ); = 1.000655e-05 We have two points with a finite separation in the x axis. Postgres thinks they are not the same point, nor one left of the other, nor to the right. This is clearly both a physical and logical impossibility. Actually, quantum theory will allow this to happen. :) The cause of this is the ill-conceived FP* macros. They seem to represent a solution to a problem that simply does not exist. The first effect of these macros is to reduce the accuracy of all geometric comparisons from double precision to less than single precision. The following program correctly prints the correct answer. Whereas, as we have seen above, postgres falls in a heap. int main() { float f = 1.0; float g = 1.1; if( f==g ) { printf( "f=g\n" ); } if( f<g ) { printf( "f<g\n" ); } if( f>g ) { printf( "f>g\n" ); } return 0; } The second effect is to take operations that would have worked correctly even in single precision, and to cause them to produce nonsensical results. For example points that can be both inside and outside a polygon at the same time.
Simple analysis of the postgres source code shows that the only places where the FPzero, FPeq, FPne, FPlt, FPle, FPgt and FPge macros are defined and used are in the src/backend/utils/adt/geo_ops.c and src/include/utils/geo_decls.h files. What is the justification for these macros? Why do they only affect geometric calculations, and not all numeric calculations? Why should these macros not be abandoned? Does anyone have any objections to me: 1) removing these macros, or at least disabling EPSILON by default. 2) adding in the obviously missing operators (ie: box @> point) Hi Paul, Floating point calculations always have a bit of inaccuracy because at the very minimum some values do not have exact floating point representations, and the results can be implementation dependent. I think disabling EPSILON by default is a bad idea. In my work with numeric methods, we actually calculated EPSILON for the system we were using at runtime. Maybe postgresql could do the same on startup. Regards, Ken
Re: [HACKERS] Fixing geometic calculation
On Fri, Aug 07, 2009 at 04:16:56PM +0100, Sam Mason wrote: On Fri, Aug 07, 2009 at 09:49:41AM -0500, Kenneth Marshall wrote: On Fri, Aug 07, 2009 at 09:12:34AM -0500, Kenneth Marshall wrote: On Fri, Aug 07, 2009 at 11:29:47PM +1000, Paul Matthews wrote: We have two points with a finite separation in the x axis. Postgres thinks they are not the same point, nor one left of the other, nor to the right. This is clearly both a physical and logical impossibility. Actually, quantum theory will allow this to happen. :) I'm not a physicist, but I don't think it does. QM defines the probability distribution within which the particle will be found. Once you've actually observed both points you will know their physical relation--you'll also have given them energy, so next time you look they'll be somewhere else, but the act of observation causes the above distribution to be collapsed. This sidesteps the whole issue of the fact that points in PG are defined in euclidean space and do indeed have a definite location and can be compared at all times---they don't arbitrarily go jumping off millions of miles away or being annihilated by their anti-particle just because it's possible. I would agree with Paul that EPSILON is a hack and probably should be removed. However it will cause user-visible changes so it's not quite as simple as that to change. I don't have anything really very useful to add apart from saying that maybe the default should be the other way around? -- Sam http://samason.me.uk/ It was definitely a tongue-in-cheek response since QT is not really a topic for this mailing list. However, removing EPSILON completely is not a good idea for the exact reason it was included originally. Floating point numbers are approximations and since their precision is necessarily limited this fact must be included in any calculation using them. I do agree that hard-coding it to a value that does not reflect the reality of the calculation is not good.
It would be better to have a GUC to allow it to be specified than to have it be zero. Maybe one setting would allow the system to calculate the appropriate value for EPSILON based on the hardware. One way to address the ambiguity is to define for yourself what it means if a point is both inside and outside, i.e. in this case the point is always defined to be inside, or the point is always defined to be outside. Regards, Ken
Re: [HACKERS] Re: Review remove {join,from}_collapse_limit, add enable_join_ordering
On Thu, Jul 16, 2009 at 04:27:39PM +0100, Greg Stark wrote: On Thu, Jul 16, 2009 at 4:16 PM, Tom Lane t...@sss.pgh.pa.us wrote: However, I do observe that this seems a sufficient counterexample against the theory that we can just remove the collapse limits and let GEQO save us on very complex queries. On my machine, the example query takes about 22 seconds to plan using CVS HEAD w/ all default settings. If I set both collapse_limit variables to very high values (I used 999), it takes ... um ... not sure; I gave up waiting after half an hour. What's the point of GEQO if it doesn't guarantee to produce the optimal plan and *also* doesn't guarantee to produce some plan, any plan, within some reasonable amount of time? Either we need to fix that or else I don't see what it's buying us over our regular planner, which also might not produce a plan within a reasonable amount of time, but at least if it does it'll be the right plan. I do agree that we should have an actual time-limit cap for GEQO that would have it return the best plan found so far when the limit is reached. Then you can at least bound your planning time. Regards, Ken
Re: [HACKERS] Review remove {join,from}_collapse_limit, add enable_join_ordering
On Thu, Jul 16, 2009 at 06:49:08PM +0200, Andres Freund wrote: On Thursday 16 July 2009 17:59:58 Tom Lane wrote: Andres Freund and...@anarazel.de writes: The default settings currently make it relatively hard to trigger geqo at all. Yes, and that was intentional. One of the implications of what we're discussing here is that geqo would get used a lot more for typical complex queries (if there is any such thing as a typical one). So it's fully to be expected that the fallout would be pressure to improve geqo in various ways. Given that we are at the start of the development cycle, that prospect doesn't scare me --- there's plenty of time to fix whatever needs fixing. However, I am leaning to the feeling that I don't want to be putting people in a position where they have no alternative but to use geqo. So adjusting rather than removing the collapse limits is seeming like a good idea. Hm. I see a somewhat more fundamental problem with geqo: I tried several queries, and I found not a single one where the whole genetic process did any significant improvements to the 'worth'. It seems that always the best variant out of the pool is either the path chosen in the end, or at least the cost difference is _really_ low. Andres Hi Andres, From some of my reading of the literature on join order optimization via random sampling, such as what would establish the initial GEQO pool, there is a very good possibility of having a pretty good plan in the first pool, especially for our larger initial pool sizes of 100-1000. And in fact, the final plan has a good chance of being of approximately the same cost as a member of the initial pool. Uniform sampling alone can give you a close-to-optimum plan 80% of the time with an initial sample size of 100, and using biased sampling raises that to 99% or better. Regards, Ken
Re: [HACKERS] WIP: Deferrable unique constraints
On Tue, Jul 14, 2009 at 09:56:48AM -0700, Jeff Davis wrote:
On Sun, 2009-07-12 at 14:14 +0100, Dean Rasheed wrote: Here is an updated version of this patch which should apply to HEAD, with updated docs, regression tests, pg_dump and psql \d. It works well for small numbers of temporary uniqueness violations, and at least as well (in fact about twice as fast) as deferred FK checks for large numbers of deferred checks.

I took a brief look at this. You're extending the index AM, and that might not be necessary. It might be fine, but usually there is a lot of discussion around changing important APIs, so it might be worth looking at alternatives. With the patch I'm working on for generalized index constraints, there would be no need to extend the index AM. However, I don't expect my mechanism to replace the existing unique btree constraints, because I would expect the existing unique constraints to be faster (I haven't tested yet, though).

Perhaps we could instead use the TRY/CATCH mechanism. It's generally difficult to figure out from the code exactly what happened, but in this case we have the error code ERRCODE_UNIQUE_VIOLATION. So, just check for that error code rather than passing back a boolean. You might want to change the signature of _bt_check_unique() so that it doesn't have to raise the error inside, and you can raise the error from _bt_doinsert(). The only problem there is telling the btree AM whether to do a fake or a real insert. Perhaps you can just do that with careful use of a global variable? Sure, all of this is a little ugly, but we've already acknowledged that there is some ugliness around the existing unique constraint and the btree code that supports it (for one, the btree AM accesses the heap).

I am looking at adding unique support to hash indexes for 8.5, and they will definitely need to visit the heap.
Regards, Ken

I propose trying to improve performance and scalability for large numbers of deferred checks in a separate patch. Would it be possible to just check how long the list of potential conflicts is growing, and if it gets too big, just replace them all with a bulk check event?

Regards, Jeff Davis
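The shape Jeff describes, having the uniqueness check pass back a result so the caller decides whether to raise immediately or only record a potential conflict for a deferred re-check, can be sketched as a toy model (hypothetical names and Python stand-ins, not the actual btree code):

```python
# Toy model: check_unique() returns a boolean instead of raising
# internally (like the proposed _bt_check_unique() signature change),
# and insert() decides whether to error at once or defer the check.

class UniqueViolation(Exception):
    pass

class ToyUniqueIndex:
    def __init__(self):
        self.keys = set()
        self.pending = []              # potential conflicts, re-checked later

    def check_unique(self, key):
        """Report a conflict as a return value, not an error."""
        return key not in self.keys

    def insert(self, key, deferred=False):
        if not self.check_unique(key):
            if deferred:
                self.pending.append(key)   # temporary violation, tolerated
            else:
                raise UniqueViolation(key) # immediate error, as today
        self.keys.add(key)

idx = ToyUniqueIndex()
idx.insert("x")
idx.insert("x", deferred=True)         # no error yet
assert idx.pending == ["x"]            # conflict queued for commit time
```

Jeff's "bulk check event" idea would then amount to discarding `pending` once it grows too large and rescanning the index once at commit instead.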
Re: [HACKERS] WIP: Deferrable unique constraints
On Tue, Jul 14, 2009 at 12:13:33PM -0700, Jeff Davis wrote:
On Tue, 2009-07-14 at 13:29 -0500, Kenneth Marshall wrote: I am looking at adding unique support to hash indexes for 8.5 and they will definitely need to visit the heap.

Have you seen this patch? http://archives.postgresql.org/message-id/1246840119.19547.126.ca...@jdavis This patch will support unique constraints for hash indexes as well. There may still be a use-case for specialized hash index unique constraints, similar to btree, but please follow the work to make sure that no work is wasted. Also, I don't see a problem with using the same hacks in the hash index code as are used in the btree index code. If you see a better way, or if you think index AM changes would be useful to you as well, you should probably open that discussion. I was trying to provide an alternative to an index AM API change, because I thought there might be some resistance to that. However, if there are multiple index AMs that can make use of it, there is a stronger case for an API change.

Regards, Jeff Davis

I will take a look at that patch. My thought was to use the same process as the btree support for unique indexes since it has been well tested and optimized.

Thanks, Ken
Re: [HACKERS] *_collapse_limit, geqo_threshold
On Sat, Jul 11, 2009 at 12:23:59PM -0400, Tom Lane wrote:
Andres Freund and...@anarazel.de writes: The only question I have is whether random_r or similar is available on enough platforms... Has anybody an idea about this? On most unixoid systems one could just wrap erand48() if random_r is not available. Windows?

random_r() isn't in the Single Unix Spec AFAICS, and I also don't find it on HPUX 10.20, so I'd vote against depending on it. What I do see in SUS is initstate() and setstate() which could be used to control the random() function: http://www.opengroup.org/onlinepubs/007908799/xsh/initstate.html It would also work to leave random() for use by the core code and have GEQO depend on something from the drand48() family: http://www.opengroup.org/onlinepubs/007908799/xsh/drand48.html Probably drand48() is less random than random(), but for the limited purposes of GEQO I doubt we care very much.

Ugh, tracking down problems caused by a poor random number generator is difficult. Poor randomness often causes weird results from algorithms that were designed around the assumption of truly random input.

So far as I can find in a quick google search, neither of these families of functions exists on Windows :-(. So I think maybe the best approach is the second one --- we could implement a port/ module that provides a version of whichever drand48 function we need.

I think that having a port/ module for a random number generator is a good idea. There are a number of good, fast random number generators to choose from.

Cheers, Ken

On reflection I think the best user API is probably a geqo_seed GUC in the range 0 to 1, and have GEQO always reset its seed to that value at the start of a planning cycle. This ensures plan stability, and if you need to experiment with alternative plans you can change to different seed values. The no-reset behavior doesn't seem to have much real-world usefulness, because even if you chance to get a good plan, you have no way to reproduce it...
regards, tom lane
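The drand48() family Tom points at is just a 48-bit linear congruential generator, so a private port/ implementation is tiny. A sketch in Python (the multiplier and increment are the standard drand48 constants; the class name is made up here) shows why a private, seedable generator gives the plan stability a geqo_seed GUC would want, without disturbing any other user of random():

```python
# Minimal erand48-style generator with its own private state:
# X(n+1) = (a * X(n) + c) mod 2^48, output scaled to [0, 1).

class Rand48:
    A = 0x5DEECE66D       # standard drand48 multiplier
    C = 0xB               # standard drand48 increment
    MASK = (1 << 48) - 1  # keep 48 bits of state

    def __init__(self, seed):
        self.state = seed & self.MASK

    def next(self):
        """Return a double in [0, 1), like erand48()."""
        self.state = (self.A * self.state + self.C) & self.MASK
        return self.state / float(1 << 48)

# Resetting the seed at the start of each "planning cycle"
# reproduces the identical sequence, hence the identical plan.
g1, g2 = Rand48(2009), Rand48(2009)
seq1 = [g1.next() for _ in range(5)]
seq2 = [g2.next() for _ in range(5)]
assert seq1 == seq2
assert all(0.0 <= x < 1.0 for x in seq1)
```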
Re: [HACKERS] *_collapse_limit, geqo_threshold
Hi,

When I was first familiarizing myself with PostgreSQL, I took a walk through its documentation on GEQO and similar processes in the literature. One big advantage of GEQO is that you can trade off planning time for plan optimization. I do agree that it should be updated, but there were definite cases in the literature where an exhaustive search could take orders of magnitude more planning time than the difference in execution time between the competing plans.

My two cents, Ken

On Wed, Jul 08, 2009 at 09:43:12AM -0400, Noah Misch wrote: On Tue, Jul 07, 2009 at 09:31:14AM -0500, Kevin Grittner wrote: I don't remember any clear resolution to the wild variations in plan time mentioned here: http://archives.postgresql.org/pgsql-hackers/2009-06/msg00743.php I think it would be prudent to try to figure out why small changes in the query caused the large changes in the plan times Andres was seeing. Has anyone else ever seen such behavior? Can we get examples? (It should be enough to get the statistics and the schema, since this is about planning time, not run time.)

With joins between statistically indistinguishable columns, I see planning times change by a factor of ~4 for each join added or removed (postgres 8.3). Varying join_collapse_limit in the neighborhood of the actual number of joins has a similar effect. See attachment with annotated timings. The example uses a single table joined to itself, but using distinct tables with identical contents yields the same figures. The exponential factor seems smaller for real queries. I have a query of sixteen joins that takes 71s to plan deterministically; it looks like this: SELECT 1 FROM fact JOIN dim0 ...
JOIN dim6
JOIN t t0 ON fact.key = t.key AND t.x = MCV0
LEFT JOIN t t1 ON fact.key = t.key AND t.x = MCV1
JOIN t t2 ON fact.key = t.key AND t.x = MCV2
LEFT JOIN t t3 ON fact.key = t.key AND t.x = NON-MCV0
LEFT JOIN t t4 ON fact.key = t.key AND t.x = NON-MCV1
LEFT JOIN t t5 ON fact.key = t.key AND t.x = NON-MCV2
LEFT JOIN t t6 ON fact.key = t.key AND t.x = NON-MCV3
LEFT JOIN t t7 ON fact.key = t.key AND t.x = NON-MCV4

For the real query, removing one join drops plan time to 26s, and removing two drops the time to 11s. I don't have a good theory for the multiplier changing from 4 for the trivial demonstration to ~2.5 for this real query. Re-enabling geqo drops plan time to .5s. These tests used default_statistics_target = 1000, but dropping that to 100 does not change anything dramatically.

I guess the question is whether there is anyone who has had a contrary experience. (There must have been some benchmarks to justify adding geqo at some point?)

I have queries with a few more joins (19-21), and I cancelled attempts to plan them deterministically after 600+ seconds and 10+ GiB of memory usage. Even with geqo_effort = 10, they plan within 5-15s with good results. All that being said, I've never encountered a situation where a value other than 1 or inf for *_collapse_limit appeared optimal.
nm

SET geqo = off;
SET join_collapse_limit = 100;
CREATE TEMP TABLE t AS SELECT * FROM generate_series(1, 1000) f(n);
ANALYZE t;

--- Vary join count

-- 242.4s
EXPLAIN SELECT 1 FROM t t00 NATURAL JOIN t t01 NATURAL JOIN t t02 NATURAL JOIN t t03 NATURAL JOIN t t04 NATURAL JOIN t t05 NATURAL JOIN t t06 NATURAL JOIN t t07 NATURAL JOIN t t08 NATURAL JOIN t t09 NATURAL JOIN t t10 NATURAL JOIN t t11 NATURAL JOIN t t12;
-- 31.2s
EXPLAIN SELECT 1 FROM t t00 NATURAL JOIN t t01 NATURAL JOIN t t02 NATURAL JOIN t t03 NATURAL JOIN t t04 NATURAL JOIN t t05 NATURAL JOIN t t06 NATURAL JOIN t t07 NATURAL JOIN t t08 NATURAL JOIN t t09 NATURAL JOIN t t10 NATURAL JOIN t t11;
-- 8.1s
EXPLAIN SELECT 1 FROM t t00 NATURAL JOIN t t01 NATURAL JOIN t t02 NATURAL JOIN t t03 NATURAL JOIN t t04 NATURAL JOIN t t05 NATURAL JOIN t t06 NATURAL JOIN t t07 NATURAL JOIN t t08 NATURAL JOIN t t09 NATURAL JOIN t t10;
-- 2.0s
EXPLAIN SELECT 1 FROM t t00 NATURAL JOIN t t01 NATURAL JOIN t t02 NATURAL JOIN t t03 NATURAL JOIN t t04 NATURAL JOIN t t05 NATURAL JOIN t t06 NATURAL JOIN t t07 NATURAL JOIN t t08 NATURAL JOIN t t09;
-- 0.5s
EXPLAIN SELECT 1 FROM t t00 NATURAL JOIN t t01 NATURAL JOIN t t02 NATURAL JOIN t t03 NATURAL JOIN t t04 NATURAL JOIN t t05 NATURAL JOIN t t06 NATURAL JOIN t t07 NATURAL JOIN t t08;

--- Vary join_collapse_limit

-- 8.1s
SET join_collapse_limit = 100;
EXPLAIN SELECT 1 FROM t t00 NATURAL JOIN t t01 NATURAL JOIN t t02 NATURAL JOIN t t03 NATURAL JOIN t t04 NATURAL JOIN t t05 NATURAL JOIN t t06 NATURAL JOIN t t07 NATURAL JOIN t t08 NATURAL JOIN t t09 NATURAL JOIN t t10;
-- 8.0s
SET join_collapse_limit = 11;
EXPLAIN SELECT 1 FROM t t00 NATURAL JOIN t t01 NATURAL JOIN t t02 NATURAL JOIN t t03 NATURAL JOIN t t04 NATURAL JOIN t t05 NATURAL JOIN t t06 NATURAL JOIN t t07 NATURAL JOIN t t08 NATURAL JOIN t t09 NATURAL
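As a rough back-of-envelope for the growth Noah measures (my own estimate, not the planner's actual accounting): dynamic programming over join orders examines on the order of 3^n (left input, right input) subset combinations for n relations, since each relation is in the left input, the right input, or neither. Each added relation therefore multiplies the work by roughly 3x, the same ballpark as the ~4x per join in the timings above.

```python
# Upper bound on System R style DP work: each of the n relations is
# in the left subset, the right subset, or neither, giving 3^n
# (subset, disjoint-subset) pairs to consider.

def dp_pairs(n):
    return 3 ** n

for n in range(8, 14):
    growth = dp_pairs(n) // dp_pairs(n - 1)
    print(f"{n:2d} relations: ~{dp_pairs(n):>9,d} subset pairs (x{growth})")

assert dp_pairs(13) // dp_pairs(12) == 3
```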
Re: [HACKERS] *_collapse_limit, geqo_threshold
On Wed, Jul 08, 2009 at 04:13:11PM -0500, Kevin Grittner wrote:
Tom Lane t...@sss.pgh.pa.us wrote: It occurs to me that one way to make GEQO less scary would be to take out the nondeterminism by resetting its random number generator for each query. You might get a good plan or an awful one, but at least it'd be the same one each time. DBAs like predictability.

+1 The biggest reason that I've tended to avoid geqo is that I would never know when it might do something really stupid with a query one time out of some large number, leading to mysterious complaints which could eat a lot of time. For a moment it seemed logical to suggest a session GUC for the seed, so if you got a bad plan you could keep rolling the dice until you got one you liked; but my right-brain kept sending shivers down my spine to suggest just how uncomfortable it was with that idea. -Kevin

+1 I like the idea of a session GUC for the random number seed. If we can come up with a way to prune the search space more aggressively, GEQO (or a GEQO successor) will be much less prone to generating a bad plan. I also think that a session variable would make it easier to test GEQO by removing the nondeterminism.

Ken
Re: [HACKERS] *_collapse_limit, geqo_threshold
On Wed, Jul 08, 2009 at 05:46:02PM -0400, Tom Lane wrote:
Kevin Grittner kevin.gritt...@wicourts.gov writes: For a moment it seemed logical to suggest a session GUC for the seed, so if you got a bad plan you could keep rolling the dice until you got one you liked; but my right-brain kept sending shivers down my spine to suggest just how uncomfortable it was with that idea.

If memory serves, we actually had exactly that at some point. But I think the reason it got taken out was that it interfered with the behavior of the random() function for everything else. We'd have to give GEQO its own private random number generator.

regards, tom lane

A separate random number generator for GEQO makes a lot of sense.

Cheers, Ken
Re: [HACKERS] 8.4 open item: copy performance regression?
Yes, you are right. I thought that they were absolute function counts. The data makes more sense now.

Regards, Ken

On Thu, Jun 18, 2009 at 07:03:34PM -0500, Kevin Grittner wrote: Kenneth Marshall k...@rice.edu wrote: What is not clear from Stefan's function listing is how the 8.4 server could issue 33% more XLogInsert() and CopyReadLine() calls than the 8.3.7 server using the same input file.

I thought those were profiling numbers -- the number of times a timer checked what was executing and found it in that method. Which suggests that those two methods are probably slower now than in 8.3.7, at least in some environments. -Kevin
Re: [HACKERS] 8.4 open item: copy performance regression?
On Fri, Jun 19, 2009 at 07:49:31PM +0200, Stefan Kaltenbrunner wrote:
Tom Lane wrote: Just eyeing the code ... another thing we changed since 8.3 is to enable posix_fadvise() calls for WAL. Any of the complaints want to try diking out this bit of code (near line 2580 in src/backend/access/transam/xlog.c)?

#if defined(USE_POSIX_FADVISE) && defined(POSIX_FADV_DONTNEED)
	if (!XLogArchivingActive() &&
		(get_sync_bit(sync_method) & PG_O_DIRECT) == 0)
		(void) posix_fadvise(openLogFile, 0, 0, POSIX_FADV_DONTNEED);
#endif

ok after a bit of bisecting I'm happy to announce the winner of the contest: http://archives.postgresql.org/pgsql-committers/2008-11/msg00054.php this patch causes a 25-30% performance regression for WAL logged copy, however in the WAL bypass case (maybe that was what got tested?) it results in a 20% performance increase. The raw numbers using the upthread posted minimal postgresql.conf are:

post patch/wal logged: 4min10s/4min19s/4min12s
post patch/wal bypass: 1m55s/1m58s/2m00s
prepatch/wal logged: 2m55s/3m00s/2m59s
prepatch/wal bypass: 2m22s/2m18s/2m20s

Stefan

Great! Maybe just increasing the size of the BULKWRITE ring, possibly as a function of the shared memory setting, is all that is needed. 256kB is the currently coded ring_size in storage/buffer/freelist.c.

Ken
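The ring Ken refers to can be pictured with a toy model (the sizes match the 256kB figure above, but the code is an illustration, not the real freelist.c): a bulk write cycles through a small fixed set of buffer slots, so no matter how many blocks a COPY writes, it only ever dirties ring-many shared buffers.

```python
# Toy buffer-ring model: 256kB of 8kB blocks = 32 slots reused
# round-robin, so a bulk load does not flood shared_buffers.

BLCKSZ = 8192
RING_BYTES = 256 * 1024            # the currently coded ring size
nbuffers = RING_BYTES // BLCKSZ    # = 32 buffers in the ring

used = set()
for block_no in range(100_000):    # a large bulk COPY
    slot = block_no % nbuffers     # reuse ring slots in order
    used.add(slot)

print(f"{len(used)} buffers touched for 100,000 blocks")
assert len(used) == nbuffers == 32
```

A larger ring (Ken's suggestion) would simply raise `nbuffers`, trading shared-buffer pollution for fewer forced writes of still-hot ring buffers.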
Re: [HACKERS] 8.4 open item: copy performance regression?
On Thu, Jun 18, 2009 at 05:20:08PM -0400, Tom Lane wrote:
Stefan Kaltenbrunner ste...@kaltenbrunner.cc writes: Any objections if I add: http://archives.postgresql.org/pgsql-performance/2009-06/msg00215.php to the (currently empty) list of open items for 8.4?

I am unable to duplicate any slowdown on this test case. AFAICT 8.4 and 8.3 branch tip are about the same speed; if anything 8.4 is faster. Testing on x86_64 Fedora 10 ... regards, tom lane

What is not clear from Stefan's function listing is how the 8.4 server could issue 33% more XLogInsert() and CopyReadLine() calls than the 8.3.7 server using the same input file. That would account for the slowdown, but not why it is happening.

Cheers, Ken
Re: [HACKERS] dot to be considered as a word delimiter?
On Mon, Jun 01, 2009 at 08:22:23PM -0500, Kevin Grittner wrote:
Sushant Sinha sushant...@gmail.com wrote: I think that dot should be considered as a word delimiter, because when a dot is not followed by a space, most of the time it is an error in typing. Besides, there are not many valid English words that have a dot in between.

It's not treating it as an English word, but as a host name.

select ts_debug('english', 'Mr.J.Sai Deepak');
                                ts_debug
---------------------------------------------------------------------------
 (host,Host,Mr.J.Sai,{simple},simple,{mr.j.sai})
 (blank,"Space symbols"," ",{},,)
 (asciiword,"Word, all ASCII",Deepak,{english_stem},english_stem,{deepak})
(3 rows)

You could run it through a dictionary which would deal with host tokens differently. Just be aware of what you'll be doing to www.google.com if you run into it. I hope this helps. -Kevin

In our uses for full text indexing, it is much more important to be able to find host names and URLs than to find mistyped names. My two cents.

Cheers, Ken
Re: [HACKERS] dot to be considered as a word delimiter?
On Tue, Jun 02, 2009 at 04:40:51PM -0400, Sushant Sinha wrote:
Fair enough. I agree that there is a valid need for returning such tokens as a host. But I think there is definitely a need to break it down into individual words as well. This will help in cases when a document is missing a space in between the words. So what we can do is: return the entire compound word as Host and also break it down into individual words. I can put up a patch for this if you guys agree. Returning multiple tokens for the same word is a feature of the text search parser, as explained in the documentation here: http://www.postgresql.org/docs/8.3/static/textsearch-parsers.html Thanks, Sushant.

+1

Ken

On Tue, Jun 2, 2009 at 8:47 AM, Kenneth Marshall k...@rice.edu wrote: [full quote of the previous message trimmed]
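Sushant's proposal, emitting the dotted token whole as a host token and also emitting its components as plain words, can be sketched like this (a hypothetical illustration, not the actual tsearch parser, whose token machinery is far more elaborate):

```python
import re

# Return both the compound "host" token and its dot-separated parts,
# so searches match either form.

def tokenize(text):
    tokens = []
    for word in text.split():
        if "." in word.strip("."):         # interior dot: host-like
            tokens.append(("host", word.lower()))
            tokens.extend(("word", part.lower())
                          for part in re.split(r"\.+", word) if part)
        else:
            tokens.append(("word", word.strip(".").lower()))
    return tokens

toks = tokenize("Mr.J.Sai Deepak")
assert ("host", "mr.j.sai") in toks        # whole token, as today
assert ("word", "sai") in toks             # plus its components
assert ("word", "deepak") in toks
```

This is exactly the multiple-tokens-per-word behavior the parser documentation describes, applied to the host token type.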
Re: [HACKERS] [PATCH] cleanup hashindex for pg_migrator hashindex compat mode (for 8.4)
On Sat, May 23, 2009 at 02:52:49PM -0400, Zdenek Kotala wrote:
I forgot to fix contrib. Updated patch attached.

Zdenek

Zdenek Kotala píše v pá 22. 05. 2009 v 16:23 -0400: The attached patch cleans up the hash index headers to allow compiling the hash AM for the 8.3 version. It helps improve pg_migrator with the capability to migrate a database with hash indexes without reindexing. I discussed this patch a year ago with Alvaro when we tried to clean up the include bloat problem. It should also reduce the number of includes. The main point is that hash functions for datatypes are now in the related data files in the utils/adt directory. hash_any() and hash_uint32() are now in utils/hashfunc.c. It would be nice to have this in 8.4 because it allows testing the index migration functionality. Thanks, Zdenek

How does that work with the updated hash functions without a reindex?

Regards, Ken
Re: [HACKERS] a few crazy ideas about hash joins
On Fri, Apr 03, 2009 at 08:03:33AM -0400, Robert Haas wrote:
On Fri, Apr 3, 2009 at 1:48 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Robert Haas wrote: While investigating some performance problems recently I've had cause to think about the way PostgreSQL uses hash joins. So here are a few thoughts. Some of these have been brought up before.

1. When the hash is not expected to spill to disk, it preserves the pathkeys of the outer side of the join. If the optimizer were allowed to assume that, it could produce significantly more efficient query plans in some cases. The problem is what to do if we start executing the query and find out that we have more stuff to hash than we expect, such that we need multiple batches? Now the results won't be sorted. I think we could handle this as follows: Don't count on the hash join to preserve pathkeys unless it helps, and only rely on it when it seems as if the hash table will still fit even if it turns out to be, say, three times as big as expected. But if you are counting on the hash join to preserve pathkeys, then pass that information to the executor. When the executor is asked to perform a hash join, it will first hash the inner side of the relation. At that point, we know whether we've successfully gotten everything into a single batch, or not. If we have, perform the join normally. If the worst has happened and we've gone multi-batch, then perform the join and sort the output before returning it. The performance will suck, but at least you'll get the right answer.

Previous in-passing reference to this idea here: http://archives.postgresql.org/pgsql-hackers/2008-09/msg00806.php Hmm, instead of sorting the output if the worst happens, a final merge step as in a merge sort would be enough.

Yeah - good point.

2. Consider building the hash table lazily.
I often see the query planner pick a hash join over a nested inner indexscan because it thinks that it'll save enough time making hash probes rather than index probes to justify the time spent building the hash table up front. But sometimes the relation that's being hashed has a couple thousand rows, only a tiny fraction of which will ever be retrieved from the hash table. We can predict when this is going to happen because n_distinct for the outer column will be much less than the size of the inner rel. In that case, we could consider starting with an empty hash table that effectively acts as a cache. Each time a value is probed, we look it up in the hash table. If there's no entry, we use an index scan to find the matching rows and insert them into the hash table. Negative results must also be cached.

Yeah, that would be quite nice. One problem is that our ndistinct estimates are not very accurate.

Well, the right solution to that problem is to fix our ndistinct estimates. :-)

Also, as seen in the hash index build performance patch, it would be better to set the initial hash size bigger than needed to avoid the inter-page shuffle if the guess is wrong.

Ken
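Robert's lazy hash table idea, including the requirement that negative results be cached, can be sketched as a toy model (the index_scan function is a hypothetical stand-in for the inner-side index probe; this is an illustration, not executor code):

```python
# The hash table starts empty and acts as a cache in front of an
# index. Misses fall back to an index scan and are cached, including
# empty ("negative") results, so a key is never scanned twice.

inner = {"a": [1, 2], "b": [3]}     # pretend indexed inner relation
scans = 0

def index_scan(key):
    global scans
    scans += 1                      # count the expensive probes
    return inner.get(key, [])

cache = {}

def probe(key):
    if key not in cache:            # cache miss: do the index scan
        cache[key] = index_scan(key)
    return cache[key]

assert probe("a") == [1, 2]
assert probe("zzz") == []           # negative result, now cached
assert probe("zzz") == []           # served from cache, no new scan
assert scans == 2                   # only two real index scans
```

With a skewed outer side, most probes repeat a few hot keys, so only a tiny fraction of the inner relation is ever fetched, which is the case where building the full hash table up front is wasted work.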
Re: [HACKERS] improving concurrent transactin commit rate
On Wed, Mar 25, 2009 at 03:58:06PM +0000, Sam Mason wrote:
On Wed, Mar 25, 2009 at 02:38:45PM +0000, Greg Stark wrote: Sam Mason s...@samason.me.uk writes: Why does it top out so much though? It goes up nicely to around ten clients (I tested with 8 and 12) and then tops out and levels off. The log is chugging along at around 2MB/s which is well above where they are for a single client, but it still seems as though things could go further.

Well 2MB/s sounds about right actually:

You have: 8kB / ( 1|7200|2min)
You want: MB/s
	* 1.92
	/ 0.5208

I'd need more explanation (or other pointers) to follow what you mean there. I've actually got a 15k disk, but it shouldn't matter much. 2MB/s seems to be consistent across any number of clients (specifically 1 to 48 here).

Heikki looked at this a while back and we concluded that the existing algorithm will only get 1/2 the optimal rate unless you have twice as many sessions as you ought to need to saturate the log i/o.

I'm writing to a 15k disk which gives me 250 rotations per second. In the case of a single client I'm getting about 220 transactions per second. That seems reasonable. When I have two clients this stays at about 220 transactions per second. Also reasonable; they end up serialising after each other. Three clients: I get about 320 tps. This appears to be consistent with 1.5*220 and would imply that there's always a spare client behind the lock that gets committed for free. Four clients: I get 430 tps, which would mean the queueing is all good. Below I've calculated the (mean) transactions per second over a series of runs, the value I'd expect to get (i.e. clients/2 times the single-client rate), and then the ratio of the two.
clients     tps      calc   ratio
      1   221.5
      2   223.8     220.0    102%
      3   323.5     330.0     98%
      4   427.7     440.0     97%
      6   647.4     660.0     98%
      8   799.7     880.0     91%
     12   946.0    1320.0     72%
     18  1020.6    1980.0     52%
     24  1089.2    2640.0     41%
     32  1116.6    3520.0     32%
     48  1141.8    5280.0     22%

As you can see the ratio between the tps I'm seeing and expecting drops off significantly after 18 clients, with the trend starting somewhere around seven clients. I don't understand why this would be happening.

My highly involved and complicated benchmarking is a shell script that does:

#!/bin/bash
nclients=$1
ittrs=$2

function gensql {
    echo "INSERT INTO bm (c,v) VALUES ('$1','0');"
    for (( i = 1; i < $ittrs; i++ )); do
        echo "UPDATE bm SET v = '$i' WHERE c = '$1';"
    done
    echo "DELETE FROM bm WHERE c = '$1';"
}

for (( c = 0; c < $nclients; c++ )); do
    gensql $c | psql -Xq -f - &
done
for (( c = 0; c < $nclients; c++ )); do
    wait
done

I'm running time test.sh 8 1000 and recording the time; tps = nclients * ittrs / time, where the time is the wall clock time expired. I'm repeating measurements four times, and the error bars in my SVG from before were the standard deviation of the runs. Something (the HOT code?) keeps the number of dead tuples consistent so I don't think this would be confounding things. But improvements would be appreciated.

-- Sam http://samason.me.uk/

Are you sure that you are able to actually drive the load at the high end of the test regime? You may need to use multiple clients to simulate the load effectively.

Cheers, Ken
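The rotation arithmetic and the ratio column above can be checked directly (a quick sanity calculation on Sam's figures, not part of his harness):

```python
# A 15k rpm disk gives 250 rotations per second. One 8kB WAL page
# flushed per rotation is about 2 MB/s of log, and a single client
# committing once per rotation caps out near 250 tps, close to the
# measured 221.5.
rps = 15000 / 60
assert rps == 250
assert rps * 8 == 2000                  # kB/s, i.e. ~2 MB/s

# Recompute the ratio column under the "1/2 the optimal rate" theory:
# expected tps = clients/2 * 220 single-client tps = clients * 110.
measured = {2: 223.8, 3: 323.5, 4: 427.7, 8: 799.7, 48: 1141.8}
for clients, tps in measured.items():
    print(clients, f"{100 * tps / (clients * 110):.0f}%")
```

These reproduce the 102%, 98%, 97%, 91%, and 22% entries in the table, so the puzzle is purely why the measured tps stops tracking clients/2 past roughly seven clients.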
Re: [HACKERS] improving concurrent transactin commit rate
On Wed, Mar 25, 2009 at 05:56:02PM +0000, Sam Mason wrote:
On Wed, Mar 25, 2009 at 12:01:57PM -0500, Kenneth Marshall wrote: On Wed, Mar 25, 2009 at 03:58:06PM +0000, Sam Mason wrote: [benchmark script quoted above] Are you sure that you are able to actually drive the load at the high end of the test regime? You may need to use multiple clients to simulate the load effectively.

Notice that the code is putting things into the background and then waiting for them to finish, so there will be multiple clients. Or maybe I'm misunderstanding what you mean. I've just tried modifying the code to write the generated SQL out to a set of files first, and this speeds things up by about 6% (the 48 client case goes from taking ~42 seconds to ~39 seconds), indicating that everything is probably OK with the test harness. Also note that this 6% improvement will be linear and across the board, and hence should just appear as slightly reduced performance for my system. As I'm not really interested in absolute performance and more in how the system scales as load increases, this will negate the effect even further.

-- Sam http://samason.me.uk/

I did notice how your test harness was designed. It just seemed that the process contention on your load generation system may bottleneck as the number of clients increases, and that may be the cause of your fall-off, or a contributor. You could test it by generating the load from independent boxes and seeing how the performance falls off as you add additional load clients+boxes.
My two cents, Ken
Re: [HACKERS] Problem with accessing Oracle from plperlu function when using remote pg client.
On Mon, Mar 16, 2009 at 03:16:07PM +0100, Tomasz Olszak wrote:
Greetings to All! I've tried to find a solution to my problem on other pg mailing lists, but without much effect. I have a table A in PG. There is also a table A in Oracle. I want to import a specific row from Oracle to PG, so I created a plperlu function:

CREATE OR REPLACE FUNCTION import.ora_a_row(a_id numeric) RETURNS dok_za AS $BODY$
-- in the PL body:
-- create_connection;
-- select all columns on oracle from table a where id = a_id;
-- returning tuple;
$BODY$ LANGUAGE 'plperlu' VOLATILE;

Then I can use such a function in PL/pgSQL:

DECLARE: var A%ROWTYPE; BEGIN; ... select * into var from import.ora_a_row(100); END; ...

As you see, it's very, very convenient. And it works, but only when I run select * from import.ora_a_row(100); from psql on the PostgreSQL server (local client). When I try to run that select from pgadmin or from a remote machine, I get a TNS error: TNS:could not resolve the connect identifier specified (DBD ERROR: OCIServerAttach) at line 20. I've tried different PostgreSQL versions, different perls, and different DBI Oracle packages, so I think the pg or perl versions are not the cause (and of course the environment variables are probably set, etc.). The Oracle base directory is about 1.6 gigabytes, so I think it's the full client (not instant). When I used pgAdmin 1.6 on the PostgreSQL server and left the host edit line blank (specifying only a pgport), it worked too. But when I wrote localhost as the host it didn't work (the same with connecting psql -h localhost -U user database). Has anybody encountered this kind of problem, or maybe it's a bug in plperlu? I'll be grateful for any help. Regards, Tomasz

This looks like an environment variable problem. The server does not run with the same set of settings as your psql program. I think that it will work once you get those issues ironed out.
Good luck, Ken
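Ken's diagnosis can be made concrete with a small hypothetical helper (the variable names below are the usual Oracle client ones, but which ones your installation actually needs is an assumption): the backend that runs the plperlu function inherits the postmaster's startup environment, not your login shell's, so the Oracle client inside it may be unable to resolve the TNS name even though a local psql session can.

```python
# Hypothetical check: report which Oracle client variables are
# missing from a given environment mapping. Pass os.environ on the
# server versus your shell to compare the two.

REQUIRED = ("ORACLE_HOME", "TNS_ADMIN")

def missing_oracle_env(env):
    return [var for var in REQUIRED if var not in env]

# Your interactive shell may be fine while the server's startup
# environment (e.g. as set before pg_ctl start) is not:
shell_env = {"ORACLE_HOME": "/opt/oracle",
             "TNS_ADMIN": "/opt/oracle/network/admin"}
server_env = {"PGDATA": "/var/lib/pgsql/data"}

assert missing_oracle_env(shell_env) == []
assert missing_oracle_env(server_env) == ["ORACLE_HOME", "TNS_ADMIN"]
```

The usual fix along these lines is to export the Oracle client variables in whatever script starts the postmaster, so every backend sees them.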
Re: [HACKERS] Problem with accessing Oracle from plperlu function when using remote pg client.
On Mon, Mar 16, 2009 at 02:30:28PM -0400, Jonah H. Harris wrote:
On Mon, Mar 16, 2009 at 2:26 PM, Jonah H. Harris jonah.har...@gmail.com wrote: On Mon, Mar 16, 2009 at 2:04 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: We already have one; it's called update_process_title. I have it turned off, and I still see the remote IP/port in the process list.

Ahh, this is why: init_ps_display(): set_ps_display(initial_str, true); Perhaps it should obey the configuration setting as well? -- Jonah H. Harris, Senior DBA myYearbook.com

What about having the GUC support off, on, and a format string to use, to fix this problem?

Ken
Re: [HACKERS] libxml incompatibility
This looks like a problem caused by two different libxml versions: the one used for the perl XML::LibXML wrappers and the one used to build PostgreSQL. They really need to be the same. Does it still segfault if they are identical?

Regards, Ken

On Fri, Mar 06, 2009 at 04:14:04PM -0300, Alvaro Herrera wrote:

Hi, It seems that if you load libxml into a backend for whatever reason (say you create a table with a column of type xml) and then create a plperlu function that uses XML::LibXML, we get a segmentation fault. This sequence reproduces the problem for me in 8.3:

create table xmlcrash (a xml);
insert into xmlcrash values ('<a/>');
create function xmlcrash() returns void language plperlu as $$ use XML::LibXML; $$;

The problem is reported as:

TRAP: BadArgument("!(((context) != ((void *) 0) && (((Node *) ((context)))->type) == T_AllocSetContext))", Archivo: "/pgsql/source/83_rel/src/backend/utils/mmgr/mcxt.c", Línea: 507)

-- Alvaro Herrera  http://www.CommandPrompt.com/  The PostgreSQL Company - Command Prompt, Inc.
Re: [HACKERS] libxml incompatibility
On Fri, Mar 06, 2009 at 02:58:30PM -0500, Andrew Dunstan wrote:

Alvaro Herrera wrote: Hi, It seems that if you load libxml into a backend for whatever reason (say you create a table with a column of type xml) and then create a plperlu function that uses XML::LibXML, we get a segmentation fault.

Yes, I discovered this a few weeks ago. It looks like libxml is not reentrant, so for perl you need to use some other XML library. Very annoying.

cheers, andrew

Ugh! That is worse than a simple library link incompatibility.

Ken
Re: [HACKERS] libxml incompatibility
On Fri, Mar 06, 2009 at 05:23:45PM -0300, Alvaro Herrera wrote:

Kenneth Marshall wrote: This looks like a problem caused by two different libxml versions: the one used for the perl XML::LibXML wrappers and the one used to build PostgreSQL. They really need to be the same. Does it still segfault if they are identical?

Unlikely, because AFAICT there's a single libxml installed on my system.

Yes, I saw Andrew's comment, and I have had that problem myself with Apache/PHP and perl with libxml. A simple library mismatch would at least be easy to resolve. :)

Regards, Ken
Re: [HACKERS] pg_restore --multi-thread
On Fri, Feb 20, 2009 at 09:22:58AM -0800, Joshua D. Drake wrote:

On Fri, 2009-02-20 at 09:33 -0500, Andrew Dunstan wrote: The short answer is that we don't know yet. There is anecdotal evidence that the number of CPUs on the server is a good place to start, but we should be honest enough to say that this is a new feature and we are still gathering information about its performance. If you want to give some advice, then I think the best advice is to try a variety of settings to see what works best for you, and if you have a good set of figures, report it back to us.

There has been some fairly heavy testing and research that caused the patch in the first place. The thread is here: http://archives.postgresql.org/pgsql-hackers/2008-02/msg00695.php It is a long thread. The end result was that the fastest restore time for 220G was achieved with 24 threads on an 8-core box. It came in at 3.5 hours. http://archives.postgresql.org/pgsql-hackers/2008-02/msg01092.php It is important to point out that this was a machine with 50 spindles, which is where your bottleneck is going to be immediately after solving the CPU-bound nature of the problem. So although the CPU question is easily answered, the IO is not. IO is extremely variable in its performance.

Sincerely, Joshua D. Drake

I also ran some tests against a more modest system that was still showing a performance improvement at (number-of-cores * 2): http://archives.postgresql.org/pgsql-hackers/2008-11/msg01399.php I think that a good starting point for any user should be the number of cores, given these two data points.

Cheers, Ken
Re: [HACKERS] WIP: hooking parser
On Thu, Feb 19, 2009 at 06:29:25PM +0000, Sam Mason wrote:

On Wed, Feb 18, 2009 at 10:30:12AM -0500, Tom Lane wrote: Peter Eisentraut pete...@gmx.net writes: I'd be quite interested to support some kind of hook to deal with this Oracle null issue. It would be a great help for porting projects. However, doing this properly is probably more complex and needs further thought. I'd suggest writing a type of regression test first for Oracle null behavior and then evaluating any kind of hook or hack against that.

AFAIK, the Oracle behavior is just about entirely unrelated to the parser --- it's a matter of runtime comparison behavior. It is certainly *not* restricted to literal NULL/'' constants, which is the only case that a parser hack can deal with.

How about introducing a varchar2 type as in Oracle? It would be a bit of a fiddle going through all the operators and functions, making sure that versions existed to cast things back again, but it seems possible. I'm not sure how fragile user code would be with it, though; I'm mainly worried about it trying to convert things back to TEXT automatically and the resulting change in semantics. Any ideas about good ways to go?

Could you define a type/domain for varchar2, mapping it to varchar? There does not seem to be anything else that needs to be done.

Cheers, Ken
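Ken's domain suggestion can be sketched in a couple of lines. This is illustrative only: note Tom's caveat above that Oracle's NULL-vs-'' behavior is runtime comparison semantics, which a name alias does not change, and that domains do not accept type modifiers, so a length-constrained use like varchar2(10) would need its own domain:

```sql
-- minimal sketch of a varchar2 alias as a domain over varchar
CREATE DOMAIN varchar2 AS varchar;

CREATE TABLE t (name varchar2);
INSERT INTO t VALUES ('abc');  -- behaves exactly like varchar
```

Since a domain inherits all of varchar's operators and casts, no further operator work is needed; what it does not do is make '' compare equal to NULL.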
Re: [HACKERS] [PERFORM] GIST versus GIN indexes for intarrays
On Fri, Feb 13, 2009 at 04:12:53PM +0300, Teodor Sigaev wrote:

The short-term workaround for Rusty is probably to create his GIN index using the intarray-provided gin__int_ops opclass. But it

Right

seems to me that we ought to get rid of intarray's @> and <@ operators and have the module depend on the core anyarray operators, just as we have already done for = and <>. Comments?

Agree, will do. Although the built-in anyarray operators have ~N^2 behaviour, while intarray's version is only N*log(N).

Is there a way to have the built-in anyarray operators be N*log(N)?

Ken
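Teodor's complexity point can be made concrete: a containment check that compares every element pair is ~N^2, while sorting both inputs and doing a single merge pass is N*log(N). A language-agnostic sketch of the two strategies (illustrative Python, not the actual C implementations in core or contrib/intarray):

```python
import random

def contains_quadratic(a, b):
    # naive strategy: for each element of b, scan all of a -> O(len(a) * len(b))
    return all(any(x == y for y in a) for x in b)

def contains_sorted(a, b):
    # merge strategy: sort both sides, then a single linear pass -> O(n log n)
    a, b = sorted(a), sorted(b)
    i = 0
    for x in b:
        while i < len(a) and a[i] < x:
            i += 1
        if i == len(a) or a[i] != x:
            return False
    return True

a = random.sample(range(10000), 500)
b = random.sample(a, 100)           # b drawn from a, so both checks succeed
print(contains_quadratic(a, b), contains_sorted(a, b))  # True True
```

For arrays of a few hundred elements the difference is already measurable, which is presumably why intarray took the sorting route.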
Re: [HACKERS] note on hash indexes
I had submitted the documentation change as part of my hash function patch, but it was removed as not relevant. (It wasn't really.) I would basically remove the first sentence:

Note: Hash index operations are not presently WAL-logged, so hash indexes might need to be rebuilt with REINDEX after a database crash. For this reason, hash index use is presently discouraged.

Ken

On Wed, Feb 04, 2009 at 01:22:23PM -0300, Alvaro Herrera wrote:

Hi, indices.sgml contains this paragraph about hash indexes:

Note: Testing has shown PostgreSQL's hash indexes to perform no better than B-tree indexes, and the index size and build time for hash indexes is much worse. Furthermore, hash index operations are not presently WAL-logged, so hash indexes might need to be rebuilt with REINDEX after a database crash. For these reasons, hash index use is presently discouraged.

However, it seems to me that hash indexes are much improved in 8.4, so maybe this needs to be reworded. I'm not sure to what point they have been improved though.

-- Alvaro Herrera  http://www.CommandPrompt.com/  PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] note on hash indexes
On Wed, Feb 04, 2009 at 11:22:44PM +0100, Zdenek Kotala wrote:

The main speed improvement is for the varchar datatype. I think it should be mentioned here as well. IIRC, times are similar to B-Tree for the integer datatype.

Zdenek

Kenneth Marshall wrote on Wed, 04.02.2009 at 13:57 -0600: I had submitted the documentation change as part of my hash function patch, but it was removed as not relevant. (It wasn't really.) I would basically remove the first sentence: "Note: Hash index operations are not presently WAL-logged, so hash indexes might need to be rebuilt with REINDEX after a database crash. For this reason, hash index use is presently discouraged." Ken

On Wed, Feb 04, 2009 at 01:22:23PM -0300, Alvaro Herrera wrote: Hi, indices.sgml contains this paragraph about hash indexes: "Note: Testing has shown PostgreSQL's hash indexes to perform no better than B-tree indexes, and the index size and build time for hash indexes is much worse. Furthermore, hash index operations are not presently WAL-logged, so hash indexes might need to be rebuilt with REINDEX after a database crash. For these reasons, hash index use is presently discouraged." However, it seems to me that hash indexes are much improved in 8.4, so maybe this needs to be reworded. I'm not sure to what point they have been improved though. -- Alvaro Herrera  http://www.CommandPrompt.com/  PostgreSQL Replication, Consulting, Custom Development, 24x7 support

The speed improvement applies particularly to any datatype that is larger than an integer (typically 4 bytes). Also, the size of fields that can be indexed efficiently is much, much larger than the ~2K limit for Btree. And even 32-bit quantities may be indexed more efficiently than with Btrees for large indexes, due to the O(1) probe behavior: Btrees typically need to cache/probe the upper levels of the tree to locate the tuple. I have held off on extensive benchmarking until WAL has been implemented.
Regards, Ken
Re: [HACKERS] pg_upgrade project status
On Tue, Jan 27, 2009 at 11:39:50AM -0300, Alvaro Herrera wrote:

Heikki Linnakangas wrote: Andrew Dunstan wrote: Zdenek Kotala wrote:

2) pg_upgrade.sh http://archives.postgresql.org/pgsql-hackers/2008-12/msg00248.php Pg_upgrade.sh is a shell script for catalog conversion. It works for the 8.3-8.4 upgrade. It will be useful while we do not have a better solution. The disadvantage is that it is a korn shell script. The idea is to rewrite it in PERL, which is more portable, but I'm not a PERL expert and currently there is no workable solution.

I have had a very brief look at this. Translation to perl doesn't look difficult. I'll see what I can do during the next week or so.

We don't require perl for any other feature, do we? Seems like a pretty onerous requirement for Windows in particular.

We do use perl in the build scripts, but that's only required if you want to compile from source. I think it's fairly easy to install Perl on Windows, actually.

It doesn't sound too onerous a requirement if you want in-place upgrade; actually it looks a very reasonable one. Much more reasonable than Korn shell in any case (or any shell for that matter; I think anything is going to be more of a potentially painful platform dependency than Perl).

-- Alvaro Herrera  http://www.CommandPrompt.com/  PostgreSQL Replication, Consulting, Custom Development, 24x7 support

+1 I agree with Alvaro. Perl is a breeze to install on Windows with ActiveState, and using shell code to perform this task adds a huge platform dependency to the code. Perl is a known and well-defined quantity for scripting.

Cheers, Ken
Re: [HACKERS] [PATCHES] updated hash functions for postgresql v1
On Sun, Jan 25, 2009 at 10:27:03PM -0600, Kenneth Marshall wrote:

On Sat, Jan 10, 2009 at 01:36:25PM -0500, Tom Lane wrote: Jeff Davis pg...@j-davis.com writes: I ran 5 times on both old and new code, eliminating the top and bottom and taking the average of the remaining 3, and I got a 6.9% performance improvement with the new code.

The question that has been carefully evaded throughout the discussion of this patch is whether the randomness of the hash result is decreased, and if so what is that likely to cost us in performance of the various hash-dependent algorithms. I would still like to have an answer to that before we make a change to gain marginal performance improvement in the hash function itself (which is already shown to be barely measurable in the total context of a hash-dependent operation...)

regards, tom lane

Dear Hackers and Reviewers,

In response to Tom's questioning the randomness of hash_any when the mixing functions are split into two, the first used when sequentially processing the input and the second for the final mix, I have generated a more detailed analysis of the two hash_any functions. First, one change to the 11/2008 patch: the keylen is added to a, b and c initially, so we do not need to add it later on. This is the micro-diff:

--- hashfunc.c_TWOMIX	2009-01-22 14:07:34.0 -0600
+++ hashfunc.c_TWOMIX2	2009-01-22 14:17:32.0 -0600
@@ -336,7 +336,6 @@
 	/* handle the last 11 bytes */
 	k = (const unsigned char *) ka;
-	c += keylen;
 #ifdef WORDS_BIGENDIAN
 	switch (len)
 	{
@@ -439,7 +438,6 @@
 	}
 	/* handle the last 11 bytes */
-	c += keylen;
 #ifdef WORDS_BIGENDIAN
 	switch (len)		/* all the case statements fall through */

The remainder of this document will use the runs from my initial results, broken out using various power-of-two bucket sizes to simulate our actual use in PostgreSQL as the number of items hashed increases and we use more and more bits of our hash to identify the appropriate bucket.
I have run each test twice, once with our current hash_any() with the single mix() function and then a second time using my patch from the November commitfest plus the patch above to produce a new hash_any() with two separate mixing functions mix() and final(). For each run I have generated a sequence of unique inputs, up to the number of buckets, hashed them with the hash functions (both old and new), then I calculate the expected number of collisions p(n) using the Poisson formula for each number of buckets, where the number of buckets are 2**16, 2**18, 2**20, 2**22, 2**24, and 2**26. For my initial run, I used a string consisting of the letter 'a' followed by the integer representation of the numbers from 0 to (number of buckets - 1):

1) auint32 (i.e. a1, a0002...)

Number of buckets: 65536
Total number of items: 65536
Expected number with n items:  24109 24109 12054 4018 1004 200 33 4
Actual number mix():           24044 24172 12078 4036 980 186 30 10
Actual number mix()/final():   24027 24232 12060 3972 1001 207 31 5 1

Number of buckets: 262144
Total number of items: 262144
Expected number with n items:  96437 96437 48218 16072 4018 803 133 19 2
Actual number mix():           96224 96730 48240 15951 4094 744 143 17 1
Actual number mix()/final():   96335 96646 48071 16128 4018 801 122 21 2

Number of buckets: 1048576
Total number of items: 1048576
Expected number with n items:  385749 385749 192874 64291 16072 3214 535 76 9
Actual number mix():           385716 385596 193243 64115 16053 3285 478 77 12 1
Actual number mix()/final():   385955 385016 193789 63768 16259 3190 511 79 8 1

Number of buckets: 4194304
Total number of items: 4194304
Expected number with n items:  1542998 1542998 771499 257166 64291 12858 2143 306 38
Actual number mix():           1542536 1543489 771351 25 63830 12847 2123 326 19 5 1
Actual number mix()/final():   1541828 1544429 772072 256178 64579 12774 2129 288 22 5

Number of buckets: 16777216
Total number of items: 16777216
Expected number with n items:  6171992 6171992 3085996 1028665 257166 51433 8572 1224 153
Actual number mix():           6170866 6174079 3085912 1027140 257808 51385 8638 1219 146 23
Actual number mix()/final():   6172058 6171991 3086279 1027916 257535 51465 8554 1243 149 23 3

Number of buckets: 67108864
Total number of items: 67108864
Expected number with n items:  24687971 24687971 12343985 4114661 1028665 205733 34288 4898 612
Actual number mix():           24686110 24690897 12344232 4113515 1028232 205682 34546 4942 629 72 7
Actual number mix()/final():   24708515 24669248 12333034 4114796 1034256 208424 34888 5023 598 77 5

Here
Re: [HACKERS] pg_upgrade project status
On Tue, Jan 27, 2009 at 04:23:18PM +0100, Harald Armin Massa wrote:

I think it's fairly easy to install Perl on Windows, actually. It doesn't sound too onerous a requirement if you want in-place upgrade; actually it looks a very reasonable one. Much more reasonable than Korn shell in any case (or any shell for that matter; I think anything is going to be more of a potentially painful platform dependency than Perl).

May I humbly recommend a rewrite in Python? That should be as difficult/easy as PERL, AND there is a very robust py2exe implementation, which allows creating a single .exe file which contains everything. Python is present on all Linux systems; Windows users are totally comfortable with .exe files.

Harald

Great idea; perl2exe is available as well and will allow the continued use of perl for our internal scripting language.

Cheers, Ken
Re: [HACKERS] [PATCHES] updated hash functions for postgresql v1
On Sat, Jan 10, 2009 at 01:36:25PM -0500, Tom Lane wrote:

Jeff Davis pg...@j-davis.com writes: I ran 5 times on both old and new code, eliminating the top and bottom and taking the average of the remaining 3, and I got a 6.9% performance improvement with the new code.

The question that has been carefully evaded throughout the discussion of this patch is whether the randomness of the hash result is decreased, and if so what is that likely to cost us in performance of the various hash-dependent algorithms. I would still like to have an answer to that before we make a change to gain marginal performance improvement in the hash function itself (which is already shown to be barely measurable in the total context of a hash-dependent operation...)

regards, tom lane

Dear Hackers and Reviewers,

In response to Tom's questioning the randomness of hash_any when the mixing functions are split into two, the first used when sequentially processing the input and the second for the final mix, I have generated a more detailed analysis of the two hash_any functions. First, one change to the 11/2008 patch: the keylen is added to a, b and c initially, so we do not need to add it later on. This is the micro-diff:

--- hashfunc.c_TWOMIX	2009-01-22 14:07:34.0 -0600
+++ hashfunc.c_TWOMIX2	2009-01-22 14:17:32.0 -0600
@@ -336,7 +336,6 @@
 	/* handle the last 11 bytes */
 	k = (const unsigned char *) ka;
-	c += keylen;
 #ifdef WORDS_BIGENDIAN
 	switch (len)
 	{
@@ -439,7 +438,6 @@
 	}
 	/* handle the last 11 bytes */
-	c += keylen;
 #ifdef WORDS_BIGENDIAN
 	switch (len)		/* all the case statements fall through */

The remainder of this document will use the runs from my initial results, broken out using various power-of-two bucket sizes to simulate our actual use in PostgreSQL as the number of items hashed increases and we use more and more bits of our hash to identify the appropriate bucket.
I have run each test twice, once with our current hash_any() with the single mix() function and then a second time using my patch from the November commitfest plus the patch above to produce a new hash_any() with two separate mixing functions mix() and final(). For each run I have generated a sequence of unique inputs, up to the number of buckets, hashed them with the hash functions (both old and new), then I calculate the expected number of collisions p(n) using the Poisson formula for each number of buckets, where the number of buckets are 2**16, 2**18, 2**20, 2**22, 2**24, and 2**26. For my initial run, I used a string consisting of the letter 'a' followed by the integer representation of the numbers from 0 to (number of buckets - 1):

1) auint32 (i.e. a1, a0002...)

Number of buckets: 65536
Total number of items: 65536
Expected number with n items:  24109 24109 12054 4018 1004 200 33 4
Actual number mix():           24044 24172 12078 4036 980 186 30 10
Actual number mix()/final():   24027 24232 12060 3972 1001 207 31 5 1

Number of buckets: 262144
Total number of items: 262144
Expected number with n items:  96437 96437 48218 16072 4018 803 133 19 2
Actual number mix():           96224 96730 48240 15951 4094 744 143 17 1
Actual number mix()/final():   96335 96646 48071 16128 4018 801 122 21 2

Number of buckets: 1048576
Total number of items: 1048576
Expected number with n items:  385749 385749 192874 64291 16072 3214 535 76 9
Actual number mix():           385716 385596 193243 64115 16053 3285 478 77 12 1
Actual number mix()/final():   385955 385016 193789 63768 16259 3190 511 79 8 1

Number of buckets: 4194304
Total number of items: 4194304
Expected number with n items:  1542998 1542998 771499 257166 64291 12858 2143 306 38
Actual number mix():           1542536 1543489 771351 25 63830 12847 2123 326 19 5 1
Actual number mix()/final():   1541828 1544429 772072 256178 64579 12774 2129 288 22 5

Number of buckets: 16777216
Total number of items: 16777216
Expected number with n items:  6171992 6171992 3085996 1028665 257166 51433 8572 1224 153
Actual number mix():           6170866 6174079 3085912 1027140 257808 51385 8638 1219 146 23
Actual number mix()/final():   6172058 6171991 3086279 1027916 257535 51465 8554 1243 149 23 3

Number of buckets: 67108864
Total number of items: 67108864
Expected number with n items:  24687971 24687971 12343985 4114661 1028665 205733 34288 4898 612
Actual number mix():           24686110 24690897 12344232 4113515 1028232 205682 34546 4942 629 72 7
Actual number mix()/final():   24708515 24669248 12333034 4114796 1034256 208424 34888 5023 598 77 5

Here is a second run with number of items = (number of buckets)/2:

Number of buckets: 65536
Total number of items: 32768
Expected number with n items:  39749 19874 4968 828 103 10
Actual
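The "Expected number with n items" rows above come from the Poisson approximation: with n items hashed uniformly into m buckets, the load is lambda = n/m, and the expected number of buckets holding exactly k items is m * e^(-lambda) * lambda^k / k!. A short sketch (my own illustration, not Ken's actual test harness) that reproduces the expected rows, including the half-full second run:

```python
import math

def expected_bucket_counts(n_items, n_buckets, max_k):
    # expected number of buckets containing exactly k items, for k = 0..max_k,
    # truncated to integers as in the tables above
    lam = n_items / n_buckets
    return [int(n_buckets * math.exp(-lam) * lam ** k / math.factorial(k))
            for k in range(max_k + 1)]

# full run: 65536 items into 65536 buckets (lambda = 1)
print(expected_bucket_counts(65536, 65536, 7))
# -> [24109, 24109, 12054, 4018, 1004, 200, 33, 4]

# second run: 32768 items into 65536 buckets (lambda = 0.5)
print(expected_bucket_counts(32768, 65536, 5))
# -> [39749, 19874, 4968, 828, 103, 10]
```

Comparing these expectations against the "Actual" rows is the whole test: if the observed occupancy histogram tracks the Poisson prediction, the hash output is behaving as if uniform.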
Re: [HACKERS] [PATCHES] updated hash functions for postgresql v1
On Fri, Jan 09, 2009 at 02:00:39PM -0800, Jeff Davis wrote:

On Fri, 2009-01-09 at 14:29 -0600, Kenneth Marshall wrote: Jeff, Thanks for the review. I would not really expect any differences in hash index build times other than normal noise variances. The most definitive benchmark that I have seen was done with my original patch submission in March, posted by Luke of Greenplum: "We just applied this and saw a 5 percent speedup on a hash aggregation query with four columns in a 'group by' clause run against a single TPC-H table." I wonder if they would be willing to re-run their test? Thanks again.

Separating mix() and final() should have some performance benefit, right?

Regards, Jeff Davis

Yes, it does, but the results can be swamped by other latencies in the code path. Tests such as Tom's benchmark of the underlying functions are needed to isolate the timings effectively, or a benchmark like Greenplum's that will benefit from a more efficient function.

Ken
Re: [HACKERS] [PATCHES] updated hash functions for postgresql v1
On Sat, Jan 10, 2009 at 10:56:15AM -0800, Jeff Davis wrote:

On Sat, 2009-01-10 at 13:36 -0500, Tom Lane wrote: Jeff Davis pg...@j-davis.com writes: I ran 5 times on both old and new code, eliminating the top and bottom and taking the average of the remaining 3, and I got a 6.9% performance improvement with the new code.

The question that has been carefully evaded throughout the discussion of this patch is whether the randomness of the hash result is decreased, and if so what is that likely to cost us in performance of the various hash-dependent algorithms. I would still like to have an answer to that before we make a change to gain marginal performance improvement in the hash function itself (which is already shown to be barely measurable in the total context of a hash-dependent operation...)

In: http://archives.postgresql.org/message-id/20081104202655.gp18...@it.is.rice.edu Ken showed that the number of hash collisions is the same in the old and the new code for his test data. Not a direct measurement of randomness, but it's some kind of indication. I'm not an authority on either hash functions or statistics, so I'll have to defer to Ken or someone else to actually prove that the randomness is maintained.

Regards, Jeff Davis

First, while I am not an expert by any means, basic statistics will give you the probability of a collision when packing N items into M buckets chosen at random. In all of my tests, I used 1.6M items into 2^32 buckets. If the bit mixing is complete and random, the number of collisions should be close to the same no matter what arrangement of bits is used for inputs. I think my test cases indicate quite clearly that the new hash function is as independent of bit-order as the older functions, in that the number of bucket collisions is basically the same for all layouts. I have the test harness and can test any other input that you would like me to.
Second, the author of the basic hash function (old and new) has performed more extensive testing and has shown that the new functions are better at randomizing bits than his original function. I will try to run a micro-benchmark of my original patch from March and of the incremental approach that resulted in my November patch tomorrow.

Cheers, Ken
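Ken's statistical claim is easy to check: for n distinct keys hashed uniformly into m buckets, the expected number of colliding insertions is n - m*(1 - (1 - 1/m)^n), which is approximately n^2/2m. For the cracklib runs (n = 1648379 words, m = 2^32 hash values) that predicts roughly 316 collisions, right in the middle of the observed 287-350 range in the tables. A quick sketch of the arithmetic (my own illustration, not part of the thread):

```python
def expected_collisions(n, m):
    # expected occupied buckets is m*(1 - (1 - 1/m)**n); every insertion
    # beyond the first into an occupied bucket is a collision
    occupied = m * (1 - (1 - 1 / m) ** n)
    return n - occupied

print(round(expected_collisions(1648379, 2 ** 32)))  # ~316, matching the tables
```

So collision counts near 300-350 are exactly what a uniformly random 32-bit hash should produce on this input size; a biased hash would show substantially more.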
Re: [HACKERS] [PATCHES] updated hash functions for postgresql v1
On Sat, Jan 10, 2009 at 01:57:27PM -0500, Gregory Stark wrote:

Tom Lane t...@sss.pgh.pa.us writes: Jeff Davis pg...@j-davis.com writes: I ran 5 times on both old and new code, eliminating the top and bottom and taking the average of the remaining 3, and I got a 6.9% performance improvement with the new code.

The question that has been carefully evaded throughout the discussion of this patch is whether the randomness of the hash result is decreased,

In fairness that doesn't seem to be the case. The original patch was posted with such an analysis using cracklib and raw binary data: http://article.gmane.org/gmane.comp.db.postgresql.devel.general/105675

marginal performance improvement in the hash function itself (which is already shown to be barely measurable in the total context of a hash-dependent operation...)

If it's a 6% gain in the speed of Hash Join or HashAggregate it would be very interesting. But I gather it's a 6% speedup in the time spent actually in the hash function. Is that really where much of our time is going? If it's 10% of the total time to execute one of these nodes then we're talking about a 0.6% optimization...

The Greenplum test did show a 5% increase in performance with the replacement functions in March.

Regards, Ken
Re: [HACKERS] [PATCHES] updated hash functions for postgresql v1
On Fri, Jan 09, 2009 at 12:04:15PM -0800, Jeff Davis wrote:

On Mon, 2008-12-22 at 13:47 -0600, Kenneth Marshall wrote: Dear PostgreSQL developers, I am re-sending this to keep this last change to the internal hash function on the radar.

Hi Ken,

A few comments:

1. New patch with very minor changes attached.
2. I reverted the change you made to indices.sgml. We still don't use WAL for hash indexes, and in my opinion we should continue to discourage their use until we do use WAL. We can add back in the comment that hash indexes are suitable for large keys if we have some results to show that.
3. There was a regression test failure in union.sql because the ordering of the results was different. I updated the regression test.
4. Hash functions affect a lot more than hash indexes, so I ran through a variety of tests that use a HashAggregate plan. Test setup and results are attached. These results show no difference between the old and the new code (about 0.1% better).
5. The hash index build time shows some improvement. The new code won in every instance in which there were a lot of duplicates in the table (100 distinct values, 50K of each) by around 5%. The new code appeared to be the same or slightly worse in the case of hash index builds with few duplicates (100 distinct values, 5 of each). The difference was about 1% worse, which is probably just noise.

Note: I'm no expert on hash functions. Take all of my tests with a grain of salt. I would feel a little better if I saw at least one test that showed better performance of the new code on a reasonable-looking distribution of data. The hash index build that you showed only took a second or two -- it would be nice to see a test that lasted at least a minute.

Regards, Jeff Davis

Jeff, Thanks for the review. I would not really expect any differences in hash index build times other than normal noise variances.
The most definitive benchmark that I have seen was done with my original patch submission in March, posted by Luke of Greenplum: "We just applied this and saw a 5 percent speedup on a hash aggregation query with four columns in a 'group by' clause run against a single TPC-H table." I wonder if they would be willing to re-run their test? Thanks again.

Regards, Ken
Re: [HACKERS] float8 strtod weirdness
On Wed, Jan 07, 2009 at 08:12:44PM +0530, Nikhil Sontakke wrote:

Hi, Consider the following with the latest CVS sources:

postgres=# create table temp(val float4);
CREATE TABLE
postgres=# insert into temp values (415.1);
INSERT 0 1
postgres=# select * from temp where val = 415.1;
 val
-----
(0 rows)

!? The reason seems to be that 415.1 ends up being treated as a numeric and is converted into float8 (why not float4? - it could have helped to use the float4eq function then). The float8in function uses strtod, which converts '415.1' into 415.10000000000002, causing the subsequent comparison to fail. I guess there are ample cases of float/strtod weirdness around? Needless to mention, I was mighty surprised on seeing the output for the first time around :)

Casting to float4 works as expected:

postgres=# select * from rel where x = 415.1::float4;
   x
-------
 415.1
(1 row)

Regards, Nikhils -- http://www.enterprisedb.com

The traditional approach to an equality test with floating point is to do the check plus-or-minus some value epsilon. Otherwise, such seemingly bizarre behavior results.

Cheers, Ken
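The underlying issue is that 415.1 has no exact binary representation: parsed as a float8 it is ~415.10000000000002, while the float4 column stores ~415.100006, so the widened comparison can never be equal. A sketch of both the failure and Ken's epsilon workaround (Python, using struct to emulate float4 rounding; not PostgreSQL code):

```python
import struct

def to_float4(x):
    # round a double (Python float / PostgreSQL float8) to single precision
    return struct.unpack('f', struct.pack('f', x))[0]

literal = 415.1             # the query literal, parsed as a float8
stored = to_float4(415.1)   # what the float4 column actually holds

print(stored == literal)             # False: 415.100006... != 415.100000...
print(abs(stored - literal) < 1e-4)  # True: epsilon comparison succeeds
```

The explicit cast in Nikhil's second query works for the same reason: rounding the literal to float4 makes both sides identical bit patterns, so plain equality holds again.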
Re: [HACKERS] [PATCHES] updated hash functions for postgresql v1
Dear PostgreSQL developers,

I am re-sending this to keep this last change to the internal hash function on the radar.

Ken

Sorry about the delay for this update to the new hash index implementation. I was trying to get the WAL logging in place and forgot to post the actual patch. The WAL for hash indexes will need to wait for 8.5, but I did want to add back in the piece of the Bob Jenkins 2006 hash function that was stripped out of the initial patch on application, due to concerns about the randomness of the resulting hash values. Here is a re-post of my initial findings comparing the old/new Jenkins hash from lookup2 and lookup3. I have added a third column containing the results for the hash_any() resulting from the attached patch, as well as a simple timing test for a DB reindex both before and after patching. Also attached is a simple documentation patch updating the note attached to the hash index description.

Regards,
Ken

Hi,

I have finally had a chance to do some investigation on the performance of the old hash mix() function versus the updated mix()/final() in the new hash function. Here is a table of my current results for both the old and the new hash function. In this case "cracklib" refers to the cracklib-dict containing 1648379 unique words, massaged in various ways to generate input strings for the hash functions. The result is the number of collisions in the hash values generated:

hash input                              old    new    newv2
----------                              ---    ---    -----
cracklib                                338    316    338
cracklib x 2 (i.e. clibclib)            305    319    300
cracklib x 3 (clibclibclib)             323    329    315
cracklib x 10                           302    310    329
cracklib x 100                          350    335    298
cracklib x 1000                         314    309    315
cracklib x 100 truncated to char(100)   311    327    320
uint32 from 1-1648379                   309    319    347
(uint32 1-1948379)*256                  309    314    304
(uint32 1-1948379)*16                   310    314    324
auint32 (i.e. a1,a0002...)              320    321    312
uint32uint32 (i.e. uint64)              321    287    309

The different result columns are: old = Jenkins 1996 hash function (lookup2.c), new = Jenkins 2006 hash function (lookup3.c), and newv2 = adaptation of the current hash_any() to incorporate the separate mix()/final() functions. As you can see from the results, splitting mix() and final() apart does not result in any perceptible loss of randomness in the hash assignment.

I also ran a crude timing test: a reindex of the following database:

CREATE TABLE dict (word text);
CREATE INDEX wordhash ON dict USING hash (word);
INSERT INTO dict (word) VALUES ('s;lhkjdpyoijxfg;lktjgh;sdlhkjo');
INSERT INTO dict (SELECT MAX(word)||MAX(word) FROM dict);
... (21 times)
REINDEX TABLE ...

The average time to reindex the table using our current hash_any() without the separate mix()/final() was 1696 ms, versus 1482 ms with the separate mix()/final() stages, giving almost 13% better performance for this stupid metric.

--- indices.sgml	2008-10-13 14:40:06.000000000 -0500
+++ indices.sgml.NEW	2008-11-04 12:42:35.000000000 -0600
@@ -190,13 +190,11 @@
   <note>
    <para>
-    Testing has shown <productname>PostgreSQL</productname>'s hash
-    indexes to perform no better than B-tree indexes, and the
-    index size and build time for hash indexes is much worse.
-    Furthermore, hash index operations are not presently WAL-logged,
+    <productname>PostgreSQL</productname>'s hash indexes provide
+    the fast O(1) lookups, even for very large objects.
+    Hash index operations are not presently WAL-logged,
     so hash indexes might need to be rebuilt with <command>REINDEX</>
-    after a database crash.
-    For these reasons, hash index use is presently discouraged.
+    after a database crash.
    </para>
   </note>

--- hashfunc.c.ORIG	2008-09-03 13:07:14.000000000 -0500
+++ hashfunc.c.NEW	2008-11-04 08:36:16.000000000 -0600
@@ -200,39 +200,94 @@
  * hash function, see http://burtleburtle.net/bob/hash/doobs.html,
  * or Bob's article in Dr. Dobb's Journal, Sept. 1997.
  *
- * In the current code, we have adopted an idea from Bob's 2006 update
- * of his hash function, which is to fetch the data a word at a time when
- * it is suitably aligned.  This makes for a useful speedup, at the cost
- * of having to maintain four code paths (aligned vs unaligned, and
- * little-endian vs big-endian).  Note that we have NOT adopted his newer
- * mix() function, which is faster but may sacrifice some randomness.
+ * In the current code, we have adopted Bob's 2006 update of his hash,
+ * which fetches the data a word at a time when it is suitably aligned.
+ * This makes for a useful speedup, at the cost of having to maintain
+ * four code paths (aligned vs unaligned, and little-endian vs big-endian).
+ * It also uses two separate mixing functions, mix() and final().
Re: [HACKERS] Preventing index scans for non-recoverable index AMs
On Wed, Dec 17, 2008 at 05:42:41PM -0500, Jaime Casanova wrote:

> On Wed, Dec 17, 2008 at 4:26 PM, Simon Riggs <si...@2ndquadrant.com> wrote:
>> Hot Standby won't work with hash indexes because they are
>> non-recoverable. We have a number of ways of dealing with this:
>
> i don't see a reason for reinventing the wheel; we don't have WAL for
> hash indexes because it makes them slower without any benefit at all...
> now there will be one...
>
> --
> Atentamente,
> Jaime Casanova
> Soporte y capacitación de PostgreSQL
> Asesoría y desarrollo de sistemas
> Guayaquil - Ecuador
> Cel. +59387171157

I think having your index survive a server power outage or other crash is a very good thing. Rebuilding a hash index for the case for which it is preferred (large, large tables) would be excruciating.

Ken
Re: [HACKERS] Preventing index scans for non-recoverable index AMs
On Wed, Dec 17, 2008 at 10:58:11PM +0000, Simon Riggs wrote:

> On Wed, 2008-12-17 at 16:47 -0600, Kenneth Marshall wrote:
>> I think having your index survive a server power outage or other crash
>> is a very good thing. Rebuilding a hash index for the case for which
>> it is preferred (large, large tables) would be excruciating.
>
> Completely agree. We may be outta time to make it happen.

I agree. I was working on adding the WAL and ran up against the deadline. A rushed hash WAL implementation would be worse than the other alternatives. I plan on picking it back up after 8.4 is out the door.

Regards,
Ken
Re: [HACKERS] Preventing index scans for non-recoverable index AMs
On Wed, Dec 17, 2008 at 06:07:41PM -0500, Jaime Casanova wrote:

> On Wed, Dec 17, 2008 at 5:47 PM, Kenneth Marshall <k...@rice.edu> wrote:
>> Rebuilding a hash index for the case for which it is preferred (large,
>> large tables) would be excruciating.
>
> there's such a situation?

As of 8.4, yes.

Ken
Re: [HACKERS] Preventing index scans for non-recoverable index AMs
On Wed, Dec 17, 2008 at 05:10:40PM -0600, Kenneth Marshall wrote:

> On Wed, Dec 17, 2008 at 06:07:41PM -0500, Jaime Casanova wrote:
>> On Wed, Dec 17, 2008 at 5:47 PM, Kenneth Marshall <k...@rice.edu> wrote:
>>> Rebuilding a hash index for the case for which it is preferred
>>> (large, large tables) would be excruciating.
>>
>> there's such a situation?
>
> As of 8.4, yes.

In addition, hash indexes can index items larger than the 1/3 page limit of btree indexes.

Ken
Re: [HACKERS] cvs head initdb hangs on unixware
Would it be reasonable to turn off optimization for this file?

Ken

On Tue, Dec 09, 2008 at 05:47:47PM +0100, [EMAIL PROTECTED] wrote:

> On Tue, 9 Dec 2008, Tom Lane wrote:
>
>> [EMAIL PROTECTED] writes:
>>> Guess what! With the fprintf .. descending node... in place,
>>> everything goes well. The optimizer definitely does something weird
>>> along the definition/assignment of leftok/rightok.
>>
>> Hmm, so the problem is in that second loop. The trick is to pick some
>> reasonably non-ugly code change that makes the problem go away. The
>> first thing I'd try is to get rid of the overly cute optimization
>>
>>     int rightnodeno = leftnodeno + 1;
>>
>> and make it just read
>>
>>     int rightnodeno = rightchild(nodeno);
>>
>> If that doesn't work, we might try refactoring the code enough to get
>> rid of the goto, but that looks a little bit tedious.
>>
>> regards, tom lane
>
> I tried that, and also moving the leftok/rightok declarations outside
> the loop and refactoring the leftok/rightok assignment code. Nothing
> worked!
>
> Regards,
> --
> Olivier PRENANT              Tel: +33-5-61-50-97-00 (Work)
> 15, Chemin des Monges             +33-5-61-50-97-01 (Fax)
> 31190 AUTERIVE                    +33-6-07-63-80-64 (GSM)
> FRANCE                       Email: [EMAIL PROTECTED]
> --
> Make your life a dream, make your dream a reality. (St Exupery)
Re: [HACKERS] WIP parallel restore patch
Okay, I have had a chance to run some timing benchmarks. Here are my results for the parallel pg_restore patch:

Ken

--
Server settings:

max_connections = 100           # (change requires restart)
shared_buffers = 256MB          # min 128kB
work_mem = 128MB                # min 64kB
maintenance_work_mem = 256MB    # min 1MB
fsync = on                      # turns forced synchronization on or off
synchronous_commit = off        # immediate fsync at commit
full_page_writes = on           # recover from partial page writes
checkpoint_segments = 10        # in logfile segments, min 1, 16MB each
autovacuum = on                 # Enable autovacuum subprocess?  'on'

The total final database size is 6.5GB. Here are the timings for the different run parallelism from 1 to 8 on a 4-core AMD box:

-bash-3.00$ time pg_restore -U postgres -p 5435 -d rttest /tmp/rtout.pz
...
real    19m3.175s
user    1m2.968s
sys     0m8.202s

improvement - 0%

-bash-3.00$ time pg_restore -U postgres -p 5435 -m 2 -d rttest /tmp/rtout.pz
...
real    12m55.680s
user    1m12.440s
sys     0m8.343s

improvement - 32%

-bash-3.00$ time pg_restore -U postgres -p 5435 -m 4 -d rttest /tmp/rtout.pz
...
real    9m45.056s
user    1m1.892s
sys     0m8.980s

improvement - 49%

The system only has 4 cores, but here are the results with -m 8:

-bash-3.00$ time pg_restore -U postgres -p 5435 -m 8 -d rttest /tmp/rtout.pz
...
real    8m15.320s
user    0m55.206s
sys     0m8.678s

improvement - 53%
Re: [HACKERS] WIP parallel restore patch
On Thu, Nov 20, 2008 at 02:26:14PM -0500, Andrew Dunstan wrote:

> Kenneth Marshall wrote:
>> Okay, I have had a chance to run some timing benchmarks. Here are my
>> results for the parallel pg_restore patch:
>> [...]
>
> Interesting. Can you try with two changes? Turn fsync off, and use the
> --truncate-before-load switch.
>
> In general, though, this is fairly much in line with other experience,
> i.e. we can get up to about n/2 times speedup with n cores.
>
> thanks
>
> andrew

Okay, here is the same test run with:

fsync = off
--truncate-before-load

Cheers,
Ken

-bash-3.00$ time pg_restore -U postgres -p 5435 --truncate-before-load -d rttest /tmp/rtout.pz
...
real    16m25.031s
user    1m3.707s
sys     0m8.776s

improvement - 0%

-bash-3.00$ time pg_restore -U postgres -p 5435 -m 2 --truncate-before-load -d rttest /tmp/rtout.pz
...
real    10m26.730s
user    0m48.782s
sys     0m7.214s

improvement - 36%

-bash-3.00$ time pg_restore -U postgres -p 5435 -m 4 --truncate-before-load -d rttest /tmp/rtout.pz
...
real    8m5.061s
user    0m48.657s
sys     0m7.602s

improvement - 51%

-bash-3.00$ time pg_restore -U postgres -p 5435 -m 8 --truncate-before-load -d rttest /tmp/rtout.pz
...
real    6m18.787s
user    0m45.361s
sys     0m7.811s

improvement - 62%
Re: [RRR] [HACKERS] Tests citext casts
On Fri, Nov 07, 2008 at 10:15:17AM -0800, David E. Wheeler wrote:

> On Nov 5, 2008, at 12:34 PM, Kenneth Marshall wrote:
>> I am using the anonymous CVS repository, it returns the following
>> information in pg_catalog.pg_settings:
>
> What is lc_collate set to?
>
>     % show lc_collate;
>
> FWIW, I just ran the tests myself and all passed, with and without the
> patch (using en_US.UTF-8). I think that the regression tests generally
> expect to be run with the C locale, though en_US generally works fine,
> too, given that ASCII ordering has the same semantics.
>
> Best,
> David

David,

Thank you for the pointers. lc_collate was set to en_US.UTF-8. I re-ran initdb with the --no-locale option, and the tests then passed successfully. Thank you for the reminder that the regression tests need to run against a C-locale database.

Regards,
Ken
Re: [HACKERS] [RRR] Tests citext casts - reviewed
The patch for the citext tests applied cleanly to the module, and the patched files resulted in a clean make installcheck run for the citext module. My previous problem was the result of not testing with a C-locale database. This patch is ready to be applied.

Regards,
Ken Marshall
[HACKERS] Tests citext casts
I installed and ran the citext tests both with and without the patch and had failures both times. The patch applied cleanly, and the make; make install completed without errors. I have attached the two regression.diffs files, one without the patch applied and the other with the patch.

Regards,
Ken Marshall

*** /opt/local/pg8/pgsql/contrib/citext/expected/citext_1.out	Wed Nov 5 08:33:40 2008
--- /opt/local/pg8/pgsql/contrib/citext/results/citext.out	Wed Nov 5 08:34:44 2008
***************
*** 122,146 ****
  SELECT 'B'::citext < 'a'::text AS t;  -- text wins.
   t
  ---
!  t
  (1 row)

  SELECT 'B'::citext <= 'a'::text AS t;  -- text wins.
   t
  ---
!  t
  (1 row)

  SELECT 'a'::citext > 'B'::text AS t;  -- text wins.
   t
  ---
!  t
  (1 row)

  SELECT 'a'::citext >= 'B'::text AS t;  -- text wins.
   t
  ---
!  t
  (1 row)

  -- Test implicit casting. citext casts to varchar, but not vice-versa.
--- 122,146 ----
  SELECT 'B'::citext < 'a'::text AS t;  -- text wins.
   t
  ---
!  f
  (1 row)

  SELECT 'B'::citext <= 'a'::text AS t;  -- text wins.
   t
  ---
!  f
  (1 row)

  SELECT 'a'::citext > 'B'::text AS t;  -- text wins.
   t
  ---
!  f
  (1 row)

  SELECT 'a'::citext >= 'B'::text AS t;  -- text wins.
   t
  ---
!  f
  (1 row)

  -- Test implicit casting. citext casts to varchar, but not vice-versa.
***************
*** 159,183 ****
  SELECT 'B'::citext < 'a'::varchar AS t;  -- varchar wins.
   t
  ---
!  t
  (1 row)

  SELECT 'B'::citext <= 'a'::varchar AS t;  -- varchar wins.
   t
  ---
!  t
  (1 row)

  SELECT 'a'::citext > 'B'::varchar AS t;  -- varchar wins.
   t
  ---
!  t
  (1 row)

  SELECT 'a'::citext >= 'B'::varchar AS t;  -- varchar wins.
   t
  ---
!  t
  (1 row)

  -- A couple of longer examlpes to ensure that we don't get any issues with bad
--- 159,183 ----
  SELECT 'B'::citext < 'a'::varchar AS t;  -- varchar wins.
   t
  ---
!  f
  (1 row)

  SELECT 'B'::citext <= 'a'::varchar AS t;  -- varchar wins.
   t
  ---
!  f
  (1 row)

  SELECT 'a'::citext > 'B'::varchar AS t;  -- varchar wins.
   t
  ---
!  f
  (1 row)

  SELECT 'a'::citext >= 'B'::varchar AS t;  -- varchar wins.
   t
  ---
!  f
  (1 row)

  -- A couple of longer examlpes to ensure that we don't get any issues with bad
==========================================================

*** /opt/local/pg8/pgsql/contrib/citext/expected/citext_1.out	Wed Nov 5 08:27:57 2008
--- /opt/local/pg8/pgsql/contrib/citext/results/citext.out	Wed Nov 5 08:28:10 2008
***************
*** 122,146 ****
  SELECT 'B'::citext < 'a'::text AS t;  -- text wins.
   t
  ---
!  t
  (1 row)

  SELECT 'B'::citext <= 'a'::text AS t;  -- text wins.
   t
  ---
!  t
  (1 row)

  SELECT 'a'::citext > 'B'::text AS t;  -- text wins.
   t
  ---
!  t
  (1 row)

  SELECT 'a'::citext >= 'B'::text AS t;  -- text wins.
   t
  ---
!  t
  (1 row)

  -- Test implicit casting. citext casts to varchar, but not vice-versa.
--- 122,146 ----
  SELECT 'B'::citext < 'a'::text AS t;  -- text wins.
   t
  ---
!  f
  (1 row)

  SELECT 'B'::citext <= 'a'::text AS t;  -- text wins.
   t
  ---
!  f
  (1 row)

  SELECT 'a'::citext > 'B'::text AS t;  -- text wins.
   t
  ---
!  f
  (1 row)

  SELECT 'a'::citext >= 'B'::text AS t;  -- text wins.
   t
  ---
!  f
  (1 row)

  -- Test implicit casting. citext casts to varchar, but not vice-versa.
***************
*** 159,183 ****
  SELECT 'B'::citext < 'a'::varchar AS t;  -- varchar wins.
   t
  ---
!  t
  (1 row)

  SELECT 'B'::citext <= 'a'::varchar AS t;  -- varchar wins.
   t
  ---
!  t
  (1 row)

  SELECT 'a'::citext > 'B'::varchar AS t;  -- varchar wins.
   t
  ---
!  t
  (1 row)

  SELECT 'a'::citext >= 'B'::varchar AS t;  -- varchar wins.
   t
  ---
!  t
  (1 row)

  -- A couple of longer examlpes to ensure that we don't get any issues with bad
--- 159,183 ----
  SELECT 'B'::citext < 'a'::varchar AS t;  -- varchar wins.
   t
  ---
!  f
  (1 row)

  SELECT 'B'::citext <= 'a'::varchar AS t;  -- varchar wins.
   t
  ---
!  f
  (1 row)

  SELECT 'a'::citext > 'B'::varchar AS t;  -- varchar wins.
   t
  ---
!  f
  (1 row)

  SELECT 'a'::citext >= 'B'::varchar AS t;  -- varchar wins.
   t
  ---
!  f
  (1 row)

  -- A couple of longer examlpes to ensure that we don't get any issues with bad
***************
*** 740,769 ****
  t
  (1 row)

- SELECT 'f'::char::citext = 'f' AS t;
-  t
- ---
-  t
- (1 row)
-
- SELECT 'f'::citext::char = 'f'::char AS t;
-  t
- ---
-  t
- (1 row)
-
- SELECT 'f'::char::citext = 'f' AS t;
-  t
- ---
-  t
- (1 row)
-
- SELECT 'f'::citext::char = 'f'::char AS t;
-  t
- ---
-  t
- (1 row)
-
  SELECT 'foo'::citext::bytea = 'foo'::bytea AS t;
   t
  ---
--- 740,745 ----
==========================================================
Re: [RRR] [HACKERS] Tests citext casts
On Wed, Nov 05, 2008 at 09:04:04AM -0800, David E. Wheeler wrote:

> On Nov 5, 2008, at 6:40 AM, Kenneth Marshall wrote:
>> I installed and ran the citext tests both with and without the patch
>> and had failures both times. The patch applied cleanly and the
>> make; make install completed without errors. I have attached the two
>> regression.diffs files, one without the patch applied and the other
>> with the patch.
>
> What patch was it you applied? And is this CVS HEAD that you're
> testing? What locale/collation is your database configured with?
>
> Thanks,
> David

David,

I am using the anonymous CVS repository; it returns the following information in pg_catalog.pg_settings:

postgres=# select name, setting, short_desc from pg_catalog.pg_settings where name like 'server_%';
        name        | setting  |                     short_desc
--------------------+----------+-----------------------------------------------------
 server_encoding    | UTF8     | Sets the server (database) character set encoding.
 server_version     | 8.4devel | Shows the server version.
 server_version_num | 80400    | Shows the server version as an integer.
(3 rows)

The patch that I used is the one linked from the 2008-11 commitfest wiki, which points to:

http://archives.postgresql.org/message-id/[EMAIL PROTECTED]

Cheers,
Ken