Re: [HACKERS] Progress on fast path sorting, btree index creation time
On Wed, Feb 01, 2012 at 04:12:58PM -0600, Jim Nasby wrote: On Jan 26, 2012, at 9:32 PM, Robert Haas wrote: But if we want to put it on a diet, the first thing I'd probably be inclined to lose is the float4 specialization. Some members of the audience will recall that I take dim view of floating point arithmetic generally, but I'll concede that there are valid reasons for using float8. I have a harder time coming up with a good reason to use float4 - ever, for anything you care about. So I would be inclined to think that if we want to trim this back a bit, maybe that's the one to let go. If we want to be even more aggressive, the next thing I'd probably lose is the optimization of multiple sortkey cases, on the theory that single sort keys are probably by far the most common practical case. I do find float4 to be useful, though it's possible that my understanding is flawed… We end up using float to represent ratios in our database; things that really, honest to God do NOT need to be exact. In most cases, 7 digits of precision (which AFAIK is what you're guaranteed with float4) is plenty, so we use float4 rather than bloat the database (though, since we're on 64bit hardware I guess that distinction is somewhat moot…). Is there something I'm missing that would make float4 useless as compared to float8? -- Jim C. Nasby, Database Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net If the values stored are float4, it would be nice to have that fast-path sort available too. The cases where I have used float4 values in the past, I absolutely did not need any of the float8 baggage and in my case, using the actual float4 comparison operator resulted in a significant time savings over the normal float8. This could be processor specific, but it would be worth testing before throwing it out. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
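For what it's worth, here is a small self-contained sketch of the kind of float4-specific comparator such a specialization would use (illustrative only -- this is not the patch's code; NaN is treated as larger than every ordinary value and equal to itself, matching the backend's sort ordering for floats):

#include <math.h>
#include <stdio.h>
#include <stdlib.h>

/* compare two float4 values without promoting to double */
static int
cmp_float4(const void *pa, const void *pb)
{
    float a = *(const float *) pa;
    float b = *(const float *) pb;

    /* NaN sorts after everything else; two NaNs compare equal */
    if (isnan(a))
        return isnan(b) ? 0 : 1;
    if (isnan(b))
        return -1;
    if (a < b)
        return -1;
    if (a > b)
        return 1;
    return 0;
}

int
main(void)
{
    float  vals[] = {3.5f, -1.0f, NAN, 0.25f, 2.0f};
    size_t n = sizeof(vals) / sizeof(vals[0]);

    qsort(vals, n, sizeof(float), cmp_float4);

    for (size_t i = 0; i < n; i++)
        printf("%g ", vals[i]);
    printf("\n");
    return 0;
}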
Re: [HACKERS] Qual evaluation cost estimates for GIN indexes
On Mon, Feb 20, 2012 at 10:18:31AM +0100, Marc Mamin wrote: I looked into the complaint here of poor estimation for GIN indexscans: http://archives.postgresql.org/pgsql-performance/2012-02/msg00028.php At first glance it sounds like a mistake in selectivity estimation, but it isn't: the rowcount estimates are pretty nearly dead on. The problem is in the planner's estimate of the cost of executing the @@ operator. We have pg_proc.procost set to 1 for ts_match_vq, but actually it's a good deal more expensive than that. Some experimentation suggests that @@ might be about 500 times as expensive as a simple integer comparison. I don't propose pushing its procost up that much, but surely at least 10 would be appropriate, maybe even 100. However ... if you just alter pg_proc.procost in Marc's example, the planner *still* picks a seqscan, even though its estimate of the seqscan cost surely does go up. The reason is that its estimate of the GIN indexscan cost goes up just as much, since we charge one qual eval cost per returned tuple in gincostestimate. It is easy to tell from the actual runtimes that that is not what's happening in a GIN indexscan; we are not re-executing the @@ operator for every tuple. But the planner's cost model doesn't know that. Hello, many thanks for your feedback. I've repeated my test with a table using plain storage, which halved the query time. This confirms that detoasting is the major issue for cost estimation, but even with plain storage the table scan remains about 30% slower compared to the index scan. Hi Marc, Do you happen to know in which function the extra time for the toast storage is spent -- zlib compression? I saw a mention of the LZ4 compression algorithm, which is BSD licensed and hosted at http://code.google.com/p/lz4/ ; it compresses almost 7X faster than zlib (-1) and decompresses about 6X faster. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Document hashtext() and Friends?
On Tue, Feb 21, 2012 at 12:14:03PM -0800, David E. Wheeler wrote: On Feb 21, 2012, at 12:11 PM, Michael Glaesemann wrote: And hashtext *has* changed across versions, which is why Peter Eisentraut published a version-independent hash function library: https://github.com/petere/pgvihash Yes, Marko wrote one, too: https://github.com/markokr/pghashlib But as I’m about to build a system that is going to have many billions of nodes, I could use a variant that returns a bigint. Anyone got a pointer to something like that? Thanks, David Hi David, The existing hash_any() function can return a 64-bit hash, instead of the current 32-bit hash, by returning the b and c values, instead of the current which just returns the c value, per the comment at the start of the function. It sounded like Peter had already done this in his pg_stat_statements normalization patch, but I could not find it. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
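In the meantime, if all that is needed on the client side is a bigint-sized key, a simple 64-bit hash such as FNV-1a is easy to carry around (this is only an illustration and is unrelated to hash_any's algorithm; it just shows the wider output David is after):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* FNV-1a, 64-bit variant: offset basis and prime are the published constants */
static uint64_t
fnv1a_64(const unsigned char *data, size_t len)
{
    uint64_t hash = UINT64_C(0xcbf29ce484222325);

    for (size_t i = 0; i < len; i++)
    {
        hash ^= data[i];
        hash *= UINT64_C(0x100000001b3);
    }
    return hash;
}

int
main(void)
{
    const char *key = "some node identifier";

    /* prints a 64-bit (bigint-sized) hash of the key */
    printf("%016llx\n",
           (unsigned long long) fnv1a_64((const unsigned char *) key, strlen(key)));
    return 0;
}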
Re: [HACKERS] VACUUM ANALYZE is faster than ANALYZE?
On Wed, Feb 22, 2012 at 10:29:56AM -0500, Tom Lane wrote: Simon Riggs si...@2ndquadrant.com writes: On Tue, Feb 21, 2012 at 2:00 PM, Pavel Stehule pavel.steh...@gmail.com wrote: I had to reply to query about usage VACUUM ANALYZE or ANALYZE. I expected so ANALYZE should be faster then VACUUM ANALYZE. VACUUM ANALYZE scans the whole table sequentially. ANALYZE accesses a random sample of data blocks. Random access is slower than sequential access, so at some threshold of sample size and sequential/random I/O speed ratio ANALYZE could become slower. That analysis is entirely wrong. In the first place, although ANALYZE doesn't read all the blocks, what it does read it reads in block number order. So it's not like there are random seeks all over the disk that would not need to happen anyway. In the second place, VACUUM ANALYZE consists of two separate passes, VACUUM and then ANALYZE, and the second pass is going to be random I/O by your definition no matter what. If the filesystem is hugely biased towards sequential I/O for some reason, and the VACUUM scan causes the whole table to become resident in RAM where ANALYZE can read it for free, then I guess it might be possible to arrive at Pavel's result. But it would be an awfully narrow corner case. I cannot believe that his statement is true in general, or even for a noticeably large fraction of cases. regards, tom lane Wouldn't a full sequential scan trigger the kernel read-ahead, which might not trigger for the analyze block reads, even though they are in order? That could account for the observation. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Faster compression, again
On Wed, Mar 14, 2012 at 11:06:16AM -0700, Daniel Farina wrote: For 9.3 at a minimum. The topic of LZO became mired in doubts about: * Potential Patents * The author's intention for the implementation to be GPL Since then, Google released Snappy, also an LZ77-class implementation, and it has been ported to C (recently, and with some quirks, like no LICENSE file...yet, although it is linked from the original Snappy project). The original Snappy (C++) has a BSD license and a patent grant (which shields you from Google, at least). Do we want to investigate a very-fast compression algorithm inclusion again in the 9.3 cycle? +1 for Snappy and a very fast compression algorithm. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Faster compression, again
On Wed, Mar 14, 2012 at 04:43:55PM -0400, Andrew Dunstan wrote: On 03/14/2012 04:10 PM, Merlin Moncure wrote: there are plenty of on gpl lz based libraries out there (for example: http://www.fastlz.org/) and always have been. they are all much faster than zlib. the main issue is patents...you have to be careful even though all the lz77/78 patents seem to have expired or apply to specifics not relevant to general use. We're not going to include GPL code in the backend. We have enough trouble with readline and that's only for psql. SO the fact that there are GPL libraries can't help us, whether there are patent issues or not. cheers andrew That is what makes Google's Snappy so appealing, a BSD license. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Faster compression, again
On Thu, Mar 15, 2012 at 10:14:12PM +, Simon Riggs wrote: On Wed, Mar 14, 2012 at 6:06 PM, Daniel Farina dan...@heroku.com wrote: If we're curious how it affects replication traffic, I could probably gather statistics on LZO-compressed WAL traffic, of which we have a pretty huge amount captured. What's the compression like for shorter chunks of data? Is it worth considering using this for the libpq copy protocol and therefore streaming replication also? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services Here is a pointer to some tests with Snappy+CouchDB: https://github.com/fdmanana/couchdb/blob/b8f806e41727ba18ed6143cee31a3242e024ab2c/snappy-couch-tests.txt They checked compression on smaller chunks of data. I have extracted the basic results. Each line gives the original size in bytes, the compressed size in bytes, the compressed size as a percentage of the original, and the compression ratio:

77 -> 60, 78% or about 1.3:1
120 -> 104, 87% or 1.15:1
127 -> 80, 63% or 1.6:1
5942 -> 2930, 49% or 2:1

It looks like a good candidate for both the libpq copy protocol and streaming replication. My two cents. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
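For anyone who wants to reproduce this kind of small-chunk test, here is a minimal sketch using Snappy's C bindings (snappy-c.h). It assumes libsnappy is installed (compile with -lsnappy); the sample string is made up, but the API calls are the library's real ones:

#include <snappy-c.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
main(void)
{
    const char *input = "short protocol message: a typical libpq copy row ...";
    size_t      in_len = strlen(input);
    size_t      out_len = snappy_max_compressed_length(in_len);
    char       *out = malloc(out_len);

    /* compress one small chunk and report the size change */
    if (snappy_compress(input, in_len, out, &out_len) == SNAPPY_OK)
        printf("%zu -> %zu bytes (%.0f%% of original)\n",
               in_len, out_len, 100.0 * out_len / in_len);

    free(out);
    return 0;
}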
Re: [HACKERS] Improve compression speeds in pg_lzcompress.c
On Mon, Jan 07, 2013 at 09:10:31AM +, Simon Riggs wrote: On 7 January 2013 07:29, Takeshi Yamamuro yamamuro.take...@lab.ntt.co.jp wrote: Anyway, the compression speed in lz4 is very fast, so in my opinion, there is a room to improve the current implementation in pg_lzcompress. So why don't we use LZ4? +1 Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Improve compression speeds in pg_lzcompress.c
On Mon, Jan 07, 2013 at 01:36:33PM +, Greg Stark wrote: On Mon, Jan 7, 2013 at 10:21 AM, John R Pierce pie...@hogranch.com wrote: On 1/7/2013 2:05 AM, Andres Freund wrote: I think there should be enough bits available in the toast pointer to indicate the type of compression. I seem to remember somebody even posting a patch to that effect? I agree that it's probably too late in the 9.3 cycle to start with this. so an upgraded database would have old toasted values in the old compression format, and new toasted values in the new format in an existing table? that's kind of ugly. I haven't looked at the patch. It's not obvious to me from the description that the output isn't backwards compatible. The way the LZ toast compression works the output is self-describing. There are many different outputs that would decompress to the same thing and the compressing code can choose how hard to look for earlier matches and when to just copy bytes wholesale but the decompression will work regardless. I think this comment refers to the lz4 option. I do agree that the patch that was posted to improve the current compression speed should be able to be implemented to allow the current results to be decompressed as well. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SIGFPE handler is naive
On Mon, Aug 13, 2012 at 11:52:06PM -0400, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: On Mon, Aug 13, 2012 at 10:14 PM, Noah Misch n...@leadboat.com wrote: Overall, though, I think it best to plug this. We could set a flag before each operation, like evaluation of SQL arithmetic, for which SIGFPE is normal. Yeah, that's what I thought of, too. It seems like it'd be a lot of work to get there, though. That would depend on how many places there are where SIGFPE is expected. Are we sure there are any? Maybe we should just remove the handler and let SIGFPE be treated as a core dump. regards, tom lane Wouldn't any user level divide-by-zero code cause a SIGFPE? Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
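For illustration, here is a minimal, self-contained example (not backend code) showing that integer division by zero raises SIGFPE on typical platforms, so any user-level integer arithmetic path can reach the handler:

#include <signal.h>
#include <stdio.h>
#include <setjmp.h>

static sigjmp_buf env;

static void
fpe_handler(int signo)
{
    /* jump out; returning from a SIGFPE handler would re-execute the faulting instruction */
    siglongjmp(env, 1);
}

int
main(void)
{
    volatile int zero = 0;

    signal(SIGFPE, fpe_handler);

    if (sigsetjmp(env, 1) == 0)
    {
        volatile int r = 1 / zero;      /* raises SIGFPE on most platforms */
        printf("result %d\n", r);
    }
    else
        printf("caught SIGFPE from integer divide-by-zero\n");

    return 0;
}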
Re: [HACKERS] Invalid optimization of VOLATILE function in WHERE clause?
On Wed, Sep 19, 2012 at 02:39:12PM -0500, Kevin Grittner wrote: Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: It still seems like awfully weird behavior. Why? The WHERE condition relates only to the output of the _stats subquery, so why shouldn't it be evaluated there, rather than after the join? In another thread, Tom Lane t...@sss.pgh.pa.us wrote: It's easier to understand why this is if you realize that SQL has a very clear model of a pipeline of query execution. Conceptually, what happens is: 1. Form the cartesian product of the tables listed in FROM (ie, all combinations of rows). 2. Apply the WHERE condition to each row from 1, and drop rows that don't pass it. People expect that the results will be consistent with this model, even if the implementation is optimized under the covers. I think correct semantics should trump performance here. -Kevin It seems like this is what happens here except that the function is evaluated once for the WHERE and not once per ROW. Both of these meet the criterion for 2 above and Tom's earlier comments both hold. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] change in LOCK behavior
On Wed, Oct 10, 2012 at 10:21:51PM +0200, Tomas Vondra wrote: Hi, I've just noticed a change of LOCK command behavior between 9.1 and 9.2, and I'm not sure whether this is expected or not. Let's use a very simple table CREATE TABLE x (id INT); Say there are two sessions - A and B, where A performs some operations on x and needs to protect them with an ACCESS EXCLUSIVE lock (e.g. it might be a pg_bulkload that acquires such locks, and we need to do that explicitly on one or two places). Session B is attempting to read the data, but is blocked and waits. On 9.1 it sees the commited data (which is what we need) but on 9.2 it sees only data commited at the time of the lock attemt. Example: A: BEGIN; A: LOCK x IN ACCESS EXCLUSIVE MODE; A: INSERT INTO x VALUES (100); B: SELECT * FROM x; A: COMMIT; Now on 9.1, B receives the value 100 while on 9.2 it gets no rows. Is this expected? I suspect the snapshot is read at different time or something, but I've checked release notes but I haven't seen anything relevant. Without getting the commited version of data, the locking is somehow pointless for us (unless using a different lock, not the table itself). regards Tomas Hi Tomas, 9.2 is doing it right. Per the documentation on explicit locking: http://www.postgresql.org/docs/9.2/static/explicit-locking.html Tip: Only an ACCESS EXCLUSIVE lock blocks a SELECT (without FOR UPDATE/SHARE) statement. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Deprecating Hash Indexes
On Mon, Oct 15, 2012 at 10:13:24AM -0400, Robert Haas wrote: On Sun, Oct 14, 2012 at 9:45 AM, Simon Riggs si...@2ndquadrant.com wrote: * Put WARNINGs in the docs against the use of hash indexes, backpatch to 8.3. CREATE INDEX gives no warning currently, though Index Types does mention a caution. I'd be in favor of adding such warnings to the documentation if they are not there already, and possibly even warning on CREATE INDEX .. USING hash(). I don't think I'd go so far as to say that we should imply that they'll be removed in a future release. Given how deeply intertwined they are with the planner, I doubt that that will happen; and I think there is enough interest in the technology that it's likely to eventually be fixed. +1 for adding more warnings but do not deprecate them. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Deprecating Hash Indexes
On Mon, Oct 15, 2012 at 11:46:40AM -0700, Jeff Janes wrote: On Mon, Oct 15, 2012 at 11:14 AM, Josh Berkus j...@agliodbs.com wrote: I would be in favor of moving them to contrib for 9.4. Assuming that someone can figure out how this interacts with the existing system table opclasses. Them being in /contrib would also put less pressure on the next new hacker who decides to take them on as a feature; they can improve them incrementally without needing to fix 100% of issues in the first go. Is there anything currently in contrib that defines its own WAL records and replay methods? Are there hooks for doing so? Cheers, Jeff That is a good point. Please do not move it to contrib if that will make it even harder/impossible to add WAL support. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] libpq
On Tue, Nov 06, 2012 at 04:04:51PM -0500, Christopher Browne wrote: It seems not unusual for Linux distributions to supply libpq as part of a separate package (whether via dpkg, which I think uses ar as the archiver, or RPM, which uses cpio). Possibly this is already provided on your system via some means akin to those. If, instead, you are keen on getting the source code for libpq in a separate tarball, I'd seriously question why that would be expected to be valuable. On most systems, these days, it doesn't take terribly much time or space (on our systems with lots of GBs) to build all of Postgres, so separating the source code to the library out seems like an effort with not much value. +1 For a Linux box the entire installation is 50MB. Now if it were something large like *racle, maybe Regards, Ken On Tue, Nov 6, 2012 at 2:11 PM, Stefan humdumde...@gmail.com wrote: Would it be possible to provide libpq in separate tarbal or is that too much to ask for ? Thank you! Kind regards Stefan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- When confronted by a difficult problem, solve it by reducing it to the question, How would the Lone Ranger handle this? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
On Tue, Apr 10, 2012 at 02:01:02PM -0400, Robert Haas wrote: On Tue, Apr 10, 2012 at 1:58 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Tue, Apr 10, 2012 at 1:44 PM, Tom Lane t...@sss.pgh.pa.us wrote: Huh? I understood what you said upthread to be that we have two ways in existing releases (anything unreleased has zero standing in this discussion): float8 sec in pg_stat_statements.total_time, and int8 msec everywhere else. Did I miss something? We also have int8 usec floating around. But even if we didn't, float8 msec would be a new one, regardless of whether it would be third or fourth... It would still be the second one, because it would replace the only use of float8 sec, no? And more to the point, it converges us on msec being the only exposed unit. The business about underlying microseconds is maybe not so good, but I don't think we want to touch that right now. In the long run I think it would make sense to converge on float8 msec as being the standard for exposed timing values, because that is readily adaptable to the underlying data having nsec or even better precision. Hmm. Maybe we should think about numeric ms, which would have all the same advantages but without the round-off error. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company They are also a lot bigger with tons of added overhead. :) Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
On Wed, Apr 11, 2012 at 01:53:06AM +0100, Peter Geoghegan wrote: On 11 April 2012 01:16, Tom Lane t...@sss.pgh.pa.us wrote: Peter Geoghegan pe...@2ndquadrant.com writes: On 11 April 2012 00:35, Robert Haas robertmh...@gmail.com wrote: If people need something like that, couldn't they create it by hashing the normalized query text with an arbitrary algorithm? That supposes that the normalised query text is perfectly stable. It may well not be, particularly for things like ad-hoc queries or queries generated by ORMs, across database clusters and over long periods of time - Indeed, but the hash value isn't stable either given those sorts of assumptions, so I'm not convinced that there's any advantage there. Isn't it? The hash captures the true meaning of the query, while having the database server's platform as a usually irrelevant artefact. Another thing that I forgot to mention is client encoding - it may well be fairly inconvenient to have to use the same algorithm to hash the query string across applications. You also have to hash the query string yourself again and again, which is expensive to do from Python or something, and is often inconvenient - differences beyond track_activity_query_size bytes (default:1024) are not recognised. Using an SQL code beautifier where a single byte varies now breaks everything, which developers don't expect at all (we've trained them not to), so in many ways you're back to the same limitations as classic pg_stat_statements if you attempt to aggregate queries over time and across machines, which is a very real use case. It's probably pretty annoying to have to get your Python app to use the same hash function as your Java app or whatever I, unless you want to use something heavyweight like a cryptographic hash function. By doing it within Postgres, you avoid those headaches. I'm not asking you to very loudly proclaim that it should be used like this - just expose it, accurately document it, and I'm quite confident that it will be widely used and relied upon by those that are reasonably well informed, and understand its limitations, which are really quite straightforward. What I think people would actually like to know, if they're in a situation where distinct query texts are getting hashed to the same thing, is *which* different texts got hashed to the same thing. But there's no good way to expose that given the pg_stat_statements infrastructure, and exposing the hash value doesn't help. Apart from detecting the case where we get a straightforward collision, I don't expect that that would be useful. The whole point is that the user doesn't care about the difference, and I think we've specified a practical, widely useful standard for when queries should be considered equivalent. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services By using all 64-bits of the hash that we currently calculate, instead of the current use of 32-bits only, the collision probabilities are very low. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] libpq compression
On Thu, Jun 14, 2012 at 02:38:02PM -0500, Merlin Moncure wrote: On Thu, Jun 14, 2012 at 1:43 PM, Tom Lane t...@sss.pgh.pa.us wrote: So I've got very little patience with the idea of let's put in some hooks and then great things will happen. It would be far better all around if we supported exactly one, well-chosen, method. But really I still don't see a reason not to let openssl do it for us. Well, for toast compression the right choice is definitely one of the lz based algorithms (not libz!). For transport compression you have the case of sending large data over very slow and/or expensive links in which case you want to use bzip type methods. But in the majority of cases I'd probably be using lz there too. So if I had to pick just one, there you go. But which one? the lz algorithm with arguably the best pedigree (lzo) is gnu but there are many other decent candidates, some of which have really tiny implementations. merlin +1 for a very fast compressor/decompressor. lz4 (hosted on Google Code) has a BSD license, and with compression about 8.5X faster than zlib (-1), decompression about 5X faster than zlib (-1), and roughly 2X the speed of LZO, it would be my pick. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] libpq compression
On Fri, Jun 15, 2012 at 07:18:34AM -0500, Merlin Moncure wrote: On Fri, Jun 15, 2012 at 5:48 AM, Florian Pflug f...@phlo.org wrote: On Jun15, 2012, at 12:09 , Magnus Hagander wrote: On Fri, Jun 15, 2012 at 5:52 PM, Florian Pflug f...@phlo.org wrote: On Jun15, 2012, at 07:50 , Magnus Hagander wrote: Second, we also have things like the JDBC driver and the .Net driver that don't use libpq. the JDBC driver uses the native java ssl support, AFAIK. Does that one support the compression, and does it support controlling it? Java uses pluggable providers with standardized interfaces for most things related to encryption. SSL support is provided by JSSE (Java Secure Socket Extension). The JSSE implementation included with the oracle JRE doesn't seem to support compression according to the wikipedia page quoted above. But chances are that there exists an alternative implementation which does. Yeah, but that alone is IMO a rather big blocker for claiming that this is the only way to do it :( And I think the fact that that wikipedia page doesn't list any other ones, is a sign that there might not be a lot of other choices out there in reality - expecially not opensource… Hm, but things get even harder for the JDBC and .NET folks if we go with a third-party compression method. Or would we require that the existence of a free Java (and maybe .NET) implementation of such a method would be an absolute must? The way I see it, if we use SSL-based compression then non-libpq clients there's at least a chance of those clients being able to use it easily (if their SSL implementation supports it). If we go with a third-party compression method, they *all* need to add yet another dependency, or may even need to re-implement the compression method in their implementation language of choice. hm, that's a really excellent point. merlin I agree and think that the SSL-based compression is an excellent default compression scheme. The pluggable compression approach allows choosing the most appropriate compression implementation for the application's needs, and it really addresses corner cases such as high-performance systems. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] libpq compression
On Sat, Jun 16, 2012 at 11:15:30AM -0400, Tom Lane wrote: Magnus Hagander mag...@hagander.net writes: On Sat, Jun 16, 2012 at 12:55 PM, Tom Lane t...@sss.pgh.pa.us wrote: It's not obvious to me that we actually *need* anything except the ability to recognize that a null-encrypted SSL connection probably shouldn't be treated as matching a hostssl line; which is not something that requires any fundamental rearrangements, since it only requires an after-the-fact check of what was selected. Maybe I spelled it out wrong. It does require it insofar that if we want to use this for compression, we must *always* enable openssl on the connection. So the with these encryption method boils down to NULL encryption only or whatever other standards I have for encryption. We don't need the ability to change the whatever other standards per subnet, but we need to control the accept-NULL-encryption on a per subnet basis. After sleeping on it, I wonder if we couldn't redefine the existing list of acceptable ciphers option as the list of ciphers that are considered to provide encrypted transport. So you'd be allowed to connect with SSL using any unapproved cipher (including NULL), the backend just considers it as equivalent to a non-SSL connection for pg_hba purposes. Then no change is needed in any configuration stuff. regards, tom lane +1 That is nice and clean. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] libpq compression
On Mon, Jun 25, 2012 at 03:12:46PM +0200, Florian Pflug wrote: On Jun25, 2012, at 04:04 , Robert Haas wrote: If, for example, someone can demonstrate that an awesomebsdlz compresses 10x as fast as OpenSSL... that'd be pretty compelling. That, actually, is demonstrably the case for at least Google's snappy. (and LZO, but that's not an option since its license is GPL) They state in their documentation that "In our tests, Snappy usually is faster than algorithms in the same class (e.g. LZO, LZF, FastLZ, QuickLZ, etc.) while achieving comparable compression ratios." The only widely supported compression method for SSL seems to be DEFLATE, which is also what gzip/zlib uses. I've benchmarked LZO against gzip/zlib a few months ago, and LZO outperformed zlib in fast mode (i.e. gzip -1) by an order of magnitude. The compression ratio achieved by DEFLATE/gzip/zlib is much better, though. The snappy documentation states "Typical compression ratios (based on the benchmark suite) are about 1.5-1.7x for plain text, about 2-4x for HTML, and of course 1.0x for JPEGs, PNGs and other already-compressed data. Similar numbers for zlib in its fastest mode are 2.6-2.8x, 3-7x and 1.0x, respectively." Here are a few numbers for LZO vs. gzip. Snappy should be comparable to LZO - I tested LZO because I still had the command-line compressor lzop lying around on my machine, whereas I'd have needed to download and compile snappy first.

$ dd if=/dev/random of=data bs=1m count=128
$ time gzip -1 < data > data.gz
real 0m6.189s
user 0m5.947s
sys 0m0.224s
$ time lzop < data > data.lzo
real 0m2.697s
user 0m0.295s
sys 0m0.224s
$ ls -lh data*
-rw-r--r-- 1 fgp staff 128M Jun 25 14:43 data
-rw-r--r-- 1 fgp staff 128M Jun 25 14:44 data.gz
-rw-r--r-- 1 fgp staff 128M Jun 25 14:44 data.lzo
$ dd if=/dev/zero of=zeros bs=1m count=128
$ time gzip -1 < zeros > zeros.gz
real 0m1.083s
user 0m1.019s
sys 0m0.052s
$ time lzop < zeros > zeros.lzo
real 0m0.186s
user 0m0.123s
sys 0m0.053s
$ ls -lh zeros*
-rw-r--r-- 1 fgp staff 128M Jun 25 14:47 zeros
-rw-r--r-- 1 fgp staff 572K Jun 25 14:47 zeros.gz
-rw-r--r-- 1 fgp staff 598K Jun 25 14:47 zeros.lzo

To summarize, on my 2.66 Ghz Core2 Duo Macbook Pro, LZO compresses about 350MB/s if the data is purely random, and about 800MB/s if the data compresses extremely well. (Numbers based on user time since that indicates the CPU time used, and ignores the IO overhead, which is substantial) IMHO, the only compelling argument (and a very compelling one) to use SSL compression was that it requires very little code on our side. We've since discovered that it's not actually that simple, at least if we want to support compression without authentication or encryption, and don't want to restrict ourselves to using OpenSSL forever. So unless we give up at least one of those requirements, the arguments for using SSL-compression are rather thin, I think. best regards, Florian Pflug

+1 for http://code.google.com/p/lz4/ support. It has a BSD license too. Using SSL libraries gives all the complexity without any real benefit. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] libpq compression
On Mon, Jun 25, 2012 at 09:45:26PM +0200, Florian Pflug wrote: On Jun25, 2012, at 21:21 , Dimitri Fontaine wrote: Magnus Hagander mag...@hagander.net writes: Or that it takes less code/generates cleaner code... So we're talking about some LZO things such as snappy from google, and that would be another run time dependency IIUC. I think it's time to talk about fastlz: http://fastlz.org/ http://code.google.com/p/fastlz/source/browse/trunk/fastlz.c 551 lines of C code under MIT license, works also under windows I guess it would be easy (and safe) enough to embed in our tree should we decide going this way. Agreed. If we extend the protocol to support compression, and not rely on SSL, then let's pick one of these LZ77-style compressors, and let's integrate it into our tree. We should then also make it possible to enable compression only for the server - client direction. Since those types of compressions are usually pretty easy to decompress, that reduces the amount to work non-libpq clients have to put in to take advantage of compression. best regards, Florian Pflug Here is the benchmark list from the lz4 project page (compression and decompression speeds in MB/s):

Name            Ratio   C.speed  D.speed
LZ4 (r59)       2.084   330      915
LZO 2.05 1x_1   2.038   311      480
QuickLZ 1.5 -1  2.233   257      277
Snappy 1.0.5    2.024   227      729
LZF             2.076   197      465
FastLZ          2.030   190      420
zlib 1.2.5 -1   2.728   39       195
LZ4 HC (r66)    2.712   18       1020
zlib 1.2.5 -6   3.095   14       210

lz4 absolutely dominates on compression/decompression speed. While fastlz is faster than zlib (-1) on compression, lz4 is almost 2X faster still. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
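As a concrete illustration, here is a minimal round-trip sketch against the lz4 C API as it exists today (LZ4_compress_default/LZ4_decompress_safe; the thread-era library spelled the first of these LZ4_compress). It assumes liblz4 is installed; compile with -llz4:

#include <lz4.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
main(void)
{
    char    src[4096];
    int     src_len = sizeof(src);

    /* highly compressible sample data, similar to the zero-filled test above */
    memset(src, 'x', sizeof(src));

    int     bound = LZ4_compressBound(src_len);
    char   *dst = malloc(bound);
    char   *back = malloc(src_len);

    int     comp_len = LZ4_compress_default(src, dst, src_len, bound);
    int     dec_len = LZ4_decompress_safe(dst, back, comp_len, src_len);

    printf("compressed %d -> %d bytes, round-trip %s\n",
           src_len, comp_len,
           (dec_len == src_len && memcmp(src, back, src_len) == 0) ? "ok" : "FAILED");

    free(dst);
    free(back);
    return 0;
}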
Re: [HACKERS] Getting a bug tracker for the Postgres project
On Mon, May 30, 2011 at 09:52:38PM -0400, Robert Haas wrote: On Mon, May 30, 2011 at 8:16 PM, Christopher Browne cbbro...@gmail.com wrote: On 2011-05-30 4:31 PM, Peter Eisentraut pete...@gmx.net wrote: On sön, 2011-05-29 at 18:36 -0400, Joe Abbate wrote: I've summarizes the main points made in the recent discussion and did some minor additional research on the lists suggested by Peter and Chris Browne. Anyone interested in the tracker, please visit http://wiki.postgresql.org/wiki/TrackerDiscussion and add your feedback/input. Based on that, and past discussions, and things we've tried in the past, and gut feeling, and so on, it looks like Request Tracker would appear to be the next best thing to consider trying out. What do people think about that? My suspicion is that RT may be rather a lot heavier weight in terms of how it would have to affect process than people would be happy with. What has been pretty clearly expressed is that various of the developers prefer for the mailing lists and archives thereof to be the primary data source and the venue for bug discussions. RT, and Bugzilla, and pretty well the bulk of the issue trackers out there are designed to themselves be the venue for discussions, and that's not consistent with the preference for email discussions. There are Debian packages for RT 3.8, and I imagine it may be worth tossing an instance, but I'd definitely commend trying to minimize the amount of deployment effort done, as I think there's a fair chance that a number of devs (I'll pick on Greg Stark :-)) are liable to rebel against it. It'd be interesting to see the reactions to the interaction between RT, -hackers, and -bugs for a bug or three... I'd be more optimistic that debbugs, or an adaption thereof, might more nearly fit into the workflow. Yeah, that's my feeling, as well. I have used RT and I found that the web interface was both difficult to use and unwieldly for tickets containing large numbers of messages. Maybe those those things have been improved, but frankly if RT or Bugzilla is the best we can come up with then I'd rather not have a bug tracker at all. See also: Linus's opinion on CVS. I don't personally care if I need to go to a web interface to mark bugs closed. Being able to do it via email is possibly useful, but I don't really care about it personally. Sounds like we should have it for the benefit of those who do, but it's not my priority. What I do care about is that the tracker doesn't get in the way of *discussion* of bugs. IOW, if people just reply-to-all on bug reports as they do now, and either include some special tag in the subject line or copy some special address on the CC list, it should all get sucked into the appropriate bug report. The number of people reading and replying to emails on pgsql-bugs is already insufficient, perhaps because of the (incorrect) perception that Tom does or will fix everything and no one else needs to care. So anything that makes it harder for people to follow along and participate is a non-starter IMV. Based on the discussion thus far, it sounds like debbugs might be reasonably close to what we need. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company We use RT here and it is very customizable. In particular, it is easy to have the basic process be completely via Email, if desired. It seems that the general opinion is to use Email and consolidate the information in the bug tracking system. RT can definitely step into the background as needed. 
Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Getting a bug tracker for the Postgres project
On Tue, May 31, 2011 at 02:58:02PM +0200, Magnus Hagander wrote: On Tue, May 31, 2011 at 14:44, Andrew Dunstan and...@dunslane.net wrote: On 05/31/2011 06:41 AM, Magnus Hagander wrote: We already have a search system that works reasonably well for the archives... I trust this weas a piece of sarcasm. I spoke to more than a few people at pgcon and nobody had a good word to say about the search system on the archives. Well, it's tsearch. And I've heard nobody say anything else than that it's *a lot* better than what we had before. But sure, it can probably be improved. But what people are then basically asying is that tsearch isn't good enough for searching. Which is too bad, but may be so, and in that case we need to fix *that*, rather than build Yet Another Service To Do The Same Thing Slightly Differently. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ I do agree that the current archive search is much, much better than the searching before the upgrade. I would be interested in taking a look at some open source projects with a good search engine. Most projects have search engines that are true exercises in frustration, returning either apparently everything or next to nothing, with nothing in between. If there is a good one to look at, maybe we can do some tweaking of our search engine to improve it. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Getting a bug tracker for the Postgres project
On Tue, May 31, 2011 at 09:33:33AM -0400, Robert Haas wrote: On Tue, May 31, 2011 at 4:12 AM, Peter Eisentraut pete...@gmx.net wrote: On mån, 2011-05-30 at 21:52 -0400, Robert Haas wrote: I have used RT and I found that the web interface was both difficult to use and unwieldly for tickets containing large numbers of messages. Maybe those those things have been improved, but frankly if RT or Bugzilla is the best we can come up with then I'd rather not have a bug tracker at all. Given that you have been one of the people calling for a bug tracker, and these are the two most widely used systems available, what's wrong with them and what else would you suggest? IIRC, both of them think that you should log into the web interface to send emails (which, in the case of Bugzilla, don't permit replies), rather than sending emails that show up in the web interface. But the web interface is, at least in RT, also seems to be pretty rudimentary. If you use the commands-by-email with RT you can do most things with Email. Suppose you have a thread with 40 emails in it. View that thread in Gmail. Now view it in RT. In RT, you will notice that there's no way to unexpand emails, and all of the data is loaded with the page, so you sit there for half a minute waiting for everything to load. There's also no suppression of duplicated or quoted meterial, as Gmail does. It's usable, I guess, but it's a long way from state-of-the-art. You can adjust what RT will display in the interface and the latest release does include some enhanced duplicate/quoted material suppression. Note, I am not pushing for RT necessarily just trying to keep information available. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Getting a bug tracker for the Postgres project
On Tue, May 31, 2011 at 09:36:00AM -0400, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: On 05/31/2011 06:41 AM, Magnus Hagander wrote: We already have a search system that works reasonably well for the archives... I trust this weas a piece of sarcasm. I spoke to more than a few people at pgcon and nobody had a good word to say about the search system on the archives. Please note, though, that there is no bug tracker anywhere whose search mechanism doesn't suck as much or more. If you're unhappy with the search stuff the solution is to improve it, not bring in another bad mechanism. +1 Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] BLOB support
On Thu, Jun 02, 2011 at 02:58:52PM +0200, Pavel Stehule wrote: 2011/6/2 Peter Eisentraut pete...@gmx.net: On ons, 2011-06-01 at 22:00 +0200, Radosław Smogura wrote: I partialy implemented following missing LOBs types. Requirement for this was to give ability to create (B/C)LOB columns and add casting functionality e.g. SET my_clob = 'My long text'. Idea is as follow: 0. Blob is two state object: 1st in memory contains just bytea, serialized contains Oid of large object. 1. Each type has additional boolean haslobs, which is set recursivly. 2. Relation has same bool haslobs (used to speed up tables without LOBs) 3. When data are inserted/updated then special function is called and tuple is modified in this way all LOBs are serialized to (old) LOB table and just Oid is stored. 4. When removed LOB is removed from (old) LOB table. Superficially, this looks like a reimplementation of TOAST. What functionality exactly do you envision that the BLOB and CLOB types would need to have that would warrant treating them different from, say, bytea and text? a streaming for bytea could be nice. A very large bytea are limited by query size - processing long query needs too RAM, Pavel +1 for a streaming interface to bytea/text. I do agree that there is no need to reinvent the TOAST architecture with another name, just improve the existing implementation. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Full GUID support
On Tue, Jul 12, 2011 at 04:29:33PM -0400, Andrew Dunstan wrote: On 07/12/2011 03:44 PM, Joshua D. Drake wrote: What about extensions makes them less usable? It is an extra step, that is less usable. Does it matter? Shrug, I know I hate having to type apt-get just to use xyz, does it mean it is a big deal? Probably not. By that argument we wouldn't have any extensions at all, just a monolithic product. I don't think that would be an advance. cheers andrew For me, the criteria I like to use for core functionality are:

1. It is available with a common definition from a number of DB products. With a UUID, its size/structure is predefined, and this allows a dump from another SQL product to be loaded into a PostgreSQL DB.
2. It would benefit from tighter integration with the core DB for either performance or development use.
3. It is a feature where the extra step is an unexpected nuisance.

That is why I think a contrib module is the correct place for the UUID generators, but the UUID type itself is better as a core feature. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Mysterious server crashes
On Fri, Jul 15, 2011 at 11:37:54PM +0200, Žiga Kranjec wrote: Hello! Recently we have upgraded our debian system (sid), which has since started crashing mysteriously. We are still looking into that. It runs on 3ware RAID. Postgres package is 8.4.8-2. The database came back up apparently ok, except for indexes. Running reindex produces this error on one of the tables: ERROR: unexpected chunk number 1 (expected 0) for toast value 17539760 in pg_toast_16992 Same with select. I tried running reindex on toast table didn't help. Running: select * from pg_toast.pg_toast_16992 where chunk_id = 17539760; crashed postgres backend (and apparently the whole server). Is there anything we can/should do to fix the problem, besides restoring the whole database from backup? Thanks! Ziga Hi Ziga, I do not want to be negative, but it sounds like your server is having serious problems completely outside of PostgreSQL. Reading a file should not cause your system to crash. That sounds like a driver or hardware problem and you need to fix that. I would make sure you have a good backup for your DB before you do anything else. Good luck, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Reduced power consumption in autovacuum launcher process
On Mon, Jul 18, 2011 at 03:12:20PM -0400, Tom Lane wrote: Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: On 18.07.2011 18:32, Tom Lane wrote: Hmm. Well, it's not too late to rethink the WaitLatch API, if we think that that might be a significant limitation. Right, we can easily change the timeout argument to be in milliseconds instead of microseconds. On the whole I'd be more worried about giving up the shorter waits than the longer ones --- it's not too hard to imagine using submillisecond timeouts in the future, as machines get faster. If we really wanted to fix this, I think we need to move to a wider datatype. regards, tom lane You could also use the high bit as a tag to encode a larger range: treat values with the high bit = 0 as microseconds and values with the high bit = 1 as milliseconds. Then you could have the best of both without punching up the width of the datatype. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
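To make the idea concrete, here is a small self-contained sketch (purely illustrative -- this is not the WaitLatch API): a 32-bit timeout whose high bit selects the unit, keeping microsecond resolution for short waits and millisecond resolution for long ones:

#include <stdint.h>
#include <stdio.h>

#define TIMEOUT_MSEC_FLAG  UINT32_C(0x80000000)

/* encode a timeout given in microseconds */
static uint32_t
encode_timeout(uint64_t usecs)
{
    if (usecs < TIMEOUT_MSEC_FLAG)
        return (uint32_t) usecs;                            /* fits as microseconds */
    return TIMEOUT_MSEC_FLAG | (uint32_t) (usecs / 1000);   /* store as milliseconds */
}

/* decode back to microseconds */
static uint64_t
decode_timeout(uint32_t encoded)
{
    if (encoded & TIMEOUT_MSEC_FLAG)
        return (uint64_t) (encoded & ~TIMEOUT_MSEC_FLAG) * 1000;
    return encoded;
}

int
main(void)
{
    /* a sub-millisecond wait and a one-hour wait both round-trip */
    printf("%llu\n", (unsigned long long) decode_timeout(encode_timeout(500)));
    printf("%llu\n", (unsigned long long) decode_timeout(encode_timeout(3600000000ULL)));
    return 0;
}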
Re: [HACKERS] PQescapeByteaConn - returns wrong string for PG9.1 Beta3
On Wed, Aug 03, 2011 at 03:19:06PM +0200, Petro Meier wrote: Please let me clarify the bug:

CREATE TABLE testtable ( ID integer NOT NULL, BinaryContents bytea );
INSERT INTO testtable (ID, BinaryContents) values (1, E'\xea2abd8ef3');

returns invalid byte sequence. '\xea2abd8ef3' is the string delivered by the PG 9.1 Beta3 server when calling PQescapeByteaConn(). It cannot be further processed by the server itself afterwards! There is a leading '\' missing. When calling the function for a PG 9.0.1 server, then the result (correctly) is '\\xea2abd8ef3' (with a double-backslash!), and then the insert works fine, both with PG9.1 Beta3 and PG9.0.1. It is a serious issue, as it will break all existing PostgreSQL applications that deal with binary contents and use PQescapeByteaConn(). Best regards Petro

That looks correct given the new default of standard_conforming_strings = on in 9.1+. The command you should be using is:

INSERT INTO testtable (ID, BinaryContents) values (1, '\xea2abd8ef3');

Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
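For completeness, a minimal libpq sketch of how the escaped string is meant to be used (the connection string, table name, and value here are placeholders, but the libpq calls are the real ones):

#include <libpq-fe.h>
#include <stdio.h>

int
main(void)
{
    PGconn         *conn = PQconnectdb("dbname=test");   /* placeholder connection string */
    const unsigned char data[] = {0xea, 0x2a, 0xbd, 0x8e, 0xf3};
    unsigned char  *esc;
    size_t          esc_len;
    char            query[256];
    PGresult       *res;

    if (PQstatus(conn) != CONNECTION_OK)
        return 1;

    /* with a 9.0+ server this yields the hex form, e.g. \xea2abd8ef3 */
    esc = PQescapeByteaConn(conn, data, sizeof(data), &esc_len);

    /* plain single quotes, no E'...' prefix: standard_conforming_strings is on by default in 9.1+ */
    snprintf(query, sizeof(query),
             "INSERT INTO testtable (ID, BinaryContents) VALUES (1, '%s')", esc);
    res = PQexec(conn, query);

    printf("%s\n", PQresultStatus(res) == PGRES_COMMAND_OK ? "inserted" : PQerrorMessage(conn));

    PQclear(res);
    PQfreemem(esc);
    PQfinish(conn);
    return 0;
}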
Re: [HACKERS] Enabling Checksums
On Mon, Mar 04, 2013 at 01:00:09PM -0800, Jeff Davis wrote: On Mon, 2013-03-04 at 22:27 +0200, Heikki Linnakangas wrote: If you're serious enough about your data that you want checksums, you should be able to choose your filesystem. I simply disagree. I am targeting my feature at casual users. They may not have a lot of data or a dedicated DBA, but the data they do have might be very important transactional data. And right now, if they take a backup of their data, it will contain all of the corruption from the original. And since corruption is silent today, then they would probably think the backup is fine, and may delete the previous good backups. +1 There is no reasonable availability of checksum capable filesystems across PostgreSQL's supported OSes. It really needs to be available in core. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Join cost estimates
On Thu, Apr 04, 2013 at 04:16:12PM -0400, Stephen Frost wrote: * Stephen Frost (sfr...@snowman.net) wrote: It does look like reducing bucket depth, as I outlined before through the use of a 2-level hashing system, might help speed up ExecScanHashBucket, as it would hopefully have very few (eg: 1-2) entries to consider instead of more. Along those same lines, I really wonder if we're being too generous wrt the bucket-depth goal of '10' instead of, say, '1', especially when we've got plenty of work_mem available. Rerunning using a minimally configured build (only --enable-openssl and --enable-debug passed to configure) with NTUP_PER_BUCKET set to '1' results in a couple of interesting things- First, the planner actually picks the plan to hash the small table and seqscan the big one. That also, finally, turns out to be *faster* for this test case. ... I'm certainly curious about those, but I'm also very interested in the possibility of making NTUP_PER_BUCKET much smaller, or perhaps variable depending on the work_mem setting. It's only used in ExecChooseHashTableSize, so while making it variable or depending on work_mem could slow planning down a bit, it's not a per-tuple cost item. +1 for adjusting this based on work_mem value. Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
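To sketch what "variable depending on work_mem" could look like (a hypothetical simplification, not the actual ExecChooseHashTableSize() code; the per-tuple overhead constant is an assumption): let the tuples-per-bucket target drop toward 1 when work_mem comfortably exceeds the estimated hash table size.

#include <math.h>
#include <stdio.h>

static long
choose_nbuckets(double ntuples, int tupwidth, int work_mem_kb)
{
    double  table_bytes = ntuples * (tupwidth + 16);    /* assumed per-tuple overhead */
    double  mem_bytes = (double) work_mem_kb * 1024.0;
    int     ntup_per_bucket = 10;                       /* historical default */

    /* if memory is plentiful, aim for ~1 tuple per bucket instead */
    if (table_bytes * 2 < mem_bytes)
        ntup_per_bucket = 1;

    long    nbuckets = (long) ceil(ntuples / ntup_per_bucket);

    /* round up to the next power of two, as the executor does */
    long    pow2 = 1;
    while (pow2 < nbuckets)
        pow2 <<= 1;
    return pow2;
}

int
main(void)
{
    printf("%ld\n", choose_nbuckets(1e6, 40, 262144));  /* 1M rows, 256MB work_mem */
    printf("%ld\n", choose_nbuckets(1e6, 40, 4096));    /* 1M rows, 4MB work_mem */
    return 0;
}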
Re: [HACKERS] Documentation epub format
On Wed, May 01, 2013 at 01:52:43PM -0400, Tom Lane wrote: Joshua D. Drake j...@commandprompt.com writes: Once upon a time we had multiple books as documentation, then at some point we merged them. It was quite a few years ago. I would agree at this point that we need to consider breaking them up again. The documentation is unwieldy. The reason we merged them was to allow hyperlink cross-references between different parts of the docs. I would be sad to lose that. regards, tom lane Yes, please keep that feature! Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] corrupt pages detected by enabling checksums
On Sun, May 12, 2013 at 03:46:00PM -0500, Jim Nasby wrote: On 5/10/13 1:06 PM, Jeff Janes wrote: Of course the paranoid DBA could turn off restart_after_crash and do a manual investigation on every crash, but in that case the database would refuse to restart even in the case where it perfectly clear that all the following WAL belongs to the recycled file and not the current file. Perhaps we should also allow for zeroing out WAL files before reuse (or just disable reuse). I know there's a performance hit there, but the reuse idea happened before we had bgWriter. Theoretically the overhead creating a new file would always fall to bgWriter and therefore not be a big deal. -- Jim C. Nasby, Data Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net Unless something has changed dramtically, creating new files is a LOT more overhead than reusing existing files. My two cents. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] corrupt pages detected by enabling checksums
On Sun, May 12, 2013 at 07:41:26PM -0500, Jon Nelson wrote: On Sun, May 12, 2013 at 3:46 PM, Jim Nasby j...@nasby.net wrote: On 5/10/13 1:06 PM, Jeff Janes wrote: Of course the paranoid DBA could turn off restart_after_crash and do a manual investigation on every crash, but in that case the database would refuse to restart even in the case where it perfectly clear that all the following WAL belongs to the recycled file and not the current file. Perhaps we should also allow for zeroing out WAL files before reuse (or just disable reuse). I know there's a performance hit there, but the reuse idea happened before we had bgWriter. Theoretically the overhead creating a new file would always fall to bgWriter and therefore not be a big deal. For filesystems like btrfs, re-using a WAL file is suboptimal to simply creating a new one and removing the old one when it's no longer required. Using fallocate (or posix_fallocate) (I have a patch for that!) to create a new one is - by my tests - 28 times faster than the currently-used method. -- Jon What about for less cutting (bleeding) edge filesystems? Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
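For reference, a minimal Linux/POSIX sketch contrasting the two ways of creating a 16MB segment-sized file -- posix_fallocate() versus writing zero-filled blocks (file names are placeholders; this is not the backend's code):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define SEGMENT_SIZE (16 * 1024 * 1024)

int
main(void)
{
    char    buf[8192];
    int     fd;

    /* method 1: ask the filesystem to reserve the space in one call */
    fd = open("seg_fallocate", O_CREAT | O_WRONLY | O_TRUNC, 0600);
    if (fd < 0 || posix_fallocate(fd, 0, SEGMENT_SIZE) != 0)
    {
        perror("posix_fallocate");
        return 1;
    }
    close(fd);

    /* method 2: write zero-filled blocks, roughly the current code path */
    memset(buf, 0, sizeof(buf));
    fd = open("seg_zeroes", O_CREAT | O_WRONLY | O_TRUNC, 0600);
    for (long written = 0; written < SEGMENT_SIZE; written += sizeof(buf))
    {
        if (write(fd, buf, sizeof(buf)) != (ssize_t) sizeof(buf))
        {
            perror("write");
            return 1;
        }
    }
    close(fd);
    return 0;
}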
Re: [HACKERS] UTF-8 encoding problem w/ libpq
On Mon, Jun 03, 2013 at 03:40:14PM +0100, Martin Schäfer wrote: I try to create database columns with umlauts, using the UTF8 client encoding. However, the server seems to mess up the column names. In particular, it seems to perform a lowercase operation on each byte of the UTF-8 multi-byte sequence. Here is my code:

const wchar_t *strName = L"id_äß";
wstring strCreate = wstring(L"create table test_umlaut(") + strName + L" integer primary key)";
PGconn *pConn = PQsetdbLogin("", "", NULL, NULL, "dev503", "postgres", "**");
if (!pConn) FAIL;
if (PQsetClientEncoding(pConn, "UTF-8")) FAIL;
PGresult *pResult = PQexec(pConn, "drop table test_umlaut");
if (pResult) PQclear(pResult);
pResult = PQexec(pConn, ToUtf8(strCreate.c_str()).c_str());
if (pResult) PQclear(pResult);
pResult = PQexec(pConn, "select * from test_umlaut");
if (!pResult) FAIL;
if (PQresultStatus(pResult)!=PGRES_TUPLES_OK) FAIL;
if (PQnfields(pResult)!=1) FAIL;
const char *fName = PQfname(pResult,0);
ShowW(L"Name: ", strName);
ShowA("in UTF8: ", ToUtf8(strName).c_str());
ShowA("from DB: ", fName);
ShowW(L"in UTF16: ", ToWide(fName).c_str());
PQclear(pResult);
PQreset(pConn);

(ShowA/W call OutputDebugStringA/W, and ToUtf8/ToWide use WideCharToMultiByte/MultiByteToWideChar with CP_UTF8.) And this is the output generated:

Name: id_äß
in UTF8: id_äß
from DB: id_ã¤ãÿ
in UTF16: id_???

It seems like the backend thinks the name is in ANSI encoding, not in UTF-8. If I change the strCreate query and add double quotes around the column name, then the problem disappears. But the original name is already in lowercase, so I think it should also work without quoting the column name. Am I missing some setup in either the database or in the use of libpq? I'm using PostgreSQL 9.2.1, compiled by Visual C++ build 1600, 64-bit. The database uses: ENCODING = 'UTF8' LC_COLLATE = 'English_United Kingdom.1252' LC_CTYPE = 'English_United Kingdom.1252' Thanks for any help, Martin

Hi Martin, If you do not want the lowercase behavior, you must put double-quotes around the column name per the documentation: http://www.postgresql.org/docs/9.2/interactive/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS section 4.1.1. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
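To illustrate what is going on (a rough simplification, not the server's exact code): unquoted identifiers are case-folded one byte at a time, and under a single-byte LC_CTYPE such as the Windows-1252 locale above, the individual bytes of each UTF-8 umlaut can be classified as uppercase letters and shifted individually, corrupting the multi-byte sequence.

#include <ctype.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

int
main(void)
{
    /* UTF-8 bytes for "id_äß": ä = 0xC3 0xA4, ß = 0xC3 0x9F */
    unsigned char ident[] = "id_\xc3\xa4\xc3\x9f";

    setlocale(LC_CTYPE, "");    /* result depends on the active locale */

    for (size_t i = 0; i < strlen((char *) ident); i++)
    {
        unsigned char ch = ident[i];

        if (ch >= 'A' && ch <= 'Z')
            ch += 'a' - 'A';
        else if (ch >= 0x80 && isupper(ch))
            ch = (unsigned char) tolower(ch);   /* per-byte, not per-character */
        ident[i] = ch;
    }
    printf("downcased bytes: %s\n", ident);     /* may no longer be valid UTF-8 */
    return 0;
}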
Re: [HACKERS] UTF-8 encoding problem w/ libpq
On Mon, Jun 03, 2013 at 04:09:29PM +0100, Martin Schäfer wrote: If I change the strCreate query and add double quotes around the column name, then the problem disappears. But the original name is already in lowercase, so I think it should also work without quoting the column name. Am I missing some setup in either the database or in the use of libpq? I’m using PostgreSQL 9.2.1, compiled by Visual C++ build 1600, 64-bit The database uses: ENCODING = 'UTF8' LC_COLLATE = 'English_United Kingdom.1252' LC_CTYPE = 'English_United Kingdom.1252' Thanks for any help, Martin Hi Martin, If you do not want the lowercase behavior, you must put double-quotes around the column name per the documentation: http://www.postgresql.org/docs/9.2/interactive/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS section 4.1.1. Regards, Ken The original name 'id_äß' is already in lowercase. The backend should leave it unchanged IMO. Regards, Martin It is only lowercase as UTF-8; a multi-byte column name needs to be double-quoted, as you have seen, otherwise the name will be lowercased per byte. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
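On the libpq side, the quoting fix can be automated with PQescapeIdentifier, which returns the name wrapped in double quotes so no case folding is applied; a minimal sketch (the connection string, table name, and hard-coded UTF-8 bytes are placeholders):

#include <stdio.h>
#include <string.h>
#include <libpq-fe.h>

int
main(void)
{
    PGconn *conn = PQconnectdb("dbname=dev503");
    const char *name = "id_\xc3\xa4\xc3\x9f";   /* UTF-8 bytes of "id_äß" */
    char sql[256];
    char *quoted;
    PGresult *res;

    if (PQstatus(conn) != CONNECTION_OK)
        return 1;
    PQsetClientEncoding(conn, "UTF8");

    /* PQescapeIdentifier wraps the name in double quotes (doubling any
     * embedded quotes), so the server stores it exactly as written. */
    quoted = PQescapeIdentifier(conn, name, strlen(name));
    snprintf(sql, sizeof(sql), "create table test_umlaut (%s integer primary key)", quoted);

    res = PQexec(conn, sql);
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
        fprintf(stderr, "%s", PQerrorMessage(conn));

    PQclear(res);
    PQfreemem(quoted);
    PQfinish(conn);
    return 0;
}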
Re: [HACKERS] sha1, sha2 functions into core?
On Fri, Sep 02, 2011 at 09:54:07PM +0300, Peter Eisentraut wrote: On ons, 2011-08-31 at 13:12 -0500, Ross J. Reedstrom wrote: Hmm, this thread seems to have petered out without a conclusion. Just wanted to comment that there _are_ non-password storage uses for these digests: I use them in a context of storing large files in a bytea column, as a means to doing data deduplication, and avoiding pushing files from clients to server and back. But I suppose you don't need the hash function in the database system for that. It is very useful to have the same hash function used internally by PostgreSQL exposed externally. I know you can get the code and add an equivalent one of your own... Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] sha1, sha2 functions into core?
On Fri, Sep 02, 2011 at 04:27:46PM -0500, Ross J. Reedstrom wrote: On Fri, Sep 02, 2011 at 02:05:45PM -0500, k...@rice.edu wrote: On Fri, Sep 02, 2011 at 09:54:07PM +0300, Peter Eisentraut wrote: On ons, 2011-08-31 at 13:12 -0500, Ross J. Reedstrom wrote: Hmm, this thread seems to have petered out without a conclusion. Just wanted to comment that there _are_ non-password storage uses for these digests: I use them in a context of storing large files in a bytea column, as a means to doing data deduplication, and avoiding pushing files from clients to server and back. But I suppose you don't need the hash function in the database system for that. It is very useful to have the same hash function used internally by PostgreSQL exposed externally. I know you can get the code and add an equivalent one of your own... Thanks for the support Ken, but Peter's right: the only backend use in my particular case is to let the backend do the hash calc during bulk loads: in the production code path, having the hash in two places doesn't save any work, since the client code has to calculate the hash in order to test for its existence in the backend. I suppose if the network cost was negligible, I could just push the files anyway, and have a before-insert trigger calculate the hash and do the dedup: then it'd be hidden in the backend completely. But as is, I can do all the work in the client. While it is true that it doesn't save any work, my motivation for having it exposed is that good hash functions are non-trivial to find. I have dealt with computational artifacts produced by hash functions that seemed at first to be good. We use a very well behaved function within the database and exposing it will help prevent bad user hash function implementations. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
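The client-side workflow Ross describes might look roughly like this sketch: compute the digest locally, then probe the server for it before shipping the bytea. The table and column names are invented for illustration, and OpenSSL's SHA256() merely stands in for whichever digest the application settles on:

#include <stdio.h>
#include <string.h>
#include <openssl/sha.h>
#include <libpq-fe.h>

/* Returns 1 if a row with this file's SHA-256 already exists on the server. */
static int
file_already_stored(PGconn *conn, const unsigned char *data, size_t len)
{
    unsigned char digest[SHA256_DIGEST_LENGTH];
    char hex[2 * SHA256_DIGEST_LENGTH + 1];
    const char *params[1];
    PGresult *res;
    int found;

    /* compute the digest on the client, in lieu of a server-side function */
    SHA256(data, len, digest);
    for (int i = 0; i < SHA256_DIGEST_LENGTH; i++)
        sprintf(hex + 2 * i, "%02x", digest[i]);

    params[0] = hex;
    res = PQexecParams(conn,
                       "select 1 from files where sha256 = $1",
                       1, NULL, params, NULL, NULL, 0);

    found = (PQresultStatus(res) == PGRES_TUPLES_OK && PQntuples(res) > 0);
    PQclear(res);
    return found;
}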
Re: [HACKERS] Patch to improve reliability of postgresql on linux nfs
On Mon, Sep 12, 2011 at 04:46:53PM +1000, George Barnett wrote: On 12/09/2011, at 3:59 PM, Florian Pflug wrote: If you really meant to say intr there (and not nointr) then that probably explains the partial writes. Still, I agree with Noah and Kevin that we ought to deal more gracefully with this, i.e. resubmit after a partial read() or write(). AFAICS there's nothing to be gained by not doing that, and the increase in code complexity should be negligible. If we do that, however, I believe we might as well handle EINTR correctly, even if SA_RESTART should prevent us from ever seeing that. Hi Florian, You are indeed correct. Setting nointr also resolves my issue. I could swear I checked this, but obviously not. It does still concern me that pgsql did not deal with this as gracefully as other software. I hope the list will consider a patch to resolve that. Thanks in advance, George Hi George, Many, many, many other software packages expect I/O usage to be the same on an NFS volume and a local disk volume, including Oracle. Coding every application, or more likely mis-coding, to handle this gives every application another chance to get it wrong. If the OS does this, when it gets it right, all of the apps get it right. I think you should be surprised when other software actually deals with broken I/O semantics gracefully rather than concerned when one of a pantheon of programs does not. My two cents. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch to improve reliability of postgresql on linux nfs
On Tue, Sep 13, 2011 at 01:30:34PM +0200, Florian Pflug wrote: On Sep13, 2011, at 13:07 , Florian Pflug wrote: Here's my suggested implementation for pg_write_nointr. pg_read_nointr should be similar (but obviously without the ENOSPC handling) wrong pg_write_nointr implementation snipped Sorry for the self-reply. I realized only after hitting send that I got the ENOSPC handling wrong again - we probably ought to check for ENOSPC as well as ret == 0. Also, it seems preferable to return the number of bytes actually written instead of -1 if we hit an error during retry. With this version, any return value other than amount signals an error, the number of actually written bytes is reported even in the case of an error (to the best of pg_write_nointr's knowledge), and errno always indicates the kind of error.

int
pg_write_nointr(int fd, const void *bytes, Size amount)
{
    int written = 0;

    while (amount > 0)
    {
        int ret;

        ret = write(fd, bytes, amount);
        if ((ret < 0) && (errno == EINTR))
        {
            /* interrupted by signal before first byte was written. Retry */
            /* XXX: Is it safe to call CHECK_FOR_INTERRUPTS here? */
            CHECK_FOR_INTERRUPTS();
            continue;
        }
        else if (ret < 1)
        {
            /* error occurred. Abort */
            if (ret == 0)
                /* out of disk space */
                errno = ENOSPC;

            if (written == 0)
                return -1;
            else
                return written;
        }

        /* made progress */
        written += ret;
        amount -= ret;
        bytes = (const char *) bytes + ret;

        /* XXX: Is it safe to call CHECK_FOR_INTERRUPTS here? */
        CHECK_FOR_INTERRUPTS();
    }

    return written;
}

best regards, Florian Pflug It will be interesting to see if there are any performance ramifications to this new write function. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
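Florian notes that pg_read_nointr should be similar; a minimal sketch of that read-side counterpart, assuming the same Size typedef and CHECK_FOR_INTERRUPTS() conventions as the code above, might be (EOF simply ends the loop rather than being mapped to an errno):

int
pg_read_nointr(int fd, void *bytes, Size amount)
{
    int nread = 0;

    while (amount > 0)
    {
        int ret;

        ret = read(fd, bytes, amount);
        if (ret < 0 && errno == EINTR)
        {
            /* interrupted before any bytes were read this time around; retry */
            CHECK_FOR_INTERRUPTS();
            continue;
        }
        else if (ret < 0)
        {
            /* hard error: report bytes read so far, or -1 if none */
            return (nread == 0) ? -1 : nread;
        }
        else if (ret == 0)
        {
            /* EOF: a short return value signals it to the caller */
            break;
        }

        /* made progress */
        nread += ret;
        amount -= ret;
        bytes = (char *) bytes + ret;
        CHECK_FOR_INTERRUPTS();
    }

    return nread;
}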
Re: [HACKERS] Patch to improve reliability of postgresql on linux nfs
On Tue, Sep 13, 2011 at 03:02:57PM +0200, Florian Pflug wrote: On Sep13, 2011, at 14:58 , k...@rice.edu wrote: It will be interesting to see if there are any performance ramifications to this new write function. What would those be? For non-interruptible reads and writes, the overhead comes down to an additional function call (if we don't make pg_write_nointr inlined) and a few conditional jumps (which branch prediction should be able to take care of). These are bound to disappear in the noise compared to the cost of the actual syscall. best regards, Florian Pflug That would be my expectation too. It is just always nice to benchmark changes, just in case. I have had similar simple changes blow out a cache and have a much greater impact on performance than might be expected from inspection. :) Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Disable OpenSSL compression
On Tue, Nov 08, 2011 at 04:59:02PM +0100, Albe Laurenz wrote: Tom Lane wrote: There might be some argument for providing a client option to disable compression, but it should not be forced, and it shouldn't even be the default. But before adding YA connection option, I'd want to see some evidence that it's useful over non-local connections. Here are numbers from a test via LAN. The client machine has OpenSSL 0.9.8e, the server OpenSSL 1.0.0. The client command run was echo 'select ...' | time psql host=... -o /dev/null and \timing was turned on in .psqlrc In addition to the oprofile data I collected three times:
- the duration as shown in the server log
- the duration as shown by \timing
- the duration of the psql command as measured by time

Without patch: duration: 5730.996 ms (log), 5975.093 ms (\timing), 22.87 s (time)
samples  %        image name          symbol name
4428     80.2029  libz.so.1.2.3       /lib64/libz.so.1.2.3
559      10.1250  postgres            hex_encode
361       6.5387  libcrypto.so.1.0.0  /usr/lib64/libcrypto.so.1.0.0
83        1.5034  libc-2.12.so        memcpy

With patch: duration: 3001.009 ms (log), 3243.690 ms (\timing), 20.27 s (time)
samples  %        image name          symbol name
1072     58.0401  libcrypto.so.1.0.0  /usr/lib64/libcrypto.so.1.0.0
587      31.7813  postgres            hex_encode
105       5.6849  libc-2.12.so        memcpy

I think this makes a good case for disabling compression. Yours, Laurenz Albe Certainly a good case for providing the option to disable compression. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Disable OpenSSL compression
On Tue, Nov 08, 2011 at 04:19:02PM +0100, Albe Laurenz wrote: Tom Lane wrote: I distinctly recall us getting bashed a few years ago because there wasn't any convenient way to turn SSL compression *on*. Now that SSL finally does the sane thing by default, you want to turn it off? The fact of the matter is that in most situations where you want SSL, ie links across insecure WANs, compression is a win. Testing a local connection, as you seem to have done, is just about 100% irrelevant to performance in the real world. Maybe that's paranoia, but we use SSL via the company's LAN to keep potentially sensitive data from crossing the network unencrypted. There might be some argument for providing a client option to disable compression, but it should not be forced, and it shouldn't even be the default. But before adding YA connection option, I'd want to see some evidence that it's useful over non-local connections. I will try to provide test results via remote connection; I thought that localhost was a good enough simulation for a situation where you are not network bound. I agree with you that a client option would make more sense. The big problem I personally have with that is that it only works if you use libpq. When using the JDBC driver or Npgsql, a client option wouldn't help me at all. Yours, Laurenz Albe I think that JDBC and Npgsql should also support disabling compression. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
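For what it is worth, a client-side switch of this kind exists in later libpq releases as the sslcompression connection parameter; a minimal sketch of a client requesting SSL without compression, assuming a libpq that recognizes the keyword (the host and database names are placeholders):

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    /* sslmode=require forces an SSL connection; sslcompression=0 asks
     * libpq not to negotiate SSL compression for it. */
    PGconn *conn = PQconnectdb("host=db.example.com dbname=test "
                               "sslmode=require sslcompression=0");

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return 1;
    }

    puts("connected over SSL with compression disabled");
    PQfinish(conn);
    return 0;
}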
Re: [HACKERS] Optimize postgres protocol for fixed size arrays
On Tue, Nov 22, 2011 at 11:47:22PM +0200, Mikko Tiihonen wrote: Hi, During conversion of the jdbc driver to use binary encoding when receiving array objects from postgres it was noticed that for example for int[] arrays the binary encoding is normally 30% to 200% larger in bytes than the text encoding. This was of concern to some users with slower links to database. Even though the encoded size was larger the binary encoding was still constantly faster (with 50% speed up for float[]). Here is a patch that adds a new flag to the protocol that is set when all elements of the array are of same fixed size. When the bit is set the 4 byte length is only sent once and not for each element. Another restriction is that the flag can only be set when there are no NULLs in the array. Cool. This would be very useful with the DSPAM binary array driver. Although the binary is shorter because the values are 8 byte integers, they would be much shorter without the redundant sizing information. Barring issues: +1 Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
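For context on where the redundant sizing information comes from, here is a rough sketch of walking the existing int4[] binary wire format returned by PQgetvalue in binary mode; every element carries its own 4-byte length word, which is exactly what the proposed fixed-size flag would let the server omit. The sketch assumes a one-dimensional array with no NULLs:

#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>          /* ntohl */

static uint32_t
read_be32(const char **p)
{
    uint32_t v;

    memcpy(&v, *p, 4);          /* avoid unaligned reads */
    *p += 4;
    return ntohl(v);
}

/* buf points at PQgetvalue() output for an int4[] column fetched in binary mode. */
static int
walk_int4_array(const char *buf, int32_t *out, int max_elems)
{
    const char *p = buf;
    uint32_t    ndim = read_be32(&p);
    uint32_t    count;

    (void) read_be32(&p);       /* flags word (has-null bit); assumed 0 here */
    (void) read_be32(&p);       /* element type OID */

    if (ndim != 1)
        return -1;              /* this sketch only handles one dimension */

    count = read_be32(&p);      /* number of elements */
    (void) read_be32(&p);       /* lower bound */

    for (uint32_t i = 0; i < count && (int) i < max_elems; i++)
    {
        (void) read_be32(&p);   /* per-element length word: always 4 for int4 */
        out[i] = (int32_t) read_be32(&p);
    }
    return (int) count;
}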
Re: [HACKERS] sync_seqscans in postgresql.conf
On Tue, Dec 20, 2011 at 02:41:54PM +0100, Magnus Hagander wrote: On Tue, Dec 20, 2011 at 14:38, Robert Haas robertmh...@gmail.com wrote: On Tue, Dec 20, 2011 at 8:35 AM, Magnus Hagander mag...@hagander.net wrote: Is there any reason why the setting synchronize_seqscans is in the section version/platform compatibility in postgresql.conf? Is it just because nobody could find a better place for it? ;) It seems a bit wrong to me... Presumably the thought was that you would turn it off to restore the behavior that existed in older versions of PostgreSQL. Doesn't seem much different from default_with_oids or lo_compat_privileges. Seems very different to me - those change *what* happens when you do certain things. sync_seqscans is just a performance tuning option, no? It doesn't actually change the semantics of any operations... In a query without enforced orders, the returned rows will come out in a possibly different order each time the query runs. I know it is bad coding to depend on things like that, but it is out there... So in those cases it is not just semantics. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] sync_seqscans in postgresql.conf
On Tue, Dec 20, 2011 at 02:54:32PM +0100, Magnus Hagander wrote: On Tue, Dec 20, 2011 at 14:47, k...@rice.edu k...@rice.edu wrote: On Tue, Dec 20, 2011 at 02:41:54PM +0100, Magnus Hagander wrote: On Tue, Dec 20, 2011 at 14:38, Robert Haas robertmh...@gmail.com wrote: On Tue, Dec 20, 2011 at 8:35 AM, Magnus Hagander mag...@hagander.net wrote: Is there any reason why the setting synchronize_seqscans is in the section version/platform compatibility in postgresql.conf? Is it just because nobody could find a better place for it? ;) It seems a bit wrong to me... Presumably the thought was that you would turn it off to restore the behavior that existed in older versions of PostgreSQL. Doesn't seem much different from default_with_oids or lo_compat_privileges. Seems very different to me - those change *what* happens when you do certain things. sync_seqscans is just a performance tuning option, no? It doesn't actually change the semantics of any operations... In a query without enforced orders, the returned rows will come out in a possibly different order each time the query runs. I know it is bad coding to depend on things like that, but it is out there... So in those cases it is not just semantics. Yes, but they may also come out in a different order if you run the same query again 5 minutes later... If the sequential scans always start at the beginning of the table, which was true before the sync-ed scans ability, the order is basically fixed for a large set of queries if you do not modify the data. With sync-ed scans, every repetition of the query will depend on where the scan starts in the data set. At least that is what I remember happening during the original testing of that feature, which is a great feature. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Changing the continuation-line prompt in psql?
On Fri, Apr 29, 2011 at 02:10:19PM -0400, Stephen Frost wrote: * Stephen Frost (sfr...@snowman.net) wrote: Uhm.. With the above, perhaps --%Z+, which would generate: postgres= -- + yah, obviously not going to work. :) However, it wouldn't be impossible to have psql recognize and strip --spaces+ if/when it's seen starting a new line, if we set it up such that it's always the same.. Could be optional anyway, perhaps configurable as a regexp. Anyhow, just some thoughts. Thanks, Stephen +1 to have psql strip a configurable beginning of line sequence. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash partitioning.
On Wed, Jun 26, 2013 at 03:47:43PM +0200, Markus Wanner wrote: On 06/25/2013 11:52 PM, Kevin Grittner wrote: At least until we have parallel query execution. At *that* point this all changes. Can you elaborate on that, please? I currently have a hard time imagining how partitions can help performance in that case, either. At least compared to modern RAID and read-ahead capabilities. After all, RAID can be thought of as hash partitioning with a very weird hash function. Or maybe rather range partitioning on an internal key. Put another way: ideally, the system should take care of optimally distributing data across its physical storage itself. If you need to do partitioning manually for performance reasons, that's actually a deficiency of it, not a feature. I certainly agree that manageability may be a perfectly valid reason to partition your data. Maybe there even exist other good reasons. I don't think performance optimization is one. (It's more like giving the system a hint. And we all dislike hints, don't we? *ducks*) Regards Markus Wanner Hi Markus, I think he is referring to the fact that with parallel query execution, multiple partitions can be processed simultaneously instead of serially as they are now with the resulting speed increase. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REINDEX checking of index constraints
On Sun, Jul 21, 2013 at 11:30:54AM -0700, Josh Berkus wrote: Noah, Attached patch just restores the old behavior. Would it be worth preserving the ability to fix an index consistency problem with a REINDEX independent from related heap consistency problems such as duplicate keys? I would love to have two versions of REINDEX, one which validated and one which didn't. Maybe a ( validate off ) type check? +1 There are reasons to reindex that do not involve its validity and it would be great to not need to visit the heap for that. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [v9.4] row level security
On Thu, Aug 29, 2013 at 04:14:53PM +0200, Kohei KaiGai wrote: 2013/8/29 Alexander Korotkov aekorot...@gmail.com: On Wed, Aug 28, 2013 at 4:17 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote: 2013/8/28 Oleg Bartunov obartu...@gmail.com: btw, there is serious problem with row-level security and constraints. For example, user with low security level could use unique constraint to know about existence of a row with higher security. I don't know, what is the best practice to avoid this. ... A principle of this row-level security feature is, it prohibits to leak invisible datum itself, but might allow users to expect existence of records with a particular value. In fact, we never push down function that may leak the given argument, that does not have leakproof attribute, even if it can be utilized for index-scan. My opinion is, we should deal with it is a limitation of this feature, as long as it does not expose the raw data to be hidden. Estimation takes time to carry out much hidden data via covert channel, thus traditional secure operating system specification with MAC implementation says its degree of threat is not significant as long as bandwidth of covert channel is not so much. I think it is a reasonable standpoint. Thanks, -- KaiGai Kohei kai...@kaigai.gr.jp Okay, given that argument, how would you monitor such attempts to access data through the covert channel and shut it down? Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL CPU overhead/optimization (was Master-slave visibility order)
On Fri, Aug 30, 2013 at 03:22:37AM +0300, Ants Aasma wrote: On Fri, Aug 30, 2013 at 3:02 AM, Andres Freund and...@2ndquadrant.com wrote: I am not sure hot cache large buffer performance is really the interesting case. Most of the XLogInsert()s are pretty small in the common workloads. I vaguely recall trying 8 and getting worse performance on many workloads, but that might have been a problem of my implementation. Slice-by-8 doesn't have any overhead for small buffers besides the lookup tables, so it is most likely the cache misses that were the issue. Murmur3, CityHash and SpookyHash don't have any lookup tables and are excellent with small keys. Especially CityHash, 64 byte hash is quoted at 9ns. The reason I'd like to go for a faster CRC32 implementation as a first step is that it's easy. Easy to verify, easy to analyze, easy to backout. I personally don't have enough interest/time in the 9.4 cycle to pursue conversion to a different algorithm (I find the idea of using different ones on 32/64bit pretty bad), but I obviously won't stop somebody else ;) I might give it a shot later this cycle as I have familiarized myself with the problem domain anyway. I understand the appeal of staying with what we have, but this would cap the speedup at 4x and has large caveats with the extra lookup tables. A 28x speedup might be worth the extra effort. Regards, Ants Aasma You may want to also check out xxhash with a BSD License and very fast 32-bit performance as well: http://fastcompression.blogspot.com/2012/04/selecting-checksum-algorithm.html http://code.google.com/p/xxhash/ FWIW I agree that a much faster function would be better for CPU overhead. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
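To make the suggestion concrete, the xxhash call shape is about as simple as it gets; a minimal sketch using the 32-bit entry point XXH32 from the library linked above (the buffer contents and the zero seed are arbitrary, and this says nothing about where such a value would live in the WAL format):

#include <stdio.h>
#include "xxhash.h"

int
main(void)
{
    const char record[] = "pretend this is a WAL record payload";

    /* XXH32(input, length, seed) returns a 32-bit hash; a fixed seed
     * (here 0) keeps the value reproducible at recovery time. */
    unsigned int h = XXH32(record, sizeof(record) - 1, 0);

    printf("xxh32 = %08x\n", h);
    return 0;
}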
Re: [HACKERS] get rid of SQL_ASCII?
On Thu, Sep 05, 2013 at 08:47:32AM -0400, Peter Eisentraut wrote: Can we consider getting rid of the SQL_ASCII server-side encoding? I don't see any good use for it, and it's often a support annoyance, and it leaves warts all over the code. This would presumably be a multi-release effort. As a first step in accommodating users who have existing SQL_ASCII databases, we could change SQL_ASCII into a real encoding with conversion routines to all other encodings that only convert 7-bit ASCII characters. That way, users who use SQL_ASCII as real ASCII or don't care could continue to use it. Others would be forced to either set SQL_ASCII as the client encoding or adjust the encoding on the server. On the client side, the default libpq client encoding SQL_ASCII would be renamed to something like SAME or whatever, so the behavior would stay the same. Other ideas? Are there legitimate uses for SQL_ASCII? Hi Peter, Yes, we have processes that insert data from a large number of locales into the same database and we need to process the information in a locale-agnostic way, just as a range of bytes. Not to mention how much faster it can be. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] get rid of SQL_ASCII?
On Thu, Sep 05, 2013 at 09:42:17AM -0700, Josh Berkus wrote: Peter, Other ideas? Are there legitimate uses for SQL_ASCII? Migrating from MySQL. We've had some projects where we couldn't fix MySQL's non-enforcement text garbage, and had to use SQL_ASCII on the receiving side. If it hadn't been available, the user would have given up on Postgres. +++1 :) Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] get rid of SQL_ASCII?
On Thu, Sep 05, 2013 at 09:53:18AM -0700, Joshua D. Drake wrote: On 09/05/2013 09:42 AM, Josh Berkus wrote: Peter, Other ideas? Are there legitimate uses for SQL_ASCII? Migrating from MySQL. We've had some projects where we couldn't fix MySQL's non-enforcement text garbage, and had to use SQL_ASCII on the receiving side. If it hadn't been available, the user would have given up on Postgres. iconv? Yes, you can use iconv but then you have to check that it generated values that do not break your system including the application logic. That can prove a major stumbling block to changing DBs. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: custom hash-based COUNT(DISTINCT) aggregate - unexpectedly high memory consumption
On Mon, Oct 07, 2013 at 12:41:58AM +0200, Tomas Vondra wrote: 2. Consider using a simpler/faster hash function, like FNV[1] or Jenkins[2]. For fun, try not hashing those ints at all and see how that performs (that, I think, is what you get from HashSet<int> in Java/C#). I've used crc32 mostly because it's easily available in the code (i.e. I'm lazy), but I've done some quick research / primitive benchmarking too. For example hashing 2e9 integers takes this much time: FNV-1 = 11.9 FNV-1a = 11.9 jenkins = 38.8 crc32 = 32.0 So it's not really slow and the uniformity seems to be rather good. I'll try FNV in the future, however I don't think that's the main issue right now. Hi Tomas, If you are going to use a function that is not currently in the code, please consider xxhash: http://code.google.com/p/xxhash/ Here are some benchmarks for some of the faster hash functions:

Name           Speed       Q.Score   Author
xxHash         5.4 GB/s    10
MurmurHash 3a  2.7 GB/s    10        Austin Appleby
SpookyHash     2.0 GB/s    10        Bob Jenkins
SBox           1.4 GB/s    9         Bret Mulvey
Lookup3        1.2 GB/s    9         Bob Jenkins
CityHash64     1.05 GB/s   10        Pike & Alakuijala
FNV            0.55 GB/s   5         Fowler, Noll, Vo
CRC32          0.43 GB/s   9
MD5-32         0.33 GB/s   10        Ronald L. Rivest
SHA1-32        0.28 GB/s   10

Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
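Since Tomas mentions trying FNV next, the standard 32-bit FNV-1a loop is short enough to show in full; this is a reference sketch using the published offset basis and prime, with a small helper for hashing a single int key as in the COUNT(DISTINCT) experiments:

#include <stdint.h>
#include <stddef.h>

/* 32-bit FNV-1a over an arbitrary byte buffer. */
static uint32_t
fnv1a_32(const void *data, size_t len)
{
    const unsigned char *p = (const unsigned char *) data;
    uint32_t hash = 2166136261u;          /* FNV offset basis */

    for (size_t i = 0; i < len; i++)
    {
        hash ^= p[i];
        hash *= 16777619u;                /* FNV prime */
    }
    return hash;
}

/* Example: hashing a single int key. */
static uint32_t
fnv1a_32_int(int key)
{
    return fnv1a_32(&key, sizeof(key));
}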
Re: [HACKERS] Compression of full-page-writes
On Tue, Oct 15, 2013 at 03:11:22PM +0900, KONDO Mitsumasa wrote: (2013/10/15 13:33), Amit Kapila wrote: Snappy is good mainly for un-compressible data, see the link below: http://www.postgresql.org/message-id/CAAZKuFZCOCHsswQM60ioDO_hk12tA7OG3YcJA8v=4yebmoa...@mail.gmail.com That result was obtained on an ARM architecture, not a general CPU. Please see the detailed document. http://www.reddit.com/r/programming/comments/1aim6s/lz4_extremely_fast_compression_algorithm/c8y0ew9 I found a compression algorithm test in HBase. I have not read the details, but it indicates the snappy algorithm gets the best performance. http://blog.erdemagaoglu.com/post/4605524309/lzo-vs-snappy-vs-lzf-vs-zlib-a-comparison-of In fact, most modern NoSQL storage systems use snappy, because it has good performance and a good licence (BSD license). I think it is a bit difficult to prove that any one algorithm is best for all kinds of loads. I think it is necessary to make best efforts in community than I do the best choice with strict test. Regards, -- Mitsumasa KONDO NTT Open Source Software Center Google's lz4 is also a very nice algorithm with 33% better compression performance than snappy and 2X the decompression performance in some benchmarks also with a bsd license: https://code.google.com/p/lz4/ Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
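For a sense of what adopting lz4 would look like at the call level, here is a minimal round-trip sketch using the lz4 C API (LZ4_compress_default and LZ4_decompress_safe in current releases; older releases spelled the compressor LZ4_compress). The input string is only a stand-in for a full-page image:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "lz4.h"

int
main(void)
{
    const char src[] = "a full-page image would go here ... lots of compressible bytes";
    const int src_len = (int) sizeof(src);

    /* LZ4_compressBound() gives the worst-case compressed size. */
    int bound = LZ4_compressBound(src_len);
    char *compressed = malloc(bound);
    char *restored = malloc(src_len);

    int clen = LZ4_compress_default(src, compressed, src_len, bound);
    if (clen <= 0)
    {
        fprintf(stderr, "compression failed\n");
        return 1;
    }

    int dlen = LZ4_decompress_safe(compressed, restored, clen, src_len);
    if (dlen != src_len || memcmp(src, restored, src_len) != 0)
    {
        fprintf(stderr, "round trip failed\n");
        return 1;
    }

    printf("%d -> %d bytes\n", src_len, clen);
    free(compressed);
    free(restored);
    return 0;
}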
Re: [HACKERS] logical changeset generation v6.2
On Tue, Oct 15, 2013 at 11:02:39AM -0400, Robert Haas wrote: goals may be in conflict; we'll have to pick something. Note that parsing COPYs is a major PITA from most languages... Perhaps we should make the default output json instead? With every action terminated by a nullbyte? That's probably easier to parse from various scripting languages than anything else. I could go for that. It's not quite as compact as I might hope, but JSON does seem to make people awfully happy. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company Feeding such a JSON stream into a compression algorithm like lz4 or snappy should result in a pretty compact stream. The latest lz4 updates also have ability to use a pre-existing dictionary which would really help remove the redundant pieces. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Compression of full-page-writes
On Wed, Oct 16, 2013 at 01:42:34PM +0900, KONDO Mitsumasa wrote: (2013/10/15 22:01), k...@rice.edu wrote: Google's lz4 is also a very nice algorithm with 33% better compression performance than snappy and 2X the decompression performance in some benchmarks also with a bsd license: https://code.google.com/p/lz4/ If we judge only performance, we will select lz4. However, we should think about another important factor, which is software robustness, track record, bug fix history, and so on... If we see unknown bugs, can we fix them or improve the algorithm? It seems very difficult, because we only use it and don't understand the algorithms. Therefore, I think that we had better select robust software that has more users. Regards, -- Mitsumasa KONDO NTT Open Source Software Hi, Those are all very good points. lz4 however is being used by Hadoop. It is implemented natively in the Linux 3.11 kernel and the BSD version of the ZFS filesystem supports the lz4 algorithm for on-the-fly compression. With more and more CPU cores available in modern systems, using an algorithm with very fast decompression speeds means that storing data, even in memory, in a compressed form can reduce space requirements in exchange for a higher CPU-cycle cost. The ability to make those sorts of trade-offs can really benefit from a pluggable compression algorithm interface. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CLUSTER FREEZE
On Thu, Oct 24, 2013 at 10:28:43AM +0530, Amit Kapila wrote: On Thu, Oct 24, 2013 at 4:58 AM, Thomas Munro mu...@ip9.org wrote: Hi I noticed that CLUSTER doesn't have a FREEZE option. Here is a patch to add that, for consistency with VACUUM. Is it useful? I wonder why anyone would like to freeze during the CLUSTER command when they already have a separate way (VACUUM FREEZE) to achieve it; do you know or can you think of any case where a user wants to do it along with the CLUSTER command? Anyway, on the code side, I think you need to set both freeze_min_age as well as freeze_table_age, see the VACUUM command in gram.y

CLUSTER opt_freeze opt_verbose qualified_name cluster_index_specification
    {
        ClusterStmt *n = makeNode(ClusterStmt);
-       n->relation = $3;
-       n->indexname = $4;
-       n->verbose = $2;
+       n->relation = $4;
+       n->freeze_min_age = $2 ? 0 : -1;
+       n->indexname = $5;
+       n->verbose = $3;
..

With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com Hi Amit, If the FREEZE is part of the CLUSTER, you would only read/write the table once. With a follow-up VACUUM FREEZE, you would re-read/write a second time. I, for one, would appreciate being able to perform both in the same run. (+1) Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Compression of full-page-writes
On Thu, Oct 24, 2013 at 11:07:38AM -0400, Robert Haas wrote: On Mon, Oct 21, 2013 at 11:52 PM, Fujii Masao masao.fu...@gmail.com wrote: So, our consensus is to introduce the hooks for FPW compression so that users can freely select their own best compression algorithm? Also, probably we need to implement at least one compression contrib module using that hook, maybe it's based on pglz or snappy. I don't favor making this pluggable. I think we should pick snappy or lz4 (or something else), put it in the tree, and use it. Hi, My vote would be for lz4 since it has faster single-thread compression and decompression speeds, with the decompression speed being almost 2X snappy's decompression speed. Both are BSD licensed so that is not an issue. The base code for lz4 is C and it is C++ for snappy. There is also a HC (high-compression) variant for lz4 that pushes its compression rate to about the same as zlib (-1); it uses the same decompressor, which can provide data even faster due to the better compression. Some more real world tests would be useful, which is really where being pluggable would help. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Compression of full-page-writes
On Thu, Oct 24, 2013 at 12:22:59PM -0400, Robert Haas wrote: On Thu, Oct 24, 2013 at 11:40 AM, k...@rice.edu k...@rice.edu wrote: On Thu, Oct 24, 2013 at 11:07:38AM -0400, Robert Haas wrote: On Mon, Oct 21, 2013 at 11:52 PM, Fujii Masao masao.fu...@gmail.com wrote: So, our consensus is to introduce the hooks for FPW compression so that users can freely select their own best compression algorithm? Also, probably we need to implement at least one compression contrib module using that hook, maybe it's based on pglz or snappy. I don't favor making this pluggable. I think we should pick snappy or lz4 (or something else), put it in the tree, and use it. Hi, My vote would be for lz4 since it has faster single-thread compression and decompression speeds, with the decompression speed being almost 2X snappy's decompression speed. Both are BSD licensed so that is not an issue. The base code for lz4 is C and it is C++ for snappy. There is also a HC (high-compression) variant for lz4 that pushes its compression rate to about the same as zlib (-1); it uses the same decompressor, which can provide data even faster due to the better compression. Some more real world tests would be useful, which is really where being pluggable would help. Well, it's probably a good idea for us to test, during the development cycle, which algorithm works better for WAL compression, and then use that one. Once we make that decision, I don't see that there are many circumstances in which a user would care to override it. Now if we find that there ARE reasons for users to prefer different algorithms in different situations, that would be a good reason to make it configurable (or even pluggable). But if we find that no such reasons exist, then we're better off avoiding burdening users with the need to configure a setting that has only one sensible value. It seems fairly clear from previous discussions on this mailing list that snappy and lz4 are the top contenders for the position of compression algorithm favored by PostgreSQL. I am wondering, though, whether it wouldn't be better to add support for both - say we added both to libpgcommon, and perhaps we could consider moving pglz there as well. That would allow easy access to all of those algorithms from both front-end and backend-code. If we can make the APIs parallel, it should be very simple to modify any code we add now to use a different algorithm than the one initially chosen if in the future we add algorithms to or remove algorithms from the list, or if one algorithm is shown to outperform another in some particular context. I think we'll do well to isolate the question of adding support for these algorithms from the current patch or any other particular patch that may be on the table, and FWIW, I think having two leading contenders and adding support for both may have a variety of advantages over crowning a single victor. +++1 Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
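The parallel-API idea could be sketched as something like the following, purely hypothetical, libpgcommon-style interface; nothing of the sort exists in the tree, it only illustrates how callers could stay algorithm-agnostic:

#include <stddef.h>

/* Hypothetical common interface: each algorithm would fill in one of these. */
typedef struct pg_compress_algorithm
{
    const char *name;

    /* Returns compressed size, or 0 if the data did not compress. */
    size_t      (*compress) (const void *src, size_t src_len,
                             void *dst, size_t dst_capacity);

    /* Returns decompressed size, or 0 on corruption. */
    size_t      (*decompress) (const void *src, size_t src_len,
                               void *dst, size_t dst_capacity);

    /* Worst-case output size for a given input size. */
    size_t      (*bound) (size_t src_len);
} pg_compress_algorithm;

/* A caller (e.g. FPW compression) would be written against the struct,
 * so swapping pglz for lz4 or snappy becomes a one-line change. */
extern const pg_compress_algorithm pg_compress_pglz;
extern const pg_compress_algorithm pg_compress_lz4;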
Re: [HACKERS] Detection of nested function calls
On Mon, Oct 28, 2013 at 05:48:55PM +0100, Andres Freund wrote: On 2013-10-28 12:42:28 -0400, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: On Mon, Oct 28, 2013 at 11:12 AM, Tom Lane t...@sss.pgh.pa.us wrote: The idea I'm thinking about at the moment is that toast tokens of this sort might each contain a function pointer to the required flattening function. This might be OK, but it bloats the in-memory representation. For small data types like numeric that might well be significant. Meh. If you don't include a function pointer you will still need the OID of the datatype or the decompression function, so it's not like omitting it is free. That's what I thought at first too - but I am not sure it's actually true. The reason we need to include the toastrelid in varatt_externals (which I guess you are thinking of, like me) is that we need to be able to resolve naked Datums to their original value without any context. But at the locations where we'd need to call the memory representation-disk conversion function we should have a TupleDesc with type information, so we could lookup the needed information there. In any case, the design target here is for data values that are going to be quite large, so an extra 4 bytes or whatever in the reference object doesn't really seem to me to be something to stress over. I'd actually be happy if we can get this to work for numeric as well - I have seen several workloads where that's a bottleneck. Not that I am sure that the 8bytes for a pointer would be the problem there (in contrast to additional typecache lookups). Greetings, Andres Freund With the type information available, you could have a single lookup table per backend with the function pointer so the space would be negligible amortized over all of the datums of each type. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
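The per-backend lookup Ken describes might look roughly like this toy sketch: a small cache keyed by type OID that resolves the flattening function once and amortizes the cost over every datum of that type. The type names and the lookup helper are invented for illustration and are not the actual executor data structures:

#include <stddef.h>

typedef unsigned int Oid;
typedef void *(*flatten_fn) (void *expanded_datum);

typedef struct FlattenCacheEntry
{
    Oid         typid;
    flatten_fn  flatten;
} FlattenCacheEntry;

#define FLATTEN_CACHE_SIZE 64
static FlattenCacheEntry flatten_cache[FLATTEN_CACHE_SIZE];

/* Stand-in for a (more expensive) catalog or typcache lookup,
 * assumed to exist elsewhere. */
extern flatten_fn lookup_flatten_fn_for_type(Oid typid);

static flatten_fn
get_flatten_fn(Oid typid)
{
    FlattenCacheEntry *e = &flatten_cache[typid % FLATTEN_CACHE_SIZE];

    if (e->typid != typid)
    {
        /* cache miss: do the real lookup once, then remember it */
        e->flatten = lookup_flatten_fn_for_type(typid);
        e->typid = typid;
    }
    return e->flatten;
}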
Re: [HACKERS] Fast insertion indexes: why no developments
On Tue, Oct 29, 2013 at 02:53:37PM +0000, Leonardo Francalanci wrote: Before getting too excited about some new academic index type, it's worth noting the sad state in which hash indexes have languished for years. Nobody's bothered to add WAL support, let alone do any other real work on them. The non-btree index types that have been getting love are the ones that offer the ability to index queries that btree can't. I think a new index type whose only benefit is the claim to be faster in a narrow use-case is likely to end up like hash, not getting used enough to be properly maintained. regards, tom lane Aren't hash indexes in a poor state because they are not faster than btree in every condition? Hi Leonardo, If there were ONE perfect index, better in every condition, postgres would be using it. As in everything else, each type has its strengths and weaknesses. The hash index allows equality searches for very large key lengths using a relatively very small index size. As has been mentioned before, we still do not have WAL logging for hash indexes. But even so, for I/O bound systems hash indexes are twice as fast for searches as the btree equivalent. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Save Hash Indexes
On Fri, Nov 01, 2013 at 01:31:10PM +0000, Dimitri Fontaine wrote: Hi, Here's an idea: when a user asks for a Hash Index, transparently build a BTree index over a hash function instead. Advantages: - it works - it's crash safe - it's (much?) faster than a hash index anyways Drawbacks: - root access concurrency - we need a hash_any function stable against pg_upgrade After talking about it with Heikki, we don't seem to find ways in which the root access concurrency pattern would be visible enough to matter. Also, talking with Peter Geoghegan, it's unclear that there's a use case where a hash index would be faster than a btree index over the hash function. Comments? -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support Hi Dimitri, This use of a function index as a safe hash index has been the substitute for a while. Check the previous threads. It is not a true substitute because a hash index is O(1) for lookups but a btree is O(log n), so hash indexes have an advantage for very large numbers of entries. In fact a recent post compared both the btree substitute and the current hash index, and for large indexes the hash allowed 2X the lookups of the equivalent btree, which is what you would expect. The use-case is exactly for very large tables/indexes where the index does not fit in memory, to say nothing of the data itself. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] additional json functionality
On Fri, Nov 15, 2013 at 01:18:22PM -0800, Josh Berkus wrote: I believe this was a danger we recognized when we added the JSON type, including the possibility that a future binary type might need to be a separate type due to compatibility issues. The only sad thing is the naming; it would be better for the new type to carry the JSON name in the future, but there's no way to make that work that I can think of. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com What about a GUC for the json version? Then you could choose and they could both be called json. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] In-Memory Columnar Store
On Mon, Dec 09, 2013 at 11:40:41PM +0400, knizhnik wrote: Hello! I want to announce my implementation of an In-Memory Columnar Store extension for PostgreSQL: Documentation: http://www.garret.ru/imcs/user_guide.html Sources: http://www.garret.ru/imcs-1.01.tar.gz Any feedback, bug reports and suggestions are welcome. Vertical representation of data is stored in PostgreSQL shared memory. This is why it is important to be able to utilize all available physical memory. Hi, This is very neat! The question I have, which applies to the matview support as well, is "How can we transparently substitute usage of the in-memory columnar store/matview in a SQL query?" Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] open and close columns in the NEW record not allowed
On Thu, Feb 06, 2014 at 04:21:41PM +0100, Rafael Martinez Guerrero wrote: On Thu, 2014-02-06 at 07:11 -0800, Adrian Klaver wrote: On 02/06/2014 06:35 AM, Rafael Martinez Guerrero wrote: We think the behavior should be consistent, either it is allow to use them or not, but not like it is today. As a general rule, if you get spurious parser errors for commands that contain any of the listed key words as an identifier you should try to quote the identifier to see if the problem goes away. Which indeed solves the problem on my end at least: Hello Thanks for the feedback. Our problem is that an application decides the name of the columns in the tables and XDB replication from EnterpriseDB decides the triggers. We have no control over the code :-( regards, -- Rafael Martinez Guerrero Center for Information Technology Services University of Oslo, Norway Hi Rafael, It sounds like a bug in the XDB trigger generation code. Maybe file a bug report. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSoC on WAL-logging hash indexes
On Thu, Mar 06, 2014 at 06:14:21PM -0500, Robert Haas wrote: On Thu, Mar 6, 2014 at 3:44 PM, Jeff Janes jeff.ja...@gmail.com wrote: I've been tempted to implement a new type of hash index that allows both WAL and high concurrency, simply by disallowing bucket splits. At the index creation time you use a storage parameter to specify the number of buckets, and that is that. If you mis-planned, build a new index with more buckets, possibly concurrently, and drop the too-small one. Yeah, we could certainly do something like that. It sort of sucks, though. I mean, it's probably pretty easy to know that starting with the default 2 buckets is not going to be enough; most people will at least be smart enough to start with, say, 1024. But are you going to know whether you need 32768 or 1048576 or 33554432? A lot of people won't, and we have more than enough reasons for performance to degrade over time as it is. It would be useful to have a storage parameter for the target size of the index, even if it is not exact, to use in the initial index build to avoid the flurry of i/o caused by bucket splits as the index grows. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
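Ken's storage-parameter idea amounts to seeding the bucket count from an expected row count instead of starting from the default of two; a rough sketch of that calculation (the tuples-per-bucket fill target is a caller-supplied assumption here, not the real hashbuild math):

#include <stdint.h>

/*
 * Pick an initial number of hash buckets for an expected row count.
 * tuples_per_bucket is whatever fill target the index AM aims for; the
 * result is rounded up to a power of two because bucket splits always
 * double the bucket count.
 */
static uint32_t
initial_hash_buckets(uint64_t expected_rows, uint32_t tuples_per_bucket)
{
    uint64_t needed = (expected_rows + tuples_per_bucket - 1) / tuples_per_bucket;
    uint32_t buckets = 2;               /* the current default starting point */

    while (buckets < needed && buckets < (1u << 31))
        buckets <<= 1;                  /* mimic repeated bucket splits up front */
    return buckets;
}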
Re: [HACKERS] Adding unsigned 256 bit integers
On Thu, Apr 10, 2014 at 09:13:47PM +0800, Olivier Lalonde wrote: I was wondering if there would be any way to do the following in PostgreSQL: UPDATE cryptotable SET work = work + 'some big hexadecimal number' where work is an unsigned 256 bit integer. Right now my column is a character varying(64) column (hexadecimal representation of the number) but I would be happy to switch to another data type if it lets me do the operation above. If it's not possible with vanilla PostgreSQL, are there extensions that could help me? -- - Oli Olivier Lalonde http://www.syskall.com -- connect with me! Hi Olivier, Here are some sample pl/pgsql helper functions that I have written for other purposes. They use integers but can be adapted to use numeric. Regards, Ken

---
CREATE OR REPLACE FUNCTION hex2dec(t text) RETURNS integer AS $$
DECLARE
    r RECORD;
BEGIN
    FOR r IN EXECUTE 'SELECT x'''||t||'''::integer AS hex' LOOP
        RETURN r.hex;
    END LOOP;
END
$$ LANGUAGE plpgsql IMMUTABLE STRICT;
---

---
CREATE OR REPLACE FUNCTION bytea2int (
    in_string BYTEA
) RETURNS INTEGER AS $$
DECLARE
    b1 INTEGER := 0;
    b2 INTEGER := 0;
    b3 INTEGER := 0;
    b4 INTEGER := 0;
    out_int INTEGER := 0;
BEGIN
    CASE OCTET_LENGTH(in_string)
        WHEN 1 THEN
            b4 := get_byte(in_string, 0);
        WHEN 2 THEN
            b3 := get_byte(in_string, 0);
            b4 := get_byte(in_string, 1);
        WHEN 3 THEN
            b2 := get_byte(in_string, 0);
            b3 := get_byte(in_string, 1);
            b4 := get_byte(in_string, 2);
        WHEN 4 THEN
            b1 := get_byte(in_string, 0);
            b2 := get_byte(in_string, 1);
            b3 := get_byte(in_string, 2);
            b4 := get_byte(in_string, 3);
    END CASE;
    out_int := (b1 << 24) + (b2 << 16) + (b3 << 8) + b4;
    RETURN(out_int);
END;
$$ LANGUAGE plpgsql IMMUTABLE;
---

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
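If the arithmetic ever needed to move out of SQL, 256-bit unsigned addition is also straightforward to write directly; a small sketch using four 64-bit limbs in little-endian limb order, wrapping modulo 2^256 like a machine integer:

#include <stdint.h>

typedef struct uint256
{
    uint64_t limb[4];     /* limb[0] holds the least significant 64 bits */
} uint256;

/* r = a + b (mod 2^256) */
static uint256
uint256_add(uint256 a, uint256 b)
{
    uint256  r;
    uint64_t carry = 0;

    for (int i = 0; i < 4; i++)
    {
        uint64_t sum = a.limb[i] + b.limb[i];
        uint64_t c1 = (sum < a.limb[i]);       /* carry out of a + b */

        r.limb[i] = sum + carry;
        carry = c1 | (r.limb[i] < sum);        /* carry out of adding the old carry */
    }
    return r;
}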
Re: [HACKERS] GSoC on WAL-logging hash indexes
On Wed, Apr 30, 2014 at 12:26:20AM -0700, Peter Geoghegan wrote: On Mon, Mar 3, 2014 at 8:12 AM, Robert Haas robertmh...@gmail.com wrote: As a GSoC student, I will implement WAL recovery of hash indexes using the other index types' WAL code as a guide. Frankly, I'm skeptical of the idea that hash indexes will ever really be useful. I realize that that's a counter-intuitive conclusion, but there are many things we could do to improve B-Tree CPU costs to make them closer to those of hash indexes, without making them any less flexible. I myself would much rather work on that, and intend to. The O(1) cost seems attractive when you consider that that only requires that we read one index page from disk to service any given index scan, but in fact B-Trees almost always only require the same. They are of course also much more flexible. The concurrency characteristics B-Trees are a lot better understood. I sincerely suggest that we forget about conventional hash table type indexes. I fear they're a lost cause. -- Peter Geoghegan Hi Peter, I do not think that CPU costs matter as much as the O(1) probe to get a result value specifically for very large indexes/tables where even caching the upper levels of a B-tree index would kill your working set in memory. I know, I know, everyone has so much memory and can just buy more... but this does matter. I also think that development of hash indexes has been stalled waiting for WAL logging. For example, hash indexes can almost trivially become more space efficient as they grow in size by utilizing the page number to represent the prefix bits of the hash value for a bucket. My 2 cents. Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why data of timestamptz does not store value of timezone passed to it?
On Thu, Aug 28, 2014 at 03:33:56PM -0400, Bruce Momjian wrote: On Thu, Aug 28, 2014 at 11:26:53AM -0700, Kevin Grittner wrote: Steve Crawford scrawf...@pinpointresearch.com wrote: I have always considered timestamp with time zone to be a bad description of that data type but it appears to be a carryover from the specs. It is really a point in time I agree. While what timestamptz implements is a very useful data type, I think it was a very unfortunate decision to implement that for the standard type name, instead of something more consistent with the spec. It seems very unlikely to change, though, because so much existing production code would break. :-( Understandably, people do tend to expect that saving something into a column defined as TIMESTAMP WITH TIME ZONE will save a time zone with the timestamp, and in PostgreSQL it does not. So the standard requires storing of original timezone in the data type? I was not aware of that. I do not have a copy of the SQL 92 spec, but several references to the spec mention that it defined the time zone as a format SHH:MM where S represents the sign (+ or -), which seems to be what PostgreSQL uses. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Tue, Sep 02, 2014 at 10:30:11AM -0300, Arthur Silva wrote: On Tue, Sep 2, 2014 at 9:11 AM, Rahila Syed rahilasye...@gmail.com wrote: Hello, It'd be interesting to check avg cpu usage as well I have collected average CPU utilization numbers by collecting sar output at interval of 10 seconds for following benchmark: Server specifications: Processors: Intel® Xeon® Processor E5-2650 (2 GHz, 8C/16T, 20 MB) * 2 nos RAM: 32GB Disk : HDD 450GB 10K Hot Plug 2.5-inch SAS HDD * 8 nos 1 x 450 GB SAS HDD, 2.5-inch, 6Gb/s, 10,000 rpm Benchmark: Scale : 16 Command :java JR /home/postgres/jdbcrunner-1.2/scripts/tpcc.js -sleepTime 550,250,250,200,200 Warmup time : 1 sec Measurement time : 900 sec Number of tx types : 5 Number of agents : 16 Connection pool size : 16 Statement cache size : 40 Auto commit : false Checkpoint segments:1024 Checkpoint timeout:5 mins Average % of CPU utilization at user level for multiple blocks compression: Compression Off = 3.34133 Snappy = 3.41044 LZ4 = 3.59556 Pglz = 3.66422 The numbers show the average CPU utilization is in the following order: pglz > LZ4 > Snappy > No compression Attached is the graph which gives plot of % CPU utilization versus time elapsed for each of the compression algorithms. Also, the overall CPU utilization during tests is very low i.e. below 10%. CPU remained idle for a large (~90) percentage of time. I will repeat the above tests with high load on CPU and using the benchmark given by Fujii-san and post the results. Thank you, On Wed, Aug 27, 2014 at 9:16 PM, Arthur Silva arthur...@gmail.com wrote: Em 26/08/2014 09:16, Fujii Masao masao.fu...@gmail.com escreveu: On Tue, Aug 19, 2014 at 6:37 PM, Rahila Syed rahilasye...@gmail.com wrote: Hello, Thank you for comments. Could you tell me where the patch for single block in one run is? Please find attached patch for single block compression in one run. Thanks! I ran the benchmark using pgbench and compared the results. I'd like to share the results.

[RESULT] Amount of WAL generated during the benchmark. Unit is MB.
          Multiple   Single
off       202.0      201.5
on        6051.0     6053.0
pglz      3543.0     3567.0
lz4       3344.0     3485.0
snappy    3354.0     3449.5

Latency average during the benchmark. Unit is ms.
          Multiple   Single
off       19.1       19.0
on        55.3       57.3
pglz      45.0       45.9
lz4       44.2       44.7
snappy    43.4       43.3

These results show that FPW compression is really helpful for decreasing the WAL volume and improving the performance. The compression ratio by lz4 or snappy is better than that by pglz. But it's difficult to conclude which of lz4 or snappy is best, according to these results. ISTM that compression-of-multiple-pages-at-a-time approach can compress WAL more than compression-of-single-... does. [HOW TO BENCHMARK] Create pgbench database with scale factor 1000. Change the data type of the column filler on each pgbench table from CHAR(n) to TEXT, and fill the data with the result of pgcrypto's gen_random_uuid() in order to avoid empty column, e.g., alter table pgbench_accounts alter column filler type text using gen_random_uuid()::text After creating the test database, run the pgbench as follows. The number of transactions executed during benchmark is almost same between each benchmark because -R option is used. pgbench -c 64 -j 64 -r -R 400 -T 900 -M prepared checkpoint_timeout is 5min, so it's expected that checkpoint was executed at least two times during the benchmark.
Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers It'd be interesting to check avg cpu usage as well. Is there any reason to default to LZ4-HC? Shouldn't we try the default as well? LZ4-default is known for its near realtime speeds in exchange for a few % of compression, which sounds optimal for this use case. Also, we might want to compile these libraries with -O3 instead of the default -O2. They're finely tuned to work with all possible compiler optimizations w/ hints and other tricks, this is specially true for LZ4, not sure for snappy. In my virtual machine LZ4 w/ -O3 compression runs at twice the speed (950MB/s) of -O2 (450MB/s) @ (61.79%), LZ4-HC seems unaffected though (58MB/s) @ (60.27%). Yes, that's right,
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Thu, Sep 11, 2014 at 09:37:07AM -0300, Arthur Silva wrote: I agree that there's no reason to fix an algorithm to it, unless maybe it's pglz. There's some initial talk about implementing pluggable compression algorithms for TOAST and I guess the same must be taken into consideration for the WAL. -- Arthur Silva On Thu, Sep 11, 2014 at 2:46 AM, Rahila Syed rahilasyed...@gmail.com wrote: I will repeat the above tests with high load on CPU and using the benchmark given by Fujii-san and post the results. Average % of CPU usage at user level for each of the compression algorithms is as follows.

Compression   Multiple   Single
Off           81.1338    81.1267
LZ4           81.0998    81.1695
Snappy        80.9741    80.9703
Pglz          81.2353    81.2753

http://postgresql.1045698.n5.nabble.com/file/n5818552/CPU_utilization_user_single.png http://postgresql.1045698.n5.nabble.com/file/n5818552/CPU_utilization_user.png The numbers show CPU utilization of Snappy is the least. The CPU utilization in decreasing order is pglz > No compression > LZ4 > Snappy. The variance of average CPU utilization numbers is very low. However, snappy seems to be best when it comes to lesser utilization of CPU. As per the measurement results posted till date, LZ4 outperforms snappy and pglz in terms of compression ratio and performance. However, CPU utilization numbers show snappy utilizes the least amount of CPU. The difference is not much though. As there has been no consensus yet about which compression algorithm to adopt, is it better to make this decision independent of the FPW compression patch as suggested earlier in this thread? FPW compression can be done using the built-in compression pglz as it shows a considerable performance gain over uncompressed WAL and a good compression ratio. Also, the patch to compress multiple blocks at once gives better compression as compared to single block. ISTM that the performance overhead introduced by multiple blocks compression is slightly higher than single block compression, which can be tested again after modifying the patch to use pglz. Hence, this patch can be built using multiple blocks compression. Thoughts? Hi, The big (huge) win for lz4 (not the HC variant) is the enormous compression and decompression speed. It compresses quite a bit faster (33%) than snappy and decompresses twice as fast as snappy. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Thu, Sep 11, 2014 at 06:58:06PM +0200, Andres Freund wrote: On 2014-09-11 12:55:21 -0400, Robert Haas wrote: I advise supporting pglz only for the initial patch, and adding support for the others later if it seems worthwhile. The approach seems to work well enough with pglz that it's worth doing even if we never add the other algorithms. That approach is fine with me. Note though that I am pretty strongly against adding support for more than one algorithm at the same time. So, if we gain lz4 support - which I think is definitely where we should go - we should drop pglz support for the WAL. Greetings, Andres Freund +1 Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Thu, Sep 11, 2014 at 07:17:42PM +0200, Andres Freund wrote: On 2014-09-11 13:04:43 -0400, Robert Haas wrote: On Thu, Sep 11, 2014 at 12:58 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-09-11 12:55:21 -0400, Robert Haas wrote: I advise supporting pglz only for the initial patch, and adding support for the others later if it seems worthwhile. The approach seems to work well enough with pglz that it's worth doing even if we never add the other algorithms. That approach is fine with me. Note though that I am pretty strongly against adding support for more than one algorithm at the same time. What if one algorithm compresses better and the other algorithm uses less CPU time? Then we make a choice for our users. A configuration option about an aspect of postgres that darned few people will understand, given the marginal differences between snappy and lz4, doesn't make sense. I don't see a compelling need for an option if we get a new algorithm that strictly dominates what we've already got in all parameters, and it may well be that, as respects pglz, that's achievable. But ISTM that it need not be true in general. If you look at the results, lz4 is pretty much there. Sure, there are algorithms which have a much better compression - but the time overhead is so large it just doesn't make sense for full page compression. Greetings, Andres Freund

In addition, you can leverage the presence of a higher-compression version of lz4 (lz4hc) that uses the same decompression engine. It could possibly be applied to static tables as a REINDEX option, or even to slowly growing tables, which would benefit from the better compression as well as the increased decompression speed. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
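To make the shared-decompressor point concrete, here is a minimal, self-contained sketch (illustrative only; it assumes a liblz4 recent enough to expose LZ4_compress_default/LZ4_compress_HC, built with -llz4):

    /* Sketch: LZ4 and LZ4-HC output is handled by one decompression engine.
     * Assumes liblz4 headers lz4.h and lz4hc.h; build with: cc demo.c -llz4 */
    #include <stdio.h>
    #include <lz4.h>
    #include <lz4hc.h>

    int main(void)
    {
        const char src[] = "full page image full page image full page image";
        char comp[256];
        char back[256];

        /* Fast path: default LZ4 compressor. */
        int n_def = LZ4_compress_default(src, comp, (int) sizeof(src),
                                         (int) sizeof(comp));

        /* Slower-but-tighter path: LZ4-HC at level 9, same output format. */
        int n_hc = LZ4_compress_HC(src, comp, (int) sizeof(src),
                                   (int) sizeof(comp), 9);

        /* Either stream is decompressed by the same routine. */
        int n_out = LZ4_decompress_safe(comp, back, n_hc, (int) sizeof(back));

        printf("default: %d bytes, hc: %d bytes, restored: %d bytes\n",
               n_def, n_hc, n_out);
        return (n_out == (int) sizeof(src)) ? 0 : 1;
    }

The practical upshot for something like a REINDEX option is that data written with the HC compressor costs nothing extra to read back.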
Re: [HACKERS] Memory Alignment in Postgres
On Thu, Sep 11, 2014 at 02:54:36PM -0300, Arthur Silva wrote: Indeed, I don't know of any other architectures where this would be an option. So if this ever moves forward it must be turned on at compile time, for x86-64 only. I wonder how MySQL handles its rows, even on those architectures, as its storage format is completely packed. If we just reduced the alignment requirements when laying out columns in rows and indexes by reducing/removing padding -- typalign -- it'd be enough gain in my (humble) opinion. Even if you think alignment is not an issue, you can see savings everywhere, which is kinda insane... I'm unsure how this translates into patch complexity, but judging by the reactions so far I'm assuming a lot.

If the column order in the table was independent of the physical layout, it would be possible to order columns to reduce the padding needed. Not my suggestion, just repeating a valid comment from earlier in the thread. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
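To illustrate the column-ordering idea in miniature (a toy C struct, not PostgreSQL's tuple layout; the sizes in the comments assume a typical x86-64 ABI):

    /* Same three fields, two different orders, two different padded sizes. */
    #include <stdio.h>

    struct chatty_order {
        char   flag;    /* 1 byte, then 7 bytes of padding for the double */
        double value;   /* 8 bytes, must be 8-byte aligned */
        int    count;   /* 4 bytes, then 4 bytes of tail padding */
    };                  /* typically 24 bytes on x86-64 */

    struct packed_order {
        double value;   /* widest member first */
        int    count;
        char   flag;    /* only 3 bytes of tail padding remain */
    };                  /* typically 16 bytes on x86-64 */

    int main(void)
    {
        printf("chatty: %zu bytes, reordered: %zu bytes\n",
               sizeof(struct chatty_order), sizeof(struct packed_order));
        return 0;
    }

Ordering columns from widest to narrowest alignment is the same trick, applied to on-disk tuples instead of C structs.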
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Fri, Sep 12, 2014 at 11:17:12PM +0300, Ants Aasma wrote: On Fri, Sep 12, 2014 at 10:38 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: I don't mean that we should abandon this patch - compression makes the WAL smaller, which has all kinds of other benefits, even if it makes the raw TPS throughput of the system worse. But I'm just saying that these TPS comparisons should be taken with a grain of salt. We probably should consider switching to a faster CRC algorithm again, regardless of what we do with compression. CRC is a pretty awfully slow algorithm for checksums. We should consider switching it out for something more modern. CityHash, MurmurHash3 and xxhash look like pretty good candidates, being around an order of magnitude faster than CRC. I'm hoping to investigate substituting the WAL checksum algorithm in 9.5. Given the room for improvement in this area, I think it would make sense to just short-circuit the CRC calculations for testing this patch, to see if the performance improvement is due to less data being checksummed. Regards, Ants Aasma

+1 for xxhash -

Version    Speed on 64-bit    Speed on 32-bit
XXH64      13.8 GB/s          1.9 GB/s
XXH32       6.8 GB/s          6.0 GB/s

Here is a blog about its performance as a hash function: http://fastcompression.blogspot.com/2014/07/xxhash-wider-64-bits.html Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
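As a concrete illustration of what checksumming a buffer with xxHash looks like at the C level (a sketch only; it assumes the xxHash library's xxhash.h/xxhash.c are available to the build, and the buffer is just a stand-in for a WAL record):

    /* Checksumming a buffer with xxHash; XXH64 is the fast variant on
     * 64-bit hardware, XXH32 on 32-bit. */
    #include <stdio.h>
    #include <string.h>
    #include "xxhash.h"

    int main(void)
    {
        char buf[8192];
        memset(buf, 'x', sizeof(buf));   /* stand-in for a WAL record */

        unsigned int       h32 = XXH32(buf, sizeof(buf), 0);
        unsigned long long h64 = XXH64(buf, sizeof(buf), 0);

        /* If only 32 bits of checksum are wanted, the low half of XXH64
         * can be used, keeping the 64-bit speed advantage. */
        unsigned int h64_low = (unsigned int) h64;

        printf("XXH32 = %08x, low half of XXH64 = %08x\n", h32, h64_low);
        return 0;
    }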
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Sat, Sep 13, 2014 at 12:55:33PM -0400, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-09-13 08:52:33 +0300, Ants Aasma wrote: On Sat, Sep 13, 2014 at 6:59 AM, Arthur Silva arthur...@gmail.com wrote: That's not entirely true. CRC-32C beats pretty much everything with the same length quality-wise and has both hardware implementations and highly optimized software versions. For better or for worse, CRC is biased by detecting all single bit errors; the detection capability for larger errors is slightly diminished. The quality of the other algorithms I mentioned is also very good, while producing uniformly varying output. There's also much more literature about the various CRCs in comparison to some of these hash algorithms. Indeed. CRCs have well-understood properties for error detection. Have any of these new algorithms been analyzed even a hundredth as thoroughly? No. I'm unimpressed by evidence-free claims that something else is "also very good". Now, CRCs are designed for detecting the sorts of short burst errors that are (or were, back in the day) common on phone lines. You could certainly make an argument that that's not the type of threat we face for PG data. However, I've not seen anyone actually make such an argument, let alone demonstrate that some other algorithm would be better. To start with, you'd need to explain precisely what other error pattern is more important to defend against, and why. regards, tom lane

Here is a blog on the development of xxhash: http://fastcompression.blogspot.com/2012/04/selecting-checksum-algorithm.html Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Sat, Sep 13, 2014 at 09:50:55PM -0300, Arthur Silva wrote: On Sat, Sep 13, 2014 at 1:55 PM, Tom Lane t...@sss.pgh.pa.us wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-09-13 08:52:33 +0300, Ants Aasma wrote: On Sat, Sep 13, 2014 at 6:59 AM, Arthur Silva arthur...@gmail.com wrote: That's not entirely true. CRC-32C beats pretty much everything with the same length quality-wise and has both hardware implementations and highly optimized software versions. For better or for worse, CRC is biased by detecting all single bit errors; the detection capability for larger errors is slightly diminished. The quality of the other algorithms I mentioned is also very good, while producing uniformly varying output. There's also much more literature about the various CRCs in comparison to some of these hash algorithms. Indeed. CRCs have well-understood properties for error detection. Have any of these new algorithms been analyzed even a hundredth as thoroughly? No. I'm unimpressed by evidence-free claims that something else is "also very good". Now, CRCs are designed for detecting the sorts of short burst errors that are (or were, back in the day) common on phone lines. You could certainly make an argument that that's not the type of threat we face for PG data. However, I've not seen anyone actually make such an argument, let alone demonstrate that some other algorithm would be better. To start with, you'd need to explain precisely what other error pattern is more important to defend against, and why. regards, tom lane

MySQL went this way as well, changing the CRC polynomial in 5.6. What we are looking for here is uniqueness, and thus better error detection. Not avalanche effect, nor cryptographic security, nor bit distribution. As far as I'm aware, CRC32C is unbeaten collision-wise and time proven. I couldn't find tests with xxhash and crc32 on the same hardware, so I spent some time putting together a benchmark (see attachment; to run it just start run.sh). I included a crc32 implementation using SSE4.2 instructions (which work on pretty much any Intel processor built after 2008 and AMD built after 2012), a portable Slice-By-8 software implementation, and xxhash, since it's the fastest software 32bit hash I know of. Here are the results running the test program on my i5-4200M:

crc sb8: 90444623  elapsed: 0.513688s  speed:  1.485220 GB/s
crc hw:  90444623  elapsed: 0.048327s  speed: 15.786877 GB/s
xxhash:  7f4a8d5   elapsed: 0.182100s  speed:  4.189663 GB/s

The hardware version is insanely fast and works on the majority of Postgres setups, and the fallback software implementation is 2.8x slower than the fastest 32bit hash around. Hopefully it'll be useful in the discussion.

Thank you for running this sample benchmark. It definitely shows that the hardware version of the CRC is very fast; unfortunately, it is really only available on x64 Intel/AMD processors, which leaves all the rest lacking. For current 64-bit hardware, it might be instructive to also try using the XXH64 version and just take one half of the hash. It should come in at around 8.5 GB/s, or very nearly the speed of the hardware-accelerated CRC. Also, while I understand that CRC has a very venerable history and is well studied for transmission-type errors, I have been unable to find any research on its applicability to validating file/block writes to a disk drive. While it is, to quote you, "unbeaten collision wise", xxhash, in both the 32-bit and 64-bit versions, is its equal.
Since there seems to be a lack of research on disk-based error detection versus CRC polynomials, it seems likely that all of the proposed hash functions are on an equal footing in this regard. As Andres commented up-thread, xxhash comes along for free with lz4. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
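For reference, the hardware path in a benchmark like the one above boils down to a loop over the SSE4.2 CRC32 instruction. Here is a minimal sketch (illustrative only, not the attached benchmark's actual code; it assumes an SSE4.2-capable x86-64 CPU and gcc or clang with -msse4.2):

    /* CRC32C (Castagnoli polynomial) over a buffer via SSE4.2 intrinsics. */
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>
    #include <nmmintrin.h>

    static uint32_t
    crc32c_hw(const void *data, size_t len, uint32_t crc)
    {
        const uint8_t *p = (const uint8_t *) data;

        crc = ~crc;
        while (len >= 8)                 /* 8 bytes per crc32 instruction */
        {
            uint64_t chunk;

            memcpy(&chunk, p, 8);        /* avoid unaligned dereference */
            crc = (uint32_t) _mm_crc32_u64(crc, chunk);
            p += 8;
            len -= 8;
        }
        while (len-- > 0)                /* byte-at-a-time tail */
            crc = _mm_crc32_u8(crc, *p++);
        return ~crc;
    }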
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Sun, Sep 14, 2014 at 05:21:10PM +0200, Andres Freund wrote: On 2014-09-13 20:27:51 -0500, k...@rice.edu wrote: Also, while I understand that CRC has a very venerable history and is well studied for transmission-type errors, I have been unable to find any research on its applicability to validating file/block writes to a disk drive. Which incidentally doesn't really match what the CRC is used for here. It's used for individual WAL records. Usually these are pretty small, far smaller than disk/postgres blocks on average. There's a couple of scenarios where they can get large, true, but most of them are small. The primary reason they're important is to correctly detect the end of the WAL - to ensure we're not interpreting half-written records, or records from before the WAL file was overwritten. While it is, to quote you, "unbeaten collision wise", xxhash, in both the 32-bit and 64-bit versions, is its equal. Aha? You take that from the smhasher results? Yes. Since there seems to be a lack of research on disk-based error detection versus CRC polynomials, it seems likely that all of the proposed hash functions are on an equal footing in this regard. As Andres commented up-thread, xxhash comes along for free with lz4. This is pure handwaving. Yes. But without research to support the use of CRC32 in this same environment, it is handwaving in the other direction. :) Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] wal-size limited to 16MB - Performance issue for subsequent backup
On Mon, Oct 20, 2014 at 09:03:59PM +0200, jes...@krogh.cc wrote: Hi. One of our production issues is that the system generates lots of WAL files - "lots" being something like 151952 files over the last 24h, which is about 2.4TB worth of WAL files. I wouldn't say that isn't an issue by itself, but the system does indeed work fine. We do subsequently gzip the files to limit actual disk usage, which brings them down to roughly 30-50% of their original size. ... Suggestions are welcome. An archive-command/restore-command that could combine/split WAL segments might be the easiest workaround, but what about crash-safety? Hi, Have you considered using something like tar/star in the archive command to pack them into much larger tar archives? Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Access method extendability
On Tue, Oct 28, 2014 at 01:51:21PM -0400, Tom Lane wrote: Simon Riggs si...@2ndquadrant.com writes: On 28 October 2014 17:06, Tom Lane t...@sss.pgh.pa.us wrote: My own thought is that allowing external AMs is simply a natural consequence of PG's general approach to extensibility, and it would be surprising if we were to decide we didn't want to allow that. If it wasn't clear from my two earlier attempts, yes, +1 to that. I'd like to avoid all of the pain by making persistent AMs that are recoverable after a crash, rather than during crash recovery. I think the notion of having AMs that explicitly don't have WAL support is quite orthogonal to what's being discussed in this thread. It might be worth doing that just to get the hash AM into a less-weird state (given that nobody is stepping up to the plate to fix it properly). regards, tom lane Hi, I think that someone is working on the hash index WAL problem, but is coming up to speed on the whole system, which takes time. I know that I have not had a large enough block of time to spend on it either. :( Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Let's drop two obsolete features which are bear-traps for novices
On Tue, Nov 04, 2014 at 11:44:22AM +0900, Michael Paquier wrote: On Sun, Nov 2, 2014 at 2:30 AM, Tom Lane t...@sss.pgh.pa.us wrote: In the case of hash indexes, because we still have to have the hash opclasses in core, there's no way that it could be pushed out as an extension module even if we otherwise had full support for AMs as extensions. So what I hear you proposing is "let's break this so thoroughly that it *can't* be fixed". I'm not on board with that. I think the WARNING will do just fine to discourage novices who are not familiar with the state of the hash AM. In the meantime, we could push forward with the idea of making hash indexes automatically unlogged, so that recovering from a crash wouldn't be quite so messy/dangerous. There is another way as well: finally support WAL-logging for hash indexes. +1 Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] On partitioning
On Wed, Dec 03, 2014 at 10:00:26AM -0300, Alvaro Herrera wrote: Amit Langote wrote: From: Robert Haas [mailto:robertmh...@gmail.com] What is an overflow partition and why do we want that? That would be a default partition. That is, where the tuples that don't belong elsewhere (in other defined partitions) go. The VALUES clause of the definition for such a partition would look like: (a range partition) ... VALUES LESS THAN MAXVALUE (a list partition) ... VALUES DEFAULT There has been discussion about whether there shouldn't be such a place for tuples to go. That is, it should generate an error if a tuple can't go anywhere (or support auto-creating a new one, like in interval partitioning?) In my design I initially had overflow partitions too, because I inherited the idea from Itagaki Takahiro's patch. Eventually I realized that it's a useless concept, because you can always have leftmost and rightmost partitions, which are just regular partitions (except they don't have a low key, resp. high key). If you don't define unbounded partitions at either side, it's fine, you just raise an error whenever the user tries to insert a value for which there is no partition. Hi, Maybe I am not clear on the concept of an overflow partition, but I thought that it functioned to catch any record that did not fit the partitioning scheme. Your end-of-range partitions without a low key or high key would only catch problems in those areas. If you partitioned on work days of the week, you should not have anything on Saturday/Sunday. How would that work? You would want to catch anything that was not a weekday in the overflow. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Removing INNER JOINs
On Wed, Dec 03, 2014 at 02:08:27PM -0500, Tom Lane wrote: Heikki Linnakangas hlinnakan...@vmware.com writes: Do you need to plan for every combination, where some joins are removed and some are not? I would vote for just having two plans and one switch node. To exploit any finer grain, we'd have to have infrastructure that would let us figure out *which* constraints pending triggers might indicate transient invalidity of, and that doesn't seem likely to be worth the trouble. I hope the same mechanism could be used to prepare a plan for a query with parameters, where the parameters might or might not allow a partial index to be used. We have some smarts nowadays to use custom plans, but this could be better. Interesting thought, but that would be a totally different switch condition ... regards, tom lane Or switch between a node with a low row count and one with a high row count, for those pesky mis-estimation queries. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: hash_create API changes (was Re: [HACKERS] speedup tidbitmap patch: hash BlockNumber)
On Fri, Dec 19, 2014 at 04:41:51PM -0600, Jim Nasby wrote: On 12/18/14, 5:00 PM, Jim Nasby wrote: 2201582 20 -- Mostly LOCALLOCK and Shared Buffer Started looking into this; perhaps https://code.google.com/p/fast-hash/ would be worth looking at, though it requires uint64. It also occurs to me that we're needlessly shoving a lot of 0's into the hash input by using RelFileNode and ForkNumber. RelFileNode includes the tablespace Oid, which is pointless here because relid is unique per-database. We also have very few forks and typically care about very few databases. If we crammed dbid and ForkNum together that gets us down to 12 bytes, which at minimum saves us the trip through the case logic. I suspect it also means we could eliminate one of the mix() calls. But I wonder if we could still do better, because we typically also won't have that many relations. Is there some fast way we could combine dbid, forkNum and relid into a uint32? That gets us down to 8 bytes, which means we could use fast-hash, or a stripped-down mix(). Unfortunately I don't know much about hash algorithms, so I don't know how practical any of this actually is, or what a good method for combining those fields would be. My current idea is something like (rot(forkNum, 2) | dbid) ^ relid, but if you got unlucky with your oid values you could end up with a lot of collisions from that. I can put some effort into this, but I'd like some guidance. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com

Hi, If we are going to consider changing the hash function, we should consider something like xxhash, which runs at 13.8GB/s on a 2.7GHz x86_64 for the XXH64 variant, and 6.8GB/s for the XXH32 variant, which is double the speed of fast-hash according to its page (running on a 3GHz x86_64). In addition, something like that could be used as a checksum instead of the current CRC32, although some work has already gone into speeding that up as is. Otherwise, it probably makes sense to just stick with creating the 8-byte fastpath analogously to the 4-byte fastpath that was just added. Is calculating the hash the bottleneck? Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
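To make the packing idea concrete, here is a toy sketch (the packing scheme and the mixer are illustrative only; the typedefs are stand-ins for the real ones, and whether such a folded key collides too often is exactly the open question raised above):

    /* Pack (dbid, forkNum, relid) into one 32-bit key, then run it through
     * a short integer mixer (the 32-bit finalizer used by MurmurHash3). */
    #include <stdint.h>

    typedef uint32_t Oid;         /* stand-ins for the real typedefs */
    typedef int      ForkNumber;

    static inline uint32_t
    mix32(uint32_t h)
    {
        h ^= h >> 16;
        h *= 0x85ebca6b;
        h ^= h >> 13;
        h *= 0xc2b2ae35;
        h ^= h >> 16;
        return h;
    }

    static inline uint32_t
    buffer_tag_hash(Oid dbid, ForkNumber forkNum, Oid relid)
    {
        /* Roughly the "(rot(forkNum, 2) | dbid) ^ relid" idea from above:
         * push the tiny fork number into the high bits, then fold in dbid
         * and relid.  Collisions are possible and would need measuring. */
        uint32_t key = ((uint32_t) forkNum << 30) ^ dbid ^ relid;

        return mix32(key);
    }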
Re: [HACKERS] Compression of full-page-writes
On Fri, Jan 02, 2015 at 01:01:06PM +0100, Andres Freund wrote: On 2014-12-31 16:09:31 -0500, Bruce Momjian wrote: I still don't understand the value of adding WAL compression, given the high CPU usage and minimal performance improvement. The only big advantage is WAL storage, but again, why not just compress the WAL file when archiving. Before: pg_xlog is 800GB; after: pg_xlog is 600GB. I'm damned sure that many people would be happy with that, even if the *per backend* overhead is a bit higher. And no, compression of archives when archiving helps *zap* with that (streaming, wal_keep_segments, checkpoint_timeout). As discussed before. Greetings, Andres Freund +1 On an I/O-constrained system, assuming 50:50 table:WAL I/O, in the case above you can process an extra 100GB of transaction data at the cost of a bit more CPU. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Abbreviated keys for Numeric
On Mon, Mar 23, 2015 at 09:41:40PM +, Andrew Gierth wrote: "Peter" == Peter Geoghegan p...@heroku.com writes:

Peter> As I said, I don't really consider that my patch is a rewrite,
Peter> especially V4, which changes nothing substantive except removing
Peter> 32-bit support.

Well, that's a hell of an "except". Here's my main arguments for why 32bit support should be kept:

1. It exists and works well (and yes, I have tested it).

2. This optimization is a huge win even on very small data sets. On sorts of as few as 100 items it gives detectable (on the order of +50%) improvements. On 1000 items the speedup can easily be 3 times. So it's not just people with big data who want this; even small databases will benefit.

3. Keeping the 32bit support (and desupporting DEC_DIGITS != 4) makes it unnecessary to have #ifdefs that disable the numeric abbreviation entirely. (You don't even need those for comparative performance testing; easier to do that by tweaking the catalogs.)

As against that, you have the fact that it's ~70 lines of code in one self-contained function which is 32bit-specific. So what do other people think?

+1 for including 32-bit support as well. This is a tremendous performance increase and users of older systems will benefit, and should benefit. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSoC idea - Simulated annealing to search for query plans
On Thu, Feb 26, 2015 at 10:59:50PM +0100, Grzegorz Parka wrote: Dear Hackers, I'm Grzegorz Parka, BSc Engineer of Technical Physics and student of Computer Science at WUT, Poland. Last year I've been a bit into evolutionary algorithms, and during my research I found out about GEQO in Postgres. I also found out that there are plans to try a different approach to finding optimal query plans, and thought it could be a good project for GSoC. I'm interested in one of the old TODO items related to the optimizer - 'Consider compressed annealing to search for query plans'. I believe this would be potentially beneficial to Postgres, to check whether such a heuristic could really choose better query plans than GEQO does. Judging by the results that simulated annealing gives on the travelling salesman problem, it looks like a simpler and potentially more effective approach to combinatorial optimization. As deliverables of such a project I would provide a simple implementation of a basic simulated annealing optimizer and some form of quantitative comparison with GEQO. I see that this may be considerably bigger than most GSoC projects, but I would like to know your opinion. Do you think that this would be beneficial enough to make a proper GSoC project? I would also like to know if you have any additional ideas about this project. Best regards, Grzegorz Parka

Hi, I think someone already mentioned it, but it would be very neat if the optimizer could be pluggable. Then many different algorithms could be evaluated more easily. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
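For orientation, the core of simulated annealing is only a few lines. Here is a toy, fully self-contained C example of the basic loop (it minimizes a trivial 1-D function; in a planner experiment the state would be a join order and cost() the planner's cost estimate - those names here are purely illustrative):

    /* Simulated annealing: accept improvements always, accept regressions
     * with probability exp(-delta/temperature).  Build with: cc sa.c -lm */
    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>

    static double cost(double x)      { return (x - 3.0) * (x - 3.0); }
    static double uniform(void)       { return rand() / (double) RAND_MAX; }
    static double neighbour(double x) { return x + (uniform() - 0.5); }

    int main(void)
    {
        double current = 100.0;          /* arbitrary starting state */
        double best = current;

        for (double temp = 10.0; temp > 1e-4; temp *= 0.95)
        {
            for (int i = 0; i < 100; i++)
            {
                double cand = neighbour(current);
                double delta = cost(cand) - cost(current);

                /* Metropolis acceptance rule */
                if (delta < 0 || uniform() < exp(-delta / temp))
                    current = cand;
                if (cost(current) < cost(best))
                    best = current;
            }
        }
        printf("best x = %f, cost = %f\n", best, cost(best));
        return 0;
    }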
Re: [HACKERS] Freeze avoidance of very large table.
On Mon, Apr 06, 2015 at 12:07:47PM -0500, Jim Nasby wrote: ... As I understand it, the goal here is to prevent huge amounts of periodic freeze work due to XID wraparound. I don't think we need the Freeze state to accomplish that. With a single bit per page in the Frozen Map, checking an 800GB table would require reading a mere 100MB of FM. That's pretty tiny, and largely accomplishes the goal. Obviously it would be nice to eliminate even that 100MB read, but I suggest you leave that for a 3rd patch. I think you'll find that just getting the first 2 accomplished will be a significant amount of work. Hi, I may have my math wrong, but 800GB at the default 8kB block size is ~100M pages, which at one bit per page is 12.5MB of FM, not 100MB. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Is it possible to have a fast-write Index?
On Fri, Jun 05, 2015 at 11:54:01PM +, deavid wrote: Thanks to everybody for answering. I wasn't expecting this attention; this is a great community :-) Jim asked me about something real. Well, the problem is this showed up more than five years ago, and it keeps popping up from time to time in different circumstances. I solved it in different ways each time, depending on the exact use case. I wanted to generalize, because it seems like a good feature for several situations; and I don't expect a solution for me, as each time I hit this I found some way to sort it out. As Jim said, what we need here are figures from real examples, and I don't have them yet. I'll do my homework and email back with exact problems and exact timings. Give me a week or two. Also, some of you are talking about IO. Well, it's hard to say without the figures here, but I'm pretty sure I'm hitting CPU time only. We use SSDs on those big databases, and in my tests I also tried setting fsync=off. So the problem is: I see a low iowait, and CPU time for one core is at 80-90% most of the time. I can buy more RAM, better disks, or CPUs with more cores. But one CPU core would have more-or-less the same speed no matter how much money you invest. Wanting a delayed-write index is similar to setting synchronous_commit = off. We want to give an OK to the backend as soon as possible and do this work in the background. But we also want some reliability against crashes. Also, if the task is done in the background it may be done by another backend, so several indexes could probably be packed at once using different backend processes. We could use the entire CPU if our index writes aren't tied to the session that wrote the row. PS: I'm very interested in existing approaches like GIN or BRIN (this one is new to me). Thanks a lot; I'll try them in my tests.

Hi David, Here is an interesting read comparing LSM and Fractal Tree indexing: http://highscalability.com/blog/2014/8/6/tokutek-white-paper-a-comparison-of-log-structured-merge-lsm.html Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Reliance on undefined behaviour in << operator
On Wed, Sep 16, 2015 at 03:57:04PM -0400, Tom Lane wrote:
> Robert Haas writes:
> > On Wed, Sep 16, 2015 at 3:16 AM, Craig Ringer wrote:
> >> Our implementation of << is a direct wrapper around the C operator. It
> >> does not check the right-hand side's value.
> >> ... On x64 intel gcc linux it does a rotation but that's
> >> not AFAIK guaranteed by anything, and we should probably not be
> >> relying on this or exposing it at the user level.
>
> > I agree.
>
> As far as I'm concerned, what those operators mean is "whatever your
> compiler makes them mean".  This is hardly the only place where we expose
> platform-dependent behavior --- see also locale dependencies, timezones,
> floating point, yadda yadda --- and I do not find it the most compelling
> place to start reversing that general approach.
>
> regards, tom lane
>
+1 I tend to agree. Unless the behavior is mandated by the SQL standard, I have always expected the behavior of those operators to follow whatever the compiler defines. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
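A tiny standalone illustration of the point (the printed result is only what one particular x86-64 gcc build happens to produce; per the C standard, the out-of-range shift is simply undefined):

    /* Shifting by >= the width of the (promoted) type is undefined
     * behaviour in C.  x86 masks the shift count to the low 5 bits for
     * 32-bit operands, which is why the result often looks rotation-like,
     * but nothing guarantees that elsewhere. */
    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        int32_t one = 1;
        int     n   = 33;

        /* Undefined: count >= 32.  On many x86 builds this prints 2,
         * because the hardware uses 33 & 31 == 1; the compiler is free
         * to do anything else, especially with a constant count. */
        printf("1 << 33 as int32: %d\n", one << n);

        /* Well-defined: widen first, so the count is within range. */
        printf("1 << 33 as int64: %lld\n", (long long) ((int64_t) one << n));
        return 0;
    }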
Re: [HACKERS] No Issue Tracker - Say it Ain't So!
On Wed, Sep 23, 2015 at 04:33:33PM -0700, Josh Berkus wrote: > On 09/23/2015 03:05 PM, Jim Nasby wrote: > > On 9/23/15 3:12 PM, Thomas Kellerer wrote: > >> They also support Postgres as their backend (and you do find hints here and there that it is the recommended open source DBMS for them - but they don't explicitly state it like that). We are using Jira at the company I work for, and all Jira installations run on Postgres there. > > > > I'll second Jira as well. It's the only issue tracker I've seen that you can actually use for multiple different things without it becoming a mess. IE: it could track Postgres bugs, infrastructure issues, and the TODO list if we wanted, allow issues to reference each other intelligently, yet still keep them as 3 separate bodies. > > Speaking as someone who uses Jira for commercial work, I'm -1 on them. I simply don't find Jira to be superior to OSS BT systems, and it is inferior in several ways (like that you can't have more than one person assigned to a bug). And email integration for Jira is nonexistent. > > When we discussed this 8 years ago, Debian said debbugs wasn't ready for anyone else to use. Has that changed? > I do not think using a commercial system is a good idea. Currently, Jira is free for open source, but there is no guarantee; that could change at any time and result in either a possibly expensive license cost or a port to another system. We use Jira/Confluence, and the random loss of support for various plugins caused by forced security-based upgrades has resulted in a lot of unexpected work to maintain the system. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Does people favor to have matrix data type?
On Wed, May 25, 2016 at 09:10:02AM +, Kouhei Kaigai wrote: > > -----Original Message----- > > From: Simon Riggs [mailto:si...@2ndquadrant.com] > > Sent: Wednesday, May 25, 2016 4:39 PM > > To: Kaigai Kouhei(海外 浩平) > > Cc: pgsql-hackers@postgresql.org > > Subject: Re: [HACKERS] Does people favor to have matrix data type? > > > > On 25 May 2016 at 03:52, Kouhei Kaigai wrote: > > > > > > These past few days, I've been working on a data type that represents a matrix in the mathematical sense. Would people favor having this data type in core, not only in my extension? > > > > If we understood the use case, it might help understand whether to include it or not. > > > > Multi-dimensionality of arrays isn't always useful, so this could be good. > As you may expect, the reason why I've worked on a matrix data type is as groundwork for GPU acceleration, but it is not limited to that. > What I tried to do is in-database calculation of some analytic algorithms, not exporting the entire dataset to the client side. > My first target is k-means clustering, often used in data mining. > When we categorize N items which have M attributes into k clusters, the master data can be represented as an NxM matrix; that is equivalent to N vectors in M dimensions. > The cluster centroids are also located inside the M-dimensional space, so they can be represented as a kxM matrix; that is equivalent to k vectors in M dimensions. > The k-means algorithm requires calculating the distance to every cluster centroid for each item, and thus produces an Nxk matrix, usually called the distance matrix. Next, it updates the cluster centroids using the distance matrix, then repeats the entire process until convergence. > The heart of the workload is calculation of the distance matrix. When I tried to write the k-means algorithm using SQL + R, its performance was not sufficient (poor). > https://github.com/kaigai/toybox/blob/master/Rstat/pgsql-kmeans.r > If we had native functions to use instead of the complicated SQL expression, it would make sense for people who try in-database analytics. > Also, fortunately, PostgreSQL's 2-D array format is binary compatible with BLAS libraries' requirements. That will allow a GPU to process a large matrix at HPC-grade performance. > Thanks, > -- > NEC Business Creation Division / PG-Strom Project > KaiGai Kohei Hi, Have you looked at the Perl Data Language (PDL) under PL/Perl? It has pretty nice support for matrix calculations: http://pdl.perl.org Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
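To ground the description above, here is a minimal sketch of the distance-matrix step in plain C (illustrative layout and names only; row-major arrays, squared Euclidean distance):

    /* points  : N rows x M cols  (N items, M attributes)
     * centers : k rows x M cols  (k cluster centroids)
     * dist    : N rows x k cols  (output: squared distances)             */
    #include <stddef.h>

    void
    distance_matrix(const double *points, const double *centers,
                    double *dist, size_t N, size_t M, size_t k)
    {
        for (size_t i = 0; i < N; i++)
            for (size_t c = 0; c < k; c++)
            {
                double acc = 0.0;

                for (size_t j = 0; j < M; j++)
                {
                    double d = points[i * M + j] - centers[c * M + j];

                    acc += d * d;
                }
                dist[i * k + c] = acc;
            }
    }

Each output element is independent of the others, which is what makes this step a natural fit for BLAS-style or GPU execution, as described in the mail above.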