[HACKERS] CREATE INDEX CONCURRENTLY?
I have not kept up with PostgreSQL changes and have just been using it. A co-worker recently told me that you need the word CONCURRENTLY in CREATE INDEX to avoid table locking. I called BS on this because, to my knowledge, PostgreSQL does not lock tables. I referenced this page in the documentation: http://www.postgresql.org/docs/9.3/static/locking-indexes.html However, I do see this sentence in the indexing page that was not in the docs prior to 8.0: "Creating an index can interfere with regular operation of a database. Normally PostgreSQL locks the table to be indexed against writes and performs the entire index build with a single scan of the table." Is this true? When/why did this change? When we use CONCURRENTLY, it seems to hang. I am looking into it.
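For context, the two forms differ only in the keyword; a sketch against a hypothetical ratings table (table and index names here are made up for illustration):

```sql
-- Plain CREATE INDEX locks the table against writes (INSERT/UPDATE/DELETE)
-- for the duration of the build; reads are still allowed.
CREATE INDEX ratings_client_idx ON ratings (client);

-- The CONCURRENTLY form allows writes during the build, at the cost of
-- two table scans, and it cannot be run inside a transaction block.
CREATE INDEX CONCURRENTLY ratings_client_idx ON ratings (client);
```

One thing worth checking when a CONCURRENTLY build appears to hang: it must wait out transactions that predate the build, so a single long-running transaction can stall it indefinitely.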
[HACKERS] SSL and USER_CERT_FILE
I am using PostgreSQL's SSL support and the conventions for the key and certificates don't make sense from the client perspective, especially under Windows. I am proposing a few simple changes:

Adding two API functions:

void PQsetSSLUserCertFileName(char *filename)
{
    user_crt_filename = strdup(filename);
}

void PQsetSSLUserKeyFileName(char *filename)
{
    user_key_filename = strdup(filename);
}

Adding two static vars in fe-secure.c:

static char *user_key_filename = NULL;
static char *user_crt_filename = NULL;

In client_cert_cb(...) add:

if (user_crt_filename)
    strncpy(fnbuf, user_crt_filename, sizeof(fnbuf));
else
    snprintf(fnbuf, sizeof(fnbuf), "%s/%s", homedir, USER_CERT_FILE);

and:

if (user_key_filename)
    strncpy(fnbuf, user_key_filename, sizeof(fnbuf));
else
    snprintf(fnbuf, sizeof(fnbuf), "%s/%s", homedir, USER_KEY_FILE);

The purpose of these changes is to make it easier to configure SSL in an application which uses libpq. Any comments? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Permanent settings
I have been looking at this thread for a bit and want to interject an idea. A couple years ago, I offered a patch to the GUC system that added a number of abilities; two that were left out were: (1) Specify a configuration file on the command line. (2) Allow the inclusion of a configuration file from within the configuration file. If the include functionality were reinstated, then a default file, something like $DATADIR/defaults.conf, could be included by default at startup. This file could be written by the program. It offers the benefit of being readable and editable, requiring very little extra work to implement, and working in a consistent way with existing functionality. P.S. I'd like the ability to load a config file with a command line option as well. :-) ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
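If the include functionality were available, the proposed scheme would look something like the following in postgresql.conf (the defaults.conf name is the example from above, not an actual shipped file; since later settings override earlier ones, hand-edited lines below the include would win):

```
# postgresql.conf
include 'defaults.conf'      # machine-written defaults, read at startup
shared_buffers = 2000        # hand edits here override the included values
```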
Re: [HACKERS] Permanent settings
Mark Woodward wrote: I have been looking at this thread for a bit and want to interject an idea. A couple years ago, I offered a patch to the GUC system that added a number of abilities; two that were left out were: (1) Specify a configuration file on the command line. (2) Allow the inclusion of a configuration file from within the configuration file. eh? We have both of these capabilities. Really? Maybe I'm just missing it in the --help message and the web docs (I didn't see it in the code either). How do you specify a config file on postmaster startup? Also, I didn't see any way to include a file from within postgresql.conf.
[HACKERS] Syntax bug? Group by?
Shouldn't this work? select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15; ERROR: column y.ycis_id must appear in the GROUP BY clause or be used in an aggregate function If I am asking for a specific column value, should I, technically speaking, need to group by that column?
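The error can be satisfied either way; a sketch using the same y table from the post:

```sql
-- Group on the constrained column:
SELECT ycis_id, min(tindex), avg(tindex)
  FROM y
 WHERE ycis_id = 15
 GROUP BY ycis_id;

-- Or drop it from the select list, since the client already knows the value:
SELECT min(tindex), avg(tindex)
  FROM y
 WHERE ycis_id = 15;
```

One behavioral difference to note: with no matching rows, the GROUP BY form returns zero rows, while the plain aggregate form returns a single row of NULLs.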
Re: [HACKERS] Syntax bug? Group by?
Stephen Frost wrote: select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15; But back to the query: the issue comes in that the ycis_id value is included with the return values requested (a single row value with aggregate values that isn't grouped) - if ycis_id is not unique you will get x number of returned tuples with ycis_id=15 and the same min() and avg() values for each row. Removing the ycis_id after the select will return the aggregate values you want without the group by. I still assert that there will always only be one row to this query. This is an aggregate query, so all the rows with ycis_id = 15 will be aggregated. Since ycis_id is the identifying part of the query, it should not need to be grouped. My question: is it a syntactic technicality that PostgreSQL asks for a group by, or a bug in the parser?
Re: [HACKERS] Syntax bug? Group by?
Hi, Mark, Mark Woodward wrote: Stephen Frost wrote: select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15; But back to the query: the issue comes in that the ycis_id value is included with the return values requested (a single row value with aggregate values that isn't grouped) - if ycis_id is not unique you will get x number of returned tuples with ycis_id=15 and the same min() and avg() values for each row. Removing the ycis_id after the select will return the aggregate values you want without the group by. I still assert that there will always only be one row to this query. This is an aggregate query, so all the rows with ycis_id = 15 will be aggregated. Since ycis_id is the identifying part of the query, it should not need to be grouped. My question: is it a syntactic technicality that PostgreSQL asks for a group by, or a bug in the parser? I think that it's a lack of special-casing of the = operator. Imagine where ycis_id <> 15 or where ycis_id @| $RECTANGLE or other (probably user defined) operators on (probably user defined) datatypes. The parser has no real knowledge of what the operators do; it simply requests one that returns a bool. One could make the parser special-case the = operator, and maybe some others; however, I doubt it's worth the effort. I understand the SQL, and this isn't a SQL question, else it would be on a different list; it is a PostgreSQL internals question and IMHO a potential bug. The original query: select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15; Should NOT require a group by to get ycis_id in the results.
Re: [HACKERS] Syntax bug? Group by?
Hi, Mark, Mark Woodward wrote: Shouldn't this work? select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15; ERROR: column y.ycis_id must appear in the GROUP BY clause or be used in an aggregate function If I am asking for a specific column value, should I, technically speaking, need to group by that column? Try: SELECT 15 as ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15; This isn't a SQL question!!! This is a question of whether or not PostgreSQL is correct in requiring a group by in the query. I assert that since it is unambiguous as to what ycis_id should be, PostgreSQL should not require a grouping.
Re: [HACKERS] Syntax bug? Group by?
Mark Woodward wrote: Stephen Frost wrote: select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15; But back to the query: the issue comes in that the ycis_id value is included with the return values requested (a single row value with aggregate values that isn't grouped) - if ycis_id is not unique you will get x number of returned tuples with ycis_id=15 and the same min() and avg() values for each row. Removing the ycis_id after the select will return the aggregate values you want without the group by. I still assert that there will always only be one row to this query. This is an aggregate query, so all the rows with ycis_id = 15 will be aggregated. Since ycis_id is the identifying part of the query, it should not need to be grouped. My question: is it a syntactic technicality that PostgreSQL asks for a group by, or a bug in the parser? AFAIK what you want is not per SQL spec. What if you had instead written select ycis_id, min(tindex), avg(tindex) from y where frobnitz(ycis_id) = 15; ? I think you are expecting too much reasoning from the engine. Regardless, I can get the results I need and have already worked around this. The reason why I posted the question to hackers was that I think it is a bug. The output column ycis_id is unambiguously a single value with regards to the query. Shouldn't PostgreSQL know this? AFAIR, I think I've used this exact type of query before, either on PostgreSQL or another system, maybe Oracle, and it did work.
Re: [HACKERS] Syntax bug? Group by?
Mark Woodward wrote: select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15; I still assert that there will always only be one row to this query. This is an aggregate query, so all the rows with ycis_id = 15 will be aggregated. Since ycis_id is the identifying part of the query, it should not need to be grouped. My question: is it a syntactic technicality that PostgreSQL asks for a group by, or a bug in the parser? I think your point is that every non-aggregate column in the results of the query also appears in the where clause and is given a single value there, so conceivably an all-knowing, all-powerful postgres could recognize this and do the implied GROUP BY on these columns. Not exactly. I'm not in a position to give a definitive answer on this, but I suspect that adjusting the query parser/planner to allow an implied GROUP BY either gets prohibitively complicated, or fits too much of a special case to be worth implementing. select ycis_id, some_other_id, min(tindex), avg(tindex) from y where ycis_id = 15 group by some_other_id; This is not, in fact, like the example I gave, and confuses the point I am trying to make. The original query: select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15; ycis_id is unambiguous and MUST be only one value; there should be no requirement of grouping. In fact, a group by implies multiple result rows in an aggregate query. As I said in other branches of this thread, this isn't a SQL question; it is a question of whether or not the PostgreSQL parser is correct, and I do not believe that it is working correctly.
Re: [HACKERS] Syntax bug? Group by?
On Tue, Oct 17, 2006 at 02:41:25PM -0400, Mark Woodward wrote: The output column ycis_id is unambiguously a single value with regards to the query. Shouldn't PostgreSQL know this? AFAIR, I think I've used this exact type of query before, either on PostgreSQL or another system, maybe Oracle, and it did work. Doesn't work in Oracle 10g:

SELECT ycis_id, tindex from x where ycis_id = 15;

   YCIS_ID     TINDEX
---------- ----------
        15         10
        15         20

SELECT ycis_id, min(tindex), avg(tindex) from x where ycis_id = 15;
ORA-00937: not a single-group group function

SELECT ycis_id, min(tindex), avg(tindex) from x where ycis_id = 15 GROUP BY ycis_id;

   YCIS_ID MIN(TINDEX) AVG(TINDEX)
---------- ----------- -----------
        15          10          15

That's interesting. I am digging through the SQL99 spec, and am trying to find a definitive answer.
Re: [HACKERS] Syntax bug? Group by?
Mark Woodward wrote: Shouldn't this work? select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15; ERROR: column y.ycis_id must appear in the GROUP BY clause or be used in an aggregate function This would require a great deal of special-casing, in particular knowledge of the = operator, and then the restriction to a particular form of the WHERE clause. For overall consistency, I don't think this should be allowed. Well, this started out as a "huh, that's funny, that should work, is that a bug?" and is turning into a search through the SQL99 spec for a clear answer. I've already worked around it, but to me, at least, it seems it should work.
Re: [HACKERS] Syntax bug? Group by?
On Oct 17, 2006, at 15:19, Peter Eisentraut wrote: Mark Woodward wrote: Shouldn't this work? select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15; ERROR: column y.ycis_id must appear in the GROUP BY clause or be used in an aggregate function This would require a great deal of special-casing, in particular knowledge of the = operator, and then the restriction to a particular form of the WHERE clause. For overall consistency, I don't think this should be allowed. In this particular case, the client constructing the query *knows* the value of ycis_id (since the client is generating the ycis_id = 15 clause). It's technically just a waste of bandwidth and server resources to recalculate it. If you really want to replicate the output of the query you proposed, you could rewrite it on the client as: select 15 as ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15; You could argue that the server should do this for you, but it seems ugly to do in the general case. And, as Peter points out, it would need a lot of special-casing. I guess the parser could do it for expressions in the SELECT clause that exactly match expressions in the WHERE clause. But, and here's the rub, which is the correct way to handle it? I'm looking through the SQL99 spec to see if I can find an answer.
Re: [HACKERS] Hints WAS: Index Tuning Features
Clinging to sanity, [EMAIL PROTECTED] (Mark Woodward) mumbled into her beard: What is the point of writing a proposal if there is a threat of "will be rejected" and not one of the people who would do the rejection at least outlines what would be acceptable? If your proposal is merely "let's do something like Oracle," it should be obvious why that would be rejected. There is considerable legal danger to slavish emulation. Further, since PostgreSQL isn't Oracle, slavish emulation wouldn't work anyways. I don't actually like Oracle's hinting system. If a proposal is too fuzzy to be considered a source of a specification, it should be obvious that that would be rejected. Well, fuzzy isn't a bad place to start gathering information for an eventual proposal. If you have an idea clear enough to turn into a meaningful proposal, put it in for the usual to and fro; that generally leads to enormous improvements. Absolutely. I'm not sure what a good hinting system ought to look like; what I *do* know is that a fuzzy proposal won't be much good. That is sort of the stumbling block. None of us know what it should look like, but leaving the topic at "if you want it, go do the work and submit a patch" isn't going to get it done. First we should decide if it is, in fact, something that ought to happen; then, if that happens, we should think about what it should be. Again, what would be the point of writing a proposal if there is *no* consensus on what would be acceptable?
Re: [HACKERS] Index Tuning Features
On 10/10/06, Mark Woodward [EMAIL PROTECTED] wrote: I think the idea of virtual indexes is pretty interesting, but ultimately a lesser solution to a more fundamental issue, and that would be hands-on control over the planner. Estimating the effect of an index on a query prior to creating the index is a great idea; how that is done is something different than building consensus that it should be done. Another thing that this brings up is hints to a query. Over the years, I have run into situations where the planner wasn't great. It would be nice to try forcing different strategies on the planner and see if performance can be improved. you can do this by setting enable_access_method type parameters. Here's your hammer, all your problems are now nails. The enable_xxx settings are OK for simple queries gone wrong, but if you have a more complex query, any one of those settings may help or hinder different parts of a query; then you would be left with choosing which of them helps more than hurts the over-all query. Being able to alter the query plan would help in areas where there are data patterns in a database that the ANALYZE command can't pick up because it is not designed to. Imagine you have a street map database ordered by zip, street, number. The primary order is zipcode, the secondary order is street. There is a relationship of number to street, and zip to street. The analyzer, at least the last time I checked, does not recognize these relationships. So, a search by street and number would probably use a sequential scan rather than the street index.
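The enable_* parameters mentioned above are session-local planner switches; a sketch of how one would be used to test an alternative strategy (the streets table and its columns are the hypothetical street-map example from the post):

```sql
-- Discourage sequential scans for this session, then inspect the plan:
SET enable_seqscan = off;
EXPLAIN SELECT * FROM streets WHERE street = 'MAIN ST' AND number = 42;
-- Restore the default once done experimenting:
RESET enable_seqscan;
```

As the post notes, the catch is that the setting applies to the entire query, so in a complex plan it can improve one node while pessimizing another.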
Re: [HACKERS] Index Tuning Features
Mark Woodward [EMAIL PROTECTED] writes: The analyzer, at least the last time I checked, does not recognize these relationships. The analyzer is imperfect, but arguing from any particular imperfection is weak, because someone will just come back and say we should work on that problem -- though I note nobody's actually volunteering to do so, whereas they appear to be for hints. I think the stronger argument is to say that there are some statistical properties that the analyzer _cannot_ be expected to figure out. Either because a) they're simply too complex to ever expect to be able to find automatically, b) too expensive to make it worthwhile in the general case, or c) because of some operational issue such as the data changing frequently enough that the analyzes that would be necessary to keep the statistics up to date would become excessively expensive or even be impossible to perform rapidly enough. Well, from a purely data domain standpoint, it is impossible to characterize the exact nature of a data set without enough information to recreate it. Anything less must be designed for a fixed set of assumptions. There is no way that every specific trend can be covered by a fixed number of assumptions. The argument that all we need is better statistics completely misses the point. There will *always* be a number of cases where the planner will not work optimally. I would say that a simpler planner with better hints will always be capable of creating a better query plan.
Re: [HACKERS] Index Tuning Features
Mark Woodward [EMAIL PROTECTED] writes: I would say that a simpler planner with better hints will always be capable of creating a better query plan. This is demonstrably false: all you need is an out-of-date hint, and you can have a worse plan. That doesn't make it false, it makes it higher maintenance. Hints are understood to require maintenance. The argument against hints is not about whether someone could knock together a crappy hint facility and be able to get some use out of it. It is about how much work it would take to design a *good* hint facility that makes it easy to maintain hints that are robust in the face of data and query changes. If someone were to sit down and design and build such a thing, it'd very likely get accepted into core Postgres --- but personally, I think the equivalent amount of effort would be better spent on improving the planner and the statistics. While it is always true that something can be improved, there comes a point where work outweighs benefits. I can't say that the planner is at that point, but I think that isn't even the issue. The notion of hints would probably be one of the biggest steps toward improving the planner. Like I said, it is inarguable that there will always be queries that the planner cannot execute efficiently based on the statistics gathered by analyze. Since that number must be greater than zero, some methodology to deal with it should be created. As Josh already noted, Oracle-like hints are pretty likely to get rejected ... not only because of doubts about their true usefulness, but out of fear of falling foul of some Oracle patent or other. Well, if it would get rejected if it looked like Oracle, assuming you would probably be one of the people rejecting it, what do you envision as not being rejected?
Re: [HACKERS] Hints WAS: Index Tuning Features
Mark, First off, I'm going to request that you (and other people) stop hijacking Simon's thread on hypothetical indexes. Hijacking threads is an effective way to get your ideas rejected out of hand, just because the people whose thread you hijacked are angry with you. So please observe the thread split, thanks. Well, if it would get rejected if it looked like Oracle, assuming you would probably be one of the people rejecting it, what do you envision as not being rejected? Something better than Oracle. Since you're the one who wants hints, that's kind of up to you to define. Write a specification and make a proposal. What is the point of writing a proposal if there is a threat of will be rejected if one of the people who would do the rejection doesn't at least outline what would be acceptable?
Re: [HACKERS] Hints WAS: Index Tuning Features
Since you're the one who wants hints, that's kind of up to you to define. Write a specification and make a proposal. What is the point of writing a proposal if there is a threat of "will be rejected" and not one of the people who would do the rejection at least outlines what would be acceptable? Oh come on Mark, you have been here long enough to know how this works. Exactly. IMHO, it is a frustrating environment. PostgreSQL is a great system, and while I completely respect the individuals involved, I think the "management," for lack of a better term, is difficult.
Re: [HACKERS] Index Tuning Features
Simon Riggs [EMAIL PROTECTED] writes: - RECOMMEND command Similar in usage to an EXPLAIN, the RECOMMEND command would return a list of indexes that need to be added to get the cheapest plan for a particular query (no explain plan result though). Both of these seem to assume that EXPLAIN results, without EXPLAIN ANALYZE results to back them up, are sufficient for tuning. I find this idea a bit dubious, particularly for cases of marginal indexes. I think the idea of virtual indexes is pretty interesting, but ultimately a lesser solution to a more fundamental issue, and that would be hands-on control over the planner. Estimating the effect of an index on a query prior to creating the index is a great idea; how that is done is something different than building consensus that it should be done. Another thing that this brings up is hints to a query. Over the years, I have run into situations where the planner wasn't great. It would be nice to try forcing different strategies on the planner and see if performance can be improved.
Re: [HACKERS] Upgrading a database dump/restore
Mark Woodward [EMAIL PROTECTED] writes: Whenever someone actually writes a pg_upgrade, we'll institute a policy to restrict changes it can't handle. IMHO, *before* any such tool *can* be written, a set of rules must be enacted regulating catalog changes. That one is easy: there are no rules. We already know how to deal with catalog restructurings --- you do the equivalent of a pg_dump -s and reload. Any proposed pg_upgrade that can't cope with this will be rejected out of hand, because that technology was already proven five years ago. The issues that are actually interesting have to do with the contents of user tables and indexes, not catalogs. It is becoming virtually impossible to recreate databases. Data storage sizes are increasing faster than the transmission speeds of the media on which they are stored or the systems by which they are connected. The world is looking at a terabyte as merely a very large database these days; tens of terabytes are not far from being common. Dumping out a database is bad enough, but that's only the data, and that can take (mostly) only hours. Recreating a large database with complex indexes can take days: hours for the data, hours per index, it adds up. No one could expect that this could happen by 8.2, or the release after that, but as a direction for the project, the directors of the PostgreSQL project must realize that dump/restore is becoming like the old locking vacuum problem. It is a *serious* issue for PostgreSQL adoption and arguably a real design flaw. If the barrier to upgrade is too high, people will not upgrade. If people do not upgrade, then older versions will have to be supported longer or users will have to be abandoned. If users are abandoned and there are critical bugs in previous versions of PostgreSQL, then users who eventually have to migrate their data will probably not use PostgreSQL, in an attempt to avoid repeating this situation.
While the economics of open source/free software are different, there is still a penalty for losing customers, and word of mouth is a dangerous thing. Once or twice in a customer's product usage history can you expect to get away with this sort of inconvenience, but if every new major version requires a HUGE process, then the TCO of PostgreSQL gets very high indeed. If it is a data format issue, maybe there should be a forum for a next-gen version of the current data layout that is extensible without restructuring. This is not something that a couple people can go off and do and submit a patch; it is something that has to be supported and promoted from the core team, otherwise it won't happen. We all know that. The question is whether or not you all think it is worth doing. I've done consulting work for some very large companies that everyone has heard of. These sorts of things matter.
Re: [HACKERS] Upgrading a database dump/restore
On Mon, Oct 09, 2006 at 11:50:10AM -0400, Mark Woodward wrote: That one is easy: there are no rules. We already know how to deal with catalog restructurings --- you do the equivalent of a pg_dump -s and reload. Any proposed pg_upgrade that can't cope with this will be rejected out of hand, because that technology was already proven five years ago. snip Dumping out a database is bad enough, but that's only the data, and that can take (mostly) only hours. Recreating a large database with complex indexes can take days: hours for the data, hours per index, it adds up. I think you missed the point of the email you replied to. *catalog* changes are quick and (relatively) easy. Even with 10,000 tables, it would only take a few moments to rewrite the entire catalog to a new version. If it is a data format issue, maybe there should be a forum for a next-gen version of the current data layout that is extensible without restructuring. This is not something that a couple people can go off and do and submit a patch; it is something that has to be supported and promoted from the core team, otherwise it won't happen. We all know that. Actually, the data format is not the issue either. The tuple structure hasn't changed that often. What has changed is the internal format of a few types, but postgresql could support both the old and the new types simultaneously. There has already been a statement from core members that if someone comes up with a tool to handle the catalog upgrade, they'd be willing to keep code from older types around with the original oid so they'd be able to read the older version. That's good to know. The question is whether or not you all think it is worth doing. I've done consulting work for some very large companies that everyone has heard of. These sorts of things matter. People are working on it; someone even got as far as dealing with most catalog upgrades. 
The hard part is going to be making sure that even if the power fails halfway through an upgrade, your data will still be readable... Well, I think that any *real* DBA understands and accepts that issues like power failure and hardware failure create situations where suboptimal conditions exist. :-) Stopping the database and copying the pg directory addresses this problem; upon failure, a simple mv bkdir pgdir gets you started again. A system on a good UPS and on reliable hardware is exactly the sort of deployment that would benefit most from an in-place upgrade. There is no universal panacea where there is 0 risk; one can only mitigate risk. That being said, it should be the preferred method of upgrade, with new versions not being released until they can migrate cleanly. A dump/restore should be a last resort. Don't you think?
Re: [HACKERS] Upgrading a database dump/restore
Mark, No one could expect that this could happen by 8.2, or the release after that, but as a direction for the project, the directors of the PostgreSQL project must realize that dump/restore is becoming like the old locking vacuum problem. It is a *serious* issue for PostgreSQL adoption and arguably a real design flaw. directors? (looks around) Nobody here but us chickens, boss. If you're really interested in pg_upgrade, you're welcome to help out. Gavin Sherry, Zdenek, and Jonah Harris are working on it (the last separately, darn it). This is the most frustrating thing: I *want* to do these things, but I can't find any companies that are willing to pay me to do it, and having kids, I don't have the spare time to do it. I *have* a recommendations system already, but I can't even find the time to do the NetFlix Prize thing.
Re: [HACKERS] Upgrading a database dump/restore
Mark Woodward [EMAIL PROTECTED] writes: Not to cause any arguments, but this is sort of a standard discussion that gets brought up periodically, and I was wondering if there has been any softening of the attitudes against an in-place upgrade, or movement to not having to dump and restore for upgrades. Whenever someone actually writes a pg_upgrade, we'll institute a policy to restrict changes it can't handle. But until we have a credible upgrade tool it's pointless to make any such restriction. (Credible means able to handle system catalog restructurings, IMHO --- without that, you'd not have any improvement over the current rules for minor releases.) IMHO, *before* any such tool *can* be written, a set of rules must be enacted regulating catalog changes. If there are no rules and no process by which changes get approved, requiring a "was/is" conversion strategy, then the tool has to change with every major version, which will, of course, put it at risk of losing support in the long term. Like I said, I understand the reluctance to do these things; it isn't an easy thing to do. Designing and planning for the future is, however, the hallmark of a good engineer.
[HACKERS] Query Failed, out of memory
I am using the netflix database:

        Table "public.ratings"
 Column |   Type   | Modifiers
--------+----------+-----------
 item   | integer  |
 client | integer  |
 day    | smallint |
 rating | smallint |

The query was executed as:

psql -p 5435 -U pgsql -t -A -c "select client, item, rating, day from ratings order by client" netflix > netflix.txt

My question, it looks like the kernel killed psql, and not postmaster. The postgresql log file complained about a broken pipe. Question, is this a bug in psql? It took over 4 hours of run time before the crash. dmesg:

Free pages: 13192kB (112kB HighMem)
Active:124664 inactive:124330 dirty:0 writeback:0 unstable:0 free:3298 slab:2188 mapped:248080 pagetables:1939
DMA free:12160kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB pages_scanned:12602 all_unreclaimable? yes
protections[]: 0 0 0
Normal free:920kB min:928kB low:1856kB high:2784kB active:438608kB inactive:437656kB present:901120kB pages_scanned:978318 all_unreclaimable? yes
protections[]: 0 0 0
HighMem free:112kB min:128kB low:256kB high:384kB active:60176kB inactive:59536kB present:131008kB pages_scanned:134673 all_unreclaimable? yes
protections[]: 0 0 0
DMA: 6*4kB 3*8kB 3*16kB 3*32kB 3*64kB 2*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 2*4096kB = 12160kB
Normal: 0*4kB 1*8kB 7*16kB 1*32kB 0*64kB 0*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 920kB
HighMem: 0*4kB 0*8kB 1*16kB 1*32kB 1*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 112kB
Swap cache: add 548633, delete 548633, find 11883/13748, race 0+0
0 bounce buffer pages
Free swap: 0kB
262128 pages of RAM
32752 pages of HIGHMEM
3593 reserved pages
608 pages shared
0 pages swap cached
Out of Memory: Killed process 9143 (psql).

---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
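A rough back-of-envelope calculation (my assumption: something on the order of 50 bytes of client-side storage per row, covering the field text plus libpq's per-tuple bookkeeping) shows why the kernel killed psql here: libpq buffers the entire result set before psql prints anything.

```python
# Sketch: estimate client-side memory needed to hold the full result.
# The 50-bytes-per-row figure is a guess, not a measured number.
rows = 100_480_507          # row count of the ratings table, from the thread
bytes_per_row = 50          # assumed: text of 4 fields + per-tuple overhead
total_gib = rows * bytes_per_row / 2**30
assert total_gib > 4        # far beyond the ~1 GB of RAM in the dmesg output
print(f"~{total_gib:.1f} GiB needed client-side")
```

Even if the per-row guess is off by 2x in either direction, the result dwarfs the roughly 1 GB of RAM (262128 pages) that the dmesg output shows, so the OOM kill is expected.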
Re: [HACKERS] Query Failed, out of memory
Mark Woodward [EMAIL PROTECTED] writes: psql -p 5435 -U pgsql -t -A -c "select client, item, rating, day from ratings order by client" netflix > netflix.txt My question, it looks like the kernel killed psql, and not postmaster. Not too surprising. Question, is this a bug in psql? It's really a libpq design issue: since libpq provides random access to a PGresult, and has no mechanism for handling failures after returning the PGresult to the client, it has to slurp the whole query result into memory first. FWIW, there's a feature in CVS HEAD to instruct psql to try to use a cursor to break up huge query results like this. For the moment I'd suggest using COPY instead. That's sort of what I was afraid of. I am trying to get 100 million records into a text file in a specific order. Sigh, I have to write a quick program to use a cursor. :-( ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
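The cursor approach Tom alludes to keeps memory bounded by fetching the result in fixed-size batches instead of all at once. A minimal sketch of the pattern, using a plain Python iterable as a stand-in for the server-side cursor (no real database connection; with PostgreSQL the loop body would be DECLARE cur CURSOR FOR ...; FETCH 10000 FROM cur until empty):

```python
# Batched-fetch pattern: at most batch_size rows live in memory at a time.
def fetch_in_batches(rows, batch_size=10_000):
    """Yield successive lists of up to batch_size rows."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:                 # flush the final partial batch
        yield batch

# Stand-in for a 25,000-row result set.
total = 0
for batch in fetch_in_batches(range(25_000), batch_size=10_000):
    total += len(batch)       # here you would write the batch to the file
assert total == 25_000
```

The same shape is what psql's FETCH_COUNT feature (mentioned later in the thread) implements internally.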
Re: [HACKERS] Query Failed, out of memory
On Thu, Oct 05, 2006 at 11:56:43AM -0400, Mark Woodward wrote: The query was executed as: psql -p 5435 -U pgsql -t -A -c "select client, item, rating, day from ratings order by client" netflix > netflix.txt My question, it looks like the kernel killed psql, and not postmaster. The postgresql log file complained about a broken pipe. Question, is this a bug in psql? It took over 4 hours of run time before the crash. Well, psql tried to store the entire result set in memory at once, and failed. I'm not sure how many records you were trying to display, but try to estimate how much memory that would take to store... What were you trying to do? It's the stupid NetFlix prize thing, I need to dump out the data in a specific order. This is just *one* such query I want to try. I guess, like I told Tom, I have to write a small program that uses a cursor. :-( ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Query Failed, out of memory
FWIW, there's a feature in CVS HEAD to instruct psql to try to use a cursor to break up huge query results like this. For the moment I'd suggest using COPY instead. That's sort of what I was afraid of. I am trying to get 100 million records into a text file in a specific order. Sigh, I have to write a quick program to use a cursor. :-( Why don't you try the psql client from 8.2beta1 then? This way you don't have to write the program yourself and you're helping out with beta testing as well :-) See FETCH_COUNT in http://developer.postgresql.org/pgdocs/postgres/app-psql.html Well, maybe next time, it only took about 10 minutes to write. It is a simple program. ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Query Failed, out of memory
Tom Lane wrote: Mark Woodward [EMAIL PROTECTED] writes: psql -p 5435 -U pgsql -t -A -c "select client, item, rating, day from ratings order by client" netflix > netflix.txt FWIW, there's a feature in CVS HEAD to instruct psql to try to use a cursor to break up huge query results like this. For the moment I'd suggest using COPY instead. But COPY doesn't guarantee any order. BTW, I just this morning discovered the hard way that our linux boxes didn't have strict memory allocation turned on, and then went and set it. I'd advise Mark to do the same, if he hasn't already. Yea, I've been toying with the idea of that setting lately; I can't for the life of me understand why it isn't the default behavior. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Query Failed, out of memory
On Thu, 2006-10-05 at 14:53 -0400, Luke Lonergan wrote: Is that in the release notes? Yes: Allow COPY to dump a SELECT query (Zoltan Boszormenyi, Karel Zak) I remember this discussion, it is cool when great features get added. ---(end of broadcast)--- TIP 6: explain analyze is your friend
[HACKERS] Upgrading a database dump/restore
Not to cause any arguments, but this is sort of a standard discussion that gets brought up periodically and I was wondering if there has been any softening of the attitudes against an in-place upgrade, or movement to not having to dump and restore for upgrades. I am aware that this is a difficult problem and I understand that if there is a radical restructuring of the database then a dump/restore is justified, but wouldn't it be a laudable goal to *not* require this with each new release? Can't we use some release as a standard whose binary format shall not be changed? I know the arguments about predicting the future, and all, but standards and stability are important too. I'm not saying it should never ever change or never ever require a dump/restore, but make it, as policy, difficult to get past the group and the norm not to require d/r. The issue is that as disks get bigger and bigger, databases get bigger and bigger, and this process becomes more and more onerous. If you haven't noticed, data transmission speeds are not accelerating at the rate disk space is growing. I am currently building a project that will have a huge number of records, 1/2tb of data. I can't see how I would ever be able to upgrade PostgreSQL on this system. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Upgrading a database dump/restore
Mark Woodward wrote: I am currently building a project that will have a huge number of records, 1/2tb of data. I can't see how I would ever be able to upgrade PostgreSQL on this system. Slony will help you upgrade (and downgrade, for that matter) with no downtime at all, pretty much. Of course, you do need double the resources. Your other suggestion of setting the on-disk format in high viscosity jello, if not in concrete, seems doomed to failure. Cool features that you and other people want occasionally rely on format changes. I disagree with the all-or-nothing attitude, I'm generally a pragmatist. It is unreasonable to expect that things will never change; by the same token, never attempting to standardize or enforce some level of stability is equally unreasonable. From an enterprise DB perspective, a d/r of a database is a HUGE process and one that isn't taken lightly. I just think that an amount of restraint in this area would pay off well. Of course, you don't have to upgrade every release. Many people (including me) don't. ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Upgrading a database dump/restore
Indeed. The main issue for me is that the dumping and replication setups require at least 2x the space of one db. That's 2x the hardware which equals 2x $$$. If there were some tool which modified the storage while postgres is down, that would save lots of people lots of money. It's time and money. Stopping a database and starting with new software is a lot faster than dumping the data out (disallowing updates or inserts) and restoring the data, which can take hours or days *and* twice the hardware. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[HACKERS] Netflix Prize data
I signed up for the Netflix Prize. (www.netflixprize.com) and downloaded their data and have imported it into PostgreSQL. Here is how I created the table:

        Table "public.ratings"
 Column |  Type   | Modifiers
--------+---------+-----------
 item   | integer |
 client | integer |
 rating | integer |
 rdate  | text    |
Indexes:
    "ratings_client" btree (client)
    "ratings_item" btree (item)

[EMAIL PROTECTED]:~/netflix$ time psql netflix -c "select count(*) from ratings"
   count
-----------
 100480507
(1 row)

real    2m6.270s
user    0m0.004s
sys     0m0.005s

The one thing I notice is that it is REAL slow. I know it is, in fact, 100 million records, but I don't think PostgreSQL is usually slow like this. I'm going to check with some other machines to see if there is a problem with my test machine or if something is weird about PostgreSQL and large numbers of rows. I tried to cluster the data along a particular index but had to cancel it after 3 hours. I'm using 8.1.4. The rdate field looks something like: 2005-09-06 So, the raw data is 23 bytes, the date string will probably be rounded up to 12 bytes, that's 24 bytes per row of data. What is the overhead per variable? per row? Is there any advantage to using varchar(10) over text? ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Netflix Prize data
I signed up for the Netflix Prize. (www.netflixprize.com) and downloaded their data and have imported it into PostgreSQL. Here is how I created the table: I signed up as well, but have the table as follows: CREATE TABLE rating ( movie SMALLINT NOT NULL, person INTEGER NOT NULL, rating SMALLINT NOT NULL, viewed DATE NOT NULL ); I also recommend not loading the entire file until you get further along in the algorithm solution. :) Not that I have time to really play with this. As luck would have it, I wrote a recommendations system based on music ratings a few years ago. After reading the NYT article, it seems as though one or more of the guys behind Net Perceptions is either helping them or did their system, I'm not sure. I wrote my system because Net Perceptions was too slow and did a lousy job. I think the notion of communities in general is an interesting study in statistics, but everything I've seen in the form of bad recommendations shows that while [N] people may share certain tastes, that doesn't necessarily mean that what one likes the others do. This is especially flawed with movie rentals because it is seldom a 1:1 ratio of movies to people. There are often multiple people in a household. Also, movies are almost always for multiple people. Anyway, good luck! (Not better than me, of course :-) ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Netflix Prize data
Mark Woodward [EMAIL PROTECTED] writes: The one thing I notice is that it is REAL slow. How fast is your disk? Counting on my fingers, I estimate you are scanning the table at about 47MB/sec, which might or might not be disk-limited... I'm using 8.1.4. The rdate field looks something like: 2005-09-06 So why aren't you storing it as type date? You are assuming I gave it any thought at all. :-) I converted it to a date type (create table ratings2 as ...)

[EMAIL PROTECTED]:~/netflix/download$ time psql -c "select count(*) from ratings" netflix
   count
-----------
 100480507
(1 row)

real    1m29.852s
user    0m0.002s
sys     0m0.005s

That's about the right increase based on the reduction in data size. OK, I guess I am crying wolf, 47M/sec isn't all that bad for the system. ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Netflix Prize data
Greg Sabino Mullane [EMAIL PROTECTED] writes: CREATE TABLE rating ( movie SMALLINT NOT NULL, person INTEGER NOT NULL, rating SMALLINT NOT NULL, viewed DATE NOT NULL ); You would probably be better off putting the two smallints first followed by the integer and date. Otherwise both the integer and the date field will have an extra two bytes of padding, wasting 4 bytes of space. If you reorder the fields that way you'll be down to 28 bytes of tuple header overhead and 12 bytes of data. There's actually another 4 bytes in the form of the line pointer, so a total of 44 bytes per record. Ie, almost 73% of the disk i/o you're seeing is actually per-record overhead. That's good advice, however, it is said that Netflix has greater than 64K movies, so, while the test info may work with a smallint, I doubt the overall system would work. The rating, however, is one char 1~9. Would making it a char(1) buy anything? I wonder if I started screwing around with movie ID and rating, and moved them into one int: one byte for rating, three bytes for movie ID. That could reduce the data size by at least a half gig. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
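The "one int for movie ID plus rating" idea above amounts to simple bit packing: the low byte holds the rating (1-9), the upper three bytes hold the movie ID (up to 2^24 = 16,777,216, comfortably above 64K). A sketch of the encoding, with an arbitrary example movie ID:

```python
# Pack a (movie_id, rating) pair into one 32-bit-sized integer.
# Layout: bits 8..31 = movie ID, bits 0..7 = rating.
def pack(movie_id: int, rating: int) -> int:
    assert 0 < rating < 10, "rating is one digit, 1~9"
    assert 0 <= movie_id < 2**24, "movie ID must fit in three bytes"
    return (movie_id << 8) | rating

def unpack(packed: int) -> tuple[int, int]:
    return packed >> 8, packed & 0xFF

p = pack(12345, 7)             # example values, not from the real dataset
assert unpack(p) == (12345, 7)
assert p < 2**32               # still fits a 4-byte integer column
```

In the table this would replace two 4-byte int columns with one, saving 4 bytes per row, i.e. roughly 400 MB over 100 million rows, at the cost of decoding in every query.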
Re: [HACKERS] Netflix Prize data
Mark Woodward [EMAIL PROTECTED] writes: The rating, however, is one char 1~9. Would making it a char(1) buy anything? No, that would actually hurt because of the length word for the char field. Even if you used the "char" type, which really is only one byte, you wouldn't win anything because of alignment issues. Personally I'd just go for three ints and a date, rather than trying to be cute with the rating. Actually, the date is just days, right? I don't actually need it too much. So, create a smallint for the date and do this: smalldate = rdate - date('1970-01-01'). And use smallint for rating.

 Column |   Type   | Modifiers
--------+----------+-----------
 movie  | integer  |
 client | integer  |
 day    | smallint |
 rating | smallint |

---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
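The days-since-epoch trick above can be checked quickly: the sample rdate from the dataset ('2005-09-06') is well inside smallint range, which holds until 1970-01-01 + 32767 days, i.e. mid-2059.

```python
from datetime import date, timedelta

# Store a date as a smallint count of days since 1970-01-01, and round-trip it.
epoch = date(1970, 1, 1)
days = (date(2005, 9, 6) - epoch).days   # the sample rdate from the thread
assert days == 13032
assert days < 32767                       # fits a signed 2-byte smallint

restored = epoch + timedelta(days=days)   # the reverse mapping for queries
assert restored == date(2005, 9, 6)
```

So the conversion is lossless for this dataset and saves 2 bytes per row versus a 4-byte date column (before alignment padding is considered).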
[HACKERS] Mapping arbitrary and hierarchical XML to tuples
I have a system by which I store complex data in PostgreSQL as an XML string. I have a simple function that can return a single value. I would like to return sets and sets of rows from the data. This is not a huge problem, as I've written a few of these functions. The question I'd like to put out there is: how would you represent hierarchical data such as:

<foo>
 <bar>
  <ndx>0</ndx>
  <val>1.00</val>
  <meta>2.5</meta>
  <froboz>3.5</froboz>
  <kludge>
   <item>5</item>
   <life>10</life>
  </kludge>
 </bar>
 <bar>
  <ndx>1</ndx>
  <val>1.10</val>
  <meta>2.2</meta>
  <froboz>3.53</froboz>
  <kludge>
   <item>3</item>
   <life>9</life>
  </kludge>
 </bar>
</foo>

The biggest problem with XML is that storing data is easy; getting it back out in a sane way is less so. How would you guys think to represent this? (Obviously, this is a bogus example, real life would be much worse!) ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
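One common answer to the question above is to flatten each repeating element into one row, promoting nested fields to dotted column names. A sketch of that mapping (pure illustration; the element names follow the bogus example, and a real server-side function would do this in PL/pgSQL or C rather than client-side Python):

```python
import xml.etree.ElementTree as ET

doc = """<foo>
  <bar><ndx>0</ndx><val>1.00</val><meta>2.5</meta><froboz>3.5</froboz>
       <kludge><item>5</item><life>10</life></kludge></bar>
  <bar><ndx>1</ndx><val>1.10</val><meta>2.2</meta><froboz>3.53</froboz>
       <kludge><item>3</item><life>9</life></kludge></bar>
</foo>"""

def flatten(elem, prefix=""):
    """One dict per element: leaves become columns, nested elements get dotted names."""
    row = {}
    for child in elem:
        key = prefix + child.tag
        if len(child):                       # has children: recurse with a prefix
            row.update(flatten(child, key + "."))
        else:                                # leaf: take its text as the value
            row[key] = child.text
    return row

rows = [flatten(bar) for bar in ET.fromstring(doc).findall("bar")]
assert rows[0]["ndx"] == "0"
assert rows[0]["kludge.item"] == "5"
assert rows[1]["val"] == "1.10"
```

Each dict then maps naturally onto a composite-type tuple for a set-returning function; the open design question is what to do when the nesting depth varies between documents.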
Re: [HACKERS] update/insert,
On Tue, Jul 04, 2006 at 11:59:27AM +0200, Zdenek Kotala wrote: Mark, I don't know how exactly it will work in postgres but my expectations are: Mark Woodward wrote: Is there a difference in PostgreSQL performance between these two different strategies: if(!exec("update foo set bar='blahblah' where name = 'xx'")) exec("insert into foo(name, bar) values('xx','blahblah')"); or The update code generates a new tuple in the datafile and the pointer is changed in the indexfile to the new version of the tuple. This action does not generate B-Tree structure changes. If the update fails then the insert command creates a new tuple in the datafile and adds a new item into the B-Tree. It could generate a B-Tree node split. Actually, not true. Both versions will generate a new row and create a new index tuple. The only difference may be that in the update case there may be a ctid link from the old version to the new one, but that's about it... Which is faster will probably depend on what is more common in your DB: row already exists or not. If you know that 99% of the time the row will exist, the update will probably be faster because you'll only execute one query 99% of the time. OK, but the point of the question is that constantly updating a single row steadily degrades performance; would delete/insert also do the same? ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
[HACKERS] update/insert, delete/insert efficiency WRT vacuum and MVCC
Is there a difference in PostgreSQL performance between these two different strategies: if(!exec("update foo set bar='blahblah' where name = 'xx'")) exec("insert into foo(name, bar) values('xx','blahblah')"); or exec("delete from foo where name = 'xx'"); exec("insert into foo(name, bar) values('xx','blahblah')"); In my session handler code I can do either, but am curious if it makes any difference. Yes, name is unique. ---(end of broadcast)--- TIP 6: explain analyze is your friend
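The two strategies from the message can be sketched concretely. This uses SQLite purely as a self-contained stand-in for the SQL shape of each approach (SQLite's storage behaves nothing like PostgreSQL MVCC, so it says nothing about the performance question, only the control flow):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE foo (name TEXT PRIMARY KEY, bar TEXT)")

def update_else_insert(name, bar):
    # Strategy 1: try the update; fall back to insert if nothing matched.
    cur = db.execute("UPDATE foo SET bar = ? WHERE name = ?", (bar, name))
    if cur.rowcount == 0:
        db.execute("INSERT INTO foo(name, bar) VALUES (?, ?)", (name, bar))

def delete_then_insert(name, bar):
    # Strategy 2: unconditionally delete, then insert.
    db.execute("DELETE FROM foo WHERE name = ?", (name,))
    db.execute("INSERT INTO foo(name, bar) VALUES (?, ?)", (name, bar))

update_else_insert("xx", "blahblah")   # row absent: falls through to the insert
update_else_insert("xx", "updated")    # row present: single UPDATE suffices
delete_then_insert("xx", "replaced")
final = db.execute("SELECT bar FROM foo WHERE name = 'xx'").fetchone()[0]
assert final == "replaced"
```

Under PostgreSQL MVCC both strategies leave a dead tuple behind per change, which is why the follow-up question about steady degradation applies to either.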
Re: [HACKERS] vacuum, performance, and MVCC
On Monday, 2006-06-26 at 09:10, Mark Woodward wrote: On Friday, 2006-06-23 at 17:27, Bruce Momjian wrote: Jonah H. Harris wrote: On 6/23/06, Tom Lane [EMAIL PROTECTED] wrote: What I see in this discussion is a huge amount of the grass must be greener on the other side syndrome, and hardly any recognition that every technique has its downsides and complications. I'm being totally objective. I don't think we should abandon PostgreSQL's overall design at all, because we do perform INSERTs and DELETEs much better than most systems. However, I've looked at many systems and how they implement UPDATE so that it is a scalable operation. Sure, there are costs and benefits to each implementation, but I think we have some pretty brilliant people in this community and can come up with an elegant design for scalable UPDATEs. I think the UPDATE case is similar to the bitmap index scan or perhaps bitmap indexes on disk --- there are cases we know can not be handled well by our existing code, so we have added (or might add) these features to try to address those difficult cases. Not really. Bitmap index scan and bitmap index are both new additions working well with the existing framework. While the problem of slowdown on frequent updates is real, the suggested fix is just plain wrong, as it is based on someone's faulty assumption of how index lookup works, and a very much simplified view of how different parts of the system work to implement MVCC. Yes, the suggestion was based on MVCC concepts, not a particular implementation. On the contrary - afaik, it was loosely based on how Oracle does it with its rollback segments, only assuming that rollback segments are kept in heap and that indexes point only to the oldest row version :p Well, give me a little more credit than that. Yes, Oracle did play a small part in my thinking, but only in as much as they can't do it, why can't we? 
The problem was how to get the most recent tuple to be more efficient and not have tuples that will never be used impact performance without excessive locking or moving data around. It was just a quick idea. Bruce's solution, you have to admit, is somewhat similar. The original fix he suggests was to that imagined behaviour and thus ignored all the real problems of such a change. The original suggestion was nothing more than a hypothetical for the purpose of discussion. The problem was the steady degradation of performance on frequent updates. That was the point of discussion. I brought up one possible way to start a brainstorm. The discussion then morphed into criticizing the example and not addressing the problem. The problem is heatedly discussed every 3-4 months. And yet, here we are again. Anyway, I think some decent discussion about the problem did happen, and that is good. Agreed. Maybe this _was_ the best way to bring up the discussion again. I have a way, for better or worse, I guess, of stirring up the pot. :-) Cry as we may about MySQL, but I have a sneaking suspicion that this is one of the issues that puts PostgreSQL at a serious disadvantage. While heavily updated rows are a single type of problem, these days I think *most* database deployments are as back-ends for web sites. This problem is *very* critical to that type of application, consequently probably why PostgreSQL has difficulty in that space. If PostgreSQL can be made *not* to suffer performance degradation on heavily updated rows, then that is really the last issue in the way of it being a completely credible medium to large enterprise back end. This combined with its amazing programmability should make it unstoppable. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] vacuum, performance, and MVCC
On Fri, Jun 23, 2006 at 06:37:01AM -0400, Mark Woodward wrote: While we all know session data is, at best, ephemeral, people still want some sort of persistence, thus, you need a database. For mcache I have a couple plugins that have a wide range of options, from read/write at startup and shutdown, to full write-through cache to a database. In general, my clients don't want this, they want the database to store their data. When you try to explain to them that a database may not be the right place to store this data, they ask why; sadly they have little hope of understanding the nuances and remain unconvinced. Have you done any benchmarking between a site using mcache and one not? I'll bet there's a huge difference, which translates into hardware $$. That's something managers can understand. The last benchmark I did was on a pure data level, a couple years ago; PostgreSQL could handle about 800 session transactions a second, but degraded over time. MCache was up about 7500 session transactions a second and held steady. I should dig up that code and make it available on my site. I have a couple users that tell me that their sites couldn't work without it, not even with MySQL. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] SO_SNDBUF size is small on win32?
I would set the SO_SNDBUF to 32768. Hi, I see a performance issue on win32. This problem is caused by the issue described at the following URL: http://support.microsoft.com/kb/823764/EN-US/ On win32, the default SO_SNDBUF value is 8192 bytes. And libpq's buffer is 8192 too. pqcomm.c:117 #define PQ_BUFFER_SIZE 8192 send() may take as long as 200ms. So, I think we should increase SO_SNDBUF to more than 8192. I attach the patch. Regards, -- Yoshiyuki Asaba [EMAIL PROTECTED]

Index: pqcomm.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/libpq/pqcomm.c,v
retrieving revision 1.184
diff -c -r1.184 pqcomm.c
*** pqcomm.c	5 Mar 2006 15:58:27 -0000	1.184
--- pqcomm.c	27 Jun 2006 15:17:18 -0000
***************
*** 593,598 ****
--- 593,608 ----
  		return STATUS_ERROR;
  	}

+ #ifdef WIN32
+ 	on = PQ_BUFFER_SIZE * 2;
+ 	if (setsockopt(port->sock, SOL_SOCKET, SO_SNDBUF,
+ 				   (char *) &on, sizeof(on)) < 0)
+ 	{
+ 		elog(LOG, "setsockopt(SO_SNDBUF) failed: %m");
+ 		return STATUS_ERROR;
+ 	}
+ #endif
+
  	/*
  	 * Also apply the current keepalive parameters. If we fail to set a
  	 * parameter, don't error out, because these aren't universally

---(end of broadcast)--- TIP 6: explain analyze is your friend ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] SO_SNDBUF size is small on win32?
We have definitely seen weird timing issues sometimes when both client and server were on Windows, but have been unable to pin down exactly what. From Yoshiyuki's other mail it looks like this could possibly be it, since he did experience a speedup in the range we've been looking for in those cases. What I would think might help is a patch on the libpq side (because it *does* use a nonblocking socket) to avoid sending more than 8K per WSASend call. The effect would just be to break a long send into a series of shorter sends, which wouldn't really do anything useful on a well-designed TCP stack, but then this is Windows we're talking about... It could definitely be a good idea to have a patch there *as well*, but I think they'd both be affected. As I said earlier, I would boost the socket buffer to something larger than merely 2x the packet size. I'd try for 32K (32768), that way we have some space for additional buffers before we hit the problem. It is presumed that we should have enough data in the socket buffer to at least try to match the expected amount of data that would be sent while waiting for the deferred ACK. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
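The suggested 32K send buffer is easy to demonstrate at the socket API level. A minimal sketch (Python rather than the libpq C code; note the kernel is free to round the value, Linux for instance doubles it, so the effective size is read back rather than assumed):

```python
import socket

# Request a 32 KB send buffer, as suggested above, and verify what we got.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 32768)
effective = s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
assert effective >= 32768    # kernel granted at least what we asked for
s.close()
```

The equivalent backend change is a setsockopt(SO_SNDBUF) call right after the socket is accepted, which is exactly what the pqcomm.c patch earlier in the thread does (with 2x PQ_BUFFER_SIZE rather than 32768).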
Re: [HACKERS] vacuum, performance, and MVCC
On Friday, 2006-06-23 at 17:27, Bruce Momjian wrote: Jonah H. Harris wrote: On 6/23/06, Tom Lane [EMAIL PROTECTED] wrote: What I see in this discussion is a huge amount of the grass must be greener on the other side syndrome, and hardly any recognition that every technique has its downsides and complications. I'm being totally objective. I don't think we should abandon PostgreSQL's overall design at all, because we do perform INSERTs and DELETEs much better than most systems. However, I've looked at many systems and how they implement UPDATE so that it is a scalable operation. Sure, there are costs and benefits to each implementation, but I think we have some pretty brilliant people in this community and can come up with an elegant design for scalable UPDATEs. I think the UPDATE case is similar to the bitmap index scan or perhaps bitmap indexes on disk --- there are cases we know can not be handled well by our existing code, so we have added (or might add) these features to try to address those difficult cases. Not really. Bitmap index scan and bitmap index are both new additions working well with the existing framework. While the problem of slowdown on frequent updates is real, the suggested fix is just plain wrong, as it is based on someone's faulty assumption of how index lookup works, and a very much simplified view of how different parts of the system work to implement MVCC. Yes, the suggestion was based on MVCC concepts, not a particular implementation. The original fix he suggests was to that imagined behaviour and thus ignored all the real problems of such a change. The original suggestion was nothing more than a hypothetical for the purpose of discussion. The problem was the steady degradation of performance on frequent updates. That was the point of discussion. I brought up one possible way to start a brainstorm. The discussion then morphed into criticizing the example and not addressing the problem. 
Anyway, I think some decent discussion about the problem did happen, and that is good. ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] vacuum, performance, and MVCC
Heikki Linnakangas wrote: On Mon, 26 Jun 2006, Jan Wieck wrote: On 6/25/2006 10:12 PM, Bruce Momjian wrote: When you are using the update chaining, you can't mark that index row as dead because it actually points to more than one row on the page, some are non-visible, some are visible. Back up the truck ... you mean in the current code base we have heap tuples that are visible in index scans because of heap tuple chaining but without index tuples pointing directly at them? In current code, no. Every heap tuple has corresponding index tuples. In Bruce's proposal, yes. You would have heap tuples without index tuples pointing directly at them. An index scan could only find them by following t_ctid chains. Correct me if I understood you incorrectly, Bruce. Correct! We use the same pointers used by normal UPDATEs, except we set a bit on the old tuple indicating it is a single-index tuple, and we don't create index entries for the new tuple. Index scan routines will need to be taught about the new chains, but because only one tuple in the chain is visible to a single backend, the callers should not need to be modified. (All tuples in the chain have page item ids. It is just that when they are freed, the pointers are adjusted so the index points to the chain head.) One problem is that once you find the row you want to update, it is difficult to see if it is part of a single-index chain because there are only forward pointers, so I think we have to scan the entire page to find the chains. To reduce that overhead, I am thinking we free the non-visible tuples only when the page has no more free space. This allows us to free not just our own non-visible tuples, but perhaps others as well. This sort of incorporates the vacuum row I suggested. We have never been able to free non-visible tuples before because of index cleanup overhead, but with single-index chains, we can, and reduce the requirements of vacuum for many workloads. This is great! 
---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] vacuum row?
On 6/24/06, Mark Woodward [EMAIL PROTECTED] wrote: I originally suggested a methodology for preserving MVCC and everyone is confusing it as update in place; this is not what I intended. Actually, you should've presented your idea as performing MVCC the way Firebird does... the idea is basically the same. Doing some research never hurts... especially with this crowd. Is it really necessary to make personal comments like this? Let's discuss ideas, not personalities or people. The whole issue was how to address updates steadily degrading performance. I wanted to brainstorm the issue and find a solution. I tossed out a first guess at an algorithm to start the ball rolling. Was it perfect? No. Was it intended to be? No. It was intended to spark a discussion, get people, first to recognize the problem, and then to think about possible solutions. I find that this, while chaotic, usually finds the best solutions. There are a lot of good and smart people here who understand this process and see it for what it is. Unfortunately, some don't. It isn't about research, per se, because it is assumed that we all know the various issues involved to some degree. It is about using the collective knowledge of the group and coming up with an answer. Over email, this can sometimes come off badly, and for that I apologize, but imagine we were sitting at a table in Cambridge Brewing Company, and we had laptops and pitchers of beer and were discussing the problem. I'm at a stark disadvantage as I use PostgreSQL a lot, but don't have the luxury of being able to work on it in any real depth. I'd really love to. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] vacuum, performance, and MVCC
On 6/24/2006 9:23 AM, Mark Woodward wrote: On Sat, 24 Jun 2006, Mark Woodward wrote: I'm probably mistaken, but aren't there already forward references in tuples to later versions? If so, I'm only suggesting reversing the order and referencing the latest version. I thought I understood your idea, but now you lost me again. I thought what you want is that the older heap tuple has a pointer to the newer one. Which it already has, it's called t_ctid. Perfect! Can you try to explain more carefully how the whole thing would work? What would an index tuple point to? What pointers would a heap tuple have? What would an index scan do to find the row version it's interested in? What exactly would an update do? Since we already allocate space for some notion of linked list, then all I'm suggesting is we reverse the order, sort of. Currently it looks like this:

ver001 -> ver002 -> ver003 -> ... -> verN

That's what t_ctid does now, right? Well, that's sort of stupid. Why not have it do this:

ver001 -> verN -> ... -> ver003 -> ver002

This will speed up almost *all* queries when there are more than two versions of rows. OK, here is the behavior of an update: (1) Find the latest version of the row. (2) Duplicate the row and modify as per plan. (3) Set the t_ctid of the new row to the previous latest. (4) Set the t_ctid of the first row to that of the new row. (5) Attempt to index the row. (6) If the first version of the row is in the index already (ver001), don't modify the index; otherwise, add the new version (just as before). When you vacuum, simply make the latest version (verN) the key row (ver001). This isn't done simply. Currently, vacuum collects a trivial array of ctid's it is removing and every now and then does a bulk remove of the index tuples pointing to them. 
Now let's consider a table with two indexed columns with the following row versions resulting from an insert and 3 updates to that same row: v1: a,b v2: a,c v3: a,d v4: b,d In your new scheme, there would be two index tuples for column 1 (one pointing to v1, one pointing to v4) and 3 index tuples for column 2 (one for each different value, pointing to v1, v2 and v3). Did I get that right so far? If vacuum now can remove v1, it has to update index 1 to point to v2 and remove the pointer to v1 from index 2. If it can remove v1 and v2, it has to update index 1 to point to v3 and remove v1 and v2 from index 2. If it can remove v1, v2 and v3, it must delete the index 1 tuple pointing to v1, delete the index 2 entries pointing to v1 and v2, and update the index 2 entry for v3 to point to v4. Figuring out which index tuples to remove and which ones to update can only be done by comparing each and every indexed column's old and new values. To do so, vacuum will have to fetch all the row versions, which can be scattered all over the place, with all possible locking issues including but not limited to deadlocks. I'm not sure why vacuum can't run similarly to the way it does now.
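The bookkeeping in this example can be checked with a short sketch (hypothetical Python, assuming each index keeps one entry per distinct key value, pointed at the first version where that value appears):

```python
# Versions of one row after an insert and three updates, as in the
# example above: (column-1 value, column-2 value) per version.
versions = {1: ("a", "b"), 2: ("a", "c"), 3: ("a", "d"), 4: ("b", "d")}

def index_entries(col):
    """Map each distinct key value to the earliest version holding it."""
    entries = {}
    for ver in sorted(versions):
        entries.setdefault(versions[ver][col], ver)
    return entries

idx1 = index_entries(0)
idx2 = index_entries(1)
assert idx1 == {"a": 1, "b": 4}           # 2 index tuples for column 1
assert idx2 == {"b": 1, "c": 2, "d": 3}   # 3 index tuples for column 2
```

This reproduces the counts stated above; what it cannot show is the hard part, namely that deciding which of these entries to delete or redirect at vacuum time requires fetching and comparing every version's column values.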
Re: [HACKERS] vacuum, performance, and MVCC
On 6/23/2006 3:10 PM, Mark Woodward wrote: This is NOT an in-place update. The whole MVCC strategy of keeping old versions around doesn't change. The only thing that does change is one level of indirection. Rather than keep references to all versions of all rows in indexes, keep only a reference to the first or key row of each row, and have the first version of a row form the head of a linked list to subsequent versions of each row. The list will be in descending order. Where exactly do you intend to keep all those links (for a table with N indexes)? I'm probably mistaken, but aren't there already forward references in tuples to later versions? If so, I'm only suggesting reversing the order and referencing the latest version.
Re: [HACKERS] vacuum, performance, and MVCC
On Sat, 24 Jun 2006, Mark Woodward wrote: I'm probably mistaken, but aren't there already forward references in tuples to later versions? If so, I'm only suggesting reversing the order and referencing the latest version. I thought I understood your idea, but now you lost me again. I thought what you want is that the older heap tuple has a pointer to the newer one. Which it already has, it's called t_ctid. Perfect! Can you try to explain more carefully how the whole thing would work? What would an index tuple point to? What pointers would a heap tuple have? What would an index scan do to find the row version it's interested in? What exactly would an update do? Since we already allocate space for some notion of a linked list, then all I'm suggesting is we reverse the order, sort of. Currently it looks like this: ver001 -> ver002 -> ver003 -> ... -> verN That's what t_ctid does now, right? Well, that's sort of stupid. Why not have it do this: ver001 -> verN -> ... -> ver003 -> ver002 (with ver002 linking back to ver001). This will speed up almost *all* queries when there are more than two versions of rows. OK, here is the behavior of an update: (1) Find the latest version of the row (2) Duplicate the row and modify as per plan (3) Set the t_ctid of the new row to the last latest (4) Set the t_ctid of the first row to that of the new row (5) Attempt to index the row (6) If the first version of the row is in the index already (ver001), don't modify the index; otherwise, add the new version (just as before). When you vacuum, simply make the latest version (verN) the key row (ver001). There are, no doubt, issues that need to be resolved (I can think of a couple off the top of my head), but overall I think it is workable, and I don't think this will affect performance in the simple case, and it will improve performance in cases where there are more than one or two versions of a row.
Re: [HACKERS] vacuum, performance, and MVCC
On 6/24/06, Mark Woodward [EMAIL PROTECTED] wrote: Currently it looks like this: ver001 -> ver002 -> ver003 -> ... -> verN That's what t_ctid does now, right? Well, that's sort of stupid. Why not have it do this: ver001 -> verN -> ... -> ver003 -> ver002 Heh, because that's crazy. The first time you insert a key into the index it will point to v1 of a tuple... say after 5 updates you have v2, v3, v4, v5... your t_ctid pointer chain looks like v1 (original) -> v2 -> v3 -> v4 -> v5 (newest). However, your whole idea is based on not having to do another index insert for unchanged keys, so the index still points to v1... which means you have to follow the t_ctid chain to get to the newest version, just like a sequential scan. I don't see how you think you can reverse pointer it. In the scenario, as previously outlined: ver001 -> verN -> ... -> ver003 -> ver002 (with ver002 linking back to ver001). The index points to version 1 (ver001), which points to the latest version (verN). This will speed up almost *all* queries when there are more than two versions of rows. Nope. Of course it will. When you vacuum, simply make the latest version (verN) the key row (ver001). How are you going to do this without a ton of locking... remember, the index is pointing to v1 with a tid... so you'll have to physically move the newest version v5 to v1's tid from wherever it was... like a vacuum full on steroids. Unless of course you rebuild the index... but that's not a solution either. I don't understand how you can assume this. In fact, it will probably reduce locking and disk I/O by not having to modify indexes.
Re: [HACKERS] vacuum, performance, and MVCC
On 6/24/06, Mark Woodward [EMAIL PROTECTED] wrote: In the scenario, as previously outlined: ver001 -> verN -> ... -> ver003 -> ver002 (with ver002 linking back to ver001). So you want to always keep an old version around? Prior to vacuum, it will be there anyway, and after vacuum, the new version will become ver001.
Re: [HACKERS] vacuum, performance, and MVCC
On 6/24/06, Mark Woodward [EMAIL PROTECTED] wrote: On 6/24/06, Mark Woodward [EMAIL PROTECTED] wrote: In the scenario, as previously outlined: ver001 -> verN -> ... -> ver003 -> ver002 (with ver002 linking back to ver001). So you want to always keep an old version around? Prior to vacuum, it will be there anyway, and after vacuum, the new version will become ver001. So you do intend to move verN into ver001's slot? What about the other conditions you had mentioned where you have to follow PostgreSQL's current behavior? How are you going to have a pointer chain in that case? Who said anything about moving anything? When vacuum comes along, it cleans out previous versions of rows. Very little will change.
[HACKERS] vacuum row?
I originally suggested a methodology for preserving MVCC and everyone is confusing it as update in place; this is not what I intended. How about a form of vacuum that targets a particular row? Is this possible? Would it have to be by transaction?
Re: [HACKERS] vacuum, performance, and MVCC
The example is a very active web site, the flow is this: query for session information, process HTTP request, update session information. This happens for EVERY http request. Chances are that you won't have concurrent requests for the same row, but you may have well over 100 HTTP server processes/threads answering queries in your web server farm. You're crazy :) Use memcache, not the DB :) I actually have what I consider a better and more complete session handler system, MCache, formerly MSession. (http://www.mohawksoft.org/?q=node/8) I mean, it implements a LOT of nifty features, loadable function modules, collision-safe counters and operators, the ability to save session data to a file or SQL database at varying levels of caching, but that doesn't mean it is used. Technologies like memcached and my mcache are a separate data store. Your session data is not usable anywhere but in your web system. I have gone as far as to write a session serializer for PHP that outputs XML, a PostgreSQL plugin that can extract data from the XML session string, and a set of functions for interfacing mcache with PostgreSQL, and I still have a hard time convincing clients that this is the right way to go. While we all know session data is, at best, ephemeral, people still want some sort of persistence; thus, you need a database. For mcache I have a couple of plugins that have a wide range of options, from read/write at startup and shutdown, to a full write-through cache to a database. In general, my clients don't want this, they want the database to store their data. When you try to explain to them that a database may not be the right place to store this data, they ask why; sadly they have little hope of understanding the nuances and remain unconvinced.
Re: [HACKERS] vacuum, performance, and MVCC
I suppose you have a table memberships (user_id, group_id) or something like it; it should have as few columns as possible; then try regularly clustering on group_id (maybe once a week) so that all the records for a particular group are close together. Getting the members of a group to send them an email should be faster (fewer random seeks). It is like this, and some more bookkeeping data which must be there... we could split the table for smaller records or for updatable/stable fields, but at the end of the day it doesn't make much sense, usually all the data is needed, and I wonder if more big/shallow tables instead of one big/wide one makes sense... Regularly clustering is out of the question as it would render the system unusable for hours. There's no zero-activity hour we could use for such stuff. There's always something happening, only the overall load is smaller at night... Let me ask a question: you have this hundred million row table. OK, how much of that table is read/write? Would it be possible to divide the table into two (or more) tables where one is basically static, with only infrequent inserts and deletes, and the other is highly updated? The big thing in performance is the amount of disk I/O; if you have a smaller active table with only a single index, then you may be able to cut your disk I/O time really down. The smaller the row size, the more rows fit into a block. The fewer blocks, the less disk I/O. The less disk I/O, the better the performance. Also, and anyone listening correct me if I'm wrong, you NEED to vacuum frequently because the indexes grow and vacuuming them doesn't remove everything; sometimes a REINDEX or a drop/recreate is the only way to get performance back. So if you wait too long between vacuums, your indexes grow and spread across more disk blocks than they should and thus use more disk I/O to search and/or shared memory to cache.
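The row-size arithmetic above can be made concrete with a back-of-envelope sketch (the 8 kB block size is PostgreSQL's default; the row sizes and row count are made-up illustrations):

```python
# Narrower rows pack more rows per block, so a scan of the same row
# count touches proportionally fewer blocks, i.e. less disk I/O.
BLOCK_SIZE = 8192  # PostgreSQL default block size in bytes

def blocks_needed(row_size, nrows):
    rows_per_block = BLOCK_SIZE // row_size
    return -(-nrows // rows_per_block)  # ceiling division

# one million rows: 512-byte rows vs a split-out 64-byte active table
wide = blocks_needed(row_size=512, nrows=1_000_000)    # 16 rows/block
narrow = blocks_needed(row_size=64, nrows=1_000_000)   # 128 rows/block
assert wide == 62_500 and narrow == 7_813
```

Under these assumed sizes, the narrow active table needs an eighth of the blocks, which is the whole argument for splitting the hot columns out.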
Re: [HACKERS] vacuum, performance, and MVCC
Let me ask a question: you have this hundred million row table. OK, how much of that table is read/write? Would it be possible to divide the table into two (or more) tables where one is basically static, with only infrequent inserts and deletes, and the other is highly updated? Well, all of it is read/write... some of the data might be updated less frequently, but there's no way I would know which part of the data is that. Logically it is just the same type of data... so unless I find a way to continuously move the data back and forth between an archive table and the live table, based on how active the groups are, I can't imagine any other way of partitioning it. And that would also mean some quite big load given the pretty high dynamics of the groups. The big thing in performance is the amount of disk I/O; if you have a smaller active table with only a single index, then you may be able to cut your disk I/O time really down. The smaller the row size, the more rows fit into a block. The fewer blocks, the less disk I/O. The less disk I/O, the better the performance. I agree, but it is quite hard to achieve that when the data set is both big AND the partitioning criteria is highly dynamic. Not to mention that deleting from that table is also a PITA performance-wise, so I wonder how well the continuous back and forth between the active and inactive table would do. Also, and anyone listening correct me if I'm wrong, you NEED to vacuum frequently because the indexes grow and vacuuming them doesn't remove everything; sometimes a REINDEX or a drop/recreate is the only way to get performance back. So if you wait too long between vacuums, your indexes grow and spread across more disk blocks than they should and thus use more disk I/O to search and/or shared memory to cache. This is nice in theory, but kills performance. I vacuum the big tables only overnight, otherwise the server is sluggish. Well, the only thing left is to cluster the database.
There are a couple of ways to do this: one, switch to a platform that supports clustering, or create an API to wrap multiple databases. If your queries are simple and limited, you could create an HTTP/XML service that wraps a number of postgresql databases, issues a query across all databases, merges multiple query sets, and returns one homogeneous stream. Inserts would be handled by hashing to a machine, weighted by the number of records on each machine. Updates and deletes would have two keys, machine and ID. It sounds like you have a big problem and you need a big solution.
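The routing scheme described above can be sketched as follows (hypothetical Python; the "weighting" here is a naive pick-the-emptiest stand-in for a real weighted hash, and the machine names are made up):

```python
# Sketch of routing writes across several databases: inserts go to the
# machine holding the fewest records, updates/deletes carry two keys,
# machine and row ID, so they can be sent straight to the right shard.
machines = {"db0": 0, "db1": 0, "db2": 0}  # machine -> record count

def route_insert(key):
    # `key` is unused by this naive policy; a real version would hash
    # it, with the hash space weighted by each machine's record count
    target = min(machines, key=lambda m: (machines[m], m))
    machines[target] += 1
    return target

def route_lookup(machine, row_id):
    """Updates and deletes address a row by (machine, ID)."""
    return (machine, row_id)

for i in range(9):
    route_insert(f"row{i}")
assert sorted(machines.values()) == [3, 3, 3]  # load stays balanced
```

The merge step of the wrapper service (combining result sets from all shards into one stream) is the other half of the design and is not shown here.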
Re: [HACKERS] vacuum, performance, and MVCC
On a fine day, Thursday, 2006-06-22 at 12:41, Mark Woodward wrote: Depending on exact details and optimisations done, this can be either slower or faster than postgresql's way, but they still need to do something to get transactional visibility rules implemented. I think they have a different strategy. I think they maintain the notion of a current version of a row, and hunt for previous versions when needed, at least that's how I suspect Oracle does it with redo logs. Not current but last :) True. And one side effect of redo logs is that it is practically impossible to do large deletes on production databases. So you design around that, like you have to design around limitations of MVCC. Think that's bad? Try doing an update in PostgreSQL on a table with 20 million rows and a few indexes. I had to write a script to chunk up the block update into segments and vacuum between each. There has to be a more linear way of handling this scenario. So vacuum the table often. It's easy to say VACUUM often... but I'd bet that vacuuming is going to lessen the throughput in his tests even more, no matter how it's tuned. Running VACUUM often/continuously will likely keep his update rate fluctuations within a corridor of maybe 5-10%, at the cost of 1-2% extra load. At least if vacuum is configured right and the server is not already running at 100% IO saturation, in which case it will be worse. Assuming the table is a reasonable size, the I/O required for vacuum doesn't kill everything else! I have solved the problem of unnecessary IO by keeping active and finished rows in separate tables, with the finish() function moving the row between tables. Sorry, an RDBMS is a relational database management system; if you are doing the database management, it isn't a very good RDBMS. In case of the number of actively modified rows being in only tens or low hundreds of thousands of rows, (i.e.
the modified set fits in memory) the continuous vacuum process shows up as just another backend, not really taking an order of magnitude more resources. It mainly generates WAL traffic, as modified pages are already in memory/cache and are mostly synced by the background writer and/or checkpoint. Of course you have to adjust the vacuum_cost_* variables so as to not saturate IO. These sorts of solutions, IMHO, don't show how good PostgreSQL is, but show where it is very lacking. The max throughput figure is not something you actually need very often in production. No, but you need to have some degree of certainty and predictability in the system you are developing. Yup. You have to design it so it has. I was referring to the system as a whole and the individual components. PostgreSQL's performance under some pathological conditions is not very predictable or reliable. What is interesting is setting up the server so that you can service your loads comfortably. Running the server at 100% load is not anything you want to do on a production server. There will be things you need to do anyway and you need some headroom for that. Of course, you design it so peaks are easily managed, but unless you run vacuum continuously, and that has its own set of problems, you run into this problem, and it can get really, really bad. Usually it gets really bad if you *don't* run vacuum continuously, maybe hoping to do it in slower times at night. For a high-update db you have to run it continuously, maybe having some 5-15 sec pauses between runs. And how much I/O does this take?
Re: [HACKERS] vacuum, performance, and MVCC
Mark Woodward wrote: In case of the number of actively modified rows being in only tens or low hundreds of thousands of rows (i.e. the modified set fits in memory), the continuous vacuum process shows up as just another backend, not really taking an order of magnitude more resources. It mainly generates WAL traffic, as modified pages are already in memory/cache and are mostly synced by the background writer and/or checkpoint. Of course you have to adjust the vacuum_cost_* variables so as to not saturate IO. These sorts of solutions, IMHO, don't show how good PostgreSQL is, but show where it is very lacking. We all know Postgres is lacking; some of us try to improve it (some with more success than others). People who know the current limitations but like the capabilities try to find workarounds to the problems. What surprises me is that, if you have such a low opinion of Postgres, you still use it. Actually I love PostgreSQL; I've been using it for about 10 years on a lot of projects. There are some serious issues with it, however, and it is important to expose them, discuss them, and resolve them. Workarounds are great, but in the end, they are workarounds.
Re: [HACKERS] vacuum, performance, and MVCC
Bottom line: there's still lots of low-hanging fruit. Why are people feeling that we need to abandon or massively complicate our basic architecture to make progress? regards, tom lane I, for one, see a particularly nasty unscalable behavior in the implementation of MVCC with regards to updates. For each update to a row, additional work needs to be done to access that row. Surely a better strategy can be found, especially considering that the problem being solved is a brief one. The only reason why you need previous versions of a row is for transactions that started before or during the transaction that seeks to modify a row. After which time, the previous versions continue to affect performance and take up space even though they are of no value. (Caveats for rollback, etc., but the point is still valid.) This is a very pessimistic behavior and penalizes the more common and optimistic operations. Now, if a tool were to be created that could roll back an entire database to some arbitrary transaction ID between vacuums, then I can see the usefulness of the older versions. I still think an in-place indirection to the current row could fix the problem and speed up the database; there are some sticky situations that need to be considered, but it shouldn't break much.
Re: [HACKERS] vacuum, performance, and MVCC
On 6/23/06, Mark Woodward [EMAIL PROTECTED] wrote: I, for one, see a particularly nasty unscalable behavior in the implementation of MVCC with regards to updates. I think this is a fairly common acceptance. The overhead required to perform an UPDATE in PostgreSQL is pretty heavy. Actually, it's not really PostgreSQL's implementation, but anything that employs basic multi-version timestamp ordering (MVTO) style MVCC. Basically, MVTO-style systems require additional work to be done in an UPDATE so that queries can find the most current row more quickly. This is a very pessimistic behavior Yes, and that's basically the point of MVTO in general. The nice thing about MVTO-style MVCC is that it isn't super complicated. No big UNDO strategy is needed because the old versions are always there and just have to satisfy a snapshot. I still think an in-place indirection to the current row could fix the problem and speed up the database; there are some sticky situations that need to be considered, but it shouldn't break much. I agree, but should make clear that moving to an in-place update isn't a quick fix; it will require a good amount of design and planning. This is NOT an in-place update. The whole MVCC strategy of keeping old versions around doesn't change. The only thing that does change is one level of indirection. Rather than keep references to all versions of all rows in indexes, keep only a reference to the first or key row of each row, and have the first version of a row form the head of a linked list to subsequent versions of each row. The list will be in descending order. In the vast majority of cases, the overhead of this action will be trivial. In an unmodified row, you're there. In a modified row, you have one extra lookup. In extreme cases, you may have to go back a few versions, but I don't see that as a common behavior. On a heavily updated row, you are never more than one jump away, and the indexes shouldn't grow overly much.
What I find in these discussions is that we always talk about overcomplicating vacuum in order to fix the poor behavior in MVCC. Fixing autovacuum does not eliminate the overhead required to add index entries and everything associated with performing an UPDATE... it's just cleaning up the mess after the fact. As I see it, fixing the root problem by moving to update-in-place may add a little more complication to the core, but will eliminate a lot of the headaches we have in overhead, performance, and manageability. Vacuum is a tool for removing old versions. I think there is an overly eager tendency to have it fix other problems.
Re: [HACKERS] vacuum, performance, and MVCC
Tom Lane wrote: If you're doing heavy updates of a big table then it's likely to end up visiting most of the table anyway, no? There is talk of keeping a map of dirty pages, but I think it'd be a win for infrequently-updated tables, not ones that need constant vacuuming. I think a lot of our problems in this area could be solved with fairly straightforward tuning efforts on the existing autovacuum infrastructure. In particular, someone should be looking into recommendable default vacuum-cost-delay settings so that a background vacuum doesn't affect performance too much. Another problem with the current autovac infrastructure is that it doesn't respond very well to the case where there are individual tables that need constant attention as well as many that don't. If you have N databases then you can visit a particular table at most once every N*autovacuum_naptime seconds, and *every* table in the entire cluster gets reconsidered at that same rate. I'm not sure if we need the ability to have multiple autovac daemons running at the same time, but we definitely could use something with a more flexible table-visiting pattern. Perhaps it would be enough to look through the per-table stats for each database before selecting the database to autovacuum in each cycle, instead of going by least recently autovacuumed. Bottom line: there's still lots of low-hanging fruit. Why are people feeling that we need to abandon or massively complicate our basic architecture to make progress? I think at some point we have to admit that _polling_ the tables, which is what autovacuum does, just isn't going to work well, no matter how much it is tweaked, and another approach should be considered for certain workload cases. Thank you, that is *exactly* the analogy I have been unable to formulate. It was on my mind but I could not put my finger on it. Vacuum is fundamentally inefficient, as it does not know what has changed and must go through the entire table to find the specific changes each time.
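Tom's N*autovacuum_naptime point put in numbers (a trivial sketch; 60 seconds is the autovacuum_naptime default, and the database count is a made-up example):

```python
# With a single autovacuum process cycling through the databases, a
# given table is reconsidered at most once per N * naptime seconds.
def min_revisit_interval(n_databases, naptime_seconds=60):
    return n_databases * naptime_seconds

# 20 databases at the default 60 s naptime: a hot table can wait
# 20 minutes between visits, regardless of how fast it churns.
assert min_revisit_interval(20) == 1200
```

That fixed revisit rate, independent of per-table churn, is exactly the polling problem the reply below objects to.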
Going through the whole table each time is messed up and wasteful. At some point, the autovacuum approach starts to look like a car with fifty bumper stickers. The first few were fine, but at some point, the tweaks (bumper stickers) start to overwhelm the car, and it is time to look for a new car. I think particularly for the UPDATE with no index key changes, a new approach must be considered. I have been ranting about a first-row strategy, one where the first version of a row is the top of a linked list of versions. (1) The indexes point to the first key row. (2) When a row is updated, it is found in the various indexes; if the key row currently exists in the index, no changes to the index are made. If it is not found, the old version of the row is orphaned and behaves as PostgreSQL always behaves. (3) If the row is not orphaned, its last version reference is updated. For the most part, this should only affect updates where the index entries don't change. If the index value always changes, PostgreSQL will behave as it currently does. If the index values do not change, updates will be faster to do and won't impact queries.
Re: [HACKERS] vacuum, performance, and MVCC
Just out of curiosity Mark, didn't you write your session daemon so that you don't have to put sessions in postgres anymore? The original project started as a shared key/value system for a beowulf cluster in the late 90s, but got reworked to be a session handler for PHP when I worked with Stig, MSession. Or are you just giving that as an example of a very wide, very heavily updated table? My session tables have been an extreme case of this problem, but no other table that I have is so adversely affected by this behavior. My decision was not to pull postgres out entirely, just to use other session handlers. I have been working as a consultant since 2001, and prior to that, as CTO at a dot-com startup. MSession (the previous name) was used to circumvent shortcomings in PostgreSQL, specifically the problem we are talking about. As a consultant, I have to convince the customer that all is well. My MCache system does not guarantee that no session data is lost, nor does memcached or any other non-ACID system. The technical arguments we can make, no matter how correct, leave us on the defensive when asked What if the server crashes, do you lose data? Of course the answer is yes. Then we get drawn into a conversation about transient and unimportant data vs persistent and valuable data. At which point you've lost the customer. A solid SQL database is the de facto standard, perhaps not the best choice, but unavoidable. The update behavior of PostgreSQL is probably the *last* serious issue. Debate all you want, vacuum mitigates the problem to varying levels; fixing the problem will be a huge win. If the update behavior gets fixed, I can't think of a single issue with postgresql that would be a show stopper. Rick On Jun 22, 2006, at 7:59 AM, Mark Woodward wrote: After a long battle with technology, [EMAIL PROTECTED] (Mark Woodward), an earthling, wrote: Clinging to sanity, [EMAIL PROTECTED] (Mark Woodward) mumbled into her beard: [snip] 1.
The index points to all the versions, until they get vacuumed out. It can't point to all versions, it points to the last current version as updated by vacuum, or the first version of the row. No, it points to *all* the versions. Suppose I take a table with two rows: INFO: analyzing public.test INFO: test: 1 pages, 2 rows sampled, 2 estimated total rows VACUUM Then, over and over, I remove and insert one entry with the same PK: sample=# delete from test where id = 2;insert into test (id) values (2); DELETE 1 [snip] Now, I vacuum it. sample=# vacuum verbose analyze test; INFO: vacuuming public.test INFO: index test_id_key now contains 2 row versions in 2 pages DETAIL: 10 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: test: removed 10 row versions in 1 pages DETAIL: CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: test: found 10 removable, 2 nonremovable row versions in 1 pages DETAIL: 0 dead row versions cannot be removed yet. There were 0 unused item pointers. 0 pages are entirely empty. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: analyzing public.test INFO: test: 1 pages, 2 rows sampled, 2 estimated total rows VACUUM Notice that the index contained 10 versions of that one row. It pointed to *ALL* the versions. Hmm, OK, then the problem is more serious than I suspected. This means that every index on a row has to be updated on every transaction that modifies that row. Is that correct? I am attaching some code that shows the problem with regard to applications such as web server session management, when run, each second the system can handle fewer and fewer connections. 
Here is a brief output: [EMAIL PROTECTED]:~/pgfoo$ ./footest 1307 sessions per second, elapsed: 1 1292 sessions per second, elapsed: 2 1287 sessions per second, elapsed: 3 1216 sessions per second, elapsed: 25 1213 sessions per second, elapsed: 26 1208 sessions per second, elapsed: 27 1192 sessions per second, elapsed: 36 1184 sessions per second, elapsed: 37 1183 sessions per second, elapsed: 38 1164 sessions per second, elapsed: 58 1170 sessions per second, elapsed: 59 1168 sessions per second, elapsed: 60 As you can see, in about a minute at high load, this very simple table lost about 10% of its performance, and I've seen worse based on update frequency. Before you say this is an obscure problem, I can tell you it isn't. I have worked with more than a few projects that had to switch away from PostgreSQL because of this behavior. Obviously this is not a problem with small sites, but this is a real problem with an enterprise level web site with millions of visitors and actions a day. Quite frankly it is a classic example of something that does not scale. The more and more updates there are, the higher the load becomes. You can see it on top as the footest program runs. There has to be a more
Re: [HACKERS] vacuum, performance, and MVCC
Clinging to sanity, [EMAIL PROTECTED] (Mark Woodward) mumbled into her beard: We all know that PostgreSQL suffers performance problems when rows are updated frequently prior to a vacuum. The most serious example can be seen by using PostgreSQL as a session handler for a busy web site. You may have thousands or millions of active sessions, each being updated per page hit. Each time the record is updated, a new version is created, thus lengthening the correct version search each time the row is accessed, until, of course, the next vacuum comes along and corrects the index to point to the latest version of the record. Is that a fair explanation? No, it's not. 1. The index points to all the versions, until they get vacuumed out. It can't point to all versions, it points to the last current version as updated by vacuum, or the first version of the row. 2. There may simultaneously be multiple correct versions. The notion that there is one version that is The Correct One is wrong, and you need to get rid of that thought. Sorry, this is a misunderstanding. By correct version search it was implied for this transaction. Later I mention finding the first row with a transaction lower than the current. If my assertion is fundamentally true, then PostgreSQL will always suffer performance penalties under a heavy modification load. Of course, tables with many inserts are not an issue, it is mainly updates. The problem is that there are classes of problems where updates are the primary operation. The trouble with your assertion is that it is true for *all* database systems except for those whose only transaction mode is READ UNCOMMITTED, where the only row visible is the latest version. Not true. Oracle does not seem to exhibit this problem. I was thinking, just as a hypothetical, what if we reversed the problem, and always referenced the newest version of a row and scanned backwards across the versions to the first that has a lower transaction number?
That would require an index on transaction number, which is an additional data structure not in place now. That would presumably worsen things. All things being equal, perhaps not. It would probably be a loser if you have a static database, but in a database that undergoes modification, it would be the same or less work if the average row has two versions. (assuming nothing else changes) One possible implementation: PostgreSQL could keep an indirection array of index to table ref for use by all the indexes on a table. The various indexes return offsets into the array, not direct table refs. Because the table refs are separate from the index, they can be updated each time a transaction is committed. You mean, this index would be VACUUMed as a part of each transaction COMMIT? I can't see that turning out well... No, it would not be vacuumed!!! Right now, the indexes point to the lowest row version. When an index returns the row ID, it is checked whether there are newer versions; if so, the newer versions are searched until the last one is found or one exceeds the current transaction ID. This way, the newest version of a row is always the first row found. Also, on a heavily updated site, the most used rows would always be at the end of the table, reducing the amount of disk reads or cache memory required to find the correct row version for each query. I can't see how it follows that most-used rows would migrate to the end of the table. Sorry, OK, as an assumption it ignores the FSM, but the idea is that there is only one lookup. That would only be true in a database that is never VACUUMed; as soon as a VACUUM is done, free space opens up in the interior, so that new tuples may be placed in the interior. Regardless, the point is that you have to search the [N] versions of a row to find the latest correct version of the row for your transaction. This is done, AFAICT, from first to last version, meaning that the work required to find a row increases with every update prior to vacuum.
PostgreSQL fails miserably as a web session handler because of this behavior and it requires too frequent vacuums and inconsistent performance. OK, forget the version array, it was just an off the top idea. How about this: Currently a row does this: row_TID[0] -> row_TID[1] -> row_TID[2] ... row_TID[LAST-1] -> row_TID[LAST] Pointing to each subsequent row. What if it did this: row_TID[0] -> row_TID[LAST] -> row_TID[LAST-1] ... -> row_TID[2] -> row_TID[1] The base tuple of a version chain gets updated to point to the latest committed row. It should be fairly low impact on performance on a static database, but REALLY speed up PostgreSQL on a heavily modified database and provide more consistent performance between vacuums and require fewer vacuums to maintain performance.
Re: [HACKERS] vacuum, performance, and MVCC
After a long battle with technology, [EMAIL PROTECTED] (Mark Woodward), an earthling, wrote: Clinging to sanity, [EMAIL PROTECTED] (Mark Woodward) mumbled into her beard: [snip] 1. The index points to all the versions, until they get vacuumed out. It can't point to all versions, it points to the last current version as updated by vacuum, or the first version of the row. No, it points to *all* the versions. Suppose I take a table with two rows: INFO: analyzing public.test INFO: test: 1 pages, 2 rows sampled, 2 estimated total rows VACUUM Then, over and over, I remove and insert one entry with the same PK: sample=# delete from test where id = 2;insert into test (id) values (2); DELETE 1 [snip] Now, I vacuum it. sample=# vacuum verbose analyze test; INFO: vacuuming public.test INFO: index test_id_key now contains 2 row versions in 2 pages DETAIL: 10 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: test: removed 10 row versions in 1 pages DETAIL: CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: test: found 10 removable, 2 nonremovable row versions in 1 pages DETAIL: 0 dead row versions cannot be removed yet. There were 0 unused item pointers. 0 pages are entirely empty. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: analyzing public.test INFO: test: 1 pages, 2 rows sampled, 2 estimated total rows VACUUM Notice that the index contained 10 versions of that one row. It pointed to *ALL* the versions. Hmm, OK, then the problem is more serious than I suspected. This means that every index on a row has to be updated on every transaction that modifies that row. Is that correct? I am attaching some code that shows the problem with regard to applications such as web server session management, when run, each second the system can handle fewer and fewer connections. 
Here is a brief output: [EMAIL PROTECTED]:~/pgfoo$ ./footest 1307 sessions per second, elapsed: 1 1292 sessions per second, elapsed: 2 1287 sessions per second, elapsed: 3 1216 sessions per second, elapsed: 25 1213 sessions per second, elapsed: 26 1208 sessions per second, elapsed: 27 1192 sessions per second, elapsed: 36 1184 sessions per second, elapsed: 37 1183 sessions per second, elapsed: 38 1164 sessions per second, elapsed: 58 1170 sessions per second, elapsed: 59 1168 sessions per second, elapsed: 60 As you can see, in about a minute at high load, this very simple table lost about 10% of its performance, and I've seen worse based on update frequency. Before you say this is an obscure problem, I can tell you it isn't. I have worked with more than a few projects that had to switch away from PostgreSQL because of this behavior. Obviously this is not a problem with small sites, but this is a real problem with an enterprise level web site with millions of visitors and actions a day. Quite frankly it is a classic example of something that does not scale. The more and more updates there are, the higher the load becomes. You can see it on top as the footest program runs. There has to be a more linear way of handling this scenario. 
#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>
#include <string.h>
#include <time.h>
#include <assert.h>
#include <sys/times.h>
#include <libpq-fe.h>

/* Create these items in a database named sessionfoo:
 * create table foo_sessions(session_id text, session_values text);
 * create index foo_sessions_ndx on foo_sessions(session_id);
 */

#define MAX_SESSIONS 100
char *sessions[MAX_SESSIONS];

void PQexecClear(PGconn *pg, char *sql)
{
    PGresult *pgres = PQexec(pg, sql);
    assert(pgres);
    PQclear(pgres);
}

void createRandSession(PGconn *pg, int ndx)
{
    char session[64];
    char qbuf[256];
    snprintf(session, sizeof(session), "%8X-%8X-%8X",
             (unsigned) time(0), rand(), (unsigned) times(NULL));
    snprintf(qbuf, sizeof(qbuf),
             "insert into foo_sessions(session_id, session_values) "
             "values('%s','%08X')", session, (unsigned) times(NULL));
    PQexecClear(pg, qbuf);
    sessions[ndx] = strdup(session);
}

void updateSession(PGconn *pg, int ndx)
{
    char qbuf[256];
    char *session = sessions[ndx];
    snprintf(qbuf, sizeof(qbuf),
             "select * from foo_sessions where session_id = '%s'", session);
    PQexecClear(pg, qbuf);
    snprintf(qbuf, sizeof(qbuf),
             "update foo_sessions set session_values = '%08X' "
             "where session_id = '%s'", (unsigned) times(NULL), session);
    PQexecClear(pg, qbuf);
}

int main()
{
    int startTime;
    int loopTime;
    int count = 0;
    int i;
    PGconn *pg = PQconnectdb("dbname=sessionfoo");
    assert(pg);
    PQexecClear(pg, "delete from foo_sessions");
    PQexecClear(pg, "vacuum foo_sessions");
    for (i = 0; i < MAX_SESSIONS; i++)
        createRandSession(pg, i);
    loopTime = time(0);
    while (loopTime == time(0))   /* Wait for a second boundary */
        ;
    startTime = loopTime = time(0);
    while (1)
    {
        int theTime = time(0);
        if (loopTime != theTime)
        {
            loopTime = theTime;
            printf("%d sessions per second, elapsed: %d\n",
                   count, loopTime - startTime);
            count = 0;
        }
        updateSession(pg, rand() % MAX_SESSIONS);
        count++;
    }
    return 0;
}
Re: [HACKERS] vacuum, performance, and MVCC
On Thursday, 2006-06-22 at 09:59, Mark Woodward wrote: After a long battle with technology, [EMAIL PROTECTED] (Mark Woodward), an earthling, wrote: Clinging to sanity, [EMAIL PROTECTED] (Mark Woodward) mumbled into It pointed to *ALL* the versions. Hmm, OK, then the problem is more serious than I suspected. This means that every index on a row has to be updated on every transaction that modifies that row. Is that correct? Yes. I am attaching some code that shows the problem with regard to applications such as web server session management, when run, each second the system can handle fewer and fewer connections. Here is a brief output: [EMAIL PROTECTED]:~/pgfoo$ ./footest 1307 sessions per second, elapsed: 1 1292 sessions per second, elapsed: 2 1287 sessions per second, elapsed: 3 1216 sessions per second, elapsed: 25 1213 sessions per second, elapsed: 26 1208 sessions per second, elapsed: 27 1192 sessions per second, elapsed: 36 1184 sessions per second, elapsed: 37 1183 sessions per second, elapsed: 38 1164 sessions per second, elapsed: 58 1170 sessions per second, elapsed: 59 1168 sessions per second, elapsed: 60 As you can see, in about a minute at high load, this very simple table lost about 10% of its performance, and I've seen worse based on update frequency. Before you say this is an obscure problem, I can tell you it isn't. I have worked with more than a few projects that had to switch away from PostgreSQL because of this behavior. You mean systems that are designed so exactly, that they can't take a 10% performance change? No, that's not really the point, performance degrades over time, in one minute it degraded 10%. The update to session ratio has a HUGE impact on PostgreSQL. If you have a thousand active sessions, it may take a minute to degrade 10%, assuming some level of activity versus operations per session per action.
If an active user causes a session update once a second, that is not too bad, but if an active user updates a session more often, then it is worse. Generally speaking, sessions aren't updated when they change, they are usually updated per HTTP request. The data in a session may not change, but the session handling code doesn't know this and simply updates anyway. In a heavily AJAX site, you may have many smaller HTTP requests returning items in a page. So, a single page may consist of multiple HTTP requests. Worse yet, as a user drags an image around, there are lots of background requests being made. Each request typically means a session lookup and a session update. This is compounded by the number of active users. Since the object of a site is to have many active users, this is always a problem. It is less intrusive now that non-locking vacuum is there, but that doesn't mean it isn't a problem. Or just that they did not vacuum for so long, that performance was less than needed in the end? In an active site or application, vacuuming often enough to prevent this is, itself, a load on the system. btw, what did they switch to ? One switched to Oracle and one is using a session handler I wrote for PHP. One company I did work for tried to maintain a table with a single row that indicated state, this single row would sometimes take more than a second to query. It was horrible. I'm not sure what they ended up using, but I wrote a shared memory variable C function that got rid of that specific problem. They were trying to use PostgreSQL as the database to implement a HUGE redundant networked file system. My personal opinion was that their biggest problem was that they decided to use Java as the programming environment, but that's another issue. Obviously this is not a problem with small sites, but this is a real problem with an enterprise level web site with millions of visitors and actions a day.
On such a site you should design so that db load stays below 50% and run vacuum often, that may even mean that you run vacuum continuously with no wait between runs. If you run vacuum with the right settings, Yea, but that, at least in my opinion, is a poor design. Quite frankly it is a classic example of something that does not scale. The more and more updates there are, the higher the load becomes. You can see it on top as the footest program runs. Yes, you understood correctly - the more updates, the higher the load :) Imagine this: Each row in a table has a single entry that represents that row. Let's call it the key entry. Whether or not the key entry maintains data is an implementation detail. When indexing a table, the index always points to the key entry for a row. When a row is updated, in the spirit of MVCC, a new data row is created. The key entry is then updated to point to the new version of the row. The new row points to the previous version of the row, and the previous entry continues to point to its previous entry, etc. When a row is found by the index, the key entry
Re: [HACKERS] vacuum, performance, and MVCC
On Thursday, 2006-06-22 at 10:20, Jonah H. Harris wrote: On 6/22/06, Alvaro Herrera [EMAIL PROTECTED] wrote: Hmm, OK, then the problem is more serious than I suspected. This means that every index on a row has to be updated on every transaction that modifies that row. Is that correct? Add an index entry, yes. Again, this is a case for update-in-place. No need to write an extra index entry and incur the WAL associated with it. I guess that MySQL on its original storage does that, but they allow only one concurrent update per table and no transactions. Imagine a table with 3 indexes on it... I would estimate that we perform at least 3 to 6 times more overhead than any commercial database on such an update. One way to describe what commercial databases do to keep constant update rates is saying that they do either vacuuming as part of update, or they just use locks and force some transactions to wait or fail/retry. Depending on exact details and optimisations done, this can be either slower or faster than PostgreSQL's way, but they still need to do something to get transactional visibility rules implemented. I think they have a different strategy. I think they maintain the notion of current version of a row, and hunt for previous versions when needed, at least that's how I suspect Oracle does it with redo logs. There has to be a more linear way of handling this scenario. So vacuum the table often. It's easy to say VACUUM often... but I'd bet that vacuuming is going to lessen the throughput in his tests even more; no matter how it's tuned. Running VACUUM often/continuously will likely keep his update rate fluctuations within a corridor of maybe 5-10%, at the cost of 1-2% extra load. At least if vacuum is configured right and the server is not already running at 100% IO saturation, in which case it will be worse. Assuming the table is a reasonable size, the I/O required for vacuum doesn't kill everything else!
The max throughput figure is not something you actually need very often in production. No, but you need to have some degree of certainty and predictability in the system you are developing. What is interesting is setting up the server so that you can service your loads comfortably. Running the server at 100% load is not anything you want to do on a production server. There will be things you need to do anyway and you need some headroom for that. Of course, you design it so peaks are easily managed, but unless you run vacuum continuously, and that has its own set of problems, you run into this problem, and it can get really really bad. -- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com
Re: [HACKERS] vacuum, performance, and MVCC
Christopher Browne [EMAIL PROTECTED] writes: After a long battle with technology, [EMAIL PROTECTED] (Mark Woodward), an earthling, wrote: Not true. Oracle does not seem to exhibit this problem. Oracle suffers a problem in this regard that PostgreSQL doesn't; in Oracle, rollbacks are quite expensive, as recovery requires doing extra work that PostgreSQL doesn't do. The Oracle design has got other drawbacks: if you need to access a row version other than the very latest, you need to go searching in the rollback segments for it. This is slow (no index help) and creates significant amounts of contention (since lots of processes are competing to touch the rollback segments). But, in all probability, in most cases, the VAST majority, older versions aren't much needed outside the concurrency of active transactions. Plus there's the old bugaboo that long-running transactions require indefinite amounts of rollback space, and Oracle is apparently unable to enlarge that space on-the-fly. (This last seems like a surmountable problem, but maybe there is some non-obvious reason why it's hard.) Yea, Oracle has a million ways to die. And when you think you know all one million, you find one million and one. Basically there's no free lunch: if you want the benefits of MVCC it's going to cost you somewhere. In the Postgres design you pay by having to do VACUUM pretty often for heavily-updated tables. I don't think that decision is fundamentally wrong --- the attractive thing about it is that the overhead is pushed out of the foreground query-processing code paths. Under certain circumstances, it is a very poor design. Think of a single row table that keeps a scoreboard or a session table that keeps a limited number of rows that are updated very frequently. We still have lots of work to do in making autovacuum smarter, avoiding vacuuming parts of relations that have not changed, and so on. But I have no desire to go over to an Oracle-style solution instead.
We can't beat them by trying to be like them, and we run no small risk of falling foul of some of their patents if we do. I proposed having a key row entry for each logical row. The key row entry points to the latest version of the row. Then each row entry is a linked list, in descending order, of previous row versions. The vast majority of the time, the latest version will be the first version. It is only when you have a previously started long running or concurrent transaction that you will ever look at previous versions. I'm not saying it is an easy slam dunk, as I can think of a few difficulties off the top of my head, but it would solve the steady degradation of performance between vacuums and, to a possibly lesser extent, the cost of updating a row in a heavily indexed table.
Re: [HACKERS] vacuum, performance, and MVCC
You mean systems that are designed so exactly, that they can't take a 10% performance change? No, that's not really the point, performance degrades over time, in one minute it degraded 10%. The update to session ratio has a HUGE impact on PostgreSQL. If you have a thousand active sessions, it may take a minute to degrade 10%, assuming some level of activity versus operations per session per action. So don't do an update. Multiple updates to the same row block anyway, which is generally not something you want. The example is a very active web site, the flow is this: query for session information, process HTTP request, update session information. This happens for EVERY http request. Chances are that you won't have concurrent requests for the same row, but you may have well over 100 HTTP server processes/threads answering queries in your web server farm. If you INSERT into multiple partitions (by time -- say one per minute) and TRUNCATE periodically (30 minute old partitions for 30 minute expiry) it works much better. Expiring the session is quite fast as well since they'll go away with the truncate. Index on sessionid and time and grab the row with the most recent time. I doubt that that approach (1) answers the problem or (2) would be more efficient.
Re: [HACKERS] vacuum, performance, and MVCC
As you can see, in about a minute at high load, this very simple table lost about 10% of its performance, and I've seen worse based on update frequency. Before you say this is an obscure problem, I can tell you it isn't. I have worked with more than a few projects that had to switch away from PostgreSQL because of this behavior. Obviously this is not a problem with small sites, but this is a real problem with an enterprise level web site with millions of visitors and actions a day. Quite frankly it is a classic example of something that does not scale. The more and more updates there are, the higher the load becomes. You can see it on top as the footest program runs. I believe sessions should not be stored in a SQL database. Alas, this is a long debate, and while I fundamentally agree with this position, there is an inconvenient truth that it is often necessary. http://www.mohawksoft.org/?q=node/8 It makes no sense. Updates and Inserts to the database should only be done where there is an interesting thing to record, when the user does an action like posting to a forum, making a purchase, sending a message, etc. Again, preaching to the choir. I believe sessions should be stored in the memory of the application server, as native objects of whatever language the application is written in. This way, sessions incur no serializing overhead and can be quite large and complex, which allows storage of interesting things, like the result of a complicated search query which is to be later paginated, for instance. It really makes sense to use native language objects too, as these have a lot more power and versatility than a database row. Think about rights management, for instance. What you seem not to grasp at this point is a large web-farm, about 10 or more servers running PHP, Java, ASP, or even perl. The database is usually the most convenient and, aside from the particular issue we are talking about, best suited.
When the framework used lacks this power (most do and this is sad), then sessions incur serializing overhead; but they should be serialized to filesystem files, or better, to memory using memcached, for instance. I actually have a good number of years of experience in this topic, and memcached or file system files are NOT the best solutions for a server farm. It makes no sense to pay the performance penalty of a COMMIT (disk seek delay etc) and the database overhead for sessions, which are by nature volatile data changing all the time. Very true. I don't think postgres should be tweaked to allow better handling of this. It would only make a better foot-gun. I think the debate is over, it may be a bad use of a database, but there are few alternatives, and SQL databases have become the de facto methodology for dealing with this type of problem.
Re: [HACKERS] vacuum, performance, and MVCC
What you seem not to grasp at this point is a large web-farm, about 10 or more servers running PHP, Java, ASP, or even perl. The database is usually the most convenient and, aside from the particular issue we are talking about, best suited. The answer is sticky sessions : each user is assigned to one and only one webserver in the cluster and his session is maintained locally, in RAM. No locks, no need to manage distributed sessions... I actually have a good number of years of experience in this topic, and memcached or file system files are NOT the best solutions for a server farm. If sessions are distributed, certainly, but if sessions are sticky to their own server ? And what if a particular server goes down? Or gets too high a percentage of the load?
[HACKERS] vacuum, performance, and MVCC
We all know that PostgreSQL suffers performance problems when rows are updated frequently prior to a vacuum. The most serious example can be seen by using PostgreSQL as a session handler for a busy web site. You may have thousands or millions of active sessions, each being updated per page hit. Each time the record is updated, a new version is created, thus lengthening the correct version search each time the row is accessed, until, of course, the next vacuum comes along and corrects the index to point to the latest version of the record. Is that a fair explanation? If my assertion is fundamentally true, then PostgreSQL will always suffer performance penalties under a heavy modification load. Of course, tables with many inserts are not an issue, it is mainly updates. The problem is that there are classes of problems where updates are the primary operation. I was thinking, just as a hypothetical, what if we reversed the problem, and always referenced the newest version of a row and scanned backwards across the versions to the first that has a lower transaction number? One possible implementation: PostgreSQL could keep an indirection array of index to table ref for use by all the indexes on a table. The various indexes return offsets into the array, not direct table refs. Because the table refs are separate from the index, they can be updated each time a transaction is committed. This way, the newest version of a row is always the first row found. Also, on a heavily updated site, the most used rows would always be at the end of the table, reducing the amount of disk reads or cache memory required to find the correct row version for each query.
Re: [HACKERS] Preventing DELETE and UPDATE without a WHERE clause?
On 6/16/06, Mark Woodward [EMAIL PROTECTED] wrote: Chris Campbell [EMAIL PROTECTED] writes: I heard an interesting feature request today: preventing the execution of a DELETE or UPDATE query that does not have a WHERE clause. These syntaxes are required by the SQL spec. Furthermore, it's easy to imagine far-more-probable cases in which the system wouldn't detect that you'd made a mistake, eg DELETE FROM tab WHERE key < 1 where you meant to type DELETE FROM tab WHERE key < 1000 I suggest counseling your client to learn how to use BEGIN/ROLLBACK. This proposal strikes me as falling squarely within the rule about design a system that even a fool can use, and only a fool will want to use it. Just a theory, couldn't a trigger be set up that would cause the query to tank if it touches too many rows? i haven't tried but maybe a FOR STATEMENT trigger AFTER the event can ask ROW_COUNT using GET DIAGNOSTICS? Well, if you *can't* do it in a trigger, maybe that's a valid modification for Hackers to consider.
Re: [HACKERS] Preventing DELETE and UPDATE without a WHERE clause?
Chris Campbell [EMAIL PROTECTED] writes: I heard an interesting feature request today: preventing the execution of a DELETE or UPDATE query that does not have a WHERE clause. These syntaxes are required by the SQL spec. Furthermore, it's easy to imagine far-more-probable cases in which the system wouldn't detect that you'd made a mistake, eg DELETE FROM tab WHERE key < 1 where you meant to type DELETE FROM tab WHERE key < 1000 I suggest counseling your client to learn how to use BEGIN/ROLLBACK. This proposal strikes me as falling squarely within the rule about design a system that even a fool can use, and only a fool will want to use it. Just a theory, couldn't a trigger be set up that would cause the query to tank if it touches too many rows?
Re: [HACKERS] How to avoid transaction ID wrap
On Wed, Jun 07, 2006 at 07:07:55PM -0400, Mark Woodward wrote: I guess what I am saying is that PostgreSQL isn't smooth, between checkpoints and vacuum, it is near impossible to make a product that performs consistently under high load. Have you tuned the bgwriter and all the vacuum_cost stuff? I've yet to find a case where I couldn't smooth out the IO load so that it wasn't an issue. In several projects that I have been involved with, PostgreSQL had most of the important features to be used, but in one project, checkpoints caused us to time out under load. In this current project I am researching, I know that vacuum may be an issue. The load is brutally constant.
Re: [HACKERS] How to avoid transaction ID wrap
On Wed, Jun 07, 2006 at 11:47:45AM -0400, Tom Lane wrote: Zdenek Kotala [EMAIL PROTECTED] writes: Koichi Suzuki wrote: I've once proposed a patch for 64bit transaction ID, but this causes some overhead to each tuple (XMIN and XMAX). Did you check performance on 32-bit or 64-bit systems and a 64-bit binary version of PGSQL? I think that today is not a problem to have 64-bit architecture and a 64-bit ID should increase scalability of Postgres. The percentage increase in I/O demand is the main reason the patch was rejected, not so much the arithmetic. Before considering 64 bit XIDs, it'd be very helpful to know why Mark can't vacuum frequently enough to handle rollover... The system is under heavy load, and while there are tricks that can be done, vacuum is a process which is extra load on the system when it is running. It is a sliding scale, as always, you may get the system to the point where it can vacuum AND perform as needed, but the database is growing constantly. Eventually you will get to the point where you can't run vacuum *and* keep up with the data stream. I guess what I am saying is that PostgreSQL isn't smooth, between checkpoints and vacuum, it is near impossible to make a product that performs consistently under high load. Now don't flame me, I really do love PostgreSQL, it is just that I bump up against these issues from time to time and it would be nice if there were some way to work around them.
[HACKERS] How to avoid transaction ID wrap
OK, here's my problem, I have a nature study where we have about 10 video cameras taking 15 frames per second. For each frame we make a few transactions on a PostgreSQL database. We want to keep about a year's worth of data at any specific time. We have triggers that fire if something interesting is found on insert. We want this thing to run for a long time. From the numbers, you can see the PostgreSQL database is VERY loaded. Running VACUUM may not always be possible without losing data. The numbers I have amount to 466,560,000 transactions per month, lasting a maximum of about 9 months until XID wrap. I am thinking about a few work arounds, BEGIN/COMMIT to reduce the number of transactions, COPY, etc. so I'm not dead in the water, but I would be interested in any observations you may have.
Re: [HACKERS] AGREGATE FUNCTIONS
Hello, I would like to know where in the source-code of postgres is located the code of the aggregate functions min, max, avg. I wish to develop more statistical aggregate functions, and I prefer to use C than to write them in PL/R. There is a library in contrib called intagg. I wrote it a few years ago, and I have to laugh at the README file because I must have been stoned or something, because I can't understand it. Anyways, if you want to make an aggregate function, it covers what you need. Feel free to ignore the array stuff, because you probably won't need to deal with it. Aggregates basically have two functions: a state transition function, called on every iteration of the query (or GROUP BY) to update the state, and a final function, called at the end to produce the result. The PostgreSQL docs are pretty good as well.
Re: [HACKERS] How to avoid transaction ID wrap
Mark Woodward wrote: OK, here's my problem, I have a nature study where we have about 10 video cameras taking 15 frames per second. For each frame we make a few transactions on a PostgreSQL database. Maybe if you grouped multiple operations into bigger transactions, the I/O savings could be enough to buy you the ability to vacuum once in a while. Or consider buffering somehow -- save the data elsewhere, and have some sort of daemon put it into the database. This would let you cope with the I/O increase during vacuum. The problem is sufficiently large that any minor modification can easily hide the problem for a predictable amount of time. My hope was that someone would have a real long-term workaround.
Re: [HACKERS] [PERFORM] psql -A (unaligned format) eats too much
Jim C. Nasby [EMAIL PROTECTED] writes: On Mon, Jun 05, 2006 at 11:27:30AM -0400, Tom Lane wrote: I'm reading this as just another uninformed complaint about libpq's habit of buffering the whole query result. It's possible that there's a memory leak in the -A path specifically, but nothing said so far provided any evidence for that. Certainly seems like it. It seems like it would be good to allow for libpq not to buffer, since there's cases where it's not needed... See past discussions. The problem is that libpq's API says that when it hands you back the completed query result, the command is complete and guaranteed not to fail later. A streaming interface could not make that guarantee, so it's not a transparent substitution. I wouldn't have any strong objection to providing a separate API that operates in a streaming fashion, but defining it is something no one's bothered to do yet. In practice, if you have to code to a variant API, it's not that much more trouble to use a cursor... Wouldn't the COPY (select ...) TO STDOUT format being discussed solve this for free?
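The buffering argument can be sketched entirely outside libpq (toy Python below; nothing here is the libpq API): a buffered fetch materializes the whole result before handing anything back, so by the time the caller sees a row the result is guaranteed complete, while a streaming fetch can fail after rows have already been consumed.

```python
def rows_from_server(n, fail_at=None):
    """Simulates a server sending rows; an error can arrive mid-result."""
    for i in range(n):
        if fail_at is not None and i == fail_at:
            raise RuntimeError("connection lost mid-result")
        yield i

# Buffered (libpq-style): the whole result is collected first, so any
# mid-result failure surfaces before the caller sees a single row.
def fetch_buffered(n, fail_at=None):
    return list(rows_from_server(n, fail_at))

# Streaming (cursor-style): rows are handed over as they arrive, so the
# caller must cope with an error after already consuming some rows.
def fetch_streaming(n, fail_at=None):
    yield from rows_from_server(n, fail_at)
```

This is exactly why a streaming mode is "not a transparent substitution": callers written against the buffered guarantee have no error path for a failure halfway through iteration.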
Re: [HACKERS] [PERFORM] psql -A (unaligned format) eats too much
Mark Woodward wrote: Jim C. Nasby [EMAIL PROTECTED] writes: On Mon, Jun 05, 2006 at 11:27:30AM -0400, Tom Lane wrote: I'm reading this as just another uninformed complaint about libpq's habit of buffering the whole query result. It's possible that there's a memory leak in the -A path specifically, but nothing said so far provided any evidence for that. Certainly seems like it. It seems like it would be good to allow for libpq not to buffer, since there's cases where it's not needed... See past discussions. The problem is that libpq's API says that when it hands you back the completed query result, the command is complete and guaranteed not to fail later. A streaming interface could not make that guarantee, so it's not a transparent substitution. I wouldn't have any strong objection to providing a separate API that operates in a streaming fashion, but defining it is something no one's bothered to do yet. In practice, if you have to code to a variant API, it's not that much more trouble to use a cursor... Wouldn't the COPY (select ...) TO STDOUT format being discussed solve this for free? It won't solve it in the general case for clients that expect a result set. ISTM that "use a cursor" is a perfectly reasonable answer, though. I'm not sure I agree -- surprise! psql is often used as a command-line tool, and using a cursor is not acceptable. Granted, with unaligned output, perhaps psql should not buffer the WHOLE result at once, but without rewriting that behavior, a COPY from a query may be close enough.
[HACKERS] COPY (query) TO file
Tom had posted a question about file compression with COPY. I thought about it, and I want to throw this out and see if anyone thinks it is a good idea. Currently, the COPY command only copies a table; what if it could operate with a query, as: COPY (select * from mytable where foo='bar') as BAR TO stdout I have no idea if it is doable, but I can see uses for replication: psql -h source mydb -c "COPY (select * from mytable where ID > x) as mytable TO STDOUT" | psql -h target mydb -c "COPY mytable FROM stdin"
Re: [HACKERS] COPY (query) TO file
Mark Woodward wrote: Tom had posted a question about file compression with COPY. I thought about it, and I want to throw this out and see if anyone thinks it is a good idea. Currently, the COPY command only copies a table; what if it could operate with a query, as: COPY (select * from mytable where foo='bar') as BAR TO stdout I have no idea if it is doable, but I can see uses for replication I doubt it would be really useful (apart from maybe saving some work coding a client app), but did you actually test it with create table as select ...; followed by a copy of that table, to see if it really is faster than the usual select-and-fetch? Why create table? The idea is that you would have one or more redundant databases and use COPY TO/FROM to keep them up to date.
Re: [HACKERS] COPY (query) TO file
Mark Woodward wrote: Mark Woodward wrote: Tom had posted a question about file compression with COPY. I thought about it, and I want to throw this out and see if anyone thinks it is a good idea. Currently, the COPY command only copies a table; what if it could operate with a query, as: COPY (select * from mytable where foo='bar') as BAR TO stdout I have no idea if it is doable, but I can see uses for replication I doubt it would be really useful (apart from maybe saving some work coding a client app), but did you actually test it with create table as select ...; followed by a copy of that table, to see if it really is faster than the usual select-and-fetch? Why create table? Just to simulate and time the proposal. SELECT ... already works over the network, and if COPY from a select (which would basically work like yet another wire protocol) isn't significantly faster, why bother? Because the format of COPY is a common transmitter/receiver for PostgreSQL, like this: pg_dump -t mytable | psql -h target -c "COPY mytable FROM STDIN" With a more selective copy, you can use pretty much this mechanism to limit a copy to a subset of the records in a table. The idea is that you would have one or more redundant databases and use COPY TO/FROM to keep them up to date. Well, if you have databases you would have regular tables - and can use copy as it is now :-) But COPY copies all the records, not some of the records. Regards Tino
Re: [HACKERS] COPY (query) TO file
Mark Woodward wrote: ... create table as select ...; followed by a copy of that table, to see if it really is faster than the usual select-and-fetch? Why create table? Just to simulate and time the proposal. SELECT ... already works over the network, and if COPY from a select (which would basically work like yet another wire protocol) isn't significantly faster, why bother? Because the format of COPY is a common transmitter/receiver for PostgreSQL, like this: pg_dump -t mytable | psql -h target -c "COPY mytable FROM STDIN" With a more selective copy, you can use pretty much this mechanism to limit a copy to a subset of the records in a table. Ok, but why not just implement this in pg_dump or psql? Why bother the backend with that functionality? Because COPY runs on the backend, not the front end, and the front end may not even be in the same city as the backend. When you issue a COPY, the file it reads or writes is local to the backend. True, the examples I gave may not show how that is important, but consider this: psql -h remote masterdb -c "COPY (select * from mytable where ID > xxlastxx) as mytable TO '/replicate_backup/mytable-060602.pgc'" This runs completely in the background and can serve as a running backup.
Re: [HACKERS] COPY (query) TO file
Mark Woodward wrote: ... pg_dump -t mytable | psql -h target -c "COPY mytable FROM STDIN" With a more selective copy, you can use pretty much this mechanism to limit a copy to a subset of the records in a table. Ok, but why not just implement this in pg_dump or psql? Why bother the backend with that functionality? Because COPY runs on the backend, not the front end, and the front end may not even be in the same city as the backend. When you issue a COPY, the file it reads or writes is local to the backend. True, the examples I gave may not show how that is important, but consider this: We were talking about COPY to stdout :-) Copy to file is another issue :-) Copy to a (server fs) file has so many limitations I don't see wide use for it. (Of course there are use cases.) "Wide use" is not always the same as useful. Sometimes "useful" means something not easily doable in other ways, or something that completes a feature set. psql -h remote masterdb -c "COPY (select * from mytable where ID > xxlastxx) as mytable TO '/replicate_backup/mytable-060602.pgc'" This runs completely in the background and can serve as a running backup. And are you sure it would be much faster than a server-local psql just dumping the result of a query? No, I can't be sure of that at all, but the COPY command has a specific use that is understood, and an operation that is separate from the normal query mechanism. (And you could more easily avoid race conditions, in contrast to several remote clients trying to trigger your above backup.) Again, the examples may not have been precise in presenting "why"; the focus was mostly "what" so it could be discussed. As a generic feature it has many potential uses. Trying to debate and defend a specific use limits the potential scope of the feature. Why have COPY anyway? Why not just use SELECT * FROM TABLE?
Re: [HACKERS] COPY (query) TO file
Allow COPY to output from views Another idea would be to allow actual SELECT statements in a COPY. Personally I strongly favor the second option as being more flexible than the first. I second that - allowing arbitrary SELECT statements as a COPY source seems much more powerful and flexible than just supporting COPY FROM VIEW. Not to be a sour apple or anything, but I don't see how any of this is needed in the backend, since we can easily use psql to do it, or pretty much any other tool. There is an important difference between a capability in the backend vs. one synthesized in the frontend.
Re: [HACKERS] Possible TODO item: copy to/from pipe
After re-reading what I just wrote to Andreas about how compression of COPY data would be better done outside the backend than inside, it struck me that we are missing a feature that's fairly common in Unix programs. Perhaps COPY ought to have the ability to pipe its output to a shell command, or read input from a shell command. Maybe something like COPY mytable TO '| gzip > /home/tgl/mytable.dump.gz'; (I'm not wedded to the above syntax, it's just an off-the-cuff thought.) Of course psql would need the same capability, since the server-side copy would still be restricted to superusers. You can accomplish COPY piping now through psql, but it's a bit awkward: psql -c "COPY mytable TO stdout" mydb | gzip ... Thoughts? Is this worth doing, or is the psql -c approach good enough? To be honest, I don't see much benefit in it. You can already accomplish what you want easily enough. If you want to muck with COPY, I'd like to see it accept a query, as: psql -c "COPY (select * from mytable where foo='bar') TO stdout" mydb | gzip ...
Re: [HACKERS] LIKE, leading percent, bind parameters and indexes
On Thu, May 25, 2006 at 08:41:17PM -0300, Rodrigo Hjort wrote: I think more exactly, the planner can't possibly know how to plan an indexscan with a leading '%', because it has nowhere to start. The fact is that an index scan is performed for a LIKE expression on a pattern not preceded by '%', except when a bound parameter is used. select * from table where field like 'THE NAME%'; -- index scan select * from table where field like '%THE NAME%'; -- seq scan select * from table where field like :bind_param; -- seq scan (always) Since I'm somewhat doubtful of coming up with a generic means for dealing with plan changes based on different bound parameter values any time soon... How difficult would it be to make LIKE check the value of the bound parameter for a starting % and use that information to decide on a query plan? IMHO this is worth making into a special case in the planner, because it's very easy to detect and makes a tremendous difference in the query plan/performance. My solution is a function in one of my libraries called strrev(), which returns the reverse of a string. I make a functional index on strrev(field). Then, just search where strrev(field) like strrev('%the name');
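The strrev() trick works because reversing both sides turns a leading-percent (suffix) match into a trailing-percent (prefix) match, which an ordinary btree index on strrev(field) can serve. A toy Python model of the predicate (strrev here stands in for the poster's custom C function, not a stock Postgres one of that era):

```python
def strrev(s):
    """Reverse a string, as the poster's C strrev() does."""
    return s[::-1]

def matches_leading_percent(pattern, field):
    """field LIKE '%suffix'  ⇔  strrev(field) LIKE strrev(suffix) || '%'.

    The right-hand form is an index-friendly prefix test: e.g. the
    pattern '%the name' reverses to the prefix 'eman eht'.
    """
    assert pattern.startswith('%')
    suffix = pattern[1:]
    return strrev(field).startswith(strrev(suffix))

print(matches_leading_percent('%THE NAME', 'SOMETHING BY THE NAME'))  # True
```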
Re: [HACKERS] Performance Issues
Dhanaraj M wrote: I have the following doubts. 1. Does Postgres create an index on every primary key? Usually, queries are performed against a table on the primary key, so an index on it will be very useful. Yes, a unique index is used to enforce the primary key. Well, here is an interesting question that I have suddenly become very curious about: if you have a primary key, and therefore a unique index, will it, in fact, use this index regardless of whether the table has been analyzed? 2. If I'm executing a complex query and it takes 10 seconds to return the results -- it takes 10 seconds to execute the next time also. I'm wondering if there's any kind of caching that can be enabled -- so that the next time it doesn't take 10 seconds to return the results. Not of query results. Obviously the data itself might be cached. You might want to look at memcached for this sort of thing. I am looking at this string of posts and it occurs to me that he should run ANALYZE. Maybe I'm jumping in at the wrong point.
Re: [HACKERS] String Similarity
Try contrib/pg_trgm... Trigrams are interesting, and I'll try to reconsider whether they fit or not, but I suspect they do not. (You are the second to recommend it.) Anything based on a word parser is probably not appropriate; the example I first gave is a little misleading in that it is not the whole of the problem. Consider this: pinkfloyd darkside of the moon - money Again, we humans see that this string is almost identical to the others. I have a working system right now: it strips all non-alnum() characters out of the strings, then searches the strings for runs, and compiles a list of runs from longest to shortest. The algorithm is strlen1*strlen2*N where N is the number of runs detected. As you can see, it is merely brute force. Secondly, there is a subtle difference between comparing against a known string for which you are searching and comparing two arbitrary strings. The known string is assumed to be in some sort of regular form, and the string to be compared must break down into that form based on your algorithm. When trying to understand the similarity of two arbitrary strings, you don't always know what the similarities are or what the parsing rules should be. ThisIsSomethinThatHumansCanReadButSpaceBasedParsersCanNot. tHISiSaLSOsOMETHING Yo uca nalmos trea dthi s can you see the similarity in these two strings? CanYouSeeTheSimilarityInTheseTwoStrings? Ideally I would metaphone the individual words, strip out all the white space, find the run lengths that compare in the two strings, and calculate the similarity based on the number and size of the runs. I do not currently metaphone (it isn't clear to me that end cases can be handled correctly), and I'm not sure how best to calculate similarity. I've been trying best run, total match, and a host of others, but haven't found one I really like. Chris Mark Woodward wrote: I have a side project that needs to intelligently know if two strings are contextually similar. Think about how CDDB information is collected and sorted.
It isn't perfect, but there should be enough information to be usable. Think about this: pink floyd - dark side of the moon - money dark side of the moon - pink floyd - money money - dark side of the moon - pink floyd etc. To a human, these strings are almost identical. Similarly: dark floyd of money moon pink side the Is a puzzle to be solved by 13 year old children before the movie starts. My post has three questions: (1) Does anyone know of an efficient and numerically quantified method of detecting these sorts of things? I currently have a fairly inefficient and numerically bogus solution that may be the only non-impossible solution for the problem. (2) Does anyone see a need for this feature in PostgreSQL? If so, what kind of interface would be best accepted as a patch? I am currently returning a match likelihood between 0 and 100. (3) Is there also a desire for a Levenshtein distance function for text and varchars? I experimented with it, and was forced to write the function in item #1. -- Christopher Kings-Lynne Technical Manager CalorieKing Tel: +618.9389.8777 Fax: +618.9389.8444 [EMAIL PROTECTED] www.calorieking.com
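For reference, the Levenshtein distance asked about in point (3) is a short dynamic program (this is the textbook two-row algorithm, not any particular Postgres implementation; contrib/fuzzystrmatch later grew a levenshtein() of its own):

```python
def levenshtein(a, b):
    """Edit distance: insertions, deletions, and substitutions cost 1 each."""
    prev = list(range(len(b) + 1))          # distances from "" to b[:j]
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

print(levenshtein('kitten', 'sitting'))  # 3
```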
Re: [pgsql-advocacy] [HACKERS] [OT] MySQL is bad, but THIS bad?
On Sat, May 20, 2006 at 02:29:01PM +0200, Dawid Kuroczko wrote: On 5/20/06, Lukas Smith [EMAIL PROTECTED] wrote: The improvements to the installer are great, but there simply needs to be a packaged solution that adds more of the things people are very likely to use. From my understanding Bizgres goes in that direction? I just think that whatever highly packaged solution PostgreSQL picks, this should be the download that is pushed at conferences, in articles and books. People with a clue will still know where they can get the clean base. Hmm, a Comprehensive PostgreSQL Archive Network? ;) I mean, something like CPAN, CTAN or CRAN? :) I mean, the -contrib is great, but pushing other things there is a bit tricky (to say the least) from the maintenance point of view. (Every bugfix means a new release of -contrib, etc, etc...). Then again, pgFoundry is great for keeping development centered, but finding and building a new package is not really a one-liner, and if you're unlucky you might get alpha-quality code installed. :) I don't see any reason why CPgAN would need to change pgFoundry at all. In fact, my thought was that any such system should use pgFoundry as its backend/repository. I think a CPgAN-like solution would be the best. A uniform method of getting approved Pg extensions. It would simplify installing the extensions, and would encourage distributions to package such extensions. Somebody suggested apt-get install postgresql-contrib. Imagine: apt-get install postgresql-datatype-fqdn apt-get install postgresql-gist-ltree ...and so on. Except that apt doesn't work on all platforms. Though it would certainly make sense to look at lifting the framework for CPgAN from somewhere, rather than coding it ourselves. A CPgAN would be a great idea in theory, but I have reservations. As a software developer, I'm fine with pgFoundry, but as a DB admin, and one who deploys data centers from time to time, I'd like to see something closer to contrib.
If I could have any influence at all, I'd like to see contrib essentially go away in the main distribution, replaced by (or renamed to) "extensions". Then, some advisory group blesses extensions, and those extensions get packaged into a PostgreSQL extensions pack. I, again as a DB admin, would have NO problem with PostgreSQL playing favorites and picking best of breed for these extensions.
Re: [HACKERS] String Similarity
What I was hoping someone had was a function that could find the substring runs in something less than a strlen1*strlen2 number of operations, and a numerically sane way of representing the similarity or difference. Actually, it is more like strlen1*strlen2*N, where N is the number of valid runs. Unless someone has a GREAT algorithm, I think it will always be at least strlen1*strlen2. The amount of processing for N is the question. Is N*(strlen1*strlen2) less than sorting an array of N elements, then scanning through those elements and eliminating duplicate character matches? Depending on the max value of N, I could save all the runs, sort by max length, then exclude based on overlap, but it isn't clear that this is a performance win unless the strings are long; even then, I'm not completely convinced, as N still has some strlen ramifications for removing duplicates.
Re: [HACKERS] [OT] MySQL is bad, but THIS bad?
My question is whether psql using libreadline.so has to be GPL, meaning the psql source has to be included in a binary distribution. If I understand what I have been told by lawyers, here's what using a GPL (and NOT LGPL) library means: According to RMS, the definition of a derivative work is one which shares the same address space when running. In-memory process separation also separates works. One may argue with this definition, but it is in supporting documents to the GPL and likely to be considered the intention of the GPL in a court of law. There is no requirement to ship source with a binary. One must make the source available. This can be done by a web site or an alternate CD distribution. It need not be free, as in beer, but must be free of any restrictions beyond those of the GPL. There is no requirement that one would need to make the source of the 3rd party GPL library available, as it is available from the original source whence it was obtained in the first place. Any changes, however, made by you to that library must be made available. (If you do not make modifications to libreadline, you don't even need to worry about it.)
Re: [HACKERS] [OT] MySQL is bad, but THIS bad?
On Fri, May 19, 2006 at 07:04:47PM -0400, Bruce Momjian wrote: libreadline is not a problem because you can distribute postgresql compiled with readline and comply with all licences involved simultaneously. It doesn't work with openssl because the licence requires things that are incompatible with the GPL. My question is whether psql using libreadline.so has to be GPL, meaning the psql source has to be included in a binary distribution. IANAL, but yes. Or any other of the methods allowed, like providing a written voucher valid for at least three years. People who feel they need to keep the source to psql secret should link against libeditline instead. The way I understand it, the GPL affects programs in two main ways: 1. A program which is GPL'd must, when distributed, be able to provide all source used to build it under terms compatible with the GPL. This is not technically true. If you incorporate GPL code that is publicly available and unchanged, you needn't provide the 3rd party packages. 2. A program which includes a GPL'd header file while building must, when distributed, provide its own source and the library under GPL-compatible terms, but not necessarily the source of anything else needed to build it. This is why it's OK that psql links against openssl and readline. This is a somewhat disputable position, and RMS himself isn't clear. If the header files are simply definitions and declarations, then no GPL material is actually included in a binary. However, inline functions and macros may constitute code. These are obviously only relevant when distributing precompiled binaries. If you are only distributing source, none of the above applies to you. Of course. There's a third method that some people claim, but I don't buy. This is where a program using an interface of a GPL'd library somehow becomes a derived work of said library. That's just way whacked out.
There is no supporting argument for that; however, RMS's supporting writings indicate that he defines "derived" as being in the same process space. You may of course disagree with any of the above, and hey, if you have a lawyer to back you up, who am I to argue? I have talked to too many lawyers, sigh, about this stuff. As for why you don't solve the problem by distributing a libpq not compiled against OpenSSL, well, that's a different question. Back when SSL was considered an arms export by the US, having both SSL and non-SSL versions was common (and a big PITA). When that disappeared, the main reason for the split went away and people started compiling SSL in by default. This solved the problem for 99% of programs. However, one tiny subset remains problematic: - A library implements SSL, but only using OpenSSL - The library doesn't use the GPL, or doesn't have an OpenSSL exception clause. - A GPL'd program uses this library, without an OpenSSL exception clause. In this subset of a subset of a subset of programs, it's a problem. Many libraries that implement SSL provide an alternative to OpenSSL, and many programs using such libraries have exception clauses, so there's just a handful of programs and libraries that are problematic. As long as there's a possibility that the situation can change (either every GPL program using postgresql gains an exception clause, or postgresql might someday support some other library) it will probably stay this way. If the postgresql core decides that OpenSSL will be the only SSL ever supported, no matter what, well, the split distribution may yet happen. In the meantime, we have the status quo. Have a nice day, -- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/ From each according to his ability. To each according to his ability to litigate.
Re: [HACKERS] [OT] MySQL is bad, but THIS bad?
Andrew Dunstan [EMAIL PROTECTED] writes: Mark Woodward wrote: Again, there is so much code for MySQL that a MySQL emulation layer, MEL for short, could allow plug-and-play compatibility for open source, and closed source, applications that would otherwise force a PostgreSQL user to hold his or her nose and use MySQL. If we had infinite resources this might make sense. We don't, so it doesn't. There is a real cost to producing a compatibility layer, and the cost will be those spiffy new features. The real problem is that there's a whole lot of stuff, such as mysql's weak error checking, that I don't think a compatibility layer could sanely provide. I kind of agree with this statement, but while I was playing devil's advocate and just grousing a bit about having to use MySQL, there is a sort of reality of "openomics" where mind-share is everything. The more mind-share you have, the more opportunities you have and the more resources become available. Not always, of course -- look at OpenSSH -- but for the most part. As MySQL adds features, no matter how poorly implemented, and maintains a migration path, we will never reach their users. PostgreSQL is better, true, but it is not ideal in many ways. It can best be said that the difference between PostgreSQL and MySQL is similar to the difference between Linux/BSD and Windows.
Re: [HACKERS] [OT] MySQL is bad, but THIS bad?
Actually, I think it's a lot more accurate to compare PostgreSQL and MySQL as FreeBSD vs. Linux from about 5 years ago. Back then FreeBSD was clearly superior from a technology standpoint, and clearly playing second fiddle when it came to users. And now, Linux is actually technically superior in most ways thanks to all the mindshare that's been poured into it. And with that, I am going to sit in a lawn chair and watch the bonfire. Even I know that is NOT a discussion we want to start.
Re: [HACKERS] String Similarity
Mark Woodward wrote: I have a side project that needs to intelligently know if two strings are contextually similar. Think about how CDDB information is collected and sorted. It isn't perfect, but there should be enough information to be usable. Think about this: pink floyd - dark side of the moon - money dark side of the moon - pink floyd - money money - dark side of the moon - pink floyd etc. To a human, these strings are almost identical. Similarly: dark floyd of money moon pink side the Is a puzzle to be solved by 13 year old children before the movie starts. [snip] Hmmm... I think I like this problem. Maybe I'll work on it a bit as a contrib module. I *have* a working function, but it is not very efficient and it is not what I would call numerically predictable. But it does find the various sub-strings between the two strings in question. Email me offline and we can make something for contrib.
Re: [HACKERS] String Similarity
I have a side project that needs to intelligently know if two strings are contextually similar. The examples you gave seem heavy on word order and whitespace consideration, before applying any algorithms. Here's a quick perl version that does the job: [SNIP] This is a case where the example was too simple to explain the problem, sorry. I have an implementation of Oracle's contains function for PostgreSQL, and it does basically what you are doing, and, in fact, also has Mohawk Software extensions (LOL) that provide metaphone. The problem is that parsing on white space really isn't reliable. Sometimes it is pinkfloyd-darksideofthemoon. Also, I have been thinking of other applications. I have a piece of code that does this: apps$ ./stratest pink foyd dark side of the moon money money dark side of the moon pink floyd Match: dark side of the moon Match: pink f Match: money Match: oyd apps$ ./stratest pinkfoyddarksideofthemoonmoney moneydarksideofthemoonpinkfloyd Match: darksideofthemoon Match: pinkf Match: money Match: oyd I need to come up with a numerically sane way of taking this information and understanding overall similarity.
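The brute-force run extraction discussed in this thread — find the longest common substring, mask it out of both strings, repeat, for roughly strlen1*strlen2*N work — can be sketched as follows (a toy reimplementation, not the poster's actual stratest code):

```python
def common_runs(a, b, min_len=3):
    """Repeatedly find the longest common substring of a and b, record it,
    mask the matched characters out of both strings, and repeat until no
    run of at least min_len characters remains."""
    a, b = list(a), list(b)
    runs = []
    while True:
        best_len = best_i = best_j = 0
        for i in range(len(a)):          # brute-force O(len(a)*len(b)) scan
            for j in range(len(b)):
                k = 0
                while (i + k < len(a) and j + k < len(b)
                       and a[i + k] == b[j + k] and a[i + k] != '\0'):
                    k += 1
                if k > best_len:
                    best_len, best_i, best_j = k, i, j
        if best_len < min_len:
            break
        runs.append(''.join(a[best_i:best_i + best_len]))
        for k in range(best_len):        # mask with distinct sentinels so
            a[best_i + k] = '\0'         # masked positions never re-match
            b[best_j + k] = '\1'
    return runs

runs = common_runs('pinkfoyddarksideofthemoonmoney',
                   'moneydarksideofthemoonpinkfloyd')
# → ['darksideofthemoon', 'pinkf', 'money', 'oyd'], the same runs the
#   second stratest example reports
```

Turning the resulting run lengths into a single similarity score between 0 and 100 is exactly the open question at the end of the post; total matched length over average string length is one obvious candidate, but by no means the only one.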