Re: [HACKERS] Thoughts on SELECT * EXCLUDING (...) FROM ...?
On 10/31/2011 06:54 AM, Marcin Mańk wrote:
> On Sun, Oct 30, 2011 at 8:50 PM, Eric Ridge <eeb...@gmail.com> wrote:
>> Well, it's a display thing as much as any SELECT statement (especially via psql) is a display thing. It's more like "I want all 127 columns, except the giant ::xml column, and I'm too lazy to type each column name out by hand."
> How about an option for psql to truncate too-long columns to X characters? I would use this option frequently... :-)

It seems closer to the point of what is actually being accomplished, and doesn't even require a server change? :-)

-- Mark Mielke <m...@mielke.cc>

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
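(Not an existing psql option, but the client-side truncation Marcin describes is only a few lines of code. Everything below - the function name, the default width, the ellipsis marker - is illustrative, not part of any real tool:)

```python
def truncate_row(row, max_width=20):
    """Truncate each column value to max_width characters for display,
    appending '...' when content was cut - hypothetical client-side
    behavior, applied to every row before printing."""
    out = []
    for value in row:
        text = str(value)
        if len(text) > max_width:
            text = text[:max_width - 3] + "..."
        out.append(text)
    return out

# A row with a giant column, cut down to display width:
row = ("alice", "a" * 100, 42)
print(truncate_row(row, max_width=10))   # ['alice', 'aaaaaaa...', '42']
```

The point being that this is purely a client-side display decision, which is why it needs no server change.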
Re: [HACKERS] Thoughts on SELECT * EXCLUDING (...) FROM ...?
Stupid question: Is this just a display thing? Or does this have impact for things such as COUNT(*) vs COUNT(1)? Is it like a view, but on the fly?

I've found myself in the *occasional* (certainly not daily!) situation where such a feature might be useful, but each time I wonder whether there should be a better way, and I realize that if I ever saw such a thing in production code it would be the first code I shot down. 1) Not standards compliant, 2) Not deterministic (i.e. a database change might cause my code to break), 3) Working around a problem that maybe shouldn't exist in the first place? It's like buying a rug so that nobody sees the scratches on the floor.

I can see the contention. :-) If it existed, I would occasionally use it from the command line. I'm thinking three times a year. Certainly not daily. Heck, if it's more characters to type (than select * ...), the number of times I would bother typing it is quite small. :-)

Cheers,

-- Mark Mielke <m...@mielke.cc>
Re: [HACKERS] Thoughts on SELECT * EXCLUDING (...) FROM ...?
On 10/30/2011 03:50 PM, Eric Ridge wrote:
> On Sun, Oct 30, 2011 at 3:38 PM, Mark Mielke <m...@mark.mielke.cc> wrote:
>> 2) Not deterministic (i.e. a database change might cause my code to break),
> Okay, I'm inventing a use-case here, but say you have a "users" table with various bits of metadata about the user, including password. Maybe, regardless of database changes, you never want the password column returned: SELECT * EXCLUDING (password) FROM tbl_users; Changes of omission can break your code just as easily.

I think I wasn't as clear as I intended. In many ways, I think use of * in the first place is wrong for code (despite that I do it as well). Therefore, * EXCLUDING (...) would also be wrong. It comes down to: does the code know what it wants? In the above case - maybe you don't want password - what about social insurance number, credit card number, or any other private bit? The only way to truly know you aren't accidentally pulling in fields you don't need, or unnecessarily exposing them on the wire, is to specifically list the fields you DO want, which is precisely to not use * at all.

A particular area where I don't like * is that my code may make an assumption about the exact field names, or the field order that comes out. If these are explicitly specified, then the code will survive ALTER TABLE, or a restore of the table with columns in a different order, or a replacement of the table with a view. However, if I use *, then my code is likely to fail in any of these cases, and possibly fail in some unexpected way. For example, in a language such as Perl with DBI returning a hash, I may accidentally assume that a field is always undef. It might even pass some designer testing if the value is usually NULL = undef, and I fail to simulate the case where it is not. "select *" is not deterministic from a programming perspective.
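(To make the leak concrete - a minimal sketch using Python's sqlite3 in place of PostgreSQL/DBI, with a made-up tbl_users schema:)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_users (id INTEGER, name TEXT, password TEXT)")
conn.execute("INSERT INTO tbl_users VALUES (1, 'alice', 's3cret')")

# SELECT * pulls every column, including ones the code never meant to expose:
star = conn.execute("SELECT * FROM tbl_users").fetchone()
print(star)        # (1, 'alice', 's3cret') -- password goes over the wire

# Listing the fields you DO want survives schema changes and leaks nothing new:
explicit = conn.execute("SELECT id, name FROM tbl_users").fetchone()
print(explicit)    # (1, 'alice')
```

If a column is later added to tbl_users, the first query silently starts returning it; the second keeps its exact shape.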
-- Mark Mielke <m...@mielke.cc>
Re: [HACKERS] [OT?] Time-zone database down [was: Re: timezone buglet?]
On 10/07/2011 11:02 PM, Greg Stark wrote:
> All that said I think this is far murkier than you all seem to think. Copyright law is one of the most complex areas of the law and this is one of the least well defined parts of copyright law.

Hi Greg:

I don't think we all think this issue is clear. Quoting relevant case law and considering what position to hold or what action to take is what I would call due diligence. If somebody wants to hire a lawyer, that might be advisable as well. I think "wait and see whether this is a true violation" is a perfectly valid legal position to hold, and it does not pretend in any way that this issue is clear...

-- Mark Mielke <m...@mielke.cc>
Re: [HACKERS] [OT?] Time-zone database down [was: Re: timezone buglet?]
My original read of the problem determined (for me personally) that the only way one could be in violation of copyright was if the data was incorrect (i.e. not factual). It presented an interesting contradiction: the only way they could sue is by agreeing that their data is faulty and should not be trusted. :-) The case Merlin refers to below seemed to rule that even faulty information is not a concern.

Personally, I think the best choice is to officially state a position on the matter and agree to remove any copyrighted material that has been used without the permission of the copyright owner from PostgreSQL if or when this is ever demonstrated in court. Until that time, the damage to the community from responding to this unproven legal threat would be unreasonable to bear.

On 10/07/2011 05:10 PM, Merlin Moncure wrote:
> The one interesting case that I can recall where this was tested was this (lifted from Wikipedia): In October 1984, Fred L. Worth, author of The Trivia Encyclopedia, Super Trivia, and Super Trivia II, filed a $300 million lawsuit against the distributors of Trivial Pursuit. He claimed that more than a quarter of the questions in the game's Genus Edition had been taken from his books, even to the point of reproducing typographical errors and deliberately placed misinformation. One of the questions in Trivial Pursuit was "What was Columbo's first name?" with the answer "Philip". That information had been fabricated to catch anyone who might try to violate his copyright.[5] The inventors of Trivial Pursuit acknowledged that Worth's books were among their sources, but argued that this was not improper and that facts are not protected by copyright. The district court judge agreed, ruling in favor of the Trivial Pursuit inventors.
> The decision was appealed, and in September 1987 the United States Court of Appeals for the Ninth Circuit upheld the ruling.[6] Worth asked the Supreme Court of the United States to review the case, but the Court declined, denying certiorari in March 1988.[7] IANAL, but this seems pretty conclusive to me...

Facts are not subject to copyright, but compilations can be. However, the arrangement and presentation of the compilation has to be sufficiently original to merit protection. For example, the SCOTUS denied copyright protection to phone books, which I think is entirely relevant to this issue (but IANAL).

-- Mark Mielke <m...@mielke.cc>
Re: [HACKERS] FDW API: don't like the EXPLAIN mechanism
On 02/21/2011 11:38 AM, Andrew Dunstan wrote:
> On 02/21/2011 11:23 AM, Tom Lane wrote:
>> Andrew Dunstan <and...@dunslane.net> writes:
>>> If we allow the invention of new explain states we'll never be able to publish an authoritative schema definition of the data. That's not necessarily an argument against doing it, just something to be aware of. Maybe we don't care about having EXPLAIN XML output validated.
>> I thought one of the principal arguments for outputting XML/etc formats was exactly that we'd be able to add fields without breaking readers. If that's not the case, why did we bother?
> Well, I thought the motivation was to allow easy construction of parsers for the data, since creating a parser for those formats is pretty trivial. Anyway, if we don't care about validation that's fine. I just didn't want us to make that decision unconsciously.

Parsing XML isn't trivial - not if done correctly... :-) I don't see the benefit of validation beyond test suites, and there the specification can be published with the version of PostgreSQL (as XSD?) if so necessary. Primary benefits include: 1) Open and widely recognized format. 2) Well tested and readily available parsers already exist. 3) Able to easily add content without breaking existing parsers or analyzers, provided the parsers and analyzers are written properly.

Any XML parser that does: m[<tag>(.*?)</tag>] ... is not written properly.

-- Mark Mielke <m...@mielke.cc>
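(A quick demonstration of why the regex approach is not "written properly" - the element names and the toy EXPLAIN-like document below are invented for illustration:)

```python
import re
import xml.etree.ElementTree as ET

doc = '<plan><node type="seqscan"><node type="sort"/></node></plan>'

# The naive regex breaks as soon as a tag carries attributes,
# self-closes, or nests:
naive = re.findall(r"<node>(.*?)</node>", doc)
print(naive)   # [] -- finds nothing, because <node type="..."> != <node>

# A real parser handles attributes, nesting, and unknown extra elements,
# which is exactly what lets new EXPLAIN fields be added without
# breaking existing readers:
tree = ET.fromstring(doc)
types = [n.get("type") for n in tree.iter("node")]
print(types)   # ['seqscan', 'sort']
```

Adding a brand-new child element to the document would leave the second reader working unchanged, with no schema validation involved.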
Re: [HACKERS] Avoiding bad prepared-statement plans.
On 02/27/2010 11:20 PM, Craig Ringer wrote:
> Essentially, you have: 1) People preparing statements to save on parse+plan time; and 2) People preparing statements to get convenient param placement. I suspect that most of (1) also want (2), but many of (2) don't care much about (1) and are just preparing statements for sql-injection safety (param placement), because they've been told to by someone, because their library does it for them, etc. So: Would it be easier to handle control of replan vs no-replan at PREPARE time? Or would that have very much the same protocol/pl change issues?

I think if SQL hints were sufficient, then clients would only need to remove the prepared statement and re-create it whenever required. It should do the right thing automatically. I'm convinced that means generic plans are always wrong, and that the right approach is some combination of performing fixed operations in PREPARE and variable operations in EXECUTE, combined with caching plans against the prepared statement, with criteria to determine whether or not the parameters match the assumptions made when creating one of the cached plans.

Tom says extracting the fixed part of the planning out to PREPARE would be difficult, or less valuable than I think. And the multi-plan caching with criteria seems to have been brought up and not commented on much by several people. So it doesn't look like I will get this unless I learn how to implement it myself - which is probably not feasible at this time. :-)

Not getting this, I think I'd be happy if PREPARE/EXECUTE could *easily* detect the worst cases (i.e. not be slower in the general case), and if generic plan plus custom plan plus custom execution were still significantly faster than generic plan plus generic execution. Adding SQL to indicate whether a statement should be re-planned or not is completely unappealing. If I could change the code today, I'd just turn off, or choose not to use, PREPARE/EXECUTE.

Today, PREPARE/EXECUTE seems like it should always be considered slower unless one can prove it is actually faster in a specific case, which is the exact opposite of what people expect.

Cheers,
mark
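(The "generic plan plus custom plan plus custom execution" comparison above is simple arithmetic; here is a sketch of it, with every number invented purely for illustration:)

```python
def custom_wins(generic_plan_ms, custom_plan_ms, generic_exec_ms, custom_exec_ms):
    """Compare the two pipelines discussed above:
       generic path: generic planning + generic execution
       custom path:  generic planning + custom planning + custom execution
    Returns True when paying for the extra custom planning step is a net win."""
    generic_total = generic_plan_ms + generic_exec_ms
    custom_total = generic_plan_ms + custom_plan_ms + custom_exec_ms
    return custom_total < generic_total

# With a 50x execution gap, the extra planning step pays for itself immediately:
print(custom_wins(generic_plan_ms=1, custom_plan_ms=1,
                  generic_exec_ms=50, custom_exec_ms=1))   # True
# With a cheap query, the extra planning is pure overhead:
print(custom_wins(generic_plan_ms=1, custom_plan_ms=1,
                  generic_exec_ms=1, custom_exec_ms=1))    # False
```

The second case is exactly the "worst case" that would need to be detected cheaply.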
Re: [HACKERS] Avoiding bad prepared-statement plans.
My preference is to deal with the specific value vs generic value issue. This issue can affect performance even if PREPARE/EXECUTE is executed exactly once.

In the last case I saw, a certain query was executing once every second; with a specific value it would take 1 ms, and with a generic value it would take 50 ms. That's 5% system load for one CPU core to do nothing. After analysis, it was clearly a common value vs not-common value problem. For this particular table, it stored an integer, but only used two values across something like 100k rows. The query was for a third value that did not exist. The difference was a sequential scan vs an index lookup. I do not know whether the application was doing PREPARE/EXECUTE each time, or whether it was doing PREPARE once in advance and then EXECUTE each time after that, but I don't think it matters either way, as both cases deserve attention, and the problem is the same in both. Even one generic plan run can cost 50x the combined cost of specific planning and execution. Re-planning a generic plan with another generic plan may generate zero benefit, at a measurable cost. More on this after...

All the points about ms seem invalid to me. There are many reasons why ms could increase, and many of them have nothing to do with plan efficiency. Again, re-planning due to a high ms, or a high ratio of ms, does not indicate that re-planning will improve the success of the plan. The planning process does not measure ms or predict ms.

My idea of an optimal system is as follows:

1) Prepare gathers and caches data about the tables involved in the query, including column statistics that are likely to be required during the planning process, but does not run the planning process.

2) Execute runs the planning process re-using the data cached by prepare, and then executes the plan.

3) Advanced: Execute may cache the selected plan for re-use, but only if it can identify a set of criteria that would allow the selected plan to be tested and invalidated if the parameter nature has changed such that re-planning would likely choose another plan. Execute may cache multiple plans against a prepared statement, provided that each cached plan identifies its invalidation criteria.

4) Even more advanced: Prepare may identify the elements of the plan that will always be the same, no matter what parameter is specified, and cache these results for substitution into the planning phase when execute is run. (Effectively lifting the planning from execute to prepare, but only where it makes obvious [= cheap to detect] sense.)

This treats statement planning and execution as a pipeline, lengthening the pipeline and moving some of the pipeline elements between prepare and execute. It has the benefit of fast prepare/execute whether execute is invoked only once or many times. The effect is that all statements are specifically planned, but specific plans are re-used wherever possible.

To support the case of changing data, I think the analyze process should be able to force invalidation of cached plans, and force the cached column statistics for prepared statements to be invalidated and re-queried on demand, or push new statistics directly into the prepared statements. It makes no sense (to me) to re-plan for the same parameters until an analyze is done, so this tells me that analyze is the event that should cause the re-plan to occur.

I think anything less than the above will increase the performance of some queries while degrading the performance of others. It might be possible to guess which queries are more valuable to people than others, and hard-code solutions for those specific queries, but hard-coded solutions will probably always be lowest-hanging-fruit solutions.

After writing this, I'm pretty sure that implementing the above in PostgreSQL would be difficult, and it could be a valid concern that the investment is not worth the benefit at this time. It's a tough problem.

My $0.01 CDN. :-)

Cheers,
mark
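(The common value vs not-common value problem above comes down to selectivity: with the specific value in hand, a planner can see from the column statistics that the value is rare or absent and pick the index lookup; a generic plan cannot. A toy sketch, with all names, numbers, and the threshold invented for illustration:)

```python
def choose_plan(value, stats, total_rows, selectivity_threshold=0.1):
    """Pick a scan type from cached column statistics: if the parameter
    value is expected to match a large fraction of the table, a sequential
    scan wins; for rare (or absent) values an index lookup wins."""
    frequency = stats.get(value, 0) / total_rows
    return "seqscan" if frequency > selectivity_threshold else "indexscan"

# A table with 100k rows but only two distinct values, as in the case above:
stats = {1: 60_000, 2: 40_000}
print(choose_plan(1, stats, 100_000))   # seqscan: value matches 60% of rows
print(choose_plan(3, stats, 100_000))   # indexscan: value does not exist
```

A generic plan must pick one answer for all three inputs, and whichever it picks is badly wrong for the others.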
Re: [HACKERS] Avoiding bad prepared-statement plans.
On 02/26/2010 05:20 AM, Jeroen Vermeulen wrote:
> Mark Mielke wrote:
>> All the points about ms seem invalid to me. There are many reasons why ms could increase, and many of them have nothing to do with plan efficiency. Again, re-planning due to a high ms, or a high ratio of ms, does not indicate that re-planning will improve the success of the plan. The planning process does not measure ms or predict ms.
> That's true, but missing some very basic points about the idea: one, if we can tell that a query is going to be expensive, then the cost of re-planning it is marginal. Two, if we can tell that a query is going to be expensive, then we stand a lot to gain if re-planning turns out to be useful. It follows that we can afford to re-plan on the off-chance, without anything more than a vague orders-of-magnitude idea of what "expensive" means. What Tom said validates a big assumption I've been making: that we do in fact have a decent shot at telling in advance that a query is going to be expensive. Which means we have a decent shot at stopping your 100ms query from taking seconds just because you prepared it and are missing out on that tiny partial index. That would be worth the extra planning time at a 1% hit rate, and there's not much downside if we don't reach that.

You trimmed most of my concerns. :-) Problems:

1) If I do a PREPARE/EXECUTE, the above lengthens the process from 1 generic planning plus 1 generic plan execution to 1 generic planning, 1 specific planning, and 1 specific plan execution. This is still longer overall than a regular statement, and it may still be longer than the original generic plan on its own. The hope is that the analysis somehow detects the scenario where a generic plan makes no sense, but the criterion is not whether the generic plan actually makes sense - the criterion is "can the customer afford to wait longer for us to second-guess ourselves?" It's a guess. As a guess, it means sometimes it will be right, and sometimes it will be wrong.

2) Only the order-of-magnitude (by estimate) plans will benefit. If you set the number to 100X, then most plans won't benefit. If you set it to less than 100X, you increase the chance of guessing wrong in other cases. In any case, there is still no guarantee that a specific plan will be faster, so even in the 100X case, the overall results could be slower - it's just that you've decided the customer can afford to wait longer.

>> My idea of an optimal system is as follows: 1) Prepare gathers and caches data about the tables involved in the query, including column statistics that are likely to be required during the planning process, but does not run the planning process.
> It sounds to me like you're in the process of inventing another planning process. Developer time aside, how much CPU time can you afford to throw at this?

I already said I don't think PostgreSQL could easily evolve here. However, I wanted to point out that the problem may be architectural. As for developer time and CPU time, that's not really relevant: if PREPARE/EXECUTE could be reliably sped up, then the savings would probably be measured in millions of dollars or more, as it is widely used by many applications throughout the day on hundreds of thousands of computers.

Oh, you mean is it worth scratching my itch? :-) Not really. I was thinking about it yesterday and decided that such a major change might just as easily result in a new database engine, and I didn't want to go there.

Still, if some clever person agrees with me that it is an architecture problem, and that PostgreSQL could benefit from a clean from-scratch caching mechanism for statements (note that what I described could probably be extended to support automatic prepare of every statement, and matching of query to prepared statement based on text, similar to MySQL query caching), and can come up with a way to do this using the existing architecture - that would be great. Or they can tell me "too hard", as you are. That's fine too... :-)

> I don't see any reason to argue over what would be optimal when so much information is still missing. It just makes the problem look harder than it is. To me, our best shot at getting something useful is to stay simple and defensive. After that, if there is still a need, we'll have code to help us gather more data and figure out how to make it better. Nothing wrong with the lowest-hanging fruit.

What information is missing? PREPARE sucks in many known situations. It is a documented fact. :-) Will guessing at when the user can afford to wait longer improve the situation? Maybe, or often, but not always.

Cheers,
mark
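(A minimal sketch of the "automatic prepare of every statement, matched by query text" idea above. This is a hypothetical design - the class, its methods, and the planner callback are all invented - not PostgreSQL's or MySQL's implementation:)

```python
class StatementCache:
    """Automatically 'prepare' every statement, matching incoming queries
    to cached entries by their text (similar in spirit to MySQL query
    caching), and caching one specific plan per parameter set."""

    def __init__(self, planner):
        self.planner = planner
        self.by_text = {}   # query text -> {params: plan}

    def execute(self, sql, params):
        plans = self.by_text.setdefault(sql, {})
        if params not in plans:
            plans[params] = self.planner(sql, params)   # specific plan
        return plans[params]

planned = []
def planner(sql, params):
    planned.append((sql, params))          # record each real planning run
    return f"plan:{params}"

cache = StatementCache(planner)
cache.execute("SELECT * FROM t WHERE x = ?", (1,))
cache.execute("SELECT * FROM t WHERE x = ?", (1,))   # matched by text: no re-plan
cache.execute("SELECT * FROM t WHERE x = ?", (2,))   # new params: new specific plan
print(len(planned))   # 2
```

Every plan in the cache is a specific plan; nothing generic ever gets stored, which is the point being argued.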
Re: [HACKERS] Avoiding bad prepared-statement plans.
On 02/26/2010 11:27 AM, Tom Lane wrote:
> Also, I think there is a lot of confusion here over two different issues: generic plan versus parameter-specific plan, and bad planner estimates leading to a wrong plan choice. While the latter is certainly an issue sometimes, there is no reason to believe that it affects prepared statements worse than non-prepared ones. So I think that designing a fix for prepared statements on the assumption that you can't trust the planner's estimates is solving the wrong problem.

Just to point out that I agree: as per my original post, I think the only time prepared statements should be re-planned for the statistics case is after 'analyze' has run. That sounds like a quicker solution, and a much smaller gain. After 'analyze' of an object, invalidate all cached plans for prepared statements that rely on that object and require a re-plan. I doubt this will help me or many others very often. It's something that should be done some day, but I don't recall ever concluding that a performance problem I was experiencing was related to using prepared statements for too long. Also, the client is able to figure this out: the client can choose to free prepared statements after 1 minute or 1000 calls. It's not really a problem.

It also has nothing to do with trust of the planner's estimates. Given the same criteria, the planner should come up with the same best plan most or all of the time. Trial-and-error planning - with the exception of hugely complicated plans that cannot be produced in a finite time frame - does not appeal to me at all. I do trust the planner's estimates.

The issue of specific parameters is the one I think most of us would benefit from addressing, and I think the most effective benefit comes from not creating generic plans. I would prefer a prepare with a specific plan, re-planning when the specific plan does not apply, over a generic plan, every time.

This has nothing to do with time to prepare, or a ratio of time to prepare vs time to execute, or plans that are expected to take some time to execute. The fact that I can run a PREPARE/EXECUTE and a plain SELECT, and with only one invocation see a difference of over 100X, shows that generic planning is just not the right approach. It works according to spec, but it is not practical under the current model. Generic plans are the problem. My post was to bring attention to this, as I see most comments focusing on an assumption that generic plans provide value, and that specific plans should only be used when generic plans are expected to take a while to execute. That walks around the problem that the idea of a generic plan is just wrong. The only time a generic plan is right is when the specific plan would result in the same thing.

Cheers,
mark
Re: [HACKERS] Avoiding bad prepared-statement plans.
On 02/26/2010 01:59 PM, Tom Lane wrote:
>> ... It's walking around the problem that the idea of a generic plan is just wrong. The only time a generic plan is right, is when the specific plan would result in the same.
> I think that's a significant overstatement. There are a large number of cases where a custom plan isn't worth it, even if it doesn't generate exactly the same plan.

There must be some way to lift the fixed costs of planning out of the execute phase, such that only plan enumeration and selection are run at execute time. In most cases, plan enumeration and selection - provided that all data required to make these decisions is cached in data structures ready to go - should be very fast. Right? Wrong? If right, my original post suggested that prepare should do the parts of planning which are fixed and do not change based on the input parameters, while execute should do the dynamic parts that would change based on the input parameters.

By "not worth it", do you mean development effort or run time? For development effort, it would definitely be worth it in the grand scheme of things, but perhaps not worth it to specific individuals. For run time, I'm having trouble seeing the situation where it would not be worth it. In the case that the resulting plan is the same (custom vs generic), there should be no cost. In the case that the plan is different, I think the difference proves that it is worth it. The case where it wouldn't be worth it would be a prepared statement called many times with many different parameters, each set of parameters requiring a re-plan - but my experience tells me that the current model is to choose a sub-optimal plan, and the entire query will run much slower than the planning time on every execute. We wouldn't be having this discussion if generic plans were considered adequate. So I feel that it is worth it in this case as well.

It's the development effort that is the problem. I can't do it, and I can't make you do it. If you say "too hard", there isn't anything I can do about it. :-)

Cheers,
mark
Re: [HACKERS] Avoiding bad prepared-statement plans.
On 02/26/2010 01:59 PM, Tom Lane wrote:
> Mark Mielke <m...@mark.mielke.cc> writes:
>> Just to point out that I agree, and as per my original post, I think the only time prepared statements should be re-planned for the statistics case, is after 'analyze' has run. That sounds like a quicker solution, and a much smaller gain. After 'analyze' of an object, invalidate all cached plans for prepared statements that rely on that object and require a re-plan.
> Please note that that has been happening since 8.3, which is probably why you haven't detected a problem.

Excellent, and sorry for missing the release note on this.

Thanks,
mark
Re: [HACKERS] Avoiding bad prepared-statement plans.
On 02/26/2010 02:57 PM, Tom Lane wrote:
> Mark Mielke <m...@mark.mielke.cc> writes:
>> There must be some way to lift the cost of planning out of the plan enumeration and selection phase, such that only plan enumeration and selection is run at execute time. In most cases, plan enumeration and selection, provided that all data required to make these decisions is all cached in data structures ready to go, should be very fast? Right?
> Huh? What exactly do you think the cost of planning is, if not enumeration and selection? There isn't very much that's cacheable, at least not in any sanely-sized cache.

I think most operations, including this one, can be broken into a fixed portion and a dynamic portion. The PREPARE should concern itself only with the fixed portion, and should leave the dynamic portion to EXECUTE. At present, the planning process is one big blob. Here are the parts that can be done as fixed work:

1) Statement parsing and error checking.
2) Identification of tables and columns involved in the query.
3) Querying the column statistics for involved columns, to be used in plan cost estimation now and later.
4) Determining plan constraints under which elements of the plan must be executed a certain way (something like constant folding for a compiler), or for which parameter substitution would not impact the outcome.
5) Identifying the elements of the plan that still require plan enumeration and plan selection, to be used in a later part of the pipeline.

At a minimum, I am suggesting that 1), 2), and 3) should take a chunk out of the planning process. I think 4) and 5) are more complex but still valuable in terms of extracting the fixed portion of the planning process. I think an assumption is being made that the planning process is an atomic unit that cannot be turned into a pipeline or assembly line. I think this assumption is what originally tied PREPARE = PLAN and EXECUTE = RUN, and it is leading to the conclusion that EXECUTE should re-plan.

I also expect that this assumption is tightly woven into the current implementation, and changing it would require some amount of re-architecture. :-)

>> By "not worth it", do you mean development effort or run time?
> Run time. The development cost of what you are proposing is negligible: just rip out the plan cache altogether. I don't believe it would be a performance win though.

That's not my proposal, though. I suspect you didn't read it. :-) I'm fine with you saying "too hard" and "not worth my development effort" after you read it. I agree it would be a lot of work. But if the conclusion is that the current architecture is the best that can be had, and the decision is only about when to do a custom re-plan versus when to use the generic plan, I am putting my opinion out there that the generic plan has always been a compromise, it will always be a compromise, and this discussion exists primarily because the compromise is not adequate in many real-world scenarios.

All that said, I think I am challenging the status quo and ticking people off. While my intent is to challenge the status quo, it is not to tick people off. So please let me know if you would like me to continue, or if you have already written this off. :-)

Cheers,
mark
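(The fixed/dynamic split described in the numbered list could be sketched roughly as follows. Every name here - the Catalog stand-in, enumerate_and_select, the statistics shape - is hypothetical structure, not PostgreSQL internals:)

```python
class Catalog:
    """Stand-in for system catalog lookups (illustrative only)."""
    def resolve_tables(self, sql):
        return ["t"]
    def fetch_stats(self, tables):
        return {"t": {"rows": 100_000}}

def enumerate_and_select(sql, stats, params):
    # Parameter-aware plan enumeration and selection, using statistics
    # that were already cached at PREPARE time.
    return f"plan(rows={stats['t']['rows']}, params={params})"

class Prepared:
    """PREPARE does only the parameter-independent work, once;
    EXECUTE does only the parameter-dependent planning."""

    def __init__(self, sql, catalog):
        # Fixed portion, done once at PREPARE time:
        self.sql = sql                                  # 1) parse and check
        self.tables = catalog.resolve_tables(sql)       # 2) identify objects
        self.stats = catalog.fetch_stats(self.tables)   # 3) cache statistics

    def execute(self, params):
        # Dynamic portion, run per EXECUTE against the cached inputs:
        return enumerate_and_select(self.sql, self.stats, params)

stmt = Prepared("SELECT * FROM t WHERE x = $1", Catalog())
print(stmt.execute((42,)))
```

Each EXECUTE produces a parameter-specific plan, but never repeats the parsing, object resolution, or statistics gathering.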
Re: [HACKERS] Avoiding bad prepared-statement plans.
On 02/26/2010 03:11 PM, Yeb Havinga wrote:
> Tom Lane wrote:
>> Right, but if the parameter is unknown then its distribution is also unknown. In any case that's just nitpicking, because the solution is to create a custom plan for the specific value supplied. Or are you suggesting that we should create a way for users to say "here is the expected distribution of this parameter", and then try to fold that into the planner estimates?
> Or instead of letting users give the distribution, gather it automatically in some plan statistics catalog? I suspect in most applications queries stay the same for months and maybe years, so after some number of iterations it is possible to have decent call statistics / parameter distributions. Maybe the parameter value distribution could even be annotated with actual cached plans.

The problem with the last - actual cached plans - is that it implies the other aspect I have been suggesting: in order to have a custom cached plan, the primary model must be to use custom plans. If PREPARE/EXECUTE uses generic plans normally, then the only cached plans available will be generic plans.

Cheers,
mark
Re: [HACKERS] Avoiding bad prepared-statement plans.
On 02/26/2010 07:03 PM, Tom Lane wrote:
> Robert Haas <robertmh...@gmail.com> writes:
>> Basically, what I really want here is some kind of keyword or other syntax that I can stick into a PL/pgsql query that requests a replan on every execution.
> Wouldn't it be better if it just did the right thing automatically?

Yes please. :-) Often, we are just users of the application, and we do not have the freedom to change it.

> The sort of heuristic I'm envisioning would essentially do "replan every time" for some number of executions, and give up only if it noticed that it wasn't getting anything better than the generic plan. So you'd have a fixed maximum overhead per session when the custom plan was useless, and the Right Thing when it wasn't.

My other comments aside - I think "generic plan, plus specific plan where the specific plan continues to beat the generic plan" would meet the cases that really annoyed me, and would make a lot of us very happy... Thanks.

Cheers,
mark
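(A toy sketch of the heuristic Tom describes: plan custom for the first few executions, then fall back to the generic plan unless the custom plans were beating it. The trial count and all costs below are invented for illustration, not PostgreSQL's actual values:)

```python
class PlanChooser:
    """Try a custom plan for the first TRIALS executions; afterwards keep
    using custom plans only if their average estimated cost beat the
    generic plan's, otherwise give up and use the generic plan."""

    TRIALS = 5

    def __init__(self, generic_cost):
        self.generic_cost = generic_cost
        self.custom_costs = []

    def choose(self, custom_cost_estimate):
        if len(self.custom_costs) < self.TRIALS:
            self.custom_costs.append(custom_cost_estimate)
            return "custom"
        avg = sum(self.custom_costs) / len(self.custom_costs)
        return "custom" if avg < self.generic_cost else "generic"

chooser = PlanChooser(generic_cost=100)
for _ in range(5):
    chooser.choose(custom_cost_estimate=2)   # trial period: always custom
print(chooser.choose(custom_cost_estimate=2))   # custom: it kept winning
```

When the custom plans are no better, the overhead is capped at the trial count per session, which is the "fixed maximum overhead" property.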
Re: [HACKERS] Re: Faster CREATE DATABASE by delaying fsync
On 02/14/2010 03:24 PM, Florian Weimer wrote: * Tom Lane: Which options would that be? I am not aware that there are any for any of the recent Linux filesystems. Shouldn't journaling of metadata be sufficient? You also need to enforce ordering between the directory update and the file update. The file metadata is flushed with fsync(), but the directory isn't. On some systems, all directory operations are synchronous, but not on Linux. dirsync All directory updates within the filesystem should be done synchronously. This affects the following system calls: creat, link, unlink, symlink, mkdir, rmdir, mknod and rename. The widely reported problems, though, did not tend to be a problem with directory changes written too late - but directory changes being written too early. That is, the directory change is written to disk, but the file content is not. This is likely because of the ordered journal mode widely used in ext3/ext4, where metadata changes are journalled but file pages are not. Therefore, it is important for some operations that the file pages are pushed to disk using fsync(file) before the metadata changes are journalled. In theory there is some open hole where directory updates need to be synchronized with file updates, as POSIX doesn't enforce this ordering, and we can't trust that all file systems implicitly order things correctly, but in practice, I don't see this sort of problem happening. If you are concerned, enable dirsync. Cheers, mark -- Mark Mielke m...@mielke.cc -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Faster CREATE DATABASE by delaying fsync
On 02/14/2010 03:49 PM, Andres Freund wrote: On Sunday 14 February 2010 21:41:02 Mark Mielke wrote: The widely reported problems, though, did not tend to be a problem with directory changes written too late - but directory changes being written too early. That is, the directory change is written to disk, but the file content is not. This is likely because of the ordered journal mode widely used in ext3/ext4 where metadata changes are journalled, but file pages are not journalled. Therefore, it is important for some operations, that the file pages are pushed to disk using fsync(file), before the metadata changes are journalled. Well, but that's not a problem with pg as it fsyncs the file contents. Exactly. Not a problem. If you are concerned, enable dirsync. If the filesystem already behaves that way a fsync on it should be fairly cheap. If it doesn't behave that way doing it is correct... Well, I disagree, as the whole point of this thread is that fsync() is *not* cheap. :-) Besides there is no reason to fsync the directory before the checkpoint, so dirsync would require a higher cost than doing it correctly. Using ordered metadata journaling has approximately the same effect. Provided that the data is fsync()'d before the metadata is required, either the metadata is recorded in the journal, in which case the data is accessible, or the metadata is NOT recorded in the journal, in which case the files will appear missing. The races that theoretically exist would be in situations where the data of one file references a separate file that does not yet exist. You said you would try and reproduce - are you going to try and reproduce on ext3/ext4 with ordered journalling enabled? I think reproducing outside of a case such as CREATE DATABASE would be difficult. 
It would have to be something like: 1) open(O_CREAT)/write()/fsync()/close() of a new data file, where the data gets written, but the directory entry is not yet written out to the journal; 2) open()/.../write()/fsync()/close() of an existing file to point to the new data file, while the directory entry is still not written out to the journal; 3) crash. In this case, dirsync should be effective at closing this hole. As for cost? Well, most PostgreSQL data is stored within file content, not directory metadata. I think dirsync might slow down some operations like CREATE DATABASE or rm -fr, but I would not expect it to affect day-to-day performance of the database under real load. Many operating systems enable the equivalent of dirsync by default. I believe Solaris does this, for example, and other than slowing down rm -fr, I don't recall any real complaints about the cost of dirsync. After writing the above, I'm seriously considering adding dirsync to my /db mounts that hold PostgreSQL and MySQL data. Cheers, mark -- Mark Mielke m...@mielke.cc -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
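For what it's worth, the ordering being discussed — flush the file contents first, then make sure the directory entry itself is durable — can be sketched as follows (a Python illustration of the syscall sequence; a real implementation would be C, and `durable_create` is a made-up name):

```python
# Create a file durably: flush the file's contents, then flush the
# directory entry itself. Without the second fsync (or dirsync mounted),
# a crash can leave the data on disk with no directory entry pointing
# at it, or on some filesystems the entry journalled ahead of the data.
import os

def durable_create(path, data):
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        os.write(fd, data)
        os.fsync(fd)           # file contents reach stable storage
    finally:
        os.close(fd)
    # Now force the directory entry out as well (with dirsync, the
    # creat() above would already have been synchronous).
    dfd = os.open(os.path.dirname(path) or ".", os.O_RDONLY)
    try:
        os.fsync(dfd)
    finally:
        os.close(dfd)
```

The second fsync on the directory is the step that dirsync makes implicit, at the price of making every directory update synchronous rather than only the ones that need it.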
Re: [HACKERS] Hostnames in pg_hba.conf
On 02/11/2010 08:13 AM, Bart Samwel wrote: ISSUE #1: Performance / caching At present, I've simply not added caching. The reasoning for this is as follows: (a) getaddrinfo doesn't tell us about expiry, so when do you refresh? (b) If you put the cache in the postmaster, it will not work for exec-based backends as opposed to fork-based backends, since those read pg_hba.conf every time they are exec'ed. (c) If you put this in the postmaster, the postmaster will have to update the cache every once in a while, which may be slow and which may prevent new connections while the cache update takes place. (d) Outdated cache entries may inexplicably and without any logging choose the wrong rule for some clients. Big aargh: people will start using this to specify 'deny' rules based on host names. If you COULD get expiry info out of getaddrinfo you could potentially store this info in a table or something like that, and have it updated by the backends? But that's way over my head for now. ISTM that this stuff may better be handled by a locally-running caching DNS server, if people have performance issues with the lack of caching. These local caching DNS servers can also handle expiry correctly, etcetera. We should of course still take care to look up a given hostname only once for each connection request. You should cache for some minimal amount of time or some minimal number of records - even if it's just one minute, and even if it's a fixed length LRU sorted list. This would deal with situations where a new connection is raised several times a second (some types of load). For connections raised once a minute or less, the benefit of caching is far less. But, this can be a feature tagged on later if necessary and doesn't need to gate the feature. Many UNIX/Linux boxes have some sort of built-in cache, sometimes persistent, sometimes shared. On my Linux box, I have nscd - name server caching daemon - which should be able to cache these sorts of lookups. 
I believe it is used for things as common as mapping uid to username in output of /bin/ls -l, so it does need to be pretty fast. The difference between an in-process cache and something like nscd is the inter-process communication required to use nscd. ISSUE #2: Reverse lookup? There was a suggestion on the TODO list on the wiki, which basically said that maybe we could use reverse lookup to find the hostname and then check for that hostname in the list. I think that won't work, since IPs can go by many names and may not support reverse lookup for some hostnames (/etc/hosts anybody?). Furthermore, due to the top-to-bottom processing of pg_hba.conf, you CANNOT SKIP entries that might possibly match. For instance, if the third line is for host foo.example.com and the fifth line is for bar.example.com, both lines may apply to the same IP, and you still HAVE to check the first one, even if reverse lookup turns up the second host name. So it doesn't save you any lookups, it just costs an extra one. I don't see a need to do a reverse lookup. Reverse lookups are sometimes done as a verification check, in the sense that it's cheap to get a map from NAME -> IP, but sometimes it is much harder to get the reverse map from IP -> NAME. However, it's not a reliable check, as many legitimate users have trouble getting a reverse map from IP -> NAME. It also doesn't save anything, as IP -> NAME lookups go to a completely different set of name servers, and these name servers are not always optimized for speed, since IP -> NAME lookups are less common than NAME -> IP. Finally, if one finds a map from IP -> NAME, that doesn't prove that a map from NAME -> IP exists, so using *any* results from IP -> NAME is questionable. I think reverse lookups are unnecessary and undesirable. ISSUE #3: Multiple hostnames? Currently, a pg_hba entry lists an IP / netmask combination. 
I would suggest allowing lists of hostnames in the entries, so that you can at least mimic the match multiple hosts by a single rule. Any reason not to do this? I'm mixed. In some situations, I've wanted to put multiple IP/netmask. I would say that if multiple names are supported, then multiple IP/netmask should be supported. But, this does make the lines unwieldy beyond two or three. This direction leans towards the capability to define host classes, where a rule allows a host class, and the host class can have a list of hostnames. Two other aspects I don't see mentioned: 1) What will you do for hostnames that have multiple IP addresses? Will you accept all IP addresses as being valid? 2) What will you do if they specify a hostname and a netmask? This seems like a convenient way of saying everybody on the same subnet as NAME. Cheers, mark -- Mark Mielke m...@mielke.cc
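The minimal cache suggested above — time-bounded, fixed-length, LRU — might look something like this (a sketch only; the resolver function is injected rather than calling getaddrinfo directly, and the one-minute TTL and 128-entry cap are arbitrary choices, precisely because getaddrinfo gives us no expiry information):

```python
# Tiny time-bounded, size-bounded LRU cache for name lookups, along the
# lines suggested in the thread. `resolve` stands in for getaddrinfo().
import time
from collections import OrderedDict

class LookupCache:
    def __init__(self, resolve, ttl=60.0, max_entries=128, clock=time.monotonic):
        self.resolve = resolve
        self.ttl = ttl
        self.max_entries = max_entries
        self.clock = clock
        self.entries = OrderedDict()   # name -> (expires_at, addresses)

    def lookup(self, name):
        now = self.clock()
        hit = self.entries.get(name)
        if hit is not None and hit[0] > now:
            self.entries.move_to_end(name)      # refresh LRU position
            return hit[1]
        addresses = self.resolve(name)          # miss or expired: resolve again
        self.entries[name] = (now + self.ttl, addresses)
        self.entries.move_to_end(name)
        while len(self.entries) > self.max_entries:
            self.entries.popitem(last=False)    # evict least recently used
        return addresses
```

A connection storm against the same few host rules then costs one lookup per name per TTL window, while the once-a-minute connection case degrades gracefully to no caching at all.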
Re: [HACKERS] Hostnames in pg_hba.conf
On 02/11/2010 04:54 PM, Bart Samwel wrote: On Thu, Feb 11, 2010 at 16:36, Mark Mielke m...@mark.mielke.cc wrote: ISSUE #3: Multiple hostnames? Currently, a pg_hba entry lists an IP / netmask combination. I would suggest allowing lists of hostnames in the entries, so that you can at least mimic the match multiple hosts by a single rule. Any reason not to do this? I'm mixed. In some situations, I've wanted to put multiple IP/netmask. I would say that if multiple names are supported, then multiple IP/netmask should be supported. But, this does make the lines unwieldy beyond two or three. This direction leans towards the capability to define host classes, where a rule allows a host class, and the host class can have a list of hostnames. Yes, but before you know it people will ask for being able to specify multiple host classes. :-) Quite simply put, with a single subnet you can allow multiple hosts in. Allowing only a single hostname is a step backward from that, so adding support for multiple hostnames could be useful if somebody is replacing subnets with hostname-based configuration. This implies two aspects which may not be true: 1) All hosts that I want to allow belong to the same subnet. 2) If I trust one host on the subnet, then I trust all hosts on the subnet. While the above two points are often true, they are not universally true. 2) What will you do if they specify a hostname and a netmask? This seems like a convenient way of saying everybody on the same subnet as NAME. Not supported. Either an IP address / netmask combo, or a hostname, but not both. I wouldn't want to recommend hardcoding something such as netmasks (which are definitely subnet dependent) in combination with something as volatile as a host name -- move it to a different subnet, and you might allow a whole bigger subnet than you intended. If they want to specify a netmask, then they should just use hardcoded IPs as well. 
Ah yes, I recall this from a previous thread. I think I also disagreed on the other thread. :-) I thought of a use for reverse lookup - it would allow wildcard hostnames. Still, that's an advanced feature that might be for later... :-) Cheers, mark -- Mark Mielke m...@mielke.cc
Re: [HACKERS] Hostnames in pg_hba.conf
On 02/11/2010 05:12 PM, Bart Samwel wrote: On Thu, Feb 11, 2010 at 23:01, Mark Mielke m...@mark.mielke.cc wrote: On 02/11/2010 04:54 PM, Bart Samwel wrote: ISSUE #3: Multiple hostnames? Currently, a pg_hba entry lists an IP / netmask combination. I would suggest allowing lists of hostnames in the entries, so that you can at least mimic the match multiple hosts by a single rule. Any reason not to do this? I'm mixed. In some situations, I've wanted to put multiple IP/netmask. I would say that if multiple names are supported, then multiple IP/netmask should be supported. But, this does make the lines unwieldy beyond two or three. This direction leans towards the capability to define host classes, where a rule allows a host class, and the host class can have a list of hostnames. Yes, but before you know it people will ask for being able to specify multiple host classes. :-) Quite simply put, with a single subnet you can allow multiple hosts in. Allowing only a single hostname is a step backward from that, so adding support for multiple hostnames could be useful if somebody is replacing subnets with hostname-based configuration. This implies two aspects which may not be true: 1) All hosts that I want to allow belong to the same subnet. 2) If I trust one host on the subnet, then I trust all hosts on the subnet. While the above two points are often true, they are not universally true. I don't think we're talking about the same thing here. I wasn't suggesting doing hostname-plus-netmask. NO! I was suggesting that where a lazy sysadmin would previously configure by subnet, they might switch to more fine-grained hostname-based configuration ONLY IF it doesn't require duplicating every line in pg_hba.conf for every host in the subnet. Ah yes. You are focusing on allowing a netmask to expand to hostnames. I'm focusing on how netmasks were never that great on their own. You want to allow multiple hosts - I want you to allow multiple netmasks. 
I think the requirement is the same. I also think that same line has always been an annoying restriction. I have many duplicated lines today just for: host DATABASE USER 127.0.0.1/32 md5 host DATABASE USER ::1/128 md5 Isn't that a bit silly? If you think it's acceptable to allow multiple hostnames, I'm pointing out that your requirement is not limited to hostnames only. Why not? host DATABASE USER 127.0.0.1/32,::1/128 md5 Same requirements, same syntax (assuming you were suggesting ','), same documentation. Why not? But once there, it seems clear that packing hostnames or netmasks onto one line is just ugly and hard to manage. I'd like to see this extended to any of the many ways to allow hostnames to be specified one per line. For example: set tool_servers { 127.0.0.1/32 ::1/128 1.2.3.4/32 1.2.3.5/32 } host DATABASE USER $tool_servers md5 The above form is easy to parse. Of course, then I'll ask for the ability to simplify specifying multiple databases: set databases { db1 db2 } set users { user1 user2 } host $databases $users $tool_servers md5 Sorry... :-) 2) What will you do if they specify a hostname and a netmask? This seems like a convenient way of saying everybody on the same subnet as NAME. Not supported. Either an IP address / netmask combo, or a hostname, but not both. I wouldn't want to recommend hardcoding something such as netmasks (which are definitely subnet dependent) in combination with something as volatile as a host name -- move it to a different subnet, and you might allow a whole bigger subnet than you intended. If they want to specify a netmask, then they should just use hardcoded IPs as well. Ah yes, I recall this from a previous thread. I think I also disagreed on the other thread. :-) I thought of a use for reverse lookup - it would allow wildcard hostnames. Still, that's an advanced feature that might be for later... 
:-) I think wildcards are interesting, but I have yet to see an actual use case other than it's cool and very generalized. In my mind (tell me if I'm wrong), the most common type of PostgreSQL authentication setup is within a local network within an organization. There, you either authorize an entire subnet (the entire server park or all client PCs) or you authorize specific hosts (single IP address). The wildcard case is for replacing the first case, but for that case, subnets are usually just fine. I'm trying to target the second case here. The use case would be an organization with nodes all over the IP space, that wants to manage configuration from a single place. DNS would be that single place of choice. It moves trust from trust the netmasks to be kept up-to-date to trust that DNS will be kept
Re: [HACKERS] Hostnames in pg_hba.conf
On 02/11/2010 09:38 PM, Euler Taveira de Oliveira wrote: Mark Mielke escreveu: Of course, then I'll ask for the ability to simplify specifying multiple databases: We already support multiple users and/or databases for a single pg_hba.conf line ... Is there a reason you trimmed out my points about specifying classes as a list of data (host, db, user) and using it in host lines? :-) Simplifying specifying multiple databases in the sense of if a particular line has to list 10 databases, 10 users, and 10 hostnames, it would be most convenient to specify 3 classes with 10 records each, followed by one host line, instead of having 10 lines with 20 entries on each, or 1000 lines... :-) Cheers, mark -- Mark Mielke m...@mielke.cc -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
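For illustration, the class-based syntax floated earlier in the thread (`set name { ... }` plus `$name` references — entirely hypothetical, not a real pg_hba.conf feature) could be handled by a small preprocessing pass that expands each rule into plain one-entry lines before the normal parser ever sees them:

```python
# Expand hypothetical "set NAME { members... }" definitions and $NAME
# references into plain pg_hba.conf-style lines, one line per member
# combination. The syntax is this thread's invention, not PostgreSQL's.
import itertools
import re

def expand_hba(text):
    classes = {}
    out = []
    lines = iter(text.splitlines())
    for line in lines:
        m = re.match(r'\s*set\s+(\w+)\s*\{\s*(.*)', line)
        if m:
            name, rest = m.group(1), m.group(2)
            members = rest.replace('}', ' ').split()
            while '}' not in line:              # multi-line class body
                line = next(lines)
                members += line.replace('}', ' ').split()
            classes[name] = members
            continue
        fields = line.split()
        if not fields:
            continue
        # Cartesian product over any $class references in the rule.
        choices = [classes.get(f[1:], [f]) if f.startswith('$') else [f]
                   for f in fields]
        for combo in itertools.product(*choices):
            out.append(' '.join(combo))
    return out
```

So a single `host $databases $users $tool_servers md5` rule with three ten-member classes would expand to the 1000 concrete lines mentioned above, without anyone having to maintain them by hand.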
Re: [HACKERS] Avoiding bad prepared-statement plans.
On 02/09/2010 08:46 AM, Jeroen Vermeulen wrote: This sounds like a really nice to have feature. Maybe it'd also be possible to skip replanning between executes if the current bound values are 'indexwise-equivalent' to the values used at previous planning, i.e. nothing in the statistics indicates that execution cost would be (much) different. Are there more ways to cut down on planning time? Obviously some plannedstatement/plannerinfo structures could be kept, but maybe it'd also be possible to plan only that part of the join tree where the params are used in a scan/join qual. I think we should be careful not to over-think this. Planning isn't *that* costly, so apply Amdahl's Law liberally. I'm proposing some easy things we could do without adding much overhead or maintenance burden; I've been assuming that getting intimate with the planner would risk those advantages. In a current commercial app we have that uses JDBC and prepared plans for just about everything, it regularly ends up with execution times of 30+ milliseconds when a complete plan + execute would take less than 1 millisecond. PostgreSQL planning is pretty fast. In terms of not overthinking things - I think I would even prefer an option that said always re-plan prepared statements as a starting point. If it happened to become smarter over time, such that it would have invalidation criteria that would trigger a re-plan, that would be awesome, but in terms of what would help me *today* - being able to convert prepared plans into just a means to use placeholders would help me today on certain real applications in production use right now. Cheers, mark -- Mark Mielke m...@mielke.cc -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PG 9.0 and standard_conforming_strings
On 02/03/2010 01:20 PM, Robert Haas wrote: I am not sure I really understand why anyone is in a rush to make this change. What harm is being done by the status quo? What benefit do we get out of changing the default? The major argument that has been offered so far is that if we don't change it now, we never will, but I don't believe that the tenor of this discussion supports the contention that Tom or anyone else never wants to make this change. For myself, it isn't so much a rush as a sense that the code out there that will break, will never change unless forced, and any time seems better than never. Correct me if I am wrong - but I think this issue represents an exploitable SQL injection security hole. I switched because I convinced myself that the ambiguity of \' represented actual danger. I'm concerned that a web front end doing parameter checking and escaping, using either '' quoting or \' quoting, can be exploited if the server happens to be configured the opposite way. To me, this ambiguity can only be addressed by everybody agreeing on the right way to do it, and '' quoting seems like the right way to do it to me. Cheers, mark -- Mark Mielke m...@mielke.cc -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
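To make the danger concrete, here is a toy model of the two lexing modes (a simulation of the idea, not the actual PostgreSQL scanner): a client that escapes quotes by doubling them is safe against a standard-conforming server, but the very same literal can be broken out of when the server instead treats backslash as an escape character.

```python
# Simulate how the two server settings lex a single-quoted literal.
# A client that escapes quotes by doubling them is safe when
# standard_conforming_strings=on, but exploitable when the server
# treats backslash as an escape character.

def read_literal(text, backslash_escapes):
    """Return (string_value, rest_of_query) for a literal starting at '."""
    assert text[0] == "'"
    out, i = [], 1
    while i < len(text):
        c = text[i]
        if backslash_escapes and c == '\\' and i + 1 < len(text):
            out.append(text[i + 1]); i += 2          # \x -> x
        elif c == "'" and i + 1 < len(text) and text[i + 1] == "'":
            out.append("'"); i += 2                  # '' -> '
        elif c == "'":
            return ''.join(out), text[i + 1:]        # closing quote
        else:
            out.append(c); i += 1
    raise ValueError('unterminated string')

def quote(user_input):
    # Standard-style escaping: double any embedded quotes.
    return "'" + user_input.replace("'", "''") + "'"

payload = "\\'; DROP TABLE users; --"
literal = quote(payload) + ' AND active'
# standard_conforming_strings = on: the whole payload stays inside the string
print(read_literal(literal, backslash_escapes=False))
# backslash escaping: the string ends early and the DROP escapes into the query
print(read_literal(literal, backslash_escapes=True))
```

The backslash in the payload swallows the client's first doubled quote, so the string closes one character early and the rest of the attacker's text is parsed as SQL — exactly the "configured the opposite way" hazard described above.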
Re: [HACKERS] PG 9.0 and standard_conforming_strings
On 02/03/2010 02:15 PM, Robert Haas wrote: The longer we wait before making an incompatible change, the more people will have adjusted their code to the new reality (or upgraded their drivers, etc.) and the fewer things will break. In my experience, the opposite is true, although in this case, the damage may already be done. That is, the longer bad habits are allowed to form, the harder they are to break, and the more code is written that may be broken. People won't upgrade unless forced. At some point, the switch does have to be tripped. Is now the time? I have no comment. I just don't want to see never be the time, and if never is not the time, then now does not seem impractical. That said, if you say we'll tell people to prepare for a change in 9.0, and enforce the change in a later release, that is fine too. Cheers, mark -- Mark Mielke m...@mielke.cc -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PG 9.0 and standard_conforming_strings
On 01/29/2010 09:01 PM, Tom Lane wrote: Maybe. We concluded in the April 2009 thread that standard_conforming_strings = ON had gotten little or no field testing, and I don't see any strong reason to hope that it's gotten much more since then. It would be rather surprising if there *aren't* any lurking bugs in one piece or another of client-side code. And I don't think that we should be so myopic as to consider that problems in drivers and so forth are not of concern. Not to contradict any justifiable investigation, but just as a data point: All of my installations use: backslash_quote = off # on, off, or safe_encoding escape_string_warning = off standard_conforming_strings = on I have not encountered any problems so far. I use PostgreSQL in about 10 production applications (too tired to count them out :-) ), from psql to PHP to Perl to Java. I had also assumed this feature was tested and supported when I enabled it, as it seemed to me to be the only sensible implementation, and it was consistent with my interpretation of SQL. I had done some testing before enabling it the first time and was satisfied with the results. I would be all for making this change in an orderly fashion pursuant to some agreed-on plan. But cramming it in at the last minute because of an essentially marketing-driven change of version name isn't good project management, and I'm seriously afraid that doing so would bite us in the rear. An actual plan here might look like let's flip it before 9.1alpha1 so we can get some alpha testing cycles on it ... Yep. Cheers, mark -- Mark Mielkem...@mielke.cc -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 8.5 vs. 9.0, Postgres vs. PostgreSQL
On 01/22/2010 09:52 AM, Greg Sabino Mullane wrote: Well, this *was* posted to -hackers and not -advocacy, but advocacy, mind share, and many other non-hacking-on-the-base-code things matter too. And frankly, our name is one of our *top* problems. Perhaps you've never had to explain to non-technical people how to pronounce it? Or sheepishly explained why we have such a lame, geeky sounding portmanteau? Or assured people that saying Postgres is perfectly fine, and that everyone says it that way anyway? I do not read -advocacy, so I probably missed the important discussion on this subject... I cannot see how the current name is a top problem in any priority scheme I care about. I like the current name, and the *infrequent* time the question comes up, it gives me the opportunity to summarize the history of PostgreSQL, and show people how PostgreSQL is a mature product that has earned a place in software history. How could this be a problem? I don't understand. I do not believe people would choose or not choose a product based on whether they happen to pronounce it correctly from the start. Most importantly, changing the name back to Postgres does not actually make the product better in any material way, nor does it improve understanding of what the product does. Having SQL in the name makes it clear what the product is. We use Atlassian products, and one of the first complaints we get is that people don't implicitly know what products like Bamboo, Confluence, Crucible, FishEye, or JIRA do. They cannot map the products in their head because they have no context. Calling it PostgreSQL makes it very clear to the uninformed masses where the product fits in a product map. Tell an executive of a company Postgres, and they would ask what is it? Tell them PostgreSQL, and they'll say is that like Oracle? The second is hugely more valuable. 
I don't want to open the discussion, because I like things the way they are, and think the PostgreSQL developers are doing an excellent job on the high priority items. PostgreSQL is really one of the greatest open source projects out there. I love it! I just can't see a statement like our name is one of our *top* problems go by uncontested. It is false in every way I can think of. Perhaps *some* people have an issue with it. Perhaps these people are enough to pressure a change against the rest who care more about performance, reliability, and features, than a name. But, ultimately, the people working on the performance, reliability, and features are the people making PostgreSQL the success that it is today. The name will not and should not increase adoption. Well, at least in my not so humble opinion. Back to the exciting live standby features and such please! I'm very much looking forward to seeing them in a release. *These* features, I can sell from an advocacy perspective. :-) Cheers, mark -- Mark Mielke m...@mielke.cc -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 8.5 vs. 9.0, Postgres vs. PostgreSQL
On 01/22/2010 10:57 AM, Aidan Van Dyk wrote: * Brendan Jurd dire...@gmail.com [100122 10:29]: Holy query language, Batman! Do you mean to tell me that the uninformed masses you interact with have an understanding of what SQL means? I am skeptical of this claim, but if true, you must have access to the most spectacularly informed uninformed masses on the planet. I can't speak for Mark, but the uninformed masses I interact with tend to be the guys looking for (and authorizing) solutions in the small-medium business segment... And Microsoft has done the education for us and automatically associated this unknown SQL with a big database... So despite that they have no idea what SQL actually means, or where it came from, it's got the desired association. So, my neck of the woods ain't necessarily yours, but... Exactly. People know where SQL fits in the product map. They probably do NOT know what it stands for, but they don't really care. They pay professional technical people to understand the details. How many people know what SONAR, RADAR, or SCUBA stand for? This doesn't seem to stop them from being able to use the words effectively. MS SQL, MySQL, SQLite - do they have advocacy problems due to the SQL in their name? I think it is the opposite. SQL in the name almost grants legitimacy to them as products. Dropping the SQL has the potential to increase confusion. What is a Postgres? :-) Cheers, mark -- Mark Mielke m...@mielke.cc -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Adding support for SE-Linux security
My two cents - if it's desired - I invariably disable selinux on all of my production machines. Once upon a time I tried to work with it time and time again - but it was such a headache to administer for what I considered to be marginal gains that I eventually gave up. Every time I add a server, it needs to be set up. Or it runs in permissive mode, at which point I'm not sure what value I am really getting at all. Too many times people have come to me with weird problems of servers not starting, or not working properly, and I have now started with the question do you have selinux running? try turning it off... I'm sure some people somewhere love selinux - but I suspect most people find the most relief once they turn it off. I vote for PostgreSQL committers spending their time on things that bring value to the largest number of people. Cheers, mark -- Mark Mielke m...@mielke.cc -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rejecting weak passwords
On 10/16/2009 11:28 AM, Bruce Momjian wrote: Dave Page wrote: Too many of those caveats, and it's easy to see how we can be discounted early in the evaluation phase. It's not helped that often these lists will be drawn up by people used to working with the commercial DBMSs, so we probably wouldn't get extra points for having a dozen procedural languages, or other features that are largely unique to PostgreSQL, no matter how cool and useful they are. Yep, this is illustrating something that is pretty basic to open source --- that is, open source often provides the tools for a solution, rather than a complete solution. I often think of open source as providing a calculator with wires sticking out, rather than calculator buttons; the wires allow more flexibility, but they are harder to use. Although often true - I think this is selling PostgreSQL a little short. It is a self-contained solution for what it does best, and for those that need more - there are better frameworks designed to be integrated that PostgreSQL is able to integrate with. PostgreSQL isn't a calculator with wires - if anything, I think PostgreSQL is an easy-to-use, full-featured calculator, whereas Oracle might be some advanced HP calculator that requires special training to learn how to use right... :-) Personally I think the calculator/wires approach is better from an engineering perspective, but it can be a handicap in the user experience and checkbox categories --- ease of use is perhaps not our strong point. Much of our open source value is being different, in cost, reliability, and configurability. I found this true of a lot of tools. I still remember when the mutt developers argued against putting IMAP in their solution because they thought there might be a better IMAP component client out there. Eventually, such arguments are dropped, as the practical sense on the matter says that tight integration is a requirement. I don't see how PostgreSQL has really failed in this regard. 
Maybe Oracle comes out-of-box with more features - but this doesn't make it necessarily a more complete solution - it just means it has more bells and whistles. A bicycle doesn't need a ticking card mounted through the spokes for it to be considered a complete solution. :-) Somebody might one day want that feature - but it's extra - it's not core. Cheers, mark -- Mark Mielkem...@mielke.cc -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rejecting weak passwords
On 10/15/2009 03:54 AM, Dave Page wrote: On Wed, Oct 14, 2009 at 11:21 PM, Mark Mielke <m...@mark.mielke.cc> wrote: On 10/14/2009 05:33 PM, Dave Page wrote: No. Any checks at the client are worthless, as they can be bypassed by 10 minutes worth of simple coding in any of a dozen or more languages. Why care? Because many large (and small, for that matter) organisations also have security policies which mandate the enforcement of specific password policies. Just because you think it's worthless to try to prevent someone reusing a password, or using 'password', doesn't mean that everyone else does. Some organisations will use such a feature in a box-ticking exercise when evaluating, and others may actually decide to use the feature, and expect it to work effectively. Besides, we are not in the habit of putting half-arsed features in PostgreSQL. If we do something, we do it properly. You miss my point (and conveniently cut it out). For users who accidentally break policy vs. users who purposefully circumvent policy, the approaches must be different, and the risk management decision may be different. It's a lot easier to circumvent policy than most people (management specifically) realize. If your aim is to absolutely prevent a determined, competent individual from circumventing your policy, you need to do a LOT MORE than what you are suggesting. If you just want to prevent accidents, having the client software do the checks is fine. Cheers, mark -- Mark Mielke <m...@mielke.cc>
Re: [HACKERS] Rejecting weak passwords
On 10/15/2009 10:08 AM, Dave Page wrote: It's certainly true that there are other ways for users to compromise their passwords if they want. The fact remains though, that most other DBMSs (and all major operating systems I can think of) offer password policy features as non-client checks which are difficult, if not impossible, for the user to bypass. Clearly other people think it's important to do this, and we are compared against their products on a daily basis, so if we want to compete with them on a level playing field we need at least a comparable feature set. Not so clear to me. If they're doing strong checks, this means they're sending passwords in the clear or only barely encoded, or using some OTHER method than 'alter role ... password ...' to change the password. Point being - if you think this is absolutely important to do - don't go +5% of the way - go 100% of the way. Then again, I'm not so concerned about what arbitrary criteria some person defines as what makes a good database system. I'm more concerned with what makes the system better for *me*. I don't see how this entire thread helps *me* in any way - and I do understand the need for strong passwords - and my company *does* have policies that require strong passwords. Even if the plugin is provided, I'm not going to activate it. I already have a policy for setting strong passwords that I already follow. Cheers, mark -- Mark Mielke <m...@mielke.cc>
Re: [HACKERS] Rejecting weak passwords
On 10/15/2009 10:38 AM, Albe Laurenz wrote: Mark Mielke wrote: Does Oracle really do password checks on the base SQL commands used to change an Oracle password? That sounds silly. In Oracle you can write a stored procedure to check passwords; it is invoked whenever a user is created or altered. No matter how you change the password, Oracle can always recover the plaintext and feed it to the password checking function. So, unless you use the Advanced Security option (extra $$) that enables you to encrypt network connections, any eavesdropper with knowledge of Oracle's (secret) encryption algorithms can get your new password when you change it. And the DBA can get your password with ease. Now I remember. Our secure password server, used as single sign-on for most applications in the company - which normally avoids any application ever having to see or authenticate the login - needs to send the passwords in plain or encoded form (not one-way encrypted) to a few broken systems, which include systems designed around Oracle user management, to allow people to log in to these applications using their corporate-wide password. I remember thinking one word when I learned this... awesome (sarcastic tone playing in head). Where is the check box that says "prevents password recovery"? For Dave Page: Understand that in a large company, as you are discussing, with policies that require strong passwords, the usual reason for requiring strong passwords is concern over privilege escalation. Access to one system gives you access to others. In the case of single sign-on, which is a requirement for any business with dozens or more applications, where it is unreasonable for each employee to actually memorize dozens of distinct strong passwords, access to one system gives you access to all systems. Therefore, "trust the DBA" makes no sense.
If the DBA can see my password, then they can log in to my employee records and check out what my salary or contact information is, or they can log in to one of the secure portals and authorize purchases as me. A *good* system is not trusted with the password. This is why I say you are focusing on making PostgreSQL what you think is a tiny bit better, but the gain is minor or artificial. If PostgreSQL starts requiring strong passwords, the world is not necessarily a better place in any mind except that of the person doing the ignorant checkbox evaluation who takes advertising at face value. If you need security, you should know enough to know you need something better than per-application password strength checkers. Cheers, mark -- Mark Mielke <m...@mielke.cc>
Re: [HACKERS] Rejecting weak passwords
On 10/15/2009 01:44 PM, Dave Page wrote: I don't deal with prospective clients, which is where this comes from. I do deal with a team of (pre)sales engineers who complain about this, and maybe half-a-dozen other issues, on a very regular basis. They tell me that PostgreSQL loses out in early stages of tech evals because of this issue, and I have no reason to disbelieve them. Sure, it's almost certainly not the only reason, but they add up. A lot of evaluations are designed to fit exactly one product, and it's impossible to win here. In my own company, I recently saw the most ridiculous (to me) evaluation, over a suite of products, that effectively listed an exact implementation as the requirements. This resulted in a huge split between the people who considered the evaluation fair and went with its choice of exactly that one product, and the rest of the people who called the evaluation a sham and refused to participate, choosing instead to use their own choice of products, not caring about the outcome of the evaluation. The evaluation, by the way, included other silly statements, like how a database instance costs $48k in license fees, even though everybody knew we were already using PostgreSQL for $0k - and even if we chose to be supported by one of the many PostgreSQL support companies, it would not cost $48k. Where did they get that number? Because they presumed they would go with Oracle. The evaluation was a sham from start to finish. Perhaps you can see how little I value some arbitrary checkbox list on some evaluation? If people want to count PostgreSQL off the list from the start - they will, and there is not much you or I can do about it. Bowing to the pressure of fulfilling these checkboxes, when they'll just change them next time to something else that PostgreSQL doesn't quite do, is a waste of time. We should do what is right.
We should not be focusing on checkboxes raised by other people who are not competent enough to understand the subject matter, or who have already made their choice and whose evaluation is just a rubber stamp to pretend they have done due diligence in justifying their choice against the alternatives. Cheers, mark -- Mark Mielke <m...@mielke.cc>
Re: [HACKERS] Rejecting weak passwords
On 10/15/2009 02:02 PM, Dave Page wrote: On Thu, Oct 15, 2009 at 6:55 PM, Robert Haas <robertmh...@gmail.com> wrote: OK, so we're in violent agreement here? From a technical perspective I think we have been for a while. Though clearly some people disagree with my assertion that putting any form of policy enforcement in the client is not actually 'enforcement'. I wonder how many of those folks would implement their website's data sanitisation in the browser only - but I digress... :-) It depends on what your goal is. If your goal is to treat users as monkeys that you do not trust, even with their own passwords, and the DBA as a god whom you absolutely do trust, then you are correct. I don't know about your company - but in my company, the DBAs are in the IT department, and they really have no business knowing my password, which would give them access to my employee records and my authorization capabilities. For any company that requires security, I do not accept that we can trust the DBA. The database is just one small component in a much larger solution. The DBA is the monkey for a minor backend application, and the designers are the people earning money for the corporation. We have the exact opposite of what you are suggesting. A person can get access to much more data by logging in as the user on their *desktop* than by accessing some database directly. I think you are missing that security is a balance. Your dig at ignorant people who do JS-based, browser-side checks of input is not applicable. You are exchanging one type of security for another type of security. You think that your proposed type of security is more valid than my proposed type of security. It depends on the application. Sometimes you might be right. Other times, you have arguably made things worse. Any company that truly needs security of this sort should not be using PostgreSQL roles with passwords for authentication. The true value of your proposal is pretty limited.
I'm not saying don't do it. I am saying that you are not truly achieving any improvement in security for the target audience you claim to be representing. I think your proposal might improve things for newbies running PostgreSQL on an open Internet port at home who pick username = password. Frankly, I don't think their data is worth protecting, and their choice to use username = password and make it accessible on an open Internet port confirms that they are either completely ignorant about security, or they also agree that their data is not worth protecting. Cheers, mark -- Mark Mielke <m...@mielke.cc>
Re: [HACKERS] Rejecting weak passwords
On 10/14/2009 12:25 PM, Tom Lane wrote: Let's see you do that (hint: CREATE USER ... PASSWORD is going to throw a syntax error before you realize there's anything there that might need to be protected). And you ignored the question of insecure transmission pathways, anyway. By the time the backend has figured out that it's got a CREATE USER ... PASSWORD command, it's already way too late if the client sent it over a non-SSL connection. It seems like the architectural problem here is that people think of SQL as being a valid way to change one's password. I never thought it was valid? What if, like createdb or createuser, there was a pgpasswd that did all of the appropriate checks and provided the proper security (MD5 today?) in transit and when storing to a log file? createuser already does password prompting - maybe it should allow checking the password on the createuser client side? I think if we looked at the SQL commands as being a 'base operation not intended for everyday users', this entire debate would seem frivolous? :-) Does Oracle really do password checks on the base SQL commands used to change an Oracle password? That sounds silly. I'm totally on Tom's side. Having the server check passwords is the wrong solution to the problem. It makes things worse - not better. That a plugin capability on the server is being approved at all is generous, as it leads to a sense of protection that may not be justifiable. Cheers, mark -- Mark Mielke <m...@mielke.cc>
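The hypothetical pgpasswd tool described above could check strength and hash entirely on the client, so neither the wire nor the server log ever sees cleartext. A minimal sketch, assuming PostgreSQL's historical MD5 scheme (the stored verifier is the string "md5" followed by the hex MD5 of the password concatenated with the username); the function names and the strength rules are illustrative, not any real tool's API:

```python
import hashlib

def pg_md5_verifier(password: str, username: str) -> str:
    """Build the verifier PostgreSQL stores for MD5 authentication:
    'md5' + hex(md5(password || username)).  Hashing on the client
    means the server and its logs never see the cleartext password."""
    digest = hashlib.md5((password + username).encode("utf-8")).hexdigest()
    return "md5" + digest

def strong_enough(password: str, min_len: int = 8) -> bool:
    """Illustrative client-side policy check (hypothetical rules):
    minimum length, plus at least one digit and one letter."""
    return (len(password) >= min_len
            and any(c.isdigit() for c in password)
            and any(c.isalpha() for c in password))

# A client-side tool would refuse weak passwords first, then send only:
#   ALTER ROLE alice PASSWORD 'md5<hex digest>';
```

As the thread notes, this is exactly the trade-off under discussion: the server only ever sees the hash, which is why any strength check has to live on the client.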
Re: [HACKERS] Rejecting weak passwords
On 10/14/2009 05:33 PM, Dave Page wrote: On Wed, Oct 14, 2009 at 9:50 PM, Kevin Grittner <kevin.gritt...@wicourts.gov> wrote: Can they check the box if the provided clients include password strength checking? I'm just wondering if we're going at this the hard way, if that really is the main goal. No. Any checks at the client are worthless, as they can be bypassed by 10 minutes worth of simple coding in any of a dozen or more languages. Why care? If the client is purposefully disabling password checks to use a weak password, this is an entirely different problem from somebody trying a weak password and being allowed. Circumvention of process is always a risk, and should be dealt with as a human resources problem. Why not stop the admin from disabling the security check when they create their pgadmin password too? We can't trust anybody - right? PAM does security checking client-side, I think? I'm sure others do too? I'm not saying server checks are worthless - but I think you are exaggerating to say that client checks are worthless. Sending the password in cleartext via SQL seems bad. Sending it encoded seems only marginally better. Sending it as MD5 is good, but means that password strength checking needs to be done by the client. You are saying that it's worth the loss of security in one area to improve security in another. Providing client checks in the official clients is probably sufficient for the checkbox that you think is so important. Unless you think it is impossible to circumvent process in any of these other databases that do such a better job? Personally, I don't think PostgreSQL is the best place to manage passwords at this level anyway, beyond basic usage. PostgreSQL shouldn't need to know the password, and the password should still be required to be as strong as the organization requires. Lots of other solutions here - PAM, LDAP, Kerberos, ... How many of these solutions should PostgreSQL re-implement?
Cheers, mark -- Mark Mielke <m...@mielke.cc>
Re: [HACKERS] Rejecting weak passwords
On 10/14/2009 06:02 PM, Dave Page wrote: On Wed, Oct 14, 2009 at 10:51 PM, Kevin Grittner <kevin.gritt...@wicourts.gov> wrote: Well, sure, but we're talking about a client going out of their way to wrestle the point of the gun toward their own foot, aren't we? If we're worried about the user compromising their own password, we have bigger problems, like that slip of paper in their desk drawer with the password written on it. I mean, I know some of these checklists can be pretty brain-dead (I've been on both sides of the RFP process many times), but it would seem over the top to say that client-side password strength checks aren't OK for the reason you give. See my previous comment about dates. Check-box items aside, I have absolutely no desire to try to give the illusion of a security feature when in reality any user could easily bypass it. This is only true if you think 'create/alter role ... password ...' is intended to be used by hand - and if you think this, are you not concerned that the password shows up on the screen as the user types it? It's not an illusion if bypassing it requires effort from the user. Who are you protecting? If you are trying to protect the user from themselves, I'd start by disabling 'create/alter role ... password ...' altogether and requiring some other means for the user to securely set a password. At our company we have a single sign-on system where users must log in to a central password server to change their password. The central password server provides the password strength test. The intent is for none of the applications to ever know or see the password. It's not about trust - it is about principle. The DBA and/or application has no need or requirement to know the password. They only need to know that it is valid.
For web-based applications, user authentication is accomplished via a forward to one of the central password authentication servers, which, upon successful login, forwards the user back to their intended application with a token that proves they have authenticated. If you think security is such an important thing - why not go all the way? Disable entirely the entry-level security provided by PostgreSQL that allows SQL commands to 'set' a password. Use a token-based system with a third-party trusted authenticator. I'm not being facetious. I think it's silly to argue that 50% is insufficient, and that lots of effort should be spent on reaching 55%, when you are clearly nowhere near 100% in either case. Cheers, mark -- Mark Mielke <m...@mielke.cc>
Re: [HACKERS] Rejecting weak passwords
On 10/14/2009 06:25 PM, Peter Eisentraut wrote: On Wed, 2009-10-14 at 18:38 +0200, Magnus Hagander wrote: So throwing out a wild idea that's probably just wild enough to even consider, but one way to deal with the logging side of things would be to deprecate/remove ALTER USER/CREATE USER with password, and add a separate API call. With a separate wire protocol packet. That would certainly take care of the logging part ;) I think that would be the correct fix. Yep. +1. If we are really so paranoid. Cheers, mark -- Mark Mielke <m...@mielke.cc>
Re: [HACKERS] Use samehost by default in pg_hba.conf?
On 10/01/2009 10:24 AM, Kevin Grittner wrote: Trust authentication has a few valid use cases, but it does tend to worry me that people may leave it enabled in inappropriate situations on production clusters. I don't see how we could get rid of it, but I'd be OK with a warning in the log when a pg_hba.conf file is processed which contains any trust entries. I don't think trust needs to be removed entirely - it is a valid option for demos or training sessions, perhaps. By using the word "abolishing", I might have created the wrong impression. I just meant that the default pg_hba.conf having trust has always seemed like a really bad thing to me. If people already have a pg_hba.conf with trust, I see no reason to stop them. If a new user tries PostgreSQL for the first time, I think the default configuration they encounter should be conservative and usable out of the box. I can see how samehost fits into this picture. I don't see how trust fits into this picture. Does anybody seriously recommend trust to newbies for production use? Shouldn't the default pg_hba.conf represent a conservative recommendation from the pgsql developers? Cheers, mark -- Mark Mielke <m...@mielke.cc>
Re: [HACKERS] Use samehost by default in pg_hba.conf?
On 09/30/2009 10:08 PM, Tom Lane wrote: Now that the samehost/samenet patch is in, I wonder if it wouldn't be a good idea to replace this part of the default pg_hba.conf file: # IPv4 local connections: host all all 127.0.0.1/32 @authmethod@ # IPv6 local connections: host all all ::1/128 @authmethod@ with: # local connections via TCP/IP: host all all samehost @authmethod@ The advantage of this is that connections made with -h machine_name instead of -h localhost would work without customization. I can't see any disadvantage to it. Making the change now would also give us an opportunity to test the samehost/samenet implementation in the buildfarm, at least for machines without Unix sockets. (Note that you would still need a non-default setting of listen_addresses for -h machine_name to actually work.) Although there is probably no rush for it, I think this would be a great first-user-experience change for PostgreSQL 8.5. If it just works out of the box, this is good. In the past, my experience has been that PostgreSQL rarely works out of the box for common scenarios. I know some people are worried about it not working, or creating some theoretical security problem that ends up being root-caused to PostgreSQL - but I find this thinking inconsistent when I look at the default configuration of trust. I would like to see the default of trust abolished. It scares me far more than sameuser / samehost would ever scare me. Newbie users won't know to fix it, and experienced users always need to fix it. I think the default file should be something that would be most valid for most people. For example: local all all ident host all all samehost md5 If this was the default, I think many installations would not require customization, and this would be great. Then again - maybe this will open up a huge can of worms where we debate which configuration is more likely for the average new user :-) Anything is better than trust - even blocking access entirely!
Cheers, mark -- Mark Mielke <m...@mielke.cc>
Re: [HACKERS] Rejecting weak passwords
On 09/29/2009 09:07 AM, Gurjeet Singh wrote: On Tue, Sep 29, 2009 at 4:49 AM, Tom Lane <t...@sss.pgh.pa.us> wrote: Josh Berkus <j...@agliodbs.com> writes: Hmmm, that would be a useful, easy (I think) security feature: add a GUC for failed_logins_allowed. And the counts would be tracked and enforced where? Combining this with the other suggestion: .) Provide a GUC failed_logins_allowed .) Add a MAX FAILED LOGINS option to ADD/ALTER USER, which defaults to the GUC if not provided in the command. .) Track per-user failed attempt counts in a shared catalog, and reset on a successful login. Resetting on success works _against_ the goal of preventing brute force attacks. Brute force attacks try until successful. I read Josh's original suggestion as eventually evolving into if a particular user account from a particular IP address uses the wrong password more than N times in T minutes, then the IP address is locked out for U minutes. This is the *only* way of significantly reducing the ability of a client to guess the password using brute force. It works pretty successfully in other systems. I know when I forget my voice mail password and I try five times, my account gets locked out for 24+ hours unless I request a password reset from our support organization. It is a pain in the butt - but it means that somebody trying to guess my password either has to get it right in a few short guesses, or they are out of luck. Still, all of this seems a little bit overthought to me, as unless one goes to this extreme - of actually blocking connections from an IP for a period of time - the ability to brute force passwords is already limited by network capacity, network latency, and protocol restrictions. MD5 might be broken from the perspective of a super user having access to the MD5 hash and having access to hardware accelerators (GPUs), but brute forcing from a client to a server is still limited to thousands of attempts or less per second.
This particular aspect of PostgreSQL has not concerned me. I tend to think that anybody who exposes their PostgreSQL to the Internet directly is asking for trouble no matter how hard pgsql-hackers tries to protect them. On my own network, there are so many other ways of getting at the passwords - including the crypt() passwords being visible over NIS using ypmatch - that this is really the least of my concerns. We have employee agreements in place that prohibit hacking, and outsiders are not supposed to have access to our network. Point being - if you want to really be effective compared to what we have today - you need to go all the way. Half way is useless. Cheers, mark -- Mark Mielke <m...@mielke.cc>
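The lockout scheme described in this post (N wrong passwords for an account from one IP within T minutes locks that IP out for U minutes) is straightforward to sketch. This is an illustrative in-memory version, not anything PostgreSQL provides; the class name and parameters mirror the hypothetical GUCs from the thread:

```python
import time
from collections import defaultdict, deque

class LoginThrottle:
    """Lock out an (account, IP) pair after too many failures in a
    sliding window.  max_failures/window_secs/lockout_secs correspond
    to the N / T / U values discussed in the thread."""

    def __init__(self, max_failures=5, window_secs=300, lockout_secs=600):
        self.max_failures = max_failures
        self.window_secs = window_secs
        self.lockout_secs = lockout_secs
        self.failures = defaultdict(deque)   # (user, ip) -> failure timestamps
        self.locked_until = {}               # (user, ip) -> unlock timestamp

    def is_locked(self, user, ip, now=None):
        now = time.monotonic() if now is None else now
        return self.locked_until.get((user, ip), 0) > now

    def record_failure(self, user, ip, now=None):
        now = time.monotonic() if now is None else now
        q = self.failures[(user, ip)]
        q.append(now)
        while q and q[0] < now - self.window_secs:   # drop stale failures
            q.popleft()
        if len(q) >= self.max_failures:
            self.locked_until[(user, ip)] = now + self.lockout_secs
```

With defaults of five failures in five minutes triggering a ten-minute lockout, guessing throughput is capped far below what network latency alone allows, which is the point being argued.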
Re: [HACKERS] Rejecting weak passwords
On 09/29/2009 12:54 PM, Josh Berkus wrote: I read Josh's original suggestion to eventually evolve to if a particular user account from a particular IP address uses the wrong password more than N times in T minutes, than the IP address is locked out for U minutes. This is the *only* way of significantly reducing the ability of a client to guess the password using brute force. As pointed out by others, that was a false assertion. Most sophisticated attackers sniff the MD5 password over the network or by other means, and then brute force match it without trying to connect to the DB. I don't know about most. Sniffing requires an inside track. I cannot sniff your network traffic from my notebook in Ottawa. Somebody so inclined would have to first break into your network to see the password you are sending. Then, if they can sniff your traffic, they can do all sorts of other things. It's a bit of a wash in the grand scheme of things. In practice, for the last decade, I have seen peaks of tens of thousands of attempts a day to brute force into my machine from the Internet from locations all over the world. It is not limited to telnet or SSH either - they come in on IMAP ports, VNC ports, SMB ports, or anything else that is widely used and exposed. Brute forcing through remote login is a well used method of cracking. Still, their ability to guess is limited by network capacity and network latency. So, it is on the order of thousands, not millions, and basic password precautions such as don't use a word are still quite effective. I don't think knowing the MD5 is an attack on its own. It might be a component in an escalation vector whereby the first get access to one resource, and then sniff the MD5 or see it in the backend database storage, to break into another resource. In any case - if they get the MD5, PostgreSQL is already compromised, and the next attack is more likely to affect something else - not PostgreSQL itself. 
Within our company, the crypt() passwords are available to all employees via NIS. Technically, this is a problem - but in practice, how much effort is it worth to resolve? If they can get onto our network to get at the crypt() passwords, they probably already have access to other intellectual property. Mostly, I'm saying that PostgreSQL using MD5 is a minor issue - switching to SHA-1 is not going to eliminate the problem, it will just make things a tiny bit harder for a would-be attacker. To actually close the window, as opposed to pushing it closed a little tighter, would take a lot more effort. Cheers, mark -- Mark Mielke <m...@mielke.cc>
Re: [HACKERS] Hot Standby on git
On 09/26/2009 10:04 AM, Simon Riggs wrote: If you think there's something useful I could do, let me know and I'll take a look. I feel like I need a better way of unit testing new code. Some of the code in the patch is there to handle corner cases, so recreating them is fairly hard. It is a nagging feeling that I am missing some knowledge here, and I would welcome some insight, or research, into better ways of doing general-case unit testing. You might try to steal ideas from EasyMock / PowerMock - though I'm not sure how well the ideas map to C. Generally it means allowing the functions to be called from a mock environment, where subroutine calls that might be made are stubbed out to return sample data that simulates your scenario. Object-oriented languages that require every object to provide an interface, where most object methods can be overridden, are better suited to performing this sort of test. I rarely ever see this sort of stuff in FOSS projects, and never, that I can remember, in FOSS C projects. It's not easy, though. I assume you are doing it through code changes right now - commenting out lines, replacing them with others, etc.? Cheers, mark -- Mark Mielke <m...@mielke.cc>
Re: [HACKERS] Hot Standby on git
On 09/26/2009 02:28 PM, Dan Colish wrote: There are a variety of projects dedicated to creating C unit test frameworks. I don't have a lot of experience with them, but I have heard good things about check and cunit. Here's a link I found with a longer list of frameworks. http://www.opensourcetesting.org/unit_c.php Looking at check and cunit, I don't see what sort of mock function facility they would provide. One part of unit testing is arranging for functions to be called, tested, and the results reported on. This can take you a certain amount of the way. Pure functions, for example, that always generate the same output for the same input parameters, are perfect for this situation. Perhaps test how a qsort() or bsearch() call behaves under various scenarios? Most real-life code gets a little more complicated. For example, what if we want to simulate a network failure or an out-of-disk-space condition? What if we want to test what happens when the Y2038 date is reached? This requires either complex test case setup that is difficult to run reproducibly, or another approach - mocking. It means doing things like overriding the write() call, making it return successfully N times, and then failing on the (N+1)th time with ENOSPC. It means overriding the gettimeofday() call to return a time in the future. A major benefit of this sort of testing is that it should not require source changes in order to perform the test. This sort of stuff is a LOT easier to do in OO languages. I see it done in Java a lot. I can't remember ever having seen it done in C. I think it's just too hard compared to the value obtained from the effort. In your list above, it does show a few attempts - CMock sticks out as an example. It looks more complicated, though. It takes a .h file and generates stubs for you to fill in? That could be difficult to manage for a large project with thousands or many times more unit tests.
OO is easier because you can override *only* particular methods, and you can safely call the super method that it overrides to provide the underlying behaviour in the success cases. Cheers, mark -- Mark Mielke <m...@mielke.cc>
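The write()-succeeds-N-times-then-fails-with-ENOSPC scenario from the previous message is exactly what EasyMock-style libraries make trivial. As a point of comparison with the C situation being discussed, here is what it looks like with Python's unittest.mock; copy_stream and make_flaky_writer are made-up illustrative names, and the C equivalent would typically need linker tricks such as GNU ld's --wrap option:

```python
import errno
from unittest import mock

def copy_stream(reader, writer):
    """Toy code under test: pump chunks until EOF; write errors propagate."""
    while True:
        chunk = reader()
        if not chunk:
            return True
        writer(chunk)

def make_flaky_writer(successes):
    """Mock writer that succeeds N times, then raises ENOSPC - the
    'succeed N times, fail on the (N+1)th' pattern from the thread."""
    results = [len(b"data")] * successes + [OSError(errno.ENOSPC,
                                                    "No space left on device")]
    return mock.Mock(side_effect=results)

# Exercising the failure path without touching a real disk:
reader = mock.Mock(side_effect=[b"data", b"data", b"data", b""])
writer = make_flaky_writer(2)
try:
    copy_stream(reader, writer)
    hit_enospc = False
except OSError as e:
    hit_enospc = (e.errno == errno.ENOSPC)
```

The mock also records how it was called (writer.call_count, call arguments), which is the other half of what frameworks like EasyMock verify.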
Re: [HACKERS] pg_hba.conf: samehost and samenet [REVIEW]
If you are looking for representation - I consider the default pg_hba.conf to be problematic. Newbies start with trust access, and then do silly things to open it up. I would use samehost, and if samenet worked the same way it does for Postfix, I would probably use samenet. This information can be pulled from the operating system, and the requirement for it to be hard-coded in pg_hba.conf is inconvenient at best, and problematic at worst. Yes, renumbering requires some thought - but I prefer applications that do the majority of this thinking for me over applications that require me to perform mundane activities. I would also use DNS in pg_hba.conf if it were available. I can see some of the issues with this (should it be mapped to an IP right away, or should it be re-evaluated every time?), but ultimately the feature would be useful, and it would be widely used. Especially once we get to IPv6, specification of the addresses will become a horrible chore, and solutions which require the IPv6 address to be spelled out will be painful to use. Both of these are generally one-time costs for me. They are a pain, but most of us suck it up and swallow. It hasn't been on my list of itches that I just have to scratch. Remember, though, that the majority of PostgreSQL users are not represented on this list, and my pain here might be acceptable, but a newbie will probably either turn away or do something wrong. Better to give them a sensible configuration from the start, and allow the experts to specify IP addresses if that is what they want to do. Cheers, mark -- Mark Mielke <m...@mielke.cc>
Re: [HACKERS] pg_hba.conf: samehost and samenet [REVIEW]
On 09/23/2009 05:37 PM, Andrew Dunstan wrote: Tom Lane wrote: In this case what particularly scares me is the idea that 'samenet' might be interpreted to let in a larger subnet than the user expected, eg 10/8 instead of 10.0.0/24. You'd likely not notice the problem until after you'd been broken into ... I haven't looked at this feature at all, but I'd be inclined, on the grounds you quite reasonably cite, to require a netmask with samenet, rather than just ask the interface for its netmask. I think requiring a netmask defeats some of the value of samenet. Being assigned a new address can change the subnet as well. For example, when we moved one of our machines from one room to another it went from /24 to /26. I think it should be understood that the network will not work properly if the user has the wrong network configuration. If they accidentally use /8 instead of /24 on their interface - it's more likely that some or all of their network will become inaccessible than that somebody will break into their machine. And, anything is better than 0.0.0.0. There are two questions here I think - one is whether or not samenet is valid and would provide value, which I think it is and it does. A second question is whether it should be enabled in the default pg_hba.conf - I think not. Postfix has this capability and it works fine. I use it to allow relay email from machines I trust, because they are on my network. I think many people would use it, and it would be the right solution for many problems. Worrying about how some person somewhere might screw up, when they have the same opportunity to screw up if things are left unchanged (0.0.0.0), is not a practical way of looking at things. How many Postfix servers have you heard of being open relays as a result of samenet? I haven't heard of it ever happening.
I suppose that doesn't mean it hasn't happened - but the fact that the machine will not work properly unless the network interface is configured correctly is a very good incentive to get that configuration right. Cheers, mark -- Mark Mielke m...@mielke.cc
Re: [HACKERS] pg_hba.conf: samehost and samenet [REVIEW]
On 09/23/2009 05:40 PM, Tom Lane wrote: I haven't looked at this feature at all, but I'd be inclined, on the grounds you quite reasonably cite, to require a netmask with samenet, rather than just ask the interface for its netmask. I was just thinking the same thing. Could we then unify samehost and samenet into one thing? sameaddr/24 or something like that, with samehost just being the limiting case of all bits used. I am not sure though if this works nicely for IPv6 as well as IPv4. I could see some people wanting this as well - but it's not a replacement for samenet, it would be an additional feature. For example, at my company, I have a cluster of machines on a /26 subnet, but for some accesses, I would prefer to open it up to /8, since our company has a /8, and I may want to allow anybody in the company to connect, regardless of how things are routed. I may still want samenet in the same configuration, to grant additional access if the person happens to be on my switch compared to anywhere in the company. For my switch, having to hard-code the subnet is back to being a pain. If we enlarge our subnet to /25, it's one more thing that I would have to remember to change unnecessarily. Cheers, mark -- Mark Mielke m...@mielke.cc
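[Editor's note: the scenario described above might look like this in pg_hba.conf terms - the samenet keyword is the proposed feature, the sameaddr/24 line is Tom's hypothetical unified syntax, and the addresses and role names are invented.]

```
# TYPE  DATABASE  USER     ADDRESS       METHOD
host    all       staff    10.0.0.0/8    md5   # anyone in the company's /8 (must be spelled out)
host    all       cluster  samenet       md5   # extra access for the local subnet, no hard-coded /26
# Tom's proposed unification, where samehost is the all-bits limiting case:
# host  all       cluster  sameaddr/26   md5
```

Note how the samenet line keeps working if the local subnet is later enlarged from /26 to /25, whereas a hard-coded CIDR would need editing.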
Re: [HACKERS] Crypto
On the subject of crypto law - the laws have relaxed significantly in the last decade, to the point where it is now generally safe to export symmetric encryption up to 128 bits (example: AES), and asymmetric encryption up to 1024 bits (example: RSA). Many countries still require some sort of license, though, which takes the form of a formal request: "may I export this?" "yes." As an example, I received approval from our company lawyers to re-export the Java runtime with a program we have which uses exactly 128-bit symmetric and 1024-bit asymmetric to all countries except embargoed countries. Since it makes no sense to do business in embargoed countries anyways, there is no point in asking at all. For free / open source software in general, the US has explicit exemptions for freely available software on the Internet, for the most part because it is impossible for them to control it. In this situation, PostgreSQL has a lot more freedom than, say, Oracle, to distribute crypto. As an example, Firefox includes crypto to support SSL and certificate checking. Now, many countries also have *import* restrictions, so while it's safe to freely export Firefox from the United States over the Internet, in some countries, it is *illegal* for their own citizens to encrypt their data beyond a certain level. If such rules are enforced (I think Australia even had such a rule for a time), then it would be the citizen doing the import that is affected. At present, I wonder about the status of such things in China. While in China, they didn't prevent me from using my high encryption strength VPN software to access work - was I breaking the law by importing the technology and using it? I don't know, and I didn't really think much about it at the time.
All this being said - laws change all the time, and there are so many countries in the equation, each of which may or may not have rules that apply to PostgreSQL at various times, that I still agree with Andrew - to go from no-crypto to crypto is a huge change that MAY result in downstream consequences which would adversely affect the success of PostgreSQL, or may even end up with some PostgreSQL representative in the chain defending themselves in a court room. I think it would be best to leave crypto *outside* of core, but make it an extremely easy-to-add plugin, with "download at your own risk - if you are unsure whether you are allowed to import crypto into your country, you are responsible for seeking your own legal counsel." Java did this, with their main software being generally exportable, and their unlimited strength crypto libraries requiring a separate download with appropriate warnings to keep Sun happy that they would not be held legally responsible if somebody did misuse the software. I work for a telecommunications company which requires crypto in most software components, so this stuff is taken very seriously. The last thing you want to see on television is a terrorist using an untraceable secure line with your company's brand name on the front, as they lop off the head of a reporter. There is a level of responsibility required for such things, both from a business perspective and from an ethics perspective. Cheers, mark On 09/19/2009 01:55 PM, Andrew Dunstan wrote: David Fetter wrote: As for the suggestion that we should put other crypto functions into the core, AIUI the reason not to is not to avoid problems with US Export Regulations (after all, we've shipped source tarballs with it for many years, including from US repositories), but to make it easier to use Postgres in places where use of crypto is illegal. To date, I have not found an example of such a place.
For the record, would you or anyone seeing this be so kind as to provide one, along with some kind of evidence that somewhere, such a law has actually been enforced? There are significant controls in a number of countries. See http://rechten.uvt.nl/koops/cryptolaw/cls-sum.htm. I am not going to do more research on this - I have better things to do with my time. The point has been made elsewhere that including general crypto in core is entirely unnecessary for any purpose we know of. That along with knowledge that its use is at least restricted in several countries should surely be argument enough. This comes up often enough that I'm almost wondering if it deserves an FAQ entry. -- Mark Mielke m...@mielke.cc
[HACKERS] Re: [COMMITTERS] Can not create more than 32766 databases in ufs file system.
Not sure that this really belongs on pgsql-committers - maybe pgsql-hackers? No matter what scheme PostgreSQL uses for storing the data, there can be underlying file system limitations. One solution, for example, would be to use a file system that does not have a limitation of 32k subdirectories. Although ext3 and/or ufs has this limit - ext4 has removed this limit. There are many ways PostgreSQL could work around this problem - your suggestion of using sub-directories being one of them - but what happens if this causes performance degradation for existing users, due to the extra file system lookups required on every access? Another solution would be to store everything in the same file. In any case, I think this would be a significant architecture change for something that sounds like a bad idea. I would expect having 32k databases to have significant performance degradations in other ways. In particular, I am thinking about having to open a file descriptor for each of these files. What sort of database architecture requires 32k databases or tables for the same PostgreSQL instance? Have you considered having an additional field for your primary key and combining several tables into one? Cheers, mark On 09/12/2009 02:49 PM, fulan Peng wrote: Hi, pgsql-committers! I cannot create more than 32766 databases with FreeBSD in one setup, not as the document says, as many as you like. I found the problem is that the directory pgsql/data/base cannot hold more than 32766 subdirectories. I suggest to make 32766 subdirectories in the base directory, say /base/0, /base/1, /base/32765. Then in each subdirectory to put the database description. This way, we can have 32766x32766 databases. This is kind of as many as you like. The ZFS system is not a solution for me. It is snail slow. -- Mark Mielke m...@mielke.cc
Re: [HACKERS] Re: [COMMITTERS] Can not create more than 32766 databases in ufs file system.
On 09/12/2009 03:33 PM, Stephen Frost wrote: * Mark Mielke (m...@mark.mielke.cc) wrote: No matter what scheme PostgreSQL uses for storing the data, there can be underlying file system limitations. This is true, but there's a reason we only create 1GB files too. I wouldn't be against a scheme such as described to minimize the impact to PG of these limitations. Ok - but minimizing the impact does not necessarily mean keep doing what we are doing; it can mean working around the issue. One interpretation is that the problem is that PostgreSQL is trying to use too many sub-directories in the same containing directory. I would argue that the problem is that PostgreSQL requires so many sub-directories in the first place. There are many database designs that do not require one file per database. Berkeley DB JE, for instance, treats each database as one root in a larger tree. The entire database is stored in one set of files, where the files are created due to database volume, not database quantity. Tables can be thought of similarly. There are many ways PostgreSQL could work around this problem - your suggestion of using sub-directories being one of them - but what happens if this causes performance degradation for existing users, due to the extra file system lookups required on every access? Ehhh, it's likely to be cached.. Sounds like a stretch to me that this would actually be a performance hit. If it turns out to really be one, we could just wait to move to subdirectories until some threshold (eg- 30k) is hit. Cached does not eliminate the cost. It just means it doesn't have to go to disk. It still needs to traverse an additional level of the VFS tree. Sure, this is designed to be cheap - but this avoids the real cost from consideration - that of having so many subdirectories in the first place. Another solution would be to store everything in the same file. eh?
There is no technical requirement for PostgreSQL to separate data in databases or tables on subdirectory or file boundaries. Nothing wrong with having one or more large files that contain everything. PostgreSQL doesn't happen to do this today - but it's bothered me at times that it has so many files in the database directory - even very small tables require their own files. In any case, I think this would be a significant architecture change for something that sounds like a bad idea. I would expect having 32k databases to have significant performance degradations in other ways. Actually, I think some of the changes to remove flatfiles might improve our performance with large numbers of databases. I also don't see how this would be a significant architecture change at all. If there are still issues that make having lots of databases slow, we might want to look into fixing those issues rather than saying well, just don't do that. I guess I'm not seeing how using 32k tables is a sensible model. So yes, things can be done to reduce the cost - but it seems like something is wrong if this is truly a requirement. There are alternative models of storage that would not require 32k tables, that likely perform better. Although, I don't know your requirements, so perhaps I am missing something. In particular, I am thinking about having to open a file descriptor for each of these files. What sort of database architecture requires 32k databases or tables for the same PostgreSQL instance? Have you considered having an additional field for your primary key and combining several tables into one? I've got a ton of instances that have 32K tables. My approach is generally to keep the number of databases low, while having lots of schemas, but there are distinct downsides to that (specifically related to hiding information... something a lot of people care about, but thankfully I don't have to).
Do you agree with me that having 32k open file descriptors (or worse, open on demand file descriptors that need to be re-opened many times) is a problem? Looking at PostgreSQL today - I don't think it's designed to scale to this. Looking at SQL today, I think I would find it difficult to justify creating a solution that requires this capability. Honestly - it seems a bit insane. Sorry. :-) Maybe I'm just naive... Cheers, mark -- Mark Mielke m...@mielke.cc
Re: [HACKERS] Re: [COMMITTERS] Can not create more than 32766 databases in ufs file system.
On 09/12/2009 03:48 PM, Stephen Frost wrote: This would allow for 220M+ databases. I'm not sure how bad it'd be to introduce another field to pg_database which provides the directory (as it'd now be distinct from the oid..) or if that might require a lot of changes. Not sure how easy it'd be to implement something to address this problem while we continue to tie the directory name to the oid. Other than bragging rights - what part of this would be a GOOD thing? :-) My God - I thought 32k databases in the same directory was insane. 220M+??? Hehehe... If you can patch PostgreSQL to support such extremes without hurting my performance - I'll shut up and leave you be. :-) Cheers, mark -- Mark Mielke m...@mielke.cc
Re: [HACKERS] Re: [COMMITTERS] Can not create more than 32766 databases in ufs file system.
On 09/12/2009 04:17 PM, Stephen Frost wrote: * Mark Mielke (m...@mark.mielke.cc) wrote: There is no technical requirement for PostgreSQL to separate data in databases or tables on subdirectory or file boundaries. Nothing wrong with having one or more large files that contain everything. Uhh, except where you run into system limitations on file size (eg- a 2G max file size..). You'll note PG creates files up to 1G and then splits them into separate files. It's not done just because it's fun. This becomes a bit of a side thread - but note that I carefully didn't say exactly one file. I said one or more large files that contain everything. That is, if we have 3 databases each of size 50 Mbytes, there is no technical reason why this cannot be stored within a single 150 Mbyte data file. Sure, if it goes beyond 2G, we can break it into a set of files, and treat each file as a 2G block in a virtual larger storage pool. VMWare has this for storing virtual drives. If we assume that 32k *databases* is reasonable for a single instance, for 32k databases to *require* 32k immediate sub-directories is the real problem. This can be solved either by: 1) Adding additional depth to the directory height to work around this limit (what the OP and you are proposing), or 2) Storing multiple databases within the same files or sub-directories. If you really must have this amount of scalability, I am suggesting that you consider all of the resources required to access 32k worth of sub-directories in the file systems, specifically including file descriptors, inodes, the backing bitmaps or extent mappings that allocate from the file system free space, the rather inefficient directory layouts of many file systems (many file systems still do LINEAR searches for filenames, making file lookups linearly slower as the directory becomes larger), and the kernel memory caches that track all of these little details. 
The POSIX guarantees required are certainly more heavy-weight than the requirements that PostgreSQL has, and I am certain it is possible to create a targeted solution to this problem that is simpler and faster. For only a few databases and a few files, the effort isn't worth it. But, if supporting 32k+ *databases*, or even 32k+ tables and indexes, is a major requirement and a major design target, then PostgreSQL should do this stuff itself. Modern file systems don't have the 2G problem. ext2/ext3 for 4Kbyte blocks (standard) supports up to 2Tbytes. This also matches the practical limit on addressing a single physical disk, at least on the platforms I am familiar with. The requirement to stay under 2G for a single file is a bit outdated. I guess I'm not seeing how using 32k tables is a sensible model. For one thing, there's partitioning. For another, there's a large user base. 32K tables is, to be honest, not all that many, especially for some of these databases which reach into the multi-TB range... Talking philosophically - the need to use table-based partitioning to achieve acceptable performance or storage requirements is somewhat of a hacky workaround. It's effectively moving the database query logic back into the application space, where the application must know which tables contain which data. The inherited tables and automatic constraint-based query planning help out, but it's still an elaborate hack. It's exposing data that the application should not need to care about, and then making it possible to hide some of it again. Table partitioning should be far more automatic. I don't want to break my theoretical table containing every call made on my network into per-hour tables, each with a constraint for the time range it includes data for. I want to create a table, with a timestamp column, fill it with billions of records, provide a few hints, and the database engine should be smart enough to partition the table such that my queries just work.
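[Editor's note: the manual, inheritance-based partitioning scheme being criticized above looks roughly like this; the table and column names are invented for illustration.]

```sql
-- Parent table: every call made on the network
CREATE TABLE calls (
    call_time  timestamptz NOT NULL,
    caller     text        NOT NULL
);

-- One child table per hour, with a CHECK constraint so the planner's
-- constraint-based exclusion can skip irrelevant partitions
CREATE TABLE calls_2009_09_12_14 (
    CHECK (call_time >= '2009-09-12 14:00'
       AND call_time <  '2009-09-12 15:00')
) INHERITS (calls);

-- The application (or a trigger) must route each INSERT to the right
-- child, and new children must keep being created --
-- hence schemes that chew up thousands of tables.
```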
Back to reality - maybe things have not reached this level of maturity yet, and people with practical requirements today, have found that they need to use very complex manual partitioning schemes that chew up thousands of tables. So yes, things can be done to reduce the cost - but it seems like something is wrong if this is truly a requirement. I have no idea what you've been working with, but I hardly think it makes sense for PG to consider over 32k tables as not worth supporting. I don't advocate any limits. However, I also don't advocate designing PostgreSQL specifically for the case of 32k tables. If you want to use 32k tables, then you better have a file system that supports 32k+ files in a single directory, and a kernel that is able to work efficiently when postgres has thousands or more file descriptors open and in use at the same time. The system *supports* 32k tables, but if you look at the design, you'll see that it is not optimal for 32k tables. Even
Re: [HACKERS] postgres-r
On 08/12/2009 12:04 PM, Suno Ano wrote: can anybody tell me, is there a roadmap with regards to http://www.postgres-r.org ? I would love to see it become production-ready asap. Even a breakdown of what is left to do might be useful in case any of us want to pick at it. :-) Cheers, mark -- Mark Mielke m...@mielke.cc
Re: [HACKERS] Hot standby?
On 08/11/2009 09:56 AM, Kevin Grittner wrote: Bruce Momjian br...@momjian.us wrote: OK, so it is warm slave. That is technically accurate, given the preceding definitions, but it has disturbing connotations. Enough so, in my view, to merit getting a little more creative in the naming. How about warm replica? Other ideas? I agree that the present moniker misleads. I remember this debate from 6 months ago. :-) I prefer not to try and fit square pegs into round holes. Streaming replication sounds like the best description. It may not be the keywords that newbies are looking for, but too bad for them. Calling it something different than what it is, just so that people who don't understand why it is wrong will have something that approximates the right understanding, is not a just cause. :-) Cheers, mark -- Mark Mielke m...@mielke.cc
Re: [HACKERS] Hot standby?
On 08/11/2009 02:52 PM, Robert Haas wrote: On Tue, Aug 11, 2009 at 2:48 PM, Mark Mielke m...@mark.mielke.cc wrote: I remember this debate from 6 months ago. :-) I prefer not to try and fit square pegs into round holes. Streaming replication sounds like the best description. It may not be the keywords that newbies are looking for, but too bad for them. Calling it something different than what it is, just so that people who don't understand why it is wrong will have something that approximates the right understanding, is not a just cause. :-) Uhm, I think you are confused. Hot Standby = Allow read-only queries on a PostgreSQL server during archive recovery Synchronous (or Streaming) Replication = Allow WAL to be streamed on a byte level rather than copied file-by-file Hot Standby is not any sort of replication. I don't think I was confused before - but I am confused now. :-) This patch does not provide streaming replication? Having the standby allow for read-only queries is a "would be nice" for me, but it's not very useful. I've been monitoring these threads (and wishing they were at a level I could participate and contribute on), because I want the ability to have near real time updates such that the standby can become live. Hot standby to me means the slave is as close to up-to-date as possible and can potentially take over at any time in a near instant. This *implies* some sort of streaming replication (byte level rather than file-by-file) rather than waiting for WAL logs to become full and shipped. If this patch doesn't give me near real time replication, then I am confused about why I would want it at all. pg_standby already gives the ability to do replication on a per completed WAL log file basis. Cheers, mark -- Mark Mielke m...@mielke.cc
Re: [HACKERS] Hot standby?
On 08/11/2009 11:19 PM, Robert Haas wrote: On Tue, Aug 11, 2009 at 9:44 PM, Greg Stark gsst...@mit.edu wrote: No! This is *not* what hot standby means, at least not in the Oracle world. I'm perplexed by this. For example: http://en.wikipedia.org/wiki/Hot_standby Admittedly, wikipedia is not an authoritative source, but I've always understood cold/warm/hot just as Peter described them upthread. Cold means it's on the shelf. Warm means it's plugged in, but you have to do something to get it going. Hot means it just takes over when needed. But of course I guess Oracle can call their features what they want to... To further confuse things, the temperature might apply to only a particular aspect of the solution. For example, hot swappable disk drives are drives that probably do sit on a shelf until they are needed, and the hot aspect only implies that the server does not need to be shut down to install the drives, and initialize them for service. :-) For databases, people seem to be associating hot with the ability to issue read only queries. As somebody else said - under a definition of hot that includes read-only clones, pg_dump/pg_restore could be considered a hot standby strategy. I don't agree with that definition. For the clone to be able to perform read-only queries does not imply hot nor does it imply standby. It implies slave. The original poster correctly raised this concern. For myself, I associate hot to mean ready to replace the master, which implies that the data is up-to-date or nearly up-to-date, and implies that the server is within a bit toggle of accepting mastership of the data and serving ALL queries that the master would serve. Key to this is nearly up-to-date (requires some sort of streaming) AND ALL queries (not just read queries!).
If the server happens to be able to do read queries while it is waiting in standby mode - that's convenient and could be useful to somebody, but that's not the value of a hot standby - that's the value of a read-only slave. The feature being provided is not hot standby. In the case of Oracle, I believe their hot standby provides the nearly up-to-date, and the capability to switch over, which satisfies my requirements. It might *also* allow a read-only slave for the standby, but that's just convenience - it's not really part of the definition or expectation of hot standby. log based replication, read-only slaves, and hot standby are all 100% accurate descriptions of what the hot standby patch enables. I do like read only slaves because it's the most precise and meaningful. Me too. Read only slave works for me. Cheers, mark -- Mark Mielke m...@mielke.cc
Re: [HACKERS] Re: Synch Rep: direct transfer of WAL file from the primary to the standby
On 07/08/2009 09:59 AM, Kevin Grittner wrote: Dimitri Fontaine dfonta...@hi-media.com wrote: 4. sync: slave is no more lagging, it's applying the stream as it gets it, either as part of the master transaction or not depending on the GUC settings I think the interesting bit is when you're at this point and the connection between the master and slave goes down for a couple days. How do you handle that? Been following with great interest... If the updates are not performed at a regular enough interval, the slave is not truly a functioning standby. I think it's a different problem domain, probably best served by the existing pg_standby support? If the slave can be out of touch with the master for an extended period of time, near real time logs provide no additional benefit over just shipping the archived WAL logs and running the standby in continuous recovery mode? Cheers, mark -- Mark Mielke m...@mielke.cc
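[Editor's note: the existing pg_standby-based continuous recovery mode referred to above is driven by a restore_command in recovery.conf on the standby. A minimal sketch - the archive path is invented; see the pg_standby documentation for the full option list:]

```
# recovery.conf on the warm standby: replay each archived WAL file as it
# arrives, with pg_standby waiting for the next file once caught up
restore_command = 'pg_standby /mnt/archive/wal %f %p %r'
```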
Re: [HACKERS] 8.4, One-Time Filter and subquery ( ... FROM function() union all ... )
I found Tom's response ambiguous - but positive either way, so it gave me a smile. :-) Which of the following two great things occurred? 1) Tom popped a quick fix on CVS HEAD? (Pretty fast!) 2) Tom or somebody else had already done it? Cheers, mark On 07/07/2009 05:14 PM, Sergey Burladyan wrote: Tom Lane t...@sss.pgh.pa.us writes: As of CVS HEAD you get QUERY PLAN Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.011..0.011 rows=0 loops=1) One-Time Filter: false Total runtime: 0.179 ms (3 rows) Thank you, Tom ! -- Mark Mielke m...@mielke.cc
Re: [HACKERS] Named transaction
On 06/18/2009 02:42 AM, Pavel Golub wrote: Now to implement the customer's desire in PostgreSQL there are two ways: 1. Each script must be executed in a separate connection context 2. Each script must be executed inside a critical section; in other words, the current script must block others until COMMIT or ROLLBACK I don't like both. What don't you like about 1) ? I don't know of any other databases that work this way. Using separate connections and connection pooling seems to be the way to go here. Personally, I found the named transaction concept a little screwy unless: 1) SQL commands can be sent asynchronously as long as they are for different named transactions, even while other transactions are still running. 2) Each transaction runs in a different server-side thread. If this is what you want, it sounds like you are just trying to multiplex multiple queries and responses over the same TCP/IP connection. For the added complexity on both the client and the server, do you really think it is worth it? If you just want a connection multiplexor that is backed by a connection pool - I think that would be a lot easier to provide. :-) Cheers, mark -- Mark Mielke m...@mielke.cc
Re: [HACKERS] PostgreSQL Developer meeting minutes up
Markus Wanner wrote: Quoting Mark Mielke m...@mark.mielke.cc: I am a theory person - I run things in my head. To me, the concept of having more context to make the right decision, and an algorithm that takes advantage of this context to make the right decision, is simple and compelling on its own. Knowing the algorithms that are in use, including how it selects the most recent common ancestor, gives me confidence. That makes me wonder why you are speaking against merges, where there are common ancestors. I'd argue that in theory (and generally) a merge yields better results than cherry-picking (where there is no common ancestor, thus less information). Especially for back-branches, where there obviously is a common ancestor. Nope - definitely not speaking against merges. Automatic merges = best. Automatic cherry picking = second best if the work flow doesn't allow for merges. Doing things by hand = bad but sometimes necessary. Automatic merges or automatic cherry picking with some manual tweaking (hopefully possible from kdiff3) = necessary at times but still better than doing things by hand completely. I think you and I are in agreement. (Even Tom and I are in agreement on many things - I just didn't respond to his well thought out great posts, like the one that describes why back patching is often better than forward patching when having multiple parallel releases open at the same time) No amount of discussions where others say it works great and you say I don't believe you until you provide me with output is going to get anywhere. Well, I guess it can be frustrating for both sides. However, I think these discussions are worthwhile (and necessary) nonetheless. As not even those who highly appreciate merge algorithms (you and me, for example) are in agreement on how to use them (cherry-picking vs. merging), it doesn't surprise me that others are generally skeptical. We're in agreement on the merge algorithms I think.
:-) That said, it is a large domain, and there is room for disagreement even between those with experience, and you are right that it shouldn't be surprising that others are generally sceptical. Cheers, mark -- Mark Mielke m...@mielke.cc
Re: [HACKERS] PostgreSQL Developer meeting minutes up
Tom Lane wrote: I have heard it claimed that git is more intelligent than plain diff/patch and could successfully merge patches in cases that currently require manual adjustment of the sort Andrew describes. If that's really true to any significant extent, then it could represent a benefit large enough to persuade us to alter work flows (at least for simple patches that don't require significant rethinking across branches). However, I have yet to see any actual *evidence* in support of this claim. How robust is git about dealing with whitespace changes, nearby variable renamings, and such? Andrew's plperl patches would be an excellent small test case. Anybody want to try them against the experimental git repository and see if git does any better than plain patch? Any revision control system should be able to do better than diff/patch, as these systems have more information available to them. Normal GIT uses the relatively common 3-way merge algorithm, based upon the most recent common ancestor. Assuming there is a most recent common ancestor that isn't file creation, it will have a better chance of doing the right thing. Systems such as ClearCase have had these capabilities for a long time. The difference with distributed version control systems is that they absolutely must work well, as every user has their own repository, and every repository represents a branch, therefore each user of the system is working on a different branch. The need for reliable merges goes up under a distributed version control system. Not to say GIT is truly best-in-class here, but it definitely has motivation to be, and the benefit of being, better than diff/patch. These sorts of tools usually work with another tool such as kdiff3, so that only the conflicts need to be resolved by hand. If you set it up properly, the automatic merges can be completely successful, and kdiff3 or similar can present you with a graphical interface that allows you to identify and resolve the conflicts that require help.
I've used these sorts of tools long enough to completely take them for granted now, and it feels painful to go back to anything more primitive. Cheers, mark -- Mark Mielke m...@mielke.cc
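The common-ancestor merge being described is easy to see in isolation with diff3, the same three-way idea that git's default merge strategy builds on. A throwaway sketch (file names and contents are invented for illustration):

```shell
# Three versions of a file: the common ancestor, plus one edit per branch.
dir=$(mktemp -d)
cd "$dir"
printf 'one\ntwo\nthree\n' > base     # most recent common ancestor
printf 'ONE\ntwo\nthree\n' > ours     # our branch changed line 1
printf 'one\ntwo\nTHREE\n' > theirs   # their branch changed line 3
# Non-overlapping edits merge cleanly, with no manual adjustment:
diff3 -m ours base theirs             # prints ONE / two / THREE
```

With only the two endpoint files (what plain diff/patch sees), the tool cannot tell an edit from a revert; the ancestor is exactly the extra information that makes the merge safe.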
Re: [HACKERS] PostgreSQL Developer meeting minutes up
Tom Lane wrote: Any revision control system should be able to do better than diff/patch as these systems have more information available to them. Normal GIT uses the relatively common 3-way merge based upon the most recent common ancestor algorithm. Assuming there is a most recent common ancestor that isn't file creation, it will have a better chance of doing the right thing. And I still haven't seen any actual evidence. Could we have fewer undocumented assertions and more experimental evidence? Take Andrew's plperl patches and see if git does any better with them than plain patch does. (If it's not successful with that patch, it's pointless to try it on any bigger cases, I fear.) This comes down to theory vs. profiling, I suppose. I am a theory person - I run things in my head. To me, the concept of having more context to make the right decision, and an algorithm that takes advantage of this context to make the right decision, is simple and compelling on its own. Knowing the algorithms that are in use, including how it selects the most recent common ancestor, gives me confidence. You have the capabilities to test things for yourself. If you have any questions, try it out. No amount of discussions where others say "it works great" and you say "I don't believe you until you provide me with output" is going to get anywhere. I could set up a few scenarios or grab actual patches and show you particular success cases and particular failure cases, but will you really believe it? Because you shouldn't. For all you know, I picked the cases I knew would work and put them up against the cases I knew would fail. I've used ClearCase for around 10 years now, and with the exception of cherry picking, it has very strong and mature merge support. We rely on merges being safe while managing many projects much larger than PostgreSQL. Many of the projects have hundreds of users working on them at the same time. CVS is *unusable* in these environments.
Recently, however, in spite of investments into ClearCase, we are looking at GIT as providing *stronger* merge capabilities than ClearCase, specifically with regard to propagating changes from one release to another. I'm not going to pull up the last ten years of history and make it available to you. Nothing is going to prove this to you other than trying it out for yourself. People need to be burned by unreliable merge algorithms before they respect the value of a reliable merge algorithm. People need to experience reliable merging before they buy the product. If the theory doesn't work for you, you really are going to have to try it out for yourself. Or not. It doesn't matter to me. :-) In any case - you raised the question - I explained how it works - and you shot me down without any evidence of your own. It's up to you to try it out for yourself and decide if you are a believer. Cheers, mark P.S. I'm only a bit insulted by these threads. There are a lot of sceptical people in the crowd who until now have raised questions which only make it clear that these people have never worked with a capable SCM system on a major project before. I really shouldn't hold this against you, which is why I continue to try and provide the theory and background, so that when you do give it a chance, it will all start to make sense. You'll try it out - find it works great - and wonder "how does it do that?" Then, hopefully, you can go back to my post (or the many others who have tried to help out) and read how it works and say "ah hah! excellent!" -- Mark Mielke m...@mielke.cc
Re: [HACKERS] Managing multiple branches in git
Tom Lane wrote: I agree, .git as a subdirectory of the working directory doesn't make much sense to me. I wondered for a second about symlinking .git from several checkout directories to a common master, but AFAICT .git stores both the repository and status information about the current checkout, so that's not gonna work. In the one large project that I have a git tree for, .git seems to eat only about as much disk space as the checkout (so apparently the compression is pretty effective). So it wouldn't be totally impractical to have a separate repository for each branch, but it sure seems like an ugly and klugy way to do it. And we'd still end up with the same commit on different branches appearing entirely unrelated. I am curious about why an end user would really care? CVS and SVN both kept local workspace directories containing metadata. If anything, I find GIT the least intrusive of these three, as the .git is only in the top-level directory, whereas CVS and SVN like to pollute every directory. Assuming you don't keep binaries under source control, the .git containing all history is very often smaller than the pristine copy kept by CVS or SVN in their metadata directories, so space isn't really the issue. Maybe think of it more like a feature. GIT keeps a local cache of the entire repo, whereas SVN and CVS only keep a local cache of the commit you are based on. It's a feature that you can review history without network connectivity. Cheers, mark -- Mark Mielke m...@mielke.cc
Re: [HACKERS] Managing multiple branches in git
Alvaro Herrera wrote: Mark Mielke wrote: I am curious about why an end user would really care? CVS and SVN both kept local workspace directories containing metadata. If anything, I find GIT the least intrusive of these three, as the .git is only in the top-level directory, whereas CVS and SVN like to pollute every directory. That's not the problem. The problem is that it is kept in the same directory as the checked out copy. It would be a lot more usable if it was possible to store it elsewhere. I'm not following. CVS and SVN both kept such directories in the checked out copy. Recall the CVS/*,v files? As for storing it elsewhere - if you absolutely must, you can. There are --git-dir and --work-tree options to the git command, and GIT_DIR / GIT_WORK_TREE environment variables. I just don't understand why you care. If the CVS directories didn't bug you before, why does the single .git directory bug you now? I'm genuinely interested as I don't get it. :-) Cheers, mark -- Mark Mielke m...@mielke.cc
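A minimal sketch of keeping the repository entirely outside the work tree with those options (assumes git is on PATH; all paths, file names, and the identity are invented for illustration - this is the same trick commonly used for bare "dotfiles" repositories):

```shell
# Repository in one place, checked-out files in another; no .git in the work tree.
repo=$(mktemp -d)/project.git
work=$(mktemp -d)
git init -q --bare "$repo"      # the repository, with no attached working copy
cd "$work"
echo 'hello' > README
git --git-dir="$repo" --work-tree="$work" add README
git --git-dir="$repo" --work-tree="$work" \
    -c user.name=Demo -c user.email=demo@example.invalid \
    commit -q -m 'first commit'
ls -A "$work"                   # only README; the history lives under $repo
```

Setting GIT_DIR and GIT_WORK_TREE in the environment has the same effect as passing the options on every command.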
Re: [HACKERS] Managing multiple branches in git
Alvaro Herrera wrote: Mark Mielke wrote: I just don't understand why you care. If the CVS directories didn't bug you before, why does the single .git directory bug you now? I'm genuinely interested as I don't get it. :-) It doesn't. What bugs me is that the database (the pulled tree if you will) is stored in it. It has already been pointed out how to put it elsewhere, so no need to explain that. What *really* bugs me is that it's so difficult to have one pulled tree and create a bunch of checked out copies from that. (In the CVS world, I kept a single rsync'ed copy of the anoncvs repository, and I could do multiple cvs checkout copies from there using different branches.) You say database, but unless you assume you know what is in it, .git isn't really different from CVS/ or .svn. It's workspace metadata. Size might concern you, except that it's generally smaller than CVS/ or .svn. Content might concern you, until you realize that being able to look through history without accessing the network is a feature, not a problem. Time to prepare the workspace might concern you, but I haven't seen people time the difference between building a cvs checkout vs. a git clone. You talk about avoiding downloads by rsync'ing the CVS repository. You can do nearly the exact same thing in GIT: 1) Create a 'git clone --bare' that is kept up-to-date with 'git fetch'. This is your equivalent to an rsync'ed copy of the anoncvs repository. 2) Use 'git clone' from your local bare repo, or from the remote using the local bare repo as a reference. Either way - hard links when cloning from the local bare repo, or no duplicated objects at all when using it as a reference - your clone will be smaller than either a CVS or an SVN checkout. Mainly, I want to point out that the existence of .git is not a real problem - it's certainly no worse than before. Cheers, mark -- Mark Mielke m...@mielke.cc
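The two steps above can be sketched entirely locally (assumes git is on PATH; a real mirror would be created from the upstream URL with 'git clone --bare', which this sketch stubs out with an empty bare repository):

```shell
# 1) The local bare mirror - the stand-in for the rsync'ed anoncvs copy.
mirror=$(mktemp -d)/pg.git
git init -q --bare "$mirror"
# (against a real upstream: git clone --bare git://... "$mirror",
#  then periodically: git --git-dir="$mirror" fetch origin)

# 2) Cheap working clones taken from the mirror. A clone from a local
#    path hardlinks the object files rather than copying them.
work=$(mktemp -d)
git clone -q "$mirror" "$work/checkout" 2>/dev/null
ls -d "$work/checkout/.git"
```

Cloning with '--reference "$mirror" <remote-url>' instead borrows the mirror's objects without copying them at all, at the cost of the clone depending on the mirror staying intact.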
Re: [HACKERS] Managing multiple branches in git
Tom Lane wrote: Mark Mielke m...@mark.mielke.cc writes: I'm not following. CVS and SVN both kept such directories in the checked out copy. Recall the CVS/*,v files? I can't speak to SVN, but that is *not* how CVS does it. There's a small CVS/ directory, but the repository (with all the ,v files) is somewhere else. In particular I can have N different checked-out working copies without duplicating the repository. Ah - my mistake. It's been too long since I used CVS. CVS keeps the metadata describing what you have, but not the 'pristine copy' that SVN keeps. I just don't understand why you care. If the CVS directories didn't bug you before, why does the single .git directory bug you now? (1) size (ok, not a showstopper) (2) potential for error. Blowing away your working directory shouldn't result in loss of your entire project history. Perhaps you could describe the 'blowing away your working directory shouldn't result in loss of your entire project history' scenario? Yes, if that's the only copy you have - this is true. But you would normally have at least one other copy, and everybody else will also have a copy. Linus has joked about not needing backups, since he can recover his entire project history from places all over the Internet. As a for example, you could have a local repo that you publish from. Your work spaces could be from that local repo. Others pull from your local repo. For a small project I have, I keep the SVN / centralized model. People upload their changes with 'git push', and pick up updates with 'git pull' ('cvs update'). Whatever works best for you - but it's all available. Just because your workspace happens to have a copy of your entire project history doesn't necessarily mean that blowing away your working directory results in loss of your entire project history. Think multiple redundant copies. It's a feature - not a problem. :-) Cheers, mark -- Mark Mielke m...@mielke.cc
Re: [HACKERS] Managing multiple branches in git
Robert Haas wrote: On Tue, Jun 2, 2009 at 3:58 PM, Andres Freund and...@anarazel.de wrote: #Method 1 cd /../child1 git clone --reference /../master/ git://git.postgresql.org/whatever . cd /../child2 git clone --reference /../master/ git://git.postgresql.org/whatever . This way you can fetch from the git url without problem, but when an object is available locally it is not downloaded again. Yeah but now you have to push and pull commits between your numerous local working copies. Boo, hiss. Why? They are only references. They are effectively local caches. Why push to them at all? Push to the central repo. The local copies (caches) will pick up the changes eventually. If you really find .git getting larger and this is a problem (never been a problem for me), git gc can keep it to a minimum. #Method2 cd /../child3 git clone --shared /../postgresql/ child3 ... This way you only fetch from your pulled tree and never possibly from the upstream one. This is so unsafe it's not even worth talking about. See git-clone(1). It's not actually unsafe. There are just things to consider. Particularly, if history is ever removed from /../postgresql/ then child3 can become corrupt. There is an easy solution here - don't remove history from /../postgresql/. I use the above to save space in a binary-heavy (each workspace is 150 Mbytes+ without --shared) git repo among three designers. It works fine. We've never had a problem. That said, I wouldn't recommend it be used unless you do in fact understand the problem well. Cheers, mark -- Mark Mielke m...@mielke.cc
Re: [HACKERS] Managing multiple branches in git
Tom Lane wrote: Mark Mielke m...@mark.mielke.cc writes: As a for example, you could have a local repo that you publish from. Your work spaces could be from that local repo. Yes, exactly. How do I do that? My complaint is that git fails to provide a distinction between a repo and a workspace --- they seem to be totally tied together. Hehe... my 'for example' was a bit ambiguous. I was talking about one common model I've seen under git where people have private and public repos. The private repo is where you do your main work. Commits are published by pushing them to your public repo and making them generally available for others to pull from. Under this model, your private repo could clone the public repo using --shared to keep the working copy at minimal size. You could have multiple private repos if this is required for your workflow. Still, it becomes a multi-step process to commit: 1) commit to your private repo, 2) push to your public repo, 3) if you use a centralized repo, you need another process to push or pull the change from your public repo to the centralized repo. Another poster referenced git-new-workdir. It really does look like what you are looking for: http://blog.nuclearsquid.com/writings/git-new-workdir If it lives up to its advertisement, it gives you a new working copy with a new index, but linked directly to the shared repo rather than having its own repo. Cheers, mark -- Mark Mielke m...@mielke.cc
Re: [HACKERS] Managing multiple branches in git
Tom Lane wrote: I can't escape the feeling that we're missing something basic here. It's allegedly one of git's great strengths that it allows you to easily and quickly switch your attention among multiple development branches. Well, so it does, if you haven't got any derived files to rebuild. But rebuilding the Linux kernel is hardly a zero-cost operation, so how have Linus and co failed to notice this problem? There must be some trick they're using that I haven't heard about, or they'd not be nearly so pleased with git. If git has a real weakness, it's that it offers too many workflows, and this just results in confusion and everybody coming up with their own way to build the pyramid. :-) From reading this thread, there are things that you guys do that I am not familiar with. Not to say there aren't good reasons for what you do, but it means that I can only guess and throw suggestions at you, where you might be looking for an authoritative answer. :-) git has a git stash command that I've used to accomplish something like what you describe above. That is, I find myself in mid-work, I want to save the current working copy away and start fresh from a different context. Here is the beginning of the description for it: DESCRIPTION Use git stash when you want to record the current state of the working directory and the index, but want to go back to a clean working directory. The command saves your local modifications away and reverts the working directory to match the HEAD commit. I believe using a repository per release is a common workflow. If you access the Linux git repos, you'll find that Linus has a Linux 2.6 repo available. However, I think you are talking about using branches for far more than just the release stream you are working towards. Each of your sub-systems is in a different branch?
That seems a bit insane, and your email suggesting these be different directories in the working copy seemed a lot more sane to me, but then somebody else responded that this was a bad idea, so I pulled out of the 'is this a good idea or not?' debate. :-) Cheers, mark -- Mark Mielke m...@mielke.cc
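For what it's worth, the stash round trip described above looks like this in practice (a throwaway sketch, assuming git is on PATH; repository, file name, and identity are invented for illustration):

```shell
dir=$(mktemp -d)
cd "$dir"
git init -q .
git config user.name Demo                    # stash creates commit objects,
git config user.email demo@example.invalid   # so git needs an identity
echo 'original' > notes.txt
git add notes.txt
git commit -q -m 'initial'

echo 'half-finished edit' >> notes.txt
git stash push -q     # working copy reverts to the HEAD commit
cat notes.txt         # prints: original
git stash pop -q      # the in-progress edit comes back
cat notes.txt         # prints: original, then: half-finished edit
```

Between the push and the pop you are free to switch branches, rebuild, or start different work from a clean tree.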
Re: [HACKERS] QuickLZ compression algorithm (Re: Inclusion in the PostgreSQL backend for toasting rows)
Guaranteed compression of large data fields is the responsibility of the client. The database should feel free to compress behind the scenes if it turns out to be desirable, but an expectation that it compresses is wrong in my opinion. That said, I'm wondering why compression has to be a problem, or why 1 Mbyte is a reasonable compromise? I missed the original thread that led to 8.4. It seems to me that transparent file system compression doesn't have limits like 'files must be less than 1 Mbyte to be compressed'. They don't exhibit poor file system performance. I remember back in the 386/486 days that I would always DriveSpace compress everything, because hard disks were so slow then that DriveSpace would actually increase performance. The toast tables already give a sort of block-addressable scheme. Compression can be on a per-block or per-set-of-blocks basis, allowing for seeking into the block, or if compression doesn't seem to be working for the first few blocks, the later blocks can be stored uncompressed? Or is that too complicated compared to what we have now? :-) Cheers, mark -- Mark Mielke m...@mielke.cc
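The "give up on later blocks if compression isn't paying off" idea can be sketched with ordinary shell tools - gzip standing in for whatever algorithm the backend would use, and 8 kB standing in for a chunk size (both are assumptions for illustration only):

```shell
dir=$(mktemp -d)
head -c 8192 /dev/zero | tr '\0' 'a' > "$dir/text.chunk"   # compressible chunk
head -c 8192 /dev/urandom > "$dir/random.chunk"            # incompressible chunk

for chunk in "$dir"/*.chunk; do
    gzip -c "$chunk" > "$chunk.gz"
    if [ "$(wc -c < "$chunk.gz")" -lt "$(wc -c < "$chunk")" ]; then
        echo "$(basename "$chunk"): store compressed"
    else
        rm "$chunk.gz"        # compression made it bigger; store raw
        echo "$(basename "$chunk"): store raw"
    fi
done
```

Deciding per chunk keeps the datum block-addressable: a seek only has to decompress the chunk it lands in, not everything before it.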
Re: [HACKERS] QuickLZ compression algorithm (Re: Inclusion in the PostgreSQL backend for toasting rows)
Gregory Stark wrote: Mark Mielke m...@mark.mielke.cc writes: It seems to me that transparent file system compression doesn't have limits like files must be less than 1 Mbyte to be compressed. They don't exhibit poor file system performance. Well I imagine those implementations are more complex than toast is. I'm not sure what lessons we can learn from their behaviour directly. I remember back in the 386/486 days, that I would always DriveSpace compress everything, because hard disks were so slow then that DriveSpace would actually increase performance. Surely this depends on whether your machine was CPU-starved or disk-starved? Do you happen to recall which camp these anecdotal machines from 1980 fell in? I agree. I'm sure it was disk I/O starved - and maybe not just the disk. The motherboard might have contributed. :-) My production machine in 2008/2009 for my uses still seems I/O bound. The main database server I use is 2 x Intel Xeon 3.0 GHz (dual-core) = 4 cores, and the uptime load average for the whole system is currently 0.10. The database and web server use their own 4 drives with RAID 10 (main system is on two other drives). Yes, I could always upgrade to a fancy/larger RAID array, SAS, 15k RPM drives, etc., but if a PostgreSQL tweak were to give me 30% more performance at a 15% CPU cost... I think that would be a great alternative option. :-) Memory may also play a part. My server at home has 4 Mbytes of L2 cache and 4 Gbytes of RAM running with 5-5-5-18 DDR2 at 1000 MHz. At these speeds, my realized bandwidth for RAM is 6.0+ Gbyte/s. My L1/L2 operate at 10.0+ Gbyte/s. Compression doesn't run that fast, so at least for me, the benefit of having something in L1/L2 cache vs. RAM isn't great; however, my disks in the RAID 10 configuration only read/write at ~150 Mbyte/s sustained, and much less if seeking is required.
Compressing the data means 30% more data may fit into RAM, or a 30% increase in data read from disk, as I assume many compression algorithms can beat 150 Mbyte/s. Is my configuration typical? It's probably becoming more so. Certainly more common than the 10+ disk hardware RAID configurations. The toast tables already give a sort of block-addressable scheme. Compression can be on a per block or per set of blocks basis allowing for seek into the block, The current toast architecture is that we compress the whole datum, then store the datum either inline or using the same external blocking mechanism that we use when not compressing. So this doesn't fit at all. It does seem like an interesting idea to have toast chunks which are compressed individually. So each chunk could be, say, an 8kb chunk of plaintext and stored as whatever size it ends up being after compression. That would allow us to do random access into external chunks as well as allow overlaying the CPU costs of decompression with the I/O costs. It would get a lower compression ratio than compressing the whole object together, but we would have to experiment to see how big a problem that was. It would be pretty much rewriting the toast mechanism for external compressed data though. Currently the storage and the compression are handled separately. This would tie the two together in a separate code path. Hm, it occurs to me we could almost use the existing code. Just store it as a regular uncompressed external datum but allow the toaster to operate on the data column (which it's normally not allowed to) to compress it, but not store it externally. Yeah - sounds like it could be messy. or if compression doesn't seem to be working for the first few blocks, the later blocks can be stored uncompressed? Or is that too complicated compared to what we have now? :-) Actually we do that now; it was part of the same patch we're discussing. Cheers, mark -- Mark Mielke m...@mielke.cc
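As a back-of-envelope check on the claim that compression could pay for itself on this hardware (the 150 Mbyte/s is the sustained rate quoted above; the 30% space saving is an assumption, not a measurement):

```shell
# If data compresses to 70% of its original size, each logical byte costs
# only 0.70 physical bytes off the disk - so the effective logical read
# rate rises, provided decompression itself can keep up.
awk 'BEGIN {
    disk  = 150     # Mbyte/s sustained, as quoted for the RAID 10 array
    ratio = 0.70    # compressed size / original size (assumed 30% savings)
    printf "effective logical read rate ~ %.0f Mbyte/s\n", disk / ratio
}'
```

That is, the decompressor only needs to sustain roughly 214 Mbyte/s of output to keep the disk the bottleneck, which is modest next to the multi-Gbyte/s RAM bandwidth quoted above.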
Re: [HACKERS] Copyright update
Andrew Chernow wrote: Bruce Momjian wrote: Greg Stark wrote: Is that actually legal if we haven't modified the files? Or is the whole source tree considered one work? One work, I assume. I am not a lawyer, but if it's one work, why is there a notice in every source file? ISTM that if it were one work there would only have to be one notice. That there would only have to be one notice is correct. You do not need a notice in every file. You put a notice in every file as extra, unnecessary effort to make sure that people cannot possibly miss it. It is not a requirement for copyright that every file have a copyright comment on top. That it is in every source file is similar to putting extra parens around expressions or embedding documentation in an API. It does not indicate that the work is not a single work. It is simply making the terms more explicit and easily accessible. Most importantly, the *lack* of a copyright notice does not indicate that there are no copyright rights defined. If 10 files have a copyright notice, and the 11th file does not, this does not indicate that the 11th file has more or fewer copyright restrictions than the other 10 that are explicit. The implicit copyright may be 'All rights reserved' whereas the explicit copyright may say 'You may use this software for free provided that you do not hold the authors responsible for any damages caused by use of the software.' Which is more restrictive? Cheers, mark -- Mark Mielke m...@mielke.cc
Re: [HACKERS] Copyright update
Bruce Momjian wrote: Andrew Chernow wrote: I am not a lawyer, but if it's one work, why is there a notice in every source file? ISTM that if it were one work there would only have to be one notice. Because people often take source files and copy them for use in other projects. As per my previous message, although people do this, it is no safer to copy a file without an explicit copyright embedded within the file than to copy a file with one. The embedded copyright serves more as a warning for people who don't know better - to guilt them into thinking twice before doing whatever they are doing - than as an actual legal requirement for enforcement of copyright restrictions. Cheers, mark -- Mark Mielke m...@mielke.cc
Re: [HACKERS] Synchronous replication, reading WAL for sending
Fujii Masao wrote: - WALSender reads from WAL buffers and/or WAL files and sends the buffers to WALReceiver. In phase one, we may assume that WALSender can only read from WAL buffers and WAL files in the pg_xlog directory. Later on, this can be improved so that WALSender can temporarily restore archived files and read from those too. You mean that only walsender performs xlog streaming and copying from pg_xlog serially? I think that this would degrade the performance. And, I'm worried about the situation that the speed to generate xlog on the primary is higher than that to copy them to the standby. We might not be able to start xlog streaming forever. I've seen a few references to this. Somebody else mentioned how a single TCP/IP stream might not have the bandwidth to match changes to the database. TCP/IP streams do have a window size that adjusts with the load, and unless one gets into aggressive networking such as bittorrent, which arguably reduces performance of the entire network, why shouldn't one TCP/IP stream be enough? And if one TCP/IP stream isn't enough, doesn't this point to much larger problems that won't be solved by streaming it some other way over the network? As in, it doesn't matter what you do - your network pipe isn't big enough? Over the Internet from my house to a co-located box, I can reliably get 1.1+ Mbyte/s using a single TCP/IP connection. The network connection at the co-lo is 10 Mbit/s and my Internet connection to my house is also 10 Mbit/s. One TCP/IP connection seems pretty capable of streaming data to the full potential of the network... Also, I assume that most database loads have peaks and lows. Especially for very large updates, perhaps end-of-day processing, I see it as a guarantee that all of the standbys will fall more behind for a period (a few seconds to a minute?), but they will catch up shortly after the peak is over.
Cheers, mark -- Mark Mielke m...@mielke.cc
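A rough sanity check of those single-stream numbers (only the 10 Mbit/s link speed comes from the message; the overhead and round-trip figures are assumptions, and the window figure is just the bandwidth-delay product needed to keep the pipe full):

```shell
awk 'BEGIN {
    link     = 10e6 / 8   # 10 Mbit/s link, in bytes/s
    overhead = 0.05       # assumed TCP/IP + Ethernet framing overhead
    rtt      = 0.05       # assumed 50 ms round-trip time
    printf "usable bandwidth ~ %.2f Mbyte/s\n", link * (1 - overhead) / 1e6
    # bandwidth-delay product: the receive window needed to fill the link
    printf "window needed ~ %.1f kbytes\n", link * rtt / 1e3
}'
```

The usable-bandwidth estimate lands right around the observed 1.1+ Mbyte/s, and the window needed is well within what TCP window scaling provides on an ordinary connection - consistent with one stream saturating the link.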
Re: [HACKERS] Sync Rep: First Thoughts on Code
Simon Riggs wrote: You scare me that you see failover as sufficiently frequent that you are worried that being without one of the servers for an extra 60 seconds during a failover is a problem. And then say you're not going to add the feature after all. I really don't understand. If it's important, add the feature, the whole feature that is. If not, don't. My expectation is that most failovers are serious ones, that the primary system is down and not coming back very fast. Your worries seem to come from a scenario where the primary system is still up but Postgres bounces/crashes, we can diagnose the cause of the crash, decide the crashed server is safe and then wish to recommence operations on it again as quickly as possible, where seconds count in doing so. Are failovers going to be common? Why? Hi Simon: I agree with most of your criticism of the fail-over-only approach - but don't agree that failover frequency should really impact expectations for the failed system to return to service. I see soft fails (*not* serious) as potentially common - somewhere on the network, something went down or some packet was lost, and the system took a few too many seconds to respond. My expectation is that the system can quickly detect that the node is out of service and remove it from the pool, and then, when the situation is resolved (often automatically, outside of my control), the node can catch up and be put back into the pool automatically. Having to run some other process such as rsync seems unreliable, as we already have a mechanism for streaming the data. All that is missing is streaming from an earlier point in time to catch up efficiently and reliably. I think I'm talking more about the complete solution, though, which is in line with what you are saying? :-) Cheers, mark -- Mark Mielke m...@mielke.cc
Re: [HACKERS] Sync Rep: First Thoughts on Code
Good answers, Markus. Thanks. I've bought the thinking of several here that the user should have some control over what they expect (and what optimizations they are willing to accept as a good choice), but that commit should still be able to have a capped time limit. I can think of many of my own applications where I would choose one mode vs. another mode, even within the same application, depending on the operation itself. The most important requirement is that transactions are durable. It becomes convenient, though, to provide additional guarantees for some operation sequences. I still see the requirement for seat reservation, bank account, or stock trading as synchronizing using read-write locks before starting the select, rather than enforcing 'latest' on every select. For my own bank, when I do an online transaction, operations don't always immediately appear in my list of transactions. They appear to sometimes be batched, sometimes in near real time, and sometimes as part of some sort of day-end processing. For seat reservation, the time the seat layout is shown on the screen is not usually locked during a transaction. Between the time the travel agent brings up the seats on the plane, and the time they select the seat, the seat could be taken. What's important is that the reservation is durable, and that conflicts are not introduced. The commit must fail if another person has already chosen the seat. The commit does not need to wait until the reservation is pushed out to all systems before completing. The same is true of stock trading. However, it can be very convenient for commits to be immediately visible after the commit completes. This allows for lazier models, such as a web site that reloads the view on the reservations or recent trades and expects to see recent commits no matter which server it accesses, rather than taking into account that the commit succeeded when presenting the next view.
If I look at sites like Google - they take the opposite extreme. I can post a message, and it remembers that I posted the message and makes it immediately visible; however, I might not see other new messages in a thread until a minute or more later. So it looks like there is value to both ends of the spectrum, and while I feel the most value would be in providing a very fast system that scales near-linearly with the number of nodes in the system, even at the expense of immediately visible transactions from all servers, I can accept that sometimes the expectations are stricter, and I would appreciate seeing an option to let me choose based upon my requirements. Cheers, mark Markus Wanner wrote: Hi, Mark Mielke wrote: Where does the expectation come from? I find the seat reservation, bank account or stock trading examples pretty obvious WRT user expectations. Nonetheless, I've compiled some hints from the documentation and sources: Since in Read Committed mode each new command starts with a new snapshot that includes all transactions committed up to that instant [1]. This [SERIALIZABLE ISOLATION] level emulates serial transaction execution, as if transactions had been executed one after another, serially, rather than concurrently. [1]. (IMO this implies that a transaction sees changes from all preceding transactions). All changes made by the transaction become visible to others and are guaranteed to be durable if a crash occurs. [2]. (Agreed, it's not overly clear here when exactly the changes become visible. OTOH, there's no warning that another session doesn't immediately see committed transactions. Not sure where you got that from). I don't recall ever reading it in the documentation, and unless the session processes are contending over the integers (using some sort of synchronization primitive) in memory that represent the latest visible commit on every single select, I'm wondering how it is accomplished? See the transaction system's README [3].
It documents the process of snapshot taking and transaction isolation pretty well. Around line 226 it says: What we actually enforce is strict serialization of commits and rollbacks with snapshot-taking. (So the outcome of your experiment is no surprise at all). And a bit later: This rule is stronger than necessary for consistency, but is relatively simple to enforce, and it assists with some other issues as explained below. While this implies that an optimization is theoretically possible, I very much doubt it would be worth it (for a single node system). In a distributed system, things are a bit different. Network latency is an order of magnitude higher than memory latency (for IPC). So a similar optimization is very much worth it. However, the application (or the load balancer, or both) needs to know about this potential lag between nodes. And as you've outlined elsewhere, a limit for how much a single node may lag behind needs to be established. (As a side note
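The README rule quoted above - strict serialization of commits and rollbacks with snapshot-taking - can be sketched as a toy model (a Python stand-in of my own devising, not PostgreSQL code): a single lock orders commits against snapshot-taking, which is exactly why any snapshot taken after commit() returns must include that transaction.

```python
import threading

class Node:
    """Toy single-node store. One lock strictly serializes commits
    with snapshot-taking, mirroring the rule quoted from the README."""

    def __init__(self):
        self._lock = threading.Lock()
        self._committed = set()

    def commit(self, xid):
        # commit() only returns once the transaction is in the
        # committed set; no snapshot can be taken concurrently.
        with self._lock:
            self._committed.add(xid)

    def snapshot(self):
        # Taken under the same lock, so it sees every commit that
        # completed before this call began.
        with self._lock:
            return frozenset(self._committed)

node = Node()
node.commit(100)
assert 100 in node.snapshot()
```

The cost debated in the thread is visible here too: every commit and every snapshot contends on the same lock, which is the same scaling concern raised earlier for a 32-core machine.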
Re: [HACKERS] Sync Rep: First Thoughts on Code
Robert Haas wrote: On Sat, Dec 13, 2008 at 1:29 PM, Tom Lane t...@sss.pgh.pa.us wrote: We won't call it anything, because we never will or can implement that. See the theory of relativity: the notion of exactly simultaneous events OK, fine. I'll be more precise. I think we need to reserve the term synchronous replication for a system where transactions that begin on the standby after the transaction has committed on the master see the effects of the committed transaction.

Wouldn't this be serialized transactions? I'd like to see proof of some sort that PostgreSQL guarantees that the instant a 'commit' returns, any transactions already open with the appropriate transaction isolation level, or any new sessions, *will* see the results of the commit. I know that most of the time this happens - but what process synchronization steps occur to *guarantee* that this happens?

I just googled synchronous replication and read through the first page of hits. Most of them do not address the question of whether synchronous replication can be said to have completed when WAL has been received by the standby but not yet applied. One of the ones that does is: http://code.google.com/p/google-mysql-tools/wiki/SemiSyncReplicationDesign ...which refers to what we're proposing to call Synchronous Replication as Semi-Synchronous Replication (or 2-safe replication) specifically to distinguish it. The other is: http://www.cnds.jhu.edu/pub/papers/cnds-2002-4.pdf ...which doesn't specifically examine the issue but seems to take the opposite position, namely that the server on which the transaction is executed needs to wait only for one server to apply the changes to the database (the others need only to know that they need to commit it; they don't actually need to have done it).
However, that same paper refers to two-phase commit as a synchronous replication algorithm, and Wikipedia's discussion of two-phase commit: http://en.wikipedia.org/wiki/Two-phase_commit_protocol ...clearly implies that the transaction must be applied everywhere before it can be said to have committed. The second page of Google results is mostly a further discussion of the MySQL solution, which is mostly described as semi-synchronous replication. Simon Riggs said upthread that Oracle called this synchronous redo transport. That is obviously much closer to what we are doing than synchronous replication. Two phase commit doesn't imply that the transaction is guaranteed to be immediately visible. See my previous paragraph. Unless transactions are locked from starting until they are able to prove that they have the latest commit (a feat which I'm going to theorize as impossible - because the moment you wait for a commit, and you begin again, you really have no guarantee that another commit has not occurred in the mean time), I think it's clear that two phase commit guarantees that the commit has taken place, but does *not* guarantee anything about visibility. It might be a good bet - but guarantee? There is no such guarantee. Cheers, mark -- Mark Mielke m...@mielke.cc
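As a concrete reference point for the two-phase commit argument, here is a deliberately minimal sketch (a toy model of my own; real 2PC also requires durable logging and failure/recovery handling, which are omitted). It shows the shape of the guarantee: every replica votes in phase one before any replica is told to commit in phase two - and nothing in the protocol itself says when the change becomes visible to readers on each replica.

```python
class Replica:
    """Toy replica tracking prepared and applied transactions."""

    def __init__(self, name):
        self.name = name
        self.prepared = set()
        self.applied = set()

    def prepare(self, xid):
        # Phase 1: promise to commit xid if asked (vote yes).
        self.prepared.add(xid)
        return True

    def commit(self, xid):
        # Phase 2: actually apply the prepared transaction.
        self.prepared.discard(xid)
        self.applied.add(xid)

def two_phase_commit(xid, replicas):
    # Phase 1: all replicas must vote yes, otherwise we would abort.
    if not all(r.prepare(xid) for r in replicas):
        return False
    # Phase 2: everyone commits. The decision is now durable on all
    # replicas, but visibility to concurrent readers is a separate issue.
    for r in replicas:
        r.commit(xid)
    return True

replicas = [Replica("a"), Replica("b"), Replica("c")]
assert two_phase_commit(1, replicas)
assert all(1 in r.applied for r in replicas)
```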
Re: [HACKERS] Sync Rep: First Thoughts on Code
Simon Riggs wrote: I am truly lost to understand why the *name* synchronous replication causes so much discussion, yet nobody has discussed what they would actually like the software to *do* (this being a software discussion list...). AFAICS we can make the software behave like *any* of the definitions discussed so far. I think people have talked about 'like' in the context of user expectations. That is, there seems to exist a set of people (probably those who've never worked with a multi-replica solution before) who expect that once commit completes on one server, they can query any other master or slave and be guaranteed visibility of the transaction they just committed. These people may theoretically change their decision to not use Postgres-R, or at least change their approach to how they work with Postgres-R, if the name were in some way more intuitive to them in terms of what is actually being provided. Synchronous replication itself describes only the replication; it says nothing about visibility. So, to some degree, people are focusing on the wrong term as the problem. Even if it said asynchronous replication - not sure that I care either way - this doesn't improve the understanding for the casual user of what is happening behind the scenes. Neither synchronous nor asynchronous guarantees that the change will be immediately visible from other nodes after I type 'commit;'. Asynchronous might err on the side of not immediately visible, where synchronous might (incorrectly) imply immediate visibility, but it's not an accurate guarantee to provide. Synchronous does not guarantee visibility immediately after. Some indefinite but usually short time must normally pass from when my 'commit;' completes until the shared memory visible to my process sees the transaction.
Multiple replicas with network latency or reliability issues increase the theoretical minimum size of this window from something normally not encountered to something that would normally be encountered. The only way to guarantee visibility is to ensure that the new transaction is guaranteed to be visible from a shared memory perspective on every machine in the pool, and in every active backend process. If my 'commit;' is going to wait for this to occur, I think this forces every commit to make numerous network round trips to each machine in the pool; it forces each machine in the pool to be network accessible and responsive; it forces all commits to be serialized, in the sense that the slowest machine in the pool determines the time for my commit to complete; and I think it implies some sort of inter-process signalling, or at the very least CPU-level signalling about shared memory (in the case of multiple CPUs). People such as myself think that a visibility guarantee is unreasonable and certain to cause scalability or reliability problems. So, my 'like' is an efficient multi-master solution where, if I put 10 machines in the pool, I expect my normal query/commit loads to approach 10X the speed. My preference is scalability over guarantees that may be difficult to provide, and probably are not provided today even in a single server scenario. It is certainly far too early to say what the final exact behaviour will be, and there is no reason at all to pre-suppose that it need only be a single behaviour. I'm in favour of options, generally, but I would say that the distinction between some of these options is mostly very fine, and I strongly doubt whether people would use them if they existed. *But* I think we can add them at a later stage of development if requirements genuinely exist once all the benefits *and* costs are understood.
The above 'commit;' behaviour difference - whether it completes when the commit is permanent (it definitely will be applied for certain to all replicas - it just may take time to apply to all replicas), or when the commit has actually taken effect (two-phase commit on all replicas, with both phases having completed on all replicas - what happens if the second-phase commit fails on one or more servers?), or when the commit is guaranteed to be visible from all existing and new sessions (two-phase commit plus additional signalling required?) might be such an option. I'm doubtful, though - as the difference in implementation between the first and second is pretty significant. I'm curious about your suggestion to direct queries that need the latest snapshot to the 'primary'. I might have misunderstood it - but it seems that the expectation from some is that *all* sessions see the latest snapshot, so would this not imply that all sessions would be redirected to the 'primary'? I don't think it is reasonable myself, but I might be misunderstanding something... Cheers, mark -- Mark Mielke m...@mielke.cc -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Sync Rep: First Thoughts on Code
Mark Mielke wrote: Forget replication - even for the exact same server - I don't expect that if I commit from one session, I will be able to see the change immediately from my other session or a new session that I just opened. Perhaps it is often safe to rely on this, and it is useful for the database server to minimize the window during which the commit becomes visible to others, but I think it's a false expectation from the start that it absolutely will be immediately visible to another session. I'm thinking of situations where some part of the table is in cache. The only way the commit can communicate that the new transaction is available is by some form of communication between the processes or threads, or between the multiple CPUs on the machine. Do I want every commit to force each session to become fully in alignment before my commit completes? Does PostgreSQL make this guarantee today? I bet it doesn't if you look far enough into the guts. It might be very fast - I don't think it is infinitely fast.

FYI: I haven't been able to prove this. Multiple sessions running on my dual-core CPU seem to be able to see the latest commits before they begin executing. Am I wrong about this? Does PostgreSQL provide an intentional guarantee that a commit from one session that completes immediately followed by a query from another session will always find the commit effect visible (provided the transaction isolation level doesn't get in the way)? Or are the machine and algorithms just fast enough that by the time it executes the query (up to 1 ms later) the commit is always visible in practice? Cheers, mark -- Mark Mielke m...@mielke.cc -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Sync Rep: First Thoughts on Code
Heikki Linnakangas wrote: Mark Mielke wrote: FYI: I haven't been able to prove this. Multiple sessions running on my dual-core CPU seem to be able to see the latest commits before they begin executing. Am I wrong about this? Does PostgreSQL provide an intentional guarantee that a commit from one session that completes immediately followed by a query from another session will always find the commit effect visible (provided the transaction isolation level doesn't get in the way)? Yes. PostgreSQL does guarantee that, and I would expect any other DBMS to do the same. Where does the expectation come from? I don't recall ever reading it in the documentation, and unless the session processes are contending over the integers (using some sort of synchronization primitive) in memory that represent the latest visible commit on every single select, I'm wondering how it is accomplished. If they are contending over these integers, doesn't that represent a scaling limitation, in the sense that on a 32-core machine, they're going to be fighting with each other to get the latest version of these shared integers into the CPU for processing? Maybe it's such a small penalty that we don't care? :-) I was never instilled with the logic that 'commit in one session guarantees visibility of the effects in another session'. But, as I say above, I wasn't able to make PostgreSQL fail in this regard. So maybe I have no clue what I am talking about? :-) If you happen to know where the code or documentation makes this promise, feel free to point it out. I'd like to review the code. If you don't know - don't worry about it, I'll find it later... Cheers, mark -- Mark Mielke m...@mielke.cc -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Sync Rep: First Thoughts on Code
Greg Stark wrote: When the database says the data is committed it has to mean the data is really committed. Imagine if you looked at a bank account balance after withdrawing all the money and saw a balance which didn't reflect the withdrawal and allowed you to withdraw more money again... Within the same session - sure. From different sessions? PostgreSQL MVCC lets you see an older snapshot, although it does prefer to have the latest snapshot with each command. As for allowing you to withdraw more money again, I would expect some sort of locking SELECT ... FOR UPDATE; to be used. This lock then forces the two transactions to become serialized, and the second will either wait for the first to complete or fail. Any banking program that assumed it could SELECT to confirm a balance and then UPDATE to withdraw the money as separate instructions would be a bad banking program. To exploit it, I would just have to start both operations at the same time - they both SELECT, they both see I have money, they both give me the money and UPDATE, and I get double the money (although my balance would show a big negative value - but I'm already gone...). Database 101. When I asked does PostgreSQL guarantee this, I didn't mean hand-waving examples or hand-waving expectations. I meant a pointer into the code that has some comment that says we want to guarantee that a commit in one session will be immediately visible to other sessions, and that a later select issued in the other sessions will ALWAYS see the commit, whether 1 nanosecond later or 200 seconds later. Robert's expectation and yours seem like taking this guarantee for granted rather than it being justified with design intent and proof thus far. :-) Given my experiment to try and force it to fail, I can see why this would be taken for granted. Is this a real promise, though? Or just an unlikely scenario that never seems to be hit? To me, the question is relevant in terms of the expectations of a multi-replica solution.
We know people have the expectation. We know it can be convenient. Is the expectation valid in the first place? I've probably drawn this question out too long and should do my own research and report back... Sorry... :-) Cheers, mark -- Mark Mielke m...@mielke.cc -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
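The SELECT ... FOR UPDATE point above can be illustrated with an in-memory stand-in (a sketch, not SQL: a Python lock plays the role of the row lock). The first half interleaves the naive read-check-write pattern by hand, reproducing the double withdrawal; the second half re-checks the balance under the lock, so exactly one of two concurrent withdrawals succeeds.

```python
import threading

# The naive SELECT-then-UPDATE pattern, interleaved by hand:
# both clients read the balance before either one writes it back.
balance = 100
a_seen = balance              # client A: SELECT -> sees 100
b_seen = balance              # client B: SELECT -> also sees 100
balance = a_seen - 100        # client A: UPDATE -> balance 0
balance = b_seen - 100        # client B: UPDATE -> balance 0 again,
assert balance == 0           # ...yet 200 was handed out in total.

# With a lock standing in for SELECT ... FOR UPDATE, the check and
# the update happen atomically, serializing the two withdrawals.
balance = 100
granted = []
row_lock = threading.Lock()

def withdraw(amount):
    global balance
    with row_lock:            # second client waits, then re-checks
        if balance >= amount:
            balance -= amount
            granted.append(amount)

threads = [threading.Thread(target=withdraw, args=(100,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert granted == [100] and balance == 0   # exactly one succeeds
```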
Re: [HACKERS] Sync Rep: First Thoughts on Code
Markus Wanner wrote: Tom Lane wrote: We won't call it anything, because we never will or can implement that. See the theory of relativity: the notion of exactly simultaneous events at distinct locations isn't even well-defined That has never been the point of the discussion. It's rather about the question of whether changes from transactions are guaranteed to be visible on remote nodes immediately after commit acknowledgment. Whether or not this is guaranteed, in both cases the term synchronous replication is commonly used, which is causing confusion. Might it not be true that anybody unfamiliar would be confused and that this is a bit of a straw man? I don't think synchronous replication guarantees that it will be immediately visible. Even if it did push the change to the other machine, and the other machine had committed it, that doesn't guarantee that any reader sees it any more than if I commit to the same machine (no replication), I am guaranteed to see the change from another session. Synchronous replication only means that I can be assured that my change has been saved permanently by the time my commit completes. It doesn't mean anybody else can see my change or is guaranteed to see my change if they query from another session. If my application assumes that it can commit to one server, and then read back the commit from another server, and my application breaks as a result, it's because I didn't understand the problem. Even if PostgreSQL didn't use the word synchronous replication, I could still be confused. I need to understand the problem no matter what words are used. Cheers, mark -- Mark Mielke m...@mielke.cc
Re: [HACKERS] Sync Rep: First Thoughts on Code
in terminology? Does it make sense for Postgres-R (which looks excellent to me BTW, at least in principle) to be marketed differently, because a few users tie synchronous replication to serialized access? Because that's really what we're talking about - we're talking about transactions in all sessions being serialized between machines to provide less surprise to users who don't understand the complexity of having multiple replicas. Forget replication - even for the exact same server - I don't expect that if I commit from one session, I will be able to see the change immediately from my other session or a new session that I just opened. Perhaps it is often safe to rely on this, and it is useful for the database server to minimize the window during which the commit becomes visible to others, but I think it's a false expectation from the start that it absolutely will be immediately visible to another session. I'm thinking of situations where some part of the table is in cache. The only way the commit can communicate that the new transaction is available is by some form of communication between the processes or threads, or between the multiple CPUs on the machine. Do I want every commit to force each session to become fully in alignment before my commit completes? Does PostgreSQL make this guarantee today? I bet it doesn't if you look far enough into the guts. It might be very fast - I don't think it is infinitely fast. Cheers, mark -- Mark Mielke m...@mielke.cc
Re: [HACKERS] patch: Allow the UUID type to accept non-standard formats
Robert Haas wrote:

1) Reduced error checking.
2) The '-' is not the only character that people have used. ClearCase uses '.' and ':' as punctuation.
3) People already have the option of translating the UUID from their application to a standard format.
4) As you find below, and is probably possible to improve on, a fixed format can be parsed more efficiently.

Scenario 1. I have some standard format UUIDs and I want to parse them. This change doesn't bother me at all because if I'm parsing anywhere near enough UUIDs for it to matter, the speed of my CPU, disk, and memory subsystems will vastly outweigh the difference between the two implementations. I measured the difference between the two by running them both in a tight loop on a fixed string. I challenge anyone to produce a measurable performance distinction by issuing SQL queries. I doubt that it is possible.

Put a few changes of 2%-3% impact together and you get 10% or more. I'm not saying you are wrong, but I disagree that performance should be sacrificed for everybody without providing substantial benefit to everybody. The question is then: does relaxed UUID parsing provide substantial benefit to everybody?

Scenario 2. I have some non-standard format UUIDs and I want to parse them. This change helps me a lot, because I'm almost positive that calling regexp_replace() and then uuid_in() is going to be MUCH slower than just calling uuid_in(). And if I do that then my error checking will be REALLY weak, unless I write a custom PL function to make sure that dashes only occur where they're supposed to be, in which case it will be even slower.

You should know the non-standard format of the UUID, and your application should be doing the error checking. It might be slower for *you*, but *you* are the one with the special needs. That is, unless you are representing a significant portion of the population. What percentage are you representing?

Scenario 3. I only want standard-format UUIDs to be accepted into my database.
Any non-standard format UUIDs should be rejected at parse time. This change is pretty irritating, because now I have to use regexp matching or something to make sure I've got the right format, and it's going to be significantly slower. My suspicion is that scenario 2 is a lot more common than scenario 3. I prefer strict formats and early failures. I like that PostgreSQL refuses to truncate on insertion. If I have a special format, I'm more than willing to convert it from the special format to a standard format before doing INSERT/UPDATE. What percentage of people out there feel that they benefit from pedantic syntax checking? :-) I don't know. I don't know which implementation was used for the PostgreSQL core, but any hard coded constants would allow for the optimizer to generate instructions that can run in parallel, or that are better aligned to machine words. 2-3% slow down for what gain? It still doesn't handle all cases, and it's less able to check the format for correctness. This change is a long way from letting any old thing through as a UUID. I'm sure there are lots of crazy ways to write UUIDs, but everything I found with a quick Google search would be covered by this patch, so I think that's pretty good. A key point for me is that it's hard to imagine this patch accepting anything that was intended to be something other than a UUID. (I am sure someone will now write back and tell me about their favorite non-UUID thing that happens to have 32 hex digits with dashes for separators, but come on.) It's not that long. If you get ColdFusion support(?), somebody else will want the ':', and somebody else will want the '-'. Anyways - I only somewhat disagree. I remember the original discussions, and I remember agreeing with the points to keep PostgreSQL UUID support thin and rigid. It's valuable for it to be built-in to the database. It's not necessarily valuable for PostgreSQL to support every UUID version or every format. 
Supporting additional formats is a step in the direction of supporting every UUID format. Three months from now, somebody is going to propose allowing '-' or ':'. What should the answer be then? Cheers, mark -- Mark Mielke [EMAIL PROTECTED]
Re: [HACKERS] patch: Allow the UUID type to accept non-standard formats
Robert Haas wrote: While we could perhaps accept only those variant formats which we specifically know someone to be using, it seems likely that people will keep moving those pesky dashes around, and we'll likely end up continuing to add more formats and arguing about which ones are widely enough used to deserve being on the list. So my vote is - as long as they don't put a dash in the middle of a group of four (aka a byte), just let it go.

I somewhat disagree with supporting other formats. Reasons include:

1) Reduced error checking.
2) The '-' is not the only character that people have used. ClearCase uses '.' and ':' as punctuation.
3) People already have the option of translating the UUID from their application to a standard format.
4) As you find below, and is probably possible to improve on, a fixed format can be parsed more efficiently.

Somewhat to my surprise, this implementation appears to be about 2-3% slower than the one it replaces, as measured using a trivial test harness. I would have thought that eliminating a call to strlen() and an extra copy of the data would have actually picked up some speed, but it seems not. Any thoughts on the reason? In any case, I don't believe there's any possible use case where a 2-3% slowdown in uuid_to_string is actually perceptible to the user, since I had to call it 100 million times in a tight loop to measure it.

I don't know which implementation was used for the PostgreSQL core, but any hard-coded constants would allow the optimizer to generate instructions that can run in parallel, or that are better aligned to machine words. A 2-3% slowdown for what gain? It still doesn't handle all cases, and it's less able to check the format for correctness. Cheers, mark -- Mark Mielke [EMAIL PROTECTED] -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
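The error-checking trade-off in the list above can be made concrete with a sketch (Python stand-ins of my own for uuid_in(), not the actual C implementation; the accepted punctuation set '-', '.', ':' follows the ClearCase example). The relaxed parser necessarily checks less: once separators may appear anywhere, it can no longer verify that they sit in sensible positions.

```python
import re

# Strict: exactly the standard 8-4-4-4-12 layout.
STRICT = re.compile(r'^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-'
                    r'[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$')

def parse_strict(s):
    if not STRICT.match(s):
        raise ValueError('invalid UUID: %r' % s)
    return s.replace('-', '').lower()

def parse_relaxed(s, punctuation='-.:'):
    # Drop known punctuation wherever it appears, then demand 32 hex
    # digits. Note the weaker check: separator positions are ignored.
    digits = ''.join(c for c in s if c not in punctuation)
    if not re.match(r'^[0-9a-fA-F]{32}$', digits):
        raise ValueError('invalid UUID: %r' % s)
    return digits.lower()

std = 'a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11'
assert parse_strict(std) == parse_relaxed(std)

# ClearCase-style punctuation passes only the relaxed parser:
cc = 'a0eebc99.9c0b:4ef8.bb6d:6bb9bd380a11'
assert parse_relaxed(cc) == parse_strict(std)
try:
    parse_strict(cc)
    assert False, 'strict parser should reject this'
except ValueError:
    pass
```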
Re: [HACKERS] Block-level CRC checks
Aidan Van Dyk wrote: One possibility would be to double-buffer the write... i.e. as you calculate your CRC, you're doing it on a local copy of the block, which you hand to the OS to write... If you're touching the whole block of memory to CRC it, it isn't *ridiculously* more expensive to copy the memory somewhere else as you do it... Coming in to this late - so apologies if this makes no sense - but doesn't writev() provide the required capability? Also, what is the difference between the OS not writing the block at all, and writing the block but missing the checksum? This window seems to be small enough (most of the time being faster than the system can schedule the buffer to be dumped?) that the additional risk seems theoretical rather than real. Unless there is evidence that writev() performs poorly, I'd suggest that avoiding double-buffering by using writev() would be preferred. Cheers, mark -- Mark Mielke [EMAIL PROTECTED] -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
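A sketch of the gather-write idea using os.writev() from Python's standard library (POSIX-only, and simplified in that the CRC is appended as a trailer, whereas the block-CRC proposal would store it in the page header). The point is that the page buffer itself is handed to the kernel unchanged, with no second user-space copy.

```python
import os
import struct
import tempfile
import zlib

# Build a page and its CRC, then hand both to the kernel in one
# gather-write; the page is never copied to a second buffer.
block = os.urandom(8192)
crc = struct.pack('<I', zlib.crc32(block) & 0xffffffff)

fd, path = tempfile.mkstemp()
try:
    written = os.writev(fd, [block, crc])   # data + checksum trailer
    assert written == 8192 + 4
    os.close(fd)
    with open(path, 'rb') as f:
        data = f.read(8192)
        stored = f.read(4)
    # A reader recomputes the CRC and compares with the stored trailer.
    assert zlib.crc32(data) & 0xffffffff == struct.unpack('<I', stored)[0]
finally:
    os.unlink(path)
```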
Re: [HACKERS] Block-level CRC checks
Tom Lane wrote: Paul Schlie [EMAIL PROTECTED] writes: - yes, if you're willing to compute true CRC's as opposed to simpler checksums, which may be worth the price if in fact many/most data check failures are truly caused by single bit errors somewhere in the chain, FWIW, not one of the corrupted-data problems I've investigated has ever looked like a single-bit error. So the theoretical basis for using a CRC here seems pretty weak. I doubt we'd even consider automatic repair attempts anyway. Single-bit failures are probably the most common, but they are probably already handled by the hardware. I don't think I've ever seen a modern hard drive return a wrong bit - I get short reads first. By the time somebody notices a problem, it's probably more than a few bits that have accumulated. For example, if memory has a faulty cell in it, it will create a fault some percentage of the time it is accessed. One bit error easily turns into two, three, ... Then there is the fact that no hardware is perfect, and every single component in the computer has a chance, however small, of introducing bit errors... :-( Cheers, mark -- Mark Mielke [EMAIL PROTECTED]
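A small demonstration of the difference between a CRC and a simple additive checksum once errors are no longer single bits (using zlib.crc32 on an 8K block; the byte positions and values are arbitrary choices of mine). A compensating pair of byte errors - +1 in one byte, -1 in another - leaves a byte-sum unchanged but still changes the CRC:

```python
import zlib

block = bytearray(b'\x00' * 8192)
block[100] = 0x41
base_crc = zlib.crc32(bytes(block))
base_sum = sum(block) % 256          # naive additive checksum

# Compensating errors: +1 in one byte, -1 in another.
corrupt = bytearray(block)
corrupt[100] = (corrupt[100] + 1) % 256
corrupt[200] = (corrupt[200] - 1) % 256
assert sum(corrupt) % 256 == base_sum            # the sum misses it
assert zlib.crc32(bytes(corrupt)) != base_crc    # the CRC catches it

# A single-bit flip, which both schemes would catch:
flipped = bytearray(block)
flipped[0] ^= 0x01
assert zlib.crc32(bytes(flipped)) != base_crc
assert sum(flipped) % 256 != base_sum
```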
Re: [HACKERS] Ad-hoc table type?
Not that I'm agreeing with the direction, but just as a thinking experiment:

Tom Lane wrote: [EMAIL PROTECTED] writes: Being able to insert arbitrary named values, and extracting them similarly, IMHO works better and more naturally than some external aggregate system built on a column. I know it is a little outside the box thinking, what do you think? I'm failing to see the point. Allowing columns to spring into existence without any forethought seems to me to be all minuses and no pluses worth mentioning.

* What if the column name is just a typo? If it's a field in a data structure from a language such as Java, it's not a typo.

* What datatype should it have? (Always varchar is just lame.) SQLite always uses varchar and it doesn't seem to be a problem. For simpler numbers like 0, the text form can be more compact, and the database may be portable across different hardware architectures.

* Should it have an index? If so, should it be unique? It might be cool for indexes to automatically appear as they become beneficial (and removed as they become problematic). Unique is a constraint which should be considered separately from whether it should be an index or not. I don't know if it would be useful or not.

* If you keep doing this, you'll soon find yourself reading out unbelievably wide tables (lots of columns), which won't be especially easy or efficient to process on either the backend or the client side. Plus you might run into the max-columns-per-tuple limit. Introduce variable field-order for tuples? Only provide values if non-null? :-)

If you've expended enough thought to be sure that the column is not just a typo, ISTM that you can afford to enter an ALTER TABLE ADD COLUMN command to tell the database the results of your genius. I do see the point that switching from member of an hstore column to real database column is pretty painful, but I don't see that allowing columns to spring into existence solves that in any meaningful way.
Is there some other way we could address such conversions? BTW, I think it is (or should be) possible to create an index on hstore->'mycol', so at least one of the reasons why you should *need* to switch to a real database column seems bogus. I find the Oracle nested table and data structure support enticing, although I do not have experience with it. It seems like it might be a more mature implementation of hstore? If hstore had everything that was required in terms of performance or flexibility, we wouldn't need fixed columns at all? But yes - I tend to agree that the object persistence layer can be hidden away behind something like the Java object persistence model, automatically doing alter table or providing a configured mapping from a description file. This isn't a problem that needs to be solved at the database layer. Cheers, mark -- Mark Mielke [EMAIL PROTECTED]
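The index-on-hstore idea can be sketched as a toy expression index (a model of my own, loosely analogous to an expression index such as CREATE INDEX ON t ((attrs->'color')) over an hstore column): rows carry ad-hoc keys, and lookups on one key consult a maintained index rather than requiring the key to be promoted to a real column.

```python
from collections import defaultdict

rows = []                         # each "row" is a dict of ad-hoc keys
color_index = defaultdict(list)   # toy expression index on attrs['color']

def insert(attrs):
    rows.append(attrs)
    # Maintain the index on insert, as the database would.
    color_index[attrs.get('color')].append(len(rows) - 1)

insert({'name': 'widget', 'color': 'red'})
insert({'name': 'gadget', 'color': 'blue'})
insert({'name': 'gizmo'})         # no 'color' key at all; nothing breaks

# Indexed lookup instead of scanning every row:
assert [rows[i]['name'] for i in color_index['red']] == ['widget']
```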
Re: [HACKERS] PostgreSQL future ideas
Chris Browne wrote: [EMAIL PROTECTED] (Jonah H. Harris) writes: Having done quite a bit of internals work with SAP DB (which is an amalgamation of C, C++, and Pascal), I completely agree. The entire system, if possible, should be in a single language. Note that this actually *isn't* possible; PostgreSQL is implemented in a number of languages already:

a) C, obviously
b) m4 and some autoconf macrology
c) GNU make
d) There's some awk
e) Shell script
f) Flex
g) Bison

And I'm not sure that's all there is :-).

Agreed with Andrew's point that continuing this thread is pointless - but not being very pointed, I'd point out that the various integrations with PL/Perl, PL/Python, PL/Tcl, and PL/pgSQL mean that PostgreSQL is both intimate with the bindings between C and the language, and knowledgeable about the languages themselves. So, all one language is indeed unrealistic. But this thread has split. The first question was whether PostgreSQL should be re-written in C++ or something else, where the answer seems to be almost a universal no. The second question is whether PostgreSQL can be extended with pluggable languages, for which I think the answer is already a yes. If some parts of PostgreSQL are not performance bottlenecks, and they are extremely complicated to write in C, and very easy to write in something else common and simple (I've never used Lua myself?), I imagine it would be acceptable to the community. Cheers, mark -- Mark Mielke [EMAIL PROTECTED]
Re: [HACKERS] PostgreSQL future ideas
Gevik Babakhani wrote: The advantage of C++ is that it would reduce a lot of OO code written in C in PostgreSQL, but it is a big effort for a small gain. It will increase the number of bugs. Do not forget also that a C++ compiler is not so common (or so good) on different platforms. If somebody is interested in that, yes, but as a fork (PostgreSQL++ :-). Reducing OO code that is written in C is one of my major interests. After some investigating myself, it appears that having the codebase fully rewritten in C++ will have an impact on the performance. So I guess such an effort will result in the code being more C++ish rather than fully OO, being a mixture in C with some OO taste.

Not sure what reduce means here. Is the following really a *worthwhile* reduction?

Class* object = Class_new(...);
Class_method(object, ...);
Class_destroy(object);

Compared to:

Class *object = new Class(...);
object->method(...);
delete object;

Yes, this can sometimes be abbreviated by using stack-based objects:

Class object(...);
object.method(...);

Though, this limits capabilities in terms of automatic memory management, whether passing pointers to the object around, or using a memory area that is cleaned up as a whole in bulk once it is no longer required. STL can help, but it can also hinder. I'm not convinced that a C++ PostgreSQL would be that much smaller, either in terms of source lines of code or in terms of resulting binary size. Also, it may not run faster. If the method calls are virtual, for instance, and derived classes are used, each method call becomes one more level of indirection.

Better idea is to start to use C99 in PostgreSQL ;-). I have not investigated this yet. But I am very interested to know what the advantages would be to upgrade the code to C99 standards. The code might look a little bit cleaner, but other than that, I don't see it running faster or being significantly smaller. Cheers, mark -- Mark Mielke [EMAIL PROTECTED]
Re: [HACKERS] PostgreSQL future ideas
Gevik Babakhani wrote: I think the better question about all of this is: What is the problem we are trying to solve? Providing solutions that are looking for problems doesn't help us. Sincerely,

Perhaps the current codebase and design in C will serve us for years and years to come. In fact there is no doubt about that, and switching to an OO design is no easy task. But times change and technologies evolve. So should any software solution that hopes to continue and compete with other competitors of the same kind. Procedural programming languages like C may have been the languages of choice for many years, but they gradually lose their developer audience for exactly that reason. I am afraid PG is no exception here.

A major problem I have with this suggestion is the assumption that PostgreSQL would indeed be equivalent or better re-written in another language. All PostgreSQL benchmarking and design decisions have been based upon measuring the performance of PostgreSQL written in C for common platforms. At its most basic, if you were to do a strict 1:1 translation of PostgreSQL from C to Java, I feel confident in guaranteeing that you would see a 10 times or more drop in performance. Why? Because what is fast in Java is not the same as what is fast in C. The design decisions would all need to be revisited, and the effect would be exactly as already suggested - an immature design competing against other mature designs.

From C to C++ is a smaller leap only in the sense that pieces of PostgreSQL could be migrated one at a time. The result is still that an OO-modelled PostgreSQL would be significantly different from a procedurally-modelled PostgreSQL, and you would always be facing the compromise of: 1) Should we re-write this part to be OO? Or 2) should we leave it alone for now (and then, why bother using C++?).
Somebody working on a thesis or with thousands of hours of spare time and no inclination to work on any other part, might prove that many of the PostgreSQL technologies port well to another language - but it is far more likely that the result will be a bust. I'd rather core developer effort was spent doing what they are doing today. Cheers, mark -- Mark Mielke [EMAIL PROTECTED]
Re: [HACKERS] Base64 decode/encode performance
Marko Kreen wrote: (Note: the b64encode there reads 3 chars at a time, b64decode an int32 at a time.) There are 2 killer problems: - decode does not seem to handle architectures that segfault on unaligned int32 accesses.

Out of curiosity - does this problem exist on any platform to which PostgreSQL is currently ported and supported?

Cheers, mark -- Mark Mielke [EMAIL PROTECTED] -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
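For what it's worth, the usual portable workaround for the unaligned-access problem is to funnel the read through memcpy() into an aligned local. This is just a sketch of the idiom, not Marko's patch: compilers emit a single load on architectures where unaligned access is legal, and byte loads on those (e.g. classic SPARC, some ARM cores) that trap:

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit word from a possibly unaligned pointer without ever
 * dereferencing it as a uint32_t, which is what can trap. */
uint32_t read_u32_unaligned(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);  /* byte-wise copy into an aligned local */
    return v;
}
```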
Re: [HACKERS] Fwd: Proposal - UUID data type
First - please stop copying this list - this is not the "convince Jerry to include UUID in MySQL" mailing list. Really - I don't care what he thinks. But, on the subjects themselves and how they apply to *PostgreSQL*:

Non-standard features just force people to stick with that one product. In the long run, the only people who benefit are the product developers.

I chose PostgreSQL over MySQL because it provided numerous features - both standard and non-standard - that I needed on the day I made my decision. I don't care about the long run as a user. One might as well say 90% of the world is wrong for using Microsoft products, because it locks one into Microsoft. One can say this - and people do say this - but none of it changes the fact that 90% of the world is relatively happy with their choice. They voted with their dollars. All decisions should be made on a cost-benefit analysis - they should not be based on some arbitrary creed like "I will not choose a solution that locks me in."

Additionally - in the context of MySQL - the main reason I chose PostgreSQL over MySQL is that it provided things like CREATE VIEW, which MySQL did not at the time. People such as Jerry can pretend that standards guarantee that a feature is in all products, but it seems quite clear that just because something is a standard does NOT mean it is implemented the same everywhere, or even at all. At the time I chose PostgreSQL it was my opinion that PostgreSQL was far more standards-compliant than MySQL was going to be for at least a few years. I am glad I came to the correct conclusion. MySQL implemented ACID as an afterthought. I mean - come on.

This is incorrect. UUID at 16 bytes is already long in terms of being used as a primary index. In an 8K page, one can fit only 512 UUIDs (ignoring the requirement for headers) - if it were stored as 32 bytes - or 36 bytes, or 40 bytes (with punctuation) - it would be at most 256 UUIDs per page.
For a join table joining one set of UUIDs to another, that's 256 rows per page vs 128. Doubling the size of an index row roughly doubles the time to look up the value.

Incorrect. Doubling the size of the index has very little effect on how long it takes to look up a value. Intelligent databases use a binary search, so doubling the size only means one additional comparison need be done. And heavily used indexes are generally cached in memory anyway.

Wrong. A binary search that must read double the number of pages, and compare double the number of bytes, will take double the amount of time. There are factors that will reduce this - such as assuming that most of the pages are in memory or cache memory, so that the time to read a page is zero and only the byte comparisons remain - but at that point the majority of the time is spent comparing bytes, and the claim is still wrong. If we also account for the fact that a binary UUID is compared with a possibly inlined memcmp(), versus treating it as a variable-sized string that is much harder to compare inline (double the number of operations), it's pretty clear that the person who would make such a statement is wrong.

As another poster wrote - why not double the size of all other data structures too? It costs nothing, right? Why does MySQL have 3-byte integer support if they truly believe that saving 1 byte in 4 doesn't result in a savings for keys?

Cheers, mark -- Mark Mielke [EMAIL PROTECTED]
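The per-page counts in the argument above come out of simple division. A minimal sketch of that arithmetic, deliberately ignoring page headers and per-tuple overhead (so these are upper bounds; real counts are lower):

```c
/* How many fixed-size entries fit in an 8K page, ignoring all
 * page-header and per-tuple overhead. */
enum { PAGE_SIZE = 8192 };

int entries_per_page(int entry_size)
{
    return PAGE_SIZE / entry_size;   /* integer division: floor */
}
```

With 16-byte binary UUIDs that gives 512 per page; a 32-byte hex encoding halves it; and a join row holding two UUIDs halves it again, matching the 256 vs 128 figures in the post.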
Re: [HACKERS] Fwd: Proposal - UUID data type
Kless wrote: Yes, they must be managed by the language. Which is why it should be part of the standard. That way, changing databases does not require changing code.

You are correct that putting widely used features into a standard that is implemented by everyone is good. This does not extend to the conclusion that one should never add a feature until it is standard. Look at any successful software product and see how it usually leads the standard rather than follows it. People only tend to make standards once they realize things are getting out of control, which is long after the products are in use.

In PostgreSQL they're stored as 16 binary bytes [2], and the core database does not include any function for generating UUIDs

Yep, which in the grand scheme of things probably makes zero difference. The difference between 16 and 32 bytes in any single row is minuscule.

This is incorrect. UUID at 16 bytes is already long in terms of being used as a primary index. In an 8K page, one can fit only 512 UUIDs (ignoring the requirement for headers) - if it were stored as 32 bytes - or 36 bytes, or 40 bytes (with punctuation) - it would be at most 256 UUIDs per page. For a join table joining one set of UUIDs to another, that's 256 vs 128. Doubling the size of an index row roughly doubles the time to look up the value.

I am not in favor of adding more database-specific types to ANY database - and I think PostGres doing it was a mistake.

As somebody who wrote his own module to do UUIDs when I needed them in PostgreSQL 8.0, I don't agree. Just as you think defining it in a standard is better than each vendor doing it their own way, I think doing it in one product is better than each user of the product doing it their own way.

If there is a demand for it, then it should be added to the SQL standard. That is the correct way to propose a change. That's why there are standards.

Provide a real example of any similar product doing this.
Exactly which enhancement to a standard was defined without even a prototype first existing in a product that purports to implement the standard? I'm sure one or two examples must exist, but I cannot think of any. Every enhancement I can think of that eventually made it into a standard was first implemented in a popular product, and then demanded as a standard to be applied to all other products.

Cheers, mark -- Mark Mielke [EMAIL PROTECTED]
Re: [HACKERS] Fwd: Proposal - UUID data type
Kless wrote: I write here the answer of Jerry Stuckle [1] because it looks interesting and logical enough to me.

Jerry's answer isn't a real answer - and we don't care what MySQL does or does not do. PostgreSQL developers are not going to invest time into helping you get a feature into MySQL - if this is what you are trying to do, please stop. MySQL didn't implement SQL-standard views until what - MySQL 4 or 5? Obviously standards compliance is not their goal either.

In Open Source / Free Software, the free contributions come from people with itches that they scratched. In a company like MySQL, it is more about business value, or somewhere in between. I was a MySQL 3.x/4.x user until I learned PostgreSQL, and I have no intention of going back. They have so many incorrect assumptions built into their system that I chose to switch databases instead of arguing with them. It's not worth my time, and I don't intend to go back. So, I will not be helping you get UUID into MySQL, because I just don't care about MySQL...

Cheers, mark

[1] http://groups.google.com/group/comp.databases.mysql/browse_thread/thread/89557609239a995e

--- Quite frankly, I don't care that PostGres has user-defined types. They restrict you to a single database, when others might be better for other reasons. And yes, I think other things should have been proposed to the SQL standards committee. It doesn't take that long to get a good proposal into the standards. No, it isn't immediate. But if there is a case to be made for it, then the committee will act. Then all databases get the feature, eventually. As I said. Do it the right way. Submit your proposal. If you have a case, it will be added to the SQL standard. If not, then it's not that important. ---

-- Mark Mielke [EMAIL PROTECTED]
Re: [HACKERS] Fwd: Proposal - UUID data type
Gregory Stark wrote: Mark Mielke [EMAIL PROTECTED] writes: I'm sure one or two examples must exist, but I cannot think of any. Every enhancement I can think of that eventually made it into a standard, was first implemented within a popular product, and then demanded as a standard to be applied to all other products. C99? SMTP? NTP? It tends to be important for network protocols since there's no gain in having non-interoperable protocols.

For C99 - GCC had most of the C99 features years before C99 was finalized. There are now some incompatibilities that need to be dealt with.

For SMTP and NTP, I think these protocols are just so old that people don't realize how much they have evolved, and how many products existed. I wasn't in the know at the time they were written (I was either a baby or in grade school), but I bet either: 1) they were written before the standard existed at all (not really an enhancement), or 2) they followed the prototype as it was implemented. There have been many extensions to SMTP that I have been aware of, including support for SSL, that I doubt were in the standard first. The RFC is a request for comments. The STD process came a lot later.

If we grab a phrase from RFC 1305 for NTP - "In Version 3 a new algorithm to combine the offsets of a number of peer time servers is presented in Appendix F. This algorithm is modelled on those used by national standards laboratories to combine the weighted offsets from a number of standard clocks to construct a synthetic laboratory timescale more accurate than that of any clock separately." - this seems pretty clear that the standard was updated based upon existing implementation.

To some degree, except for the simplest of designs, it is almost bad to write down what WILL be done without having experience, or a prototype from which to base one's conclusions. Ivory tower stuff. The purpose of a standard is to have one common way that things are done - hopefully the best way - not just the only way that was considered.
:-) Cheers, mark -- Mark Mielke [EMAIL PROTECTED]
Re: [HACKERS] UUID - Data type inefficient
Kless wrote: The new data type, UUID, is stored as a string -char(16)-:

    struct pg_uuid_t { unsigned char data[UUID_LEN]; };
    #define UUID_LEN 16

but this is very inefficient, as you can read here [1]. The ideal would be to use bit(128), but today that isn't possible. One possible solution would be to create a structure with 2 fields, each one bit(64). [1] http://www.mysqlperformanceblog.com/2007/03/13/to-uuid-or-not-to-uuid/

That's a general page about UUIDs vs serial integers. What is the complaint? Do you have evidence that it would be noticeably faster as two 64-bit fields? Note that a UUID is broken into several non-64-bit elements, and managing it as bytes, as 64-bit integers, or as a union with the bit lengths specified are probably all efficient or inefficient depending on the operation being performed. The hope should be that the optimizer will generate similar best code for each.

Cheers, mark -- Mark Mielke [EMAIL PROTECTED]
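To make the point concrete: the byte-array layout (as in the quoted pg_uuid_t) and a two-by-64-bit layout occupy exactly the same 16 bytes, so the declaration alone does not make one faster. A hypothetical sketch (the type names here are invented, not PostgreSQL's):

```c
#include <stdint.h>
#include <string.h>

#define UUID_LEN 16

/* Two alternative layouts for the same 128 bits. */
typedef struct { unsigned char data[UUID_LEN]; } uuid_bytes;
typedef struct { uint64_t hi, lo; } uuid_words;

/* Equality works identically on either layout: compare the 16 bytes.
 * A fixed-length memcmp like this is easily inlined by the compiler. */
int uuid_eq(const void *a, const void *b)
{
    return memcmp(a, b, UUID_LEN) == 0;
}
```

Any real speed difference would come from how individual operations (hashing, comparison, formatting) compile for each layout, which is the post's point about the optimizer.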
Re: [HACKERS] UUID - Data type inefficient
Mark Mielke wrote: Kless wrote: The new data type, UUID, is stored as a string -char(16)-: struct pg_uuid_t { unsigned char data[UUID_LEN]; }; #define UUID_LEN 16 What is the complaint? Do you have evidence that it would be noticeably faster as two 64-bits? Note that a UUID is broken into several non-64 bit elements, and managing it as bytes or 64-bit integers, or as a union with the bit-lengths specified, are probably all efficient or inefficient depending on the operation being performed. The hope should be that the optimizer will generate similar best code for each. I didn't notice that he put 16. Now I'm looking at uuid.c in PostgreSQL 8.3.3 and I see that it does use 16, and the struct pg_uuid_t is length 16. I find myself confused now - why does PostgreSQL define UUID_LEN as 16? I will investigate if I have time tonight. There MUST be some mistake or misunderstanding. 128-bit numbers should be stored as 8 bytes, not 16. Cheers, mark -- Mark Mielke [EMAIL PROTECTED]
Re: [HACKERS] UUID - Data type inefficient
Mark Mielke wrote: I didn't notice that he put 16. Now I'm looking at uuid.c in PostgreSQL 8.3.3 and I see that it does use 16, and the struct pg_uuid_t is length 16. I find myself confused now - why does PostgreSQL define UUID_LEN as 16? I will investigate if I have time tonight. There MUST be some mistake or misunderstanding. 128-bit numbers should be stored as 8 bytes, not 16. G Kless you've confused me. 32-bit numbers = 4 bytes, 64-bit numbers = 8 bytes, 128-bit numbers = 16 bytes. You are out to lunch and you dragged me with you. Did we have beer at least? :-) Cheers, mark -- Mark Mielke [EMAIL PROTECTED]
Re: [HACKERS] A Windows x64 port of PostgreSQL
would have to use a union of intptr_t and intmax_t. Or, PostgreSQL will choose to not support some platforms. Windows 64 seems as if it may continue to be as popular as Windows 32, and should probably be supported. Cheers, mark -- Mark Mielke [EMAIL PROTECTED]
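A minimal sketch of the union idea (the names are hypothetical, not the actual Datum definition): on LLP64 platforms such as Win64, long is 32 bits while pointers are 64 bits, so a slot defined as unsigned long cannot hold a pointer, whereas a union of intptr_t and intmax_t is wide enough for both cases by definition:

```c
#include <stdint.h>

/* Hypothetical Datum-like slot: big enough for any object pointer
 * (intptr_t round-trips void * by the C standard) and for the widest
 * standard integer type (intmax_t), even where long is only 32 bits. */
typedef union {
    intptr_t ptr_val;
    intmax_t int_val;
} datum_slot;
```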