Re: [HACKERS] contrib/pg_buffercache
Andrew Dunstan wrote:
> It fixes the build error on Windows - haven't tried because I don't have time, but I know it won't work on Cygwin, because WIN32 isn't (usually) defined on Cygwin - see previous almost endless discussions.

Yes - I recall that discussion a while ago. This patch should sort the issue.

One question: should I be using defined(__MINGW32__) as opposed to defined(WIN32)? I figured I didn't, as in this case it is not necessary to distinguish between native and Cygwin.

regards

Mark

*** pg_buffercache_pages.h.orig	Thu Mar 17 10:12:20 2005
--- pg_buffercache_pages.h	Thu Mar 17 13:44:45 2005
***************
*** 15,18 ****
--- 15,24 ----

  extern Datum pg_buffercache_pages(PG_FUNCTION_ARGS);

+ /* A little hackery for Windows and Cygwin */
+ #if defined (WIN32) || defined (__CYGWIN__)
+ extern DLLIMPORT BufferDesc *BufferDescriptors;
+ extern DLLIMPORT volatile uint32 InterruptHoldoffCount;
+ #endif
+
  #endif   /* PG_BUFFERCACHE_PAGES_H */
Re: [HACKERS] contrib/pg_buffercache
Andrew Dunstan wrote:
> Mark Kirkwood wrote:
>> Andrew Dunstan wrote:
>>> It fixes the build error on Windows - haven't tried because I don't have time, but I know it won't work on Cygwin, because WIN32 isn't (usually) defined on Cygwin - see previous almost endless discussions.
>>
>> Yes - I recall that discussion a while ago. This patch should sort the issue.
>>
>> One question: should I be using defined(__MINGW32__) as opposed to defined(WIN32)? I figured I didn't, as in this case it is not necessary to distinguish between native and Cygwin.
>
> You figured correctly.

Oh, and thanks to your efforts with the automated build system this gets picked up now instead of lurking till after release - great work!

cheers

Mark
Re: [HACKERS] contrib/pg_buffercache
Andrew Dunstan wrote:
> ... is apparently broken for Windows and Cygwin. See for example
> http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=loris&dt=2005-03-16%2001:55:33
>
> cheers

hmmm - never tried to compile it on win32! I am getting HTTP 502 from the above URL, so I will have a go at building on win32 tomorrow.

cheers

Mark
Re: [HACKERS] We are not following the spec for HAVING without GROUP
Tom Lane wrote:
> Would those of you with access to other DBMSes try this:
>
>   create table tab (col integer);
>   select 1 from tab having 1=0;
>   select 1 from tab having 1=1;
>   insert into tab values(1);
>   insert into tab values(2);
>   select 1 from tab having 1=0;
>   select 1 from tab having 1=1;
>
> I claim that a SQL-conformant database will return 0, 1, 0, and 1 rows from the 4 selects --- that is, the contents of tab make no difference at all. (MySQL returns 0, 0, 0, and 2 rows, so they are definitely copying our mistake...)

Firebird 1.5.1, FreeBSD 5.3:

Database: test
SQL> drop table tab;
SQL> create table tab (col integer);
SQL> select 1 from tab having 1=0;
SQL> select 1 from tab having 1=1;

1

SQL> insert into tab values(1);
SQL> insert into tab values(2);
SQL> select 1 from tab having 1=0;
SQL> select 1 from tab having 1=1;

1

SQL>
Re: [HACKERS] Data loss, vacuum, transaction wrap-around
Tom Lane wrote:
> The question is whether we are willing to back-patch a fairly large amount of not-very-well-tested code into 8.0. See
> http://archives.postgresql.org/pgsql-patches/2005-02/msg00123.php
> http://archives.postgresql.org/pgsql-committers/2005-02/msg00127.php
> http://archives.postgresql.org/pgsql-committers/2005-02/msg00131.php
> I personally don't think it's worth the risk. The code works well enough to commit to development tip, but it's fundamentally alpha quality code.

I think this makes the most sense. If we are going to do an extended testing period for 8.0.without-arc, then bundling it in there might be worth considering.

regards

Mark
Re: [HACKERS] Query optimizer 8.0.1 (and 8.0)
Bruce Momjian wrote:
> Jim C. Nasby wrote:
>> On Mon, Feb 14, 2005 at 09:55:38AM -0800, Ron Mayer wrote:
>>> I still suspect that the correct way to do it would not be to use the single correlation, but 2 stats - one for estimating how sequential/random accesses would be; and one for estimating the number of pages that would be hit. I think the existing correlation does well for the first estimate; but for many data sets, poorly for the second type.
>>
>> Should this be made a TODO? Is there some way we can estimate how much this would help without actually building it?
>
> I guess I am confused how we would actually do that or if it is possible.

I spent a while on the web looking for some known way to calculate local correlation or clumping in some manner analogous to how we do correlation currently. As yet I have only seen really specialized examples that were tangentially relevant. We need a pet statistician to ask.

regards

Mark
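(A footnote for the archives: the pg_stats.correlation statistic under discussion is, as I understand it, simply the statistical correlation between a column's values and the physical order of the rows in the heap - roughly

  r = cov(heap_position, value_rank) / (stddev(heap_position) * stddev(value_rank))

so it ranges over -1 .. +1 and captures *global* ordering, but says nothing about how locally clumped nearby values are - which is exactly the gap noted above.)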
Re: [HACKERS] Data loss, vacuum, transaction wrap-around
Andrew Dunstan wrote:
> There is no news in the problem you're complaining of. It's completely known and documented. You've stated before that you've been using PostgreSQL for years - why is this suddenly so urgent that we have to drop everything and backpatch old releases? Please move along, there's nothing to see here, these are not the bugs you've been looking for.

To be fair to Mark, there does seem to be an increasing number of reports of this issue. In spite of the in-the-works fix for 8.1, it would be a pity to see customers losing data from xid wrap-around.

However, the quandary is this: even if we did back patches for every version, said customers probably wouldn't know they needed to apply them - hmmm, not much help there. We might be better off merely announcing the need to use vacuumdb on www.postgresql.org!

regards

Mark (the other one)
Re: [HACKERS] enforcing a plan (in brief)
Although this is all true, consider that adding hints will mean that the Pg developers *never* get bug reports to drive the optimizer improvement process. This will have the effect of stagnating its development. I think this would be a bad thing :-)

As an aside, note that DB2 UDB does not let you hint its optimizer either... I have heard it argued (by an IBM acquaintance of mine) that their optimizer is better than that other database's whose name begins with O, precisely because of this. (He is biased of course, but it is an interesting point.)

regards

Mark

[EMAIL PROTECTED] wrote:
> One consistent problem is the planner not being able to handle this or that scenario. At this stage, the *best* way to improve the planner is to add the ability to place hints in the plan. It *is* good enough for 90% of the types of queries you would ever want to do. I am dubious that you can get it demonstrably better in the last 10% or so without making it worse. Simple hints would go a HUGE way to improving the last 10%. Many of the "Why doesn't PostgreSQL use my index" questions would go away. Most of the time Tom spends looking at people's pg_stats info would drop. It would actually save time.
>
> As a PostgreSQL user, I can tell you with 100% confidence, if I had this tool, I could do my job easier. I can also tell you that while I have genuine appreciation for the current quality of the planner, I still would like to be able to tailor queries specifically to test various approaches for performance reasons.
Re: [HACKERS] correlation in pg_stats
[EMAIL PROTECTED] wrote:
> actually [EMAIL PROTECTED] is Mark Woodward. Pleased to meet you. :) (I hate using my name on lists like this because of spammers)

Not to be confused with me :-)
Re: [HACKERS] Query optimizer 8.0.1 (and 8.0)
Maybe I am missing something - ISTM that you can increase your statistics target for those larger tables to obtain a larger (i.e. better) sample.

regards

Mark

[EMAIL PROTECTED] wrote:
> [EMAIL PROTECTED] writes:
>
> Any and all random sampling assumes a degree of uniform distribution. This is the basis of the model. It assumes that chunks of the whole will be representative of the whole (to some degree). This works when normal variations are more or less distributed uniformly. As variations and trends become less uniformly distributed, more samples are required to characterize it.
>
> Douglas Adams had a great device called the "Total Perspective Vortex" which inferred the whole of the universe from a piece of fairy cake. It was a subtle play on the absurd notion that a very small sample could lead to an understanding of an infinitely larger whole.
>
> On a very basic level, why bother sampling the whole table at all? Why not check one block and infer all information from that? Because we know that isn't enough data. In a table of 4.6 million rows, can you say with any mathematical certainty that a sample of 100 points can be, in any way, representative?
>
> Another problem with random sampling is trend analysis. Often times there are minor trends in data. Ron pointed out the lastname/firstname trend. Although there seems to be no correlation between first names in the table, there are clearly groups or clusters of ordered data - an ordering that is missed by too small a sample.
>
> I understand why you chose the Vitter algorithm, because it provides a basically sound methodology for sampling without knowledge of the size of the whole, but I think we can do better. I would suggest using the current algorithm the first time through, then adjust the number of samples [n] based on the previous estimate of the size of the table [N]. Each successive ANALYZE will become more accurate. The Vitter algorithm is still useful as [N] will always be an estimate.
Re: [HACKERS] Query optimizer 8.0.1 (and 8.0)
[EMAIL PROTECTED] wrote:
> In this case, the behavior observed could be changed by altering the sample size for a table. I submit that an arbitrary fixed sample size is not a good base for the analyzer, but that the sample size should be based on the size of the table or some calculation of its deviation.

I can see your point; however, I wonder if the issue is that the default stats setting of '10' (3000 rows, 10 histogram buckets) is too low, and maybe we should consider making a higher value (say '100') the default.

> There is no reason why old stats can't be used to create more accurate stats. Using successive analyze operations, we could create better statistics for the planner. We can increase the sample size based on the table size. We could, I suppose, also calculate some sort of deviation statistic so that n_distinct can be calculated better with a smaller sample set.

The idea of either automatically increasing sample size for large tables, or doing a few more samplings with different sizes and examining the stability of the estimates, is rather nice - provided we can keep the runtime for ANALYZE to reasonable limits, I guess :-)

> The basic problem, though, is that PostgreSQL performed incorrectly on a simple query after indexes were created and analyze performed. Yes, it can be corrected, that's what led me to my conclusions, but shouldn't we try to devise a better system in the future to improve PostgreSQL so it does not need this sort of tuning?

Thanks for clarifying.

best wishes

Mark
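P.S. For reference, the knob in question can be raised per column like so (table/column names illustrative):

  ALTER TABLE bar ALTER COLUMN foo SET STATISTICS 100;
  ANALYZE bar;

i.e. lift the target from the default 10 to 100 for the columns that need it, then re-analyze.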
Re: [HACKERS] Patch Count?
Marc G. Fournier wrote:
> please let me know the URL for the message, so that I can see why it was overlooked, and see if I can't improve the 'search' ...

Marc - here is one I submitted that was not picked up :-)

http://archives.postgresql.org/pgsql-patches/2005-01/msg00145.php

regards

Mark
Re: [HACKERS] Much Ado About COUNT(*)
Jim C. Nasby wrote:
> Does anyone have working code they could contribute? It would be best to give at least an example in the docs. Even better would be something in pgfoundry that helps build a summary table and the rules/triggers you need to maintain it.

http://developer.postgresql.org/docs/postgres/plpgsql-trigger.html#PLPGSQL-TRIGGER-SUMMARY-EXAMPLE

regards

Mark
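P.S. For anyone who wants the flavour of it without following the link, here is a minimal sketch along the lines of that docs example (names illustrative - the docs version maintains per-group counts, this one keeps a single running total):

  CREATE TABLE foo_count (cnt bigint NOT NULL);
  INSERT INTO foo_count VALUES (0);

  CREATE FUNCTION foo_count_trig() RETURNS trigger AS '
  BEGIN
      IF TG_OP = ''INSERT'' THEN
          UPDATE foo_count SET cnt = cnt + 1;
      ELSIF TG_OP = ''DELETE'' THEN
          UPDATE foo_count SET cnt = cnt - 1;
      END IF;
      RETURN NULL;  -- AFTER trigger, so the return value is ignored
  END;
  ' LANGUAGE plpgsql;

  CREATE TRIGGER foo_count_mod AFTER INSERT OR DELETE ON foo
      FOR EACH ROW EXECUTE PROCEDURE foo_count_trig();

after which SELECT cnt FROM foo_count stands in for SELECT count(*) FROM foo - with the usual caveat that every writer now contends on the one summary row.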
Re: [HACKERS] ARC patent
Simon Riggs wrote:
> On Thu, 2005-01-20 at 23:17 +1100, Neil Conway wrote: (snippage)
>> For 8.0.x, I wonder if it would be better to just replace ARC with LRU. Sequential scans will still flood the cache, but I don't view that as an enormous problem.
>
> Agree with everything apart from the idea that seq scan flooding isn't an issue. I definitely think it is.

Is it feasible to consider LRU + a free-behind or seqscan hint type of replacement policy?

regards

Mark
Re: [HACKERS] Slow PL/pgSQL 8.0.RC5 (7.4.6. 3times faster)
Michael Fuhr wrote:
> On Thu, Jan 13, 2005 at 05:05:00PM -0500, Tom Lane wrote:
>> SELECT delitel(100, 1);
>
> Mean times over the last five of six runs on my poor 500MHz FreeBSD 4.11-PRERELEASE box:
>
>    6741 ms  7.4.6 (from FreeBSD ports collection)
>   14427 ms  8.0.0rc5 (from CVS source)

Looks like something introduced between RC1 and RC5:

  7455 ms  8.0.0RC5
  3978 ms  8.0.0RC1
  3951 ms  7.4.6

Box is FreeBSD 5.3; all 3 Pg versions built using ./configure --prefix=<prefix> --with-openssl
Re: [HACKERS] FATAL: catalog is missing 1 attribute(s) for relid
Marc G. Fournier wrote:
> What the client did was a 'delete from pg_attribute where ... ' ... The database is a 7.4.2 one ... my first thought was one of the older standbys ... rebuild the schema and move the data files into place over top of that ... but of course, 7.3 and beyond are OID based vs name based files, so that doesn't work, unless there is some way of figuring out which file in the old directory corresponds to which oid-file, and without being able to get into the database to read the system files, that's a wee bit difficult ...

This is probably worth a shot, as I think the catalog oids are always the same (can't find the right place in the code to check), but oid 16396 is pg_am for all the systems here:

  # select relname,oid,relfilenode from pg_class where oid like '16396';
   relname |  oid  | relfilenode
  ---------+-------+-------------
   pg_am   | 16396 |       16396

However, I think it is pg_attribute that you want to rescue - as the system cannot look up the attributes for pg_am due to the pg_attribute deletion:

  # select relname,oid from pg_class where relname like 'pg_attribute';
    relname     | oid
  --------------+------
   pg_attribute | 1249

You could probably copy 1249 from one of your standbys to your broken system's PGDATA... (better back up the broken system first, or try the rescue on another box).

good luck

Mark
Re: [HACKERS] FATAL: catalog is missing 1 attribute(s) for relid
Marc,

In case you don't work out a better way to sort this, I can reproduce and fix the error 'catalog is missing n attribute(s) for relid 16396' caused by directly deleting (part of) pg_attribute:

Setup:
  $ initdb
  $ pg_ctl start
  $ createdb test

Backup:
  $ pg_ctl stop
  $ tar -czvf pgdata.tar.gz pgdata

Break:
  $ pg_ctl start
  $ psql -d test -c "delete from pg_attribute where attrelid=16396"
  $ psql test
  [gets FATAL: catalog is missing 20 attribute(s) for relid 16396]

Fix (restore pg_attribute to database test):
  $ pg_ctl stop
  $ tar -zxvf pgdata.tar.gz pgdata/base/17142/1249
  $ pg_ctl start
  $ psql test
  [now works]

The caveat is that any relations created or modified between the backup and breakage will not be properly restored.

regards

Mark
Re: [HACKERS] Bgwriter behavior
Bruce Momjian wrote:
>> well, I usually get results that differ by that much from run to run. Probably you ran into more checkpoints on the second test. Also, did you reinitialize the bench database with pgbench -i?
>
> I destroyed the database and recreated it.

The only way I managed to control the variability in pgbench was to *reboot the machine* and recreate the database for each test. In addition, it seems that using a larger scale factor (e.g. 200) helped as well.

Having said that, on FreeBSD 5.3 with hw.ata.wc=0 (i.e. no write cache) my results for s=200, t=1 and c=4 were 49 (+/- 0.5) tps for both 7.4.6 and 8.0.0RC1 - no measurable difference. If I reduced the number of transactions to t=1000, then 7.4.6 jumped ahead by about 10 tps.

Bruce - are you able to try s=200? It would be interesting to see what your setup does.

regards

Mark
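P.S. For anyone wanting to replicate this, the runs above were of the form (database name illustrative):

  $ pgbench -i -s 200 bench
  $ pgbench -s 200 -c 4 -t 1000 bench

i.e. -s sets the scale factor, -c the client count, and -t the transactions per client.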
Re: [HACKERS] bgwriter changes
Mark Kirkwood wrote:
> It occurs to me that cranking up the number of transactions (say 1000-10) and seeing if said regression persists would be interesting. This would give the smoothing effect of the bgwriter (plus the ARC) a better chance to shine.

I ran a few of these over the weekend - since it rained here :-) - and the results are quite interesting:

[2xPIII, 2G, 2xATA RAID 0, FreeBSD 5.3 with the same non-default Pg parameters as before]

clients = 4
transactions = 10 (/client), each test run twice

  Version          tps
  7.4.6            49
  8.0.0RC1         50
  8.0.0RC1 + rem   49
  8.0.0RC1 + bg2   50

Needless to say, all well within measurement error of each other (the variability was about 1). I suspect that my previous tests had too few transactions to trigger many (or any) checkpoints. With them now occurring in the test, they look to be the most significant factor (contrast with 70-80 tps for 4 clients with 1000 transactions).

Also, with a small number of transactions, the fsync'ed blocks may have all fitted in the ATA disk caches (2x2M). In hindsight I should have disabled this! (Might run the smaller number of transactions again with hw.ata.wc=0 and see if this is enlightening.)

regards

Mark
Re: [HACKERS] bgwriter changes
Simon Riggs wrote:
> 100pct.patch (SR)
>
> Test results to date:
> 1. Mark Kirkwood ([HACKERS] [Testperf-general] BufferSync and bgwriter) pgbench 1xCPU 1xDisk shared_buffers=1 showed 8.0RC1 had regressed compared with 7.4.6, but patch improved performance significantly against 8.0RC1

It occurs to me that cranking up the number of transactions (say 1000-10) and seeing if said regression persists would be interesting. This would give the smoothing effect of the bgwriter (plus the ARC) a better chance to shine.

regards

Mark
Re: [HACKERS] [Testperf-general] BufferSync and bgwriter
Simon,

I am seeing a reasonably reproducible performance boost after applying your patch (I'm not sure if that was one of the main objectives, but it certainly is nice).

I *was* seeing a noticeable decrease between 7.4.6 and 8.0.0RC1 running pgbench. However, after applying your patch, 8.0 is pretty much back to being the same. Now I know pgbench is ... err ... not always the most reliable for this sort of thing, so I am interested in whether this seems like a reasonable sort of thing to be noticing (and also whether anyone else has noticed the decrement)?

(The attached brief results are for Linux x86, but I can see a similar performance decrement 7.4.6 -> 8.0.0RC1 on FreeBSD 5.3 x86)

regards

Mark

Simon Riggs wrote:
> Hmm... must confess that my only plan is:
> i) discover dynamic behaviour of bgwriter
> ii) fix any bugs or weirdness as quickly as possible
> iii) try to find a way to set the bgwriter defaults

System
------
P4 2.8Ghz 1G 1xSeagate Barracuda 40G
Linux 2.6.9 glibc 2.3.3 gcc 3.4.2
Postgresql 7.4.6 | 8.0.0RC1

Test
----
Pgbench with scale factor = 200

Pg 7.4.6
--------
clients  transactions  tps
1        1000          65.1
2        1000          72.5
4        1000          69.2
8        1000          48.3

Pg 8.0.0RC1
-----------
clients  transactions  tps   tps (new buff patch + settings)
1        1000          55.8  70.9
2        1000          68.3  77.9
4        1000          38.4  62.8
8        1000          29.4  38.1

(averages over 3 runs, database dropped and recreated after each set, with a checkpoint performed after each individual run)

Parameters
----------
Non-default postgresql.conf parameters:
tcpip_socket = true [listen_addresses = *]
max_connections = 100
shared_buffers = 1
wal_buffers = 1024
checkpoint_segments = 10
effective_cache_size = 4
random_page_cost = 0.8

bgwriter settings (used with patch only):
bgwriter_delay = 200
bgwriter_percent = 2
bgwriter_maxpages = 100
Re: [HACKERS] MAX/MIN optimization via rewrite (plus query rewrites
I think a summary of where the discussion went might be helpful (especially for me, after a week or so away doing perl). There were a number of approaches suggested, which I will attempt to summarize in a hand-wavy fashion (apologies for any misrepresentation caused):

i) Rewrite max/min queries using order by in the presence of a suitable index.

ii) Provide alternate (i.e. rewritten) queries for consideration along with the original, letting the planner use its costing methods to choose as usual.

iii) Provide alternate plans based on presence of certain aggregate types in the query, letting the planner use its costing methods to choose as usual.

iv) Create short-cut evaluations for certain aggregates that don't actually need to see all the (to-be-aggregated) data.

v) Create a mechanism for defining per-aggregate optimization operators.

Note that some of these ideas may overlap one another to some extent. Some critiques of the various approaches are:

i) Too simple; the rewrite may not be better than the original, and only simple queries can be handled this way. Probably reasonably easy to implement.

ii) Simple queries will be well handled, but very complex transformations are needed to handle even slightly more complex ones. Probably medium - difficult to implement.

iii) Rules for creating alternate plans will mimic the issues with ii). Probably medium - difficult to implement.

iv) May need different short cuts for each aggregate - datatype combination. Implies conventional < and > operators, or the existence of similar user-definable ones (or a way of finding suitable ones). Guessing medium to implement.

v) Is kind of a generalization of iv). The key areas of difficulty are the specification of said optimization operators and the definition of an API for constructing/calling them. Guessing difficult to implement.

I am leaning towards ii) or iv) as the most promising approaches - what do people think?

regards

Mark
[HACKERS] Plperl Safe version check fails for Safe 2.09
It seems that the check in src/pl/plperl/plperl.c

  eval_pv((safe_version < 2.09 ? safe_bad : safe_ok), FALSE);

is not working quite as expected (CVS HEAD from today): I have Safe.pm at version 2.09, yet any plperl function I run fails with:

  ERROR: error from function: trusted perl functions disabled - please upgrade perl Safe module to at least 2.09 at (eval 4) line 1.

Just to be sure I amended the test code to:

  elog(INFO, "Safe version = %f", safe_version);
  eval_pv((safe_version < 2.09 ? safe_bad : safe_ok), FALSE);

and I see:

  INFO: Safe version = 2.09

(followed by the error). I confess some puzzlement - as the code *looks* like it should work!

The platform is Linux 2.4.22 glibc 2.3.2, perl 5.8.0 (patched Redhat 9)

regards

Mark
Re: [HACKERS] Plperl Safe version check fails for Safe 2.09
A bit more thinking led me to try:

  float safe_version;
  ...
  eval_pv((safe_version < (float)2.09 ? safe_bad : safe_ok), FALSE);

which seems to fix the issue (after all, float *should* be accurate enough in this case).

cheers

Mark

P.s: trivial patch attached

Andrew Dunstan wrote:
> Could be a rounding issue. What happens if you try this instead?
>
>   eval_pv((safe_version <= 2.08 ? safe_bad : safe_ok), FALSE);
>
> Alternatively, what happens if we make safe_version a double rather than a float? (If nothing else works we might have to fall back on a lexical comparison)
>
> cheers
>
> andrew

--- plperl.c.orig	2004-11-24 17:04:07.0 +1300
+++ plperl.c	2004-11-24 17:04:21.0 +1300
@@ -244,7 +244,7 @@

 	safe_version = SvNV(res);

-	eval_pv((safe_version < 2.09 ? safe_bad : safe_ok), FALSE);
+	eval_pv((safe_version < (float)2.09 ? safe_bad : safe_ok), FALSE);

 	plperl_safe_init_done = true;
 }
[HACKERS] MAX/MIN optimization via rewrite (plus query rewrites generally)
I am looking at implementing this TODO item, e.g. (max case): rewrite

  SELECT max(foo) FROM bar

as

  SELECT foo FROM bar ORDER BY foo DESC LIMIT 1

if there is an index on bar(foo).

Suggestions about the most suitable point in the parser/planner stage to perform this sort of rewrite would be most welcome! (as this would be my first non-trivial getting of hands dirty in the code).

My initial thoughts revolved around extending the existing RULE system to be able to handle more general types of rewrite - like conditionals in SELECT rules, and rewrites that change elements of the query other than the target relation.

Planning-for-the-future note: I would like whatever mechanism is added for this MAX/MIN stuff to be amenable to more subtle things like aggregate navigation (see R. Kimball's article http://www.dbmsmag.com/9608d54.html).

regards

Mark
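P.S. Spelled out in SQL, the proposed transformation is (names illustrative):

  CREATE INDEX bar_foo_idx ON bar (foo);

  SELECT max(foo) FROM bar;                        -- today: scans every row
  SELECT foo FROM bar ORDER BY foo DESC LIMIT 1;   -- can stop after one index tuple

One wrinkle to keep in mind: max() ignores NULLs, whereas a descending scan sees them first (NULLs sort high), so a faithful rewrite presumably needs a WHERE foo IS NOT NULL bolted on.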
Re: [HACKERS] MAX/MIN optimization via rewrite (plus query rewrites
Tom Lane wrote:
> A more radical way of handling it would be to detect the relevance of an indexscan in indxpath.c and generate a special kind of Path node; this would not generalize to other sorts of things as you were hoping, but I'm unconvinced that the mechanism is going to be very general-purpose anyway. The major advantage is that this would work conveniently for comparing the cost of a rewritten query to a non-rewritten one.

I like this point - it makes sense to check that the rewritten query is less costly to execute than the original!

> How are you planning to represent the association between MIN/MAX and particular index orderings in the system catalogs?

That is the next item to think on. We could have a rewrite catalog that holds possible transformations for certain functions (certain aggregates at this stage, I guess). This is a bit like Alvaro's idea - however, it may be better to represent it the way he suggested!

regards

Mark
Re: [HACKERS] MAX/MIN optimization via rewrite (plus query rewrites
Your example, and ones like:

  SELECT max(foo), count(foo) FROM bar
  SELECT max(a.foo1), max(b.foo2) FROM bar1 AS a NATURAL JOIN bar2 AS b

have made me realize that the scope of what should be optimized is somewhat subtle. I am inclined to keep it simple (i.e. rather limited) for a first cut, and if that works well, then look at extending to more complex rewrites. What do you think?

Jim C. Nasby wrote:
> On Thu, Nov 11, 2004 at 11:48:49AM +1300, Mark Kirkwood wrote:
>> I am looking at implementing this TODO item, e.g. (max case): rewrite SELECT max(foo) FROM bar as SELECT foo FROM bar ORDER BY foo DESC LIMIT 1 if there is an index on bar(foo)
>
> Out of curiosity, will you be doing this in such a way that
>
>   SELECT min(foo), max(foo) FROM bar
>
> will end up as
>
>   SELECT (SELECT foo FROM bar ORDER BY foo ASC LIMIT 1), (SELECT ... DESC LIMIT 1)
>
> ?
Re: [HACKERS] MAX/MIN optimization via rewrite (plus query rewrites
There seems to be (as Tom indicated) a choice of approaches:

i) rewrite max/min queries and then plan 'em
ii) provide alternate plans based on presence of certain aggregate types in the query

When I first examined this TODO item, I was really thinking about i), but I suspect that ii) is probably the best approach.

regards

Mark

Bruno Wolff III wrote:
> On Thu, Nov 11, 2004 at 17:57:42 +1300, Mark Kirkwood [EMAIL PROTECTED] wrote:
>> Your example and ones like: SELECT max(foo), count(foo) FROM bar; SELECT max(a.foo1), max(b.foo2) FROM bar1 AS a NATURAL JOIN bar2 AS b have made me realize that the scope of what should be optimized is somewhat subtle. I am inclined to keep it simple (i.e. rather limited) for a first cut, and if that works well, then look at extending to more complex rewrites. What do you think?
>
> I don't think you should be rewriting queries as much as providing alternate plans and letting the rest of the optimizer decide which plan to use. If you just rewrite a query you might lock yourself into using a poor plan.
Re: [HACKERS] MAX/MIN optimization via rewrite (plus query rewrites
Probably for a small table, where the machinery of reading the index, followed by checking the table for non-visible tuples, is more costly than just scanning the table!

regards

Mark

John Hansen wrote:
> Why not just change the function altogether to 'select $1 from $2 order by $1 desc limit 1;'? Is there ANY situation where max(col) as it is would be faster?
>
> ... John
Re: [HACKERS] Proposal for Recover mode in pg_ctl (in 8.0)
I like it - nice and simple, but targets a large (and likely) foot-gun situation.

regards

Mark

Simon Riggs wrote:
> If a further pg_ctl mode, "recover", were implemented, this would allow a fail-safe mode for recovery, e.g.
>
>   pg_ctl -D datadir recover
>
> pg_ctl could then check for the existence of a recovery.conf file and return an error if none were found. This mode would not then need to be passed through to the postmaster as an option, which could accidentally be re-invoked later should a crash recovery occur (the main reason to avoid adding recovery.conf options to postgresql.conf in the first place).
>
> This mode would do nothing more than:
> - check for recovery.conf; if not found, return error
> - invoke a start normally, as if mode=start had been requested
>
> The doc for invoking recovery could then be changed to include this new mode, and the potential for error would be removed.
>
> This is a change requested for 8.0, to ensure that PITR doesn't fall into disrepute by novice users shooting themselves in the foot. It is a minor change, affecting only PITR, and only to the extent of a documentation change and a file-existence check in pg_ctl. No server code would be changed. Alternative suggestions are welcome.
>
> Thoughts?
Re: [HACKERS] Proposal for Recover mode in pg_ctl (in 8.0)
While this is nice, it will not help you if the restoration directory is different from your archive directory - that is, restore_command in recovery.conf fetches from somewhere other than where archive_command in postgresql.conf copied. I am not sure how likely this situation is, but setting up log shipping, or maybe recovering from disk failure, *might* mean you need to bring the saved archive files back from somewhere else.

regards

Mark

Tom Lane wrote:
> Another and simpler way is to recommend that people use archive_command strings that won't overwrite an existing archive file. For instance, instead of showing the example
>
>   archive_command = 'cp %p /mnt/server/archivedir/%f'
>
> we could show
>
>   archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'
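P.S. To make the asymmetry concrete: the restore side is configured separately in recovery.conf, so it can quite legitimately point somewhere else, e.g. (path illustrative):

  restore_command = 'cp /mnt/standby/restoredir/%f %p'

where %f is the file name being asked for and %p the destination path - nothing ties that directory to the one archive_command wrote into.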
Re: [HACKERS] Proposal for Recover mode in pg_ctl (in 8.0)
I was thinking that even mildly experienced folks could benefit from a helpful sanity check. Typically the need to recover a system never comes at a good time, and features that help prevent silly mistakes are a great stress saver.

As an aside, while testing recovery during pre-beta, I think I probably forgot to put in a recovery.conf about 1 time in 10. Now, I was using a small database cluster tar'ed up in /tmp, so no big deal, but if it had been a 100G beast that had to come off tape...

regards

Mark

Tom Lane wrote:
> I can't get very excited about this approach, because it only protects those people who (a) use pg_ctl to start the postmaster (not everyone) and (b) carefully follow the recovery directions (which the people you are worried about are very bad at, by hypothesis).
Re: [HACKERS] plans for bitmap indexes?
Greg Stark wrote:
> I think what you're trying to accomplish is better accomplished through partitioned tables. Then the user can decide which keys to use to partition the data and the optimizer can use the data to completely exclude some partitions from consideration. And it wouldn't interfere with indexes to access the data within a partition.

Though partitioning will help, you can only partition on one key (I guess the ability to partition *indexes* might help here). I think that bitmap indexes provide a flexible may to get fact access to the result set for multiple low-cardinality conditions - something that partitioning will generally not do.

regards

Mark
Re: [HACKERS] plans for bitmap indexes?
Mark Kirkwood wrote:
> I think that bitmap indexes provide a flexible may to get fact access to the result set

... that should be "*fast* access to" ... sorry
Re: [HACKERS] plans for bitmap indexes?
Simon Riggs wrote:
> I believe that the benefit of on-disk bitmap indexes is supposed to be reduced storage size (compared to btree). The main problem is the need for the table to be read-only. Until we have partitioning, we wouldn't be able to easily guarantee parts of a table as being (effectively) read-only.

I don't believe that read-only is required. The update/insert performance impact of bitmap indexes is, however, very high (in Oracle's implementation anyway) - to the point where many sites drop them before adding in new data, and recreate 'em afterwards! So in the event that there is a benefit from the small on-disk footprint, the insert/update throughput implications will need to be taken into account.

cheers

Mark
Re: [HACKERS] plans for bitmap indexes?
Tom Lane wrote:
> I believe that the term "bitmap index" is also used with a different meaning wherein it actually does describe a particular kind of on-disk index structure, with one bit per table row. IMHO building in-memory bitmaps (the first idea) is a very good idea to pursue for Postgres. I'm not at all sold on on-disk bitmap indexes, though ... those I suspect *are* sufficiently replaced by partial indexes.

I believe that the benefit of on-disk bitmap indexes is supposed to be reduced storage size (compared to btree). In the cases where I have put them to use, they certainly occupy considerably less disk than a comparable btree index - provided there are not too many distinct values in the indexed column.

regards

Mark
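P.S. For comparison, the partial-index alternative Tom mentions looks like (names illustrative):

  CREATE INDEX orders_shipped_idx ON orders (order_date) WHERE status = 'shipped';

i.e. one small btree per interesting predicate value, rather than a single bit-per-row structure over the whole low-cardinality column.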
Re: [HACKERS] [COMMITTERS] pgsql-server: Rearrange pg_subtrans handling
Greg Stark wrote:
> It's only allowed when the transaction is in READ UNCOMMITTED isolation level. Something Postgres doesn't currently support. In fact I'm not aware of any SQL database that supports it, though I'm sure there's one somewhere.

FYI - DB2 supports this isolation level; I don't know of any others (tho Informix is a possibility).

regards

Mark
Re: [HACKERS] Coming soon: PG 7.4.4, 7.3.7, 7.2.5
The pg_dump fix in 8.0 that stops the destruction of existing users in the target database via DELETE FROM pg_shadow WHERE usesysid <> (... would be great!

regards

Mark

Tom Lane wrote:
> Comments anyone? Backpatches for other bugs?
Re: [HACKERS] Coming soon: PG 7.4.4, 7.3.7, 7.2.5
Sorry - I meant pg_dump*all* rather than pg_dump.

Mark Kirkwood wrote:
> The pg_dump fix in 8.0 that stops the destruction of existing users in the target database via DELETE FROM pg_shadow WHERE usesysid <> (... would be great!
>
> regards
>
> Mark
>
> Tom Lane wrote:
>> Comments anyone? Backpatches for other bugs?
Re: [HACKERS] fsync vs open_sync
Just out of interest, what happens to the difference if you use *ext3* (perhaps with data=writeback)?

regards

Mark

[EMAIL PROTECTED] wrote:
> I did a little test on the various options of fsync.
> ...
>   create table testndx (value integer, name varchar);
>   create index testndx_val on testndx (value);
>
>   for(int i = 0; i < 100; i++)
>   {
>       printf_query("insert into testndx (value, name) values ('%d', 'test')", random());
>       // report here
>   }
>
> Anyway, with fsync enabled using standard fsync(), I get roughly 300-400 inserts per second. With fsync disabled, I get about 7000 inserts per second. When I re-enable fsync but use the open_sync option, I can get about 2500 inserts per second. (This is on Linux 2.4 kernel, ext2 file system)
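P.S. For anyone following along, the option being toggled there is wal_sync_method in postgresql.conf, i.e. something like:

  fsync = true
  wal_sync_method = open_sync   # versus the fsync/fdatasync default

(which of fsync or fdatasync is the compiled-in default is platform dependent).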
Re: [HACKERS] Point in Time Recovery
Ok - that is a much better way of doing it!

regards

Mark

Tom Lane wrote:
> Zeugswetter Andreas SB SD [EMAIL PROTECTED] writes:
>> If you use a readable file you will also need a feature for restore (or a tool) to create an appropriate pg_control file, or are you intending to still require that pg_control be the first file backed up.
>
> No, the entire point of this exercise is to get rid of that assumption. You do need *a* copy of pg_control, but the only reason you'd need to back it up first rather than later is so that its checkpoint pointer points to the last checkpoint before the dump starts. Which is the information we want to put in the archive-label file instead.
>
> If a copy of pg_control were sufficient then I'd be all for using it as the archive-label file, but it's *not* sufficient because you also need the ending WAL offset. So we need a different file layout in any case, and we may as well take some pity on the poor DBA and make the file easily human-readable.
>
> regards, tom lane
Re: [HACKERS] Point in Time Recovery
I was wondering about this point - might it not be just as reasonable for the copied file to *be* an exact image of pg_control? Then a very simple variant of pg_controldata (or maybe even just adding switches to pg_controldata itself) would enable the relevant info to be extracted.

P.s: would love to be that volunteer - however, up to the eyeballs in Business Objects (cringe) and DB2 for the next week or so...

regards

Mark

Bruce Momjian wrote:
> We need someone to code two backend functions to complete PITR.

(snippage)

> However, once you decide to do things like that, there is no reason why the copied file has to be an exact image of pg_control. I claim it would be more useful if the copied file were plain text so that you could just cat it to find out the starting WAL position; that would let you determine without any special tools what range of WAL archive files you are going to need to bring back from your archives.
Re: [HACKERS] PITR COPY Failure (was Point in Time Recovery)
Looks good to me. The log file numbering scheme seems to have changed - is that part of the fix too?

Tom Lane wrote:
> This is done in CVS tip. Mark, could you retest to verify it's fixed?
>
> regards, tom lane
Re: [HACKERS] PITR COPY Failure (was Point in Time Recovery)
Great that it's not fundamental - and hopefully with this discovery, the probability you mentioned is being squashed towards zero a bit more :-)

Don't let this early bug detract from what is really a superb piece of work!

regards

Mark

Tom Lane wrote:
> In any case this isn't a fundamental bug, just an insufficiently smart safety check. But thanks for finding it! As is, the code has a nonzero probability of failure in the field :-( and I don't know how we'd have tracked it down without a reproducible test case.
Re: [HACKERS] PITR COPY Failure (was Point in Time Recovery)
FYI - I can confirm that the patch fixes the main issue.

Simon Riggs wrote:
> This was a very confusing test... Here's what I think happened:
> ...
> The included patch doesn't attempt to address those issues, yet.
>
> Best regards, Simon Riggs
Re: [HACKERS] PITR COPY Failure (was Point in Time Recovery)
This is presumably a standard feature of any PITR design - if the failure event destroys the current transaction log, then you can only recover transactions that committed in the last *archived* log.

regards

Mark

Simon Riggs wrote:
> The test works, but gives what looks like strange results: the test blows away the data directory completely, so the then-current xlog dies too. That contained the commit for the large COPY, so even though the recovery now works, the table has zero rows in it. (When things die you're still likely to lose *some* data).
Re: [HACKERS] PITR COPY Failure (was Point in Time Recovery)
I have been doing some re-testing with CVS HEAD from about 1 hour ago, using the simplified example posted previously. It is quite interesting:

i) create the table as:

  CREATE TABLE test0 (filler TEXT);

and COPY in 100,000 rows of length 109 - then recovery succeeds.

ii) create the table as:

  CREATE TABLE test0 (filler VARCHAR(120));

and COPY as above - then recovery *fails* with the signal 6 error below:

LOG: database system was not properly shut down; automatic recovery in progress
LOG: redo starts at 0/A4807C
LOG: record with zero length at 0/E0
LOG: redo done at 0/30
LOG: restored log file from archive
LOG: archive recovery complete
PANIC: concurrent transaction log activity while database system is shutting down
LOG: startup process (PID 17546) was terminated by signal 6
LOG: aborting startup due to startup process failure

(I am pretty sure both TEXT and VARCHAR(120) failed using the original patch.)

Any suggestions for the best way to dig a bit deeper?

regards

Mark
Re: [HACKERS] PITR COPY Failure (was Point in Time Recovery)
There are some silly bugs in the script:
- forgot to export PGDATA and PATH after changing them
- forgot to mention the need to edit test.sql (the COPY line needs the path to the dump file)

Apologies - I will submit a fixed version a little later.

regards

Mark

Mark Kirkwood wrote:
> A script to run the whole business can be found here:
> http://homepages.paradise.net.nz/markir/download/pitr-bug.tar.gz
> (It will need a bit of editing for things like location of Pg, PGDATA, and you will need to make your own data file)
Re: [HACKERS] PITR COPY Failure (was Point in Time Recovery)
fixed.

Mark Kirkwood wrote:
> There are some silly bugs in the script:
> - forgot to export PGDATA and PATH after changing them
> - forgot to mention the need to edit test.sql (the COPY line needs the path to the dump file)
>
> Apologies - I will submit a fixed version a little later.
>
> regards
>
> Mark
>
> Mark Kirkwood wrote:
>> A script to run the whole business can be found here:
>> http://homepages.paradise.net.nz/markir/download/pitr-bug.tar.gz
>> (It will need a bit of editing for things like location of Pg, PGDATA, and you will need to make your own data file)
[HACKERS] PITR COPY Failure (was Point in Time Recovery)
I decided to produce a nice simple example, so that anyone could hopefully replicate what I am seeing. The scenario is the same as before (the 11 steps), but the CREATE TABLE and COPY step has been reduced to:

  CREATE TABLE test0 (filler VARCHAR(120));
  COPY test0 FROM '/data0/dump/test0.dat' USING DELIMITERS ',';

Now the file 'test0.dat' consists of 128293 identical lines, each of 109 'a' characters (plus end of line).

A script to run the whole business can be found here:
http://homepages.paradise.net.nz/markir/download/pitr-bug.tar.gz
(It will need a bit of editing for things like location of Pg, PGDATA, and you will need to make your own data file - see the one-liner at the end of this message.)

The main points of interest are:
- anything <= 128392 rows in test0.dat results in 1 archived log, and the recovery succeeds
- anything >= 128393 rows in test0.dat results in 2 or more archived logs, and recovery fails on the second log (and gives the "zero length redo at 0/1E0" message)

Let me know if I can do any more legwork on this (I am considering re-compiling with WAL_DEBUG now that the example is simpler).

regards

Mark

Simon Riggs wrote:
> On Thu, 2004-07-15 at 10:47, Mark Kirkwood wrote:
>> I tried what I thought was a straightforward scenario, and seem to have broken it :-( Here is the little tale...
>>
>> 1) initdb
>> 2) set archive_mode and archive_dest in postgresql.conf
>> 3) startup
>> 4) create database called 'test'
>> 5) connect to 'test' and type 'checkpoint'
>> 6) backup PGDATA using 'tar -zcvf'
>> 7) create tables in 'test' and add data using COPY (exactly 2 logs worth)
>> 8) shutdown and remove PGDATA
>> 9) recover using 'tar -zxvf'
>> 10) copy recovery.conf into PGDATA
>> 11) startup
>>
>> This is what I get:
>>
>> LOG: database system was interrupted at 2004-07-15 21:24:04 NZST
>> LOG: recovery command file found...
>> LOG: restore_program = cp %s/%s %s
>> LOG: recovery_target_inclusive = true
>> LOG: recovery_debug_log = true
>> LOG: starting archive recovery
>> LOG: restored log file from archive
>> LOG: checkpoint record is at 0/A48054
>> LOG: redo record is at 0/A48054; undo record is at 0/0; shutdown FALSE
>> LOG: next transaction ID: 496; next OID: 25419
>> LOG: database system was not properly shut down; automatic recovery in progress
>> LOG: redo starts at 0/A48094
>> LOG: restored log file 0001 from archive
>> LOG: record with zero length at 0/1E0
>> LOG: redo done at 0/130
>> LOG: restored log file 0001 from archive
>> LOG: restored log file 0001 from archive
>> PANIC: concurrent transaction log activity while database system is shutting down
>> LOG: startup process (PID 13492) was terminated by signal 6
>> LOG: aborting startup due to startup process failure
>>
>> The "concurrent access" is a bit of a puzzle, as this is my home machine (i.e. I am *sure* no one else is connected!)
>
> I can see what is wrong now, but you'll have to help me on details your end...
>
> The log shows that xlog 1 was restored from archive. It contains a zero length record, which indicates that it isn't yet full (or that's what the existing recovery code assumes it means). Which also indicates that it should never have been archived in the first place, and should not therefore be a candidate for a restore from archive. The double "restored log file" message can only occur after you've retrieved a partially full file from archive - which, as I say, shouldn't be there. Other messages are essentially spurious in those circumstances.
> Either:
> - somehow the files have been mixed up in the archive directory, which is possible if the filing discipline is not strict - various ways; unfortunately I would guess this to be the most likely, somehow
> - the file that has been restored has been damaged in some way
> - the archiver has archived a file too early (very unlikely, IMHO - that's the most robust bit of the code)
> - some aspect of the code has written a zero length record to WAL (which is supposed to not be possible, but we mustn't discount an error in recent committed work)
> - there may also be an effect going on with checkpoints that I don't understand... spurious checkpoint warning messages have already been observed and reported
>
> Best regards, Simon Riggs
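P.S. In case the tarball link goes stale: the data file described above can be generated with a one-liner along these lines (path as used in the COPY statement):

  $ perl -e 'print "a" x 109, "\n" for 1 .. 128293' > /data0/dump/test0.dat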
Re: [HACKERS] Point in Time Recovery
I tried what I thought was a straightforward scenario, and seem to have broken it :-( Here is the little tale...

1) initdb
2) set archive_mode and archive_dest in postgresql.conf
3) startup
4) create database called 'test'
5) connect to 'test' and type 'checkpoint'
6) backup PGDATA using 'tar -zcvf'
7) create tables in 'test' and add data using COPY (exactly 2 logs worth)
8) shutdown and remove PGDATA
9) recover using 'tar -zxvf'
10) copy recovery.conf into PGDATA
11) startup

This is what I get:

LOG: database system was interrupted at 2004-07-15 21:24:04 NZST
LOG: recovery command file found...
LOG: restore_program = cp %s/%s %s
LOG: recovery_target_inclusive = true
LOG: recovery_debug_log = true
LOG: starting archive recovery
LOG: restored log file from archive
LOG: checkpoint record is at 0/A48054
LOG: redo record is at 0/A48054; undo record is at 0/0; shutdown FALSE
LOG: next transaction ID: 496; next OID: 25419
LOG: database system was not properly shut down; automatic recovery in progress
LOG: redo starts at 0/A48094
LOG: restored log file 0001 from archive
LOG: record with zero length at 0/1E0
LOG: redo done at 0/130
LOG: restored log file 0001 from archive
LOG: restored log file 0001 from archive
PANIC: concurrent transaction log activity while database system is shutting down
LOG: startup process (PID 13492) was terminated by signal 6
LOG: aborting startup due to startup process failure

The "concurrent access" is a bit of a puzzle, as this is my home machine (i.e. I am *sure* no one else is connected!)

Mark

P.s: CVS HEAD from about 1 hour ago, PITR 5.2, FreeBSD 4.10 on x86
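(For step 10: going by the log output above, the recovery.conf in use would have looked something like this - patch-era syntax, values as echoed in the log:

  restore_program = cp %s/%s %s
  recovery_target_inclusive = true
  recovery_debug_log = true

where restore_program is this patch's forerunner of what eventually shipped as restore_command.)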
Re: [HACKERS] Point in Time Recovery
Simon Riggs wrote:
> First, thanks for sticking with it to test this. I've not received such a message myself - this is interesting. Is it possible to copy that directory to one side and re-run the test? Add another parameter in postgresql.conf called archive_debug = true. Does it happen identically the second time?

Yes, identical results - I re-initdb'ed and ran the process again, rather than reuse the files.

> What time difference was there between steps 5 and 6? I think I can hear Andreas saying "told you"... I'm thinking the backup might be somehow corrupted because the checkpoint occurred during the backup.

Hmmm... I was wondering about this, so left a bit more time in between, and forced a sync as well for good measure:

5) $ psql -d test -c checkpoint; sleep 30; sync; sleep 30
6) $ tar -zcvf /data1/dump/pgdata-7.5.tar.gz *

> Thanks, Simon Riggs
Re: [HACKERS] Point in Time Recovery
Couldn't agree more. Maybe we should have made more noise :-)

Glen Parker wrote:
> Simon Riggs wrote:
>> On Thu, 2004-07-15 at 23:18, Devrim GUNDUZ wrote:
>>> Thanks for the vote of confidence, on or off list. ... too many people spend a lot of money for proprietary databases, just for some missing features in PostgreSQL
>>
>> Agreed - PITR isn't aimed at existing users of PostgreSQL. If you use it already, even though it doesn't have it, then you are quite likely to be able to keep going without it. Most commercial users won't touch anything that doesn't have PITR.
>
>> Agreed. I am surprised at how few requests we have gotten for PITR. I assume people are either using replication or not considering us.
>
> Don't forget that there are (must be) lots of us that know it's coming and are just waiting until it's available. I haven't requested per se, but believe me, I'm waiting for it :-)
Re: [HACKERS] Point in Time Recovery
Simon Riggs wrote:
> So far: I've tried to re-create the problem as exactly as I can, but it works for me. This is clearly an important case to chase down. I assume that this is the very first time you tried recovery? Second and subsequent recoveries using the same set have a potential loophole, which we have been discussing. Right now, I'm thinking that the "exactly 2 logs worth of data" has brought you very close to the end of the log file (E0) ending with 1, and the shutdown checkpoint that is then subsequently written is failing. Can you repeat this your end?

It is repeatable at my end. It is actually fairly easy to recreate the example I am using: download http://sourceforge.net/projects/benchw and generate the dataset for Pg - but trim the large fact0.dat dump file using head -10. Thus step 7 consists of creating the 4 tables and COPYing in the data for them.

> The nearest I can get to the exact record pointers you show are to start recovery at A4807C and to end at with 88.
>
> Overall, PITR changes the recovery process very little, if at all. The main areas of effect are to do with sequencing of actions and matching up the right logs with the right backup. I'm not looking for bugs in the code but in subtle side-effects and edge cases. Everything you can tell me will help me greatly in chasing that down.

I agree - I will try this sort of example again, but will change the number of rows I am COPYing (currently 10) and see if that helps.

> Best Regards, Simon Riggs

By way of contrast, using the *same* procedure (1-11), but generating 2 logs worth of INSERTs/UPDATEs using 10 concurrent processes, *works fine* - e.g.:

LOG: database system was interrupted at 2004-07-16 11:17:52 NZST
LOG: recovery command file found...
LOG: restore_program = cp %s/%s %s
LOG: recovery_target_inclusive = true
LOG: recovery_debug_log = true
LOG: starting archive recovery
LOG: restored log file from archive
LOG: checkpoint record is at 0/A4803C
LOG: redo record is at 0/A4803C; undo record is at 0/0; shutdown FALSE
LOG: next transaction ID: 496; next OID: 25419
LOG: database system was not properly shut down; automatic recovery in progress
LOG: redo starts at 0/A4807C
postmaster starting
[EMAIL PROTECTED] 7.5]$ LOG: restored log file 0001 from archive
cp: cannot stat `/data1/pgdata/7.5-archive/0002': No such file or directory
LOG: could not restore 0002 from archive
LOG: could not open file "/data1/pgdata/7.5/pg_xlog/0002" (log file 0, segment 2): No such file or directory
LOG: redo done at 0/1D4
LOG: archive recovery complete
LOG: database system is ready
LOG: archiver started
Re: [HACKERS] Point in Time Recovery
I noticed that compiling with the 5_1 patch applied fails due to XLOG_archive_dir being removed from xlog.c, but src/backend/commands/tablecmds.c still uses it. I did the following to tablecmds.c:
5408c5408
< extern char XLOG_archive_dir[];
---
> extern char *XLogArchiveDest;
5410c5410
< use_wal = XLOG_archive_dir[0] && !rel->rd_istemp;
---
> use_wal = XLogArchiveDest[0] && !rel->rd_istemp;
Now I have to see if I have broken it with this change :-) regards Mark Simon Riggs wrote: On Wed, 2004-07-14 at 16:55, [EMAIL PROTECTED] wrote: On 14 Jul, Simon Riggs wrote: PITR Patch v5_1 just posted has Point in Time Recovery working Still some rough edges but we really need some testers now to give this a try and let me know what you think. Klaus Naumann and Mark Wong are the only [non-committers] to have tried to run the code (and let me know about it), so please have a look at [PATCHES] and try it out. I just tried applying the v5_1 patch against the cvs tip today and got a couple of rejections. I'll copy the patch output here. Let me know if you want to see the reject files or anything else: I'm on it. Sorry 'bout that all - midnight fingers. ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
[HACKERS] Compile Failure on win32 - pipe.c using ereport
For the past few days I have experienced compile failure with CVS HEAD on win32. The culprit is src/port/pipe.c, which uses ereport() when it only #includes c.h. I cured the error by #including postgres.h - but after a bit of thought I suspect that utils/elog.h is all that is necessary! Is anyone else able to confirm this? I am using mingw 3.1.0-1, msys 1.0.10 and gcc 3.3.1 on win 2003 server. regards Mark ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [HACKERS] Compile Failure on win32 - pipe.c using ereport
You are right! (goes away to sign up) Merlin Moncure wrote: p.s. this probably belongs on win32 hackers list. Merlin ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] [PATCHES] ALTER TABLE ... SET TABLESPACE
I don't know if this provides any more info than you already have - but here are my last few lines from a single process backend run with valgrind:
==19666== Syscall param write(buf) contains uninitialised or unaddressable byte(s)
==19666==    at 0x404D94F8: __GI___libc_write (in /lib/libc-2.3.2.so)
==19666==    by 0x80934F8: XLogFlush (xlog.c:1414)
==19666==    by 0x8090723: RecordTransactionCommit (xact.c:550)
==19666==    by 0x8090BC0: CommitTransaction (xact.c:931)
==19666==    Address 0x4219236A is not stack'd, malloc'd or free'd
backend 1: oid (typeid = 26, len = 4, typmod = -1, byval = t)
        2: nspname (typeid = 19, len = 64, typmod = -1, byval = f)
        3: relname (typeid = 19, len = 64, typmod = -1, byval = f)
==19666==
==19666== Invalid write of size 4
==19666==    at 0x8109B00: DLMoveToFront (dllist.c:237)
==19666==    by 0x81B2EB5: SearchCatCache (catcache.c:1155)
==19666==    by 0x81B7D72: GetSysCacheOid (syscache.c:606)
==19666==    by 0x81B8C7A: get_relname_relid (lsyscache.c:879)
==19666==    Address 0xCC3D5C04 is not stack'd, malloc'd or free'd
Segmentation fault
Gavin Sherry wrote: On Sun, 20 Jun 2004, Tatsuo Ishii wrote: Attached is a patch implementing this functionality. I've modified make_new_heap() as well as swap_relfilenodes() to not assume that tablespaces remain the same from old to new heap. I thought it better to go down this road than introduce a lot of duplicate code. I have tried your patches and it works great. Thanks. One thing I noticed was if I change tablespace for a table having indexes, they are left in the old tablespace and the table itself was moved to the new tablespace. I regard this as a good thing since I could assign different tablespaces for table and indexes. It would be even better to assign different tablespaces for each index. Hm. It seems there's a problem with tablespaces. What I did was:
pgbench -i test
alter table accounts set tablespace mydb2;
\d accounts
backend crashes by signal 11... ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Improving postgresql.conf
This seems like a nice idea - it might even be worth targeting a couple of specific ranges - e.g: machines with 1GB RAM and 4GB RAM (medium and large come to mind - I know it's a bit like that other database product we know of, but that doesn't mean it's necessarily bad!) Mark Christopher Kings-Lynne wrote: I'm strongly in favour of distributing postgresql.conf-large as well, with much higher settings for SCSI, 1GB RAM machines, say. This is exactly as MySQL does it and I think it's a great idea. At least then an admin will notice the file there and say to themselves I wonder what I need to do to configure for a big server, i guess that implies that this default postgresql.conf won't perform very well... Chris ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
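To illustrate, a 7.4-era postgresql.conf-large for a machine with a few GB of RAM might carry settings along these lines - the parameter names are real GUCs of that era, but the values are guesses for a hypothetical box and would need benchmarking, not blind copying:

# postgresql.conf-large (illustrative sketch only)
shared_buffers = 10000           # ~80MB of shared buffers (8KB pages)
sort_mem = 8192                  # KB per sort/hash operation
effective_cache_size = 100000    # ~800MB of expected OS file cache
max_fsm_pages = 200000           # free space map sized for a busy db
checkpoint_segments = 16         # fewer, larger checkpoints
wal_buffers = 32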
Re: [HACKERS] Improving postgresql.conf
Oh dear, a lot of typos here, hopefully still decipherable... apologies. Mark Mark Kirkwood wrote: This seems like a nice idea - it might even be worth targeting a couple of specific ranges - e.g: machines with 1GB RAM and 4GB RAM (medium and large come to mind - I know it's a bit like that other database product we know of, but that doesn't mean it's necessarily bad!) ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] I just got it: PostgreSQL Application Server -- a
Joshua D. Drake wrote: The PostgreSQL Enhanced Server (How's that name? Maybe we call it Zerver and use PEZ?) idea is how to take the excellent core of PostgreSQL and productize it in much the same way distributions take the Linux kernel and make a GNU/Linux system. It would seem to me that this is more correct in the commercial space. Of course I am biased but what you are talking about sounds a whole lot like RedHat Enterprise versus Fedora etc And Postgresql Inc, Command Prompt, Slony etc... regards Mark ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] Frequently updated tables
[EMAIL PROTECTED] wrote: I have been talking about two types of problems which are both based on PostgreSQL's behavior with frequently updated tables. Summary table: In the single row table system, you have to vacuum very frequently, and this affects performance. Frequently updated tables: think about the session table for a website. Each new user gets a new session row. Every time they refresh or act in the site, the row is updated. When they leave or their session times out, the row is deleted. I wrote a RAM only session manager for PHP because PostgreSQL couldn't handle the volume. (2000 hits a second) It would be interesting to see if the vacuum delay patch, fsm tuning + vacuum scheduling could have changed this situation. Clearly there is an issue here (hence a patch...), but ISTM that just as significant is the fact that it is difficult to know how to configure the various bits and pieces, and also difficult to know if it has been done optimally. If you have an active site, with hundreds or thousands of hits a second, vacuuming the table constantly is not practical. I don't think anyone who has seriously looked at these issues has concluded that PostgreSQL works fine in these cases. The question is what, if anything, can be done? The frequent update issue really affects PostgreSQL's acceptance in web applications, and it is one area where MySQL seems to do a better job. As an aside, I have had similar issues with DB2 and high update tables - lock escalations (locklist tuning needed). It is not just non-overwriting storage managers that need the magic tuning wand :-) regards Mark ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Frequently updated tables
[EMAIL PROTECTED] wrote: The best phrasing would be the accumulating overhead of deletes and updates. Yes. Are you using 7.3? I am asking because in 7.3 high update / delete tables could suffer (index and toast) bloat that was untamable via (lazy) VACUUM and FSM. I believe this is fixed in 7.4, so it should be possible to achieve on-disk size control of tables / indexes by configuring FSM and (lazy) VACUUM, as sketched below. Did you find this not to be the case? regards Mark ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
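Concretely, the sort of 7.4-era recipe I have in mind for a hot, high-update table is something like this - illustrative values only, and 'sessions' is a made-up table name:

# postgresql.conf: size the free space map to cover the churn
max_fsm_relations = 1000
max_fsm_pages = 500000

-- then run a frequent lazy VACUUM of the hot table (e.g. from cron):
VACUUM ANALYZE sessions;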
Re: [HACKERS]
The warn 'em in several places approach seems like a very good one. regards Mark Matthew T. O'Connor wrote: There will always be people who won't read the notes, or ignore the notes, just as there will always be people doing all sorts of stupid things that we can't protect them from. There is only so much we can and should do to protect these types of people. I think we should just make sure we warn people in several places, so that anyone who does read the release notes will find it. ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] Call for 7.5 feature completion
We could perhaps do something similar to the Apache 1.3 win platform notes, where they (still) say *something* like: Apache on Windows is not as stable as on Unix... but is being actively improved all the time. This is a bit more positive than "it's dangerous!". As for people not reading the release notes - we could display the platform note (or an href to it) prominently on the download page (they may still not read that... but it has become a matter of choice at that point...). regards Mark David Garamond wrote: Robert Treat wrote: Given that the cygwin version is currently labeled as not ready for production I would say you are right. The truth is that many will never declare win32 good for production simply because of the OS it runs on, but we still want to make it as solid as possible. People _do_ use postgresql+cygwin in production environments though (see the pgsql-cygwin archive). And I suspect people _will_ use 7.5 for win32 in production, despite the release notes and the website clearly saying it's not production ready. Why? 1) The version number is 7.5, and many people will presume the ports are more or less equal in quality/maturity since they have the same version number; 2) People don't read release notes. See the various reviews of the recently released Fedora Core 2, complaining about how it doesn't support MP3 or DVD playback, despite the [legal] issues having been known and documented since Red Hat 8. Strangely enough, these people (who don't read release notes) _do_ write public reviews. They will badmouth PostgreSQL, saying it's unstable, crashes a lot, MySQL being much much more rock solid, etc etc. I suggest we label the win32 port as 7.5 ALPHA or 7.5 DANGEROUS :-) ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[HACKERS] CVS HEAD compile failure on Freebsd 4.9
Fresh checkout of CVS HEAD yesterday, updated today:
$ ./configure --prefix=/usr/local/pgsql/7.5
$ make
gmake[4]: Entering directory `/usr/home/postgres/develop/c/pgsql/src/backend/access/nbtree'
gcc -O2 -fno-strict-aliasing -Wall -Wmissing-prototypes -Wmissing-declarations -I../../../../src/include -c -o nbtree.o nbtree.c
nbtree.c: In function `btbulkdelete':
nbtree.c:600: storage size of `_delay' isn't known
nbtree.c:600: warning: implicit declaration of function `select'
nbtree.c:600: warning: unused variable `_delay'
nbtree.c:602: storage size of `_delay' isn't known
nbtree.c:602: warning: unused variable `_delay'
gmake[4]: *** [nbtree.o] Error 1
gmake[4]: Leaving directory `/usr/home/postgres/develop/c/pgsql/src/backend/access/nbtree'
gmake[3]: *** [nbtree-recursive] Error 2
gmake[3]: Leaving directory `/usr/home/postgres/develop/c/pgsql/src/backend/access'
gmake[2]: *** [access-recursive] Error 2
gmake[2]: Leaving directory `/usr/home/postgres/develop/c/pgsql/src/backend'
gmake[1]: *** [all] Error 2
gmake[1]: Leaving directory `/usr/home/postgres/develop/c/pgsql/src'
gmake: *** [all] Error 2
*** Error code 2
$ uname -a
FreeBSD spiney 4.9-RELEASE FreeBSD 4.9-RELEASE #3
kernel customizations: i686 (i.e. 386-586 de-selected for kernel build)
---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [HACKERS] [GENERAL] TPC H data
What scale factor TPC-H dataset are you importing? Additionally, it might be worth giving the specs of the machine you are doing this on. (I seem to recall trying this with Pg 7.2 a while ago without this issue; mind you, I think I had ~1GB of RAM and used the scale factor 1 dataset, i.e. 1GB.) regards Mark Shalu Gupta wrote: Hello, We are trying to import the TPC-H data into postgresql using the COPY command and for the larger files we get an error due to insufficient memory space. We are using a Linux system with Postgresql-7.3.4. Is it that Postgresql cannot handle such large files or is there some other possible reason? Thanks Shalu Gupta NC State University. ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] PostgreSQL configuration
Joe Conway wrote: Tom Lane wrote: Personally I rely quite a lot on setting PGDATA to keep straight which installation I'm currently working with, so I'm not going to be happy with a redesign that eliminates that variable without providing an adequate substitute :-( I'll second that. I'll third (or whatever) it too :-) ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] PostgreSQL configuration
Bruce Momjian wrote: Let me tell you the compromise I thought of. First, we put the config files (postgresql.conf, pg_hba.conf, pg_ident.conf) in data/etc by default. Sorry Bruce, I was being slow :-) - I was thinking you were going to associate the config files with the binary distribution; I now realize that you were looking at pushing them down into $PGDATA/etc, which is quite nice and tidy. best wishes Mark ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] PostgreSQL configuration
[EMAIL PROTECTED] wrote: ii) I think the -C switch *WITH* the -D switch has viable usability. Consider this: you are testing two different database layouts and/or RAID controllers. You could easily bounce back and forth from *identical* configurations like this: Convenient indeed, but I would like to see the association of .conf file -> data dir remain reasonably solid. It's all about the foot gun. iii) I don't like the PID file at all. Not one bit, but I had a few people ask for it in the patch, it works as advertised and expected. It isn't my place to say how someone should use something. One of my customers wanted it, so I provided them with it. That is the beauty of open source. I think that there is a difference between a special patch suitable for a particular customer and a general release, and maybe this addition falls right in there. best wishes Mark ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] PostgreSQL configuration
Tom Lane wrote: I think if you spelled the subdir name config rather than etc, it would be more obvious what's what. How about 'conf'? (Familiar to anyone who has used Apache or Tomcat.) regards Mark ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] PostgreSQL configuration
Bruce Momjian wrote: Mark Kirkwood wrote: Bruce Momjian wrote: My idea was to put config files in /usr/local/pgsql/data/etc, not pgsql/etc. We don't put Unix configuration files in /, we put them in /etc. Sorry, I missed the 'data' pathname. However - I may be a bit slow - but I do not see how this will handle the situation where you have one installation of pgsql running several clusters. (I am not sure how common this situation is, mind you.) It is common. Moving things to data/etc will make things clearer, and see my later email on an initdb option to put /data/etc/ somewhere else and put a symlink for /data/etc. Hmmm, the current setup handles this situation sensibly and without the need for symlinks. So this does not look like an improvement to me... This *could* work without symlinks if you introduce a name for each initialized cluster, and make this part of the config file name. This would mean that you could use 'data/etc' and have many config files therein, each of which would *unambiguously* point to a given cluster. As a general point I share Tom's concern about breaking the association between the initialized cluster and its configuration file - e.g: I start prod with the configuration for test by mistake, and test has fsync=false... and something pulls the power... regards Mark ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] PostgreSQL configuration
It seems to me that the existing situation is actually correct: the configuration is a property of the initialized database cluster, so a logical place for it is in the root of said cluster. It is *not* a property of the installed binary distribution (e.g. /usr/local/pgsql/etc) - as you may have *several* database clusters created using *this* binary distribution, each of which requires a different configuration. Having said that, I am ok with the 'include' idea. regards Mark ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] PostgreSQL configuration
[EMAIL PROTECTED] wrote: IMHO my patch can do this in a self documenting way, thus making it easier to do, i.e. postmaster -C /etc/postgres/fundb.conf postmaster -C /etc/postgres/testdb.conf I think that is far more intuitive than: postmaster -D /some/path/who/knows/where/fundb postmaster -D /another/path/i/don/t/know/testdb To be honest - to me, both of these look about the same on the intuitiveness front :-) I do not like lots of command line arguments, so I usually use:
export PGDATA=/var/pgdata/version
pg_ctl start
I realize that I cannot objectively argue that this is intuitively better... it is just what I prefer. It is frustrating. I think this is important, as I would not have written and maintained it otherwise, but by being a somewhat subjective feature I can't make any iron clad arguments for it. I can only say it makes administration easier for those who would like PostgreSQL administered this way. If the prevailing view is we don't think so, then it doesn't get put in, but it doesn't make my arguments any less valid. I completely agree. We are discussing what we would prefer - which is a valid thing to do. Clearly if most people prefer most of what is in your patch, then it would be silly to ignore it! So anyway, here is my vote on it: i) the include - I like it ii) the -C switch - could be persuaded (provided some safety is there - like mutually exclusive with -D or PGDATA) iii) the pid file - don't like it regards Mark ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] PostgreSQL configuration
Bruce Momjian wrote: My idea was to put config files in /usr/local/pgsql/data/etc, not pgsql/etc. We don't put Unix configuration files in /, we put them in /etc. Sorry, I missed the 'data' pathname. However - I may be a bit slow - but I do not see how this will handle the situation where you have one installation of pgsql running several clusters. (I am not sure how common this situation is, mind you.) regards Mark ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] CVS HEAD compile failure on Freebsd 4.9
Just updated now - it compiles. Fast fix :-) Tom Lane wrote: I plan to fix that later tonight if no one beats me to it. regards, tom lane ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [HACKERS] FYI , Intel CC and PostgreSQL , benchmark by pgsql
It would be interesting to see the results if you do this test again with:
- scaling factor somewhere in the 100-300 range (so the generated db ~= your RAM amount)
- number of clients somewhere in 10-100
- number of transactions somewhere in 1000-10000
best wishes Mark
P.S. - Are you using the GENERIC kernel or have you built your own (e.g. with SMP support for your hyperthreaded CPU)?
jihuang wrote: Hi, I have a new server and some time to do an interesting simple benchmark. Compile PostgreSQL 7.4.1R with gcc 3.2 and Intel CC 8.0, and use pgbench to evaluate any difference. Here is the result.
--
CPU: Intel(R) Xeon(TM) CPU 3.06GHz (3052.79-MHz 686-class CPU)
Origin = GenuineIntel Id = 0xf29 Stepping = 9
Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
Hyperthreading: 2 logical CPUs
real memory = 3221200896 (3071 MB)
avail memory = 3130855424 (2985 MB)
FreeBSD 5.1-RELEASE-p11
/usr/local/intel_cc_80/bin/icc -V
Intel(R) C++ Compiler for 32-bit applications, Version 8.0 Build 20031211Z
Package ID: l_cc_p_8.0.055_pe057
Copyright (C) 1985-2003 Intel Corporation. All rights reserved.
FOR NON-COMMERCIAL USE ONLY
gcc -v
Using built-in specs.
Configured with: FreeBSD/i386 system compiler
Thread model: posix
gcc version 3.2.2 [FreeBSD] 20030205 (release)
Application: PostgreSQL 7.4.1
Benchmark: pgbench
Result:
1. IntelCC (use ports/database/postgresql7, default)
./pgbench -U pgsql -c 30 test
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 1
number of clients: 30
number of transactions per client: 10
number of transactions actually processed: 300/300
tps = 34.975026 (including connections establishing)
tps = 35.550815 (excluding connections establishing)
2. GNU cc (use ports/database/postgresql7, default)
./pgbench -U pgsql -c 30 test
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 1
number of clients: 30
number of transactions per client: 10
number of transactions actually processed: 300/300
tps = 38.968321 (including connections establishing)
tps = 39.707451 (excluding connections establishing)
---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
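Concretely, the runs suggested at the top would look something like this (flag spellings as per the contrib pgbench of that era; -i -s 100 builds a database of very roughly 1.5GB):

./pgbench -i -s 100 test
./pgbench -c 50 -t 1000 test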
Re: [HACKERS] psql \d option list overloaded
Couldn't agree more - syntax like SHOW TABLES; is intuitive and somehow right - [chuckles] - MySQL does not have *everything* wrong! regards Mark Bruce Momjian wrote: I like the idea of adding a new syntax to show that information using simple SQL command syntax, and putting it in the backend so all applications can access it. I know we have information schema, and maybe that can be used to make this simpler. ---(end of broadcast)--- TIP 8: explain analyze is your friend
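For illustration, such a command could presumably just expand to a catalog query in the backend - an untested sketch of the sort of thing that would list user tables, schema-aware:

SELECT n.nspname AS schema, c.relname AS name
FROM pg_catalog.pg_class c
     JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY 1, 2;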
Re: [HACKERS] *sigh*
Could certainly do that - a scalar function that returns reltuples from pg_class. I was hoping to do 2 additional things: i) provide better accuracy than the last ANALYZE ii) make it behave like an aggregate. So I wanted to be able to use estimated_count as you would use count, i.e: SELECT estimated_count(*) FROM rel returns 1 row, whereas the scalar function: SELECT estimated_count('rel') FROM rel returns the result as many times as there are rows in rel - of course you would just use SELECT estimated_count('rel') without the FROM, but hopefully you see what I mean! BTW, the scalar function is pretty simple to achieve - here is a basic example that ignores schema qualifiers:
CREATE FUNCTION estimated_count(text) RETURNS real AS '
    SELECT reltuples FROM pg_class WHERE relname = $1;
' LANGUAGE SQL;
cheers Mark Simon Riggs wrote: Why not implement estimated_count as a dictionary lookup, directly using the value recorded there by the analyze? That would be the easiest way to reuse existing code and give you access to many previously calculated values. ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
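If schema qualification matters, a variant taking regclass should do the trick - an untested sketch, relying on regclass resolving the (possibly qualified) name to the pg_class row:

CREATE FUNCTION estimated_count(regclass) RETURNS real AS '
    SELECT reltuples FROM pg_class WHERE oid = $1;
' LANGUAGE SQL;

SELECT estimated_count('public.accounts');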
Re: [HACKERS] *sigh*
*growl* - it sounds like the business... and I was all set to code it; however, after delving into Pg's aggregation structure a bit, it suffers a fatal flaw: there appears to be no way to avoid visiting every row when defining an aggregate (even if you do nothing on each one) -- which defeats the whole point of my suggestion (i.e. avoiding the visit to every row). To make the original idea work requires amending the definition of Pg aggregates to introduce fake aggregates that don't actually get evaluated for every row. At this point I am not sure if this sort of modification is possible or reasonable - others who know feel free to chip in :-) regards Mark Randolf Richardson wrote: [EMAIL PROTECTED] (Mark Kirkwood) wrote in comp.databases.postgresql.hackers: [sNip] How about: Implement a function estimated_count that can be used instead of count. It could use something like the algorithm in src/backend/commands/analyze.c to get a reasonably accurate pseudo count quickly. The advantage of this approach is that count still means (exact)count (for your xact snapshot anyway). Then the situation becomes: Want a fast count? - use estimated_count(*) Want an exact count - use count(*) I think this is an excellent solution. ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [HACKERS] *sigh*
Shridhar Daithankar wrote: Something like select reltuples from pg_class where relname='foo'? Shridhar [chuckles] - I had envisaged something more accurate than the last ANALYZE; estimated_count would effectively *do* acquire_sample_rows() then and there for you... regards Mark ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] *sigh*
How about: Implement a function estimated_count that can be used instead of count. It could use something like the algorithm in src/backend/commands/analyze.c to get a reasonably accurate pseudo count quickly. The advantage of this approach is that count still means (exact)count (for your xact snapshot anyway). Then the situation becomes: Want a fast count? - use estimated_count(*) Want an exact count - use count(*) regards Mark Christopher Browne wrote: For a small table, it will be cheaper to walk through and calculate count(*) directly from the tuples themselves. The situation where it may be worthwhile to do this is a table which is rather large (thus count(*) is expensive) where there is some special reason to truly care how many rows there are in the table. For _most_ tables, it seems unlikely that this will be true. For _most_ tables, it is absolutely not worth the cost of tracking the information. ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Oracle/PostgreSQL incompatibilities
Finger or brain trouble - here is the correction: NUMBER -> INTEGER when transporting schemas from Oracle to Pg. (This needs a little care - as NUMBER in Oracle has bigger *precision* than INTEGER in Pg.) Thinking about this a bit more, it's probably fine to just substitute NUMERIC for NUMBER, but obviously INTEGER is going to perform better if it can be used. regards Mark ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
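So the mechanical translation ends up looking like this (a made-up table, purely to show the substitutions):

-- Oracle original:
--   CREATE TABLE orders (id NUMBER, qty NUMBER(5), price NUMBER(12,2));
-- Safe Pg translation:
CREATE TABLE orders (id NUMERIC, qty NUMERIC(5), price NUMERIC(12,2));
-- Faster, if the values are known to fit a 4-byte integer:
-- CREATE TABLE orders (id INTEGER, qty INTEGER, price NUMERIC(12,2));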
Re: [HACKERS] Oracle/PostgreSQL incompatibilities
I think he means that you can do this in Oracle: CREATE TABLE test (id NUMBER); Oracle treats NUMBER as NUMBER(40) I think. This seems to be an example of Oracle making up standards as they go along - do we want to copy this sort of thing? I usually just run a substitution of NUMBER(..) -> NUMERIC(..) and NUMBER -> INTEGER when transporting schemas from Oracle to Pg. (This needs a little care - as NUMBER in Oracle has bigger scale than INTEGER in Pg.) regards Mark + PostgreSQL does not support the NUMBER keyword without (...) i.e. something in parenthesis following it. Don't follow this one either. We don't have NUMBER --- are you speaking of NUMERIC? If so, I'm not aware of any context where you're required to put a precision on NUMERIC. Again, may we see an example? Ditto. Sincerely, Joshua D. Drake regards, tom lane ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
[HACKERS] dump cache summary
As part of attempting to gain an understanding of how Postgres works, I wanted to see if I could display a summary of what relations were using pages in the cache. Having done that, I was all set to trash the code when I wondered if it might be useful in its own right... Here is a sample of the output after a pgbench run with shared_buffers=500:
bench=# dumpcache;
CACHE DUMP
pg_type_oid_index                 2
pg_proc                           1
history                           3
pg_class_relname_nsp_index        5
pg_statistic_relid_att_index      2
tellers_pkey                      9
invalid oid                       4
accounts                        205
pg_am                             1
pg_statistic                      1
branches                          2
accounts_pkey                   215
pg_amproc_opc_procnum_index       2
branches_pkey                     9
pg_operator_oid_index             3
tellers                           4
pg_class_oid_index                2
pg_type                           2
pg_index_indexrelid_index         2
pg_trigger_tgrelid_tgname_index   2
pg_amop_opr_opc_index             2
pg_index_indrelid_index           2
pg_class                          2
pg_trigger                        1
pg_proc_oid_index                 1
pg_amop_opc_strategy_index        2
pg_attribute_relid_attnum_index   4
pg_amop                           1
pg_amproc                         1
pg_index                          2
pg_operator                       3
pg_attribute                      3
(32 rows)
bench=#
Does this seem like a useful thing to be able to display? I implemented this by adding a command (as I wanted to know how this was done), but I suspect it would make more sense to use a function a bit like the pg_stat* collection. I have included (most of) the code I used, so that interested parties can show me what I have done wrong :-) (Some things I wondered about were: should I be locking buffers before peeking at the corresponding descriptors? Does it make sense to call RelationIdGetRelation on a relNode?) regards Mark
dumpcache.tar.gz Description: application/macbinary
---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [HACKERS] Is it a memory leak in PostgreSQL 7.4beta?
Hans, You are right about the startup memory - here is the top line for a few seconds after startup:
PID   USER     PR NI VIRT RES  SHR  S %CPU %MEM TIME+   COMMAND
10116 postgres 15  0 3816 3816 3180 R 33.8  1.0 0:01.03 postmaster
It seems that VIRT, RES, SHR all get the increase counted against them as time goes on (as Tom suggested, I guess it's to do with how top does its accounting on this platform). Hans-Jürgen Schönig wrote: I can hardly imagine that the backend started working with 9mb of memory. what did you do that PostgreSQL needed so much memory from the beginning??? are you using the default settings? usually the postmaster does not need more than 3mb at startup (in this scenario). ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Is it a memory leak in PostgreSQL 7.4beta?
Hans-Jürgen Schönig wrote: I can hardly imagine that the backend started working with 9mb of memory. what did you do that PostgreSQL needed so much memory from the beginning??? are you using the default settings? usually the postmaster does not need more than 3mb at startup (in this scenario). Setup is completely default - i.e. run initdb, and start the server after that. I am running an embedded SQL program to do the test, rather than an SQL script (see enclosed); not sure why/if that would make any difference. On the cautionary side, note that I am using a beta Linux distribution too. regards Mark leak.sqc.gz Description: application/macbinary ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [HACKERS] Is it a memory leak in PostgreSQL 7.4beta?
Stephan Szabo wrote: I rebuilt without debug, and ran just the start/insert/commit sequence over and over, and noticed that on my machine it looked to grow as above, but that if I let it go long enough it seemed to basically stop (or at least the growth was slow enough to go without notice, as compared to the easily noticeable growth before). I'm running the full sequence now, but it's going to be a while before it stops or gets up to the place where it stopped in the s/i/c sequence. This is the Pg backend line from top after about 90 minutes runtime:
PID   USER     PR NI VIRT RES  SHR  S %CPU %MEM TIME+    COMMAND
16083 postgres 17  0 9172 9172 8524 R 94.7  2.4 84:59.68 postmaster
No sign of the shared growth stopping at this stage... Pg built with --disable-debug --without-zlib. Platform is Linux 2.4.21 + xfs (Mandrake 9.2beta). regards Mark ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [HACKERS] Anyone working on better transaction locking?
Greg Stark wrote: Shridhar Daithankar [EMAIL PROTECTED] writes: But a database is not a webserver. It is not supposed to handle tons of concurrent requests. That is a fundamental difference. And in one fell swoop you've dismissed the entire OLTP database industry. Have you ever called a travel agent and had him or her look up a fare in the airline database within seconds? Ever placed an order over the telephone? Ever used a busy database-backed web site? That situation is usually handled by means of a TP monitor that keeps open database connections (e.g. CICS + DB2). I think there is some confusion between many concurrent connections + short transactions and many connect / disconnect + short transactions in some of this discussion. OLTP systems typically fall into the first case - perhaps because their db products do not have fast connect / disconnect :-). Postgresql plus some suitable middleware (e.g. PHP) will handle this configuration *with* its current transaction model. I think you are actually talking about the connect / disconnect speed rather than the *transaction* model per se. best wishes Mark ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Oracle rant
Fred Zellinger wrote: I also am a Linux Nut and use Postgres whenever possible because I like the freedom of access to the HACKERS mailing list... something only a few highly renowned DBA snobs have with Oracle. Indeed, I think this is a significant component of the appeal of open source. I have been impressed, however, with the degree to which Oracle (synonymous with Ellison) has attempted to become Open. Oracle is getting into Linux almost as heavily as IBM, mostly prompted by their common rivalry with M$ and SQLServer. I wonder if the conversion to openness may be more a mechanism to distinguish themselves from Microsoft than a heart-felt belief in the principles themselves... but it's nice anyway! regards Mark ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] Anyone have a fresh Solaris 8 SPARC system to create
Justin Clift wrote: Hi guys, Have created a Solaris 8 Intel package for PostgreSQL 7.3.1, but don't have any SPARC boxes here any more. Does anyone have a SPARC box handy that would be available for compiling PostgreSQL 7.3.1 on? It would need to be Solaris 8 (or maybe 9), and have things like gcc 2.95.x and similar tools installed, as well as be patched with the latest recommended Solaris patches. Might be a huge ask, but am figuring it to be worth at least trying. :-) Regards and best wishes, Justin Clift I can get access to several boxes with Solaris 8 + gcc 2.95 (maybe not right-up-to-the-minute latest patches, but fairly recently patched). They are firewalled off from the internet with absolutely no chance of external access, but I can build whatever is required (Pg 7.3.1 is already installed from source) and upload it to techdocs.postgresql.org (or similar). ...I've never tried to create a Solaris package, so I will need answers to all the usual dumb questions - including what extra configure options are required, as I've been building with *none* :-) regards Mark ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] Anyone have a fresh Solaris 8 SPARC system to create
Lamar Owen wrote: On Wednesday 15 January 2003 09:20, Justin Clift wrote: Sound like a plan? Will also need someone else with a Solaris 8 SPARC system to try the packages out too, just in case there are weird library dependencies happening that might catch us out. I have access to several (two dozen) currently unused Ultra30 systems. I can install Sol8 on one and Sol9 on another and provide ssh access (once I figure out how to get ssh working on Solaris) to you, once I know your static IP address or subnet range. It may be a few days to a week before I can do the actual installation, however. I don't have Solaris 9... so Lamar's plan sounds like the way to go. However, I can still build for '8 if Lamar and/or yourself get snowed under. cheers Mark ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Oracle rant
snippage The Oracle system stops from time to time because of various arcane reasons. You get the error message, look it up on alltheweb.com, and fix it. The whole system is bogus. It DEMANDS a full time DBA. PostgreSQL does not. I could be accused of being cynical here (gosh)... but I think that's the whole idea (hook 'em with the product and leverage consulting or expert DBA services). One could be excused for thinking that it's all about money. <extra rant> Once upon a time I did the Oracle 7.3 certification thing; however I subsequently felt that I really don't *need* to buy into this DBA Guild mentality that the whole business seemed to be about (i.e. arcane little need-to-know things that trap all but the initiated... and of course certification is all about *being* the initiated... oh... and... maybe the exam fees help perpetuate this thing too...). </extra rant> Thanks to you guys for providing the opportunity to share this ;-) Mark ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] Oracle rant
Adrian 'Dagurashibanipal' von Bidder wrote: I'm just saying that there are *some* small arcane details in postgres, too (although, at least, they don't affect stability, just performance). Indeed you are right... Pg has its own collection of arcane details too, but hopefully the culture of PostgreSQL (in common with all open source projects) is to expose and educate rather than confine knowledge to a group of the initiated. Does that sound better? (i.e. not so rabid Oracle bashing) Cheers Mark ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
[HACKERS] Multiple Key Clustering In Db2 8.1 - Interesting FYI
Dear hackers, I have recently been playing with DB2 8.1 Beta. It has introduced a feature to enable index clustering on more than one key. This reminded me of a previous thread on HACKERS about index access ANDing/bitmaps in Firebird. So anyway, here is a little snip from the 8.1 manual as an FYI. -- snip As the name implies, MDC tables cluster data on more than one dimension. Each dimension is determined by a column or set of columns that you specify in the ORGANIZE BY DIMENSIONS clause of the CREATE TABLE statement. When you create an MDC table, the following two kinds of indexes are created automatically:
* A dimension-block index, which contains pointers to each occupied block for a single dimension.
* A composite block index, which contains all dimension key columns. The composite block index is used to maintain clustering during insert and update activity.
The optimizer considers dimension-block index scan plans when it determines the most efficient access plan for a particular query. When queries have predicates on dimension values, the optimizer can use the dimension block index to identify, and fetch from, only extents that contain these values. In addition, because extents are physically contiguous pages on disk, this results in more efficient performance and minimizes I/O. -- snipped regards Mark ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
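The DDL looks something like the following - my own made-up example rather than one lifted from the manual, so treat the details as approximate:

CREATE TABLE sales (
    cust_id  INTEGER,
    region   CHAR(2),
    month    INTEGER,
    amount   DECIMAL(10,2)
) ORGANIZE BY DIMENSIONS (region, month);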
Re: [HACKERS] (Fwd) Re: Any Oracle 9 users? A test please...
Tom Lane wrote: Has anyone done the corresponding experiments on the other DBMSes to identify exactly when they allow CURRENT_TIMESTAMP to advance? I have Db2 on hand and examined CURRENT TIMESTAMP in an SQL procedure. (IBM have implemented it without the _.) The short of it is that CURRENT TIMESTAMP is not frozen at transaction start, but reflects time movement within the transaction. Note that db2 +c is equivalent to issuing BEGIN in Pg, and the command line tool (db2) keeps (the same) connection open until TERMINATE is issued:
$ cat stamp.sql
create procedure stamp() language sql
begin
insert into test values(1,current timestamp);
insert into test values(2,current timestamp);
insert into test values(3,current timestamp);
insert into test values(4,current timestamp);
insert into test values(5,current timestamp);
insert into test values(6,current timestamp);
insert into test values(7,current timestamp);
insert into test values(8,current timestamp);
insert into test values(9,current timestamp);
end @
$ db2 connect to dss
Database Connection Information
Database server = DB2/LINUX 7.2.3
SQL authorization ID = DB2
Local database alias = DSS
$ db2 -td@ -f stamp.sql
DB2I The SQL command completed successfully.
$ db2 +c
db2 => call stamp();
STAMP RETURN_STATUS: 0
db2 => commit;
DB2I The SQL command completed successfully.
db2 => select * from test;
ID  VAL
--- --------------------------
1   2002-10-03-19.35.16.286019
2   2002-10-03-19.35.16.286903
3   2002-10-03-19.35.16.287549
4   2002-10-03-19.35.16.288235
5   2002-10-03-19.35.16.288925
6   2002-10-03-19.35.16.289571
7   2002-10-03-19.35.16.290209
8   2002-10-03-19.35.16.290884
9   2002-10-03-19.35.16.291522
9 record(s) selected.
db2 => terminate;
regards Mark ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
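For comparison, the equivalent probe on the Pg side (assuming a similar two-column test table) - in Pg, CURRENT_TIMESTAMP is frozen at transaction start, while timeofday() keeps advancing:

BEGIN;
INSERT INTO test VALUES (1, CURRENT_TIMESTAMP);
INSERT INTO test VALUES (2, CURRENT_TIMESTAMP);  -- same value as row 1
SELECT timeofday();  -- this, by contrast, moves within the transaction
COMMIT;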
Re: [HACKERS] Improving speed of copy
Have you tried this with Oracle or a similar commercial database? I have timed COPY/LOAD times for Postgresql/Mysql/Oracle/Db2 - the rough comparison is:
- Db2 and Mysql fastest (Db2 slightly faster)
- Oracle approx twice as slow as Db2
- Postgresql about 3.5-4 times slower than Db2
However Postgresql can sometimes create indexes faster than Mysql, so the total time of COPY + CREATE INDEX can be smaller for Postgresql than for Mysql. Oracle and Db2 seemed similar to Postgresql with respect to CREATE INDEX. regards Mark ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] Script to compute random page cost
Curt Sampson wrote: On Wed, 11 Sep 2002, Mark Kirkwood wrote: Hm, it appears we've both been working on something similar. However, I've just released version 0.2 of randread, which has the following features: Funny how often that happens... (I think it's often worth the effort to write your own benchmarking / measurement tool in order to gain a good understanding of what you intend to measure.) Anyway, feel free to download and play. If you want to work on the program, I'm happy to give developer access on sourceforge. http://sourceforge.net/project/showfiles.php?group_id=55994 I'll take a look. best wishes Mark ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Script to compute random page cost
I was attempting to measure random page cost a while ago - I used three programs in this archive: http://techdocs.postgresql.org/markir/download/benchtool/ It writes a single big file and seems to give more realistic measurements (like 6 for a Solaris SCSI system and 10 for a Linux IDE one...). Have a look and see if you can cannibalize it for your program. Cheers Mark ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] Script to compute random page cost
Tom Lane wrote: Perhaps it's time to remind people that what we want to measure is the performance seen by a C program issuing write() and read() commands, transferring 8K at a time, on a regular Unix filesystem Yes... and at the risk of being accused of marketing ;-), that is exactly what the 3 programs in my archive do (see previous post for url):
- one called 'write' creates a suitably sized data file (8k at a time - configurable), using the write() call
- another called 'read' does sequential reads (8k at a time - configurable), using the read() call
- finally one called 'seek' does random reads (8k chunks - configurable), using the lseek() and read() calls
I tried to use code as similar as possible to how Postgres does its I/O, so the results *should* be meaningful! Large file support is enabled too (as you need to use a file several times bigger than your RAM - and everyone seems to have 1G of it these days...). I think the code is reasonably readable too. It's been *tested* on Linux, FreeBSD, Solaris, MacOS X. The only downer is that they don't automatically compute random_page_cost for you... (I was more interested in the raw sequential read, write and random read rates at the time). However it would be a fairly simple modification to combine all 3 programs into one executable that outputs random_page_cost... regards Mark ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org