Re: [HACKERS] How can I use large object on PostgreSQL Linux
Thank you, Michael Fuhr. I have now installed the contrib/lo and contrib/dbsize modules from source as you suggested. Of the three errors I originally reported (/lo, /dbsize and /admin), only /admin remains unresolved; I cannot find it anywhere. If anybody knows more about /admin (in Windows it is admin.dll), please tell me. Michael Fuhr [EMAIL PROTECTED] 08-Feb-05 11:07:10 AM On Mon, Feb 07, 2005 at 02:33:24PM +0700, Premsun Choltanwanich wrote: I'd guess that you haven't installed some third-party modules that you need on the Linux box, or that you've installed them in the wrong place. I'm not sure about the third-party modules, because all of the modules I use with the Windows version were installed at the time I installed PostgreSQL RC1 on Windows. If you built PostgreSQL from source then you can install the contrib/lo and contrib/dbsize modules from source as well. For more information see contrib/README, contrib/lo/README.lo, and contrib/dbsize/README.dbsize. -- Michael Fuhr http://www.fuhr.org/~mfuhr/
Re: [HACKERS] pg_dump bug in 7.3.9 with sequences
Not that I can see. You still have to get the sequence object from someplace, and having it be a distinct datatype is zero help for finding a column-specific sequence. What about allowing sequence qualifiers in the SERIAL definition? Chris ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
[HACKERS] AT TIME ZONE
Hi, Just a quick check that the extension to AT TIME ZONE to allow specifying intervals as well as country/city is on the list for 8.1. I believe it was a fairly simple thing to do now that we have our own time zone library... Chris
Re: [HACKERS] Cross column statistics
B) gather a full matrix of the level of correlation between each column and each other column. If this were a single floating point number per pair then it might be feasible. It would still obviously be n^2 in the number of columns though, so there would have to be some way to limit on how many columns would be analyzed this way. Use foreign keys to just record those cross-correlations. Chris
Re: [HACKERS] AT TIME ZONE
Hi, Just a quick check that the extension to AT TIME ZONE to allow specifying intervals as well as country/city is on the list for 8.1. I believe it was a fairly simple thing to do now that we have our own time zone library... Yeah, this is on my personal hope to do for 8.1 list. At least the country/city part, haven't really thought about the other one. //Magnus
Re: [HACKERS] AT TIME ZONE
Yeah, this is on my personal hope to do for 8.1 list. At least the country/city part, haven't really thought about the other one. One of the two forms already works...can't quite remember which...
Re: [HACKERS] Is there a way to make VACUUM run completely outside
On a fine day (Monday, 7 February 2005, 19:01 -0300), Alvaro Herrera wrote: Also, why must it be run outside of a transaction block if it can be rolled back? A vacuum actually uses several transactions, so it wouldn't work as the user would expect if run in a transaction. The first one is committed rather early and new ones are opened and closed. (One per table, IIRC.) So I guess that making it commit and open a new transaction at a regular interval (like each minute) while vacuuming a single table would not alter its visible behaviour. That would solve my problem of long-running vacuums on large tables polluting unrelated small but heavily updated tables with dead tuples. I'll take a peek at the code and try to come up with a naive proposal you can shoot down ;) -- Hannu Krosing [EMAIL PROTECTED]
Re: [HACKERS] AT TIME ZONE
On Feb 8, 2005, at 20:43, Christopher Kings-Lynne wrote: Yeah, this is on my personal hope to do for 8.1 list. At least the country/city part, haven't really thought about the other one. One of the two forms already works...can't quite remember which... I think this is perhaps what you were trying to remember: http://archives.postgresql.org/pgsql-hackers/2004-10/msg00870.php Michael Glaesemann grzm myrealbox com
Re: [HACKERS] Query optimizer 8.0.1 (and 8.0)
[EMAIL PROTECTED] wrote: In this case, the behavior observed could be changed by altering the sample size for a table. I submit that an arbitrary fixed sample size is not a good base for the analyzer, but that the sample size should be based on the size of the table or some calculation of its deviation. Mark, Do you have any evidence that the sample size had anything to do with the performance problem you're seeing? I have evidence; if you look through some of the messages in this thread, you'll see how a sample size of 1 provides enough data points to create stats the planner can use. I also do a lot with the complete Census/TIGER database. Cool, have any code for Mapserver? Every problem I have with the optimizer comes down to the fact that the data is loaded (and ordered on disk) by State/County FIPS codes, and then queried by zip code or by city name. Like this: Alabama 36101 [hundreds of pages with zip's in 36***] Alaska 99686 [hundreds of pages with zip's in 9] Arizona 85701 [hundreds of pages with zip's in 855**] Note that the zip codes are *NOT* sequential. The correlation statistic sees that the zip codes are not sequential, so it makes the *HORRIBLE* assumption that they are scattered randomly across the disk. It is my theory that this is because there are too few data points with which to properly characterize the nature of the data. In reality, even though there's no total ordering of the zip codes, any given zip code only exists on a couple of disk pages, so index scans would be the right choice. I totally agree. But the single correlation parameter is not sufficient to let the optimizer know this. No matter how large a sample size you choose, ANALYZE will correctly see that zip codes and State FIPS codes are non-correlated, and the optimizer will overestimate the # of pages an index scan will need. I tried to create an analogy in another post, and TIGER is a perfect example of the analogy.
Think of the difference between an oscilloscope and a spectrum analyzer. The current sampling code works more like an oscilloscope. It assumes a fairly normalized distribution of data. Given this, it works perfectly fine. When a scope is presented with an audio signal, it looks more like gibberish showing almost no correlation. When you view it in the frequency domain, as with a spectrum analyzer, you can see clear patterns in the signal. Now, fortunately, we don't need any sort of absolute visualization of the data in TIGER, we only need to see that the data has many subtle trends rather than one fairly evenly distributed one. That's why more samples works. If we could do anything, I would add more statistics to the database. A standard deviation and maybe a sliding window deviation. A standard deviation might be pretty high, whereas a sliding window would show less localized deviation. Less localized deviation would favor index scans. Anyway, like I said, I think the expectation that the data is fairly normalized or evenly distributed works very well for data acquired over time. It is data like TIGER, which is in a multiple-field order, i.e. state, zipr, zipl, that has complex patterns for the secondary sorts that can't be detected with too small a sample. PS: I pointed out workarounds in my earlier posting in this thread. Yes, I'm using the same TIGER data you are. Cool.
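The clumping effect described here can be sketched numerically (a toy model with made-up data, not the actual TIGER layout): when rows are ordered by state and then zip, the zip column's global position-vs-value correlation comes out near zero, even though each zip value sits on just a few adjacent "pages":

```python
# Simulate a table sorted by (state, zip): zips restart within each state,
# so globally the zip column looks unordered even though nearby rows hold
# nearby zip values (clumping).
import random

random.seed(0)
rows = []
for state in range(50):
    zips = sorted(random.sample(range(10000, 99999), 200))
    rows.extend(zips)

# Pearson correlation between physical position and zip value --
# roughly what ANALYZE's "correlation" statistic measures.
n = len(rows)
pos = range(n)
mx, my = (n - 1) / 2, sum(rows) / n
cov = sum((p - mx) * (v - my) for p, v in zip(pos, rows))
var_x = sum((p - mx) ** 2 for p in pos)
var_y = sum((v - my) ** 2 for v in rows)
corr = cov / (var_x * var_y) ** 0.5

print(round(corr, 3))  # near 0: the planner would assume random placement
```

Each 200-row state segment is perfectly sorted internally, yet the global statistic reports almost no ordering, which is exactly the *HORRIBLE* assumption described above.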
Re: [HACKERS] Query optimizer 8.0.1 (and 8.0)
* Mark Kirkwood ([EMAIL PROTECTED]) wrote: I can see your point, however I wonder if the issue is that the default stats settings of '10' (3000 rows, 10 histogram buckets) is too low, and maybe we should consider making a higher value (say '100') the default. Personally, I think that'd be reasonable. The idea of either automatically increasing sample size for large tables, or doing a few more samplings with different sizes and examining the stability of the estimates is rather nice, provided we can keep the runtime for ANALYZE to reasonable limits, I guess :-) I also agree with this and personally don't mind *too* much if analyze takes a little while on a large table to get decent statistics for it. One thing I was wondering about though is if we use the index to get some of the statistics information? Would it be possible, or reasonable? Do we already? I dunno, just some thoughts there, I keep hearing about the number of rows that are sampled and I would have thought it'd make sense to scan the index for things like the number of distinct values... Stephen
Fw: Re: [HACKERS] float4 regression test failed on linux parisc
Tom, I'm back with this issue. I have compared the src/backend/utils/adt/float.c from 7.4.6 against CVS HEAD. There was some work done on the infinity handling (don't know who, I am NOT a CVS expert/user). The problem I see is that float4in does a check to see if the value is infinity BEFORE calling CheckFloat4Val (this was added for 8.0) but float4div (and friends) doesn't. All I want to do is add a check in CheckFloat4Val for infinity (and remove the individual checks before the CheckFloat4Val call in other routines). I hope I have explained my problem and solution. Jim -- Forwarded Message --- From: Jim Buttafuoco [EMAIL PROTECTED] To: Tom Lane [EMAIL PROTECTED] Cc: pgsql-hackers pgsql-hackers@postgresql.org Sent: Tue, 1 Feb 2005 17:20:17 -0500 Subject: Re: [HACKERS] float4 regression test failed on linux parisc Tom, The issue is with a select 'Infinity'::float4/'Infinity'::float4; which should return NAN. without the cast I get the overflow message from CheckFloat4Val with the cast I get NAN (as expected). How about testing for isnan() inside CheckFloat4Val (just for PARISC / Linux)? I am trying to get this system working for the buildfarm as there are NO other HP PARISC systems on the farm. Jim -- Original Message --- From: Tom Lane [EMAIL PROTECTED] To: [EMAIL PROTECTED] Cc: pgsql-hackers pgsql-hackers@postgresql.org Sent: Tue, 01 Feb 2005 17:13:52 -0500 Subject: Re: [HACKERS] float4 regression test failed on linux parisc Jim Buttafuoco [EMAIL PROTECTED] writes: Change: CheckFloat4Val(result); To: CheckFloat4Val((float4)result); CheckFloat4Val is defined to take a double, so whatever the above is accomplishing is wrong: probably it's masking an out-of-range result. I think you've hit a bug in Debian's version of gcc for PA-RISC. regards, tom lane --- End of Original Message --- --- End of Forwarded Message ---
Re: [HACKERS] Query optimizer 8.0.1 (and 8.0)
[EMAIL PROTECTED] writes: The basic problem with a fixed sample is that it assumes a normal distribution. That's sort of true, but not in the way you think it is. [snip] Greg, I think you have an excellent ability to articulate stats, but I think that the view that this is like election polling is incorrect. Election polling assumes a very simple outcome: some standard distribution of a limited number of options. I don't think it applies to this. When you look at a sine wave on an oscilloscope, you can see it clear as day. When you look at music on the scope, you know there are many waves there, but it is difficult to make heads or tails of it. (use xmms or winamp to see for yourself) The waves change in frequency, amplitude, and duration over a very large scale. That's why you use a spectrum analyzer to go from time domain to frequency domain. In frequency domain, you can see the trends better. That's not a bad analogy to many problems where you're measuring data that has non-randomness in it but that is not visible in the domain in which the statistics are being analyzed. This seems to happen a lot with geographic data, for instance. EXACTLY!!! If you find that increasing the stats targets improves things then this isn't true. If you find that it doesn't then what's really needed is a cleverer set of statistics to look for. I will be the first one to say that increasing the samples is not perfect, but it is a methodology that will help without major changes in postgres. Simply increasing the samples to a percentage of the estimated number of rows (with some upper and lower limits of course) will increase the accuracy of the n_distinct and correlation settings (at least a little bit), and that will make a huge impact with very little work. If we want to discuss improved statistics, then we should include a standard deviation and a sliding window deviation, or something like that. Hell, maybe even FFT.
The basic problem, I think, is that the sampling mechanism is more like an oscilloscope looking for large trends instead of a spectrum analyzer looking for the smaller ones. We have to be able to tell the planner that adjacent values are less random even though, as a whole, they are seemingly random.
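The "sliding window deviation" idea suggested above can be sketched with toy data (hypothetical numbers, just to illustrate the proposal): a column that looks random at table scale but is smooth within any small window of adjacent rows has a windowed deviation far below its global deviation.

```python
# Sketch of a "sliding window deviation" statistic: globally the column
# looks random (large stddev), but within any small window of physically
# adjacent rows the values are close together -- which is exactly when an
# index scan touches few pages despite low global correlation.
import random
import statistics

random.seed(1)
# 100 runs of 100 near-identical values, with the runs in shuffled order:
# random at large scale, smooth at small scale.
bases = random.sample(range(0, 100000, 1000), 100)
column = [b + i for b in bases for i in range(100)]

global_sd = statistics.pstdev(column)

window = 100
window_sds = [statistics.pstdev(column[i:i + window])
              for i in range(0, len(column), window)]
local_sd = statistics.mean(window_sds)

print(round(global_sd), round(local_sd))  # local deviation is a tiny fraction of global
```

A low windowed deviation relative to the global one is the signal the email asks for: adjacent values are less random than the table as a whole.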
Re: [HACKERS] float4 regression test failed on linux parisc
Jim Buttafuoco [EMAIL PROTECTED] writes: All I want to do is add a check in CheckFloat4Val for infinity (and remove the individual checks before the CheckFloat4Val call in other routines). That's not at all what you proposed before, and it would have vastly more side-effects than just removing the platform-dependent behavior you are on about. If we did that then this would work: regression=# select ('infinity'::float4) / (1::float4); ERROR: type real value out of range: overflow ... which arguably it ought to, but you'd be changing the behavior everywhere not just for your broken compiler. I think the real question we ought to face up to sometime is what it is we are trying to accomplish with CheckFloat4Val and CheckFloat8Val in the first place. The latter routine in particular seems pretty ill-advised to me: if something can be represented as a double then why don't we just allow it? ISTM that what we really want is to reject out-of-range results, as in these examples: regression=# select (1e37::float4) / (1e-37::float4); ERROR: type real value out of range: overflow regression=# select (1e300::float8) / (1e-37::float8); ERROR: type double precision value out of range: overflow regression=# On machines that have IEEE infinity, I think it would work to report overflow if the result is infinity when neither input is. But I dunno how well that works on non-IEEE hardware. Also, what about rejecting NaN results? Thoughts anyone? regards, tom lane
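The "report overflow if the result is infinity when neither input is" rule can be sketched outside the backend. This is a toy Python model, not the C code: float4 range is emulated with an assumed FLT_MAX constant, and the error string mirrors the one in the examples above.

```python
# Sketch of Tom's proposed rule: overflow means finite inputs produced an
# infinite result. inf/inf -> NaN passes through; inf/1 stays inf because
# an input was already infinite.
import math

FLT_MAX = 3.4028235e38  # assumed single-precision maximum

def float4(x: float) -> float:
    """Clamp a double to float4 range: out-of-range values become infinity."""
    if math.isfinite(x) and abs(x) > FLT_MAX:
        return math.copysign(math.inf, x)
    return x

def float4_div(a: float, b: float) -> float:
    a, b = float4(a), float4(b)
    result = float4(a / b)
    if math.isinf(result) and not (math.isinf(a) or math.isinf(b)):
        raise OverflowError('type real value out of range: overflow')
    return result

print(math.isnan(float4_div(float('inf'), float('inf'))))  # inf/inf is NaN, not an overflow
try:
    float4_div(1e37, 1e-37)
except OverflowError as err:
    print(err)
```

Under this rule Jim's 'Infinity'::float4/'Infinity'::float4 returns NaN, while 1e37/1e-37 is still rejected; the open question about rejecting NaN results would be one more `isnan` branch.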
Re: [HACKERS] problems after crash
ohp@pyrenet.fr writes: I had a crash last night and since, while vacuuming databases (either full or lazy), I get this error: duplicate key violates unique constraint pg_statistic_relid_att_index Reindexing pg_statistic would probably make that go away ... regards, tom lane
Re: [HACKERS] float4 regression test failed on linux parisc
Tom, The other option is to note that on older systems (and I mean real old systems where the fp unit is sub par) this test is likely to fail. I have now seen this on my real old Alpha and now HP PARISC systems. Is there a way to just modify the regression test to pass these tests on these platforms? Jim -- Original Message --- From: Tom Lane [EMAIL PROTECTED] To: [EMAIL PROTECTED] Cc: pgsql-hackers pgsql-hackers@postgresql.org Sent: Tue, 08 Feb 2005 10:25:26 -0500 Subject: Re: [HACKERS] float4 regression test failed on linux parisc Jim Buttafuoco [EMAIL PROTECTED] writes: All I want to do is add a check in CheckFloat4Val for infinity (and remove the individual checks before the CheckFloat4Val call in other routines). That's not at all what you proposed before, and it would have vastly more side-effects than just removing the platform-dependent behavior you are on about. If we did that then this would work: regression=# select ('infinity'::float4) / (1::float4); ERROR: type real value out of range: overflow ... which arguably it ought to, but you'd be changing the behavior everywhere not just for your broken compiler. I think the real question we ought to face up to sometime is what it is we are trying to accomplish with CheckFloat4Val and CheckFloat8Val in the first place. The latter routine in particular seems pretty ill-advised to me: if something can be represented as a double then why don't we just allow it? ISTM that what we really want is to reject out-of-range results, as in these examples: regression=# select (1e37::float4) / (1e-37::float4); ERROR: type real value out of range: overflow regression=# select (1e300::float8) / (1e-37::float8); ERROR: type double precision value out of range: overflow regression=# On machines that have IEEE infinity, I think it would work to report overflow if the result is infinity when neither input is. But I dunno how well that works on non-IEEE hardware. Also, what about rejecting NaN results? Thoughts anyone?
regards, tom lane --- End of Original Message ---
[HACKERS] One Big trend vs multiple smaller trends in table statistics
A couple of us using the US Census TIGER database have noticed something about the statistics gathering of analyze. If you follow the thread Query Optimizer 8.0.1 you'll see the progression of the debate. To summarize what I think we've seen: The current implementation of analyze is designed around sampling a table to characterize the basic trend of the data. The problem with the approach is that it assumes that the data has a singular trend behavior. Greg Stark's post Cross column statistics touches on the general problem. The best analogy so far is the difference between an oscilloscope and a spectrum analyzer. The current statistics gathering is like a sampling oscilloscope trying to display a single wave form. Some data trends are more like audio signals where the data has many smaller trends in a seemingly random stream. With a spectrum analyzer you can see the various components. Use Winamp or XMMS for a visualization. Let's assume data is in a multiple sort order. Let's assume it is a set of street addresses sorted by: state, streetname, streettype, address MA, ABBOT, RD, 100 MA, ABBOT, RD, 200 MA, ABBOT, RD, 300 MA, ABBOT, ST, 100 MA, ABBOT, ST, 200 MA, MAPLE, RD, 100 MA, MAPLE, RD, 200 MA, MAPLE, ST, 100 ... ... WY, ABBOT, RD, 100 etc. This table has MILLIONS of rows, every single address in the country. The trend of state is clearly an increasing step ramp over the entire table. The trend of streetname can be imagined as a waveform of a series of ramps for each state. The trend of streettype, similarly, is a series of ramps per street name, and the wave form for address is a ramp for each streettype. The statistics PostgreSQL currently employs will work great for state, but much less so for streetname. A query of select * from addresses where streetname = 'ABBOT' will be seen as more expensive than it really is.
Most of the ABBOTs will be together in about 50 clusters (one for each state, assuming every state has at least one ABBOT), but the current stats are not designed to detect this. Yes, eventually, if the sub-trends are small enough, the index scans become more expensive than table scans, but the current stats can't tell where that point is. Clearly it is not at the secondary sort (or streetname) level. I've found that increasing the sample size in analyze.c can help in specific cases, but the overall problem remains. The question is: Is this really a problem? If so, what can we do? I was thinking of trying to compute a sliding window standard deviation which should be able to detect smaller trends in an overall table; this would require a lot of work in analyze.c. If the sliding window deviation is low, then the correlation of the table should be increased, telling the planner that an index scan is a better choice. The actual math behind the values has to be worked out, of course, but what do you think about the idea?
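The ABBOT example can be made concrete with a toy model (hypothetical street names and a made-up 100-rows-per-page layout): rows sorted by (state, streetname, address) put all the rows for one street into roughly one cluster per state, so an index scan touches far fewer pages than the "randomly scattered" assumption predicts.

```python
# Toy model of the address table: 50 states x 100 streets x 10 addresses,
# sorted by (state, street, address). The rows matching one street form
# ~50 small clusters, not a random scatter.
ROWS_PER_PAGE = 100
streets = ['S%03d' % i for i in range(100)]  # hypothetical street names

rows = [(state, street, addr)
        for state in range(50)       # primary sort: state
        for street in streets        # secondary sort: street name
        for addr in range(10)]       # addresses per street

total_pages = len(rows) // ROWS_PER_PAGE

# Pages an index scan on streetname = 'S000' actually touches.
touched = {i // ROWS_PER_PAGE for i, r in enumerate(rows) if r[1] == 'S000'}

# Pages the "randomly scattered" assumption predicts for the same rows.
matches = sum(1 for r in rows if r[1] == 'S000')
predicted = total_pages * (1 - (1 - 1 / total_pages) ** matches)

print(len(touched), round(predicted))  # ~50 pages actually touched vs ~316 predicted
```

The gap between the two numbers is the planner's overestimate: with zero correlation it costs the scan as if each of the 500 matching rows landed on its own random page.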
Re: [HACKERS] Query optimizer 8.0.1 (and 8.0)
On Mon, Feb 07, 2005 at 17:45:23 -0500, Greg Stark [EMAIL PROTECTED] wrote: However for discrete values like the top ten most common values and the total number of distinct values it's not so clear at all that you can extrapolate from a sample at all. And it's certainly not clear that a fixed size sample gives you at all the same confidence for a large population as it does for a small population. If you were to keep a complete histogram for the sample points (which may or may not be practical) you should be able to estimate the number of distinct values under some assumptions. Such as, that all values outside of the top N values have the same likelihood. I don't think this is unreasonable. If you sampled across the country and found your sample of 600 people had 4 different top choices for president, how do you extrapolate that to calculate the total number of top choices for president the 300 million+ people will have across the country? You could multiply by 500k but that's not going to be right. Hell you're probably better off just going with 4 but that's clearly not right either. Well you can put some bounds on the number. Since no 5th candidate was picked by any of the 600 people, you would expect that the number of people preferring other candidates is on the order of 50 or less, so that the number of distinct values is going to be 50 or less. I think the histogram idea will work well for estimating the number of rows with particular values, but if you are interested in the number of unique values, you are going to have problems with some data sets. (Ones with a few very common values and lots of extremely rare items.) In this case there may be some way to use information from indexes on the data to get better results. ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
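The problem case named at the end — a few very common values plus lots of extremely rare ones — is easy to simulate (made-up distribution, not real polling data): the distinct count seen in a 600-row sample says very little about the true distinct count, and a naive scale-up does not rescue it.

```python
# A column where ~99% of rows use 4 common values and the rest draw from
# 5000 rare values. A fixed 600-row sample sees almost none of the rare
# values, so neither the raw sample count nor a linear scale-up lands
# anywhere near the true n_distinct.
import random

random.seed(2)
population = [random.randint(0, 3) if random.random() < 0.99
              else random.randint(1000, 5999)
              for _ in range(300_000)]

true_distinct = len(set(population))
sample = random.sample(population, 600)
sample_distinct = len(set(sample))

# Naive extrapolation: scale the sample's distinct count by population size.
naive_estimate = sample_distinct * len(population) / len(sample)

print(true_distinct, sample_distinct, round(naive_estimate))
```

The raw sample count underestimates badly and the linear scale-up misses in the other direction, which is the kind of data set where index-based statistics might genuinely help.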
Re: [HACKERS] float4 regression test failed on linux parisc
Jim Buttafuoco [EMAIL PROTECTED] writes: this test is likely to fail. I have now seen this on my real old Alpha and now HP PARISC systems. It works fine on PARISC, and has ever since I've been associated with this project --- I run these tests multiple times a day on old HP hardware, and they have always passed with every compiler I've used (both gcc and HP's). Lots of people have reported clean passes on Alpha as well. One more time: you have a compiler bug, and you really ought to be griping to the gcc people not us. regards, tom lane
Re: [HACKERS] Is there a way to make VACUUM run completely outside
On Tue, Feb 08, 2005 at 01:55:47PM +0200, Hannu Krosing wrote: So I guess that making it commit and open new transaction at a regular interval (like each minute) during vacuuming single table would not alter its visible behaviour. That would solve my problem of long-running vacuums on large tables polluting unrelated small but heavily updated tables with dead tuples. Interesting. The problem is that a long running VACUUM on a single table will keep in PGPROC a TransactionId that will last very long, which will pollute every concurrent Snapshot; so smaller tables can't be cleaned up because the tuples are visible for the transaction running the vacuum -- except that that transaction cannot possibly want to look at them. -- Alvaro Herrera ([EMAIL PROTECTED]) "Sadness is a wall between two gardens" (Khalil Gibran)
Re: [HACKERS] float4 regression test failed on linux parisc
except isinf() works just fine on my system. It's just when CheckFloat4Val is called with infinity as the val, you get the overflow message. If I move the isinf into CheckFloat4Val all is fine. If you don't want to fix this, it's fine with me. I am just reporting problems and trying to fix them. I will shut up now and put my energy into other causes! Jim -- Original Message --- From: Tom Lane [EMAIL PROTECTED] To: [EMAIL PROTECTED] Cc: pgsql-hackers pgsql-hackers@postgresql.org Sent: Tue, 08 Feb 2005 11:42:11 -0500 Subject: Re: [HACKERS] float4 regression test failed on linux parisc Jim Buttafuoco [EMAIL PROTECTED] writes: this test is likely to fail. I have now seen this on my real old Alpha and now HP PARISC systems. It works fine on PARISC, and has ever since I've been associated with this project --- I run these tests multiple times a day on old HP hardware, and they have always passed with every compiler I've used (both gcc and HP's). Lots of people have reported clean passes on Alpha as well. One more time: you have a compiler bug, and you really ought to be griping to the gcc people not us. regards, tom lane --- End of Original Message ---
[HACKERS] correlation in pg_stats
Short summary: * It looks to me like the planner vastly overestimates the # of pages read by index scan in quite a few of my tables even though stats collected by ANALYZE are correct. * The problem happens any time you have multiple columns that have a number of repeated values in them, and you CLUSTER the table by a sort using both columns (like city,state,zip,phone# or firstname,lastname). * I think this is the problem that Mark Kirkwood is seeing in his threads Query optimizer 8.0.1 and One Big trend vs multiple smaller trends in hackers. * A test script demonstrating the issue also follows. * I think keeping one more stat per attribute in pg_statistic could describe this behavior. Longer: If I understand the optimizer correctly, correlation is used both to guess how much random disk access will be required in a query, and to estimate how many pages will be read. Unfortunately, many tables in my larger databases have columns with values that are tightly packed on a few pages; even though there is no total-ordering across the whole table. Stephan Szabo described this as a clumping effect: http://archives.postgresql.org/pgsql-performance/2003-01/msg00286.php The test script below shows a table with 6 columns and 1 million rows. Note that ANALYZE correctly observes that the correlation for all the columns except A is near zero. However, also note that columns A,B,C, and even D will have extremely few UNIQUE values on any given page of data. And conversely, an index scan on any of those columns would be a good choice (as seen by EXPLAIN ANALYZE output below). Instead of just storing correlation, I would also like to store a variation of correlation that pretends that the disk-pages for a particular column were sorted by the min value for that column. Wherever the optimizer would use the existing 'correlation' to estimate the number of pages that would be accessed, it could use this correlation discounting the order of blocks value instead.
All the math/statistics/theory that suggests correlation is good at estimating # of pages would remain intact. Anyway... I've talked too much... The test script showing 1) the tables 2) the values in pg_stats 3) EXPLAIN ANALYZE of columns showing the problem follows:

fli=# create temporary table tmp1mil as
      select * from
        (select generate_series as a from generate_series(0,9)) as a,
        (select generate_series as b from generate_series(0,9)) as b,
        (select generate_series as c from generate_series(0,9)) as c,
        (select generate_series as d from generate_series(0,9)) as d,
        (select generate_series as e from generate_series(0,9)) as e,
        (select generate_series as f from generate_series(0,9)) as f
      order by a,b,c,d,e,f;
SELECT
fli=# vacuum analyze tmp1mil;
VACUUM
fli=# select * from pg_stats where tablename='tmp1mil';
schemaname | tablename | attname | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation
pg_temp_4 | tmp1mil | a | 0 | 4 | 10 | {7,8,5,0,6,3,4,1,2,9} | {0.113,0.106667,0.105667,0.104333,0.101333,0.097,0.095,0.0936667,0.092,0.091} | | 1
pg_temp_4 | tmp1mil | b | 0 | 4 | 10 | {1,4,0,5,6,2,9,7,3,8} | {0.119333,0.112667,0.107,0.101,0.099,0.0976667,0.0946667,0.092,0.0886667,0.088} | | 0.229754
pg_temp_4 | tmp1mil | c | 0 | 4 | 10 | {9,5,0,1,8,6,4,7,3,2} | {0.114667,0.107,0.103,0.101667,0.101667,0.0996667,0.0956667,0.094,0.093,0.089} | | 0.142119
pg_temp_4 | tmp1mil | d | 0 | 4 | 10 | {4,3,2,8,7,9,5,6,1,0} | {0.114667,0.11,0.108,0.104,0.102667,0.099,0.098,0.090,0.089,0.084} | | 0.0930835
pg_temp_4 | tmp1mil | e | 0 | 4 | 10 | {0,5,1,9,4,7,8,2,3,6} |
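The proposed "correlation discounting the order of blocks" statistic can be sketched in Python (a toy model assuming a hypothetical 100-rows-per-page layout, using a column shaped like column b in the test script): plain correlation comes out low, but correlation recomputed after sorting whole pages by their minimum value comes out near 1.

```python
# Column "b" from the test script: each 100-row "page" holds a single
# value, but the values cycle 0..9 repeatedly across the table.
ROWS_PER_PAGE = 100
column = [b for a in range(10) for b in range(10) for _ in range(100)]

def correlation(values):
    """Pearson correlation between physical position and value."""
    n = len(values)
    mx, my = (n - 1) / 2, sum(values) / n
    cov = sum((p - mx) * (v - my) for p, v in enumerate(values))
    vx = sum((p - mx) ** 2 for p in range(n))
    vy = sum((v - my) ** 2 for v in values)
    return cov / (vx * vy) ** 0.5 if vy else 1.0

plain = correlation(column)

# Pretend the pages were reordered by each page's minimum value,
# then recompute the same statistic.
pages = [column[i:i + ROWS_PER_PAGE] for i in range(0, len(column), ROWS_PER_PAGE)]
pages.sort(key=min)
block_corr = correlation([v for page in pages for v in page])

print(round(plain, 3), round(block_corr, 3))  # low vs near 1
```

The second number is the one the proposal would hand the optimizer: it says that however the pages are ordered, each value's rows occupy few pages, so the index-scan page estimate should be small.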
Re: [HACKERS] Cross column statistics
Christopher Kings-Lynne [EMAIL PROTECTED] writes: B) gather a full matrix of the level of correlation between each column and each other column. If this were a single floating point number per pair then it might be feasible. It would still obviously be n^2 in the number of columns though, so there would have to be some way to limit on how many columns would be analyzed this way. Use foreign keys to just record those cross-correlations. My email was about cross-column intra-table correlations. inter-table correlations are a whole other ball of wax. -- greg
Re: [HACKERS] correlation in pg_stats
Short summary: * It looks to me like the planner vastly overestimates the # of pages read by index scan in quite a few of my tables even though stats collected by ANALYZE are correct. * The problem happens any time you have multiple columns that have a number of repeated values in them, and you CLUSTER the table by a sort using both columns (like city,state,zip,phone# or firstname,lastname). * I think this is the problem that Mark Kirkwood is seeing in his threads Query optimizer 8.0.1 and One Big trend vs multiple smaller trends in hackers. Actually, [EMAIL PROTECTED] is Mark Woodward. Pleased to meet you. :) (I hate using my name on lists like this because of spammers) * A test script demonstrating the issue also follows. * I think keeping one more stat per attribute in pg_statistic could describe this behavior. Longer: If I understand the optimizer correctly, correlation is used to both guess how much random disk access will be required in a query; as well as estimate how many pages will be read. Unfortunately, many tables in my larger databases have columns with values that are tightly packed on a few pages; even though there is no total-ordering across the whole table. Stephan Szabo described this as a clumping effect: http://archives.postgresql.org/pgsql-performance/2003-01/msg00286.php Yes. I think we are describing the exact same issue.
Re: [HACKERS] PHP/PDO Database Abstraction Layer
Anyone up for it? This is our chance to get a really top notch PHP driver for PostgreSQL that supports all the appropriate goodies. What exactly do you want people to do? My thought process was to beat the heck out of it. Find things that don't work, should work, aren't implemented and hopefully make them work and get them implemented. Sincerely, Joshua D. Drake -- Command Prompt, Inc., your source for PostgreSQL replication, professional support, programming, managed services, shared and dedicated hosting. Home of the Open Source Projects plPHP, plPerlNG, pgManage, and pgPHPtoolkit. Contact us now at: +1-503-667-4564 - http://www.commandprompt.com
Re: [HACKERS] PHP/PDO Database Abstraction Layer
On Tue, Feb 08, 2005 at 10:10:58AM -0800, Joshua D. Drake wrote: Anyone up for it? This is our chance to get a really top notch PHP driver for PostgreSQL that supports all the appropriate goodies. What exactly do you want people to do? My thought process was to beat the heck out of it. Find things that don't work, should work, aren't implemented and hopefully make them work and get them implemented. The get last inserted Id business is likely to be hard to implement. It doesn't even take parameters ... -- Alvaro Herrera ([EMAIL PROTECTED]) Tulio: oh, what could this button be for, Juan Carlos? Policarpo: No, stay back, don't touch the console! Juan Carlos: I'll press it again and again.
[HACKERS] external indices ...
I believe that this is what Oleg et al tap into with the tsearch2 stuff, no? I have someone asking me about it, and want to make sure that I'm telling him the right answer ... is this what GiST is? And, if so, what is a *good* doc for me to point them at to get up to speed with it? Is what we have in the docs the best place, or is there something someone else has written that gets into it even more? Thanks ... Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email: [EMAIL PROTECTED] Yahoo!: yscrappy ICQ: 7615664
Re: [HACKERS] problems after crash
Hi Tom, The problem was worse than that: the same row actually appeared twice in the table; deleting those twins helped. One of my customers suffered the same problem in 2 of his tables, and even the OID was the same! How can that be possible? Again, deleting those rows helped. On Tue, 8 Feb 2005, Tom Lane wrote: Date: Tue, 08 Feb 2005 10:40:33 -0500 From: Tom Lane [EMAIL PROTECTED] To: ohp@pyrenet.fr Cc: pgsql-hackers list pgsql-hackers@postgresql.org Subject: Re: [HACKERS] problems after crash ohp@pyrenet.fr writes: I had a crash last night and since, while vacuuming databases (either full or lazy), I get this error: duplicate key violates unique constraint pg_statistic_relid_att_index Reindexing pg_statistic would probably make that go away ... regards, tom lane -- Olivier PRENANT Tel: +33-5-61-50-97-00 (Work) 6, Chemin d'Harraud Turrou +33-5-61-50-97-01 (Fax) 31190 AUTERIVE +33-6-07-63-80-64 (GSM) FRANCE Email: ohp@pyrenet.fr -- Make your life a dream, make your dream a reality. (St Exupery)
Re: [HACKERS] external indices ...
I haven't worked with GiST, although I have been curious from time to time. Just never had the time to sit, read, and try out the GiST system. On my text search system (FTSS) I use functions that return sets of data. It may be easier to implement that than a GiST. Basically, I create a unique ID index on a table of data. Create my external index. When a search is done, I return a set of unique IDs and ranks. I believe that this is what Oleg et al tap into with the tsearch2 stuff, no? I have someone asking me about it, and want to make sure that I'm telling him the right answer ... is this what GiST is? And, if so, what is a *good* doc for me to point them at to get up to speed with it? Is what we have in the docs the best place, or is there something someone else has written that gets into it even more? Thanks ... Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email: [EMAIL PROTECTED] Yahoo!: yscrappy ICQ: 7615664
[HACKERS] Fast reference without an index?
A question to the hackers: Is there a way, and if I'm being stupid please tell me, to use something like a row ID to reference a row in a PostgreSQL database? Allowing the database to find a specific row without using an index? I mean, an index has to return something like a row ID for the database to find the row, right? Granted it has to find the version with the right transaction ID, but still, you get the idea. In the old days of ISAM stuff, it would just be the offset into the file. In my text search system, rather than create an additional indexed column for row ID, wouldn't it be more efficient if I could use PostgreSQL's data for this? This would greatly improve the performance of FTSS as it would eliminate the extra index operation per row returned.
Re: [HACKERS] correlation in pg_stats
[EMAIL PROTECTED] wrote: actually [EMAIL PROTECTED], is Mark Woodward. Pleased to meet you. :) (I hate using my name on lists like this because of spammers) Not to be confused with me :-)
Re: [HACKERS] external indices ...
Marc, On Tue, 8 Feb 2005, Marc G. Fournier wrote: I believe that this is what Oleg et al tap into with the tsearch2 stuff, no? I have someone asking me about it, and want to make sure that I'm telling him the right answer ... is this what GiST is? And, if so, what is a *good* doc for me to point them at to get up to speed with it? Is what we have in the docs the best place, or is there something someone else has written that gets into it even more? I'm a little bit confused, because I couldn't catch your meaning :) Thanks ... Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email: [EMAIL PROTECTED] Yahoo!: yscrappy ICQ: 7615664 Regards, Oleg _ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
Re: [HACKERS] external indices ...
On Tue, 8 Feb 2005, Oleg Bartunov wrote: Marc, On Tue, 8 Feb 2005, Marc G. Fournier wrote: I believe that this is what Oleg et al tap into with the tsearch2 stuff, no? I have someone asking me about it, and want to make sure that I'm telling him the right answer ... is this what GiST is? And, if so, what is a *good* doc for me to point them at to get up to speed with it? Is what we have in the docs the best place, or is there something someone else has written that gets into it even more? I'm a little bit confused, because I couldn't catch your meaning :) tsearch2 is an external indexing method, no? External in that it isn't integrated as part of PostgreSQL, but is added on ... with the 'add on' being through the GiST framework ... right? Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email: [EMAIL PROTECTED] Yahoo!: yscrappy ICQ: 7615664
Re: [HACKERS] Fast reference without an index?
[EMAIL PROTECTED] writes: Is there a way, and if I'm being stupid please tell me, to use something like a row ID to reference a row in a PostgreSQL database? Allowing the database to find a specific row without using an index? ctid ... which changes on every update ... regards, tom lane
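Tom's answer can be tried directly against a server; a sketch (the table name is hypothetical) showing that ctid is selectable like a column and moves whenever the row is updated:

```sql
-- ctid is the system column Tom refers to: (block number, line pointer)
CREATE TABLE mytable (name text);
INSERT INTO mytable VALUES ('foo');

SELECT ctid, name FROM mytable WHERE name = 'foo';

UPDATE mytable SET name = 'foo' WHERE name = 'foo';

-- the ctid is now different: every UPDATE writes a new row version,
-- so a stored ctid is only valid until the next update or VACUUM
SELECT ctid, name FROM mytable WHERE name = 'foo';
```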
Re: [HACKERS] external indices ...
Marc G. Fournier [EMAIL PROTECTED] writes: I believe that this is what Oleg et al tap into with the tsearch2 stuff, no? I have someone asking me about it, and want to make sure that I'm telling him the right answer ... is this what GiST is? Better ask what he means by external exactly. I think I've heard some people use the phrase external index to mean any index that's not actually part of the table itself; which makes all our index types external. (As a counterexample, imagine a table that is itself organized as a btree.) regards, tom lane
Re: [HACKERS] external indices ...
Summary: Currently used with Oracle databases thru their Extensible Indexing Cartridge. What we are interested in is finding out whether or not we can use our technology directly with PostgreSQL. That would involve, from an overly simplistic view, allowing Postgres to use external indexes. Does that help any? On Tue, 8 Feb 2005, Tom Lane wrote: Marc G. Fournier [EMAIL PROTECTED] writes: I believe that this is what Oleg et al tap into with the tsearch2 stuff, no? I have someone asking me about it, and want to make sure that I'm telling him the right answer ... is this what GiST is? Better ask what he means by external exactly. I think I've heard some people use the phrase external index to mean any index that's not actually part of the table itself; which makes all our index types external. (As a counterexample, imagine a table that is itself organized as a btree.) regards, tom lane Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email: [EMAIL PROTECTED] Yahoo!: yscrappy ICQ: 7615664
Re: [HACKERS] external indices ...
On Tue, 8 Feb 2005, Marc G. Fournier wrote: On Tue, 8 Feb 2005, Oleg Bartunov wrote: Marc, On Tue, 8 Feb 2005, Marc G. Fournier wrote: I believe that this is what Oleg et al tap into with the tsearch2 stuff, no? I have someone asking me about it, and want to make sure that I'm telling him the right answer ... is this what GiST is? And, if so, what is a *good* doc for me to point them at to get up to speed with it? Is what we have in the docs the best place, or is there something someone else has written that gets into it even more? I'm a little bit confused, because I couldn't catch your meaning :) tsearch2 is an external indexing method, no? External in that it isn't integrated as part of PostgreSQL, but is added on ... with the 'add on' being through the GiST framework ... right? I think that guy asked whether tsearch2 is an external program running as a separate process. Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email: [EMAIL PROTECTED] Yahoo!: yscrappy ICQ: 7615664 Regards, Oleg _ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
Re: [HACKERS] Fast reference without an index?
[EMAIL PROTECTED] writes: Is there a way, and if I'm being stupid please tell me, to use something like a row ID to reference a row in a PostgreSQL database? Allowing the database to find a specific row without using an index? ctid ... which changes on every update ... Well, how does an index do it? Say this: select * from mytable where name = 'foo'; The index must return something. Say I have a row that is constantly being updated, or has an original item inserted. An item which is valid within my transaction, and an item which has just been inserted but has a transaction ID greater than mine. Doesn't the index have some base number which points to the first valid occurrence of the row, and then the valid row is found based on the transaction ID, or has PG changed? Is that the ctid?
Re: [HACKERS] correlation in pg_stats
[EMAIL PROTECTED]@[EMAIL PROTECTED] wrote Short summary: * It looks to me like the planner vastly overestimates the # of pages read by an index scan in quite a few of my tables even though the stats collected by ANALYZE are correct. * The problem happens any time you have multiple columns that have a number of repeated values in them, and you CLUSTER the table by a sort using both columns (like city,state,zip,phone# or firstname,lastname). * I think this is the problem that Mark Kirkwood is seeing in his threads Query optimizer 8.0.1 and One Big trend vs multiple smaller trends in hackers. actually [EMAIL PROTECTED], is Mark Woodward. Pleased to meet you. :) (I hate using my name on lists like this because of spammers) * A test script demonstrating the issue also follows. * I think keeping one more stat per attribute in pg_statistic could describe this behavior. Longer: If I understand the optimizer correctly, correlation is used both to guess how much random disk access will be required in a query and to estimate how many pages will be read. Unfortunately, many tables in my larger databases have columns with values that are tightly packed on a few pages, even though there is no total ordering across the whole table. Stephan Szabo described this as a clumping effect: http://archives.postgresql.org/pgsql-performance/2003-01/msg00286.php Yes. I think we are describing the exact same issue. I believe that is a very common case. Many natural value distributions follow the Zipf distribution, which is commonly known as the 80-20 curve. Best Regards, Simon Riggs
Re: [HACKERS] Fast reference without an index?
On Tue, Feb 08, 2005 at 04:29:34PM -0500, [EMAIL PROTECTED] wrote: [EMAIL PROTECTED] writes: Is there a way, and if I'm being stupid please tell me, to use something like a row ID to reference a row in a PostgreSQL database? Allowing the database to find a specific row without using an index? ctid ... which changes on every update ... Well, how does an index do it? The index _is_ updated on every update. And on vacuum too, of course; so you'd also need to hack VACUUM to make sure your reference is updated. Maybe you could store your data in a GiST index ... -- Alvaro Herrera ([EMAIL PROTECTED]) [PostgreSQL] is a great group; in my opinion it is THE best open source development communities in existence anywhere. (Lamar Owen)
[HACKERS] Interpretation of TRUSTED
Hi, A TRUSTED language specifies that ordinary users can use the language. It also implies that access to the file system should be prevented. In essence, ordinary users can never access the filesystem. Is it OK to design a trusted language so that it allows access to the filesystem provided that the session user is a super-user? Regards, Thomas Hallgren
Re: [HACKERS] Query optimizer 8.0.1 (and 8.0)
Mark, Stephen, etc: I can see your point, however I wonder if the issue is that the default stats setting of '10' (3000 rows, 10 histogram buckets) is too low, and maybe we should consider making a higher value (say '100') the default. Personally, I think that'd be reasonable. I don't really think that we want to increase sampling for *all* columns. Each column that doesn't get queried but has increased stats adds to the size of the pg_stats table and the amount of vacuuming it needs, decreasing the performance of stats lookups. Especially when very large text and BYTEA columns are involved. Neil and I talked about setting a separate GUC, default_index_stats_target, for 8.0 that would dramatically increase the stats sampling for indexed columns only (say, to 250). Unfortunately, I was unable to come up with a good test case for setting this value. Perhaps the Tiger database would allow us to set up a good test case. Does anyone have a well-indexed PG Tiger I could play with? -- --Josh Josh Berkus Aglio Database Solutions San Francisco
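For reference, the knobs that already exist (as opposed to the proposed default_index_stats_target GUC, which does not) can be exercised like this; the table and column names are hypothetical:

```sql
-- raise the sample size for one indexed column only, leaving the
-- default of 10 in place for every other column in the database
ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 250;
ANALYZE orders;

-- or raise the default target for all columns that have no
-- per-column override
SET default_statistics_target = 100;
```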
Re: [HACKERS] Interpretation of TRUSTED
On Tue, Feb 08, 2005 at 11:12:07PM +0100, Thomas Hallgren wrote: Hi, A TRUSTED language specifies that ordinary users can use the language. It also implies that access to the file system should be prevented. In essence, ordinary users can never access the filesystem. Is it OK to design a trusted language so that it allows access to the filesystem provided that the session user is a super-user? I believe that that is what UNTRUSTED languages are for. Only the super-user may create functions in them, although there is no inherent restriction on other users' calling those functions. Cheers, D -- David Fetter [EMAIL PROTECTED] http://fetter.org/ phone: +1 510 893 6100 mobile: +1 415 235 3778 Remember to vote!
Re: [HACKERS] Interpretation of TRUSTED
An untrusted language is so because of what it can do and cannot do. An untrusted language cannot access the filesystem, for example. WHO writes and runs untrusted and trusted procedures is a different thing. It is the security mechanism meant to restrict writing and running the functions written in a trusted/untrusted language. --elein On Tue, Feb 08, 2005 at 11:12:07PM +0100, Thomas Hallgren wrote: Hi, A TRUSTED language specifies that ordinary users can use the language. It also implies that access to the file system should be prevented. In essence, ordinary users can never access the filesystem. Is it OK to design a trusted language so that it allows access to the filesystem provided that the session user is a super-user? Regards, Thomas Hallgren
Re: [HACKERS] Interpretation of TRUSTED
elein wrote: An untrusted language is so because of what it can do and cannot do. An untrusted language cannot access the filesystem, for example. WHO writes and runs untrusted and trusted procedures is a different thing. It is the security mechanism meant to restrict writing and running the functions written in a trusted/untrusted language. Ok, that makes a lot of sense. I read the documentation: 3.4.1 Installing Procedural Languages. It is a bit vague on what an untrusted language should be prevented from doing. do not allow access to database server internals or the file system is what it says. How about network access? Is there a more canonical definition somewhere that I can read? Regards, Thomas Hallgren
Re: [HACKERS] Interpretation of TRUSTED
David Fetter [EMAIL PROTECTED] writes: On Tue, Feb 08, 2005 at 11:12:07PM +0100, Thomas Hallgren wrote: Is it OK to design a trusted language so that it allows access to the filesystem provided that the session user is a super-user? I believe that that is what UNTRUSTED languages are for. Only the super-user may create functions in them, although there is no inherent restriction on other users' calling those functions. AFAICS, what Thomas proposes would be exactly equivalent to root running scripts owned by non-root users --- in this case, if the session user is root then functions written by other people would be allowed to do things they normally shouldn't be able to do. It strikes me as a great loophole for Trojan-horse functions. Not that a sane superuser would run functions controlled by other people in the first place. regards, tom lane
Re: [HACKERS] Interpretation of TRUSTED
Tom Lane wrote: On Tue, Feb 08, 2005 at 11:12:07PM +0100, Thomas Hallgren wrote: Is it OK to design a trusted language so that it allows access to the filesystem provided that the session user is a super-user? AFAICS, what Thomas proposes would be exactly equivalent to root running scripts owned by non-root users --- in this case, if the session user is root then functions written by other people would be allowed to do things they normally shouldn't be able to do. It strikes me as a great loophole for Trojan-horse functions. Not that a sane superuser would run functions controlled by other people in the first place. Agreed. It's also not how other PLs work. I don't think this definition should be up to the individual language. So my answer to his question above would be No. cheers andrew
Re: [HACKERS] Interpretation of TRUSTED
On Tue, 2005-02-08 at 14:51 -0800, elein wrote: An untrusted language is so because of what it can do and cannot do. An untrusted language cannot access the filesystem, for example. There is a bit of confusion in the terminology. Trusted means that the language has been deemed safe, i.e. it cannot do things like access the filesystem. An untrusted language is not safe, i.e. it can do things that you would not want an ordinary user to do. I can see how someone would be easily confused about the terminology, and I have to think about it each time. Trusted vs. untrusted refers to the language, not the user of the language. Regards, Jeff Davis
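The distinction Jeff describes shows up directly in how a language is registered and who may create functions in it; a sketch using PL/Perl's trusted and untrusted variants (the function body is illustrative only):

```sql
-- TRUSTED: any user may CREATE FUNCTION in it; the language's
-- sandbox blocks filesystem access regardless of who calls it
CREATE TRUSTED LANGUAGE plperl
    HANDLER plperl_call_handler;

-- untrusted variant: only a superuser may CREATE FUNCTION in it,
-- and such functions can read files, open sockets, etc.
CREATE LANGUAGE plperlu
    HANDLER plperl_call_handler;

-- works in plperlu; the equivalent body would be rejected by
-- plperl's sandbox at runtime
CREATE FUNCTION read_motd() RETURNS text AS
    'open F, "/etc/motd"; local $/; return <F>;'
LANGUAGE plperlu;
```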
Re: [HACKERS] Connect By for 8.0
Hans-Jürgen Schönig wrote: Neil Conway wrote: Robert Treat wrote: Actually I believe people want both syntaxes, as the former is used by Oracle and the latter by DB2 (IIRC) I think the past consensus has been to adopt the SQL standard syntax. Is there any reason to also support the Oracle syntax other than for compatibility? (And if that is it, I think it's a pretty flimsy reason.) -Neil Neil, Compatibility is the ONLY reason and it is related to money (the only language people understand). We have done a lot of migration here and I can tell you that support for Oracle-style CONNECT BY would make more people happy than the SQL-style syntax. The reason for that is very simple: code can be migrated without any changes (= without introducing potential bugs). I know that SQL standards are tremendously important, but if there is an easy way to support Oracle syntax as well, this is definitely the preferred way to go. I think it is important not to think in dogmas (in this case this means SQL syntax is always better) - there should be a reasonable compromise between compatibility and standard. My compromise would be: support both syntaxes if possible. I can see your point, but imagine if we had Oracle compatibility for lots of cases --- our system would have either non-standard or duplicate ways of doing things, and that would be quite confusing. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Re: [HACKERS] How can I use large object on PostgreSQL Linux
On Tue, Feb 08, 2005 at 02:55:01PM +0700, Premsun Choltanwanich wrote: From my first information, I got 3 errors /lo, /dbsize and /admin. That mean now only /admin that I cannot found anywhere. I still try to find it. If anybody know more information about /admin (in Windows is admin.dll) please tell me. I wonder if admin is pgadmin-tools. http://cvs.pgadmin.org/cgi-bin/viewcvs.cgi/pgadmin-tools/ What does the query below show on the Windows box? If it shows the same functions listed in pgadmin-tools/support/admin.sql.in then you might need to install pgadmin-tools to make the systems identical (although you might not actually need pgadmin-tools for its functionality). SELECT proname, oidvectortypes(proargtypes) AS argtypes FROM pg_proc WHERE probin LIKE '%admin%'; -- Michael Fuhr http://www.fuhr.org/~mfuhr/
Re: [HACKERS] How can I use large object on PostgreSQL Linux
I'm not sure. However, it may be possible, because I use PostgreSQL 8 RC1 with pgAdmin. Michael Fuhr [EMAIL PROTECTED] 09-Feb-05 11:22 AM On Tue, Feb 08, 2005 at 02:55:01PM +0700, Premsun Choltanwanich wrote: From my first information, I got 3 errors /lo, /dbsize and /admin. That mean now only /admin that I cannot found anywhere. I still try to find it. If anybody know more information about /admin (in Windows is admin.dll) please tell me. I wonder if admin is pgadmin-tools. http://cvs.pgadmin.org/cgi-bin/viewcvs.cgi/pgadmin-tools/ What does the query below show on the Windows box? If it shows the same functions listed in pgadmin-tools/support/admin.sql.in then you might need to install pgadmin-tools to make the systems identical (although you might not actually need pgadmin-tools for its functionality). SELECT proname, oidvectortypes(proargtypes) AS argtypes FROM pg_proc WHERE probin LIKE '%admin%'; -- Michael Fuhr http://www.fuhr.org/~mfuhr/
Re: [HACKERS] Connect By for 8.0
Bruce Momjian pgman@candle.pha.pa.us writes: Hans-Jürgen Schönig wrote: My compromise would be: Support both syntaxes if possible. I can see your point, but imagine if we had Oracle compatibility for lots of cases --- our system would have either non-standard or duplicate ways of doing things, and that would be quite confusing. Oracle has the resources to outtalk, outshout, and outlast everyone else on the SQL standards committee. Despite that, their syntax was not adopted as the standard. This should give you some clue about how badly their syntax sucks. Now why exactly should we adopt it? regards, tom lane
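For context, the two syntaxes under discussion, side by side over a hypothetical employees table. The SQL-standard recursive form is the one the past consensus favored (PostgreSQL did not yet support it when this thread was written); the Oracle form is the proprietary one migrations carry over:

```sql
-- Oracle's proprietary hierarchical query syntax
SELECT emp_id, name, LEVEL
FROM employees
START WITH manager_id IS NULL
CONNECT BY PRIOR emp_id = manager_id;

-- SQL-standard recursive query expressing the same traversal
WITH RECURSIVE org AS (
    SELECT emp_id, name, 1 AS depth
      FROM employees
     WHERE manager_id IS NULL
  UNION ALL
    SELECT e.emp_id, e.name, o.depth + 1
      FROM employees e
      JOIN org o ON e.manager_id = o.emp_id
)
SELECT emp_id, name, depth FROM org;
```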
Re: [HACKERS] Connect By for 8.0
Tom Lane wrote: Bruce Momjian pgman@candle.pha.pa.us writes: Hans-Jürgen Schönig wrote: My compromise would be: Support both syntaxes if possible. Great... code away. I would suggest a: oracle_compat = true/false in the postgresql.conf Yes, I am kidding. The differences between Oracle and PostgreSQL for most things are not that great. If the small syntactical differences are the only thing holding them back from using PostgreSQL, they were not that serious in the first place. Sincerely, Joshua D. Drake I can see your point, but imagine if we had Oracle compatibility for lots of cases --- our system would have either non-standard or duplicate ways of doing things, and that would be quite confusing. Oracle has the resources to outtalk, outshout, and outlast everyone else on the SQL standards committee. Despite that, their syntax was not adopted as the standard. This should give you some clue about how badly their syntax sucks. Now why exactly should we adopt it? regards, tom lane -- Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC Postgresql support, programming, shared hosting and dedicated hosting. +1-503-667-4564 - [EMAIL PROTECTED] - http://www.commandprompt.com PostgreSQL Replicator -- production quality replication for PostgreSQL
Re: [HACKERS] libpq API incompatibility between 7.4 and 8.0
Hi! Tom Lane [2005-02-04 10:27 -0500]: This problem isn't worth spending more development time on than it takes to change SO_MAJOR_VERSION (we have lots of higher-priority issues). I just did that:

--- postgresql-8.0.1-old/src/interfaces/libpq/Makefile	2005-01-26 20:24:19.0 +0100
+++ postgresql-8.0.1/src/interfaces/libpq/Makefile	2005-02-07 20:09:38.719686432 +0100
@@ -16,8 +16,8 @@
 # shared library parameters
 NAME= pq
-SO_MAJOR_VERSION= 3
-SO_MINOR_VERSION= 2
+SO_MAJOR_VERSION= 4
+SO_MINOR_VERSION= 0
 DLTYPE= library
 override CPPFLAGS := -DFRONTEND -I$(srcdir) $(CPPFLAGS) -I$(top_builddir)/src/port

Is that right? Or will you use SO_MINOR_VERSION=1? However, I just stumbled across another problem: libpq3 and the new libpq4 use the same translation domain, thus they cannot be installed in parallel. Can you please change the domain to libpq4 as well? This should generally be done anyway, but at least we can live with one breakage of this rule (for libpq3), so that we don't need to mess with the old libpq3 package. Thanks, Martin -- Martin Pitt http://www.piware.de Ubuntu Developer http://www.ubuntulinux.org Debian GNU/Linux Developer http://www.debian.org
Re: [HACKERS] external indices ...
Marc G. Fournier [EMAIL PROTECTED] writes: Currently used with Oracle databases thru their Extensible Indexing Cartridge. What we are interested in is finding out whether or not we can use our technology directly with PostgreSQL. That would involve, from an overly simplistic view, allowing Postgres to use external indexes. Does that help any? Well, I spent a little time googling on this, and dug through Oracle documents that seem to have been written by marketroids who think componentize is a verb :-( ... but by the time my brain went numb, I hadn't actually seen anything that hasn't been in Postgres for years. What you need to make clear to them is that their existing code base isn't likely to be worth a dime --- if they don't mind recoding, the functionality is probably all there, but there will not be any compatibility of details. There are no standards covering this stuff. regards, tom lane
Re: [HACKERS] Thinking about breaking up the BufMgrLock
On Sun, Feb 06, 2005 at 07:30:37PM -0500, Tom Lane wrote: ReadBuffer needs to do a lookup to map the page ID to a buffer ID, which in principle requires only a shared lock on the page-to-buffer mapping (embodied in the buf_table hash table). Assuming success, it also needs to mark the buffer pinned and update the LRU-list position of the buffer. Marking pinned is certainly a buffer-local change, so we could imagine splitting up the BufMgrLock like this:

1. A global LWLock for the page-to-buffer mapping, call it say BufMappingLock. Share lock on this is sufficient to allow reading the hash table; must get exclusive lock when reassigning any buffer's page identity.
2. A global LWLock for the LRU strategy information, say BufLRULock or BufStrategyLock.
3. Per-buffer LWLocks (or maybe just spinlocks) protecting the state of each buffer header; you need this lock to examine or change a buffer header. ReleaseBuffer has no locking problems in this formulation: it just grabs the per-buffer lock, decrements its refcount, releases the lock.

For the per-buffer case, a latch would provide a lightweight method of updating the contents of the buffer without hampering read-only access. A latch comprises a latch bit and a sequence number that can be set in an atomic action. The flow for the two cases is simple:

Write:
1. Get the latch.
2. Update the buffer.
3. Increment the sequence number.
4. Release the latch.

Read:
1. Read the sequence number.
2. Read the buffer.
3. Check the latch. If latched, go to 1.
4. If the sequence number has changed, go to 1.

By using this process, readers will only see a consistent state of the buffer. Also, since a read does not entail a write operation, it will not cause a cache line update and contribute to a cache-update storm. The get-latch operation can be implemented using an atomic operation such as TAS (test-and-set) or CAS (compare-and-swap). This would give readers extremely lightweight access to the buffer - no cache line update hit.
If you need sequenced access to the buffer then you would need to use LWLocks, but in many cases, such as item 3 in Tom's list, a latch would work well. ReadBuffer looks like:

* Acquire share lock on BufMappingLock.
* Search hash table for desired ID. (we assume success)
* Acquire per-buffer lock.
* Increment buffer refcount.
* Release per-buffer lock.
* Release share lock on BufMappingLock.
* Update the LRU state.

(We have to bump the pin count on the target buffer before releasing the BufMappingLock, otherwise someone could reassign the buffer as soon as we release BufMappingLock.) This would be pretty good from a locking point of view, except that updating the LRU state seems to require taking an exclusive lock on a global data structure, which puts us about back where we were. Contention for a global BufLRULock might be a bit better than for the existing overall BufMgrLock, but it'll still cripple the performance of ReadBuffer. Perhaps someone knows of a neat data structure that can maintain an LRU list with only local updates? I don't though. The clock algorithm comes pretty close: it provides an approximation of LRU that eliminates the need to move buffers to the MRU position, using a reference bit instead. This would convert the existing strict LRU behavior into an approximate LRU. I'm worried that the change might be penny-wise and pound-foolish: if a poorer cache management algorithm causes us to have to do more I/O, it's probably a net loss to save some lock contention. But without a smart idea about data structures I don't see how to do better. One alternative to an approximate LRU, such as the clock algorithm, would be to have multiple buffer pools, as we discussed in the previous thread. The contention would be reduced by a factor of N, where N is the number of pools. Of course, buffers should be allocated in a fashion that maximizes locality and minimizes the effect of scan cache pollution. More food for thought.
Ken ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] Query optimizer 8.0.1 (and 8.0)
[EMAIL PROTECTED] wrote: In this case, the behavior observed could be changed by altering the sample size for a table. I submit that an arbitrary fixed sample size is not a good basis for the analyzer; the sample size should instead be based on the size of the table or some calculation of its deviation. Mark, do you have any evidence that the sample size had anything to do with the performance problem you're seeing? I also do a lot with the complete Census/TIGER database. Every problem I have with the optimizer comes down to the fact that the data is loaded (and ordered on disk) by State/County FIPS codes, and then queried by zip code or by city name. Like this:

Alabama  36101  [hundreds of pages with zips in 36***]
Alaska   99686  [hundreds of pages with zips in 9]
Arizona  85701  [hundreds of pages with zips in 855**]

Note that the zip codes are *NOT* sequential. The correlation statistic sees that the zip codes are not sequential, so it makes the *HORRIBLE* assumption that they are scattered randomly across the disk. In reality, even though there's no total ordering of the zip codes, any given zip code only exists on a couple of disk pages, so index scans would be the right choice. But the single correlation parameter is not sufficient to let the optimizer know this. No matter how large a sample size you choose, ANALYZE will correctly see that zip codes and State FIPS codes are non-correlated, and the optimizer will overestimate the number of pages an index scan will need. Ron PS: I pointed out workarounds in my earlier posting in this thread. Yes, I'm using the same TIGER data you are. ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [HACKERS] libpq API incompatibility between 7.4 and 8.0
Hi! Tom Lane [2005-02-03 11:12 -0500]: Martin Pitt [EMAIL PROTECTED] writes: I am thinking the easiest solution will be to re-add get_progname() to 8.0.X and bump the major for 8.1. Seconded. Then we don't need another library version, and it is still not necessary to drag this get_progname accident around forever. We're going to stop dragging the accident now, not a year from now. Putting get_progname back into libpq would require re-adding all of path.c, which creates many more issues than just adjusting SO_MAJOR_VERSION would do. The consensus seems to be to just bump the SONAME. The next one will definitely be 4 then? I would like to fix this in my packages before I announce them for public testing, but this issue is the only blocker. Will there be a new bugfix release soon? Thanks for your work and have a nice day! Martin -- Martin Pitt http://www.piware.de Ubuntu Developer http://www.ubuntulinux.org Debian GNU/Linux Developer http://www.debian.org
Re: [HACKERS] Connect By for 8.0
A long time ago, in a galaxy far, far away, [EMAIL PROTECTED] (Hans-Jürgen Schönig) wrote: I think it is important not to think in dogmas (in this case that means SQL syntax is always better) - there should be a reasonable compromise between compatibility and the standard. My compromise would be: support both syntaxes if possible. As long as they can co-exist, that seems plausible. I'd personally rather see the standard form, as it's got a more diverse set of uses. If introducing the CONNECT BY form did anything to prevent implementing it the RIGHT way, I'm pretty sure I'm not the only one that would be most displeased. -- output = (cbbrowne @ gmail.com) http://linuxfinances.info/info/slony.html If anyone ever markets a really well-documented Unix that doesn't require babysitting by a phalanx of provincial Unix clones, there'll be a lot of unemployable, twinky-braindamaged misfits out deservedly pounding the pavements. ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] How can I use large object on PostgreSQL Linux
-Original Message- From: [EMAIL PROTECTED] on behalf of Michael Fuhr Sent: Wed 2/9/2005 4:22 AM To: Premsun Choltanwanich Cc: pgsql-hackers@postgresql.org Subject: Re: [HACKERS] How can I use large object on PostgreSQL Linux I wonder if admin is pgadmin-tools. Yes, it is. Regards, Dave ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly