Re: [HACKERS] Socket communication for contrib
On Wed, Apr 07, 2004 at 09:39:15AM -0400, Jan Wieck wrote:
> the reason why dblink, dbmirror and the rserv prototype are under
> contrib/ while projects like erserver and slony1 live on gborg - just
> to take this class of projects as an example.

Since the rserv prototype actually does not work any more (at least according to the reports I've seen), it should probably be removed anyway.

A

--
Andrew Sullivan | [EMAIL PROTECTED]
In the future this spectacle of the middle classes shocking the avant-garde will probably become the textbook definition of Postmodernism. --Brad Holland

---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [HACKERS] Socket communication for contrib
On Mon, Apr 05, 2004 at 06:33:51PM +0200, Hans-Jürgen Schönig wrote:
> I have learned (please correct me if I am wrong) that people tend to
> look in contrib before they look at gborg.

This may be true, but if so, perhaps it's a reason to add a contrib/gborg directory with just a README that says "For lots of additional software which is designed to be part of your PostgreSQL installation, go to http://gborg.postgresql.org."

A

--
Andrew Sullivan | [EMAIL PROTECTED]
The plural of anecdote is not data. --Roger Brinner
Re: [HACKERS] Socket communication for contrib
On Fri, Apr 16, 2004 at 08:10:20AM -0400, Andrew Sullivan wrote:
> On Mon, Apr 05, 2004 at 06:33:51PM +0200, Hans-Jürgen Schönig wrote:
> > I have learned (please correct me if I am wrong) that people tend to
> > look in contrib before they look at gborg.
> This may be true, but if so, perhaps it's a reason to add a
> contrib/gborg directory with just a README that says "For lots of
> additional software which is designed to be part of your PostgreSQL
> installation, go to http://gborg.postgresql.org."

Or even a text dump of http://gborg.postgresql.org/project/projdisplaylist.php (which would be more useful if all gborg projects had useful descriptions - but that would directly benefit users of gborg too).

Cheers,
Steve
Re: [HACKERS] Socket communication for contrib
Hans, Andrew,

> I have learned (please correct me if I am wrong) that people tend to
> look in contrib before they look at gborg.

pgFoundry/project.postgresql.org will be up by next week, I promise. Working on it now. This should increase the visibility of non-core components.

--
-Josh Berkus
Aglio Database Solutions
San Francisco
[HACKERS] [Fwd: DBT3-pgsql large performance improvement 2.6.6-rc1]
FYI for those of you tracking Linux development kernel performance on PostgreSQL.

-Forwarded Message-
From: Mary Edie Meredith [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: DBT3-pgsql large performance improvement 2.6.6-rc1
Date: Fri, 16 Apr 2004 09:51:47 -0700

Performance in DBT-3 (using PostgreSQL) has vastly improved, both in the power portion (single process/query) and in the throughput portion of the test (when the test is running multiple processes), on our 4-way (4GB) and 8-way (8GB) STP systems as compared to 2.6.5 kernel results. Using the default DBT-3 options (i.e. using LVM, ext2, PostgreSQL version 7.4.1). Note: bigger numbers are better.

Kernel     Runid   CPUs  Power   %incP  Thruput  %incT
2.6.5      291308  4     97.08   base   120.46   base
2.6.6-rc1  291876  4     146.11  50.5%  222.94   85.1%

Kernel     Runid   CPUs  Power   %incP  Thruput  %incT
2.6.5      291346  8     101.08  base   138.95   base
2.6.6-rc1  291915  8     151.69  50.1%  273.69   97.0%

So the improvement is between 50% and 97%!

Profile 2.6.5 8-way throughput phase:
http://khack.osdl.org/stp/291346/profile/after_throughput_test_1-tick.sort

Profile 2.6.6-rc1 8-way throughput phase:
http://khack.osdl.org/stp/291915/profile/after_throughput_test_1-tick.sort

What I notice is that radix_tree_lookup is in the top 20 in the 2.6.5 profile, but not in 2.6.6-rc1. Could the radix tree changes be responsible for this?

DBT-3 is a read-mostly DSS workload, and the throughput phase is where we run multiple query streams (as many as we have CPUs). In this workload the database is stored on a file system, but it is small relative to the amount of memory (4GB and 8GB), so it is almost completely cached in the page cache early on. There is some physical I/O in the first few minutes, but very little to none in the remainder.

--
Mary Edie Meredith [EMAIL PROTECTED] 503-626-2455 x42
Open Source Development Labs
[HACKERS] GiST -- making my index faster makes it slower
I just tried an experiment that I thought would improve the performance of the PostGIS spatial index.

The old way used a BOX (bounding box of 4 doubles) as the key in the index. The new way uses a bounding box of 4 single-precision floats (it's only 16 bytes long - a BOX is 32). I thought that it would significantly reduce the size of the index (it did) and that the indexing would be faster, since there are fewer disk pages to fiddle through and those pages would be more likely to fit in the disk cache.

It turns out this is not the case - it's significantly slower. I think it has to do with more keys fitting in the leaf tuples. As far as I can tell, the GiST index looks through the internal nodes, then when it hits a leaf node, it searches through each of the keys in the leaf. I save time searching through the internal nodes (because there are fewer of them) - this is an O(log n) saving. I lose time searching through the leaves (because there are more keys in a leaf) - this is an O(n) cost.

For a query that does a nested loop of 10,000 index scans (on the same table), the old method takes about 2 seconds; the new "faster" method takes about 20 seconds.

I'm still trying to verify that this is the actual reason why it's slower - anyone have any other ideas? Anyone know a good way to profile this?

dave
Re: [HACKERS] GiST -- making my index faster makes it slower
> Couple of observations:
>
> 1. Are you sure your data is handled as 32-bit all the way through?
> Run-time casting will offset performance gains on 32-bit floats. Is
> your comparison routine casting to double?

I thought this might be the case - but I thought the effect would be small. The only place it might be doing hidden casts would be in statements like query->xmin <= key->xmax.

> 2. Math CPUs usually crunch at 80 bits; you can't save much by using
> 32-bit floats, although cache coherency will be better.
>
> 3. A 64-bit CPU will probably run better on 64-bit floats.
>
> 4. Is your dataset congested enough that you now have duplicate values
> by loss of precision? This would of course impact performance. How big
> is your dataset? How big is your avg. result set?

My test datasets are quite spatially separate, so I don't expect there to be any accidental overlaps. There are about 12 million rows (in total) in the datasets - I've only noticed about 2 of these overlaps. My test datasets are 1,000 rows, 10,000 rows, 10,000,000 rows, and a few different ones in the 200,000-row range. I'm testing queries that return anywhere from 1 geometry to about 10,000.

The actual index search is only a few milliseconds longer using the float32 bounding box for a mid-sized table returning a handful of rows. When the result sets get bigger (or you start doing nested queries), the performance differences become greater.

dave
Re: [HACKERS] 7.5 beta version
Jürgen Cappel wrote:
> Point 1 I completely agree on: byte order, alignment, padding, etc. is
> different for each platform and data cannot directly be exchanged.
> Point 2: who really needs C++ ??

We use it: a multi-path TCP router written in C++, and behind it there is a PostgreSQL...

Regards
Gaetano Mendola
Re: [HACKERS] GiST -- making my index faster makes it slower
David Blasby [EMAIL PROTECTED] writes:
> The old way used a BOX (bounding box of 4 doubles) as the key in the
> index. The new way uses a bounding box of 4 single-precision floats
> (it's only 16 bytes long - a BOX is 32).
> It turns out this is not the case - it's significantly slower. I think
> it has to do with more keys fitting in the leaf tuples.

Hm. With twice as many keys per page, you'd think that it could be no more than 2x slower, if the problem is one of O(n) search through the keys on the page. That doesn't explain a 10x slowdown. Could it be that some part of the GiST code is O(n^2), or worse, in the number of keys per page? If so, perhaps it's fixable. I'd suggest profiling the backend with both key types to get an idea of where the time is going.

regards, tom lane

PS: actually, allowing for the 12-byte index tuple overhead, you couldn't have even twice as many keys per page. So there's something mighty odd here. Keep us posted.
Re: [HACKERS] GiST -- making my index faster makes it slower
Tom Lane wrote:
> I'd suggest profiling the backend with both key types to get an idea of
> where the time is going.

I've been trying to use gprof to do some profiling, but I'm having troubles. What's the best way to profile?

> PS: actually, allowing for the 12-byte index tuple overhead, you
> couldn't have even twice as many keys per page. So there's something
> mighty odd here. Keep us posted.

Using the old system, I'd get about 7 internal node hits and about 110 leaf hits. Under the new one, I get about 4 internal node hits and about 160 leaf hits.

I'm just in the process of changing the key to:

typedef struct
{
    float xmin;
    float ymin;
    float xmax;
    float ymax;
    char  junk[16];   /* make the 16-byte type into 32! */
} BOX2DFLOAT4;

to see what happens.

dave
Re: [HACKERS] GiST -- making my index faster makes it slower
David Blasby [EMAIL PROTECTED] writes:
> Tom Lane wrote:
> > I'd suggest profiling the backend with both key types to get an idea
> > of where the time is going.
> I've been trying to use gprof to do some profiling, but I'm having
> troubles. What's the best way to profile?

It's not hard; the only real gotcha is that on Linux systems you need to compile with -DLINUX_PROFILE so that the postmaster works around some Linux brain damage with dropping the profile status at fork(). I usually do (in an already configured and built source tree):

    cd src/backend
    make clean
    make PROFILE="-pg -DLINUX_PROFILE" all
    make install-bin

Don't forget that the backends will drop their gmon.out files in $PGDATA/data/DBOID/gmon.out.

regards, tom lane
Re: [HACKERS] GiST -- making my index faster makes it slower
Humm -- strange results here:

typedef struct
{
    float xmin;
    float ymin;
    float xmax;
    float ymax;
} BOX2DFLOAT4;

This takes about 18,000 ms to do a nested query with 10,000 iterations.

typedef struct
{
    float xmin;
    float ymin;
    float xmax;
    float ymax;
    char  junk[16];
} BOX2DFLOAT4;

This takes about 15,000 ms to do a nested query with 10,000 iterations.

typedef struct
{
    double xmin;
    double ymin;
    double xmax;
    double ymax;
} BOX2DFLOAT4;

This takes about 1,500 ms to do a nested query with 10,000 iterations. Yes - that's almost 14 seconds faster!

This doesn't make a lot of sense, since the only part of the GiST index that's being called that actually looks at the bounding boxes is this:

retval = (((key->xmax >= query->xmax) && (key->xmin <= query->xmax)) ||
          ((query->xmax >= key->xmax) && (query->xmin <= key->xmax)))
         &&
         (((key->ymax >= query->ymax) && (key->ymin <= query->ymax)) ||
          ((query->ymax >= key->ymax) && (query->ymin <= key->ymax)));

dave
Re: [HACKERS] PostgreSQL configuration
Joe Conway wrote:
> Tom Lane wrote:
> > Personally I rely quite a lot on setting PGDATA to keep straight which
> > installation I'm currently working with, so I'm not going to be happy
> > with a redesign that eliminates that variable without providing an
> > adequate substitute :-(
> I'll second that.

I'll third (or whatever) it too :-)
Re: [HACKERS] GiST -- making my index faster makes it slower
David Blasby [EMAIL PROTECTED] writes:
> Humm -- strange results here:
>
> typedef struct
> {
>     float xmin;
>     float ymin;
>     float xmax;
>     float ymax;
> } BOX2DFLOAT4;
>
> This takes about 18,000 ms to do a nested query with 10,000 iterations.
>
> typedef struct
> {
>     float xmin;
>     float ymin;
>     float xmax;
>     float ymax;
>     char  junk[16];
> } BOX2DFLOAT4;
>
> This takes about 15,000 ms to do a nested query with 10,000 iterations.

That is strange --- I'm still suspecting an O(n^2) bit of logic somewhere. But the size of the discrepancy is a little bit more reasonable. Could you profile these two cases and see where the extra time goes?

> typedef struct
> {
>     double xmin;
>     double ymin;
>     double xmax;
>     double ymax;
> } BOX2DFLOAT4;
>
> This takes about 1,500 ms to do a nested query with 10,000 iterations.
> Yes - that's almost 14 seconds faster!

AFAICS there are only two possible explanations:

1. Float-vs-float comparison is a lot slower than double-vs-double on your hardware. Actually, the comparisons might be float-vs-double (is the query struct the same as the key??). The compiler could well be promoting one or both floats to double and then doing double comparison, but even so, that's a heck of a large overhead.

2. The float bounding boxes are presumably slightly inaccurate. If they are a tad too large then perhaps you are visiting more leaf pages than you need to. (A worrisome possibility is that a float box could be slightly too small, causing you to fail to visit an entry you should :-()

regards, tom lane
Re: [HACKERS] PostgreSQL configuration
Tom Lane wrote:
> Kevin Brown [EMAIL PROTECTED] writes:
> > The goal here is simply to make it obvious to a system administrator
> > where the PG data directory that a given postmaster is using resides.
> Why would it not be sufficient to add a read-only GUC variable that
> tells that? Connect to the postmaster and do "show datadir" and you're
> done. (Without this, it's not clear you've made any particular gain
> anyway, since "a given postmaster" would typically mean "the one I can
> connect to at this port", no?)

That would probably be sufficient for most cases. It wouldn't take care of the case where there's a strict separation of powers between the system administrator and the DBA, but only if the system were managed badly (i.e., the SA and the DBA don't talk to each other very well). That's probably something we shouldn't concern ourselves with.

> In any case I don't see how removing PGDATA would make this more
> obvious. You yourself just pointed out that the command-line arguments
> of a postmaster aren't necessarily visible through ps; if they're not,
> what have you gained in transparency by forbidding PGDATA?

I think you misunderstood what I was saying (which means I didn't say it right). There are ways within a program to change what 'ps' shows as the command line. We use those methods to make it possible to see what a given backend is doing by looking at the 'ps' output. It would be possible to have the postmaster use those ways in order to show which data directory it is using even if it wasn't specified on the command line.

But in my experience, those ways don't work reliably on all systems. On the systems where those methods don't work, what 'ps' shows is the original command line that was used. So clearly, the only way 'ps' will show the data directory in that instance is if it was actually specified on the command line.

> > In any case, I'm not at all opposed to having the backend stuff know
> > about PGDATA during development, but for production you should have to
> > explicitly specify the data directory on the command line.
> If you wish to do things that way, you can; but that doesn't mean that
> everyone else should have to do it that way too. If there were a
> security or reliability hazard involved, I might agree with taking the
> fascist approach, but I see no such hazard here ...

Fair enough. The PGDATA issue isn't a big enough one that I'm terribly concerned about it, especially if a read-only GUC variable is available to give that information (something that, I think, should be there anyway).

--
Kevin Brown [EMAIL PROTECTED]
Re: [HACKERS] Remove MySQL Tools from Source?
On Fri, 16 Apr 2004, Christopher Kings-Lynne wrote:
> > I always ran one of the 2 scripts (can't remember which one) and after
> > that started checking the dump file, because there were things that
> > didn't get changed correctly[1].
> >
> > [1]: I always remember the first conversion I did. I found out that
> > MySQL accepted dates like 30/2/2000 or 0-0-. Very odd.
> Yes, MySQL has always accepted those as perfectly valid dates. It's
> quite broken.

The sad thing is that while MySQL implemented a -ansi switch that supposedly turns on ANSI compliance, it only fixes about half of all the non-compliance issues. Yet another half-implemented feature... :)
Re: [HACKERS] [GENERAL] Accessing RelOptInfo structure from the executor module
Shalu Gupta [EMAIL PROTECTED] writes:
> I want to access the RelOptInfo data structure from the executor module.

You can't, because it doesn't exist any more. RelOptInfo is a planner internal data structure, and is probably pfree'd before the executor starts. Perhaps you could explain your problem at a higher level?

regards, tom lane