Re: [HACKERS] Indexes vs. cache flushes
On Thu, Jan 19, 2006 at 02:46:11AM -0500, Tom Lane wrote: The concerns that I find more interesting are changes in the underlying objects. We don't have an ALTER OPERATOR CLASS, much less an ALTER ACCESS METHOD, but it's certainly theoretically possible to change the definition of a support function used by an index. There isn't presently any mechanism to force timely propagation of such a change, and so you'd be largely on your own --- but realistically, wouldn't such a change require rebuilding the index anyway?

I wondered about the same problem when dealing with the collation stuff. If you change anything about a collation, you essentially have to invalidate any indexes, plans or views using it. Like you say, there isn't really a way of doing this.

This isn't the first time I've wondered about a flag on an index marking it broken, pending rebuild. If one of these details changes, we really need to stop using the indexes ASAP until they're back in a consistent state. The only question remaining is when to do the rebuild: you can either wait for manual intervention or (probably better) simply do the REINDEXes in the same transaction as the ALTER OPERATOR CLASS et al.

The locking would be painful; perhaps a better way would be to clone the opclass, clone the indexes with the new opclass, build them, and then drop the old indexes. Once you've rebuilt the views and indexes, simply delete the old opclass at the next database restart, since plans don't survive a DB restart, right? (Isn't this Read-Copy-Update style locking?)

Have a nice day, -- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/ Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a tool for doing 5% of the work and then sitting around waiting for someone else to do the other 95% so you can sue them.
[HACKERS] restrict column-level GRANTs to a single relation?
Fellow hackers, I'm curious about the best way to handle something like this: GRANT SELECT (col1, col2, col3) ON table1, table2 TO grantee; Is it reasonable to restrict this to a single relation, and throw an error if multiple relations are specified? That would require the preceding grant to be specified as: GRANT SELECT (col1, col2, col3) ON table1 TO grantee; GRANT SELECT (col1, col2, col3) ON table2 TO grantee; The SQL standards don't seem to mandate the first form (unless I misread?). Do y'all think this is a reasonable compromise? -- kevin brintnall =~ [EMAIL PROTECTED] ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Surrogate keys (Was: enums)
On Thu, Jan 19, 2006 at 12:06:41AM -0500, Tom Lane wrote: Michael Glaesemann [EMAIL PROTECTED] writes: As far as I can tell, the only difference between your position, Dann, and Date and Darwen's, is that you think no natural key is immutable. DD's examples of natural keys are worth a second look though: If a primary key exists for a collection that is known never to change, for example social security number, student registration number, or employee number, then no additional system-assigned UID is required.

The problem with SSN is that somebody other than you controls it.

No, that's not the big problem. The big problem is that it's very likely illegal for you to use it for anything unless you happen to be the Social Security Administration.

If you are the college registrar, then you control the student's registration number, and you don't have to change it. In fact, guess what: you probably generated it in the same way as a surrogate key.

True. I'd argue that all of these are in reality the exact same thing as a surrogate key --- from the point of view of the issuing authority.

But from anyone else's point of view, they are external data and you can't hang your own database design on the assumption that they won't change.

Right :) Cheers, D -- David Fetter [EMAIL PROTECTED] http://fetter.org/ phone: +1 415 235 3778 Remember to vote!
Re: [HACKERS] Surrogate keys (Was: enums)
On Thu, Jan 19, 2006 at 12:50:52AM -0800, David Fetter wrote: On Thu, Jan 19, 2006 at 12:06:41AM -0500, Tom Lane wrote: No, that's not the big problem. The big problem is that it's very likely illegal for you to use it for anything unless you happen to be the Social Security Administration.

Actually no. From reading here[1] it appears anyone is allowed to ask you for your SSN and they can do what they like with it. What you're describing is more like the TFN in Australia. Apart from the fact that you're not required to have one or provide it if asked, if you're not a bank, share registry, or some other such institution, you're not allowed to ask for it, let alone store it. Medicare numbers are the same: if you're not a health service provider, you can't ask for one. Anyway, this doesn't mean an SSN is a good key, for all sorts of other reasons people have already stated.

[1] http://www.cpsr.org/prevsite/cpsr/privacy/ssn/ssn.faq.html#IsItIllegalToAsk

Have a nice day, -- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/
[HACKERS] Bogus path in postmaster.opts
Since 8.0, postmaster.opts has contained .../bin/postgres even though it was the postmaster that was started. This was evidently broken by some Windows-related reshuffling. Earlier, CreateOptsFile was called with argv; now it's passed the result of find_my_exec instead. I'm not sure whether that change was wrong to begin with or whether find_my_exec is misbehaving (it should return something containing postmaster, no?). -- Peter Eisentraut http://developer.postgresql.org/~petere/
Re: [HACKERS] pgxs/windows
Bruce Momjian wrote: OK, thanks. Next question --- are the installed file locations the same for a MinGW install and a pginstaller install? I don't think pginstaller does a MinGW install because it doesn't have the build environment in the tarball. However, the big difference seems to be that Magnus has -Llib and -Lbin, while you have only the -Lbin. I have MinGW and pginstaller installed here. How can I set things up to test this?

Now looking at the Makefile.global in the 8.1.2 pginstaller install, $libdir is set in a pgxs-specific block: libdir := $(shell pg_config --libdir) and that seems to work:

C:\Program Files\PostgreSQL\8.1\bin> pg_config --libdir
C:/PROGRA~1/POSTGR~1/8.1/lib

and that is added to LDFLAGS, which is later propagated to SHLIB_LINK, though SHLIB_LINK moves all the -L flags to the front, so what you see on the link line is not the ordering used to create the value. Andrew, can you try echoing $libdir and $SHLIB_LINK in the Makefile to find those values?

Here is a test case log (including a test makefile). I ran the tests, as you can see, with both installer 8.1 and MinGW-installed CVS tip, with the same results.

cheers andrew

$ touch foo.c
$ cat Makefile
MODULES = foo
SRCS += foo.c
OBJS = $(SRCS:.c=.o)
PGXS := $(shell pg_config --pgxs)
include $(PGXS)
override CFLAGS := $(filter-out -Wendif-labels -Wdeclaration-after-statement, $(shell pg_config --cflags))
showme:
	@echo libdir = $(libdir)
	@echo SHLIB_LINK = $(SHLIB_LINK)
$ which pg_config
/c/Program Files/PostgreSQL/8.1/bin/pg_config
$ rm -f foo.dll
$ make
gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Winline -fno-strict-aliasing -I. -Ic:/PROGRA~1/POSTGR~1/8.1/include/server -Ic:/PROGRA~1/POSTGR~1/8.1/include/internal -I./src/include/port/win32 -DEXEC_BACKEND -I/mingw/include/krb5 -Ic:/PROGRA~1/POSTGR~1/8.1/lib/pgxs/src/MAKEFI~1/../../src/include/port/win32 -Ic:/PROGRA~1/POSTGR~1/8.1/include/server/port/win32 -c -o foo.o foo.c
dlltool --export-all --output-def foo.def foo.o
dllwrap -o foo.dll --def foo.def foo.o c:/PROGRA~1/POSTGR~1/8.1/lib/pgxs/src/MAKEFI~1/../../src/utils/dllinit.o -Lc:/PROGRA~1/POSTGR~1/8.1/bin -lpostgres
c:\mingw\bin\..\lib\gcc-lib\mingw32\3.2.3\..\..\..\..\mingw32\bin\ld.exe: cannot find -lpostgres
c:\mingw\bin\dllwrap.exe: c:\mingw\bin\gcc exited with status 1
make: *** [foo.dll] Error 1
rm foo.o
$ make showme
libdir = c:/PROGRA~1/POSTGR~1/8.1/lib
SHLIB_LINK = -Lc:/PROGRA~1/POSTGR~1/8.1/bin -lpostgres
$ export PATH=/usr/local/pgsql/bin:$PATH
$ which pg_config
/usr/local/pgsql/bin/pg_config
$ rm -f foo.dll
$ make
gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Winline -fno-strict-aliasing -I. -IC:/msys/1.0/local/pgsql/include/server -IC:/msys/1.0/local/pgsql/include/internal -I./src/include/port/win32 -DEXEC_BACKEND -IC:/msys/1.0/local/pgsql/lib/pgxs/src/MAKEFI~1/../../src/include/port/win32 -IC:/msys/1.0/local/pgsql/include/server/port/win32 -c -o foo.o foo.c
dlltool --export-all --output-def foo.def foo.o
dllwrap -o foo.dll --def foo.def foo.o C:/msys/1.0/local/pgsql/lib/pgxs/src/MAKEFI~1/../../src/utils/dllinit.o -LC:/msys/1.0/local/pgsql/bin -lpostgres
c:\mingw\bin\..\lib\gcc-lib\mingw32\3.2.3\..\..\..\..\mingw32\bin\ld.exe: cannot find -lpostgres
c:\mingw\bin\dllwrap.exe: c:\mingw\bin\gcc exited with status 1
make: *** [foo.dll] Error 1
rm foo.o
$ make showme
libdir = C:/msys/1.0/local/pgsql/lib
SHLIB_LINK = -LC:/msys/1.0/local/pgsql/bin -lpostgres
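In both runs above, ld fails for the same reason: the only library search path on the link line is the install's bin directory, while MinGW's libpostgres import library lives under lib. A hedged, untested workaround sketch (not an official pgxs fix) is to append the pg_config libdir to SHLIB_LINK yourself after including PGXS:

```make
# Untested workaround sketch for the "cannot find -lpostgres" failure above:
# pgxs only passes -L<installdir>/bin, but the MinGW import library
# (libpostgres.a) is under lib/, so add pg_config --libdir to the search path.
MODULES = foo
PGXS := $(shell pg_config --pgxs)
include $(PGXS)
SHLIB_LINK += -L$(libdir)
```

This only papers over the symptom on the client side; the real question remains why pgxs emits -Lbin but not -Llib on Windows.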
Re: [HACKERS] Unique constraints for non-btree indexes
On Wed, Jan 18, 2006 at 05:27:29PM -0500, Tom Lane wrote: Right, the deadlock risk is exactly the reason you need some secret sauce or other. Btree's page-level lock ensures that two insertions of conflicting keys can't overlap (even if they ultimately get stored on different pages). That's not the only way to fix this but it's a pretty good way.

Ok, I'm not that great with locking issues, but how about this:

1. In the root page of the GiST index you store a counter, let's call it the Insertion ID (IID).
2. An index tuple has a state where it is not visible yet and contains an IID.
3. When you go to insert, you follow the normal GiST method all the way down to the leaf node.
4. Once you know where you're going to put it, take an exclusive lock on the page.
5. Now, take an exclusive lock on the root page, read the IID (this one is yours) and store the next one back in the root page.
6. Now write the index tuple to the leaf page, recording your IID.
7. Release the root page lock, then the leaf page lock. The thing to note is that once someone has gotten a particular IID, all IIDs less than it will already have been written to the index leaf pages. Also, there should be no risk of deadlock since we lock the pages in a consistent order.
8. Do a normal index scan with your ~ operator. Ignore any provisional tuples with an IID greater than yours. If a match has an IID less than yours you may block on it using the same logic as for b-tree indexes.
9. Once the scan is completed and you know that there are no duplicates with your IID or less, make the tuple fully visible (for your transaction anyway). At this point the IID is no longer relevant and you can forget it.

What I tried to achieve was avoiding holding any locks while doing scans, since for GiST indexes they may cover a lot of ground. Perhaps a better name for IID is Generation ID?
Storing the IID could take extra space, but you could probably overlap it with the ctid since no-one's going to look up the data tuple before it's fully visible. OTOH, if the backend dies before completing the process, how does one clean up the provisional index tuple? The ctid would provide a way for VACUUM to know when to remove it.

BTW, the deadlock risk also applies to deferred uniqueness checks. Again, in btree it's possible to avoid this if you do a fresh indexscan (and take a lock on the first scanned page while you do that). If you try to do it without consulting the index then you need some other way to break ties. I think the above should work for deferred checks as well, as long as you store the list of tuples to check for each inserted tuple. These lists should form a directed acyclic graph so there should be no risk of deadlock. Conflicts with VACUUM are still an issue. Any thoughts?

Have a nice day, -- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/
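The ordering rule in the proposal above can be sketched in a few lines. This is a hypothetical, single-process simulation (no real page locking, and duplicates fail immediately instead of blocking); the point is only the IID discipline: an inserter compares itself against visible entries and against provisional entries whose IID is smaller than its own, so no two inserters can ever wait on each other.

```python
# Hypothetical simulation of the IID scheme sketched in the proposal above.
# Names (GistIndexSim, UniqueViolation) are invented for illustration.

class UniqueViolation(Exception):
    pass

class GistIndexSim:
    def __init__(self):
        self.next_iid = 0      # stands in for the counter on the root page
        self.entries = []      # each entry: {"key", "iid", "visible"}

    def insert(self, key):
        # Steps 4-7: take the "locks", grab an IID, write a provisional entry.
        iid = self.next_iid
        self.next_iid += 1
        entry = {"key": key, "iid": iid, "visible": False}
        self.entries.append(entry)
        # Step 8: scan; ignore provisional entries with an IID greater than ours.
        for other in self.entries:
            if other is entry or other["key"] != key:
                continue
            if other["visible"] or other["iid"] < iid:
                self.entries.remove(entry)   # back out our provisional entry
                raise UniqueViolation(key)
        # Step 9: no duplicate with IID <= ours; make the entry fully visible.
        entry["visible"] = True
        return iid

idx = GistIndexSim()
idx.insert("a")
idx.insert("b")
try:
    idx.insert("a")
except UniqueViolation:
    print("duplicate rejected")   # prints "duplicate rejected"
```

Because the comparison is only ever against strictly smaller IIDs, the wait-for relation is a DAG, matching the no-deadlock claim for the deferred-check case as well.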
Re: [HACKERS] No heap lookups on index
As an Oracle internals person myself, I don't see how making a comparison between the specifics of Oracle's MVCC and PostgreSQL's MVCC is relevant to this discussion. Like most other commercial databases, Oracle's storage manager performs update-in-place whereas PostgreSQL's (for the most part) does not. There are several ways to implement update-in-place, and Oracle has chosen its own rollback segment methodology, which has issues that, without tuning, are major hassles. I'm not saying that one is better than the other in ALL cases, but I and many other Oracle consultants have tuned Oracle installations to eliminate the headaches others on this list have mentioned. Any knowledgeable Oracle person evaluating PostgreSQL who may be reading this list is just going to see it as a lot of anti-Oracle discussion with no basis in fact. Regardless, there is NO WAY to perform an apples-to-apples comparison between the implementations, locking strategies, etc. as the MVCC implementations and goals are completely different.

In short, Oracle's implementation is not perfect; neither is ours. Oracle's initial design (as a commercial database) is much different from PostgreSQL's (as a research database). While I'm always game for looking at other implementations when designing and discussing new features for PostgreSQL, let's focus on making PostgreSQL better rather than spending time discussing unrealistic comparisons. If we want to do a comparison of how/why Oracle's index implementation is faster in the context of this situation and how we could make PostgreSQL's faster, let's stick to that.
On 1/19/06, Tom Lane [EMAIL PROTECTED] wrote: Greg Stark [EMAIL PROTECTED] writes: I wonder if the bitmap can actually be one bit per page actually.

Yeah, I think we'd agreed that per-page was the way to go. Per-tuple bitmaps are painful to manage because of the variable number of tuples per page. And really all you need to know is whether to read the page or not --- once you have, examining multiple tuples on it doesn't cost much. regards, tom lane
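The per-page idea can be sketched quickly. This is an illustrative toy (not PostgreSQL source, and the helper names are invented): TIDs are (block, line pointer) pairs, but the bitmap only records the block, since a page that matches must be read in full anyway and rechecking its tuples is cheap.

```python
# Illustrative sketch of a one-bit-per-heap-page bitmap for a bitmap
# index scan.  Only block numbers are recorded; line pointers are dropped.

def build_page_bitmap(tids, relation_pages):
    """Set one bit per heap page that contains at least one matching TID."""
    bitmap = bytearray((relation_pages + 7) // 8)
    for block, _offset in tids:
        bitmap[block // 8] |= 1 << (block % 8)
    return bitmap

def pages_to_read(bitmap):
    """Yield the block numbers whose bit is set, in heap order."""
    for block in range(len(bitmap) * 8):
        if bitmap[block // 8] & (1 << (block % 8)):
            yield block

# Four matching tuples, but only three distinct pages need to be read.
tids = [(0, 3), (0, 7), (5, 1), (9, 2)]
bitmap = build_page_bitmap(tids, relation_pages=10)
print(list(pages_to_read(bitmap)))   # prints [0, 5, 9]
```

Note the side benefit Tom mentions: the bitmap size is fixed by the relation's page count, not by the (variable) number of tuples per page.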
Re: [HACKERS] restrict column-level GRANTs to a single relation?
On Thursday, 19 January 2006 09:50, kevin brintnall wrote: GRANT SELECT (col1, col2, col3) ON table1, table2 TO grantee; Is it reasonable to restrict this to a single relation, and throw an error if multiple relations are specified?

Yes -- Peter Eisentraut http://developer.postgresql.org/~petere/
Re: [HACKERS] Surrogate keys (Was: enums)
Martijn van Oosterhout wrote: Please provide natural keys for any of the following: - A Person - A phone call: (from,to,date,time,duration) is not enough - A physical address - A phone line: (phone numbers aren't unique over time) - An internet account: (usernames not unique over time either)

Ahh, a challenge. Hmm, not sure about all of them, but here goes: A Person - well, you could use a bit map of their fingerprints, or maybe their retinal scan. Of course, that could change due to serious injury. Maybe some kind of representation of their DNA? A physical address - how about longitude/latitude/height from sea level?

The point here is two-fold. First, what we call 'natural' is frequently itself a surrogate key (yes, even your name is really just a surrogate key. As with all surrogate keys, it is a sequence of symbols that you use to represent yourself). The second point is that even when you find a truly 'natural' key (something not arbitrarily made up by anyone, and uniquely identifying the data in question), it may be completely and utterly inappropriate to use in a database.

What is 'natural' anyway? If someone phones in an order, we usually assign an order number to that request. This order number is not the actual order, and the customer couldn't care less what it is, but I've never heard a DBA argue we should get rid of it (well, to be fair, I've never discussed order numbers with a DBA at all). After all, would it make sense for the key for that order to be the customer's name, the date/time of the order, all the items ordered, and the address to ship the order to? That isn't a key, but it's the only 'natural' thing that identifies that order that immediately comes to my mind. On the other hand, would anyone argue that an order_item table should have a surrogate key? Well, I wouldn't. The key for the order_item table should be something like the order number and the inventory item number together (IMHO). The point?
Surrogate keys and natural keys are two tools in the database arsenal. Just as it is unwise to use a hammer to drive a screw just because you don't believe in screwdrivers, it is unwise to just offhand discard either method of specifying a key. Rather, use intelligence and education (one source of which is discussions such as this) in deciding how best to represent your data to aid performance, ease of use, and adaptability. Mike Pollard SUPRA Server SQL Engineering and Support Cincom Systems, Inc.
Re: [HACKERS] Surrogate keys (Was: enums)
On Thu, Jan 19, 2006 at 09:37:12AM -0500, Pollard, Mike wrote: Martijn van Oosterhout wrote: Please provide natural keys for any of the following: - A Person - A phone call: (from,to,date,time,duration) is not enough - A physical address - A phone line: (phone numbers aren't unique over time) - An internet account: (usernames not unique over time either) Ahh, a challenge. Hmm, not sure about all of them, but here goes: A Person - well, you could use a bit map of their fingerprints, or maybe their retinal scan. Of course, that could change due to serious injury. Maybe some kind of representation of their DNA?

Yes. Representation of the DNA is probably best. But - that's a lot of data to use as a key in multiple tables. :-)

A physical address - how about longitude/latitude/height from sea level?

Planet? Solar system? Galaxy? Universe? :-)

I agreed with what you had to say (the stuff I deleted). Just felt like being funny. Not sure if I'm successful. Hehe...

Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] | Neighbourhood Coder | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/
Re: [HACKERS] restrict column-level GRANTs to a single relation?
kevin brintnall [EMAIL PROTECTED] writes: GRANT SELECT (col1, col2, col3) ON table1, table2 TO grantee; Is it reasonable to restrict this to a single relation, and throw an error if multiple relations are specified?

The SQL spec doesn't actually allow multiple things after GRANT ... ON --- that's a PG extension. So you could make the restriction and not violate the spec. OTOH it seems unlikely that this would save much. regards, tom lane
Re: [HACKERS] No heap lookups on index
On Thu, Jan 19, 2006 at 01:56:51AM -0500, Greg Stark wrote: Tom Lane [EMAIL PROTECTED] writes: Christopher Kings-Lynne [EMAIL PROTECTED] writes: Oracle does, but you pay in other ways. Instead of keeping dead tuples in the main heap, they shuffle them off to an 'undo log'. This has some downsides: Rollbacks take *forever*, though this usually isn't much of an issue unless you need to abort a really big transaction. It's a good point though. Surely a database should be optimised for the most common operation - commits, rather than rollbacks? The shuffling off of the data is expensive in itself, so I'm not sure you can argue that the Oracle way is more optimal for commits either. You pay in Oracle when you read these records too. If there are pending updates you have to do a second read to the rollback segment to get the old record. This hits long-running batch queries especially hard since by the time they finish a large number of the records they're reading could have been updated and require a second read to the rollback segments.

You pay the same cost in PostgreSQL though... If you index-scan to a dead tuple, you get pointed to where the new one is. And if you're seqscanning, well, you'll be reading everything anyway.

You also pay if the new value is too big to fit in the same space as the old record. Then you get to have to follow a pointer to the new location. Oracle tries to minimize that by intentionally leaving extra free space but that has costs too.

Again, similar to the cost with our MVCC. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Re: [HACKERS] 8.0.5 Bug in unique indexes?
Tom Lane wrote: Joshua D. Drake [EMAIL PROTECTED] writes: Odd problem with unique indexes: What's the database's locale? This could be the same problem fixed in 8.0.6, if the locale has weird ideas about what string equality means.

lc_collate  | C
lc_ctype    | C
lc_messages | en_US.UTF-8
lc_monetary | en_US.UTF-8
lc_numeric  | en_US.UTF-8
lc_time     | en_US.UTF-8

Sincerely, Joshua D. Drake -- The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564 PostgreSQL Replication, Consulting, Custom Development, 24x7 support Managed Services, Shared and Dedicated Hosting Co-Authors: PLphp, PLperl - http://www.commandprompt.com/
Re: [HACKERS] 8.0.5 Bug in unique indexes?
Joshua D. Drake [EMAIL PROTECTED] writes: Tom Lane wrote: What's the database's locale? This could be the same problem fixed in 8.0.6, if the locale has weird ideas about what string equality means. lc_collate | C lc_ctype | C

OK, scratch that theory. Don't suppose you can create a reproducible test case ;-) regards, tom lane
Re: [HACKERS] 8.0.5 Bug in unique indexes?
lc_collate | C lc_ctype | C OK, scratch that theory. Don't suppose you can create a reproducible test case ;-)

That may be a bit tough... What really struck me is that the duplication only occurs in this set of 100 tables and the duplication is always violating the same index. We currently have 4-5 tables that are in violation. Let me see what I can do to duplicate this. Sincerely, Joshua D. Drake -- The PostgreSQL Company - Command Prompt, Inc.
Re: [HACKERS] Surrogate keys
[EMAIL PROTECTED] writes: On Thu, Jan 19, 2006 at 09:37:12AM -0500, Pollard, Mike wrote: Martijn van Oosterhout wrote: Please provide natural keys for any of the following: - A Person - A phone call: (from,to,date,time,duration) is not enough - A physical address - A phone line: (phone numbers aren't unique over time) - An internet account: (usernames not unique over time either) Ahh, a challenge. Hmm, not sure about all of them, but here goes: A Person - well, you could use a bit map of their fingerprints, or maybe their retinal scan. Of course, that could change due to serious injury. Maybe some kind of representation of their DNA? Yes. Representation of the DNA is probably best. But - that's a lot of data to use as a key in multiple tables. :-)

That is arguably about the best choice possible, for a human being, as DNA isn't supposed to be able to change (much). Mind you, there do exist odd cases where a person might have two sets of DNA in different parts of their body. This commonly (well, it's not really very common...) happens when non-identical twins share a blood supply; that twins were involved may not be noticed if one does not survive to birth...

A physical address - how about longitude/latitude/height from sea level? Planet? Solar system? Galaxy? Universe? :-) I agreed with what you had to say (the stuff I deleted). Just felt like being funny. Not sure if I'm successful. Hehe...

Well, that's useful for representing a key for a piece of real estate. It's fairly interestingly useless for representing a human attribute, at least in terms of being a primary key... -- output = (cbbrowne @ acm.org) http://cbbrowne.com/info/linux.html It's a pretty rare beginner who isn't clueless. If beginners weren't clueless, the infamous Unix learning cliff wouldn't be a problem. -- david parsons
Re: [HACKERS] 8.0.5 Bug in unique indexes?
Joshua D. Drake wrote: Tom Lane wrote: What's the database's locale? This could be the same problem fixed in 8.0.6, if the locale has weird ideas about what string equality means. lc_collate | C lc_ctype | C

You don't use pl/perl, do you -- i.e. I guess you read the latest release notes and the thread here before that? Best Regards, Michael
Re: [HACKERS] No heap lookups on index
Jonah H. Harris [EMAIL PROTECTED] writes: As an Oracle internals person myself, I don't see how making a comparison between the specifics of Oracle's MVCC to PostgreSQL's MVCC is relevant to this discussion. As do most other commercial databases, Oracle's storage manager performs an update-in-place whereas PostgreSQL's (for the most part) does not. There are several ways to implement update-in-place, and Oracle has chosen their own rollback segment methodology which has issues that, without tuning, are major hassles. I'm not saying that one is better than the other in ALL cases, but I and many other Oracle consultants have tuned Oracle installations to eliminate the headaches others in this list have mentioned. Any knowledgeable Oracle person evaluating PostgreSQL that may be reading this list is just going to see it as a lot of anti-Oracle discussion with no basis in fact. Regardless, there is NO WAY to perform an apples-to-apples comparison between the implementations, locking strategies, etc. as the MVCC implementations and goals are completely different. ...

Well it seems there were lots of facts posted. Yes you can avoid headaches caused by these issues, but we're not really talking about the headaches. We're comparing the performance costs of the update-in-place and non-update-in-place approaches. All of the costs named so far are to some degree fundamental costs of update-in-place. All you can hope to do in tuning a system is make sure the costs are kept within manageable bounds.

There are fundamental costs to non-update-in-place as well. The table sizes are bloated by the amount of space used to store older versions and the dead tuples that haven't been reused yet. Whether this slows down Postgres as much as having to do a second (or third or fourth) read to a rollback segment is a valid area for discussion. It's especially interesting to discuss since the two costs hit different sets of queries unequally.
If we want to do a comparison on the how/why Oracle's index implementation is faster in the context of this situation and how we could make PostgreSQL's faster, let's stick to that.

Well the main difference is the MVCC implementation. Talking about Oracle's index implementation while avoiding mentioning the elephant in the room would be sort of pointless. -- greg
Re: [HACKERS] Bogus path in postmaster.opts
Peter Eisentraut [EMAIL PROTECTED] writes: Earlier, CreateOptsFile was called with argv, now it's passed the result of find_my_exec instead. I'm not sure whether that change was wrong to begin with or whether find_my_exec is misbehaving (it should return something containing postmaster, no?).

find_my_exec is not misbehaving: it's designed to expand symlinks, and would in fact be pretty useless if it did not. We could go back to storing the raw value of argv[0] in the opts file, but I rather like having the full path in there.

There is another possible answer, and it's something I've been meaning to bring up for awhile. Is there a good reason why postmaster is a symlink to postgres, rather than a hard link? Surely the symlink way is not any more efficient. The thing that ticks me off about this is that it confuses gdb: if you say gdb postmaster and then run, what gets passed as argv[0] is postgres, thus breaking your test. So I not infrequently find myself manually replacing the symlink with a hard link in order to be able to test. If it'd result in nicer output in the opts file too, then all the more reason to change. regards, tom lane
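The symlink/hard-link distinction Tom describes can be demonstrated with stand-in names (not an actual postgres binary; the file names here are invented for illustration). A hard link is just a second directory entry for the same inode, so there is nothing for readlink-style resolution to expand back to "postgres":

```shell
# Demonstration: a symlink resolves back to its target name, a hard link
# does not, because both hard-link names are equally "the" file.
set -e
cd "$(mktemp -d)"

printf '#!/bin/sh\necho "argv0=$0"\n' > postgres
chmod +x postgres

ln -s postgres postmaster.sym     # symlink: resolvable to "postgres"
ln postgres postmaster.hard       # hard link: same inode, no indirection

readlink postmaster.sym           # prints: postgres
readlink postmaster.hard || echo "not a symlink"
test postmaster.hard -ef postgres && echo "same inode"

./postmaster.sym                  # prints: argv0=./postmaster.sym
./postmaster.hard                 # prints: argv0=./postmaster.hard
```

The shell itself passes the invoked name as argv[0] either way; the difference only shows up for tools like gdb or find_my_exec that canonicalize the path, which is exactly why replacing the symlink with a hard link preserves the postmaster name.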
Re: [HACKERS] No heap lookups on index
On 19 Jan 2006 11:25:21 -0500, Greg Stark [EMAIL PROTECTED] wrote: Well it seems there were lots of facts posted. Yes you can avoid headachescaused by these issues, but we're not really talking about the headaches.Several were mentioned; some of which could generally be avoided by good tuning. We're comparing the performance costs of what are update-in-place andnon-update-in-place approach. As PostgreSQL is not an update-in-place system, what is the point in discussing the costs? How does this solve David's original problem? There are fundamental costs to non-update-in-place as well. The table sizesare bloated by the amount of space used to store older versions and the deadtuples that haven't been reused yet. Whether this slows down Postgres as much as having to do a second (or third or fourth) read to a rollback segment is avalid area for discussion. It's especially interesting to discuss since thetwo costs hit different sets of queries unequally. I agree, but again, we're not talking apples-to-apples. There's far too many variables to compare Oracle's speed to PostgreSQL's for most types of operations in the varying types of database deployments. Well the main difference is the MVCC implementation. Talking about Oracle'sindex implementation while avoiding mentioning the elephant in the room would be sort of pointless.I agree that Oracle's MVCC plays *a little* into this index discussion, but isn't it pointless to discuss the pitfalls of an MVCC implementation that PostgreSQL does not have? Similarly, how does it solve David's original question. Again, I'm fine with discussing these things, but let's keep on topic for David's sake. He posted a problem that we have discussed many times over. Let's focus on that problem and give him possible options. 
David has stated that the index to heap visibility check is slowing him down, so what are the possible options:
- Visibility in indexes (-hackers archives cover the pros/cons)
- True organized heaps
- Block level index (Tom/Simon's earlier discussion)
Re: [HACKERS] Surrogate keys (Was: enums)
On Thu, Jan 19, 2006 at 09:37:12AM -0500, Pollard, Mike wrote: The point? Surrogate keys and natural keys are two tools in the database arsenal. Just as it is unwise to use a hammer to drive a screw just because you don't believe in screwdrivers, it is unwise to just off hand discard either method of specifying a key. Rather, use intelligence and education (one of which is discussions such as this) in deciding how best to represent your data to aide in performance, ease of use, and adaptability. There is one thing to consider: consistency. If you mix and match 'natural' keys and surrogate keys as PK, then how do you know which one you're supposed to be joining on? How does everyone else on the team know? Sure, there's many examples where you don't really need a surrogate key. But there's just as many (if not more) where you want a surrogate key so that you don't have to deal with the pain of a multiple-field key. (Note that I don't consider simply defining a multiple-field key to be unique as painful). So ISTM it's much easier to just use surrogate keys and be done with it. Only deviate when you have a good reason to do so. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.comwork: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
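[Editorial note: Jim's "surrogate key plus unique natural key" approach can be sketched as below. The table and column names are invented for illustration, not taken from the thread; note the multiple-field natural key is still declared unique, which Jim says he does not consider painful.]

```sql
-- Sketch: surrogate primary key for joining, with the natural key still
-- enforced by a UNIQUE constraint so duplicates are rejected.
CREATE TABLE employee (
    employee_id serial PRIMARY KEY,            -- surrogate key: join on this
    first_name  text    NOT NULL,
    last_name   text    NOT NULL,
    extension   integer NOT NULL,
    UNIQUE (first_name, last_name, extension)  -- natural key still unique
);

-- Child tables then carry only the single-column surrogate key:
CREATE TABLE timesheet (
    timesheet_id serial PRIMARY KEY,
    employee_id  integer NOT NULL REFERENCES employee (employee_id),
    hours        numeric NOT NULL
);
```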
Re: [HACKERS] Bogus path in postmaster.opts
Tom Lane wrote: find_my_exec is not misbehaving: it's designed to expand symlinks, and would in fact be pretty useless if it did not. I don't want to contest that in certain cases this is required, but I can easily come up with scenarios (which perhaps no PostgreSQL user has encountered yet) where the current behavior is broken. One example is a GNU Stow like installation management where each package is installed in a private directory and the canonical locations in /usr/local are symlinks. (It's altogether strange that this would distinguish between symbolic and hard links anyway, except that of course it cannot actually resolve hard links, since many installation schemes that one needs to cope with work the same with hard and soft links.) There is another possible answer, and it's something I've been meaning to bring up for awhile. Is there a good reason why postmaster is a symlink to postgres, rather than a hard link? I don't know of one. Something I have thought of during the recent options reorganization is that we could do away with the postmaster/postgres dichotomy altogether. Just call the thing postmaster and give it a --single-user-mode option. -- Peter Eisentraut http://developer.postgresql.org/~petere/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Surrogate keys (Was: enums)
Martijn, In any of these either misspellings, changes of names, ownership or even structure over time render the obvious useless as keys. There are techniques for detecting and reducing duplication but the point is that for any of these duplicates *can* be valid data. Please point me out where, in the writings of E.F. Codd or in the SQL Standard, it says that keys have to be immutable for the life of the row. Duplicate *values* can be valid data. Duplicate *tuples* show some serious flaws in your database design. If you have a personnel directory on which you've not bothered to define any unique constraints other than the ID column, then you can't match your data to reality. If you have two rows with the same first and last name, you don't know if they are two different people or the same person, duplicated. Which will be a big problem come paycheck time. Per E.F. Codd, each tuple is a *unique* predicate (or key) comprising a set of values defining a *unique* data entity. i.e. The employee named John Little at extension 4531. There is nothing anywhere said about keys never changing. This is Databases 101 material. Really! --Josh ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Surrogate keys (Was: enums)
Jim, So ISTM it's much easier to just use surrogate keys and be done with it. Only deviate when you have a good reason to do so. The lazy man's guide to SQL database design, by Jim Nasby. ;-) --Josh ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] No heap lookups on index
Jonah, David has stated that the index to heap visibility check is slowing him down, so what are the possible options: - Visibility in indexes (-hackers archives cover the pros/cons) - True organized heaps - Block level index (Tom/Simon's earlier discussion) also - Frozen relations This last solution was proposed as a possibility for the data warehousing case. For a time-partitioned table, we're going to know that all but one of the partitions has not been updated anywhere within visible transaction scope, and therefore index-only access is a possibility. also - join tables One of the other most valuable targets for index-only access is the many-to-many join table whose primary key consists of two (or more) foreign keys to two (or more) other tables. It's actually not necessary to check visibility on this kind of table as the visibility of tuples in the join table will be determined by the visibility of tuples in the two data tables. Since often join tables consist *only* of the join key, being able to do index-only access on them could dramatically speed up certain kinds of queries. Both of the above are corner cases but are very common ones and might be much easier to implement than the other solutions. --Josh ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
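[Editorial note: the "pure" many-to-many join table Josh describes, whose primary key consists entirely of foreign keys and which holds no other columns, might look like the following. Table and column names are invented for illustration; the referenced parent tables are assumed to exist.]

```sql
-- Sketch: a join table consisting *only* of the join key. Its composite
-- primary key index already contains every column a query would fetch,
-- which is what makes index-only access attractive here.
CREATE TABLE item_category (
    item_id     integer NOT NULL REFERENCES item (item_id),
    category_id integer NOT NULL REFERENCES category (category_id),
    PRIMARY KEY (item_id, category_id)
);
```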
Re: [HACKERS] 8.0.5 Bug in unique indexes?
Michael Paesold wrote: Joshua D. Drake wrote: Tom Lane wrote: What's the database's locale? This could be the same problem fixed in 8.0.6, if the locale has weird ideas about what string equality means. lc_collate | C lc_ctype | C You don't use pl/perl, do you -- i.e. I guess you read the latest release notes and the thread here before that? Yes I did. I didn't know that the person was running plPerl. I have verified that they are. We are now going to check if upgrading to 8.0.6 with a deletion of the duplicates and a reindex resolves the issue. Sincerely, Joshua D. Drake -- The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564 PostgreSQL Replication, Consulting, Custom Development, 24x7 support Managed Services, Shared and Dedicated Hosting Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/ ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] No heap lookups on index
Thanks for all the help and thought to our problem. Jonah H. Harris wrote: David has stated that the index to heap visibility check is slowing him down, so what are the possible options: - Visibility in indexes (-hackers archives cover the pros/cons) - True organized heaps - Block level index (Tom/Simon's earlier discussion) Several people seem to have a problem with full index visibility, however both last thread and this thread people seem to agree that keeping page-level visibility would be an acceptable compromise. Perhaps we should implement a rough patch and run some time stats to post for everyone to weigh the pros and cons against hard figures? The discussion on block level indexing and true organized heaps, if I am interpreting this topic correctly, would essentially keep the table optimized for accessing the data through one index? This seems like a very interesting idea, but I don't know that it would really solve our current problem because we need to use more than one index to efficiently access a table. We have a very driving need to solve this problem and so don't mind doing legwork to do feasibility testing or stat collection to help the community decide the best solution for Postgres. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] No heap lookups on index
Tom Lane wrote: What sort of problems are you dealing with exactly? There has been some discussion of changes that would improve certain scenarios. For instance it might be plausible to do joins using index information and only go back to the heap for entries that appear to pass the join test. We tried that scenario, writing a dirty index hack to experiment with, that returned values whether they were valid or not. We saw some definite improvements inside of joins and sub queries, but we were still slowed down at the end because we still had to validate every row being returned. My hands are very tied as to what specific examples I can send, so I apologize for how long it took to get back to you on this. A simple (generalized) example of one of the types of queries we are running:

SELECT col1, col2,
       cool_func(stat_count,
                 COALESCE(raw_counts.raw_count,
                          (SELECT alt_count FROM alt_raw_table
                           WHERE alt_raw_table.pk = col2))) AS cool_func
FROM (SELECT col1, col2, stat_count
      FROM pair_table WHERE col1 = $1) pair_table
LEFT JOIN raw_counts ON pair_table.col2 = raw_counts.pk

We tried not validating the return of both of these as we only want to see the rows which have a high value for cool_func, but it was still necessary to validate the rows which did match the criteria. So we did see an improvement, but not enough. ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Surrogate keys (Was: enums)
On Thu, Jan 19, 2006 at 10:11:51AM -0800, Josh Berkus wrote: So ISTM it's much easier to just use surrogate keys and be done with it. Only deviate when you have a good reason to do so. The lazy man's guide to SQL database design, by Jim Nasby. ;-) Hehe... I was thinking the same thing. I've definitely seen cases where the use of surrogate keys verges on ridiculous. It hasn't harmed the application, except in terms of complexity. It still works. It still performs fine. The SQL queries are awful looking. :-) That's where I would tend to draw the line. For me, I find implementation and maintenance to be the most expensive part of my applications. My data hasn't yet become large enough to make disk space, compute resources, or I/O bandwidth a serious concern. If I think the use of surrogate keys may make my life harder, I'll try not to use them. If I think they may make my life easier, I'll use them without blinking an eye. Harder vs. easier = cost to implement. Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
[HACKERS] tsearchd (tsearch2 daemon) is available for playing
Hi there, we did a 8.1 compatible version of tsearchd, available from http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch2 tsearchd is our old experiment with inverted index. It's fully compatible with tsearch2, actually it's tsearch2+several functions and daemon. Very brief documentation is available http://www.sai.msu.su/~megera/oddmuse/index.cgi/tsearchd Regards, Oleg _ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83 ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Surrogate keys (Was: enums)
-Original Message- From: [EMAIL PROTECTED] [mailto:pgsql-hackers- [EMAIL PROTECTED] On Behalf Of Josh Berkus Sent: Thursday, January 19, 2006 10:09 AM To: Martijn van Oosterhout Cc: pgsql-hackers@postgresql.org Subject: Re: [HACKERS] Surrogate keys (Was: enums) Martijn, In any of these either misspellings, changes of names, ownership or even structure over time render the obvious useless as keys. There are techniques for detecting and reducing duplication but the point is that for any of these duplicates *can* be valid data. Please point me out where, in the writings of E.F. Codd or in the SQL Standard, it says that keys have to be immutable for the life of the row. Only do that for data that you care about. If you think that the data has no value, there is no need to have a way to identify a row. Duplicate *values* can be valid data. Duplicate *tuples* show some serious flaws in your database design. If you have a personnel directory on which you've not bothered to define any unique constraints other than the ID column, then you can't match your data to reality. If you have two rows with the same first and last name, you don't know if they are two different people or the same person, duplicated. Which will be a big problem come paycheck time. Per E.F. Codd, each tuple is a *unique* predicate (or key) comprising a set of values defining a *unique* data entity. i.e. The employee named John Little at extension 4531. There is nothing anywhere said about keys never changing. This is Databases 101 material. Really! I give it an 'F.' When the data changes, the problems generated are not just due to repercussions related to the child and parent tables related through the primary key. Someone has an invoice, and they call in with a question. A combination of their name and address was used as a primary key. They moved, and sent in a forwarding address. 
The DBA was smart enough to design the database to cascade updates, so that there are no orphan records and we have not compromised the structure of the database. The customer calls in with a question about an old invoice. We have no record of that transaction. I was a DBA for a database for a company with many millions of customers worldwide (e.g. the product registration table was 24 GB). Their design had natural keys in it. It caused dozens of problems, every single day. I contend that most people are not smart enough to decide when a natural key is a good idea. The engineers that designed the database were probably pretty smart, since it sort of worked and had thousands of tables and hundreds of millions of rows in it. But one bad decision on a natural key will cause literally millions of dollars of damage. The primary defense I have heard so far is that the Oids are hard to understand. They are nothing in comparison with understanding what to do when you have 25 changes to primary keys on various tables every single day. Once you get used to Oids, I find it hard to believe that any intelligent person finds them confusing. Confusion resulting from having primary keys that are a moving target? Now that's confusion for you. IMO-YMMV. I think it is time for me to give it a rest, though. My experience may be very atypical and I feel strangely passionate about it. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] un-vacuum?
On Thu, Jan 19, 2006 at 02:25:15PM -0500, uwcssa wrote: i have a table that is already vacuumed. for some reason i want to un-vacuum it instead of dropping the table and recreate the table and indexes on it. is there a existing command to do so? What effect do you want this un-vacuum to have? What problem are you trying to solve? -- Michael Fuhr ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] un-vacuum?
On 1/19/06, uwcssa [EMAIL PROTECTED] wrote: I have a simple question here, not sure if I should post here but if you have the quick answer, it helps a lot. I have a table that is already vacuumed. For some reason I want to un-vacuum it instead of dropping the table and recreating the table and indexes on it. Is there an existing command to do so? Can you explain yourself a bit better? Vacuum is good, why do you think you want to undo it? Why do you think that drop and create will undo vacuum? -- regards, Jaime Casanova (DBA: DataBase Aniquilator ;) ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] un-vacuum?
On Thu, 2006-01-19 at 14:25, uwcssa wrote: I have a simple question here, not sure if I should post here but if you have the quick answer, it helps a lot. I have a table that is already vacuumed. For some reason I want to un-vacuum it instead of dropping the table and recreating the table and indexes on it. Is there an existing command to do so? What exactly are you trying to achieve? - Hannu ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Surrogate keys (Was: enums)
On Thu, Jan 19, 2006 at 02:01:14PM -0500, [EMAIL PROTECTED] wrote: On Thu, Jan 19, 2006 at 10:11:51AM -0800, Josh Berkus wrote: So ISTM it's much easier to just use surrogate keys and be done with it. Only deviate when you have a good reason to do so. The lazy man's guide to SQL database design, by Jim Nasby. ;-) Hehe... I was thinking the same thing. I've definitely seen cases where the use of surrogate keys verges on ridiculous. It hasn't harmed the application, except in terms of complexity. It still works. It still performs fine. The SQL queries are awful looking. :-) Got an example? That's where I would tend to draw the line. For me, I find implementation and maintenance to be the most expensive part of my applications. My data hasn't yet become large enough to make disk space, compute resources, or I/O bandwidth a serious concern. Which is exactly what my thought process is. If you mix surrogate and non-surrogate keys, how do you know which table has which? Sure, while you're actively writing the code it's not an issue, but what about 6 months later? What about if someone else picks up the code? I know Josh was poking fun with his comment about me being lazy, but lazy can make for better code. I can go back to code I wrote 3 years ago and I know that 99% of tables will have something_id (where something is almost certain to be the table name) as a surrogate key to join on; there's no need for me to go and figure out what does and what doesn't have a surrogate key. The 1% that don't fall into that generally aren't an issue because they're normally very large tables that nothing joins to. There's actually an article floating around somewhere about how lazy coders are good coders... :) -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Surrogate keys (Was: enums)
On Thu, Jan 19, 2006 at 11:22:24AM -0800, Dann Corbit wrote: In any of these either misspellings, changes of names, ownership or even structure over time render the obvious useless as keys. There are techniques for detecting and reducing duplication but the point is that for any of these duplicates *can* be valid data. Please point me out where, in the writings of E.F. Codd or in the SQL Standard, it says that keys have to be immutable for the life of the row. Only do that for data that you care about. If you think that the data has no value, there is no need to have a way to identify a row. Erm... if you don't care, why are you storing it? :) I was a DBA for a database for a company with many millions of customers worldwide (e.g. the product registration table was 24 GB). Their design had natural keys in it. It caused dozens of problems, every single day. I contend that most people are not smart enough to decide when a natural key is a good idea. The engineers that designed the database were probably pretty smart, since it sort of worked and had thousands of tables and hundreds of millions of rows in it. But one bad decision on a natural key will cause literally millions of dollars of damage. The primary defense I have heard so far is that the Oids are hard to understand. They are nothing in comparison with understanding what to do when you have 25 changes to primary keys on various tables every single day. Once you get used to Oids, I find it hard to believe that any intelligent person finds them confusing. Confusion resulting from having primary keys that are a moving target? Now that's confusion for you. Well, I wouldn't use OIDs as in the PostgreSQL OID, but I agree. If nothing else an ID gives you a fallback... if you absolutely can't find a customer (or whatever else) through natural keys, you ask them for their customer ID/number, which has no reason to ever change. 
BTW, if you want to see a mess*, take a look at the distributed.net stats code, which unfortunately uses email as the means to identify participants. It made perfect sense originally, anyone running the client was bound to have an email address, and they all had to be unique, right? Worked great until the first person contacted us wondering how to change his email address in stats because he'd changed ISPs. If you look at today's stats code (at least the database portion of it) approximately 50% of it is there to deal with people retiring one email address into another, and I'd say that 90%+ of the bugs are in this code. Had we just required new users to register to get a nice shiny unique numeric ID (or a unique username...), none of that code would exist. * note that I'm not trying to rag on any of the numerous people who've been involved in the stats code over the years, but it is insightful to look at some of the 'dumb mistakes' that have been made and the large amount of pain that it's caused. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
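[Editorial note: the design Jim wishes the stats code had used can be sketched as follows. Table and column names are invented for illustration; the point is that the email address becomes a mutable, unique attribute rather than the identifier other tables reference.]

```sql
-- Sketch: an immutable numeric ID identifies the participant; email is
-- just an attribute that stays unique but is free to change.
CREATE TABLE participant (
    participant_id serial PRIMARY KEY,  -- other tables reference this
    email          text NOT NULL UNIQUE
);

-- An ISP change is then a one-row update rather than an identity
-- migration through the rest of the schema:
UPDATE participant SET email = 'new@example.org' WHERE participant_id = 42;
```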
Re: [HACKERS] No heap lookups on index
On Thu, Jan 19, 2006 at 10:19:01AM -0800, Josh Berkus wrote: One of the other most valuable targets for index-only access is the many-to-many join table whose primary key consists of two (or more) foreign keys to two (or more) other tables. It's actually not necessary to check visibility on this kind of table as the visibility of tuples in the join table will be determined by the visibility of tuples in the two data tables. Since often join tables consist *only* of the join key, being able to do index-only access on them could dramatically speed up certian kinds of queries. How would that handle 'delinking' item A from foobaz 2? (IE: DELETE FROM join_table WHERE id1=231 and id2=24842) The only way I can see this working is if it is required that items in both tables as well as the link in the many-many table are only inserted and deleted in the same transaction, which seems to be really pushing this into corner-case territory. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.comwork: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] No heap lookups on index
On Thu, Jan 19, 2006 at 10:35:30AM -0800, David Scott wrote: Tom Lane wrote: What sort of problems are you dealing with exactly? There has been some discussion of changes that would improve certain scenarios. For instance it might be plausible to do joins using index information and only go back to the heap for entries that appear to pass the join test. Do you still have that patch that folks could look at? ISTM that this technique would be rather dependent on your actual workload, and as such could result in a big win for certain types of queries. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Surrogate keys (Was: enums)
On Thu, Jan 19, 2006 at 01:48:18PM -0600, Jim C. Nasby wrote: On Thu, Jan 19, 2006 at 02:01:14PM -0500, [EMAIL PROTECTED] wrote: On Thu, Jan 19, 2006 at 10:11:51AM -0800, Josh Berkus wrote: So ISTM it's much easier to just use surrogate keys and be done with it. Only deviate when you have a good reason to do so. The lazy man's guide to SQL database design, by Jim Nasby. ;-) Hehe... I was thinking the same thing. I've definitely seen cases where the use of surrogate keys verges on ridiculous. It hasn't harmed the application, except in terms of complexity. It still works. It still performs fine. The SQL queries are awful looking. :-) Got an example? Sure, but I have to be vague, because it's a company DB. :-) The DB has a primary table, that maps the primary key to a surrogate key. The surrogate key is used in several 1:1 and 1:N relationships. Pretty straight forward. (I tend to put the primary key in the most primary table that would have a 1:1 relationship, of which there is one in this database - but whatever) The primary key is an identifier used for all inputs and outputs to the application. It is used by manual and automatic processes internal and external to the company. Definitely a primary key / surrogate key scenario. The problem here, is that the primary key *is* a natural key. It is generated to be unique, and it is immutable. There are no interfaces provided to allow the rename of the key. It is a short character string of 5 to 20 characters. All queries to the table are joined with this primary key/surrogate key table, to allow lookup by the primary key, for records only identified by the surrogate key. The database is only likely to have a few thousand records, with the 1:N relationships not exceeding 5 or 10, and not recursive. For performance, or disk space, it doesn't really matter which way they went. 
The confusion, though, of joining using a surrogate, that is intended to be opaque (the value is never queried), ensures that the program has no simple queries. All queries involve at least one join. I said almost ridiculous. It's not enough for me to complain, and request a re-design. I don't really care what it does, as long as it accepts my data, and allows me to query my data. But, it does seem silly to me. That's where I would tend to draw the line. For me, I find implementation and maintenance to be the most expensive part of my applications. My data hasn't yet become large enough to make disk space, compute resources, or I/O bandwidth a serious concern. Which is exactly what my thought process is. If you mix surrogate and non-surrogate keys, how do you know which table has which? Sure, while you're actively writing the code it's not an issue, but what about 6 months later? What about if someone else picks up the code? It's usually pretty obvious, looking at a database diagram. You look up the primary key, and see that it only shows up in one table. :-) I know Josh was poking fun with his comment about me being lazy, but lazy can make for better code. I can go back to code I wrote 3 years ago and I know that 99% of tables will have something_id (where something is almost certain to be the table name) as a surrogate key to join on; there's no need for me to go and figure out what does and what doesn't have a surrogate key. The 1% that don't fall into that generally aren't an issue because they're normally very large tables that nothing joins to. I don't disagree with you. I just don't mind deciding to use a surrogate key if I'm unsure, and not using a surrogate if it seems more effort than gain. There's actually an article floating around somewhere about how lazy coders are good coders... :) Dunno where it started, but that's one of the tenets of the developers of Perl. 
Of course, with Perl 6, they admitted to having made quite a few design errors with Perl 5 and earlier... :-) Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] 8.0.5 Bug in unique indexes?
Joshua D. Drake wrote: Michael Paesold wrote: You don't use pl/perl, do you -- i.e. I guess you read the latest release notes and the thread here before that? Yes I did. I didn't know that the person was running plPerl. I have verified that they are. We are now going to check if upgrading to 8.0.6 with a deletion of the duplicates and a reindex resolves the issue. I thought I'd ask because this sounded so familiar... Best Regards, Michael Paesold ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Indexes vs. cache flushes
On Thu, Jan 19, 2006 at 02:46:11AM -0500, Tom Lane wrote: Greg Stark [EMAIL PROTECTED] writes: Tom Lane [EMAIL PROTECTED] writes: But you couldn't make any meaningful changes in the definition of an index, such as changing its column set, operator classes, partial-index predicate, etc, except by dropping and recreating it. The only example that comes to mind of something you might want to be able to twiddle and wouldn't expect to be a slow operation is making a unique index a non-unique index. I think actually that that would still work, so long as you acquired exclusive lock on the parent table first (which you'd have to do anyway, because this would constitute a significant change to the table's schema --- it could invalidate plans for example). The lock would guarantee that no one has the index open. It's only in the case of an opened index that I propose not flushing the index support info. The concerns that I find more interesting are changes in the underlying objects. We don't have an ALTER OPERATOR CLASS, much less an ALTER ACCESS METHOD, but it's certainly theoretically possible to change the definition of a support function used by an index. There isn't presently any mechanism to force timely propagation of such a change, and so you'd be largely on your own --- but realistically, wouldn't such a change require rebuilding the index anyway? How would this affect changing the type of a column? Specifically, I'm thinking of the case of domains, where it would be very handy if the type of a domain could be changed (possibly with some restrictions). IE: CREATE DOMAIN name varchar(50); is now too small and you want to up it to varchar(64). Granted, not supported now, though changing constraints is currently supported. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.comwork: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? 
http://www.postgresql.org/docs/faq
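[Editorial note: Jim's domain scenario, sketched below for reference. At the time of this thread, ALTER DOMAIN could change constraints but not the underlying type, so the type change he wishes for is shown only as a comment.]

```sql
-- The domain as originally declared:
CREATE DOMAIN name AS varchar(50);

-- Changing constraints on a domain is supported:
ALTER DOMAIN name SET NOT NULL;

-- Changing the underlying type is not; the wished-for operation would be
-- something like the following (hypothetical syntax, not implemented):
-- ALTER DOMAIN name TYPE varchar(64);
```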
Re: [HACKERS] Bogus path in postmaster.opts
Peter Eisentraut [EMAIL PROTECTED] writes: Tom Lane wrote: There is another possible answer, and it's something I've been meaning to bring up for awhile. Is there a good reason why postmaster is a symlink to postgres, rather than a hard link? I don't know of one. Something I have thought of during the recent options reorganization is that we could do away with the postmaster/postgres dichotomy altogether. Just call the thing postmaster and give it a --single-user-mode option. No strong objection here, though combining the two manpages is likely to be a mess :-( regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Surrogate keys (Was: enums)
On Thu, Jan 19, 2006 at 10:09:26AM -0800, Josh Berkus wrote: Martjin, In any of these either misspellings, changes of names, ownership or even structure over time render the obvious useless as keys. There are techniques for detecting and reducing duplication but the point is that for any of these duplicates *can* be valid data. Please point me out where, in the writings of E.F. Codd or in the SQL Standard, it says that keys have to be immutable for the life of the row. Possibly nowhere. But when you send invoices to customers, any details on there *are* immutable. Sure, in your database you don't care if things change, but then they don't match reality anymore do they? Duplicate *values* can be valid data. Duplicate *tuples* show some serious flaws in your database design. If you have a personnel directory on which you've not bothered to define any unique constraints other than the ID column, then you can't match your data to reality. If you have two rows with the same first and last name, you don't know if they are two different people or the same person, duplicated. Which will be a big problem come paycheck time. I never said there were duplicate tuples, just that the data has no natural keys. The tuples are unique because there's a surrogate key. It is entirely possible to have two people with the same first name, last name and date of birth. Rather uncommon, but the database must be able to support it. I don't understand your example though. If you have a personnel directory with two rows with the same first and last name, what does that tell you. Nothing. You have to go find out whether there really are two of those people or not. You can simplify the process by taking into account the fact that it's very unlikely, but a unique constraint is not the answer. Besides, it's far more likely the same person will appear twice with two different spellings of their name. :) Anyway, the discussion was about surrogate vs natural keys. 
Nothing here has convinced me that there are any useful natural keys to be found in the examples I gave. Most of the examples I gave come from a system I had to maintain where some designer had assumed there was some kind of natural key and in *each* and *every* case it caused problems... Have a nice day, -- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/ Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a tool for doing 5% of the work and then sitting around waiting for someone else to do the other 95% so you can sue them. signature.asc Description: Digital signature
Re: [HACKERS] suppress output for benchmarking
Ühel kenal päeval, K, 2006-01-18 kell 22:35, kirjutas uwcssa: I am testing the performance of postgresql on a set of workloads. However, the output significantly affects the performance evaluation. Is there a way to by-pass all output of select statements so the timing reflects only the query evaluation process? If you do EXPLAIN ANALYSE QUERY instead of just QUERY, then the backend discards all rows returned and just gives back performance data --- Hannu ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
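Hannu's trick can be applied directly; a minimal sketch, where `orders` is a hypothetical table standing in for the workload query being timed:

```sql
-- Timing this includes formatting and shipping every row to the client:
SELECT * FROM orders WHERE total > 100;

-- Timing this does not: the backend runs the same plan, discards the
-- rows, and returns only the plan tree annotated with actual timings.
EXPLAIN ANALYZE SELECT * FROM orders WHERE total > 100;
```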
[HACKERS] Cache-flush stress testing
I've completed a round of stress testing the system for vulnerabilities to unexpected cache flush events (relcache, catcache, or typcache entries disappearing while in use). I'm pleased to report that the 8.1 branch now passes all available regression tests (main, contrib, pl) with CLOBBER_CACHE_ALWAYS defined as per the attached patch. I have not had the patience to run a full regression cycle with CLOBBER_CACHE_RECURSIVELY (I estimate that would take over a week on the fastest machine I have) but I have gotten through the first dozen or so tests, and I doubt that completing the full set would find anything not found by CLOBBER_CACHE_ALWAYS. HEAD is still broken pending resolution of the lookup_rowtype_tupdesc() business. 8.0 should be OK but I haven't actually tested it. I'm still bothered by the likelihood that there are cache-flush bugs in code paths that are not exercised by the regression tests. The CLOBBER_CACHE patch is far too slow to consider enabling on any regular basis, but it seems that throwing in cache flushes at random intervals, as in the test program I posted here: http://archives.postgresql.org/pgsql-hackers/2006-01/msg00244.php doesn't provide very good test coverage. Has anyone got any ideas about better ways to locate such bugs? regards, tom lane

Index: inval.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/cache/inval.c,v
retrieving revision 1.74
diff -c -r1.74 inval.c
*** inval.c	22 Nov 2005 18:17:24 -0000	1.74
--- inval.c	19 Jan 2006 21:47:07 -0000
***************
*** 625,630 ****
--- 625,660 ----
  {
  	ReceiveSharedInvalidMessages(LocalExecuteInvalidationMessage,
  								 InvalidateSystemCaches);
+ 
+ 	/*
+ 	 * Test code to force cache flushes anytime a flush could happen.
+ 	 *
+ 	 * If used with CLOBBER_FREED_MEMORY, CLOBBER_CACHE_ALWAYS provides a
+ 	 * fairly thorough test that the system contains no cache-flush hazards.
+ 	 * However, it also makes the system unbelievably slow --- the regression
+ 	 * tests take about 100 times longer than normal.
+ 	 *
+ 	 * If you're a glutton for punishment, try CLOBBER_CACHE_RECURSIVELY.
+ 	 * This slows things by at least a factor of 10000, so I wouldn't suggest
+ 	 * trying to run the entire regression tests that way.  It's useful to
+ 	 * try a few simple tests, to make sure that cache reload isn't subject
+ 	 * to internal cache-flush hazards, but after you've done a few thousand
+ 	 * recursive reloads it's unlikely you'll learn more.
+ 	 */
+ #if defined(CLOBBER_CACHE_ALWAYS)
+ 	{
+ 		static bool in_recursion = false;
+ 
+ 		if (!in_recursion)
+ 		{
+ 			in_recursion = true;
+ 			InvalidateSystemCaches();
+ 			in_recursion = false;
+ 		}
+ 	}
+ #elif defined(CLOBBER_CACHE_RECURSIVELY)
+ 	InvalidateSystemCaches();
+ #endif
  }

  /*

---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] un-vacuum?
Michael Fuhr [EMAIL PROTECTED] writes: On Thu, Jan 19, 2006 at 03:54:33PM -0500, uwcssa wrote: I want to do this for repeating some experiment results, not for tuning the db (pretty much like using an old machine to find performance difference for an algorithm). so if i have a way of knowing which tables are storing the statistics, i guess i can delete all from that table to achieve this. pg_statistic stores statistics. I think it's safe to delete rows, DELETE FROM pg_statistic is safe enough, but it's more of an un-analyze than an un-vacuum. There is no un-vacuum. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Indexes vs. cache flushes
Jim C. Nasby [EMAIL PROTECTED] writes: How would this affect changing the type of a column? It doesn't, because we drop and rebuild indexes completely during ALTER COLUMN TYPE. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] un-vacuum?
I want to do this for repeating some experiment results, not for tuning the db (pretty much like using an old machine to find performance difference for an algorithm). so if i have a way of knowing which tables are storing the statistics, i guess i can delete all from that table to achieve this. On 1/19/06, Hannu Krosing [EMAIL PROTECTED] wrote: Ühel kenal päeval, N, 2006-01-19 kell 14:25, kirjutas uwcssa: I have a simple question here, not sure if i should post here but if you have the quick answer, it helps a lot i have a table that is already vacuumed. for some reason i want to un-vacuum it instead of dropping the table and recreating the table and indexes on it. is there an existing command to do so? What exactly are you trying to achieve ? - Hannu ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Bug: random() can return 1.0
Andrew - Supernews wrote: src/backend/utils/adt/float.c: /* *drandom - returns a random number */ Datum drandom(PG_FUNCTION_ARGS) { float8 result; /* result 0.0-1.0 */ result = ((double) random()) / ((double) MAX_RANDOM_VALUE); PG_RETURN_FLOAT8(result); } Whoever wrote this obviously did intend it to return values in [0.0,1.0] but this makes it totally useless for generating uniform random ranges in the usual way, since random() * N will return N with probability 2^-31. The documentation is sufficiently imprecise about this to cause confusion (seen in questions asked on the IRC channel), and the problem can't be worked around at the application level without knowing the value of MAX_RANDOM_VALUE in order to correct the range to [0.0,1.0). Because random returns a double, I think it is very possible that we could return 1 due to rounding, and I see no way to avoid that. I think re-running random if it returns 1 is likely to return even less random values. -- Bruce Momjian| http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] No heap lookups on index
On Thu, 19 Jan 2006, Jim C. Nasby wrote: Do you still have that patch that folks could look at? ISTM that this technique would be rather dependent on your actual workload, and as such could result in a big win for certain types of queries. It is not a patch, per se. It is a C-language function which calls some of the nbtree functions to return things from the index. The syntax for calling it is rather obtuse, since those of us who don't understand the parser are doomed to attempt circumventing it ;P. I tarred up the code and put it on a web server so that interested parties can play with it. The URL is http://linux.apptechsys.com/~jeremyd/postgresql/fakeidxscan.tar.gz It is very hackish, so definitely do not assume that it is in any way correct; rather, assume the opposite. I have run it on x86 and x86_64 boxes, and it compiles and runs on those. Here is an example of its usage, so you can see the nasty syntax required and perhaps grok how to use it better.

create table test_table (a integer, b integer);
create index test_table_a_b_idx on test_table (a, b);
insert into test_table (a, b)
    select a, b from generate_series(1,100) a, generate_series(1,100) b;

select * from fakeidxrowscomposite(
    'test_table',             -- relation
    'test_table_a_b_idx',     -- index
    1,                        -- number of scan keys
    ARRAY[1, 2]::smallint[],  -- numbers of the index attributes to return
    ARRAY[1]::smallint[],     -- numbers of the attrs the scankeys apply to
    ARRAY['=(integer,integer)'::regoperator],  -- operators for the scankeys
    ARRAY[3]::smallint[],     -- btree strategy for the scankeys
    (42,0)  -- values for the scankeys to compare against (if there is only
            -- one, you have to put a fake one in since otherwise the parser
            -- does not think it is a record)
) AS (a integer, b integer);  -- tell the parser what columns to expect

This example returns 100 rows in which the first column contains 42 and the second column contains the numbers between 1 and 100, in order.
Feel free to do whatever with this, it's pretty fast for tables where seeks to validate tuples would hurt, but you do get back dead things... -- When you know absolutely nothing about the topic, make your forecast by asking a carefully selected probability sample of 300 others who don't know the answer either. -- Edgar R. Fiedler ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[HACKERS] BuildFarm: Do we need another FreeBSD/amd64 member?
Greetings, I've got a fast FreeBSD/amd64 server available to run Buildfarm on. However, I see we already have a couple of others running it. My questions are: 1) do we need another one? 2) if yes, what options need coverage? Thanks, LER -- Larry Rosenman Database Support Engineer PERVASIVE SOFTWARE. INC. 12365B RIATA TRACE PKWY 3015 AUSTIN TX 78727-6531 Tel: 512.231.6173 Fax: 512.231.6597 Email: [EMAIL PROTECTED] Web: www.pervasive.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] suppress output for benchmarking
On Thu, Jan 19, 2006 at 11:46:29PM +0200, Hannu Krosing wrote: Ühel kenal päeval, K, 2006-01-18 kell 22:35, kirjutas uwcssa: I am testing the performance of postgresql on a set of workloads. However, the output significantly affects the performance evaluation. Is there a way to by-pass all output of select statements so the timing reflects only the query evaluation process? If you do EXPLAIN ANALYSE QUERY instead of just QUERY, then the backend discards all rows returned and just gives back performance data The flipside is that EXPLAIN ANALYZE adds its own (non-trivial) overhead to the query. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] [GENERAL] [PATCH] Better way to check for getaddrinfo function.
Where are we on this? Rajesh, I think we are waiting for more information from you. --- R, Rajesh (STSD) wrote: That was very much situation specific. But the bottom line is that the default test does not include netdb.h in the test code. So pg uses getaddrinfo.c, and the getaddrinfo.c does not work for me. IPv6 client authentication fails. I have modified the patch.

$ diff -r configure.in configure.in.new
918a919
> AC_MSG_CHECKING([for getaddrinfo])
920c921,926
< AC_REPLACE_FUNCS([getaddrinfo])
---
> AC_TRY_LINK([#include <netdb.h>
> #include <assert.h>],
> [char (*f)();f=getaddrinfo;], ac_cv_func_getaddrinfo=yes, ac_cv_func_getaddrinfo=no)
> if test x$ac_cv_func_getaddrinfo = xyes; then
> AC_DEFINE(HAVE_GETADDRINFO,1,[Define if you have the getaddrinfo function])
> fi
923a930
> AC_MSG_RESULT([$ac_cv_func_getaddrinfo])

Rajesh R -- This space intentionally left non-blank. -Original Message- From: Tom Lane [mailto:[EMAIL PROTECTED] Sent: Monday, January 16, 2006 11:28 PM To: R, Rajesh (STSD) Cc: pgsql-hackers@postgresql.org; pgsql-general@postgresql.org Subject: Re: [GENERAL] [PATCH] Better way to check for getaddrinfo function. R, Rajesh (STSD) [EMAIL PROTECTED] writes: Just thought that the following patch might improve checking for getaddrinfo function (in configure.in) Since AC_TRY_RUN tests cannot work in cross-compilation scenarios, you need an *extremely* good reason to put one in. I thought this might improve things doesn't qualify. Exactly what problem are you trying to solve and why is a run-time test necessary? Why doesn't the existing coding work for you? regards, tom lane configure-in.patch Content-Description: configure-in.patch [ Attachment, skipping...
] ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings -- Bruce Momjian| http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
[HACKERS] questions about varlist and Query
Fellow hacker, I am a newcomer to the postgres development community. Currently, I am implementing tightly coupled machine classifiers within postgres. The grammar looks like Train Parameter_list (class1,class2,class3...). I have two major problems right now. 1. Train is a statement and it is supposed to return some parameters in the form of a query. (To be used by a classifier later.) How can I return a Query with self-specified column names and data vectors? 2. class1, class2 are all supposed to be relations/query results. But what is a proper container to hold them? I found var_list to be a good candidate as it can contain an unconstrained number of var_values. But var_values are of constant types. Can I just add a query as a constant? Has anyone had this kind of implementation experience? Thanks, John ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] un-vacuum?
On Thu, Jan 19, 2006 at 03:54:33PM -0500, uwcssa wrote: I want to do this for repeating some experiment results, not for tuning the db (pretty much like using an old machine to find performance difference for an algorithm). so if i have a way of knowing which tables are storing the statistics, i guess i can delete all from that table to achieve this. pg_statistic stores statistics. I think it's safe to delete rows, but you might want to wait for one of the developers to comment before mucking around with the stored values, especially if you're not familiar with reading the pg_stats view. http://www.postgresql.org/docs/8.1/interactive/catalog-pg-statistic.html -- Michael Fuhr ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] un-vacuum?
On Thu, 2006-01-19 at 14:25 -0500, uwcssa wrote: I have a simple question here, not sure if i should posted here but if you have the quick answer, it helps a lot i have a table that is already vacuumed. for some reason i want to un-vacuum it instead of dropping the table and recreate the table and indexes on it. is there a existing command to do so? I think your best route to experimentation is to stick to executing real commands in as a very similar environment to actual usage. I would never trust experimental results derived from the use of such a command, should such a thing ever exist. Good testing takes time and care; there are few shortcuts to good experimental results in any scientific endeavour. Best Regards, Simon Riggs ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Cache-flush stress testing
On Thu, Jan 19, 2006 at 05:03:20PM -0500, Tom Lane wrote: I'm still bothered by the likelihood that there are cache-flush bugs in code paths that are not exercised by the regression tests. The CLOBBER_CACHE patch is far too slow to consider enabling on any regular basis, but it seems that throwing in cache flushes at random intervals, as in the test program I posted here: http://archives.postgresql.org/pgsql-hackers/2006-01/msg00244.php doesn't provide very good test coverage. Has anyone got any ideas about better ways to locate such bugs? Some of the machines in the buildfarm do nothing else useful, if this was turned into a configure option it would be trivial to setup some of those machines to just hammer away at this. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.comwork: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
[HACKERS] Bug: random() can return 1.0
src/backend/utils/adt/float.c: /* * drandom - returns a random number */ Datum drandom(PG_FUNCTION_ARGS) { float8 result; /* result 0.0-1.0 */ result = ((double) random()) / ((double) MAX_RANDOM_VALUE); PG_RETURN_FLOAT8(result); } Whoever wrote this obviously did intend it to return values in [0.0,1.0] but this makes it totally useless for generating uniform random ranges in the usual way, since random() * N will return N with probability 2^-31. The documentation is sufficiently imprecise about this to cause confusion (seen in questions asked on the IRC channel), and the problem can't be worked around at the application level without knowing the value of MAX_RANDOM_VALUE in order to correct the range to [0.0,1.0). -- Andrew, Supernews http://www.supernews.com - individual and corporate NNTP services ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] No heap lookups on index
On Wed, 2006-01-18 at 20:13 -0500, Tom Lane wrote: Come to think of it, the idea also seems to map nicely into bitmap index scans: the index will directly hand back a list of potential pages to look at, but they are all marked lossy because the index doesn't know exactly which tuple(s) on the target pages match the query. The existing bitmap-heap-scan code can take it from there. Yes, I've privately suggested this solution in that context. I think there is enough meat there to make this topic worth discussing further, but not on list again just yet. Best Regards, Simon Riggs ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Bug: random() can return 1.0
Bruce Momjian pgman@candle.pha.pa.us writes: Because random returns a double, I think it is very possible that we could return 1 due to rounding, Not unless your machine has a double type with less than 32 bits of precision, which seems pretty unlikely. It'd be sufficient to do

/* result 0.0 <= x < 1.0 */
result = ((double) random()) / ((double) MAX_RANDOM_VALUE + 1.0);

regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[HACKERS] PostgreSQL and shared memory.
Hi, I'd like to get a general understanding of what kind of structures PostgreSQL puts in shared memory and how they are managed. I'd like some hints on where to start looking. Source, docs, prior discussions, anything is considered helpful. TIA, Thomas Hallgren ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] PostgreSQL and shared memory.
Thomas Hallgren wrote: Hi, I'd like to get a general understanding of what kind of structures PostgreSQL puts in shared memory and how they are managed. I'd like some hints on where to start looking. Source, docs, prior discussions, anything is considered helpful. Have you looked on the developers page. My internals through pictures talk should help too. -- Bruce Momjian| http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] un-vacuum?
You could also do this by doing a filesystem copy of $PG_DATA (with postgresql shut down), and then restoring that copy after your test. If you used rsync (or something that allowed filesystem snapshots) this probably wouldn't be very painful. On Thu, Jan 19, 2006 at 03:54:33PM -0500, uwcssa wrote: I want to do this for repeating some experiment results, not for tuning the db (pretty much like using an old machine to find performance difference for an algorithm). so if i have a way of knowing which tables are storing the statistics, i guess i can delete all from that table to achieve this. On 1/19/06, Hannu Krosing [EMAIL PROTECTED] wrote: Ühel kenal päeval, N, 2006-01-19 kell 14:25, kirjutas uwcssa: I have a simple question here, not sure if i should post here but if you have the quick answer, it helps a lot i have a table that is already vacuumed. for some reason i want to un-vacuum it instead of dropping the table and recreating the table and indexes on it. is there an existing command to do so? What exactly are you trying to achieve ? - Hannu ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] No heap lookups on index
On Thu, Jan 19, 2006 at 02:50:39PM -0800, Jeremy Drake wrote: On Thu, 19 Jan 2006, Jim C. Nasby wrote: Do you still have that patch that folks could look at? ISTM that this technique would be rather dependant on your actual workload, and as such could result in a big win for certain types of queries. ... Feel free to do whatever with this, it's pretty fast for tables where seeks to validate tuples would hurt, but you do get back dead things... How'd you then weed out the dead tuples? Basically, numbers talk. If there were convincing numbers for something that wasn't a corner-case that showed a marked improvement then there'd be much more interest in getting this into the backend in some fashion. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.comwork: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Large Scale Aggregation (HashAgg Enhancement)
Simon Riggs [EMAIL PROTECTED] writes: This seems to lead to a super-geometric progression in the number of files required, But we double the number of batches at each step, so there are going to be at most 20 or so levels, and that's only assuming a *horridly* wrong initial guess by the planner. In practice I think it's reasonable to assume at most a couple rounds of doubling. If you have more than that, the extra data-shuffling is going to exhaust your patience anyway. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] No heap lookups on index
Jim C. Nasby [EMAIL PROTECTED] writes: Basically, numbers talk. If there were convincing numbers for something that wasn't a corner-case that showed a marked improvement then there'd be much more interest in getting this into the backend in some fashion. I've got no doubt that there are *some* non corner cases for which this would be a win. The lower the update load on the database, the better it's going to look. The issue really is where does it start being a loss, and can you convince us that those cases are all corner ones? (The subtext here, of course, is the assumption that it's an all-or-nothing choice. I'm of the opinion that supporting both options would be infeasible from a code complexity and maintenance standpoint; but a simple patch that did both would of course prove that opinion wrong ...) regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Large Scale Aggregation (HashAgg Enhancement)
On Tue, 2006-01-17 at 21:43 +0000, Simon Riggs wrote: On Tue, 2006-01-17 at 09:52 -0500, Tom Lane wrote: I was thinking along the lines of having multiple temp files per hash bucket. If you have a tuple that needs to migrate from bucket M to bucket N, you know that it arrived before every tuple that was assigned to bucket N originally, so put such tuples into a separate temp file and process them before the main bucket-N temp file. This might get a little tricky to manage after multiple hash resizings, but in principle it seems doable. You can manage that with file naming. Rows moved from batch N to batch M would be renamed N.M, so you'd be able to use file ordering to retrieve all files for *.M That scheme would work for multiple splits too, so that filenames could grow yet retain their sort order and final target batch properties. This seems to lead to a super-geometric progression in the number of files required, if we assume that the current batch could be redistributed to all future batches, each of which could be similarly redistributed:

batches   files
1         no files
2         1 file
4         7 files
8         64 files
16        64,000 files
32        4 billion files ish

So it does seem important whether we demand sorted input or not. Or at least requires us to provide the executor with a starting point for the number of batches, so we could manage that. Best Regards, Simon Riggs ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] un-vacuum?
On Thu, Jan 19, 2006 at 04:54:21PM -0600, Jim C. Nasby wrote: You could also do this by doing a filesystem copy of $PG_DATA (with postgresql shut down), and then restoring that copy after your test. If you used rsync (or something that allowed filesystem snapshots) this probably wouldn't be very painful. Hmmm...wouldn't using a template database work the same way? Doesn't CREATE DATABASE simply do a recursive copy of the template database's directory? I'm thinking you could 1. Set up the initial test conditions in some database. This could include creating unanalyzed tables in dire need of vacuuming. 2. Use createdb or CREATE DATABASE to create a new database using the database in (1) as the template. 3. Run tests in the new database. 4. Repeat (2) and (3) as necessary. -- Michael Fuhr ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
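Michael's steps, written out as a sketch (`experiment` and `run1` are hypothetical database names; the template clone is a file-level copy, so it preserves the unvacuumed state of the tables):

```sql
-- 1. Set up the initial test conditions once, in a dedicated database.
CREATE DATABASE experiment;
-- ... connect to it and load the never-vacuumed, never-analyzed tables ...

-- 2. Clone it for a test run (the template must have no active connections).
CREATE DATABASE run1 TEMPLATE experiment;

-- 3. Run the tests in run1, vacuuming and analyzing as the experiment demands.

-- 4. Discard the run and repeat from step 2 for the next one.
DROP DATABASE run1;
```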
Re: [HACKERS] BuildFarm: Do we need another FreeBSD/amd64 member?
Larry Rosenman [EMAIL PROTECTED] writes: I've got a fast FreeBSD/amd64 server available to run Buildfarm on. However, I see we already have a couple of others running it. My questions are: 1) do we need another one? 2) if yes, what options need coverage? Looks like we're fairly well covered on freebsd already. Are you willing to consider running some less-popular OS on it? regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Surrogate keys (Was: enums)
Yes. Representation of the DNA is probably best. But - that's a lot of data to use as a key in multiple tables. :-) No then you have problems with identical twins :) Chris ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] No heap lookups on index
On Thu, 19 Jan 2006, Jim C. Nasby wrote: Feel free to do whatever with this, it's pretty fast for tables where seeks to validate tuples would hurt, but you do get back dead things... How'd you then weed out the dead tuples? I didn't get that far with it. The purpose of this function was to quickly put something together to demonstrate that the overhead of seeking to the proper tuples in the heap to determine their visibility was the main component of the time being spent to satisfy our queries. Basically, numbers talk. If there were convincing numbers for something that wasn't a corner-case that showed a marked improvement then there'd be much more interest in getting this into the backend in some fashion. I could get some numbers of how much time validating tuples adds to a query, but I don't think that that would be horribly novel. BTW, hopefully I did not make you think that I intended to get this into the official backend. This function was only meant to demonstrate to the people around here that the visibility check was the bottleneck we were seeing. The function may also be interesting as a demonstration of how indexes are handled in postgres, as you can see when tuples are flagged as no longer valid and when they are not. I have put xmin into an index so that I could use this function to better visualize when index tuples are left behind (I tried to put xmax in there too, but I never saw them change, after checking the code it turns out that the index is never told about changes in xmax). We were seeing this case: All rows in our table are visible (we are the only transaction on the machine and we did a VACUUM FULL ANALYZE before). We rebooted to ensure no caching. We were seeing times which, upon division by the number of rows returned by the index scan, were remarkably close to the average seek time listed on the specs for the hard drive in the testing box. 
This was about 5 ms, which doesn't sound like much, but given a large enough number of rows and a few joins, 5 ms per tuple adds up quickly. This implies that we were seeing approximately the worst case as far as the distribution of the relevant tuples on pages, i.e. each tuple we wanted was on a different heap page.

Digging back through some times we had collected from this experiment, apparently we were taking about 15 to 20 seconds to run a particular query, and when we used the function I previously posted those times were reduced to 5 seconds. This was a while ago, however, so these times are probably not very accurate, and we have probably made other tweaks to speed things up since then. But it gives an idea. We could come up with more absolute numbers, but I think people already know what they would look like.
Re: [HACKERS] Surrogate keys (Was: enums)
On Jan 20, 2006, at 10:50, Christopher Kings-Lynne wrote:
>> Yes. Representation of the DNA is probably best. But - that's a lot of
>> data to use as a key in multiple tables. :-)
>
> No, then you have problems with identical twins :)

And, looking forward, clones.

Michael Glaesemann
grzm myrealbox com
[HACKERS] Questions about varlist and Query
Fellow hackers,

I am a newcomer to the postgres development community. Currently, I am implementing tightly coupled machine classifiers within postgres. The grammar looks like:

    Train Parameter_list (class1, class2, class3, ...)

I have two major problems right now.

1. Train is a statement, and it is supposed to return some parameters in the form of a query (to be used by a classifier later). How can I return a Query with self-specified column names and data vectors?

2. class1, class2, etc. are all supposed to be relations/query results, but what is a proper container to hold them? I found var_list to be a good candidate, as it can contain an unconstrained number of var_values. But var_values are of constant types. Can I just add a query as a constant?

Has anyone had this kind of implementation experience?

Thanks,
John
Re: [HACKERS] BuildFarm: Do we need another FreeBSD/amd64 member?
Tom Lane wrote:
> Larry Rosenman [EMAIL PROTECTED] writes:
>> I've got a fast FreeBSD/amd64 server available to run Buildfarm on.
>> However, I see we already have a couple of others running it. My
>> questions are: 1) do we need another one? 2) if yes, what options
>> need coverage?
>
> Looks like we're fairly well covered on freebsd already. Are you
> willing to consider running some less-popular OS on it?

Not particularly, as I want it to do other things. However, I might be willing to get VMware running on it and run something else in a VM. What were you thinking?

LER

-- 
Larry Rosenman                     http://www.lerctr.org/~ler
Phone: +1 512-248-2683             E-Mail: ler@lerctr.org
US Mail: 430 Valona Loop, Round Rock, TX 78681-3683 US
Re: [HACKERS] BuildFarm: Do we need another FreeBSD/amd64 member?
Larry Rosenman ler@lerctr.org writes:
> Tom Lane wrote:
>> Looks like we're fairly well covered on freebsd already. Are you
>> willing to consider running some less-popular OS on it?
>
> What were you thinking?

[ shrug... ] Anything you don't see paired with amd64 on the buildfarm roster is OK by me ...

regards, tom lane