Re: [HACKERS] Bug in UTF8-Validation Code?
Andrew Dunstan wrote: Albe Laurenz wrote: A fix could be either that the server checks escape sequences for validity This strikes me as essential. If the db has a certain encoding ISTM we are promising that all the text data is valid for that encoding. The question in my mind is how we help people to recover from the fact that we haven't done that. I would also say that it's a bug that escape sequences can get characters into the database that are not valid in the specified encoding. If you compare the encoding to table constraints, there is no way to simply escape a constraint check. This seems to violate the principle of consistency in ACID. Additionally, if you include pg_dump into ACID, it also violates durability, since it cannot restore what it wrote itself. Is there anything in the SQL spec that asks for such a behaviour? I guess not. A DBA will usually not even learn about this issue until they are presented with a failing restore. Best Regards, Michael Paesold ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Bug in UTF8-Validation Code?
On Tue, Mar 13, 2007 at 04:42:35PM +0100, Mario Weilguni wrote: On Tuesday, 13 March 2007 16:38, Joshua D. Drake wrote: Is this any different than the issues of moving 8.0.x to 8.1 UTF8? Where we had to use iconv? What issues? I've upgraded several 8.0 databases to 8.1 without having to use iconv. Did I miss something? http://www.postgresql.org/docs/8.1/interactive/release-8-1.html Some users are having problems loading UTF-8 data into 8.1.X. This is because previous versions allowed invalid UTF-8 byte sequences to be entered into the database, and this release properly accepts only valid UTF-8 sequences. One way to correct a dumpfile is to run the command iconv -c -f UTF-8 -t UTF-8 -o cleanfile.sql dumpfile.sql. -- Michael Fuhr
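The effect of the `iconv -c` cleanup step quoted from the release notes can be sketched in Python (an illustration of the same idea, not the tool itself): decoding with `errors="ignore"` silently drops invalid UTF-8 byte sequences and keeps only the valid ones, which is what `-c` does.

```python
# Sketch of what "iconv -c -f UTF-8 -t UTF-8" does to one dump line.
# b'\xc3' with no continuation byte is an invalid UTF-8 sequence;
# errors="ignore" drops it, just as iconv -c discards unconvertible input.
raw = b"INSERT INTO t VALUES ('ok\xc3!');"
clean = raw.decode("utf-8", errors="ignore").encode("utf-8")
print(clean)  # b"INSERT INTO t VALUES ('ok!');"
```

Note that, like `iconv -c`, this repairs the dump only by throwing data away; it does not recover what the bytes were meant to be.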
Re: [HACKERS] Bug in VACUUM FULL ?
Tom Lane wrote: I've developed the attached patch against HEAD, and no longer see any funny behavior. Would appreciate it if you'd test some more, though. The patch works for me. With the patch applied, I don't see the weird errors in the pgbench and other customized tests that I used to see earlier. I looked at the patch as well. ISTM that we are now moving chains in pieces where each piece is terminated by a DEAD tuple. That implies that the MOVED_OFF chain is actually broken. This should not be a problem as long as our assumption holds that all RECENTLY_DEAD tuples preceding a DEAD tuple must also be DEAD, and it's only the way OldestXmin is calculated that we see them as RECENTLY_DEAD. If that assumption is true (and it must be true for us to move the chain in pieces), doesn't that mean we don't really need to move the RECENTLY_DEAD tuples preceding a DEAD tuple? One way to do so would be to collect the target tuple (the tuple from where we started following the t_ctid chain) in the free_offsets if a DEAD or INSERT_IN_PROGRESS tuple is found while following the t_ctid chain. One by one we would collect all the RECENTLY_DEAD tuples preceding a DEAD tuple in the truncated pages. Not that I am suggesting we do this, just wanted to check if there is a flaw in my thinking. I agree that we should not be spending too much time on fixing this corner case and the patch that you have developed is good enough. Thanks, Pavan -- EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] My honours project - databases using dynamically attached entity-properties
On Tue, Mar 13, 2007 at 05:39:05PM +0300, Teodor Sigaev wrote: Hmm, hstore + (optionally) functional indexes. Is it answer? I have used it in a (yet) test system. It works surprisingly well. Thanks -- tomás
Re: [HACKERS] Bug in UTF8-Validation Code?
On Wednesday, 14 March 2007 08:01, Michael Paesold wrote: Is there anything in the SQL spec that asks for such a behaviour? I guess not. I think that the octal escapes are a holdover from the single-byte days where they were simply a way to enter characters that are difficult to find on a keyboard. In today's multi-encoding world, it would make more sense if there were an escape sequence for a *codepoint* which is then converted to the actual encoding (if possible and valid) in the server. The meaning of a codepoint is, however, character set dependent as well. The SQL standard supports escape sequences for Unicode codepoints, which I think would be a very useful feature (try entering a UTF-8 character bytewise ...), but it's a bit weird to implement and it's not clear how to handle character sets other than Unicode. -- Peter Eisentraut http://developer.postgresql.org/~petere/
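Peter's distinction — an escape naming a *codepoint*, converted server-side to the database encoding — can be illustrated in Python (purely illustrative; this is not PostgreSQL escape syntax): the codepoint is encoding-independent, and conversion either yields a valid byte sequence or fails cleanly, never an invalid one.

```python
# A codepoint escape would name U+20AC (the euro sign) regardless of
# encoding; the server would then convert it, or reject it outright.
euro = chr(0x20AC)
print(euro.encode("utf-8"))    # b'\xe2\x82\xac' in a UTF-8 database
try:
    euro.encode("latin-1")     # LATIN1 has no euro sign, so the
except UnicodeEncodeError:     # conversion fails validly instead of
    print("not representable") # producing an invalid byte sequence
```

Contrast this with an octal byte escape, which injects raw bytes and can therefore smuggle in sequences that are invalid in the database encoding — the bug discussed upthread.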
Re: [HACKERS] Synchronized Scan update
The advantage of sync_scan_offset is that, in some situations, a second scan can actually finish faster than if it were the only query executing, because a previous scan has already caused some blocks to be cached. However, 16 is a small number because that benefit would only be once per scan, and sync scans are only helpful on large tables. Agreed. Alright, understood. That last part is actually something I now want to avoid because it's using the current cache-spoiling behaviour of seqscans to advantage. I'd like to remove that behaviour, but it sounds like we can have both - SeqScans that don't spoil cache - Synch scans by setting sync_scan_offset to zero. I like the idea of reducing tuning parameters, but we should, at a minimum, still allow an on/off button for sync scans. My tests revealed that the wrong combination of OS/FS/IO-Scheduler/Controller could result in bad I/O behavior. Agreed Do you have an opinion about sync_scan_threshold versus a simple sync_scan_enable? enable_sync_scan? Seems the suggested GUCs are very related. IIRC the agreed suggestion was to use NBuffers (or a percentage thereof?) to decide whether to spoil the buffer cache for a seq scan. It seems this same metric should be used to decide whether to sync a scan when sync scan is enabled. So when the table size is below NBuffers (or a percentage thereof) you neither recycle buffers nor sync the seq scans. Andreas
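Andreas's suggestion — one size threshold governing both buffer recycling and scan synchronization — amounts to a single policy function. A hypothetical sketch (the function name and the 25% fraction are invented for illustration; this is not PostgreSQL code):

```python
# Hypothetical policy tying both behaviours to one NBuffers-based
# threshold: small tables get normal caching and no sync; large
# tables get buffer recycling and synchronized scans.
def scan_policy(table_pages: int, nbuffers: int, frac: float = 0.25):
    threshold = int(nbuffers * frac)
    large = table_pages > threshold
    return {"recycle_buffers": large, "sync_scan": large}

print(scan_policy(100, 10000))    # small table: cache normally, no sync
print(scan_policy(50000, 10000))  # large table: recycle and synchronize
```

The appeal of coupling the two decisions is that there is only one tunable left, exactly the reduction in tuning parameters discussed above.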
Re: [HACKERS] [PATCHES] Bitmapscan changes
Tom Lane wrote: At this point I'm feeling unconvinced that we want it at all. It's sounding like a large increase in complexity (both implementation-wise and in terms of API ugliness) for a fairly narrow use-case --- just how much territory is going to be left for this between HOT and bitmap indexes? I'm in an awkward situation right now. I've done my best to describe the use cases for clustered indexes. I know the patch needs refactoring; I've refrained from making API changes and tried to keep all the ugliness inside the b-tree, knowing that there are changes to the indexam API coming from the bitmap index patch as well. I've been seeking comments on the design since November, knowing that this is a non-trivial change. I have not wanted to spend too much time polishing the patch, in case I need to rewrite it from scratch because of some major design flaw or because someone comes up with a much better idea. It's frustrating to have the patch dismissed at this late stage on the grounds that it's not worth it. As I said in February, I have the time to work on this, but if major changes are required to the current design, I need to know. Just to recap the general idea: reduce index size by taking advantage of clustering in the heap. Clustered indexes have roughly the same performance effect and use cases as clustered indexes on MS SQL Server, and Index-Organized Tables on Oracle, but the way I've implemented them is significantly different. On other DBMSs, the index and heap are combined into a single b-tree structure. The way I've implemented them is less invasive: there are no changes to the heap, for example, and it doesn't require moving live tuples. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
[HACKERS] need help in understanding gist function
I need help in understanding certain things in gist ... 1. What is the functionality of the gistplacetopage() function in gist.c? Can you please give me a detailed description of this function. 2. I have left and right buffers with values that I have filled. Now, how do I write these changes (buffers) permanently into gistentry pages? Thanks Sharat
Re: [HACKERS] Daylight Saving Time question PostgreSQL 8.1.4
Tom Lane wrote: Josh Berkus josh@agliodbs.com writes: Zdenec, I have the following idea: 1) add a GUC variable which enables usage of OS time zone files 2) add extra parameters into the ./configure script which enable OS TZ support in the code and get the path to OS TZ files. If we're adding it as a configure-time variable, there's no reason to have a GUC. I see zero reason to have either. It would only make sense to do this in the context of a platform-specific distribution such as an RPM, and in that context the simplest solution is to let the RPM specfile make the substitution (ie, after make install and before packaging, rm -rf PG's timezone tree and insert a symlink). Then it's on the RPM packager's head whether it's the right thing to do or not. A configure switch strikes me as mostly a foot-gun, because the average user of Postgres won't have any way to know whether the files are compatible. I don't think making a symlink is a good solution. It generates a lot of future problems with package updates or patching. A configure switch is much more comfortable for packagers/patch makers. In the case when an average user wants to compile their own postgres, we can offer a regression test focused on TZ validation. (By the way, the average user is surprised that postgres has its own zone files.) I also think usage of the system's timezone files should be the default, with the configure script determining the zonefile location based on the OS. Another location could be set by a switch. If for some platform it is necessary to use our own copy, a special switch (e.g. --enable-internal-tzfiles) sets up postgres to use its own timezone copy. Zdenek PS: For information, here are TZ locations on several OSes: /usr/share/lib/zoneinfo Solaris; /usr/share/zoneinfo Redhat; /opt/dce/lib/zoneinfo HP-UX (no TZif magic word); /etc/zoneinfo/ Tru64 (no TZif magic word); /usr/share/zoneinfo/ MacOS; the registry on MS Windows
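The "TZif magic word" mentioned for some of the locations above is the four-byte signature at the start of a compiled zoneinfo file. A validity probe of the kind Zdenek's regression test would need can start from a simple header check (an illustrative sketch, not a proposed tool):

```python
# A compiled tz database file begins with the ASCII bytes "TZif",
# optionally followed by a format version byte such as b"2" or b"3".
# Files without this magic (e.g. HP-UX, Tru64 formats above) are not
# directly usable by TZif-based code.
def looks_like_tzif(header: bytes) -> bool:
    return header[:4] == b"TZif"

print(looks_like_tzif(b"TZif2\x00\x00\x00"))  # True
print(looks_like_tzif(b"not a tz file"))      # False
```

A real compatibility check would of course have to parse the transition tables too, not just the magic, which is part of why "are the files compatible?" is hard for an average user to answer.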
Re: [HACKERS] Daylight Saving Time question PostgreSQL 8.1.4
Granted, but a configure switch would allow users who want to use the OS TZ file in conjunction with a compiled-from-source installation. Many users of OSes with package managers such as Debian or RedHat may, for whatever reason, want to use a source tarball to install and also use the OS TZ list. That being said, this user group may be small enough to ignore. Just throwing it in for thought. Tom Lane wrote: Josh Berkus josh@agliodbs.com writes: Zdenec, I have the following idea: 1) add a GUC variable which enables usage of OS time zone files 2) add extra parameters into the ./configure script which enable OS TZ support in the code and get the path to OS TZ files. If we're adding it as a configure-time variable, there's no reason to have a GUC. I see zero reason to have either. It would only make sense to do this in the context of a platform-specific distribution such as an RPM, and in that context the simplest solution is to let the RPM specfile make the substitution (ie, after "make install" and before packaging, rm -rf PG's timezone tree and insert a symlink). Then it's on the RPM packager's head whether it's the right thing to do or not. A configure switch strikes me as mostly a foot-gun, because the average user of Postgres won't have any way to know whether the files are compatible. regards, tom lane
Re: [HACKERS] need help in understanding gist function
1. What is the functionality of the gistplacetopage() function in gist.c? Can you please give me a detailed description of this function. It tries to place an index tuple on a page. If the page hasn't enough space, gistplacetopage splits the page into two or more pages. The number of pages to split into depends, basically, on the quality of the user-defined picksplit and on the distribution of tuple lengths. 2. I have left and right buffers with values that I have filled. Now, how do I write these changes (buffers) permanently into gistentry pages? Sorry, I don't understand your question. Where do you have left and right buffers? Real writes are produced by the bgwriter process; in the backend we should just mark the buffer as dirty with the help of a MarkBufferDirty call. -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/
Re: [HACKERS] My honours project - databases using dynamically attached entity-properties
On Wed, Mar 14, 2007 at 12:07:45PM +1300, Edward Stanley wrote: On Wed, 14 Mar 2007, David Fetter wrote: On Tue, Mar 13, 2007 at 05:54:34PM +, Richard Huxton wrote: David Fetter wrote: On Tue, Mar 13, 2007 at 02:21:37PM +, Richard Huxton wrote: David Fetter wrote: On Tue, Mar 13, 2007 at 09:31:45AM +, Richard Huxton wrote: * Another good example is the questionnaire. With all due respect, this is a solved problem *without EAV or run-time DDL*. The URL below has one excellent approach to this. http://www.varlena.com/GeneralBits/110.php Which broadly speaking was the solution I used for my questionnaire, except I had a restricted set of types so basically just coerced them to text and side-stepped the inheritance issue. To the extent that it's dynamic, it's still just EAV though. That's precisely the difference between the above solution and yours, and it's the difference between a good design and one that will come up and bit you on the as^Hnkle. It's still basically EAV (either approach). The key fault with EAV is that the tables have no semantic meaning - answer_int contains number of oranges, days since birth and the price of a tube ticket in pennies. Stuffing all of those into an answer_int is *precisely* what the end user must not do. That's pilot error. Now, with a questionnaire that might not matter because everything is an answer and you're not necessarily going to do much more than count/aggregate it. See above. It doesn't remove the need for run-time DDL if you allow users to add their own questions. Sure it does. When a user, who should be talking with you, wants to ask a new kind of question, that's the start of a discussion about what new kind(s) of questions would be generally applicable in the questionnaire schema. Then, when you come to an agreement, you roll it into the new schema, and the whole system gets an improvement. Fine, but if you're not letting the user extend the system, then it's not really addressing Edward's original posting, is it? 
It's my contention that Edward's original idea is ill-posed. SQL is just fine for doing this kind of thing, and it's *not that hard*. If the user's talking to me, I might as well just write the DDL myself - it's the talk that'll take the time, not writing a dozen lines of SQL. It's the talk that's the important part. Machines are really bad at seeing the broader picture. In the attempt to save a few minutes' discussion, he's trying to borrow that time from a system asked to do things that computers are inherently bad at doing, and every end user will pay that time back at a very high rate of interest. This is precisely the kind of false economy that so plagues software development and maintenance these days. The interesting part of the problem (from a Comp-Sci point of view) is precisely in automating part of that discussion. It's providing an abstraction so that you don't end up with a mass of attributes while still providing freedom to the user. This freedom and efficiency you're talking about is better supplied, IMHO, by putting a standard DDL for questionnaires up on a pgfoundry or an SF.net. That way, improvements to the DDL get spread all over the world, and a very large amount of wheel reinvention gets avoided. Reusable components are a big chunk of both freedom and efficiency. :) Cheers, D Maybe I should rethink the problem a bit - from the very brief initial research I've done, it seems EAV schemas have two common uses: 1) When new attributes have to be created on-the-fly 2) When the number of possible properties for an entity greatly (orders of magnitude) exceeds the number of properties any one entity is likely to have. Um, no. The first use case is bad coding practice, and the second is a classic case for a join table, which is the standard way to handle M:N relationships. I'm not sure about solving the first problem - there seems to be a lot of debate around this. I can see reasons for and against allowing this. 
However I think the second is a very real problem. One such example is a patient record system. For each patient we have a table of common data (dob, sex, height, weight etc) but as well as this a patient can present with many symptoms. This might be a table of 40,000 possible symptoms. Here's how I'd do that: CREATE TABLE patient ( patient_id SERIAL PRIMARY KEY, /* for simplicity. Some combination of columns in the table would also have a UNIQUE NOT NULL constraint on it. */ ... ); CREATE TABLE symptom ( symptom_id SERIAL PRIMARY KEY, /* See above. */ ... ); CREATE TABLE patient_presents_with ( patient_id INTEGER NOT NULL REFERENCES patient(patient_id), symptom_id INTEGER NOT NULL REFERENCES symptom(symptom_id), UNIQUE(patient_id, symptom_id) );
Re: [HACKERS] My honours project - databases using dynamically attached entity-properties
David Fetter [EMAIL PROTECTED] writes: CREATE TABLE symptom ( symptom_id SERIAL PRIMARY KEY, /* See above. */ ... ); CREATE TABLE patient_presents_with ( patient_id INTEGER NOT NULL REFERENCES patient(patient_id), symptom_id INTEGER NOT NULL REFERENCES symptom(symptom_id), UNIQUE(patient_id, symptom_id) ); I'm just glad I don't have your doctor. I hope mine doesn't think symptoms are all boolean values. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Daylight Saving Time question PostgreSQL 8.1.4
On Wed, Mar 14, 2007 at 01:13:58PM +0100, Zdenek Kotala wrote: I don't think making a symlink is a good solution. It generates a lot of future problems with package updates or patching. A configure switch is much more comfortable for packagers/patch makers. In the case when an average user wants to compile their own postgres, we can offer a regression test focused on TZ validation. (By the way, the average user is surprised that postgres has its own zone files.) What is the actual problem being solved here? That people expected the timezone changes to be picked up automatically? I think if you weigh it up, that problem is less significant than: 1. You do a minor system upgrade and now postgres crashes because the file format changed or the files moved. 2. You run a replication system and get different results on different machines. I think that from a data integrity point of view the current system is the best. At the very least what you propose is a modularity violation: Postgres depending on undocumented private data of another system component. Have a nice day, -- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/ From each according to his ability. To each according to his ability to litigate.
Re: [HACKERS] Daylight Saving Time question PostgreSQL 8.1.4
Martijn van Oosterhout wrote: I think that from a data integrity point of view the current system is the best. At the very least what you propose is a modularity violation: Postgres depending on undocumented private data of another system component. I don't think you can reasonably describe the system timezone database as undocumented private data. Plenty of other systems rely on it, as we used to do. But I take Tom's point about most users not knowing if their TZ database is usable or not. Maybe we need a tool (maybe on pgfoundry) that will do some analysis to find out, if such a thing is possible. cheers andrew
Re: [HACKERS] Daylight Saving Time question PostgreSQL 8.1.4
Martijn van Oosterhout wrote: On Wed, Mar 14, 2007 at 01:13:58PM +0100, Zdenek Kotala wrote: I don't think making a symlink is a good solution. It generates a lot of future problems with package updates or patching. A configure switch is much more comfortable for packagers/patch makers. In the case when an average user wants to compile their own postgres, we can offer a regression test focused on TZ validation. (By the way, the average user is surprised that postgres has its own zone files.) What is the actual problem being solved here? That people expected the timezone changes to be picked up automatically? I think if you weigh it up, that problem is less significant than: People expect consistent timezone settings for all applications on one machine. 1. You do a minor system upgrade and now postgres crashes because the file format changed or the files moved. When you perform a minor system upgrade which delivers a new TZ file format, then a new version of libc must be delivered anyway and you probably must recompile postgres on the upgraded system - you can check if the TZ files work fine, and if not, you can compile it with the built-in files. If a file is moved, postgres raises an error. But I don't see a problem there. If you compare changes between 8.1.5 and 8.1.6, you can see a lot of removed files. 2. You run a replication system and get different results on different machines. From another point of view, you very often have the application and postgres on one machine. And if you have different tz files for the application and for postgres, the results could be really strange. This case is more common than the replication issue. I think that from a data integrity point of view the current system is the best. At the very least what you propose is a modularity violation: Postgres depending on undocumented private data of another system component. Yes, it is true that depending on private data is not good. But it is the lesser evil compared to having several different timezones on one system.
Zdenek
Re: [HACKERS] Bug in VACUUM FULL ?
Pavan Deolasee [EMAIL PROTECTED] writes: If that assumption is true (and it must be true for us to move the chain in pieces), doesn't that mean we don't really need to move the RECENTLY_DEAD tuples preceding a DEAD tuple ? As I've already said several times: they are dead, but at least for VACUUM FULL's purposes it seems unreasonably difficult to determine that and remove them. The point at which we'd figure this out is after we've already performed dead-tuple removal (at least for some of the pages involved). It would make for a significant increase in logical complexity, and that code is too damn complicated already. Since we know this is a seldom-seen corner case, I'm not going to risk introducing new bugs to recycle a few tuples a bit sooner. regards, tom lane
Re: [HACKERS] [PATCHES] Bitmapscan changes
On Wednesday, 2007-03-14 at 10:22, Heikki Linnakangas wrote: Tom Lane wrote: At this point I'm feeling unconvinced that we want it at all. It's sounding like a large increase in complexity (both implementation-wise and in terms of API ugliness) for a fairly narrow use-case --- just how much territory is going to be left for this between HOT and bitmap indexes? I'm in an awkward situation right now. I've done my best to describe the use cases for clustered indexes. ... Just to recap the general idea: reduce index size taking advantage of clustering in the heap. Clustered indexes have roughly the same performance effect and use cases as clustered indexes on MS SQL Server, and Index-Organized-Tables on Oracle, but the way I've implemented them is significantly different. On other DBMSs, the index and heap are combined to a single b-tree structure. The way I've implemented them is less invasive, there's no changes to the heap for example, and it doesn't require moving live tuples. Do you keep visibility info in the index? How does this info get updated when visibility data changes in the heap? If there is no visibility data in the index, then I can't see how it gets the same performance effect as Index-Organized-Tables, as a lot of random heap access is still needed. -- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com
Re: [HACKERS] Daylight Saving Time question PostgreSQL 8.1.4
Andrew Dunstan wrote: Martijn van Oosterhout wrote: I think that from a data integrity point of view the current system is the best. At the very least what you propose is a modularity violation: Postgres depending on undocumented private data of another system component. I don't think you can reasonably describe the system timezone database as undocumented private data. Plenty of other systems rely on it, as we used to do. But I take Tom's point about most users not knowing if their TZ database is usable or not. Maybe we need a tool (maybe on pgfoundry) that will do some analysis to find out, if such a thing is possible. I guess some regression test should test TZ validity? Zdenek
Re: [HACKERS] Daylight Saving Time question PostgreSQL 8.1.4
Andrew Dunstan wrote: Martijn van Oosterhout wrote: I think that from a data integrity point of view the current system is the best. At the very least what you propose is a modularity violation: Postgres depending on undocumented private data of another system component. I don't think you can reasonably describe the system timezone database as undocumented private data. Plenty of other systems rely on it, as we used to do. No -- we relied on the libc's TZ API, which is not the same. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Re: [HACKERS] My honours project - databases using dynamically attached entity-properties
On Wed, Mar 14, 2007 at 02:28:03PM +, Gregory Stark wrote: David Fetter [EMAIL PROTECTED] writes: CREATE TABLE symptom ( symptom_id SERIAL PRIMARY KEY, /* See above. */ ... ); CREATE TABLE patient_presents_with ( patient_id INTEGER NOT NULL REFERENCES patient(patient_id), symptom_id INTEGER NOT NULL REFERENCES symptom(symptom_id), UNIQUE(patient_id, symptom_id) ); I'm just glad I don't have your doctor. I hope mine doesn't think symptoms are all boolean values. Where is the boolean above? It is M:N, with each having whatever data is required. The issue I have with the above is that it seems unnecessarily inefficient. Whenever mapping from a patient to a symptom, or a symptom to a patient, it requires searching indexes for three tables. Perhaps this would work well if there was heavy overlap of symptoms for different patients. For the cases I have hit this problem, however, there may be overlap, but it is not easy to detect, and even if it was detected, we would end with some sort of garbage collection requirements where symptoms are removed once all references to the symptoms are removed. The case most familiar to me is a set of viewing preferences for web pages. Some users specify no preferences, while others have dozens of preferences. As I have no requirements to search for users with a particular preference, I chose to solve this by packing many of the preferences together into a TEXT field, and having the application pack/unpack the data. I still have tables that map object id to attribute/value, but they are used for the data that can require longer queries. Without clustering the data, searching for a dozen of these attributes requires either querying all attributes, where the attributes could be scattered throughout the table, or querying them one by one, which is worse. If there was an efficient way to do this for both of my use cases, I would be strongly tempted to use it. :-) Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] Ottawa, Ontario, Canada http://mark.mielke.cc/
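Mark's pack/unpack approach — many small preferences serialized into one TEXT column, with the application doing the (de)serialization — can be sketched like this (an illustration of the technique he describes; the key names are invented, and JSON is one arbitrary choice of packing format):

```python
import json

# Application-side pack/unpack of per-user viewing preferences that
# are never searched individually, stored in a single TEXT column.
def pack_prefs(prefs: dict) -> str:
    return json.dumps(prefs, sort_keys=True)

def unpack_prefs(text: str) -> dict:
    return json.loads(text) if text else {}

stored = pack_prefs({"theme": "dark", "rows_per_page": 50})
print(stored)                # the string that goes into the TEXT column
print(unpack_prefs(stored))  # the dict the application sees again
```

The trade-off is exactly the one he notes: one row fetch retrieves every preference at once, but the database can no longer index or query individual preferences.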
Re: [HACKERS] [PATCHES] Bitmapscan changes
Hannu Krosing wrote: On Wednesday, 2007-03-14 at 10:22, Heikki Linnakangas wrote: Tom Lane wrote: At this point I'm feeling unconvinced that we want it at all. It's sounding like a large increase in complexity (both implementation-wise and in terms of API ugliness) for a fairly narrow use-case --- just how much territory is going to be left for this between HOT and bitmap indexes? I'm in an awkward situation right now. I've done my best to describe the use cases for clustered indexes. ... Just to recap the general idea: reduce index size taking advantage of clustering in the heap. This is what I suggest. Provide a tarball of -head with the patch applied. Provide a couple of use cases that can be run, with explanations of how to verify the use cases. Allow the community to drive the inclusion by making it as easy as possible to allow a proactive argument to take place by the people actually using the product. Proving that a user could and would use the feature is a very powerful argument. Sincerely, Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/
Re: [HACKERS] Daylight Saving Time question PostgreSQL 8.1.4
Martijn van Oosterhout kleptog@svana.org writes: What is the actual problem being solved here? That people expected the timezone changes to be picked up automatically? I think if you weigh it up, that problem is less significant than: ... One other point is that symlinking to system timezone info will not cause Postgres to pick up changes automatically. You'll probably still need a postmaster restart to get the new settings loaded in. If the timezone info update is delivered as part of a Postgres update, the need for this is fairly obvious, but if it's happening as part of an update of a seemingly unrelated package, not so much. To me, the idea of linking to system timezone info is a convenience for packagers, and it is (and should be) the packagers' responsibility to determine whether it's a sane idea for their environment. If they are competent to determine that, they are certainly competent enough to make it happen without assistance from our configure script. regards, tom lane
Re: [HACKERS] My honours project - databases using dynamically attached entity-properties
Gregory Stark wrote: David Fetter [EMAIL PROTECTED] writes: CREATE TABLE symptom ( symptom_id SERIAL PRIMARY KEY, /* See above. */ ... ); CREATE TABLE patient_presents_with ( patient_id INTEGER NOT NULL REFERENCES patient(patient_id), symptom_id INTEGER NOT NULL REFERENCES symptom(symptom_id), UNIQUE(patient_id, symptom_id) ); I'm just glad I don't have your doctor. I hope mine doesn't think symptoms are all boolean values. With a unique across (patient_id,symptom_id) you could have a single patient with as many unique symptoms as could be listed. Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/ ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] My honours project - databases using dynamically attached entity-properties
[EMAIL PROTECTED] writes: On Wed, Mar 14, 2007 at 02:28:03PM +, Gregory Stark wrote: David Fetter [EMAIL PROTECTED] writes: CREATE TABLE symptom ( symptom_id SERIAL PRIMARY KEY, /* See above. */ ... ); CREATE TABLE patient_presents_with ( patient_id INTEGER NOT NULL REFERENCES patient(patient_id), symptom_id INTEGER NOT NULL REFERENCES symptom(symptom_id), UNIQUE(patient_id, symptom_id) ); I'm just glad I don't have your doctor. I hope mine doesn't think symptoms are all boolean values. Where is the boolean above? It is M:N, with each having whatever data is required. No, the above schema can only show whether a patient has or doesn't have a symptom. There is nowhere to store *where* the pain, inflammation, swelling, aneurism, etc is, or how severe it is, or when it occurred, etc. In any case the above arguably *is* an EA schema anyways. Your symptom is just as much an abstract meaningless concept from a database point of view as the questionnaire's answer or the bug tracker's tag. Especially once you start actually having to record information *about* the symptom. This is a silly argument. The only reasonable conclusion is that any dogmatic principle that doesn't take into account the application requirements is wrong. In some cases you want a flexible abstract schema because the application is flexible and abstract, in others you need the database schema to understand your specific data structures so it can help you manipulate it. You have to pick which is more useful for your application, you can't have your cake and eat it too. And all of the sudden I have a craving for cake... -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] My honours project - databases using dynamically attached entity-properties
On Wed, 2007-03-14 at 16:08, [EMAIL PROTECTED] wrote: On Wed, Mar 14, 2007 at 02:28:03PM +, Gregory Stark wrote: David Fetter [EMAIL PROTECTED] writes: CREATE TABLE symptom ( symptom_id SERIAL PRIMARY KEY, /* See above. */ ... ); CREATE TABLE patient_presents_with ( patient_id INTEGER NOT NULL REFERENCES patient(patient_id), symptom_id INTEGER NOT NULL REFERENCES symptom(symptom_id), UNIQUE(patient_id, symptom_id) ); I'm just glad I don't have your doctor. I hope mine doesn't think symptoms are all boolean values. Where is the boolean above? It is M:N, with each having whatever data is required. The boolean is assumed in the symptoms table. In any case, even if it's not a boolean value, even if maybe the symptoms table is a complex one on its own, it still is one single type for all symptoms of all patients. The real problem is that in some real world applications you have a mix of wildly varying types of attributes a user might want to use, and you can't know what those will be beforehand... the symptoms thing is simple to solve in the way David did it, but there really are other situations which a simple m:n can't easily cover. How would you handle a database of user settings for 10K different applications and 100M different users, where each application must be able to store its own (type safe !!) settings in the same structure, and applications come and go with their own settings ? Come up with a good solution to this combined with queries like "give me all the users who have this set of settings set to these values" running fast, and then you're talking. Cheers, Csaba. ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
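[Editor's sketch] One way such a store is sometimes approximated — a hypothetical sketch with invented table and column names, not something proposed in this thread — is an EAV table carrying one column per supported type, with a CHECK constraint forcing exactly one of them to be set:

```sql
-- Hypothetical sketch of a "typed" settings store shared by many applications.
-- All names here are invented for illustration.
CREATE TABLE app_setting_def (
    app_id     integer NOT NULL,
    setting    text    NOT NULL,
    value_type text    NOT NULL CHECK (value_type IN ('int', 'text', 'bool')),
    PRIMARY KEY (app_id, setting)
);

CREATE TABLE user_setting (
    user_id    integer NOT NULL,
    app_id     integer NOT NULL,
    setting    text    NOT NULL,
    int_value  integer,
    text_value text,
    bool_value boolean,
    PRIMARY KEY (user_id, app_id, setting),
    FOREIGN KEY (app_id, setting) REFERENCES app_setting_def (app_id, setting),
    -- exactly one of the typed columns may be non-null
    CHECK ((int_value IS NOT NULL)::int
         + (text_value IS NOT NULL)::int
         + (bool_value IS NOT NULL)::int = 1)
);
```

Even so, nothing above ties int_value to value_type = 'int' without a trigger, and the "all users with these settings set to these values" query still turns into one self-join per setting — which is precisely the performance point being raised.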
Re: [HACKERS] My honours project - databases using dynamically attached entity-properties
Joshua D. Drake wrote: Gregory Stark wrote: David Fetter [EMAIL PROTECTED] writes: CREATE TABLE symptom ( symptom_id SERIAL PRIMARY KEY, /* See above. */ ... ); CREATE TABLE patient_presents_with ( patient_id INTEGER NOT NULL REFERENCES patient(patient_id), symptom_id INTEGER NOT NULL REFERENCES symptom(symptom_id), UNIQUE(patient_id, symptom_id) ); I'm just glad I don't have your doctor. I hope mine doesn't think symptoms are all boolean values. With a unique across (patient_id,symptom_id) you could have a single patient with as many unique symptoms as could be listed. I think Greg is suggesting that the table needs to be augmented with, say, a severity field. cheers andrew ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] My honours project - databases using dynamically attached entity-properties
On Wed, Mar 14, 2007 at 03:25:48PM +, Gregory Stark wrote: [EMAIL PROTECTED] writes: On Wed, Mar 14, 2007 at 02:28:03PM +, Gregory Stark wrote: David Fetter [EMAIL PROTECTED] writes: CREATE TABLE symptom ( symptom_id SERIAL PRIMARY KEY, /* See above. */ ... ); CREATE TABLE patient_presents_with ( patient_id INTEGER NOT NULL REFERENCES patient(patient_id), symptom_id INTEGER NOT NULL REFERENCES symptom(symptom_id), UNIQUE(patient_id, symptom_id) ); I'm just glad I don't have your doctor. I hope mine doesn't think symptoms are all boolean values. Where is the boolean above? It is M:N, with each having whatever data is required. No, the above schema can only show whether a patient has or doesn't have a symptom. There is nowhere to store *where* the pain, inflammation, swelling, aneurism, etc is, or how severe it is, or when it occurred, etc. What do you think the '...' is in the symptom table? :-) I'm not saying I would do it that way - but it seems to me that this is a bit of a religious debate, without people listening to each other. For one person to suggest a workable solution, and another person to write it off as quickly as you did, misunderstanding, or misrepresenting it, is what I would call not listening. :-) In any case the above arguably *is* an EA schema anyways. Your symptom is just as much an abstract meaningless concept from a database point of view as the questionnaire's answer or the bug tracker's tag. Especially once you start actually having to record information *about* the symptom. This is a silly argument. The only reasonable conclusion is that any dogmatic principle that doesn't take into account the application requirements is wrong. In some cases you want a flexible abstract schema because the application is flexible and abstract, in others you need the database schema to understand your specific data structures so it can help you manipulate it. 
You have to pick which is more useful for your application, you can't have your cake and eat it too. No. Another reasonable conclusion is that the answer is not simple. This doesn't mean the answer is undesirable. It means that people need more time. :-) I prefer abstractions, especially if they are more efficient than if I were to roll my own each time. Relational databases do a good job today. They can do a better job tomorrow. Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] | Neighbourhood Coder | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Daylight Saving Time question PostgreSQL 8.1.4
Andrew Dunstan [EMAIL PROTECTED] writes: But I take Tom's point about most users not knowing if their TZ database is usable or not. Maybe we need a tool (maybe on pgfoundry) that will do some analysis to find out, if such a thing is possible. It's not really *that* hard: diff between our timezone files and the system files will make it pretty clear. For instance, diffing a CVS HEAD installation against a not-too-up-to-date Fedora Core 5 system shows only a few different files, reflecting the fact that they're different snapshots of the zic database:

$ diff -r --br . /usr/share/zoneinfo
Files ./America/Havana and /usr/share/zoneinfo/America/Havana differ
Files ./America/Santiago and /usr/share/zoneinfo/America/Santiago differ
Files ./Antarctica/Palmer and /usr/share/zoneinfo/Antarctica/Palmer differ
Files ./Australia/Perth and /usr/share/zoneinfo/Australia/Perth differ
Files ./Australia/West and /usr/share/zoneinfo/Australia/West differ
Files ./Chile/Continental and /usr/share/zoneinfo/Chile/Continental differ
Files ./Chile/EasterIsland and /usr/share/zoneinfo/Chile/EasterIsland differ
Files ./Cuba and /usr/share/zoneinfo/Cuba differ
Files ./Pacific/Easter and /usr/share/zoneinfo/Pacific/Easter differ
Only in ./US: Pacific-New
Only in /usr/share/zoneinfo: iso3166.tab
Only in /usr/share/zoneinfo: posix
Only in /usr/share/zoneinfo: posixrules
Only in /usr/share/zoneinfo: right
Only in /usr/share/zoneinfo: zone.tab
$

But IMHO the thing that you need to know to make an informed decision is what the future update path for that system is likely to be. In the case of me packaging Postgres for Red Hat, I feel pretty comfortable that there will be no major surgery on glibc's timezone code within any single RHEL release series, so if it works today it'll still work tomorrow. A Sun engineer packaging Postgres for Solaris may be able to make the same kind of determination. But I think Joe Average User is sticking his neck out if he assumes such a thing for say a Gentoo box ...
regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] autovacuum next steps, take 3
Tom Lane wrote: Alvaro Herrera [EMAIL PROTECTED] writes: Tom Lane wrote:

1. Grab the AutovacSchedule LWLock exclusively.
2. Check to see if another worker is currently processing that table; if so drop LWLock and go to next list entry.
3. Recompute whether table needs vacuuming; if not, drop LWLock and go to next entry. (This test covers the case where someone vacuumed the table since you made your list.)
4. Put table OID into shared memory, drop LWLock, then vacuum table.
5. Clear current-table OID from shared memory, then repeat for next list entry.

The point I'm not very sure about is that this proposal means we need to do I/O with the AutovacSchedule LWLock grabbed, to obtain up-to-date stats. True. You could probably drop the lock while rechecking stats, at the cost of having to recheck for collision (repeat step 2) afterwards. Or recheck stats before you start, but if collisions are likely then that's a waste of time. But on the third hand, does it matter? Rechecking the stats should be much cheaper than a vacuum operation, so I'm not seeing that there's going to be a problem. It's not like there are going to be hundreds of workers contending for that lock... Turns out that it does matter, because not only do we need to read pgstats, but we also need to fetch the pg_autovacuum and pg_class rows again for the table. So we must release the AutovacuumSchedule lock before trying to open pg_class etc. Unless we are prepared to cache (keep a private copy of) the contents of said tuples between the first check (i.e. when building the initial table list) and the recheck? This is possible as well, but it gives me an uneasy feeling. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] My honours project - databases using dynamically attached entity-properties
[EMAIL PROTECTED] writes: On Wed, Mar 14, 2007 at 03:25:48PM +, Gregory Stark wrote: [EMAIL PROTECTED] writes: On Wed, Mar 14, 2007 at 02:28:03PM +, Gregory Stark wrote: David Fetter [EMAIL PROTECTED] writes: CREATE TABLE symptom ( symptom_id SERIAL PRIMARY KEY, /* See above. */ ... ); CREATE TABLE patient_presents_with ( patient_id INTEGER NOT NULL REFERENCES patient(patient_id), symptom_id INTEGER NOT NULL REFERENCES symptom(symptom_id), UNIQUE(patient_id, symptom_id) ); I'm just glad I don't have your doctor. I hope mine doesn't think symptoms are all boolean values. Where is the boolean above? It is M:N, with each having whatever data is required. No, the above schema can only show whether a patient has or doesn't have a symptom. There is nowhere to store *where* the pain, inflammation, swelling, aneurism, etc is, or how severe it is, or when it occurred, etc. What do you think the '...' is in the symptom table? :-) Ah, I did in fact miss that. So then this is just a standard EA schema. I thought you were on the other side of the debate? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] My honours project - databases using dynamically attached entity-properties
On Wed, Mar 14, 2007 at 02:28:03PM +, Gregory Stark wrote: David Fetter [EMAIL PROTECTED] writes: CREATE TABLE symptom ( symptom_id SERIAL PRIMARY KEY, /* See above. */ ... ); CREATE TABLE patient_presents_with ( patient_id INTEGER NOT NULL REFERENCES patient(patient_id), symptom_id INTEGER NOT NULL REFERENCES symptom(symptom_id), UNIQUE(patient_id, symptom_id) ); I'm just glad I don't have your doctor. I hope mine doesn't think symptoms are all boolean values. What's in the symptom table is up to the doctor. Cheers, D -- David Fetter [EMAIL PROTECTED] http://fetter.org/ phone: +1 415 235 3778 AIM: dfetter666 Skype: davidfetter Remember to vote! Consider donating to PostgreSQL: http://www.postgresql.org/about/donate ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] My honours project - databases using dynamically attached entity-properties
David Fetter wrote: On Wed, Mar 14, 2007 at 02:28:03PM +, Gregory Stark wrote: David Fetter [EMAIL PROTECTED] writes: CREATE TABLE symptom ( symptom_id SERIAL PRIMARY KEY, /* See above. */ ... ); CREATE TABLE patient_presents_with ( patient_id INTEGER NOT NULL REFERENCES patient(patient_id), symptom_id INTEGER NOT NULL REFERENCES symptom(symptom_id), UNIQUE(patient_id, symptom_id) ); I'm just glad I don't have your doctor. I hope mine doesn't think symptoms are all boolean values. What's in the symptom table is up to the doctor. Surely the point is that for high temperature I need a temperature value, whereas for tooth decay I'll want a tooth number (or whatever they use). Which brings us back to where we started... -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] My honours project - databases using dynamically attached entity-properties
On Wed, 2007-03-14 at 16:50, David Fetter wrote: On Wed, Mar 14, 2007 at 02:28:03PM +, Gregory Stark wrote: David Fetter [EMAIL PROTECTED] writes: CREATE TABLE symptom ( symptom_id SERIAL PRIMARY KEY, /* See above. */ ... ); CREATE TABLE patient_presents_with ( patient_id INTEGER NOT NULL REFERENCES patient(patient_id), symptom_id INTEGER NOT NULL REFERENCES symptom(symptom_id), UNIQUE(patient_id, symptom_id) ); I'm just glad I don't have your doctor. I hope mine doesn't think symptoms are all boolean values. What's in the symptom table is up to the doctor. OK, and here's the problem: each doctor might want to put something else in the symptom table. Each doctor might want to do it in a type safe way, e.g. so that the application enforces an enumeration of high/moderate/low for the symptom fever (or maybe another doctor wants it in exact degrees)... you can stuff it all in a string field, but you know how reliable that will be. Cheers, Csaba. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
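[Editor's sketch] For the narrow fever example, the enumeration Csaba mentions can at least be enforced with a CHECK constraint — a minimal sketch with invented names, which only works because this particular symptom's attributes are known in advance:

```sql
-- Hypothetical sketch: per-symptom detail with an enforced enumeration.
CREATE TABLE patient_fever (
    patient_id integer NOT NULL REFERENCES patient(patient_id),
    severity   text CHECK (severity IN ('high', 'moderate', 'low')),
    degrees_c  numeric(4,1),  -- for the doctor who wants exact degrees
    noted_at   timestamptz NOT NULL DEFAULT now()
);
```

The catch, of course, is that this needs a new table (or new columns) for every symptom whose attributes differ — which is exactly the schema-evolution problem the dynamically-attached-properties project is about.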
Re: [HACKERS] [PATCHES] Bitmapscan changes
Joshua D. Drake wrote: Allow the community to drive the inclusion by making it as easy as possible to allow a proactive argument to take place by the people actually using the product. This seems to be a rather poor decision making process: Are the users happy with the new feature? If so, then apply the patch. It leads to unmanageable code. Which is why we don't do things that way. The code must fit within the general architecture before application -- particularly if it's an internal API change. That's what the review process is for. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
[HACKERS] hash join hashtable size and work_mem
In nodeHash.c, the function ExecChooseHashTableSize() uses two different methods for determining the number of buckets to use. The current code looks something like:

if (ntuples * tupsize > work_mem * 1024)
    buckets = (work_mem * 1024) / (tupsize * 10);
else
    buckets = ntuples / 10;

So for the case where a spill is expected, we use work_mem to decide on our hash size. For the case where a spill isn't expected, we rely on the row estimate alone -- and make no provision for speeding the join by using the memory that we're allowed to use. When profiling large hash-joins, it often is the case that scanning the hash-buckets is a bottleneck; it would be nice for the user to be able to throw memory at a join to improve performance. Am I missing something about the current implementation? I would expect that the bucket count would be calculated something like:

buckets = (work_mem * 1024L) / (tup_size * NTUP_PER_BUCKET);

for both cases? Making this change appears to improve hash-join performance substantially in some cases, and as far as I can tell doesn't hurt anything (apart from using memory that it is allowed to use given a particular work_mem setting). -Tim -- [EMAIL PROTECTED] ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] [PATCHES] Bitmapscan changes
Alvaro Herrera wrote: Joshua D. Drake wrote: Allow the community to drive the inclusion by making it as easy as possible to allow a proactive argument to take place by the people actually using the product. This seems to be a rather poor decision making process: Are the users happy with the new feature? If so, then apply the patch. It leads to unmanageable code. Perhaps reading my message again is in order. I think it is pretty obvious that a user shouldn't determine if a patch should be applied. My whole point was that if people are clamoring for the feature, it could drive that feature to be more aggressively reviewed. I can't even count how many times I see: This seems like a corner case feature, I don't think we should add it. So I am suggesting a way to ensure that the feature is not considered a corner case. (if it is indeed not a corner case) Sincerely, Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] hash join hashtable size and work_mem
Timothy J. Kordas [EMAIL PROTECTED] writes: Am I missing something about the current implementation ? If the planner has correctly predicted the number of rows, the table loading should be about NTUP_PER_BUCKET in either regime. Are you sure you aren't just wishing that NTUP_PER_BUCKET were smaller? I don't see that making the hashtable much larger than ntuples is a good idea --- that just spreads out the live entries over more cache lines, resulting in more cache thrashing. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] hash join hashtable size and work_mem
Tom Lane wrote: If the planner has correctly predicted the number of rows, the table loading should be about NTUP_PER_BUCKET in either regime. Are you sure you aren't just wishing that NTUP_PER_BUCKET were smaller? Maybe I wish NTUP_PER_BUCKET was smaller. But I don't think that's the whole story. The planner estimates definitely play a role in my concern here. For mis-estimated inner relations, the current calculation may over-subscribe the hash-table even if more work_mem was available (that is, there are too many hash collisions *and* memory isn't being used to the fullest extent allowed). I've been tracking the number of tuples which land in each bucket, and I'd like to see that number go down as I increase work_mem. I would expect for the same data a hash-join with a work_mem of 256MB to run faster than one run with 32MB; even if the inner relation is only 30MB. The implementation I've been experimenting with actually takes the average of the current implementation (ntuples/10) and the spill version (work_mem / (tupsize * 10)). -Tim ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] hash join hashtable size and work_mem
Timothy J. Kordas [EMAIL PROTECTED] writes: I would expect for the same data a hash-join with a work_mem of 256MB to run faster than one run with 32MB; even if the inner relation is only 30MB. Once you get to the point where each tuple is in a different bucket, it is clearly impossible for further increases in hashtable size to improve matters. All you can do is waste RAM and cache lines. Now if we set NTUP_PER_BUCKET = 1 we would not be exactly at that critical point because of uneven bucket loading and other factors ... but I question whether there's enough incremental improvement available to justify making the hashtable much larger than that. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] hash join hashtable size and work_mem
On Wed, 2007-03-14 at 10:28 -0700, Timothy J. Kordas wrote: I would expect for the same data a hash-join with a work_mem of 256MB to run faster than one run with 32MB; even if the inner relation is only 30MB. Certainly not for all data, but for some distributions yes, probably. The easiest thing to do is prove that's true and then work out how to spot that case ahead of time, or at least find a place where you can adjust your assumptions cheaply enough to improve things. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
[HACKERS] how to add seconds to a TimestampTz
Is there a better way than going to time_t and back? I am currently using this: db->next_worker = time_t_to_timestamptz(timestamptz_to_time_t(current_time) + autovacuum_naptime); (db->next_worker is a TimestampTz, as is current_time. autovacuum_naptime is integer for a number of seconds) but it doesn't seem clean, and the comments in the functions more or less say that their use is discouraged. I saw about doing it via the interval input/output but that's an awful lot of work ... Is this the first time this is going to be done in the backend? -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] how to add seconds to a TimestampTz
Alvaro Herrera [EMAIL PROTECTED] writes: Is there a better way than going to time_t and back? Isn't the standard SQL-level locution timestamptz + numeric_value * '1 second'::interval ? I'm not sure what would be the most convenient realization of this at the C level, but please stay away from time_t ... regards, tom lane ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
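[Editor's sketch] At the SQL level, the locution Tom suggests looks like this (using 60 as a stand-in for autovacuum_naptime):

```sql
-- Add n seconds to a timestamptz via interval arithmetic, no time_t involved.
SELECT now() + 60 * interval '1 second';
-- or equivalently, multiplying on the other side:
SELECT now() + interval '1 second' * 60;
```

At the C level, one possibility (an assumption, not something specified in this thread) would be to go through the backend functions behind these operators, e.g. interval_mul and timestamptz_pl_interval in src/backend/utils/adt/timestamp.c via DirectFunctionCall2.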
[HACKERS] UPDATE using sub selects
Hi, I have coded up a patch which solves the following TODO. I will submit a patch for this soon: - UPDATE - Allow UPDATE tab SET ROW (col, ...) = (SELECT...) http://archives.postgresql.org/pgsql-hackers/2006-07/msg01306.php The question is that since this enhances the UPDATE syntax, what changes and where all they need to be made with respect to the documentation? Regards, Nikhils -- EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] UPDATE using sub selects
NikhilS [EMAIL PROTECTED] writes: I have coded up a patch which solves the following TODO. I will submit a patch for this soon: http://archives.postgresql.org/pgsql-hackers/2006-07/msg01306.php Cool... The question is that since this enhances the UPDATE syntax, what changes and where all they need to be made with respect to the documentation? Documentation is the very least of your worries. What exactly is your implementation plan? If this were a simple or narrow fix it would have been done already. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] UPDATE using sub selects
Hi, The question is that since this enhances the UPDATE syntax, what changes and where all they need to be made with respect to the documentation? Documentation is the very least of your worries. What exactly is your implementation plan? If this were a simple or narrow fix it would have been done already. The implementation that I have planned is pretty similar to the way INSERT INTO ... SELECT has been implemented. Along with the grammar changes in gram.y, the changes are localized in the transformUpdateStmt code path. The SELECT clause ends up becoming a subquery to the update query with the target column expressions transformed properly to include the subquery expressions. Does this sound ok? I have tried some update-subselect variations and they seem to work. For example the case in the src/test/regress/sql/update.sql, which used to fail till now, seems to work: UPDATE update_test SET (a,b) = (select a,b FROM update_test where c = 'foo') WHERE a = 10; Will try testing out some other variations too. Regards, Nikhils -- EnterpriseDB http://www.enterprisedb.com
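[Editor's sketch] For comparison, before such a patch the same effect typically had to be written with PostgreSQL's UPDATE ... FROM extension — a rough equivalent for the example above, not taken from the patch itself:

```sql
-- Pre-patch workaround using UPDATE ... FROM. Note the semantics differ
-- subtly: if the subquery yields several rows, the FROM form updates the
-- target with an arbitrary matching one, whereas SET (a,b) = (SELECT ...)
-- should raise an error for a non-scalar subquery result.
UPDATE update_test
   SET a = sub.a, b = sub.b
  FROM (SELECT a, b FROM update_test WHERE c = 'foo') AS sub
 WHERE update_test.a = 10;
```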