Re: [HACKERS] Oracle/PostgreSQL incompatibilities
Finger or brain trouble, here is the correction: NUMBER -> INTEGER when transporting schemas from Oracle to Pg. (This needs a little care, as NUMBER in Oracle has bigger *precision* than INTEGER in Pg.) Thinking about this a bit more, it's probably fine to just substitute NUMERIC for NUMBER, but obviously INTEGER is going to perform better if it can be used. regards Mark ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
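As a sketch of the substitution being discussed (table and column names are hypothetical, not from the thread):

```sql
-- Oracle source, for reference:
--   CREATE TABLE accounts (id NUMBER, balance NUMBER(12,2));

-- Safe Postgres translation: NUMERIC preserves NUMBER's large precision
CREATE TABLE accounts (id NUMERIC, balance NUMERIC(12,2));

-- Faster alternative when the column is known to fit a 32-bit integer
CREATE TABLE accounts_fast (id INTEGER, balance NUMERIC(12,2));
```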
Re: [HACKERS] Dreaming About Redesigning SQL
Mike Mascari wrote on Sat, 04.10.2003 at 06:32: 2) The query language should be computationally complete. The user should be able to author complete applications in the language, rather than the language being a sublanguage. To me it seems like requiring that one should be able to author complete programs in regex. Yes, when all you have is a hammer everything looks like a nail ;) Hannu
Re: [HACKERS] Using backslash in query
Tom Lane wrote on Fri, 03.10.2003 at 18:34: Michael Brusser [EMAIL PROTECTED] writes: But when I doubled the number of backslashes: SELECT smth. FROM tbl WHERE situation LIKE '%\\\\%'; - it actually worked fine. Backslash is special to both the string-literal parser and the LIKE code. So when you write the above, the pattern value that arrives at the LIKE processor has one less level of backslashing: %\\% and the LIKE processor interprets this as percent, a literal backslash, and another percent. Regarding the dollar-quoting discussions - Will we be able to write the above query as SELECT smth. FROM tbl WHERE WHERE situation LIKE $$%\\%$$; in 7.4 or is \ still special there ? if it is, then one \ in regex in plpython still needs to be entered as \\\\ which has some geek coolness but would not be what I'd prefer to do on a regular basis. Hannu
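A small illustration of the two escaping levels described above (pre-dollar-quoting behavior, where backslash is special in ordinary string literals):

```sql
-- Goal: match values containing a literal backslash between two % signs.
-- The string-literal parser halves backslashes, and the LIKE processor
-- still needs \\ (its own escape) to mean one literal backslash:
SELECT * FROM tbl WHERE situation LIKE '%\\\\%';  -- four backslashes typed

-- With dollar quoting (as proposed in the thread), the string parser
-- leaves \ alone, so only the LIKE-level escaping remains:
SELECT * FROM tbl WHERE situation LIKE $$%\\%$$;
```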
[HACKERS] hr translations
I see that the hr translation for libpq is applied, but not the one for the backend that I sent a few days ago. Is there any problem with it ? Regards !
Re: [HACKERS] Quick question
Thanks - that's what I ended up doing. Hooray for CLUSTER indexes and constraints in phpPgAdmin CVS now! Chris Tom Lane wrote: Christopher Kings-Lynne [EMAIL PROTECTED] writes: If someone could help me with this, it would be cool. How do I query the catalogs to find the underlying index for a constraint? (Assuming the constraint is primary or unique) A first approximation is that the constraint and the index have the same name, but I suppose someone could break that association by renaming the index. Look in pg_depend for an internal dependency entry from the index to the constraint if you want to be sure. regards, tom lane
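A sketch of the pg_depend lookup Tom describes (the constraint name is hypothetical; catalog layout as of 7.3-era and later releases, so verify against your version):

```sql
-- Find the index backing a primary/unique constraint: the internal
-- dependency ('i') runs from the index (pg_class) to the constraint.
SELECT idx.relname AS index_name
FROM pg_constraint con
JOIN pg_depend dep ON dep.refobjid = con.oid
                  AND dep.refclassid = 'pg_constraint'::regclass
                  AND dep.classid = 'pg_class'::regclass
                  AND dep.deptype = 'i'
JOIN pg_class idx ON idx.oid = dep.objid
WHERE con.conname = 'my_constraint';  -- hypothetical name
```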
Re: [HACKERS] Weird locking situation
regression=# UPDATE foo SET f2=now() WHERE f1=1; ERROR: deadlock detected DETAIL: Process 18122 waits for AccessExclusiveLock on relation 154635 of database 17139; blocked by process 18133. Process 18133 waits for ShareLock on transaction 6330; blocked by process 18122. The trouble here is that GIST indexes are not concurrency-safe. This is on the TODO list but I fear it's not a small task ... Wow, that's bad. I always thought the TODO item was talking about poor concurrent performance - not actual concurrency errors! Chris
Re: [HACKERS] Index/Function organized table layout (from Re:
Christopher Browne wrote on Fri, 03.10.2003 at 00:57: [EMAIL PROTECTED] (Jean-Luc Lachance) writes: That's one of the drawbacks of MVCC. I once suggested that the transaction number and other housekeeping info be included in the index, but was told to forget it... It would solve once and for all the issue of seq_scan vs index_scan. It would simplify the aggregate problem. It would only simplify _one_ case, namely the case where someone cares about the cardinality of a relation, and it would do that at _considerable_ cost. A while back I outlined how this would have to be done, and for it to be done efficiently, it would be anything BUT simple. Could this be made a TODO item, perhaps with your attack plan. Of course as a strictly optional feature useful only for special situations (see below). I cross-post this to [HACKERS] as it seems relevant to a problem recently discussed there. It would be very hairy to implement it correctly, and all this would cover is the single case of SELECT COUNT(*) FROM SOME_TABLE; Not really. Just yesterday there was a discussion on [HACKERS] about implementing btree-organized tables, which would be much less needed if the visibility info were kept in indexes. If you had a single WHERE clause attached, you would have to revert to walking through the tuples looking for the ones that are live and committed, which is true for any DBMS. If the WHERE clause could use the same index (or any index with visibility info) then there would be no need for walking through the tuples in the data relation. The typical use case cited on [HACKERS] was time-series data, where inserts are roughly in (timestamp,id) order but queries in (id,timestamp) order. Now if the index would include all relevant fields (id,timestamp,data1,data2,...,dataN) then the query could run on the index only, touching just a few pages and thus vastly improving performance. I agree that this is not something everybody needs, but when it is needed it is needed badly.
And it still begs the same question, of why the result of this query would be particularly meaningful to anyone. I don't see the usefulness; I don't see the value of going to the considerable effort of fixing this purported problem. Being able to do fast count(*) is just a side benefit. Hannu
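A sketch of the time-series pattern described above (table and column names are hypothetical):

```sql
-- A multicolumn index groups each id's rows together in index order:
CREATE INDEX series_id_ts ON series (id, ts, data1);

-- Rows arrive in roughly (timestamp, id) order ...
INSERT INTO series (ts, id, data1) VALUES (now(), 42, 'reading');

-- ... but are read back per-id in (id, timestamp) order:
SELECT data1 FROM series WHERE id = 42 ORDER BY ts;
-- If the index carried visibility info, this query could be answered
-- from a few index pages without touching the heap at all.
```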
Re: [HACKERS] Using backslash in query
- Original Message - From: Hannu Krosing [EMAIL PROTECTED] Regarding the dollar-quoting discussions - Will we be able to write the above query as SELECT smth. FROM tbl WHERE WHERE situation LIKE $$%\\%$$; Yes, as I understand it (as long as you remove one of the WHEREs :-) ). If not we need some rethinking. Think of this as our equivalent of XML's CDATA quoting - you don't need to escape < or & inside a CDATA node. in 7.4 or is \ still special there ? I don't believe so. cheers andrew
Re: [HACKERS] Using backslash in query
but this will be in 7.5, not 7.4, won't it? Andrew Dunstan wrote: - Original Message - From: Hannu Krosing [EMAIL PROTECTED] Regarding the dollar-quoting discussions - Will we be able to write the above query as SELECT smth. FROM tbl WHERE WHERE situation LIKE $$%\\%$$; Yes, as I understand it (as long as you remove one of the WHEREs :-) ). If not we need some rethinking. Think of this as our equivalent of XML's CDATA quoting - you don't need to escape < or & inside a CDATA node. in 7.4 or is \ still special there ? I don't believe so.
Re: [HACKERS] Beta4 Tag'd and Bundled ...
Hi, Many of the regression tests are failing on my OSX 10.2.6 machine. I have put the regression.diffs file here http://bugs.sghms.ac.uk/downloads/regression.diffs Has this been seen before? Thanks adam Check her over and let me know if there are any problems ... will do a full general announce tomorrow for it ...
Re: [HACKERS] Dreaming About Redesigning SQL
The world rejoiced as [EMAIL PROTECTED] (Mike Mascari) wrote: It's a very provocative read. At a minimum, one can learn what to avoid with SQL. The language looks neat on paper. Perhaps one day someone will provide an open source implementation. One could envision a D project along the same lines as the same sort of project that added SQL to Postgres... I think you summed it up nicely. The manifesto is a provocative, if painful, read. It is very useful at pointing out pointy edges of SQL that might be wise to avoid. I'm not thrilled with the language; I think they have made a mistake in trying to make it too abbreviation-oriented. They keep operator names short, to a fault. As you say, the most likely way for a D to emerge in a popular way would be by someone adding the language to an existing database system. There is a project out on SourceForge for a D implementation, called Duro. It takes the opposite approach; the operators are all defined as C functions, so you write all your code in C. It uses a data store built atop Berkeley DB. I think an implementor would be better off using an SQL database underneath, and using their code layer in between to accomplish the divorce from the aspects of SQL that they disapprove of. Sort of like MaVerick, a Pick implementation in Java that uses a DBMS such as PostgreSQL as the underlying data store. You do a proof of concept by building something that translates D requests to SQL requests. And _then_ get a project going to put a D parser in as an alternative to the SQL parser. (Yes, that oversimplifies matters. Tough...) -- let name=cbbrowne and tld=ntlug.org in name ^ @ ^ tld;; http://www3.sympatico.ca/cbbrowne/rdbms.html Rules of the Evil Overlord #81. If I am fighting with the hero atop a moving platform, have disarmed him, and am about to finish him off and he glances behind me and drops flat, I too will drop flat instead of quizzically turning around to find out what he saw. 
http://www.eviloverlord.com/
Re: [HACKERS] Thoughts on maintaining 7.3
Bruce Momjian [EMAIL PROTECTED] writes: Do we move empty index pages to the end before truncation during vacuum full? No. You'd be better off using REINDEX for that, I think. IIRC we have speculated about making VAC FULL fix the indexes via REINDEX rather than indexbulkdelete. regards, tom lane
Re: [HACKERS] Using backslash in query
Hannu Krosing [EMAIL PROTECTED] writes: Regarding the dollar-quoting discussions - Will we be able to write the above query as SELECT smth. FROM tbl WHERE WHERE situation LIKE $$%\\%$$; in 7.4 or is \ still special there ? We were discussing that for 7.5 not 7.4. But yeah, it would work that way because \ won't be treated specially in dollar-quoted literals. regards, tom lane
Re: [HACKERS] Beta4 Tag'd and Bundled ...
Adam Witney [EMAIL PROTECTED] writes: Many of the regression tests are failing on my OSX 10.2.6 machine. I have put the regression.diffs file here http://bugs.sghms.ac.uk/downloads/regression.diffs Seems to be all "psql: FATAL: sorry, too many clients already" What did initdb set your max_connections to? regards, tom lane
Re: [HACKERS] Beta4 Tag'd and Bundled ...
On 4/10/03 5:16 pm, Tom Lane [EMAIL PROTECTED] wrote: Adam Witney [EMAIL PROTECTED] writes: Many of the regression tests are failing on my OSX 10.2.6 machine. I have put the regression.diffs file here http://bugs.sghms.ac.uk/downloads/regression.diffs Seems to be all ! psql: FATAL: sorry, too many clients already What did initdb set your max_connections to? regards, tom lane From src/test/regress/log/initdb.log selecting default max_connections... 10 Is this the info you are asking for? adam
Re: [HACKERS] timestamp.c is broken (probably by ecpg) in 7.4
On Fri, Oct 03, 2003 at 08:57:49 -0500, Bruno Wolff III [EMAIL PROTECTED] wrote: I get the error message below when trying to 'make' current cvs: gcc -O2 -Wall -Wmissing-prototypes -Wmissing-declarations -fpic -I../../../../src/interfaces/ecpg/include -I../../../../src/include/utils -I../../../../src/include -D_GNU_SOURCE -g -c timestamp.c -o timestamp.o In file included from ../../../../src/interfaces/ecpg/include/compatlib.h:7, from ../../../../src/interfaces/ecpg/include/datetime.h:4, from timestamp.c:14: ../../../../src/interfaces/ecpg/include/ecpglib.h:9: libpq-fe.h: No such file or directory make[4]: *** [timestamp.o] Error 1 My config is: ./configure --prefix=/usr/local/pgsql --enable-integer-datetimes --with-openssl --with-pgport=5433 This is on a redhat 6.1 system. It looks like this has been fixed in the current cvs.
Re: [HACKERS] Beta4 Tag'd and Bundled ...
Adam Witney [EMAIL PROTECTED] writes: On 4/10/03 5:16 pm, Tom Lane [EMAIL PROTECTED] wrote: What did initdb set your max_connections to? From src/test/regress/log/initdb.log selecting default max_connections... 10 Hm. The parallel regression tests require at least 20. I deliberately allowed initdb to select values as small as 10 on the theory that installing and not being able to run the parallel regression tests is better than not installing at all. Does anyone want to argue the opposite? Perhaps we just need a note in the documentation --- but if so, where? regards, tom lane
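For anyone hitting this, the initdb-chosen values can of course be raised by hand afterwards, shared memory limits permitting; a config fragment (values illustrative, matching the numbers discussed in this thread):

```
# postgresql.conf -- shared_buffers is counted in 8KB buffer pages
# in the 7.x series
max_connections = 20
shared_buffers = 400
```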
Re: [HACKERS] Beta4 Tag'd and Bundled ...
Hm. The parallel regression tests require at least 20. I deliberately allowed initdb to select values as small as 10 on the theory that installing and not being able to run the parallel regression tests is better than not installing at all. Does anyone want to argue the opposite? Perhaps we just need a note in the documentation --- but if so, where? Another alternative is to have the regression suite discover the max connections, and defer tests when there are (max_connections - 1) connections already.
Re: [HACKERS] Beta4 Tag'd and Bundled ...
Rod Taylor [EMAIL PROTECTED] writes: Hm. The parallel regression tests require at least 20. I deliberately allowed initdb to select values as small as 10 on the theory that installing and not being able to run the parallel regression tests is better than not installing at all. Another alternative is to have the regression suite discover the max connections, and defer tests when there are (max_connections - 1) connections already. Maybe. After thinking a bit more, I seem to recall one of the reasons for having wide parallel sets in the regression tests is that we *wanted* to consider inability to support a dozen or two connections as a serious problem. If we still believe that old logic, then indeed the right thing to do is for initdb to insist on setting max_connections no smaller than 20. (Pre-7.4, the default setting was generally 32, so this is still more flexible than before from a portability standpoint.) regards, tom lane
Re: [HACKERS] pg_dump bug in 7.4
The following is still a problem in current cvs (as of 2 hours ago). Normally I wouldn't bug people about this again this soon, but with talk of a release candidate next week I wanted to make sure that it wasn't forgotten. On Sun, Sep 28, 2003 at 20:14:03 -0500, Bruno Wolff III [EMAIL PROTECTED] wrote: If you have a check constraint that tests if a boolean column is not false by just using the column name, pg_dump doesn't include parens around the check constraint which causes a syntax error when reloading the database. Using the following to create a table: create table test (col1 boolean constraint test check (col1)); pg_dump -c produced the following: -- -- PostgreSQL database dump -- SET SESSION AUTHORIZATION 'postgres'; SET SESSION AUTHORIZATION 'bruno'; SET search_path = public, pg_catalog; DROP TABLE public.test; SET SESSION AUTHORIZATION 'postgres'; -- -- TOC entry 3 (OID 2200) -- Name: public; Type: ACL; Schema: -; Owner: postgres -- REVOKE ALL ON SCHEMA public FROM PUBLIC; GRANT ALL ON SCHEMA public TO PUBLIC; SET SESSION AUTHORIZATION 'bruno'; -- -- TOC entry 4 (OID 605016) -- Name: test; Type: TABLE; Schema: public; Owner: bruno -- CREATE TABLE test ( col1 boolean, CONSTRAINT test CHECK col1 ); -- -- Data for TOC entry 5 (OID 605016) -- Name: test; Type: TABLE DATA; Schema: public; Owner: bruno -- COPY test (col1) FROM stdin; \. SET SESSION AUTHORIZATION 'postgres'; -- -- TOC entry 2 (OID 2200) -- Name: SCHEMA public; Type: COMMENT; Schema: -; Owner: postgres -- COMMENT ON SCHEMA public IS 'Standard public schema';
Re: [HACKERS] pg_dump bug in 7.4
When grilled further on (Sat, 4 Oct 2003 12:50:27 -0500), Bruno Wolff III [EMAIL PROTECTED] confessed: The following is still a problem in current cvs (as of 2 hours ago). Normally I wouldn't bug people about this again this soon, but with talk of a release candidate next week I wanted to make sure that it wasn't forgotten. I just hit the same problem (with 7.4b4). CREATE TABLE processing ( index integer NOT NULL, time timestamp with time zone DEFAULT now() NOT NULL, archname_index integer, CONSTRAINT archname_index CHECK NULL::boolean ); Cheers, Rob -- 11:49:30 up 64 days, 4:19, 4 users, load average: 4.37, 3.83, 3.53
Re: [HACKERS] hr translations
Darko Prenosil writes: I see that hr translation for libpq is applied, but not that one for backend that I send few days ago. Is there any problem with it ? It might have gotten caught in my mail filters. I've taken it from the archives and applied it now. -- Peter Eisentraut [EMAIL PROTECTED]
Re: [HACKERS] pg_dump bug in 7.4
I have added this to the 7.4 open items list: Fix pg_dump of CREATE TABLE test (col1 BOOLEAN CONSTRAINT test CHECK (col1)) --- Bruno Wolff III wrote: The following is still a problem in current cvs (as of 2 hours ago). Normally I wouldn't bug people about this again this soon, but with talk of a release candidate next week I wanted to make sure that it wasn't forgotten. On Sun, Sep 28, 2003 at 20:14:03 -0500, Bruno Wolff III [EMAIL PROTECTED] wrote: If you have a check constraint that tests if a boolean column is not false by just using the column name, pg_dump doesn't include parens around the check constraint which causes a syntax error when reloading the database. Using the following to create a table: create table test (col1 boolean constraint test check (col1)); pg_dump -c produced the following: -- -- PostgreSQL database dump -- SET SESSION AUTHORIZATION 'postgres'; SET SESSION AUTHORIZATION 'bruno'; SET search_path = public, pg_catalog; DROP TABLE public.test; SET SESSION AUTHORIZATION 'postgres'; -- -- TOC entry 3 (OID 2200) -- Name: public; Type: ACL; Schema: -; Owner: postgres -- REVOKE ALL ON SCHEMA public FROM PUBLIC; GRANT ALL ON SCHEMA public TO PUBLIC; SET SESSION AUTHORIZATION 'bruno'; -- -- TOC entry 4 (OID 605016) -- Name: test; Type: TABLE; Schema: public; Owner: bruno -- CREATE TABLE test ( col1 boolean, CONSTRAINT test CHECK col1 ); -- -- Data for TOC entry 5 (OID 605016) -- Name: test; Type: TABLE DATA; Schema: public; Owner: bruno -- COPY test (col1) FROM stdin; \. 
SET SESSION AUTHORIZATION 'postgres'; -- -- TOC entry 2 (OID 2200) -- Name: SCHEMA public; Type: COMMENT; Schema: -; Owner: postgres -- COMMENT ON SCHEMA public IS 'Standard public schema'; -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073
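For reference, the difference between the broken dump and one that reloads cleanly is just the parentheses the CHECK grammar requires:

```sql
-- Broken pg_dump output (syntax error on reload):
--   CONSTRAINT test CHECK col1

-- What pg_dump needs to emit instead:
CREATE TABLE test (
    col1 boolean,
    CONSTRAINT test CHECK (col1)
);
```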
Re: [HACKERS] pg_dump bug in 7.4
Bruno Wolff III [EMAIL PROTECTED] writes: The following is still a problem in current cvs (as of 2 hours ago). Not any more ;-) regards, tom lane
Re: [HACKERS] pg_dump bug in 7.4
Bruce Momjian [EMAIL PROTECTED] writes: I have added this to the 7.4 open items list: Fix pg_dump of CREATE TABLE test (col1 BOOLEAN CONSTRAINT test CHECK (col1)) Fixed now. regards, tom lane
max_connections/shared_buffers (was Re: [HACKERS] Beta4 Tag'd and Bundled ...)
I said: Hm. The parallel regression tests require at least 20. I deliberately allowed initdb to select values as small as 10 on the theory that installing and not being able to run the parallel regression tests is better than not installing at all. Actually, after trying to reproduce the problem on my own OS X machine, I realize that it's a little more subtle than I thought. The Darwin port does not use SysV semaphores at all (it uses Posix semaphores) and so the resource restriction you're hitting must actually be the max-shared-memory limit, rather than number-of-semaphores as I first assumed. max_connections does have an impact on shared memory size, though not as large as shared_buffers has. Therefore, the real problem is that initdb initially probes for a workable number of shared_buffers while using max_connections=5, and then it selects max_connections while holding shared_buffers constant. I was thinking that a small max_connections would prevent us from mistakenly choosing tiny shared_buffers when the system's real restriction is on number of semaphores. But what we seem to have here is that the selected number of buffers was just a little under the system's max-shared-memory limit, so that max_connections could be raised to 10 but not to 20. (BTW, on my OS X machine, with out-of-the-box configuration, initdb selects shared_buffers 400 and max_connections 20. I'm guessing that you had either a nondefault shared memory limit, or some other process using shared memory.) Perhaps we should avoid all attempts at cuteness and just run the initial probes for workable shared_buffers with max_connections=20, as well as making that be the minimum max_connections value probed for. Anyone see a better way? regards, tom lane
Re: [HACKERS] Index/Function organized table layout
On 10/2/03 11:34 PM, Hannu Krosing [EMAIL PROTECTED] wrote: James Rogers wrote on Thu, 02.10.2003 at 23:44: Not exactly. What you are describing is more akin to partitioning or hash-organized tables i.e. sorting insert/update tuples to various pages according to some hash function. What I actually thought I was describing is how CLUSTER should work in a postgres flavour of MVCC storage ;). Not the CLUSTER command, but the whole feature. Yeah, I can see this. Clustering doesn't really imply ordering, just tuple proximity in the heap. Loosely implemented by default (i.e. only grouping a tuple if it is cheap to do so at insert/update time) on a table's primary key might give measurable query performance improvement in the typical case without impacting the average performance of queries that use other indexes. Even if the tuples were not ordered in the heap, tuple proximity in the heap would solve a significant percentage of the performance issue that index-organized tables solve i.e. greatly improved cache efficiency. AFAICS we could resolve this problem (querying indexes only) by keeping a copy of visibility info (tmin,tmax,...) in index tuples. This would make index updates bigger and thus slower, so this should be optional. It wouldn't be a feature you would want to use anyway if actually indexing a table. But yes, adding the necessary support to make an index page masquerade as a proper heap page would be most of the effort. That and making the SQL execution efficiently make use of the fact that the index and the table are the same relation with two different interfaces, such that you don't have to modify or access both when using either one. If you then put all fields in the primary key, then the main table could be dropped. If there is no data table then no other indexes would be allowed, or they must be double-indexes referencing the primary key, not the tuple, and thus even bigger ... Putting an index on an index is pretty horrible.
But quite frankly, if you actually require a second index on a table, then B-Tree organized tables are not what you need. But if we had clustered the table on (id, timestamp), then the data would be in the right order for queries, if cluster worked well. Right, and it doesn't even have to be in perfect index order really, as what I'm really trying to do is significantly decrease the amount of buffers required for a typical query. Having to order the tuples later is a small matter. Yeah, index-organized tables seem an exact fit for your problem, but then my abstract idea of what clustering should do is exactly that - keep tuples in roughly the same order as an index ;) It would be a good approximation of what index-organizing aims to accomplish. So what really is needed is a smart tuple-placer which can keep tuples that are close (as defined by an index) together in a small number of pages. These pages themselves need not be contiguous, they can be sprinkled around in the whole data table, but they need to stay clusters of index-close tuples. More or less, yes. Cheers, -James Rogers [EMAIL PROTECTED]
Re: [HACKERS] initdb
Andrew Dunstan [EMAIL PROTECTED] writes: I now have a C implementation of initdb, ... There's a little work still to go (see below), but I'd appreciate some eyeballs on it to see if I have made any major booboos, or could have done things better. What's the best way to proceed? Put it on a web page and post the link, or if you don't have a website, send it to pgsql-patches (clearly marked as not ready to apply). regards, tom lane
COUNT(*) again (was Re: [HACKERS] Index/Function organized table layout)
Hannu Krosing [EMAIL PROTECTED] writes: Christopher Browne wrote on Fri, 03.10.2003 at 00:57: A while back I outlined how this would have to be done, and for it to be done efficiently, it would be anything BUT simple. Could this be made a TODO item, perhaps with your attack plan. If I recall that discussion correctly, no one including Christopher thought the attack plan was actually reasonable. What this keeps coming down to is that an optimization that helps only COUNT(*)-of-one-table-with-no-WHERE-clause would be too expensive in development and maintenance effort to justify its existence. At least if you insist on an exact, MVCC-correct answer. So far as I've seen, the actual use cases for unqualified COUNT(*) could be handled equally well by an approximate answer. What we should be doing rather than wasting large amounts of time trying to devise exact solutions is telling people to look at pg_class.reltuples for approximate answers. We could also be looking at beefing up support for that approach --- maybe provide some syntactic sugar for the lookup, maybe see if we can update reltuples in more places than we do now, make sure that the autovacuum daemon includes keeping reltuples accurate as one of its design goals, etc etc. regards, tom lane
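The pg_class.reltuples lookup Tom suggests looks like this (the table name is a placeholder):

```sql
-- Approximate row count from the planner's statistics instead of a
-- full COUNT(*); only as fresh as the last VACUUM or ANALYZE:
SELECT reltuples::bigint AS approx_rows
FROM pg_class
WHERE relname = 'some_table';
```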
Re: [PERFORM] COUNT(*) again (was Re: [HACKERS] Index/Function
Tom Lane wrote on Sat, 04.10.2003 at 19:07: Hannu Krosing [EMAIL PROTECTED] writes: Christopher Browne wrote on Fri, 03.10.2003 at 00:57: A while back I outlined how this would have to be done, and for it to be done efficiently, it would be anything BUT simple. Could this be made a TODO item, perhaps with your attack plan. If I recall that discussion correctly, no one including Christopher thought the attack plan was actually reasonable. What this keeps coming down to is that an optimization that helps only COUNT(*)-of-one-table-with-no-WHERE-clause would be too expensive in development and maintenance effort to justify its existence. Please read further in my email ;) The point I was trying to make was that faster count(*) is just a side effect. If we could (conditionally) keep visibility info in indexes, then this would also solve the much more tricky problem of index-structured tables. Count(*) is *not* the only query that could benefit from not needing to go to the actual data table for visibility info. The much more needed case would be the inverse time-series type of queries, which would otherwise thrash cache pages badly. Hannu
Re: max_connections/shared_buffers (was Re: [HACKERS] Beta4 Tag'd and Bundled ...)
Tom Lane [EMAIL PROTECTED] writes: (BTW, on my OS X machine, with out-of-the-box configuration, initdb selects shared_buffers 400 and max_connections 20. I'm guessing that you had either a nondefault shared memory limit, or some other process using shared memory.) This points out another issue with this approach of probing for the maximum shared memory. There might be another program using shared memory when the probe is done -- or worse when the database is started but *not* when the probe is run. Perhaps the shared_buffers should only be set to 50% of the maximum size probed? That would a) give postgres a decent chance of starting even if some other program uses some amount of the shared memory between initdb and database starting. and b) leave enough memory for a reasonable max_connections? Anyone see a better way? Switch everything to mmap and pthreads and dump all this antiquated SysV IPC and semaphore junk? *DUCK* -- greg
Re: [HACKERS] Index/Function organized table layout
James Rogers [EMAIL PROTECTED] writes: On 10/2/03 11:34 PM, Hannu Krosing [EMAIL PROTECTED] wrote: What I actually thought I was describing is how CLUSTER should work in a postgres flavour of MVCC storage ;). Not the CLUSTER command, but the whole feature. Yeah, I can see this. Clustering doesn't really imply ordering, just tuple proximity in the heap. So I'm a bit confused about the term Clustering. It seems Postgres uses it to mean simply ordering the tuple storage within an otherwise normal table. However in other databases it seems to mean something more complex. I've never used it in Oracle, but from what I read it seems Oracle thinks clustering means storing the tuples for one table in the heap for *another* table entirely. The typical usage scenario envisioned is to store the tuples for child records near the parent record in a related table. Ie, say you have a users table with an ACL list, each user has possibly many ACL entries on the ACL list. You could cluster the ACL entry list onto the users table so that the entries for a given user would be stored *in the users table heap* near the record for that user. I've never seen anyone use this feature, and I never seriously considered it myself. It sort of has the feel of an antiquated feature that traded too much flexibility and abstraction for raw performance on very slow disk hardware. However I wonder if the nested tables feature doesn't use it under the hood though. It seems they would both be useful for the same types of tables. I'm not sure what this means for Postgres. I'm not sure if Postgres should use a different name to avoid confusion and possibly to leave room in the future for the possibility of supporting something like this. Or perhaps something like this would be useful for Postgres now or in the near future? Or perhaps the consensus is as I said, that this is an old idea that no longer gets any respect and postgres should just pretend it doesn't exist? 
-- greg ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: max_connections/shared_buffers (was Re: [HACKERS] Beta4 Tag'd and Bundled ...)
Greg Stark [EMAIL PROTECTED] writes:
> Perhaps the shared_buffers should only be set to 50% of the maximum
> size probed?

I think it's reasonable to expect the DBA to make any adjustments needed for changes in environment. Partly this is because I don't see any defensible way to do otherwise --- your 50% slop figure is without foundation in terms of what might really be going on --- and partly because we'd be handicapping ourselves unnecessarily if there *aren't* any subsequent changes in environment.

On machines where shared memory actually gets used for anything by default, I think that the default limits are likely to be fairly sane. If shared memory is tight, then very likely Postgres is the only thing on the machine that's going to want it. We might as well use what we can get.

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] Index/Function organized table layout
Greg Stark [EMAIL PROTECTED] writes:
> So I'm a bit confused about the term Clustering. It seems Postgres uses
> it to mean simply ordering the tuple storage within an otherwise normal
> table. However in other databases it seems to mean something more
> complex.

My take is that clustering means not only placing related tuples near each other, but taking steps to preserve that organization over time. PG is really misusing the term, because our current CLUSTER command does the first but not the second. If tuple insertions were to try to preserve the heap ordering induced by the latest CLUSTER command, then I think we'd have something closer to what is usually meant by the term.

> I've never used it in Oracle, but from what I read it seems Oracle
> thinks clustering means storing the tuples for one table in the heap
> for *another* table entirely.

I think that's an implementation detail rather than the fundamental meaning.

regards, tom lane

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings
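To make the distinction concrete, here is the one-shot behavior of the current CLUSTER command, using the syntax of this era (table, index, and values are invented for illustration):

```sql
-- One-time physical reordering of the heap by the index's order:
CLUSTER orders_customer_idx ON orders;

-- Subsequent inserts make no attempt to preserve that ordering, so
-- the table gradually loses its clustered organization...
INSERT INTO orders VALUES (1001, 42, '2003-10-05');

-- ...until the command is run again, re-sorting from scratch:
CLUSTER orders_customer_idx ON orders;
```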
Re: max_connections/shared_buffers (was Re: [HACKERS] Beta4 Tag'd
> Anyone see a better way? Switch everything to mmap and pthreads and
> dump all this antiquated SysV IPC and semaphore junk? *DUCK*

You are a brave soul. I salute you.

Sincerely, Joshua D. Drake

-- Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC - S/JDBC
PostgreSQL support, programming, shared hosting and dedicated hosting.
+1-503-222-2783 - [EMAIL PROTECTED] - http://www.commandprompt.com
PostgreSQL.Org - Editor-N-Chief - http://www.postgresql.org

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [PERFORM] COUNT(*) again (was Re: [HACKERS] Index/Function organized table layout)
Hannu Krosing [EMAIL PROTECTED] writes:
> The point I was trying to make was that faster count(*)'s is just a
> side effect. If we could (conditionally) keep visibility info in
> indexes,

I think that's not happening, conditionally or otherwise. The atomicity problems alone are sufficient reason why not, even before you look at the performance issues.

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend
[HACKERS] Uses for Index/Function organizing
On 10/4/03 2:00 AM, Hannu Krosing [EMAIL PROTECTED] wrote:
> If the WHERE clause could use the same index (or any index with
> visibility info) then there would be no need for walking through the
> tuples in the data relation. The typical use case cited on [HACKERS]
> was time-series data, where inserts are roughly in (timestamp, id)
> order but queries are in (id, timestamp) order. Now if the index
> included all relevant fields (id, timestamp, data1, data2, ..., dataN)
> then the query could run on the index only, touching just a few pages
> and thus vastly improving performance. I agree that this is not
> something everybody needs, but when it is needed it is needed badly.

I would add that automatically index-organizing tuples isn't just useful for time-series data (though that is a good example); it can be used to substantially improve the query performance of any really large table in a number of different and not always direct ways. Once working sets routinely exceed the size of physical RAM, buffer access/utilization efficiency often becomes the center of performance tuning, though not one that many people know much about.

One of the less direct ways of using btree-organized tables to improve scalability is to materialize indexes of tables that *shouldn't* be btree-organized. Not only can you turn tables into indexes; you can also turn indexes into tables, which can have advantages in some cases.

For example, I did some scalability consulting at a well-known movie rental company with some very large Oracle databases running on big Sun boxen. One of the biggest problems was that their rental history table, which had a detailed record of every movie ever rented by every customer, had grown so large that performance was getting painfully slow. To make matters worse, it and a few related tables had high concurrent usage: a mixture of many performance-sensitive queries grabbing windows of a customer's history, plus a few broader OLAP queries which were not time-sensitive.
Everything was technically optimized in a relational and basic configuration sense, and the database guys at the company were at a loss as to how to fix the problem. Performance of all queries was essentially bound by how fast pages could be moved between disk and buffers.

Issue #1: The history rows had quite a lot of columns, and the OLAP processes used non-primary indexes, so the table was not particularly suitable for btree-organizing.

Issue #2: Partitioning was not an option because it would have exceeded certain limits in Oracle (at that time; I don't know if that has changed).

Issue #3: Although customer histories were being constantly queried, the data needed most was really an index view of the customer's history, not the details of the history itself.

The solution I came up with was a synced, btree-organized partial clone of the main history table that contained only the small number of key columns that mattered for generating customer history indexes in the applications that used them. While this substantially increased the disk-space footprint for the same data (since we were cloning it), it greatly reduced the total number of cache misses for the typical query, fetching the full history row pages only when actually needed. In other words, it concentrated more buffer traffic into a smaller number of page buffers.

What we had was an exceedingly active but relatively compact materialized index of the history table that could essentially stay resident in RAM, plus a much less active history table and indexes that, while less likely to be buffered than before, had pages accessed at such a reduced frequency that there was a huge net performance gain, because disk access plummeted. Average performance improvement for the time-sensitive queries: 50-70x.

So btree-organized tables can do more than make tables behave like indexes. They can also make indexes behave like tables.
Both are very useful in some cases where your working set exceeds the physical buffer space. For smaller databases this has much less utility, and users need to understand the limitations; nonetheless, when tables and databases get really big, it becomes an important tool in the tool belt.

Cheers,

-James Rogers [EMAIL PROTECTED]

---(end of broadcast)---
TIP 8: explain analyze is your friend
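A rough sketch of the technique James describes, in generic SQL. All names here are hypothetical; the original used Oracle index-organized tables, and the sync mechanism is simplified to a PostgreSQL-style insert trigger:

```sql
-- Skinny clone carrying only the key columns the hot queries need;
-- with a covering primary key it behaves like a materialized index.
CREATE TABLE rental_history_idx (
    customer_id INTEGER,
    rented_at   TIMESTAMP,
    movie_id    INTEGER,
    PRIMARY KEY (customer_id, rented_at, movie_id)
);  -- in Oracle, this table would be declared ORGANIZATION INDEX

-- Keep the clone synced with the full table on insert.
CREATE FUNCTION sync_history_idx() RETURNS trigger AS '
BEGIN
    INSERT INTO rental_history_idx
        VALUES (NEW.customer_id, NEW.rented_at, NEW.movie_id);
    RETURN NEW;
END;
' LANGUAGE plpgsql;

CREATE TRIGGER rental_history_sync
    AFTER INSERT ON rental_history
    FOR EACH ROW EXECUTE PROCEDURE sync_history_idx();

-- Time-sensitive queries touch only the compact, RAM-resident clone;
-- full rows are fetched from rental_history only when the details
-- are actually needed.
SELECT rented_at, movie_id
  FROM rental_history_idx
 WHERE customer_id = 42
 ORDER BY rented_at DESC;
```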
Re: [pgsql-hackers-win32] [HACKERS] initdb
Excellent idea. Here's the URL: http://www.dunslane.net/~andrew/Initdb_In_C.html

cheers

andrew

----- Original Message -----
From: Tom Lane [EMAIL PROTECTED]

> Andrew Dunstan [EMAIL PROTECTED] writes:
> > I now have a C implementation of initdb, ... There's a little work
> > still to go (see below), but I'd appreciate some eyeballs on it to
> > see if I have made any major booboos, or could have done things
> > better. What's the best way to proceed?
>
> Put it on a web page and post the link, or if you don't have a website,
> send it to pgsql-patches (clearly marked as not ready to apply).

---(end of broadcast)---
TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Thoughts on maintaining 7.3
On Sat, Oct 04, 2003 at 11:41:17AM -0400, Tom Lane wrote:
> Bruce Momjian [EMAIL PROTECTED] writes:
> > Do we move empty index pages to the end before truncation during
> > vacuum full?
>
> No. You'd be better off using REINDEX for that, I think. IIRC we have
> speculated about making VAC FULL fix the indexes via REINDEX rather
> than indexbulkdelete.

I can't agree with that idea. Imagine having to VACUUM FULL a huge table: not only will it take all the time required to do the VACUUM on the heap itself, it will also have to rebuild all the indexes from scratch. I think there are scenarios where the REINDEX will be much worse, say when there are not too many deleted tuples (but in that case, why is the user doing VACUUM FULL in the first place?). Of course there are also scenarios where the opposite is true. I wonder if VACUUM FULL could choose which method to use based on some statistics.

-- Alvaro Herrera (alvherre[a]dcc.uchile.cl)
"Vivir y dejar de vivir son soluciones imaginarias. La existencia está en otra parte." ("Living and ceasing to live are imaginary solutions; existence is elsewhere.") (André Breton)

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] Thoughts on maintaining 7.3
Tom Lane wrote:
> Bruce Momjian [EMAIL PROTECTED] writes:
> > Do we move empty index pages to the end before truncation during
> > vacuum full?
>
> No. You'd be better off using REINDEX for that, I think. IIRC we have
> speculated about making VAC FULL fix the indexes via REINDEX rather
> than indexbulkdelete.

I guess my point is that if you forget to run regular vacuum for a month, then realize the problem, you can just do a VACUUM FULL and the heap is back to a perfect state, as if you had been running regular vacuum all along. That is not true of indexes. It would be nice if it were.

--
  Bruce Momjian                     |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                 |  (610) 359-1001
  +  If your life is a hard drive,  |  13 Roberts Road
  +  Christ can be your backup.     |  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] Thoughts on maintaining 7.3
On Sat, Oct 04, 2003 at 11:17:09PM -0400, Bruce Momjian wrote:
> Tom Lane wrote:
> > Bruce Momjian [EMAIL PROTECTED] writes:
> > > Do we move empty index pages to the end before truncation during
> > > vacuum full?
> >
> > No. You'd be better off using REINDEX for that, I think. IIRC we have
> > speculated about making VAC FULL fix the indexes via REINDEX rather
> > than indexbulkdelete.
>
> I guess my point is that if you forget to run regular vacuum for a
> month, then realize the problem, you can just do a VACUUM FULL and the
> heap is back to a perfect state as if you had been running regular
> vacuum all along. That is not true of indexes. It would be nice if it
> would.

In this scenario, the VACUUM FULL-does-REINDEX idea would be the perfect fit, because it will probably be much faster than doing indexbulkdelete.

-- Alvaro Herrera (alvherre[a]dcc.uchile.cl)
"Endurecerse, pero jamás perder la ternura." ("Harden yourself, but never lose tenderness.") (E. Guevara)

---(end of broadcast)---
TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Thoughts on maintaining 7.3
Alvaro Herrera [EMAIL PROTECTED] writes:
> On Sat, Oct 04, 2003 at 11:41:17AM -0400, Tom Lane wrote:
> > No. You'd be better off using REINDEX for that, I think. IIRC we have
> > speculated about making VAC FULL fix the indexes via REINDEX rather
> > than indexbulkdelete.
>
> I can't agree with that idea.

Why not? There is plenty of anecdotal evidence in the archives saying that it's faster to drop indexes, VACUUM FULL, and recreate the indexes than to VACUUM FULL with the indexes in place. Most of those reports date from before we had the lazy-vacuum alternative, but I don't think that renders them less relevant.

> Imagine having to VACUUM FULL a huge table. Not only will it take the
> time required to do the VACUUM on the heap itself, it will also have to
> rebuild all indexes from scratch.

A very large chunk of VACUUM FULL's runtime is spent fooling with the indexes. Have you looked at the code in any detail? It goes like this:

1. Scan the heap looking for dead tuples and free space.
2. Make a pass over the indexes to delete index entries for dead tuples.
3. Copy remaining live tuples to lower-numbered pages to compact the heap.
3a. Every time we copy a tuple, make new index entries pointing to its new location. (The old index entries still remain, though.)
4. Commit the transaction so that the new copies of moved tuples are good and the old ones are not.
5. Make a pass over the indexes to delete index entries for the old copies of moved tuples.

When only a few tuples are being moved, this isn't too bad a strategy. But when there are lots, steps 2, 3a, and 5 represent a huge amount of work. What's worse, step 3a swells the index well beyond its final size. This used to mean permanent index bloat. Nowadays step 5 will be able to recover some of that space, but not at zero cost. I think it's entirely plausible that dropping steps 2, 3a, and 5 in favor of an index rebuild at the end could be a winner.
> I think there are scenarios where the REINDEX will be much worse, say
> when there are not too many deleted tuples (but in that case, why is
> the user doing VACUUM FULL in the first place?).

Yeah, I think that's exactly the important point. These days there's not a lot of reason to do VACUUM FULL unless you have a major amount of restructuring to do. I would once have favored maintaining two code paths with two strategies, but now I doubt it's worth the trouble. (Or I should say, we have two code paths, the other being lazy VACUUM --- do we need three?)

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend
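The anecdotal workaround Tom refers to, and the proposed alternative, can be sketched as follows (table and index names are invented for illustration):

```sql
-- Reported as faster than VACUUM FULL with indexes in place:
DROP INDEX bigtable_a_idx;
VACUUM FULL bigtable;
CREATE INDEX bigtable_a_idx ON bigtable (a);

-- Roughly what a VACUUM FULL that invoked REINDEX would amount to:
-- skip steps 2, 3a, and 5 and rebuild the indexes at the end.
VACUUM FULL bigtable;
REINDEX TABLE bigtable;
```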
Re: [HACKERS] Thoughts on maintaining 7.3
Bruce Momjian [EMAIL PROTECTED] writes:
> Tom Lane wrote:
> > No. You'd be better off using REINDEX for that, I think.
>
> I guess my point is that if you forget to run regular vacuum for a
> month, then realize the problem, you can just do a VACUUM FULL and the
> heap is back to a perfect state as if you had been running regular
> vacuum all along. That is not true of indexes. It would be nice if it
> would.

A VACUUM FULL that invoked REINDEX would accomplish that *better* than one that didn't, because of the problem of duplicate entries for moved tuples. See my response just now to Alvaro.

regards, tom lane

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] COUNT(*) again (was Re: [HACKERS] Index/Function organized
Tom Lane wrote:
> Hannu Krosing [EMAIL PROTECTED] writes:
> > The point I was trying to make was that faster count(*)'s is just a
> > side effect. If we could (conditionally) keep visibility info in
> > indexes,
>
> I think that's not happening, conditionally or otherwise. The
> atomicity problems alone are sufficient reason why not, even before
> you look at the performance issues.

What are the atomicity problems of adding a create/expire xid to the index tuples?

--
  Bruce Momjian                     |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                 |  (610) 359-1001
  +  If your life is a hard drive,  |  13 Roberts Road
  +  Christ can be your backup.     |  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 8: explain analyze is your friend
Re: [HACKERS] Thoughts on maintaining 7.3
Tom Lane wrote:
> Bruce Momjian [EMAIL PROTECTED] writes:
> > Tom Lane wrote:
> > > No. You'd be better off using REINDEX for that, I think.
> >
> > I guess my point is that if you forget to run regular vacuum for a
> > month, then realize the problem, you can just do a VACUUM FULL and
> > the heap is back to a perfect state as if you had been running
> > regular vacuum all along. That is not true of indexes. It would be
> > nice if it would.
>
> A VACUUM FULL that invoked REINDEX would accomplish that *better* than
> one that didn't, because of the problem of duplicate entries for moved
> tuples. See my response just now to Alvaro.

Right, REINDEX is closer to what you expect VACUUM FULL to be doing: it mimics the heap result of full compaction. I think Alvaro's point is that if you are doing VACUUM FULL on a large table with only a few expired tuples, the REINDEX could take a while, which would seem strange considering you only have a few expired tuples. Maybe we should reindex only if more than 10% of the heap rows are expired, or the index contains more than 10% empty space, or something like that. Of course, that is very arbitrary, but only VACUUM knows how many rows it is moving; the user typically will not know that.

In an extreme case, with REINDEX done always, I can imagine a site that is doing only VACUUM FULL at night, with no regular vacuums, finding that it can't do VACUUM FULL at night anymore because it is taking too long. By always doing REINDEX, we penalize some folks who are happy doing VACUUM FULL at night because very few tuples are expired.

--
  Bruce Momjian                     |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                 |  (610) 359-1001
  +  If your life is a hard drive,  |  13 Roberts Road
  +  Christ can be your backup.     |  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[HACKERS] Open 7.4 items
P O S T G R E S Q L   7 . 4   O P E N   I T E M S

Current at ftp://momjian.postgresql.org/pub/postgresql/open_items.

Changes
-------
Fix REVOKE ALL ON FUNCTION error when removing owner permissions
Improve speed of building of constraints during restore
What to do about exposing the list of possible SQLSTATE error codes

Documentation Changes
---------------------
Move release notes to SGML
Freeze message strings

--
  Bruce Momjian                     |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                 |  (610) 359-1001
  +  If your life is a hard drive,  |  13 Roberts Road
  +  Christ can be your backup.     |  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Open 7.4 items
On Sun, 5 Oct 2003, Bruce Momjian wrote:
> Improve speed of building of constraints during restore

Did we get consensus on what to do with this: whether we're doing only the superuser option to not check, only speeding up fk constraint checks by using a statement instead of the repeated calls, both, or something else?

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html