Re: [HACKERS] Multi CPU Queries - Feedback and/or suggestions wanted!
On Mon, Oct 20, 2008 at 12:05 PM, Julius Stroffek [EMAIL PROTECTED]wrote: Topics that seem to be of interest and most of them were already discussed at developers meeting in Ottawa are 1.) parallel sorts 2.) parallel query execution 3.) asynchronous I/O 4.) parallel COPY 5.) parallel pg_dump 6.) using threads for parallel processing [...] 2.) Different subtrees (or nodes) of the plan could be executed in parallel on different CPUs and the results of this subtrees could be requested either synchronously or asynchronously. I don't see why multiple CPUs can't work on the same node of a plan. For instance, consider a node involving a scan with an expensive condition, like UTF-8 string length. If you have four CPUs you can bring to bear, each CPU could take every fourth page, computing the expensive condition for each tuple in that page. The results of the scan can be retired asynchronously to the next node above. -jwb
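To make the striping idea concrete, here is a toy sketch (Python, not PostgreSQL source; the page layout, the predicate, and every name below are invented purely for illustration). Each of N workers takes every N-th page, applies the "expensive" condition to each tuple on its pages, and retires its matches independently:

```python
# Toy sketch of intra-node parallelism: N workers stripe over the pages
# of one scan, worker k taking pages k, k+N, k+2N, ...
from concurrent.futures import ThreadPoolExecutor

PAGES = [list(range(p * 10, (p + 1) * 10)) for p in range(8)]  # 8 fake pages

def scan_stripe(worker, nworkers):
    # Apply the stand-in "expensive predicate" (here: divisibility by 3,
    # playing the role of e.g. UTF-8 string length) to every tuple on
    # this worker's stripe of pages.
    return [t for pageno in range(worker, len(PAGES), nworkers)
            for t in PAGES[pageno] if t % 3 == 0]

def parallel_scan(nworkers=4):
    with ThreadPoolExecutor(max_workers=nworkers) as pool:
        parts = pool.map(scan_stripe, range(nworkers), [nworkers] * nworkers)
    # The node above would consume these results asynchronously as they
    # arrive; here we simply merge them.
    return sorted(t for part in parts for t in part)
```

The striping needs no coordination between workers beyond agreeing on N, which is what makes it attractive for a single plan node.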
Re: [HACKERS] Block-level CRC checks
On Tue, Sep 30, 2008 at 1:41 PM, Bruce Momjian [EMAIL PROTECTED] wrote: Alvaro Herrera wrote: A customer of ours has been having trouble with corrupted data for some time. Of course, we've almost always blamed hardware (and we've seen RAID controllers have their firmware upgraded, among other actions), but the useful thing to know is when corruption has happened, and where. So we've been tasked with adding CRCs to data files. Maybe a stupid question, but what I/O subsystems corrupt data and fail to report it? Practically all of them. Here is a good paper on various checksums, their failure rates, and practical applications. Parity Lost and Parity Regained http://www.usenix.org/event/fast08/tech/full_papers/krioukov/krioukov_html/index.html -jwb
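A minimal sketch of what block-level checksums buy: store a CRC of each page when it is written, verify it on read, and silent corruption anywhere in the page becomes detectable. This is illustrative only — zlib.crc32 merely stands in for whatever checksum the real patch would choose, and the function names are made up:

```python
# Toy model of per-page CRC checking: compute on write, verify on read.
import zlib

PAGE_SIZE = 8192

def write_page(data):
    # In a real system the CRC would be stored in the page header or
    # alongside the block; here we just return it with the data.
    assert len(data) == PAGE_SIZE
    return data, zlib.crc32(data)

def verify_page(data, stored_crc):
    # A mismatch means the block changed between write and read --
    # exactly the "corruption happened, and where" signal described above.
    return zlib.crc32(data) == stored_crc

page, crc = write_page(b"\x7f" * PAGE_SIZE)
assert verify_page(page, crc)
# A single flipped byte is caught:
assert not verify_page(b"\x00" + page[1:], crc)
```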
Re: [HACKERS] 8.3.1 autovacuum stopped doing anything months ago
On Fri, Sep 19, 2008 at 11:42 AM, Robert Treat [EMAIL PROTECTED] wrote: On Friday 19 September 2008 00:23:34 Jeffrey Baker wrote: Anyway, I have some issues. One, of course, is that the autovacuum should not have been deadlocked or otherwise stalled like that. Perhaps it needs a watchdog of some kind. Has anyone else experienced an issue like that in 8.3.1? The only thing I can see in the release notes that indicates this problem may have been fixed is the following: We have several checks in the check_postgres script which are in this area Are you referring to the nagios plugin? I already use it, and nagios didn't make a peep. Perhaps I should check for a more recent revision. -jwb
[HACKERS] 8.3.1 autovacuum stopped doing anything months ago
I have an 8.3.1 instance on Linux, and since June 29th the autovacuum process has claimed to be working on the same three tables. That's OK; I am a very patient man, and these are very large tables. Today I started to get transaction wraparound warnings, so I went and checked it out. It turns out the autovacuum processes are all just doing nothing: when I strace them, all three are blocked on syscalls. So I restarted the database and ran a vacuum. Of course, once the wraparound warning is reached, there's no way to disable the autovacuum, so now my vacuum maintenance job is competing with three invulnerable autovacuum processes. I am thinking of sending them SIGSTOP. Anyway, I have some issues. One, of course, is that the autovacuum should not have been deadlocked or otherwise stalled like that. Perhaps it needs a watchdog of some kind. Has anyone else experienced an issue like that in 8.3.1? The only thing I can see in the release notes that indicates this problem may have been fixed is the following: "Repair two places where SIGTERM exit of a backend could leave corrupted state in shared memory (Tom)". However, I don't know who or what would have sent SIGTERM to the autovacuum children. Secondly, there really does need to be an autovacuum=off,really,thanks setting so that my maintenance can proceed without competition for I/O resources. Is there any way to make that happen? Is my SIGSTOP idea dangerous? -jwb

[HACKERS] proposal for smaller indexes on index-ordered tables
The way I read it, the current btree index stores the index value and the TID of every tuple having that value. When you have a table with three columns, you index one of them and you get an index which is practically as large as the table itself. Supposing the table is generally or strictly ordered by the column to be indexed, it would be more compact if the index stored ranges of tuples. Instead of storing the TID of every tuple with that value, the index would store a first and last TID, between which all tuples have the value. Example: table with one million rows indexed on a column having one thousand distinct values. Table is in-order by the indexed column. The traditional index would contain a million TIDs, whereas a range index would contain only two thousand. The range index would be 500 times smaller, more likely to be cached, etc. Thoughts? -jwb
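The arithmetic of the proposal can be sketched directly (illustrative Python, not a proposed implementation; build_range_index is a made-up name, and TIDs are modeled as simple row positions). One index entry per run of equal values, holding the first and last TID of the run:

```python
# Toy "range index": for a table ordered by the indexed column, store
# (value, first_tid, last_tid) per run of equal values instead of one
# TID per tuple.
from itertools import groupby

def build_range_index(column):
    """column: the indexed values, listed in table (TID) order."""
    index, tid = [], 0
    for value, run in groupby(column):
        n = sum(1 for _ in run)
        index.append((value, tid, tid + n - 1))  # first and last TID of run
        tid += n
    return index

# One million rows, one thousand distinct values, table in order by the
# column -- the scenario from the example above:
column = [v for v in range(1000) for _ in range(1000)]
idx = build_range_index(column)
print(len(idx))  # → 1000
```

One thousand entries, each holding two TIDs, versus a million TIDs in the traditional index — matching the 500x reduction claimed above.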
Re: [HACKERS] proposal for smaller indexes on index-ordered tables
On Tue, Jun 24, 2008 at 1:59 PM, Zoltan Boszormenyi [EMAIL PROTECTED] wrote: Jeffrey Baker írta: The way I read it, the current btree index stores the index value and the TID of every tuple having that value. When you have a table with three columns, you index one of them and you get an index which is practically as large as the table itself. Supposing the table is generally or strictly ordered by the column to be indexed, it would be more compact if the index stored ranges of tuples. Instead of storing the TID of every tuple with that value, the index would store a first and last TID, between which all tuples have the value. Example: table with one million rows indexed on a column having one thousand distinct values. Table is in-order by the indexed column. The traditional index would contain a million TIDs, whereas a range index would contain only two thousand. The range index would be 500 times smaller, more likely to be cached, etc. Thoughts? -jwb Example with your theory: One (not yet committed) transaction changes one tuple that was in the middle of a range before but the tuple's indexed column changed. What would you do? Insert the new tuple at the end of the table and add another range to the index. Leave the old tuple in place and don't touch the original index range. You need to keep track of multiple index versions: 1. the range has to be split for the not-yet-committed modifier transaction, it might need to re-read the same table. 2. the old range has to be kept for reader transactions that still see the old data This is only true if you update the tuple in-place. Imagine you have thousands of UPDATEs in flight on different rows. I'm quite aware of the problems of maintaining such a table and index, but the fact is that data warehouse type tables may never be updated after being created. The particular application I'm struggling with does a SELECT ... INTO ... ORDER BY to make an ordered table for querying every night. 
The problem is it takes longer, much longer, to create the index than to create the table, and in the end the index is as big as half the table anyway. So this type of index would only be useful for an essentially read-only table. I agree. Quite another proposal would be to somehow instruct the database that the table is strictly in-order by a column and allow a binary search access method. Then you don't need any index at all. -jwb
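The binary-search alternative floated at the end can be sketched like this (toy Python; the rows stand in for heap tuples sorted by their leading column, and range_lookup is an invented name). If the database knew the table was strictly in-order by a column, an equality or range probe needs no index structure at all:

```python
# Binary search directly on an ordered "table": rows are (key, payload)
# tuples sorted by key, so bisect finds any key range in O(log n) with
# no separate index.
import bisect

def range_lookup(rows, lo_key, hi_key):
    """Return all rows with lo_key <= key <= hi_key."""
    lo = bisect.bisect_left(rows, (lo_key,))
    hi = bisect.bisect_right(rows, (hi_key, float("inf")))
    return rows[lo:hi]

rows = [(k, k * 10) for k in range(1000)]  # strictly ordered by key
```

Usage: range_lookup(rows, 10, 12) returns the three rows keyed 10 through 12. The trade is obvious in both directions: zero index build and storage cost, but the access method is only valid while the strict ordering holds, which again limits it to essentially read-only tables.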
Re: [HACKERS] proposal for smaller indexes on index-ordered tables
On Tue, Jun 24, 2008 at 2:38 PM, Tom Lane [EMAIL PROTECTED] wrote: Jeffrey Baker [EMAIL PROTECTED] writes: I'm quite aware of the problems of maintaining such a table and index, but the fact is that data warehouse type tables may never be updated after being created. The particular application I'm struggling with does a SELECT ... INTO ... ORDER BY to make an ordered table for querying every night. The problem is it takes longer, much longer, to create the index than to create the table, and in the end the index is as big as half the table anyway. There's something wrong with that: sorting the table rows surely ought to take longer than sorting the same number of (smaller) index entries. Have you done any profiling to find out what the problem is? Perhaps there's something wrong with the setting of maintenance_work_mem (vs work_mem). For this query, work_mem is 100MB and maintenance_work_mem is 1GB, on a system with 8GB of memory. Notably I just installed a new storage subsystem and upgraded to 8.3.1 less than a week ago, so my experience with this instance is somewhat limited. Creating the table in this case takes half an hour and then indexing it requires almost an hour. Subsequently analyzing the table takes less than a minute, with statistics set to maximum. Query performance is excellent. I was just brainstorming on ways to save time on the creation. -jwb
Re: [HACKERS] proposal for smaller indexes on index-ordered tables
On Tue, Jun 24, 2008 at 3:08 PM, Tom Lane [EMAIL PROTECTED] wrote: Kevin Grittner [EMAIL PROTECTED] writes: Tom Lane [EMAIL PROTECTED] wrote: Now, *why* it is a mistake is interesting to speculate about, but let's confirm the theory first. Could this be related to hint bit rewrites during indexing? If so, changing maintenance_work_mem won't improve the situation. What I personally suspect is that Jeff's index build is swapping like crazy, or else there's just some problem in the sort code for such a large sort arena. But let's get some evidence about how the index build time varies with maintenance_work_mem before jumping to conclusions. Well, it definitely isn't swapping, because the machine doesn't even have a swap area defined. vmstat output during the table creation and the index creation looks quite different: during the table sort there is heavy r/w traffic of 12-20MB/s, while during the index creation the traffic is lower but the build appears CPU-limited (i.e. one CPU is maxed out the whole time, and iowait is not very high). I guess nobody has any interest in my proposal, only in the departure of my described experience from expected behavior :-(
[HACKERS] pg_dump fails to include sequences, leads to restore fail in any version
It is impossible to dump (with pg_dump -Ocx) and restore (with psql) a database which contains sequences in any of 8.1, 8.2, or 8.3: [...]

--
-- Name: transaction_transaction_id_seq; Type: SEQUENCE SET; Schema: mercado; Owner: prod
--

SELECT pg_catalog.setval(pg_catalog.pg_get_serial_sequence('transaction_backup', 'transaction_id'), 6736138, true);

--
-- Name: transaction; Type: TABLE; Schema: mercado; Owner: prod; Tablespace:
--

CREATE TABLE transaction (
    transaction_id integer DEFAULT nextval('transaction_transaction_id_seq'::regclass) NOT NULL,
    buyer_account_id integer,
    seller_account_id integer,
    date date,
    item_id integer,
    source text
);

[...]

2008-06-16 19:26:41 PDT ERROR: relation transaction_transaction_id_seq does not exist

Why? Because pg_dump mysteriously omits all sequences:

think=# \d transaction_transaction_id_seq
Sequence "mercado.transaction_transaction_id_seq"
    Column     |  Type
---------------+---------
 sequence_name | name
 last_value    | bigint
 increment_by  | bigint
 max_value     | bigint
 min_value     | bigint
 cache_value   | bigint
 log_cnt       | bigint
 is_cycled     | boolean
 is_called     | boolean

think=# \ds
                     List of relations
 Schema  |                Name                |   Type   | Owner
---------+------------------------------------+----------+-------
 mercado | account_account_id_seq             | sequence | prod
 mercado | account_stat_account_stat_id_seq   | sequence | prod
 mercado | category_category_id_seq           | sequence | prod
 mercado | category_stat_category_stat_id_seq | sequence | prod
 mercado | country_country_id_seq             | sequence | prod
 mercado | country_stat_country_stat_id_seq   | sequence | prod
 mercado | dict_dict_id_seq                   | sequence | prod
 mercado | expire_icon_expire_icon_id_seq     | sequence | prod
 mercado | expire_time_expire_time_id_seq     | sequence | prod
 mercado | fx_fx_id_seq                       | sequence | prod
 mercado | icon_icon_id_seq                   | sequence | prod
 mercado | item_icon_item_icon_id_seq         | sequence | prod
 mercado | item_item_id_seq                   | sequence | prod
 mercado | item_stat_item_stat_id_seq         | sequence | prod
 mercado | transaction_transaction_id_seq     | sequence | prod
(15 rows)

[EMAIL PROTECTED]:~$ pg_dump -s -n mercado think | grep CREATE\ SEQUENCE
[EMAIL PROTECTED]:~$

Therefore when the restore is attempted, the table using the sequence as its default value cannot be created.
Re: [HACKERS] pg_dump fails to include sequences, leads to restore fail in any version
On Tue, Jun 17, 2008 at 10:59 AM, Tom Lane [EMAIL PROTECTED] wrote: Jeffrey Baker [EMAIL PROTECTED] writes: It is impossible to dump (with pg_dump -Ocx) and restore (with psql) a database which contains sequences in any of 8.1, 8.2, or 8.3: I should think we would have heard about it before now if such a sweeping claim were true. Last time this problem came up, in August, you dismissed it somewhat rudely. So perhaps the lack of reports is due more to perception than any other thing. What I suspect is that you are using 8.1's pg_dump, and you have tripped over one of the corner cases that made us redesign dumping of serial sequences for 8.2. Do you get better results if you dump the problem database with 8.2 or 8.3 pg_dump? What's the corner case exactly? 8.3 dumps it correctly, but that's not really much of a consolation because I need to restore _this_ dump, not some other one. It was necessary for me to recreate all the sequences and set the currvals manually. Can't the fix be backported to 8.1? -jwb
Re: [HACKERS] pg_dump fails to include sequences, leads to restore fail in any version
On Tue, Jun 17, 2008 at 2:43 PM, Alvaro Herrera [EMAIL PROTECTED] wrote: Jeffrey Baker escribió: On Tue, Jun 17, 2008 at 10:59 AM, Tom Lane [EMAIL PROTECTED] wrote: Jeffrey Baker [EMAIL PROTECTED] writes: It is impossible to dump (with pg_dump -Ocx) and restore (with psql) a database which contains sequences in any of 8.1, 8.2, or 8.3: I should think we would have heard about it before now if such a sweeping claim were true. Last time this problem came up, in August, you dismissed it somewhat rudely. So perhaps the lack of reports is due more to perception than any other thing. How did you set it up exactly? I have no problem with this situation: [snip] The table was originally created this way:

CREATE TABLE transaction (
    transaction_id    SERIAL PRIMARY KEY,
    buyer_account_id  INTEGER,
    seller_account_id INTEGER,
    date              DATE,
    item_id           INTEGER,
    source            TEXT
);

However, when dumped with pg_dump 8.1, it comes out this way:

CREATE TABLE transaction (
    transaction_id integer DEFAULT nextval('transaction_transaction_id_seq'::regclass) NOT NULL,
    buyer_account_id integer,
    seller_account_id integer,
    date date,
    item_id integer,
    source text
);

... and the sequence does not get dumped with it. -jwb
Re: [HACKERS] pg_dump fails to include sequences, leads to restore fail in any version
On Tue, Jun 17, 2008 at 6:31 PM, Tom Lane [EMAIL PROTECTED] wrote: Alvaro Herrera [EMAIL PROTECTED] writes: Jeffrey Baker escribió: The table was originally created this way: Okay, but was it created on 8.1 or was it already created on an older version and restored? I don't see this behavior if I create it in 8.1 -- the field is dumped as SERIAL, unlike what you show. There's something interesting in the original report:

SELECT pg_catalog.setval(pg_catalog.pg_get_serial_sequence('transaction_backup',
                                                            ^^^^^^^^^^^^^^^^^^
'transaction_id'), 6736138, true);

So pg_dump found a pg_depend entry linking that sequence to some table named transaction_backup, not transaction. That explains why transaction isn't being dumped using a SERIAL keyword --- it's not linked to this sequence. But how things got this way is not apparent from the stated facts. Hrmm, I think that's a bit of a red herring. I probably should not have pasted that part of the dump, because it's misleading. There really is a table transaction_backup, whose definition is the same as transaction's.
Reading from that part of the dump again, just for clarity:

--
-- Name: transaction_backup; Type: TABLE; Schema: mercado; Owner: prod; Tablespace:
--

CREATE TABLE transaction_backup (
    transaction_id serial NOT NULL,
    buyer_account_id integer,
    seller_account_id integer,
    date date,
    item_id integer,
    source text
);

ALTER TABLE mercado.transaction_backup OWNER TO prod;

--
-- Name: transaction_transaction_id_seq; Type: SEQUENCE SET; Schema: mercado; Owner: prod
--

SELECT pg_catalog.setval(pg_catalog.pg_get_serial_sequence('transaction_backup', 'transaction_id'), 6736139, true);

--
-- Name: transaction; Type: TABLE; Schema: mercado; Owner: prod; Tablespace:
--

CREATE TABLE transaction (
    transaction_id integer DEFAULT nextval('transaction_transaction_id_seq'::regclass) NOT NULL,
    buyer_account_id integer,
    seller_account_id integer,
    date date,
    item_id integer,
    source text
);

ALTER TABLE mercado.transaction OWNER TO prod;

The two tables are defined the same way, but one of them gets dumped with a SERIAL declaration and the other gets dumped with a DEFAULT nextval(). Is it possible that pg_dump became confused if transaction was renamed transaction_backup and then redefined? I can't guarantee that did in fact happen, but it's within the realm of possibility. I don't see the backup table in the SQL source code for this product, so it's likely that it was created by a user in the course of maintenance. One possibility is that Jeffrey is getting bit by this bug or something related: http://archives.postgresql.org/pgsql-bugs/2006-07/msg00021.php I don't think it's that one. All this stuff is in the same schema (and in any case the dump file contains all schemas). There are links to some other known serial-sequence problems in 8.1 in this message: http://archives.postgresql.org/pgsql-hackers/2006-08/msg01250.php That one seems closer to the point. -jwb
[HACKERS] meaning of backend exit code 2?
I have a need to find out the meaning of a backend exiting unexpectedly with exit code 2. Leafing through the source of 8.1 I can't really find it. Is there anything in postgres which would exit with code 2, or should I be looking at libraries and junk dragged in by languages?
Re: [HACKERS] meaning of backend exit code 2?
On Wed, Jun 4, 2008 at 11:31 AM, Tom Lane [EMAIL PROTECTED] wrote: Jeffrey Baker [EMAIL PROTECTED] writes: I have a need to find out the meaning of a backend exiting unexpectedly with exit code 2. Leafing through the source of 8.1 I can't really find it. But are you running 8.1? In 8.2 and up this is the expected result from SIGQUIT. Yes, I'm running 8.1.9. I have a strong suspicion that the exit is coming either from R via pl/R or from perl via pl/perl. From my reading of the code, expected backend exit codes are 1, 0, and -1 mainly? -jwb
Re: [HACKERS] Bug in libpq causes local clients to hang
On Sun, Mar 23, 2008 at 8:35 PM, Tom Lane [EMAIL PROTECTED] wrote: Jeffrey Baker [EMAIL PROTECTED] writes: You'll note that I'm using the DBD::Pg Perl interface. So far I've never seen this happen with TCP connections, only with UNIX sockets. If it works over TCP and not over Unix socket, it's a kernel bug. The libpq code doesn't really know the difference after connection setup. The same thought occurred to me, but it could also be a race condition which the unix socket is fast enough to trigger but the TCP socket is not fast enough to trigger. I'm peeking around in the code but nothing jumps out yet. -jwb - Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bug in libpq causes local clients to hang
On Mon, Mar 24, 2008 at 9:24 AM, Tom Lane [EMAIL PROTECTED] wrote: Jeffrey Baker [EMAIL PROTECTED] writes: On Sun, Mar 23, 2008 at 8:35 PM, Tom Lane [EMAIL PROTECTED] wrote: If it works over TCP and not over Unix socket, it's a kernel bug. The libpq code doesn't really know the difference after connection setup. The same thought occurred to me, but it could also be a race condition which the unix socket is fast enough to trigger but the TCP socket is not fast enough to trigger. I'm peeking around in the code but nothing jumps out yet. Fairly hard to believe given that you're talking about communication between two sequential processes. Anyway I'd suggest that the first thing to do is extract a reproducible test case. It'd be useful to see if it hangs on other platforms... The stack trace doesn't actually make sense, does it? I think that (at least) the PQmblen frame is spurious. -jwb
[HACKERS] Bug in libpq causes local clients to hang
Lately I've noticed that local (UNIX socket) clients using libpq4 8.1.9 (Debian 8.1.9-0etch1) and the same version of the server can hang forever waiting in poll(). The symptom is that the local client waits forever, using no CPU time, until it is interrupted by some event (such as attaching gdb or strace to it), after which it proceeds normally. From the server's perspective, such clients are in the state IDLE in transaction as reported via pg_stat_activity. I attached GDB to one such client, and the stack trace is as follows:

#0  0x2b4f2f914d7f in poll () from /lib/libc.so.6
#1  0x2b4f3038449f in PQmblen () from /usr/lib/libpq.so.4
#2  0x2b4f30384580 in pqWaitTimed () from /usr/lib/libpq.so.4
#3  0x2b4f30383e62 in PQgetResult () from /usr/lib/libpq.so.4
#4  0x2b4f30383f3e in PQgetResult () from /usr/lib/libpq.so.4
#5  0x2b4f3025f014 in dbd_st_execute () from /usr/lib/perl5/auto/DBD/Pg/Pg.so
#6  0x2b4f302548b6 in XS_DBD__Pg__db_do () from /usr/lib/perl5/auto/DBD/Pg/Pg.so
#7  0x2b4f2fd201f0 in XS_DBI_dispatch () from /usr/lib/perl5/auto/DBI/DBI.so
#8  0x2b4f2f310b95 in Perl_pp_entersub () from /usr/lib/libperl.so.5.8
#9  0x2b4f2f30f36e in Perl_runops_standard () from /usr/lib/libperl.so.5.8
#10 0x2b4f2f2ba7dc in perl_run () from /usr/lib/libperl.so.5.8
#11 0x004017ac in main ()

You'll note that I'm using the DBD::Pg Perl interface. So far I've never seen this happen with TCP connections, only with UNIX sockets. I see it with about 1 in 100 local client invocations. As a workaround I've configured my local clients to use TCP anyway, and this seems to solve the problem. Is this something that might have been fixed in a post-8.1 version of libpq? -jwb
Re: [HACKERS] Bug in libpq causes local clients to hang
On Sun, Mar 23, 2008 at 7:12 PM, Jeffrey Baker [EMAIL PROTECTED] wrote: Lately I've noticed that local (UNIX socket) clients using libpq4 8.1.9 (Debian 8.1.9-0etch1) and the same version of the server can hang forever waiting in poll(). The symptom is that the local client waits forever, using no CPU time, until it is interrupted by some event (such as attaching gdb or strace to it), after which it proceeds normally. From the server's perspective, such clients are in the state IDLE in transaction as reported via pg_stat_activity. I attached GDB to one such client, and the stack trace is as follows:

#0  0x2b4f2f914d7f in poll () from /lib/libc.so.6
#1  0x2b4f3038449f in PQmblen () from /usr/lib/libpq.so.4
#2  0x2b4f30384580 in pqWaitTimed () from /usr/lib/libpq.so.4
#3  0x2b4f30383e62 in PQgetResult () from /usr/lib/libpq.so.4
#4  0x2b4f30383f3e in PQgetResult () from /usr/lib/libpq.so.4
#5  0x2b4f3025f014 in dbd_st_execute () from /usr/lib/perl5/auto/DBD/Pg/Pg.so
#6  0x2b4f302548b6 in XS_DBD__Pg__db_do () from /usr/lib/perl5/auto/DBD/Pg/Pg.so
#7  0x2b4f2fd201f0 in XS_DBI_dispatch () from /usr/lib/perl5/auto/DBI/DBI.so
#8  0x2b4f2f310b95 in Perl_pp_entersub () from /usr/lib/libperl.so.5.8
#9  0x2b4f2f30f36e in Perl_runops_standard () from /usr/lib/libperl.so.5.8
#10 0x2b4f2f2ba7dc in perl_run () from /usr/lib/libperl.so.5.8
#11 0x004017ac in main ()

Following up to myself, I note that a very similar issue was reported, with a very similar stack, only two days ago, with subject "ecpg program getting stuck", archived at http://groups.google.com/group/pgsql.general/browse_thread/thread/0b7ede57faad803e/9abfd7ab1b7e1d86 -jwb
Re: [HACKERS] bitmap scans, btree scans, and tid order
Tom Lane wrote: Jeffrey W. Baker [EMAIL PROTECTED] writes: I see that Tom has already done the infrastructure work by adding getmulti, but getmulti isn't used by nodeIndexscan.c, only nodeBitmapIndexscan.c. Will btree index scans be executed by creating in-memory bitmaps in 8.1, or will some scans still be executed the usual way? We aren't going to remove the existing indexscan behavior, because bitmap scans lose the ordering of the underlying index. There are many situations where that ordering is important. (See for instance the recent changes to make MAX/MIN use that behavior.) Would you take a patch that retained the optimized executions of plans returning 1 tuple and also fixed the random heap problem? -jwb ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] bitmap scans, btree scans, and tid order
Neil Conway wrote: Jeffrey Baker wrote: Would you take a patch that retained the optimized executions of plans returning 1 tuple and also fixed the random heap problem? Can you elaborate on what you're proposing? Obviously sorted b+-tree output is important for a lot more than just min()/max(). I don't see an obvious way to produce sorted output from a bitmap tree index scan without requiring an additional sort step (which would be rather pointless -- the whole point of the optimization is to avoid an additional sort). I understand the importance of returning tuples in index order for many plans (although I probably haven't thought of all the cases; min/max is the most obvious, and order by + limit is another). The only problem I'm trying to solve is when an index scan returns a large result, causing the heap to be visited in index order, which is to say random order, from the disk's perspective. When I investigated this last year, sorting the intermediate result of the index scan in disk order was good for a reduction of two-thirds in actual execution time, and sorting the scan result in chunks of 1000 tuples was enough to reduce the time by half. I'm considering one of the following courses of action:

Change nodeIndexscan.c to call index_getmulti, and to handle multiple tuples returned. That code would sort the tuple array and store the tuples in the result in disk order.

-or-

Change the planner/executor to use the bitmap scan in all cases where index order is unimportant. From my reading of the current code, the bitmap scan is only used in case of a join.

-jwb
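The chunked disk-order sort described above can be sketched as follows (illustrative Python, not PostgreSQL code; fetch_in_disk_order is a made-up name, and TIDs are modeled as (block, offset) pairs). Buffering the index scan's TIDs in chunks and sorting each chunk by block number converts random heap I/O into mostly sequential I/O, while bounding the memory spent on the sort:

```python
# Sort an index scan's TID stream in chunks so heap pages are visited
# in (approximately) disk order rather than index order.
def fetch_in_disk_order(tids, chunk=1000):
    """tids: iterable of (block, offset) pairs, in index order.
    Yields them chunk by chunk, each chunk sorted by block number --
    the 'sort in chunks of 1000 tuples' variant described above."""
    buf = []
    for tid in tids:
        buf.append(tid)
        if len(buf) == chunk:
            yield from sorted(buf)  # tuples sort by block first
            buf = []
    yield from sorted(buf)          # flush the final partial chunk
```

With chunk set larger than the result, this degenerates to the full disk-order sort (the two-thirds reduction case); smaller chunks trade some sequentiality for bounded memory (the one-half reduction case).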