Re: [HACKERS] Multi CPU Queries - Feedback and/or suggestions wanted!

2008-10-20 Thread Jeffrey Baker
On Mon, Oct 20, 2008 at 12:05 PM, Julius Stroffek [EMAIL PROTECTED] wrote:

 Topics that seem to be of interest and most of them were already
 discussed at developers meeting in Ottawa are
 1.) parallel sorts
 2.) parallel query execution
 3.) asynchronous I/O
 4.) parallel COPY
 5.) parallel pg_dump
 6.) using threads for parallel processing

[...]

 2.)
 Different subtrees (or nodes) of the plan could be executed in parallel
 on different CPUs and the results of these subtrees could be requested
 either synchronously or asynchronously.


I don't see why multiple CPUs can't work on the same node of a plan.  For
instance, consider a node involving a scan with an expensive condition, like
UTF-8 string length.  If you have four CPUs you can bring to bear, each CPU
could take every fourth page, computing the expensive condition for each
tuple in that page.  The results of the scan can be retired asynchronously
to the next node above.
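
To make that concrete, here is a rough, self-contained C sketch of the idea --
not PostgreSQL code; the page layout, the toy tuples, and the qual are all
invented for illustration.  Worker i takes every Nth page, evaluates an
expensive per-tuple condition (here, a UTF-8 character count), and its results
could be retired asynchronously to the node above:

/*
 * Hedged sketch: N workers stride over the pages of a toy "heap",
 * each evaluating an expensive qual on the tuples of its own pages.
 * Build with: cc -O2 -pthread stride_scan.c   (file name is made up)
 */
#include <pthread.h>
#include <stdio.h>

#define NWORKERS        4
#define NPAGES          16
#define TUPLES_PER_PAGE 3

static const char *pages[NPAGES][TUPLES_PER_PAGE];   /* toy heap: pages of text tuples */

/* count UTF-8 characters: bytes that are not continuation bytes (10xxxxxx) */
static size_t utf8_len(const char *s)
{
    size_t n = 0;
    for (; *s; s++)
        if (((unsigned char) *s & 0xC0) != 0x80)
            n++;
    return n;
}

struct worker_arg { int id; long matches; };

static void *scan_worker(void *p)
{
    struct worker_arg *arg = p;

    /* the interesting part: worker i handles pages i, i+N, i+2N, ... */
    for (int pg = arg->id; pg < NPAGES; pg += NWORKERS)
        for (int t = 0; t < TUPLES_PER_PAGE; t++)
            if (pages[pg][t] && utf8_len(pages[pg][t]) > 5)   /* "expensive" qual */
                arg->matches++;
    return NULL;
}

int main(void)
{
    pthread_t threads[NWORKERS];
    struct worker_arg args[NWORKERS];
    long total = 0;

    for (int pg = 0; pg < NPAGES; pg++)               /* fill the toy heap */
        for (int t = 0; t < TUPLES_PER_PAGE; t++)
            pages[pg][t] = (pg + t) % 2 ? "sechs Wörter hier" : "kurz";

    for (int i = 0; i < NWORKERS; i++) {
        args[i].id = i;
        args[i].matches = 0;
        pthread_create(&threads[i], NULL, scan_worker, &args[i]);
    }
    for (int i = 0; i < NWORKERS; i++) {
        pthread_join(threads[i], NULL);
        total += args[i].matches;
    }
    printf("tuples passing the qual: %ld\n", total);
    return 0;
}

The real work would of course be dividing an actual heap by block number and
funneling the matches back into the executor, but the stride in scan_worker()
is the whole trick.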

-jwb


Re: [HACKERS] Block-level CRC checks

2008-09-30 Thread Jeffrey Baker
On Tue, Sep 30, 2008 at 1:41 PM, Bruce Momjian [EMAIL PROTECTED] wrote:

 Alvaro Herrera wrote:
  A customer of ours has been having trouble with corrupted data for some
  time.  Of course, we've almost always blamed hardware (and we've seen
  RAID controllers have their firmware upgraded, among other actions), but
  the useful thing to know is when corruption has happened, and where.
 
  So we've been tasked with adding CRCs to data files.

 Maybe a stupid question, but what I/O subsystems corrupt data and fail
 to report it?


Practically all of them.  Here is a good paper on various checksums, their
failure rates, and practical applications.

Parity Lost and Parity Regained
http://www.usenix.org/event/fast08/tech/full_papers/krioukov/krioukov_html/index.html
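
For what it's worth, here is a minimal, self-contained C sketch of the sort of
per-block check being discussed -- a plain CRC-32 (polynomial 0xEDB88320) over
an 8 KB page.  The page size and the idea of keeping the computed CRC alongside
the page are assumptions for illustration, not the patch under discussion:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define PAGE_SIZE 8192

/* bitwise CRC-32, just to show the shape of a per-block check */
static uint32_t crc32_block(const unsigned char *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (uint32_t) -(int32_t) (crc & 1));
    }
    return ~crc;
}

int main(void)
{
    unsigned char page[PAGE_SIZE];

    memset(page, 0xAB, sizeof(page));                   /* pretend page contents */
    uint32_t stored = crc32_block(page, sizeof(page));  /* computed at write time */

    page[100] ^= 0x01;                                  /* simulate a silent bit flip */
    if (crc32_block(page, sizeof(page)) != stored)
        printf("block CRC mismatch: corruption detected\n");
    return 0;
}

A mismatch at read time tells you when a block went bad and where, which is
exactly the useful thing the original post asked for.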

-jwb


Re: [HACKERS] 8.3.1 autovacuum stopped doing anything months ago

2008-09-19 Thread Jeffrey Baker
On Fri, Sep 19, 2008 at 11:42 AM, Robert Treat [EMAIL PROTECTED] wrote:

 On Friday 19 September 2008 00:23:34 Jeffrey Baker wrote:
  Anyway, I have some issues.  One, of course, is that the autovacuum should
  not have been deadlocked or otherwise stalled like that.  Perhaps it needs
  a watchdog of some kind.  Has anyone else experienced an issue like that in
  8.3.1?  The only thing I can see in the release notes that indicates this
  problem may have been fixed is the following:

 We have several checks in the check_postgres script which are in this area


Are you referring to the nagios plugin?  I already use it, and nagios didn't
make a peep.  Perhaps I should check for a more recent revision.

-jwb


[HACKERS] 8.3.1 autovacuum stopped doing anything months ago

2008-09-18 Thread Jeffrey Baker
I have an 8.3.1 instance on Linux and since June 29th the autovacuum process
has claimed to be working on the same three tables.  That's OK, I am a very
patient man, and these are very large tables.  Today I started to get
transaction wraparound warnings, so I go and check it out.  Turns out the
autovacuum processes are all just doing nothing.  When I strace them, they
are all three blocked on syscalls.

So I restart the database and run a vacuum.  Of course, once the wraparound
warning is reached, there's no way to disable the autovac, so now my vacuum
maintenance job is competing with three invulnerable autovacuum processes.
I am thinking of sending them SIGSTOP.

Anyway, I have some issues.  One, of course, is that the autovacuum should
not have been deadlocked or otherwise stalled like that.  Perhaps it needs a
watchdog of some kind.  Has anyone else experienced an issue like that in
8.3.1?  The only thing I can see in the release notes that indicates this
problem may have been fixed is the following:

Repair two places where SIGTERM exit of a backend could leave corrupted
state in shared memory (Tom)

However I don't know who or what would have sent SIGTERM to the autovacuum
children.

Secondly, there really does need to be an autovacuum=off,really,thanks so
that my maintenance can proceed without competition for I/O resources.  Is
there any way to make that happen?  Is my SIGSTOP idea dangerous?

-jwb


[HACKERS] proposal for smaller indexes on index-ordered tables

2008-06-24 Thread Jeffrey Baker
The way I read it, the current btree index stores the index value and the
TID of every tuple having that value.  When you have a table with three
columns, you index one of them and you get an index which is practically as
large as the table itself.

Supposing the table is generally or strictly ordered by the column to be
indexed, it would be more compact if the index stored ranges of tuples.
Instead of storing the TID of every tuple with that value, the index would
store a first and last TID, between which all tuples have the value.

Example: table with one million rows indexed on a column having one thousand
distinct values.  Table is in-order by the indexed column.  The traditional
index would contain a million TIDs, whereas a range index would contain only
two thousand.  The range index would be 500 times smaller, more likely to be
cached, etc.
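
To make the proposal concrete, here is a rough standalone C sketch (the TID and
entry layouts are invented for illustration; this is not btree code): given
(value, TID) pairs already in value order, collapse them into one
(value, first TID, last TID) entry per distinct value.

#include <stdio.h>

typedef struct { unsigned block; unsigned offset; } Tid;
typedef struct { int value; Tid tid; } HeapEntry;
typedef struct { int value; Tid first; Tid last; } RangeEntry;

/* heap[] is assumed sorted by value (the table is in index order) */
static int build_range_index(const HeapEntry *heap, int n, RangeEntry *out)
{
    int nranges = 0;

    for (int i = 0; i < n; i++) {
        if (nranges == 0 || out[nranges - 1].value != heap[i].value) {
            out[nranges].value = heap[i].value;          /* open a new range */
            out[nranges].first = heap[i].tid;
            nranges++;
        }
        out[nranges - 1].last = heap[i].tid;             /* extend the open range */
    }
    return nranges;
}

int main(void)
{
    HeapEntry heap[] = {
        {1, {0, 1}}, {1, {0, 2}}, {1, {0, 3}},
        {2, {1, 1}}, {2, {1, 2}},
        {3, {2, 1}},
    };
    RangeEntry idx[6];
    int n = build_range_index(heap, 6, idx);

    for (int i = 0; i < n; i++)
        printf("value %d -> (%u,%u) .. (%u,%u)\n", idx[i].value,
               idx[i].first.block, idx[i].first.offset,
               idx[i].last.block, idx[i].last.offset);
    return 0;
}

Six heap entries become three index entries here; the million-row example above
would collapse to one thousand entries holding two thousand TIDs instead of a
million.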

Thoughts?

-jwb


Re: [HACKERS] proposal for smaller indexes on index-ordered tables

2008-06-24 Thread Jeffrey Baker
On Tue, Jun 24, 2008 at 1:59 PM, Zoltan Boszormenyi [EMAIL PROTECTED] wrote:

 Jeffrey Baker wrote:
  The way I read it, the current btree index stores the index value and
  the TID of every tuple having that value.  When you have a table with
  three columns, you index one of them and you get an index which is
  practically as large as the table itself.
 
  Supposing the table is generally or strictly ordered by the column to
  be indexed, it would be more compact if the index stored ranges of
  tuples.  Instead of storing the TID of every tuple with that value,
  the index would store a first and last TID, between which all tuples
  have the value.
 
  Example: table with one million rows indexed on a column having one
  thousand distinct values.  Table is in-order by the indexed column.
  The traditional index would contain a million TIDs, whereas a range
  index would contain only two thousand.  The range index would be 500
  times smaller, more likely to be cached, etc.
 
  Thoughts?
 
  -jwb

 Example with your theory:
 One (not yet committed) transaction changes one tuple
 that was in the middle of a range before but the tuple's indexed
 column changed. What would you do?


Insert the new tuple at the end of the table and add another range to the
index.  Leave the old tuple in place and don't touch the original index
range.


  You need to keep track of multiple index versions:
  1. the range has to be split for the not-yet-committed modifier transaction,
     it might need to re-read the same table.
  2. the old range has to be kept for reader transactions that still see
     the old data.


This is only true if you update the tuple in-place.


 Imagine you have thousands of UPDATEs in flight on different rows.


I'm quite aware of the problems of maintaining such a table and index, but
the fact is that data warehouse type tables may never be updated after being
created.  The particular application I'm struggling with does a SELECT ...
INTO ... ORDER BY to make an ordered table for querying every night.  The
problem is it takes longer, much longer, to create the index than to create
the table, and in the end the index is as big as half the table anyway.

So this type of index would only be useful for an essentially read-only
table.  I agree.

A different proposal altogether would be to somehow instruct the database that
the table is strictly ordered by a column and allow a binary-search access
method.  Then you don't need any index at all.
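
A minimal sketch of that alternative, assuming only that the first key on each
page is known and nondecreasing (the page layout here is invented for
illustration):

#include <stdio.h>

#define NPAGES 1000

/* first key stored on each page; nondecreasing because the table is ordered */
static int page_first_key[NPAGES];

/* return the page that could contain 'key': the last page whose first key <= key */
static int find_page(int key)
{
    int lo = 0, hi = NPAGES - 1, ans = 0;

    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;

        if (page_first_key[mid] <= key) {
            ans = mid;
            lo = mid + 1;
        } else {
            hi = mid - 1;
        }
    }
    return ans;
}

int main(void)
{
    for (int i = 0; i < NPAGES; i++)
        page_first_key[i] = i * 10;            /* toy ordered table */

    printf("key 4242 would be on page %d\n", find_page(4242));   /* -> page 424 */
    return 0;
}

One binary search over page boundaries replaces the whole index, at the cost of
having to trust (or enforce) the ordering.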

-jwb


Re: [HACKERS] proposal for smaller indexes on index-ordered tables

2008-06-24 Thread Jeffrey Baker
On Tue, Jun 24, 2008 at 2:38 PM, Tom Lane [EMAIL PROTECTED] wrote:

 Jeffrey Baker [EMAIL PROTECTED] writes:
   I'm quite aware of the problems of maintaining such a table and index, but
   the fact is that data warehouse type tables may never be updated after being
   created.  The particular application I'm struggling with does a SELECT ...
   INTO ... ORDER BY to make an ordered table for querying every night.  The
   problem is it takes longer, much longer, to create the index than to create
   the table, and in the end the index is as big as half the table anyway.

 There's something wrong with that: sorting the table rows surely ought
 to take longer than sorting the same number of (smaller) index entries.
 Have you done any profiling to find out what the problem is?  Perhaps
 there's something wrong with the setting of maintenance_work_mem (vs
 work_mem).


For this query, work_mem is 100MB and maintenance_work_mem is 1GB, on a
system with 8GB of memory.  Notably I just installed a new storage subsystem
and upgraded to 8.3.1 less than a week ago, so my experience with this
instance is somewhat limited.  Creating the table in this case takes half an
hour and then indexing it requires almost an hour.  Subsequently analyzing
the table takes less than a minute, with statistics set to maximum.

Query performance is excellent.  I was just brainstorming on ways to save
time on the creation.

-jwb


Re: [HACKERS] proposal for smaller indexes on index-ordered tables

2008-06-24 Thread Jeffrey Baker
On Tue, Jun 24, 2008 at 3:08 PM, Tom Lane [EMAIL PROTECTED] wrote:

 Kevin Grittner [EMAIL PROTECTED] writes:
  Tom Lane [EMAIL PROTECTED] wrote:
  Now, *why* it is a mistake is interesting to speculate about, but
  let's confirm the theory first.

  Could this be related to hint bit rewrites during indexing?

 If so, changing maintenance_work_mem won't improve the situation.

 What I personally suspect is that Jeff's index build is swapping like
 crazy, or else there's just some problem in the sort code for such a
 large sort arena.  But let's get some evidence about how the index build
 time varies with maintenance_work_mem before jumping to conclusions.


Well, it definitely isn't that, because the machine doesn't even have a swap
area defined.  vmstat output during the table creation and the index creation
looks really quite different.  During the table sort there's heavy read/write
traffic of 12-20 MB/s; during the index creation it's lower, but it seems to be
CPU-limited (i.e. one CPU is maxed out the whole time, and iowait is not very
high).

I guess nobody has any interest in my proposal, only in the departure of my
described experience from expected behavior :-(


[HACKERS] pg_dump fails to include sequences, leads to restore fail in any version

2008-06-17 Thread Jeffrey Baker
It is impossible to dump (with pg_dump -Ocx) and restore (with psql) a
database which contains sequences in any of 8.1, 8.2, or 8.3:

[...]

--
-- Name: transaction_transaction_id_seq; Type: SEQUENCE SET; Schema: mercado; Owner: prod
--

SELECT
pg_catalog.setval(pg_catalog.pg_get_serial_sequence('transaction_backup',
'transaction_id'), 6736138, true);


--
-- Name: transaction; Type: TABLE; Schema: mercado; Owner: prod; Tablespace:
--

CREATE TABLE transaction (
    transaction_id integer DEFAULT nextval('transaction_transaction_id_seq'::regclass) NOT NULL,
    buyer_account_id integer,
    seller_account_id integer,
    date date,
    item_id integer,
    source text
);

[...]

2008-06-16 19:26:41 PDT ERROR:  relation "transaction_transaction_id_seq" does not exist

Why?  Because pg_dump mysteriously omits all sequences:

think=# \d transaction_transaction_id_seq
Sequence "mercado.transaction_transaction_id_seq"
    Column     |  Type
---------------+---------
 sequence_name | name
 last_value    | bigint
 increment_by  | bigint
 max_value     | bigint
 min_value     | bigint
 cache_value   | bigint
 log_cnt       | bigint
 is_cycled     | boolean
 is_called     | boolean

think=# \ds
                        List of relations
 Schema  |                Name                |   Type   | Owner
---------+------------------------------------+----------+-------
 mercado | account_account_id_seq             | sequence | prod
 mercado | account_stat_account_stat_id_seq   | sequence | prod
 mercado | category_category_id_seq           | sequence | prod
 mercado | category_stat_category_stat_id_seq | sequence | prod
 mercado | country_country_id_seq             | sequence | prod
 mercado | country_stat_country_stat_id_seq   | sequence | prod
 mercado | dict_dict_id_seq                   | sequence | prod
 mercado | expire_icon_expire_icon_id_seq     | sequence | prod
 mercado | expire_time_expire_time_id_seq     | sequence | prod
 mercado | fx_fx_id_seq                       | sequence | prod
 mercado | icon_icon_id_seq                   | sequence | prod
 mercado | item_icon_item_icon_id_seq         | sequence | prod
 mercado | item_item_id_seq                   | sequence | prod
 mercado | item_stat_item_stat_id_seq         | sequence | prod
 mercado | transaction_transaction_id_seq     | sequence | prod
(15 rows)

[EMAIL PROTECTED]:~$ pg_dump -s -n mercado think | grep CREATE\ SEQUENCE
[EMAIL PROTECTED]:~$

Therefore, when the restore is attempted, the table using the sequence as a
default value cannot be created.


Re: [HACKERS] pg_dump fails to include sequences, leads to restore fail in any version

2008-06-17 Thread Jeffrey Baker
On Tue, Jun 17, 2008 at 10:59 AM, Tom Lane [EMAIL PROTECTED] wrote:

 Jeffrey Baker [EMAIL PROTECTED] writes:
  It is impossible to dump (with pg_dump -Ocx) and restore (with psql) a
  database which contains sequences in any of 8.1, 8.2, or 8.3:

 I should think we would have heard about it before now if such a
 sweeping claim were true.


Last time this problem came up, in August, you dismissed it somewhat
rudely.  So perhaps the lack of reports is due more to perception than any
other thing.


 What I suspect is that you are using 8.1's pg_dump, and you have tripped
 over one of the corner cases that made us redesign dumping of serial
 sequences for 8.2.  Do you get better results if you dump the problem
 database with 8.2 or 8.3 pg_dump?


What's the corner case exactly?  8.3 dumps it correctly, but that's not
really much of a consolation because I need to restore _this_ dump, not some
other one.  It was necessary for me to recreate all the sequences and set
the curvals manually.

Can't the fix be backported to 8.1?

-jwb


Re: [HACKERS] pg_dump fails to include sequences, leads to restore fail in any version

2008-06-17 Thread Jeffrey Baker
On Tue, Jun 17, 2008 at 2:43 PM, Alvaro Herrera [EMAIL PROTECTED] wrote:

 Jeffrey Baker wrote:
  On Tue, Jun 17, 2008 at 10:59 AM, Tom Lane [EMAIL PROTECTED] wrote:

   Jeffrey Baker [EMAIL PROTECTED] writes:
    It is impossible to dump (with pg_dump -Ocx) and restore (with psql) a
    database which contains sequences in any of 8.1, 8.2, or 8.3:

   I should think we would have heard about it before now if such a
   sweeping claim were true.

  Last time this problem came up, in August, you dismissed it somewhat
  rudely.  So perhaps the lack of reports is due more to perception than
  any other thing.

 How did you set it up exactly?  I have no problem with this situation:


[snip]

The table was originally created this way:

CREATE TABLE transaction
(
    transaction_id      SERIAL PRIMARY KEY,
    buyer_account_id    INTEGER,
    seller_account_id   INTEGER,
    date                DATE,
    item_id             INTEGER,
    source              TEXT
);

However, when dumped with pg_dump 8.1, it comes out this way:

CREATE TABLE transaction (
    transaction_id integer DEFAULT nextval('transaction_transaction_id_seq'::regclass) NOT NULL,
    buyer_account_id integer,
    seller_account_id integer,
    date date,
    item_id integer,
    source text
);

.. and the sequence does not get dumped with it.

-jwb


Re: [HACKERS] pg_dump fails to include sequences, leads to restore fail in any version

2008-06-17 Thread Jeffrey Baker
On Tue, Jun 17, 2008 at 6:31 PM, Tom Lane [EMAIL PROTECTED] wrote:

 Alvaro Herrera [EMAIL PROTECTED] writes:
  Jeffrey Baker wrote:
   The table was originally created this way:

  Okay, but was it created on 8.1 or was it already created on an older
  version and restored?  I don't see this behavior if I create it in 8.1
  -- the field is dumped as SERIAL, unlike what you show.

 There's something interesting in the original report:

  --
  -- Name: transaction_transaction_id_seq; Type: SEQUENCE SET; Schema: mercado; Owner: prod
  --

  SELECT
  pg_catalog.setval(pg_catalog.pg_get_serial_sequence('transaction_backup',
                                                       ^^^^^^^^^^^^^^^^^^
  'transaction_id'), 6736138, true);

 So pg_dump found a pg_depend entry linking that sequence to some table
 named transaction_backup, not transaction.  That explains why
 transaction isn't being dumped using a SERIAL keyword --- it's not
 linked to this sequence.  But how things got this way is not apparent
 from the stated facts.


Hrmm, I think that's a bit of a red herring.  I probably should not have
pasted that part of the dump, because it's misleading.  There really is a
table transaction_backup, and its definition is the same as transaction's.

Reading from that part of the dump again, just for clarity:

--
-- Name: transaction_backup; Type: TABLE; Schema: mercado; Owner: prod; Tablespace:
--

CREATE TABLE transaction_backup (
    transaction_id serial NOT NULL,
    buyer_account_id integer,
    seller_account_id integer,
    date date,
    item_id integer,
    source text
);


ALTER TABLE mercado.transaction_backup OWNER TO prod;

--
-- Name: transaction_transaction_id_seq; Type: SEQUENCE SET; Schema: mercado; Owner: prod
--

SELECT
pg_catalog.setval(pg_catalog.pg_get_serial_sequence('transaction_backup',
'transaction_id'), 6736139, true);


--
-- Name: transaction; Type: TABLE; Schema: mercado; Owner: prod; Tablespace:
--

CREATE TABLE transaction (
    transaction_id integer DEFAULT nextval('transaction_transaction_id_seq'::regclass) NOT NULL,
    buyer_account_id integer,
    seller_account_id integer,
    date date,
    item_id integer,
    source text
);


ALTER TABLE mercado.transaction OWNER TO prod;

The two tables are defined the same way, but one of them gets dumped with a
SERIAL declaration and the other gets dumped with a DEFAULT nextval().

Is it possible that pg_dump became confused if transaction was renamed to
transaction_backup and then redefined?  I can't guarantee that did in fact
happen, but it's within the realm of possibility.  I don't see the backup
table in the SQL source code for this product, so it's likely that it was
created by a user in the course of maintenance.



 One possibility is that Jeffrey is getting bit by this bug or
 something related:
 http://archives.postgresql.org/pgsql-bugs/2006-07/msg00021.php


I don't think it's that one.  All this stuff is in the same schema (and in
any case the dump file contains all schemas).


 There are links to some other known serial-sequence problems in 8.1
 in this message:
 http://archives.postgresql.org/pgsql-hackers/2006-08/msg01250.php


That one seems closer to the point.
http://archives.postgresql.org/pgsql-hackers/2006-08/msg01250.php

-jwb


[HACKERS] meaning of backend exit code 2?

2008-06-04 Thread Jeffrey Baker
I have a need to find out the meaning of a backend exiting unexpectedly with
exit code 2.  Leafing through the source of 8.1 I can't really find it.

Is there anything in postgres which would exit with code 2, or should I be
looking at libraries and junk dragged in by languages?


Re: [HACKERS] meaning of backend exit code 2?

2008-06-04 Thread Jeffrey Baker
On Wed, Jun 4, 2008 at 11:31 AM, Tom Lane [EMAIL PROTECTED] wrote:

 Jeffrey Baker [EMAIL PROTECTED] writes:
   I have a need to find out the meaning of a backend exiting unexpectedly with
   exit code 2.  Leafing through the source of 8.1 I can't really find it.

 But are you running 8.1?  In 8.2 and up this is the expected result from
 SIGQUIT.


Yes, I'm running 8.1.9.  I have a strong suspicion that the exit is coming
either from R via PL/R or from Perl via PL/Perl.  From my reading of the
code, the expected backend exit codes are mainly 1, 0, and -1?

-jwb


Re: [HACKERS] Bug in libpq causes local clients to hang

2008-03-24 Thread Jeffrey Baker
On Sun, Mar 23, 2008 at 8:35 PM, Tom Lane [EMAIL PROTECTED] wrote:
 Jeffrey Baker [EMAIL PROTECTED] writes:
   You'll note that I'm using the DBD::Pg Perl interface.  So far I've
   never seen this happen with TCP connections, only with UNIX sockets.

  If it works over TCP and not over Unix socket, it's a kernel bug.
  The libpq code doesn't really know the difference after connection
  setup.

The same thought occurred to me, but it could also be a race condition
which the Unix socket is fast enough to trigger and the TCP socket is
not.  I'm peeking around in the code but nothing jumps out yet.

-jwb


Re: [HACKERS] Bug in libpq causes local clients to hang

2008-03-24 Thread Jeffrey Baker
On Mon, Mar 24, 2008 at 9:24 AM, Tom Lane [EMAIL PROTECTED] wrote:
 Jeffrey Baker [EMAIL PROTECTED] writes:
   On Sun, Mar 23, 2008 at 8:35 PM, Tom Lane [EMAIL PROTECTED] wrote:

  If it works over TCP and not over Unix socket, it's a kernel bug.
   The libpq code doesn't really know the difference after connection
   setup.

   The same thought occurred to me, but it could also be a race condition
   which the unix socket is fast enough to trigger but the TCP socket is
   not fast enough to trigger.  I'm peeking around in the code but
   nothing jumps out yet.

  Fairly hard to believe given that you're talking about communication
  between two sequential processes.  Anyway I'd suggest that the first
  thing to do is extract a reproducible test case.  It'd be useful
  to see if it hangs on other platforms...

The stack trace doesn't actually make sense, does it?  I think that
(at least) the PQmblen frame is spurious.

-jwb


[HACKERS] Bug in libpq causes local clients to hang

2008-03-23 Thread Jeffrey Baker
Lately I've noticed that local (UNIX socket) clients using libpq4
8.1.9 (Debian 8.1.9-0etch1) and the same version of the server can
hang forever waiting in poll().  The symptom is that the local client
waits forever, using no CPU time, until it is interrupted by some
event (such as attaching gdb or strace to it), after which it proceeds
normally.  From the server's perspective, such clients are in the
state "IDLE in transaction" as reported via pg_stat_activity.  I
attached GDB to one such client, and the stack trace is as follows:

#0  0x2b4f2f914d7f in poll () from /lib/libc.so.6
#1  0x2b4f3038449f in PQmblen () from /usr/lib/libpq.so.4
#2  0x2b4f30384580 in pqWaitTimed () from /usr/lib/libpq.so.4
#3  0x2b4f30383e62 in PQgetResult () from /usr/lib/libpq.so.4
#4  0x2b4f30383f3e in PQgetResult () from /usr/lib/libpq.so.4
#5  0x2b4f3025f014 in dbd_st_execute () from
/usr/lib/perl5/auto/DBD/Pg/Pg.so
#6  0x2b4f302548b6 in XS_DBD__Pg__db_do () from
/usr/lib/perl5/auto/DBD/Pg/Pg.so
#7  0x2b4f2fd201f0 in XS_DBI_dispatch () from /usr/lib/perl5/auto/DBI/DBI.so
#8  0x2b4f2f310b95 in Perl_pp_entersub () from /usr/lib/libperl.so.5.8
#9  0x2b4f2f30f36e in Perl_runops_standard () from /usr/lib/libperl.so.5.8
#10 0x2b4f2f2ba7dc in perl_run () from /usr/lib/libperl.so.5.8
#11 0x004017ac in main ()

You'll note that I'm using the DBD::Pg Perl interface.  So far I've
never seen this happen with TCP connections, only with UNIX sockets.
I see it with about 1 in 100 local client invocations.

As a workaround I've configured my local clients to use TCP anyway,
and this seems to solve the problem.  Is this something that might
have been fixed in a post-8.1 version of libpq?

-jwb


Re: [HACKERS] Bug in libpq causes local clients to hang

2008-03-23 Thread Jeffrey Baker
On Sun, Mar 23, 2008 at 7:12 PM, Jeffrey Baker [EMAIL PROTECTED] wrote:
 Lately I've noticed that local (UNIX socket) clients using libpq4
  8.1.9 (Debian 8.1.9-0etch1) and the same version of the server can
  hang forever waiting in poll().  The symptom is that the local client
  waits forever, using no CPU time, until it is interrupted by some
  event (such as attaching gdb or strace to it), after which it proceeds
  normally.  From the server's perspective, such clients are in the
  state IDLE in transaction as reported via pg_stat_activity.  I
  attached GDB to one such client, and the stack trace is as follows:

  #0  0x2b4f2f914d7f in poll () from /lib/libc.so.6
  #1  0x2b4f3038449f in PQmblen () from /usr/lib/libpq.so.4
  #2  0x2b4f30384580 in pqWaitTimed () from /usr/lib/libpq.so.4
  #3  0x2b4f30383e62 in PQgetResult () from /usr/lib/libpq.so.4
  #4  0x2b4f30383f3e in PQgetResult () from /usr/lib/libpq.so.4
  #5  0x2b4f3025f014 in dbd_st_execute () from
  /usr/lib/perl5/auto/DBD/Pg/Pg.so
  #6  0x2b4f302548b6 in XS_DBD__Pg__db_do () from
  /usr/lib/perl5/auto/DBD/Pg/Pg.so
  #7  0x2b4f2fd201f0 in XS_DBI_dispatch () from 
 /usr/lib/perl5/auto/DBI/DBI.so
  #8  0x2b4f2f310b95 in Perl_pp_entersub () from /usr/lib/libperl.so.5.8
  #9  0x2b4f2f30f36e in Perl_runops_standard () from 
 /usr/lib/libperl.so.5.8
  #10 0x2b4f2f2ba7dc in perl_run () from /usr/lib/libperl.so.5.8
  #11 0x004017ac in main ()

Following up to myself, I note that a very similar issue was reported,
with a very similar stack, only two days ago, with the subject "ecpg
program getting stuck", archived at

http://groups.google.com/group/pgsql.general/browse_thread/thread/0b7ede57faad803e/9abfd7ab1b7e1d86

-jwb


Re: [HACKERS] bitmap scans, btree scans, and tid order

2005-05-16 Thread Jeffrey Baker
Tom Lane wrote:
 Jeffrey W. Baker [EMAIL PROTECTED] writes:
  I see that Tom has already done the infrastructure work by adding
  getmulti, but getmulti isn't used by nodeIndexscan.c, only
  nodeBitmapIndexscan.c.  Will btree index scans be executed by creating
  in-memory bitmaps in 8.1, or will some scans still be executed the usual
  way?

 We aren't going to remove the existing indexscan behavior, because
 bitmap scans lose the ordering of the underlying index.  There are many
 situations where that ordering is important.  (See for instance the
 recent changes to make MAX/MIN use that behavior.)

Would you take a patch that retained the optimized executions of plans
returning 1 tuple and also fixed the random heap problem?

-jwb


Re: [HACKERS] bitmap scans, btree scans, and tid order

2005-05-16 Thread Jeffrey Baker
Neil Conway wrote:
 Jeffrey Baker wrote:
  Would you take a patch that retained the optimized executions of plans
  returning 1 tuple and also fixed the random heap problem?

 Can you elaborate on what you're proposing? Obviously sorted b+-tree
 output is important for a lot more than just min()/max(). I don't see an
 obvious way to produce sorted output from a bitmap tree index scan
 without requiring an additional sort step (which would be rather
 pointless -- the whole point of the optimization is to avoid an
 additional sort).

I understand the importance of returning tuples in index order for many
plans (although I probably haven't thought of all the cases; min/max is
the most obvious, and ORDER BY + LIMIT is another).  The only problem I'm
trying to solve is when an indexscan returns a large result, causing the
heap to be visited in index order, which is to say random order from
the disk's perspective.  When I investigated this last year, sorting the
intermediate result of the index scan in disk order was good for a
reduction of two-thirds in actual execution time, and sorting the scan
result in chunks of 1000 tuples was enough to reduce the time by half.

I'm considering one of the following courses of action:
Change nodeIndexscan.c to call index_getmulti, and to handle multiple 
tuples returned.  That code would sort the tuple array and store the 
tuples in the result in disk order.

-or-
Change the planner/executor to use the bitmap scan in all cases where 
index order is unimportant.  From my reading of the current code, the 
bitmap scan is only used in case of a join.
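
To sketch the first option (this is not nodeIndexscan.c; the TID layout and the
chunk size are just assumptions for illustration): take each batch of TIDs as
the index hands them back and qsort the batch by (block, offset) before
visiting the heap, so each chunk is fetched in disk order:

#include <stdio.h>
#include <stdlib.h>

#define CHUNK 1000

typedef struct { unsigned block; unsigned offset; } Tid;

/* order TIDs by heap position: block first, then offset within the block */
static int tid_cmp(const void *a, const void *b)
{
    const Tid *x = a, *y = b;

    if (x->block != y->block)
        return (x->block < y->block) ? -1 : 1;
    if (x->offset != y->offset)
        return (x->offset < y->offset) ? -1 : 1;
    return 0;
}

int main(void)
{
    Tid chunk[CHUNK];

    /* pretend these came back from index_getmulti in index (i.e. random heap) order */
    for (int i = 0; i < CHUNK; i++) {
        chunk[i].block = (unsigned) (rand() % 5000);
        chunk[i].offset = (unsigned) (rand() % 100);
    }

    /* sort this batch into disk order before fetching the heap tuples */
    qsort(chunk, CHUNK, sizeof(Tid), tid_cmp);

    printf("first block after sort: %u, last block: %u\n",
           chunk[0].block, chunk[CHUNK - 1].block);
    return 0;
}

The heap is then read in ascending block order within each chunk of 1000,
which is where the time savings in the experiment above came from.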

-jwb