Re: [HACKERS] Non-C locale and LIKE
I know we can't currently use an index with non-C locales and LIKE except when we create a special type of index for LIKE indexing (text_pattern_ops). However, I am wondering if we should create a character lookup during initdb that has the characters ordered so we can do: col LIKE 'ha%' AND col >= 'ha' AND col < 'hb'. Could we do this easily for single-byte encodings? We could have: A 1, B 2, C 3, and a non-C locale could be: A 1, A` 2, B 3. We can't handle multi-byte encodings because the number of combinations is too large or not known. Also, we mention you should use the C locale to use normal indexes for LIKE, but isn't it more correct to say the encoding has to be SQL_ASCII? Why? The C locale works well for multibyte encodings such as EUC-JP too. -- Tatsuo Ishii ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
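The rewrite Bruce describes is the standard prefix-to-range trick. A minimal sketch in Python (hypothetical helper name; assumes a single-byte encoding sorted bytewise, i.e. C-locale ordering — exactly the assumption the proposed initdb lookup table would relax):

```python
def like_prefix_range(prefix):
    """Return (lower, upper) bounds such that
    col LIKE prefix || '%'  is equivalent to
    col >= lower AND col < upper.

    Assumes a single-byte encoding compared bytewise (C locale);
    a non-C locale would need the initdb-built ordering table instead.
    """
    if not prefix:
        raise ValueError("empty prefix has no finite range")
    last = prefix[-1]
    if ord(last) >= 0xFF:
        # Last byte cannot be incremented; no finite upper bound exists.
        return prefix, None
    # Increment the last byte to get the exclusive upper bound.
    return prefix, prefix[:-1] + chr(ord(last) + 1)
```

For the prefix 'ha' this yields the bounds 'ha' and 'hb' from the example above; the range condition is what lets a btree index satisfy the LIKE.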
Re: [HACKERS] Fix for NLS in pgport
Bruce Momjian wrote: I saw Peter's commit to allow NLS lookups from libpgport functions. Here is the change to pg_ctl/nls.mk, from: GETTEXT_FILES := pg_ctl.c to: GETTEXT_FILES := pg_ctl.c ../../port/exec.c Peter, do you have to know the C file used by pg_ctl to make these adjustments? Yes. This seems pretty hard to do and maintain. True, but it's only a workaround that gets the job done without moving around too much code at this point. Clearly, this will have to be revisited later. Do your tools do checks to make sure all the nls.mk files are properly modified? Is there a cleaner way to do this, like putting all the strings in a single C file and including that somehow? I don't think that will make it substantially better. In fact, putting everything in one file would create more useless work for translators. -- Peter Eisentraut http://developer.postgresql.org/~petere/ ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [HACKERS] Non-C locale and LIKE
Bruce Momjian wrote: However, I am wondering if we should create a character lookup during initdb that has the characters ordered so we can do: That won't work. Real-life collations are too complicated. Also, we mention you should use the C locale to use normal indexes for LIKE but isn't it more correct to say the encoding has to be SQL_ASCII? No, the locale decides the ordering. -- Peter Eisentraut http://developer.postgresql.org/~petere/ ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
[HACKERS] Status of server side Large Object support?
Hi, I'm in the phase of implementing CLOBs and BLOBs in PL/Java. I found inv_api.c and will use that as the base for my implementation. I lack two calls: int inv_length(LargeObjectDesc* lo); void inv_truncate(LargeObjectDesc* lo, int position); Searching the archives I found some questions concerning this, but there seems to be no resolution at this point. Is this correct? If so, I might want to give it a try. I didn't find the topic on the TODO list though. Shouldn't it be there? Has anyone done some work on this already? Regards, Thomas Hallgren ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] Status of server side Large Object support?
Am Sonntag, 28. November 2004 10:22 schrieb Thomas Hallgren: I'm in the phase of implementing CLOB's and BLOB's in PL/Java. I found the inv_api.c and will use that as the base for my implementation. The inv_api large objects are deprecated. CLOBs and BLOBs should be based on text and bytea, respectively. -- Peter Eisentraut http://developer.postgresql.org/~petere/ ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [HACKERS] Status of server side Large Object support?
Peter Eisentraut wrote: Am Sonntag, 28. November 2004 10:22 schrieb Thomas Hallgren: I'm in the phase of implementing CLOB's and BLOB's in PL/Java. I found the inv_api.c and will use that as the base for my implementation. The inv_api large objects are deprecated. CLOBs and BLOBs should be based on text and bytea, respectively. Hmm, ok. But there's no way to stream them in and out from disk. From what I can see, you have to bring all of it into memory. Not so ideal perhaps if you want to provide streaming media for thousands of users. Hopefully I'm missing something. Regards, Thomas Hallgren ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Status of server side Large Object support?
Am Sonntag, 28. November 2004 10:22 schrieb Thomas Hallgren: I'm in the phase of implementing CLOBs and BLOBs in PL/Java. I found inv_api.c and will use that as the base for my implementation. The inv_api large objects are deprecated. CLOBs and BLOBs should be based on text and bytea, respectively. How do you solve the problem that text and bytea require large amounts of RAM? -- Tatsuo Ishii ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Status of server side Large Object support?
Am Sonntag, 28. November 2004 12:33 schrieb Thomas Hallgren: Hmm, ok. But there's no way to stream them in and out from disk. From what I can see, you have to bring all of it into memory. Not so ideal perhaps if you want to provide streaming media for thousands of users. You can use the substring function to read the pieces you need. -- Peter Eisentraut http://developer.postgresql.org/~petere/ ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [HACKERS] Status of server side Large Object support?
Peter Eisentraut wrote: Am Sonntag, 28. November 2004 12:33 schrieb Thomas Hallgren: Hmm, ok. But there's no way to stream them in and out from disk. From what I can see, you have to bring all of it into memory. Not so ideal perhaps if you want to provide streaming media for thousands of users. You can use the substring function to read the pieces you need. Won't the substring function bring the whole thing into memory in the backend before it passes you the piece you need? Let's assume I want to stream 4k at a time of a 40 MB image; that's a whole lot of byte swapping if that's the case. How do you handle writes without first creating the whole image in memory? From what I can see, the current JDBC driver uses the lo_xxx client APIs and they seem to map to the inv_xxx server APIs. Regards, Thomas Hallgren ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [HACKERS] Non-C locale and LIKE
Peter Eisentraut wrote: Bruce Momjian wrote: However, I am wondering if we should create a character lookup during initdb that has the characters ordered so we can do: That won't work. Real-life collations are too complicated. OK. Also, we mention you should use the C locale to use normal indexes for LIKE but isn't it more correct to say the encoding has to be SQL_ASCII? No, the locale decides the ordering. Oh, OK. -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] Fix for q with psql display paging dumps out of psql
Hi, I'm kind of wondering if anybody on the dev team noticed this and what, if anything, they planned to do with it? Jim [EMAIL PROTECTED] (Jim Seymour) wrote: Hi, Environment: SunOS 5.7 Generic_106541-29 sun4u sparc SUNW,UltraSPARC-IIi-Engine Postgresql-7.4.6 Build config: --with-java --enable-thread-safety gcc version 3.3.1 less-381 readline-4.3 $ echo $PAGER /usr/local/bin/less $ echo $LESS -e I recently upgraded from 7.4.2 to 7.4.6 and ran into an annoying problem. Thereafter, most times, quitting out of paged display output would dump me straight out of psql if the query result was sufficiently large. A bit of debugging revealed that psql receiving a SIGPIPE was the problem. Building pgsql w/o --enable-thread-safety eliminated the problem. It looks like the culprit is asynchronous SIGPIPE signals under Solaris 7 when using thread-safe libraries. Here's a reference: http://www.lambdacs.com/cpt/FAQ.html#Q339. I do not know if Solaris 8 and beyond behave this way. (Apparently async SIGPIPE is not POSIX-compliant, so one hopes Sun has fixed this broken behaviour.) Here's a fix that's simple, effective and doesn't hurt anything else:

-- begin included text --

*** src/bin/psql/print.c-orig	Wed Nov 17 08:04:47 2004
--- src/bin/psql/print.c	Sat Nov 20 10:43:22 2004
***************
*** 1119,1124 ****
--- 1119,1128 ----
  	{
  		pclose(output);
  #ifndef WIN32
+ 		/* The SIG_IGN is to compensate for broken Solaris 7
+ 		 * (async) SIGPIPE handling with --enable-thread-safety
+ 		 */
+ 		pqsignal(SIGPIPE, SIG_IGN);
  		pqsignal(SIGPIPE, SIG_DFL);
  #endif
  	}

-- end included text --

Thanks to Andrew, over at SuperNews, for the idea. Regards, Jim ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
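The two-step handler reset in Jim's patch can be sketched outside C as well. A Python analogue (illustrative only — psql's actual fix uses pqsignal in print.c, and the SIG_IGN step matters only on platforms with the broken asynchronous-SIGPIPE behaviour):

```python
import signal

def reset_sigpipe_after_pager():
    # Briefly ignore SIGPIPE so a pending asynchronous signal from the
    # just-closed pager pipe is discarded (the broken Solaris 7 case),
    # then restore the default disposition for subsequent use.
    signal.signal(signal.SIGPIPE, signal.SIG_IGN)
    signal.signal(signal.SIGPIPE, signal.SIG_DFL)
```

On a conforming platform the extra SIG_IGN assignment is harmless, which is why the patch "doesn't hurt anything else".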
Re: [HACKERS] Error: column nsptablespace does not exist
with the new Beta5 you will receive an error column nsptablespace does not exist on phpPgAdmin and EMS PostgreSQL-Manager. Perhaps there will be some more applications around which are broken now. What is the future in this area? Back to the schema of Beta4, or must all the utilities be ported to the new behaviour? You are using a pre-release version of a database server, and phpPgAdmin's behaviour has had to change _several_ times to track it. Don't expect a pre-release to work in any way. We'll fix up phpPgAdmin CVS sometime this week. We might even do a point release. No other applications will be broken because no other application is crazy enough to worry about displaying the tablespace on a schema just yet. Chris ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [HACKERS] Status of server side Large Object support?
--On Sonntag, November 28, 2004 14:55:29 +0100 Thomas Hallgren [EMAIL PROTECTED] wrote: From what I can see, the current JDBC driver uses the lo_xxx client APIs and they seem to map to the inv_xxx server APIs. Huh, does that mean that libpq's lo_*() API is deprecated, too? That would be bad news. -- Bernd ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] Status of server side Large Object support?
The inv_api large objects are deprecated. CLOBs and BLOBs should be based on text and bytea, respectively. Until bytea is actually useful with large-scale binaries I would say that large objects are far from deprecated. You can't reasonably store large binary data in bytea. Large objects are currently much more useful and easier to work with. Sincerely, Joshua D. Drake -- Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC Postgresql support, programming shared hosting and dedicated hosting. +1-503-667-4564 - [EMAIL PROTECTED] - http://www.commandprompt.com PostgreSQL Replicator -- production quality replication for PostgreSQL ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Stopgap solution for table-size-estimate updating
On Sat, 2004-11-27 at 00:54, Tom Lane wrote: Zeugswetter Andreas DAZ SD [EMAIL PROTECTED] writes: rel-pages = RelationGetNumberOfBlocks(relation); Is RelationGetNumberOfBlocks cheap enough that you can easily use it for the optimizer ? It's basically going to cost one extra lseek() kernel call ... per query, per table referenced in the query. I no longer think this is a killer argument. lseek isn't going to induce any I/O, so it should be cheap as kernel calls go. OK, didn't believe it at first, but it is just one call, since md.c's mdnblocks() is already optimized to avoid O(N) behaviour in terms of kernel calls. Cool. I myself have always preferred more stable estimates that only change when told to. I never liked that vacuum (without analyze) and create index change those values, imho only analyze should. Indeed, this is probably the most significant argument in favor of leaving things as they are. But the other side of the coin is: why shouldn't a bunch of inserts or updates cause the plan to change? The people who are complaining about it have certainly posted plenty of examples where it would help to change the plans. ISTM that you both have good arguments. Andreas' argument is what most people expect to happen, if they come from other commercial RDBMS. This was my starting place, though upon reflection I think Tom's proposal seems to be the way of the future, even if it does seem now to be a more dynamic approach with less direct control for the DBA. If we look at this from the perspective of how many people will post problems about this issue, I'd say this late in the beta cycle there's a big risk that it will cause much grief. However, the issue already does cause much grief to those who don't manage it well (as Richard Huxton points out on a previous thread today). There doesn't seem any benefit in staying where we are today apart for those few who already precisely and accurately control statistics collection. 
Andreas called for a GUC to control that behaviour. Given that the more dynamic behaviour suggested needs to be turned on by default, it does seem reasonable to have a GUC that allows you to turn it off. There may be other side effects discovered later that require more manual control, and it would make sense at that point to have a switch to turn it off if not required. So, I vote in favour of the new dynamic estimation method being added to 8.0, on by default, but with a GUC to turn it off if problems arise. ... enable_dynamic_statistics=true If it holds good, as I'm sure it will, then this can be deprecated later. Many other aspects of statistics collection can occur dynamically also, such as post-execution cardinality statistics. Or perhaps some_default_estimate could itself be a GUC that would turn this feature off when set to 0...? If not, what value is proposed? On the topic of accuracy of the estimate: Updates cause additional data to be written to the table, so tables get bigger until vacuumed. Tables with many Inserts are also regularly trimmed with Deletes. With a relatively static workload and a regular vacuum cycle, the table size for many major tables eventually levels off, remaining roughly constant, but the number of non-zero pages will vary over time in a saw-tooth curve. Estimating the cardinality by using the number of blocks would ignore the fact that many of them are empty for much of the time. That would then lead to a systematic over-estimate of the cardinality of the regularly updated tables. You have to take the estimate from somewhere, but I note that the current practice of using a VACUUM ANALYZE would mean that the statistics would be collected when free space in the table was highest. That estimate would differ from the dynamic method suggested, since this would lead to a calculation equivalent to taking an ANALYZE immediately before a VACUUM, rather than after it. 
How easy would it be to take into account the length of the FSM for the relation also? [...IIRC DB2 has a VOLATILE option at table level, which enables dynamic estimation of statistics.] -- Best Regards, Simon Riggs ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Error: column nsptablespace does not exist
-Original Message- From: [EMAIL PROTECTED] on behalf of Christopher Kings-Lynne Sent: Sun 11/28/2004 2:57 PM To: Roland Volkmann Cc: PostgreSQL Developers Subject: Re: [HACKERS] Error: column nsptablespace does not exist No other applications will be broken because no other application is crazy enough to worry about displaying the tablespace on a schema just yet. Sorry Chris - obviously the pgAdmin team are just a bit crazier than your lot :-) /D ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Status of server side Large Object support?
Thomas Hallgren wrote: Peter Eisentraut wrote: Am Sonntag, 28. November 2004 12:33 schrieb Thomas Hallgren: Hmm, ok. But there's no way to stream them in and out from disk. From what I can see, you have to bring all of it into memory. Not so ideal perhaps if you want to provide streaming media for thousands of users. You can use the substring function to read the pieces you need. Won't the substring function bring the whole thing into memory in the backend before it passes you the piece you need? Let's assume I want to stream 4k at a time of a 40 MB image, that's a whole lot of byte swapping if that's the case. Not if the column is storage type EXTERNAL. See a past discussion here: http://archives.postgresql.org/pgsql-general/2003-07/msg01447.php How do you handle writes without first creating the whole image in memory? You can't currently, but it would be a nice addition ;-) I agree with Peter -- I think effort is better spent improving bytea. BTW, someone on GENERAL just started a very similar thread (implementing a lo_truncate operation, and possibly a lo_size). I've cc'd him here. Joe ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
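The substring-based streaming that Peter and Joe describe amounts to issuing one query per chunk. A sketch of the offset arithmetic (hypothetical helper; SQL's substring(col from start for length) is 1-based):

```python
def chunk_bounds(total_size, chunk_size):
    """Yield 1-based (start, length) pairs for reading a bytea/text
    column piecewise with substring(col from start for length).

    With storage type EXTERNAL (uncompressed), each such read can fetch
    just the needed slice instead of detoasting the whole value.
    """
    start = 1
    while start <= total_size:
        length = min(chunk_size, total_size - start + 1)
        yield start, length
        start += length
```

Streaming a 40 MB value 4 kB at a time would issue roughly 10,000 such substring reads, which is why the EXTERNAL storage optimization matters here.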
Re: [HACKERS] Stopgap solution for table-size-estimate updating problem
Simon Riggs [EMAIL PROTECTED] writes: On the topic of accuracy of the estimate: Updates cause additional data to be written to the table, so tables get bigger until vacuumed. Tables with many Inserts are also regularly trimmed with Deletes. With a relatively static workload and a regular vacuum cycle, the table size for many major tables eventually levels off, remaining roughly constant but the number of non-zero pages will vary over time in a saw-tooth curve. Estimating the cardinality by using the number of blocks would ignore the fact that many of them are empty for much of the time. That would then lead to a systematic over-estimate of the cardinality of the regularly updated tables. You mean underestimate. After a VACUUM, the tuples-per-page figure would be set to a relatively low value, and then subsequent inserts would fill in the free space, causing the actual density to rise while the physical number of blocks stays more or less constant. So the proposed method would always give an accurate number of blocks, but it would tend to underestimate the number of tuples in a dynamic situation. Still, it's better than the current method, which is likely to underestimate both parameters. I believe that having an accurate block count and an underestimated tuple count would tend to favor choosing indexscans over seqscans, which is probably a good thing --- when was the last time you saw someone complaining that the planner had improperly chosen an indexscan over a seqscan? How easy would it be to take into account the length of the FSM for the relation also? Don't think this would help; the FSM doesn't really track number of tuples. Free space isn't a good guide to number of tuples because you can't distinguish inserts from updates at that level. (I'm also a bit concerned about turning the FSM into a source of contention.) 
regards, tom lane ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Fix for q with psql display paging dumps out of psql
[EMAIL PROTECTED] (Jim Seymour) writes: I'm kind of wondering if anybody on the dev team noticed this and what, if anything, they planned to do with it? Can we make it #ifdef SOLARIS7 somehow? I'm uneager to put a performance penalty on every platform because of an admittedly broken signal implementation on one. Also, you didn't explain what changed between 7.4.2 and 7.4.6 to make this necessary when it wasn't before. regards, tom lane ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Error: column nsptablespace does not exist
Hello Christopher, Hello Tom, thank you for your answers. You are using a pre-release version of a database server, and phpPgAdmin's behaviour has had to change _several_ times to track it. Don't expect a pre-release to work in any way. I know that PostgreSQL 8.0 isn't ready for use in a production environment. But it's the first native win32 version, so I don't have any option for my current project, except using another database ;-) By the time my project is ready, I'm sure PostgreSQL will be stable enough. We'll fix up phpPgAdmin CVS sometime this week. We might even do a point release. That's good news - thank you. With best regards, Roland ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Adding a suffix array index
On Fri, 19 Nov 2004 10:35:20 -0500, Tom Lane wrote: 2. Does someone know of interesting documentation (perhaps in the form of interesting code comments) which I should read, as a basis for creating a non-standard index type in PostgreSQL? There's not a whole lot :-( and you should definitely expect to have to read code, not just comments. I have read some code, and have gained some understanding. I think/hope. However: For the suffix array to work, it needs to store positions from the base table (among other pieces of information): A suffix array stores the complete set of suffixes from the indexed string, in sorted order. Storage is in the form of pointers to the indexed string. What kind of (logical) block identifier should I point to in my index? Can I be guaranteed that the block will not move, leaving dangling pointers in the index? Am I right that the answer is related to BlockIdData? If I store BlockIds in an index, do I then have to worry about physical blocks on the disk which might somehow move? -- Greetings from Troels Arvin, Copenhagen, Denmark ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [HACKERS] Adding a suffix array index
Troels Arvin [EMAIL PROTECTED] writes: What kind of (logical) block identifier should I point to in my index? CTID (block # + line #) is the only valid pointer from an index to a table. It doesn't change over the life of an index entry. I think though that you'd be making a serious mistake by not duplicating the suffixes into the index (rather than expecting to retrieve them from the table every time, as you seem to be contemplating). You need to be able to scan the index and identify rows matching a query without making lots of probes into the table. regards, tom lane ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
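Tom's point — keep the suffix keys in the index so a scan never has to probe the heap just to compare — can be illustrated with a toy in-memory suffix array (a sketch of the data structure only, not of an on-disk PostgreSQL index):

```python
import bisect

def build_suffix_array(text):
    # Store each suffix together with its start offset, sorted by the
    # suffix itself -- the key lives in the "index", per Tom's advice,
    # rather than being a bare pointer back into the indexed string.
    return sorted((text[i:], i) for i in range(len(text)))

def find_substring(sa, pattern):
    # All suffixes beginning with `pattern` are contiguous in sorted
    # order, so two binary searches bound the matching range.
    # (The "\uffff" sentinel assumes pattern characters below U+FFFF.)
    lo = bisect.bisect_left(sa, (pattern,))
    hi = bisect.bisect_right(sa, (pattern + "\uffff",))
    return sorted(pos for _, pos in sa[lo:hi])
```

Because matches are contiguous in sorted order, a scan reads consecutive index entries rather than jumping around, and only the final heap visits touch the table.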
[HACKERS] SQL:2003 TODO items
Could I suggest that we re-work the TODO list slightly, with regard to mentioning the ANSI SQL standard? As of last count, we have 14 items listed which would take PostgreSQL into full SQL:2003 Core conformance. Troels Arvin has recently listed some of these as low-hanging fruit in the recent thread about an SQL-conformance-related patch. Some SQL:2003 features are mentioned in other parts of the TODO list, though we don't refer to whether these are Mandatory or Optional (e.g. MERGE is Optional). ISTM that it would be useful to group the Mandatory ones together into a new section of the TODO list: SQL:2003 Core. I'm aware that a number of those are possibly deliberately not supported, in which case we could mention those in brackets or simply leave them off the list etc.. -- Best Regards, Simon Riggs ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Adding a suffix array index
On Sun, 28 Nov 2004 16:52:47 -0500, Tom Lane wrote: CTID (block # + line #) is the only valid pointer from an index to a table. Thanks. I think though that you'd be making a serious mistake by not duplicating the suffixes into the index (rather than expecting to retrieve them from the table every time, as you seem to be contemplating). Yes, I've thought about this, and I may end up doing that. You need to be able to scan the index and identify rows matching a query without making lots of probes into the table. But is it cheaper, I/O-wise, to jump around in an index than to go back and forth between index and tuple blocks? -- Greetings from Troels Arvin, Copenhagen, Denmark ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [HACKERS] SQL:2003 TODO items
Simon Riggs wrote: Could I suggest that we re-work the TODO list slightly, with regard to mentioning the ANSI SQL standard? As of last count, we have 14 items listed which would take PostgreSQL into full SQL:2003 Core conformance. Troels Arvin has recently listed some of these as low hanging fruit in the recent thread about SQL conformance related patch, Some SQL:2003 features are mentioned in other parts of the TODO list, though we don't refer to whether these are Mandatory or Optional (e.g. MERGE is Optional). ISTM that it would be useful to group the Mandatory ones together into a new section of the TODO list: SQL:2003 Core. I'm aware that a number of those are possibly deliberately not supported, in which case, we could mention those in brackets or simply leave them off the list etc.. Yes, we could create a new section or add SQL:2003 identifiers as part of the description for certain items. -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Stopgap solution for table-size-estimate updating
On Sun, 2004-11-28 at 18:52, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: On the topic of accuracy of the estimate: Updates cause additional data to be written to the table, so tables get bigger until vacuumed. Tables with many Inserts are also regularly trimmed with Deletes. With a relatively static workload and a regular vacuum cycle, the table size for many major tables eventually levels off, remaining roughly constant but the number of non-zero pages will vary over time in a saw-tooth curve. Estimating the cardinality by using the number of blocks would ignore the fact that many of them are empty for much of the time. That would then lead to a systematic over-estimate of the cardinality of the regularly updated tables. You mean underestimate. After a VACUUM, the tuples-per-page figure would be set to a relatively low value, and then subsequent inserts would fill in the free space, causing the actual density to rise while the physical number of blocks stays more or less constant. So the proposed method would always give an accurate number of blocks, but it would tend to underestimate the number of tuples in a dynamic situation. OK, just *wrong* then (the estimate, as well as myself) :-) Still, it's better than the current method, which is likely to underestimate both parameters. I believe that having an accurate block count and an underestimated tuple count would tend to favor choosing indexscans over seqscans, which is probably a good thing --- when was the last time you saw someone complaining that the planner had improperly chosen an indexscan over a seqscan? Well, yes, but user perception is not always right. Given we expect an underestimate, can we put in a correction factor should the estimate get really low? It sounds like we could end up choosing nested-loop joins more often when we should have chosen merge joins. That is something that people do regularly complain about (indirectly). 
How easy would it be to take into account the length of the FSM for the relation also? Don't think this would help; the FSM doesn't really track number of tuples. Free space isn't a good guide to number of tuples because you can't distinguish inserts from updates at that level. (I'm also a bit concerned about turning the FSM into a source of contention.) Agreed. -- Best Regards, Simon Riggs ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] [JDBC] Strange server error with current 8.0beta driver
Oliver Jowett [EMAIL PROTECTED] writes: Perhaps PerformCursorOpen should copy the query tree before planning, or plan in a different memory context? Patch attached. It moves query planning inside the new portal's memory context. With this applied I can run Barry's testcase without errors, and valgrind seems OK with it too. I think the better solution is the first way (copy the querytree first). The problem with the way you did it is that all the temporary structures built by the planner will be left behind in the cursor's memory context, and can't be reclaimed until the cursor is destroyed. In the case of a complex query that could represent a pretty serious memory leak. It seems better to eat the cost of copying the querytree an extra time, especially since this way forms a patch that's easy to reverse whenever we fix the planner to be less cavalier about scribbling on its input. I've applied the attached patch instead (and analogously in the 7.4 branch). Would you confirm it fixes the problem you see? regards, tom lane

*** src/backend/commands/portalcmds.c.orig	Thu Sep 16 12:58:28 2004
--- src/backend/commands/portalcmds.c	Sun Nov 28 17:02:22 2004
***************
*** 62,73 ****
  	RequireTransactionChain((void *) stmt, "DECLARE CURSOR");
  
  	/*
  	 * The query has been through parse analysis, but not rewriting or
  	 * planning as yet.  Note that the grammar ensured we have a SELECT
  	 * query, so we are not expecting rule rewriting to do anything
  	 * strange.
  	 */
! 	rewritten = QueryRewrite((Query *) stmt->query);
  	if (list_length(rewritten) != 1 || !IsA(linitial(rewritten), Query))
  		elog(ERROR, "unexpected rewrite result");
  	query = (Query *) linitial(rewritten);
--- 62,82 ----
  	RequireTransactionChain((void *) stmt, "DECLARE CURSOR");
  
  	/*
+ 	 * Because the planner is not cool about not scribbling on its input,
+ 	 * we make a preliminary copy of the source querytree.  This prevents
+ 	 * problems in the case that the DECLARE CURSOR is in a portal and is
+ 	 * executed repeatedly.  XXX the planner really shouldn't modify its
+ 	 * input ... FIXME someday.
+ 	 */
+ 	query = copyObject(stmt->query);
+ 
+ 	/*
  	 * The query has been through parse analysis, but not rewriting or
  	 * planning as yet.  Note that the grammar ensured we have a SELECT
  	 * query, so we are not expecting rule rewriting to do anything
  	 * strange.
  	 */
! 	rewritten = QueryRewrite(query);
  	if (list_length(rewritten) != 1 || !IsA(linitial(rewritten), Query))
  		elog(ERROR, "unexpected rewrite result");
  	query = (Query *) linitial(rewritten);

---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Status of server side Large Object support?
Joe Conway wrote: Thomas Hallgren wrote: Peter Eisentraut wrote: Am Sonntag, 28. November 2004 12:33 schrieb Thomas Hallgren: Hmm, ok. But there's no way to stream them in and out from disk. From what I can see, you have to bring all of it into memory. Not so ideal perhaps if you want to provide streaming media for thousands of users. You can use the substring function to read the pieces you need. Won't the substring function bring the whole thing into memory in the backend before it passes you the piece you need? Let's assume I want to stream 4k at a time of a 40 MB image, that's a whole lot of byte swapping if that's the case. Not if the column is storage type EXTERNAL. See a past discussion here: http://archives.postgresql.org/pgsql-general/2003-07/msg01447.php How do you handle writes without first creating the whole image in memory? You can't currently, but it would be a nice addition ;-) I agree with Peter -- I think effort is better spent improving bytea. BTW, someone on GENERAL just started a very similar thread (implementing a lo_truncate operation, and possibly a lo_size). I've cc'd him here. Seems to me both solutions have things (and people) speaking for them. My goal is to provide a handle to a large amount of data. In Java, this is a java.sql.Blob/Clob and it maps to an SQL locator. This object supports random access so you can do the equivalent of seek, tell, read, write, and truncate. A Blob/Clob object must be valid for the duration of a transaction. Mapping this object to a LargeObjectDesc and getting full functionality seems to be fairly easy. The only thing missing is the truncate, since a size function can use seek with SEEK_END on a temporary descriptor copy. Another fairly significant advantage of using large objects is that the client JDBC driver is using them; I'd like to keep my driver as similar as possible. Mapping a Blob to a bytea appears to be much more difficult, and the result is more limited (half the size). 
I understand from Peter and Joe that they consider bytea to be the future. One thing to really consider for the future is the current 1 GB limit. I'd like to see a future where bytea remains more or less as it is today, i.e. optimal for sizes of 1 GB and less, and LargeObjectDesc is rewritten to use 64-bit quantities for seek, size, and tell, thus allowing much larger objects to be stored.

What is the quality of the large object solution today? Does it have known flaws that nobody cares about since it's discontinued, or is it considered a maintained and worthy part of the overall solution?

Regards, Thomas Hallgren ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] Stopgap solution for table-size-estimate updating problem
Simon Riggs [EMAIL PROTECTED] writes: Given we expect an underestimate, can we put in a correction factor should the estimate get really low...sounds like we could end up choosing nested joins more often when we should have chosen merge joins.

One possibility: vacuum already knows how many tuples it removed. We could set reltuples equal to, say, the mean of the number-of-tuples-after-vacuuming and the number-of-tuples-before. In a steady state situation this would represent a fairly reasonable choice. In cases where the table size has actually decreased permanently, it'd take a few cycles of vacuuming before reltuples converges to the new value, but that doesn't seem too bad.

A standalone ANALYZE should still do what it does now, though, I think; namely set reltuples to its best estimate of the current value.

regards, tom lane ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
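Tom's averaging rule is easy to sketch numerically. The code below is an illustration only; the function name and the row counts are made up, not from the thread:

```python
def updated_reltuples(before_vacuum: float, after_vacuum: float) -> float:
    """Set reltuples to the mean of the tuple counts before and after
    vacuuming, per the proposal: a steady-state table keeps a reasonable
    estimate, while a permanent shrink converges over a few cycles."""
    return (before_vacuum + after_vacuum) / 2.0

# A table that permanently shrinks from 1,000,000 to 100,000 live tuples:
# the estimate halves its error each vacuum cycle instead of jumping.
estimate = 1_000_000.0
history = []
for _ in range(5):
    estimate = updated_reltuples(estimate, 100_000.0)
    history.append(estimate)
# history: [550000.0, 325000.0, 212500.0, 156250.0, 128125.0]
```

Note how the planner's estimate decays smoothly rather than swinging on a single vacuum, which is the "correction factor" behavior Simon asked about.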
Re: [HACKERS] Adding a suffix array index
Troels Arvin [EMAIL PROTECTED] writes: On Sun, 28 Nov 2004 16:52:47 -0500, Tom Lane wrote: You need to be able to scan the index and identify rows matching a query without making lots of probes into the table. But is it cheaper, IO-wise to jump around in an index than to go back and forth between index and tuple blocks? Perhaps not --- but why would you be jumping around? Wouldn't the needed info appear in consecutive locations in the index? Even if that's not the case, the index should be much denser than the table because it's only storing the keys and not the rest of the columns. So I'd expect less net I/O even if the access pattern is just as random. regards, tom lane ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
[HACKERS] Adding Reply-To: listname to Lists configuration ...
What is the general opinion of this? I'd like to implement it, but not so much so that I'm going to beat my head against a brick wall on it ... Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email: [EMAIL PROTECTED] Yahoo!: yscrappy ICQ: 7615664 ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] psql and schemas
On Sat, 2004-11-27 at 23:11 -0500, Bruce Momjian wrote: Is there a TODO here? Or a few? Sure: you could add a TODO item like Improve psql schema behavior, and assign it to me. I'll send in a patch that implements the behavior I proposed for 8.1 -Neil ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] [GENERAL] Adding Reply-To: listname to Lists configuration ...
Marc G. Fournier [EMAIL PROTECTED] writes: What is the general opinion of this? I'd like to implement it, but not so much so that I'm going to beat my head against a brick wall on it ... I think we've discussed this in the past, and the consensus has always been that more people like it as-is than want to change it. I'm certainly in the leave-it-as-is camp. regards, tom lane ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Stopgap solution for table-size-estimate updating
On Sun, 2004-11-28 at 22:35, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: Given we expect an underestimate, can we put in a correction factor should the estimate get really low...sounds like we could end up choosing nested joins more often when we should have chosen merge joins. One possibility: vacuum already knows how many tuples it removed. We could set reltuples equal to, say, the mean of the number-of-tuples- after-vacuuming and the number-of-tuples-before. In a steady state situation this would represent a fairly reasonable choice. In cases where the table size has actually decreased permanently, it'd take a few cycles of vacuuming before reltuples converges to the new value, but that doesn't seem too bad. That sounds good to me. Covers all cases I can see from here. A standalone ANALYZE should still do what it does now, though, I think; namely set reltuples to its best estimate of the current value. A GUC-free solution...but yet manual control is possible. Sounds good to me - and for you Andreas, also? -- Best Regards, Simon Riggs ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] Status of server side Large Object support?
Joe Conway wrote: Not if the column is storage type EXTERNAL. See a past discussion here: http://archives.postgresql.org/pgsql-general/2003-07/msg01447.php

What is the reasoning behind this syntax?

ALTER TABLE [ ONLY ] table [ * ] ALTER [ COLUMN ] column SET STORAGE { PLAIN | EXTERNAL | EXTENDED | MAIN }

I find it nonintuitive and hard to remember. Perhaps something like this would be better (I know, it's probably too late):

ALTER [ COLUMN ] column SET STORAGE { INLINE | EXTERNAL }
ALTER [ COLUMN ] column SET COMPRESSION { YES | NO }

-- dave ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Status of server side Large Object support?
David Garamond [EMAIL PROTECTED] writes: I find it nonintuitive and hard to remember. Perhaps something like this is better (I know, it's probably too late): ALTER [ COLUMN ] column SET STORAGE { INLINE | EXTERNAL } ALTER [ COLUMN ] column SET COMPRESSION { YES | NO } The semantics are not simply two independent variables, however. In particular, IIRC the precedence of different possible actions is such that you couldn't cleanly express it that way. regards, tom lane ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] [pgsql-www] pg_autovacuum is nice ... but ...
Should I add a TODO to warn if FSM values are too small? Is that doable? --- Marc G. Fournier wrote: Moved to -hackers where this belongs :) On Fri, 5 Nov 2004, Justin Clift wrote: Tom Lane wrote: snip Yup. 2 23072, so you're losing some proportion of FSM entries. What's worse, the FSM relation table is maxed out (1000 = 1000) which suggests that there are relations not being tracked at all; you have no idea how much space is getting leaked in those. You can determine the number of relations potentially needing FSM entries by select count(*) from pg_class where relkind in ('r','i','t'); --- sum over all databases in the cluster to get the right result. Once you've fixed max_fsm_relations, do vacuums in all databases, and then vacuum verbose should give you a usable lower bound for max_fsm_pages. Would making max_fsm_relations and max_fsm_pages dynamically update themselves whilst PostgreSQL runs be useful? Sounds like they're the kind of things that many people would receive maximum benefit if PostgreSQL altered these settings as needed itself. I'm not sure if I like this one too much ... but it would be nice if something like this triggered a warning in the logs, maybe a feature of pg_autovacuum itself? Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email: [EMAIL PROTECTED] Yahoo!: yscrappy ICQ: 7615664 ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] unnest
I assume this is not something for our PostgreSQL CVS, even with the later SRF implementation. --- John Hansen wrote:

Attached, an array-to-rows iterator.

select * from unnest(array[1,2,3,4,5]);
 unnest
--------
      1
      2
      3
      4
      5
(5 rows)

The switch statement could probably be done in a different way, but there don't seem to be any good examples of how to return anyitem. If anyone has a better way, please let me know. Does anyone know how to check individual array elements for NULL values? PG_ARGISNULL() seems to return true if ANY array element is null, e.g. array[1,2,3,null,4,5].

Comments / improvements welcome. Kind regards, John

Content-Description: Makefile [ Attachment, skipping... ] Content-Description: unnest.c [ Attachment, skipping... ] Content-Description: unnest.sql [ Attachment, skipping... ] ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED]) -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
[HACKERS] Auto Vacuum
Hi All,

I am doing serious thinking about the implementation of Auto Vacuum as part of the backend, not using libpq but calling internal functions directly. It appears to me that calling internal functions directly is a better implementation than using the external library to do the job. I know I might be stepping on Matthew's toes, but I don't really want to. I am a complete newbie to the postgresql code, however I am trying.

Vacuum appears to be one of the bigger sore points, with administrators having to configure it via scheduled tasks. I have outlined things I have thought about below. I've surely missed a lot, and am open to feedback. Others may like the current tuning used by the external autovacuum; however, having stats that are collected at vacuum and used for later vacuums would mean we don't need the stats collector running.

The major autovacuum issues:
1. Transaction wraparound
2. Vacuum of relations
3. Tracking of when to do vacuums
4. Where to store information needed by auto vacuum

I would suggest the first step is to replicate the wraparound and relation vacuuming from the current code. But I could be wrong. Currently there are a lot of tuning options in the external autovacuum, and the best way to run vacuum internally will need to be thought out.

1. Transaction wraparound

It appears from the code that the best time to do a transaction-wraparound, database-wide vacuum is when the frozen XID has wrapped, TransactionIdPrecedes(myXID, dbform->datfrozenxid). This is probably the simplest vacuum to implement.

2. Vacuuming of relations

Currently, the entire heap must be vacuumed at one time. It would possibly be desirable to have only part of the relation vacuumed at a time, if you can find out which parts of the relation have the most slack space. There is a TODO item regarding tracking recent deletions so the space can be reused; some form of this would be helpful to work out what to vacuum. Performance issues for this type of activity may be a concern, but I have no experience to be able to comment on them, so I welcome yours.

3. Tracking of when to vacuum

The current autovacuum relies on the stats collector running. I would like to only use the stats if they are available, and have an option to be able to vacuum accurately without having stats running. By adding certain histograms on tuples, file size, and slack space, we can gauge the time between vacuums: the number of tuples will show the inserts as opposed to updates; file size will show that the file is growing, and by how much between vacuums; slack space will show the deleted/updated records.

A new GUC and relation option would need to be implemented to give a target slack space in a file. This will help to reduce growth in relations if vacuum happens not to run frequently enough. This information can also inform autovacuum that it should be vacuuming more frequently. The number would be a percentage, e.g. 10% of the total file size is allowed to be unused. Also, ALTER TABLE would allow users to set levels of slackness for each relation. If the table gets too much more than the target slack space, a partial vacuum full could be run to reduce the size of the table by moving tuples at the end of the table to slack space near the beginning and shortening the table length. It would require a full table lock, but you may be able to space it out, to only do a page at a time.

/* target percentage of slack space */
vacuum_default_target_slack_space = 10

ALTER TABLE SET TARGET SLACK SPACE = 10;

4. Where to store information required by auto vacuum

Auto vacuum needs somewhere to store the information it needs about current and future vacuuming. I am unsure of where to put this. It appears as if it will have a number of fields. I feel like pg_class is the best place to put the information, but again I have no idea.

That's the best I can do for now. I can clarify things further if required.

Regards, Russell.
---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
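Russell's slack-space trigger boils down to a percentage check. The sketch below is hypothetical (the function name and units are invented); it just restates the proposed rule that a relation is vacuumed once unused space exceeds the configured target, here defaulting to the 10% used in the mail's vacuum_default_target_slack_space example:

```python
def needs_vacuum(file_size_pages: int, slack_pages: int,
                 target_slack_pct: float = 10.0) -> bool:
    """Trigger a (partial) vacuum once unused space exceeds the target
    percentage of the relation's file size."""
    if file_size_pages <= 0:
        return False
    return 100.0 * slack_pages / file_size_pages > target_slack_pct

# 150 slack pages out of 1000 (15%) exceeds the default 10% target,
# so this relation would be scheduled for vacuuming.
```

A per-relation override, as in the proposed ALTER TABLE SET TARGET SLACK SPACE, would simply supply a different target_slack_pct for that table.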
Re: [HACKERS] Documentation on PITR still scarce
Is this a TODO? --- Greg Stark wrote: Tom Lane [EMAIL PROTECTED] writes: I suppose it might be useful to have some kind of suspended-animation behavior where you could bring up a backend and look at the database in a strict read-only fashion, not really executing transactions at all, just to see what you had. Then you could end the recovery and go to normal operations, or allow the recovery to proceed further if you decided this wasn't where you wanted to be yet. However, that would require a great deal of mechanism we haven't got (yet). In particular there is no such thing as strict read-only examination of the database.

That would be a great thing to have one day, for other reasons aside from the ability to test out a recovered database. It makes warm standby databases much more useful. A warm standby is when you keep a second machine constantly up to date by applying the archived PITR logs as soon as they come off your server. You're ready to switch over at the drop of a hat and don't have to go through the whole recovery process; you just switch the database from recovery mode to active mode and make it your primary database. But until then the backup hardware languishes, completely useless. Oracle has long had a feature whereby you can actually open the standby database in a strict read-only mode and run queries. This is great for a data warehouse situation where you want to run long batch jobs against recent data.

-- greg ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] Status of server side Large Object support?
Thomas Hallgren [EMAIL PROTECTED] writes: What is the quality of the large object solution today. Does it have known flaws that nobody cares about since it's discontinued or is it considered a maintained and worthy part of the overall solution?

More the former than the latter, I think, at least in the minds of the usual suspects for backend work. The main problem I'd see with the idea of supporting over-2GB LOs is that we store all LOs in a database in the same table (pg_largeobject) and so you would run into the table size limit (around 16TB IIRC) with not an amazingly large number of such LOs. We used to store each LO in its own table but that was not better, as a few thousand LOs could easily bring the filesystem to its knees (on platforms where the directory lookup mechanism doesn't scale to huge numbers of entries in a single directory). I don't think there'd be any point in upgrading the LO support to 64 bits without some rethinking of the underlying storage structure.

A generic issue with LOs is the extreme pain involved in dump/reload; not only the difficulty of transporting the LOs themselves, but that of updating references to them from the database. Vacuuming no-longer-referenced LOs is a serious problem too. If LOs were considered a first-class feature then I'd want to see more interest in dealing with those problems.

Lesser issues with LOs are protection (there isn't any), user-accessible locking (there isn't any), MVCC (there isn't any). The latter has been on the to-do list since http://archives.postgresql.org/pgsql-hackers/2002-05/msg00875.php I think it could actually be fixed now without too much pain because there is a mechanism for finding out the surrounding query's snapshot, which functions could not do before 8.0.

regards, tom lane ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
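Tom's capacity concern is easy to quantify. Taking the two figures from his mail at face value (a roughly 16 TB table-size limit and the 2 GB per-object ceiling) and ignoring pg_largeobject's per-chunk row and page overhead, the upper bound on maximum-size LOs per database is indeed not amazingly large:

```python
TABLE_SIZE_LIMIT = 16 * 2**40   # ~16 TB table-size limit quoted in the mail
MAX_LO_SIZE = 2 * 2**30         # 2 GB per large object (32-bit offsets)

# Upper bound only: storage overhead in pg_largeobject would reduce
# this further in practice.
max_full_size_los = TABLE_SIZE_LIMIT // MAX_LO_SIZE
# max_full_size_los == 8192
```

So a database could hold at most on the order of eight thousand maximum-size large objects before hitting the shared-table limit, which is why 64-bit LO support would need the storage structure rethought first.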
Re: [HACKERS] Auto Vacuum
I have added an auto-vacuum TODO item:

* Auto-vacuum
  o Move into the backend code
  o Scan the buffer cache to find free space or use background writer
  o Use free-space map information to guide refilling

--- Russell Smith wrote: [full auto-vacuum proposal quoted; snipped]

-- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] Auto Vacuum
Matthew T. O'Connor looked at this fairly closely leading up to the 8.0 feature freeze. There was a long discussion earlier this year with respect to libpq vs. using backend functions directly to vacuum multiple databases. http://archives.postgresql.org/pgsql-hackers/2004-03/msg00931.php This should be mandatory reading before proceeding. Also, you want to talk to Matthew and review his patch (submitted early July, I think).

Thanks, Gavin

On Sun, 28 Nov 2004, Bruce Momjian wrote: I have added an auto-vacuum TODO item:

* Auto-vacuum
  o Move into the backend code
  o Scan the buffer cache to find free space or use background writer
  o Use free-space map information to guide refilling

--- Russell Smith wrote: [full auto-vacuum proposal quoted again; snipped]
Re: [HACKERS] Adding Reply-To: listname to Lists configuration ...
Marc G. Fournier [EMAIL PROTECTED] wrote: What is the general opinion of this? I'd like to implement it, but not so much so that I'm going to beat my head against a brick wall on it ... The procmail rules I set up for each mailing list to which I subscribe sets Reply-To to the mailing list *unless* there's already a Reply-To set. If you're going to do it, that's what I'd recommend doing. Jim ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Adding Reply-To: listname to Lists configuration ...
Are you saying this is going to make it impossible for me to reply just to the poster, or is this an option that is set by the user via majordomo? --- Jim Seymour wrote: Marc G. Fournier [EMAIL PROTECTED] wrote: What is the general opinion of this? I'd like to implement it, but not so much so that I'm going to beat my head against a brick wall on it ... The procmail rules I set up for each mailing list to which I subscribe sets Reply-To to the mailing list *unless* there's already a Reply-To set. If you're going to do it, that's what I'd recommend doing. Jim ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] Adding Reply-To: listname to Lists configuration
On Sun, 28 Nov 2004, Bruce Momjian wrote: Are you saying this is going to make it impossible for me to reply just to the poster, or is this an option that is set by the user via majordomo?

No, the poster will still be included as part of the headers ... what happens, at least under Pine, is that I am prompted whether I want to honor the reply-to; if I hit 'y', then the other headers *are* strip'd and the mail is sent right back to the list ...

Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email: [EMAIL PROTECTED] Yahoo!: yscrappy ICQ: 7615664 ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [HACKERS] multiline CSV fields
OK, what solutions do we have for this? Not being able to load dumped data is a serious bug. I have added this to the open items list:

* fix COPY CSV with \r,\n in data

My feeling is that if we are in a quoted string we should just process whatever characters we find, even passing through an EOL. I realize it might not mark missing-quote errors well, but that seems minor compared to not loading valid data.

--- Andrew Dunstan wrote: This example should fail on data line 2 or 3 on any platform, regardless of the platform's line-end convention, although I haven't tested on Windows.

cheers andrew

[EMAIL PROTECTED] inst]$ bin/psql -e -f csverr.sql ; od -c /tmp/csverrtest.csv
create table csverrtest (a int, b text, c int);
CREATE TABLE
insert into csverrtest values(1,'a',1);
INSERT 122471 1
insert into csverrtest values(2,'foo\r\nbar',2);
INSERT 122472 1
insert into csverrtest values(3,'baz\nblurfl',3);
INSERT 122473 1
insert into csverrtest values(4,'d',4);
INSERT 122474 1
insert into csverrtest values(5,'e',5);
INSERT 122475 1
copy csverrtest to '/tmp/csverrtest.csv' csv;
COPY
truncate csverrtest;
TRUNCATE TABLE
copy csverrtest from '/tmp/csverrtest.csv' csv;
psql:cvserr.sql:9: ERROR: literal carriage return found in data
HINT: Use \r to represent carriage return.
CONTEXT: COPY csverrtest, line 2: 2,foo
drop table csverrtest;
DROP TABLE
000   1   ,   a   ,   1  \n   2   ,   f   o   o  \r  \n   b   a
020   r   ,   2  \n   3   ,   b   a   z  \n   b   l   u   r
040   f   l   ,   3  \n   4   ,   d   ,   4  \n   5   ,   e   ,
060   5  \n
062
[EMAIL PROTECTED] inst]$

Bruce Momjian wrote: Can I see an example of such a failure line? --- Andrew Dunstan wrote: Darcy Buskermolen has drawn my attention to unfortunate behaviour of COPY CSV with fields containing embedded line end chars if the embedded sequence isn't the same as those of the file containing the CSV data. In that case we error out when reading the data in. This means there are cases where we can produce a CSV data file which we can't read in, which is not at all pleasant.
Possible approaches to the problem: . make it a documented limitation . have a csv read mode for backend/commands/copy.c:CopyReadLine() that relaxes some of the restrictions on inconsistent line endings . escape embedded line end chars The last really isn't an option, because the whole point of CSVs is to play with other programs, and my understanding is that those that understand multiline fields (e.g. Excel) expect them not to be escaped, and do not produce them escaped. So right now I'm tossing up in my head between the first two options. Or maybe there's another solution I haven't thought of. Thoughts? cheers andrew ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED] -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
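The convention Andrew describes (multiline fields carried unescaped inside quotes, as Excel produces and expects) is exactly what a CSV-aware reader has to support: on hitting a line end inside an open quote, keep reading instead of erroring out. Python's csv module, used here purely for illustration, behaves that way:

```python
import csv
import io

# Fields with embedded \r\n and \n, like rows 2 and 3 in Andrew's test.
rows = [["1", "a"], ["2", "foo\r\nbar"], ["3", "baz\nblurfl"]]

buf = io.StringIO()
csv.writer(buf).writerows(rows)   # multiline fields come out quoted, unescaped

# The reader scans past line ends while inside a quoted field, so the
# data round-trips -- the relaxed behavior proposed for COPY CSV input.
parsed = list(csv.reader(io.StringIO(buf.getvalue())))
assert parsed == rows
```

This is the second option on Andrew's list: the reader relaxes line-ending restrictions only while inside a quoted field, leaving line-end detection elsewhere untouched.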
Re: [HACKERS] Adding Reply-To: listname to Lists configuration ...
Marc G. Fournier wrote:

> On Sun, 28 Nov 2004, Bruce Momjian wrote:
>
> > Are you saying this is going to make it impossible for me to reply
> > just to the poster, or is this an option that is set by the user via
> > majordomo?
>
> No, the poster will still be included as part of the headers ... what
> happens, at least under Pine, is that I am prompted whether I want to
> honor the reply-to; if I hit 'y', then the other headers *are* strip'd
> and the mail is sent right back to the list ...

I think my mail reader, 'elm', will always honor the reply-to, which is
bad, I think.

--
  Bruce Momjian                    |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.    |  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 6: Have you searched our list archives?

               http://archives.postgresql.org
Re: [GENERAL] [HACKERS] Adding Reply-To: listname to Lists
Marc G. Fournier [EMAIL PROTECTED] writes:

> No, the poster will still be included as part of the headers ... what
> happens, at least under Pine, is that I am prompted whether I want to
> honor the reply-to; if I hit 'y', then the other headers *are* strip'd
> and the mail is sent right back to the list ...

I'm in the "Reply-To considered harmful" camp.  I also don't see any
real evidence that the current setup is causing problems.

-Doug

---(end of broadcast)---
TIP 8: explain analyze is your friend
Re: [HACKERS] multiline CSV fields
Bruce Momjian [EMAIL PROTECTED] writes:

> OK, what solutions do we have for this?  Not being able to load dumped
> data is a serious bug.

Which we do not have, because pg_dump doesn't use CSV.  I do not think
this is a must-fix, especially not if the proposed fix introduces
inconsistencies elsewhere.

			regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] -V, --version -- deprecated?
On Wed, 2004-11-24 at 20:25 -0500, Bruce Momjian wrote:

> FreeBSD had a problem with double-dash args but I thought that related
> to getopt, and I can't remember how that fits in.  Maybe we defined
> '-' in getopt and said it took an argument and tested for '-help' and
> '-verbose', but now we just check argv right inside main.  I can't
> remember totally.

ISTM that port/getopt_long.c ought to allow long options to work even
if the platform doesn't provide a getopt_long() itself.

BTW, pg_dump's -X ... options seem weird.  Why is the -X prefix
necessary?  ISTM pg_dump would be more consistent with standard
command-line tools if we just provided the long options (such as
--disable-triggers and so on) and did away with the -X prefixes.

I'd like to propose these changes:

(1) remove documentation for -V, declare it deprecated.  I don't see
    any reason to actually remove it, but this should at least make the
    current status quo more consistent.

(2) add documentation for --help and --version flags, where appropriate

(3) remove documentation for pg_dump's -X ... flags, just document the
    --long-option variant.  Again, I don't see a need to remove support
    for the -X options, but we should declare them deprecated.

Comments?

> Agreed.  psql --help certainly looks inconsistent --- only --help and
> --version are long.

Well, perhaps, but I don't think that's a problem (there is no reason
that _every_ command-line flag needs to have both long and short
options).

-Neil

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
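The behaviour Neil describes, long options standing on their own rather
than hiding behind an -X prefix, is what GNU-style getopt_long parsing
gives you. A small sketch using Python's getopt module as a stand-in
for port/getopt_long.c (the pg_dump-style argument list is only
illustrative):

```python
import getopt

# Hypothetical pg_dump-style invocation: the long option is accepted
# directly, no "-X disable-triggers" indirection required.
argv = ["--disable-triggers", "-f", "out.sql", "mydb"]

# gnu_getopt handles short options ("f:" takes an argument) and long
# options in one pass, and collects remaining positional arguments.
opts, args = getopt.gnu_getopt(argv, "f:",
                               ["disable-triggers", "help", "version"])
print(opts)  # [('--disable-triggers', ''), ('-f', 'out.sql')]
print(args)  # ['mydb']
```

Shipping a fallback getopt_long() in port/ gives this same behaviour on
platforms whose libc only provides plain getopt().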
[HACKERS] bug fix request
I would like to beg for some sort of fix to psql's handling of COPY
data if the COPY fails.  Presently you get a million lines of '\N
command not recognised' and various other random things because if a
line of the copy fails due to say a FK constraint, or even if the COPY
is run in an aborted transaction, it tries to execute all the stdin
data as actual statements.  It's really, really, really annoying.

Also, sometimes when you copy and paste SQL into a psql window, it
executes help on commands for each line, although it doesn't affect the
paste.  That is also really annoying.  I'll add to this email when it
happens to me again, cos I tried a few pastes and couldn't reproduce
it :/

Chris

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

               http://www.postgresql.org/docs/faqs/FAQ.html
Re: [HACKERS] multiline CSV fields
Tom Lane wrote:

> Bruce Momjian [EMAIL PROTECTED] writes:
> > OK, what solutions do we have for this?  Not being able to load
> > dumped data is a serious bug.
>
> Which we do not have, because pg_dump doesn't use CSV.  I do not think
> this is a must-fix, especially not if the proposed fix introduces
> inconsistencies elsewhere.

Sure, pg_dump doesn't use it but COPY should be able to load anything
it output.  Can this be fixed if we ignore the problem with reporting
errors?

--
  Bruce Momjian                    |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.    |  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings
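The round-trip property Bruce is asking for, that anything the writer
emits the reader can load back, is what general-purpose CSV
implementations provide even when a field's embedded line ending
differs from the file's own. A sketch with Python's csv module (not
PostgreSQL's COPY code; the row values mirror Andrew's example):

```python
import csv
import io

row = ["2", "foo\r\nbar", "2"]  # field with an embedded CRLF

# Write with \n record terminators, so the file's own line ending
# differs from the CRLF embedded in the field.
buf = io.StringIO(newline="")
csv.writer(buf, lineterminator="\n").writerow(row)
print(repr(buf.getvalue()))  # '2,"foo\r\nbar",2\n'

# Reading it back recovers the original row; the embedded CRLF inside
# the quotes is data, not a record boundary.
back = next(csv.reader(io.StringIO(buf.getvalue(), newline="")))
assert back == row
```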
Re: [HACKERS] bug fix request
Christopher Kings-Lynne wrote:

> Also, sometimes when you copy and paste SQL into a psql window, it
> executes help on commands for each line, although it doesn't affect
> the paste.  That is also really annoying.  I'll add to this email when
> it happens to me again, cos I tried a few pastes and couldn't
> reproduce it :/
>
> Chris

That last one is probably due to tabs in the pasted text.  Happens to
me all the time...

Tim

--
---------------
Tim Allen           [EMAIL PROTECTED]
Proximity Pty Ltd   http://www.proximity.com.au/
                    http://www4.tpg.com.au/users/rita_tim/

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] [GENERAL] Adding Reply-To: listname to Lists configuration ...
Marc G. Fournier wrote:

> What is the general opinion of this?  I'd like to implement it, but
> not so much so that I'm going to beat my head against a brick wall on
> it ...

Please, please, please, please don't.

The choice of the reply path lies with the author or the replier, not
with an intermediate party.  Both of the former two parties have
adequate technical means to achieve their preferred choice
automatically [sender: Mail-Followup-To; receiver: reply-to functions
in the mail client].  Writing "please Cc me, I'm not subscribed" is not
one of them.

We have many people writing to the lists while not being subscribed.
Consider people writing to pgsql-bugs and not getting replies.  I've
had that happen to me too many times in other forums.  Also, I *want*
to be in the recipient list of replies to my posts, so it's easier to
find these posts.

We've done quite well with the current setup, so I don't see a need to
tinker with it.  I've always found the Reply-To-enabled lists I'm on to
be a more lossy medium.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
    (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [GENERAL] [HACKERS] Adding Reply-To: listname to Lists
Doug McNaught said:

> Marc G. Fournier [EMAIL PROTECTED] writes:
> > No, the poster will still be included as part of the headers ...
> > what happens, at least under Pine, is that I am prompted whether I
> > want to honor the reply-to; if I hit 'y', then the other headers
> > *are* strip'd and the mail is sent right back to the list ...
>
> I'm in the "Reply-To considered harmful" camp.  I also don't see any
> real evidence that the current setup is causing problems.

And the historical document referred to can be found here:

http://www.unicom.com/pw/reply-to-harmful.html

and an opposing view here:

http://www.metasystema.net/essays/reply-to.mhtml

cheers

andrew

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
    (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [HACKERS] bug fix request
Christopher Kings-Lynne [EMAIL PROTECTED] writes:

> I would like to beg for some sort of fix to psql's handling of COPY
> data if the COPY fails.  Presently you get a million lines of '\N
> command not recognised' and various other random things because if a
> line of the copy fails due to say a FK constraint, or even if the COPY
> is run in an aborted transaction, it tries to execute all the stdin
> data as actual statements.

I'd like to see a test case for this in v3 protocol.  It was definitely
a problem before that, but I thought I fixed it.

> Also, sometimes when you copy and paste SQL into a psql window, it
> executes help on commands for each line, although it doesn't affect
> the paste.  That is also really annoying.

Disable tab completion, or don't paste tabs.  I don't think psql can be
expected to recognize that a tab is coming from pasted input.

			regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] multiline CSV fields
Bruce Momjian said:

> Tom Lane wrote:
> > Bruce Momjian [EMAIL PROTECTED] writes:
> > > OK, what solutions do we have for this?  Not being able to load
> > > dumped data is a serious bug.
> >
> > Which we do not have, because pg_dump doesn't use CSV.  I do not
> > think this is a must-fix, especially not if the proposed fix
> > introduces inconsistencies elsewhere.
>
> Sure, pg_dump doesn't use it but COPY should be able to load anything
> it output.  Can this be fixed if we ignore the problem with reporting
> errors?

When I looked at it I could not see any simple fix that was not worse
than the symptom.  If the asymmetry offends you, then we could do as
Tom suggested and rip out the multiline processing completely for now.
Personally I would regard that as a pity, as it would disallow many
cases that will work quite happily as we are, and because this is a
feature that was requested when we did this work.

The limitation has been documented - my inclination would be to revisit
this during the 8.1 dev cycle.

cheers

andrew

---(end of broadcast)---
TIP 8: explain analyze is your friend
Re: [HACKERS] bug fix request
> > Presently you get a million lines of '\N command not recognised' and
> > various other random things because if a line of the copy fails due
> > to say a FK constraint, or even if the COPY is run in an aborted
> > transaction, it tries to execute all the stdin data as actual
> > statements.
>
> I'd like to see a test case for this in v3 protocol.  It was
> definitely a problem before that, but I thought I fixed it.

This is with 7.4 and it still does it.  Here is an example.

Make a file called test.sql:

BEGIN;
CREATE TABLE blah (a int4);
COPY blah FROM STDIN;
1
2
3
\.
COMMIT;

Now, execute it TWICE on a clean database:

-bash-2.05b$ psql -f test.sql test
BEGIN
CREATE TABLE
COMMIT
-bash-2.05b$ psql -f test.sql test
BEGIN
psql:test.sql:2: ERROR:  relation "blah" already exists
psql:test.sql:3: ERROR:  current transaction is aborted, commands
ignored until end of transaction block
psql:test.sql:7: invalid command \.
psql:test.sql:8: ERROR:  syntax error at or near "1" at character 1
-bash-2.05b$

> > Also, sometimes when you copy and paste SQL into a psql window, it
> > executes help on commands for each line, although it doesn't affect
> > the paste.  That is also really annoying.
>
> Disable tab completion, or don't paste tabs.  I don't think psql can
> be expected to recognize that a tab is coming from pasted input.

How about if it's in a literal, eg. a function definition?

Chris

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

               http://www.postgresql.org/docs/faqs/FAQ.html
Re: [HACKERS] [GENERAL] Adding Reply-To: listname to Lists configuration ...
Peter Eisentraut [EMAIL PROTECTED] writes:

> We've done quite well with the current setup, so I don't see a need to
> tinker with it.  I've always found the Reply-To-enabled lists I'm on
> to be a more lossy medium.

The basic issue is that the current setup encourages
reply-to-author-and-list, while adding Reply-To encourages
reply-to-list-only (at least when the replier is using one of the mail
clients I'm used to).

Peter correctly notes that reply-to-list-only creates problems for
authors who aren't subscribed.  The other point that looms large for me
is that reply-to-list-only forces every conversation to occur just at
the speed and reliability of the list 'bot.  Without wishing to tread
on anyone's toes, it's undeniable that we have a long history of slow
and unreliable forwarding through the PG list server.  I'd rather have
contributors to a thread converse among themselves, and let the list
server catch up when it can.

Personally: if Reply-To is added to the list headers, I can and will
reprogram my mail software to ignore it.  But I doubt that most
contributors to the lists have that option.

			regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
    subscribe-nomail command to [EMAIL PROTECTED] so that your message
    can get through to the mailing list cleanly
Re: [HACKERS] bug fix request
Christopher Kings-Lynne [EMAIL PROTECTED] writes:

> > > Presently you get a million lines of '\N command not recognised'
> > > and various other random things because if a line of the copy
> > > fails due to say a FK constraint, or even if the COPY is run in an
> > > aborted transaction, it tries to execute all the stdin data as
> > > actual statements.
> >
> > I'd like to see a test case for this in v3 protocol.  It was
> > definitely a problem before that, but I thought I fixed it.
>
> This is with 7.4 and it still does it.  Here is an example:

Hmm.  This error is not coming from a line of the copy, it is occurring
because the COPY command itself fails, and so the server never tells
psql to shift into COPY mode.  I'm not sure that a reasonable fix for
this is possible.

As a counterexample, if you misspelled COPY as COPZ, would you expect
the software to decide that following lines up to \. should be ignored?
If you manually misentered a COPY command and got an error, would you
be surprised to have psql ignore everything you typed until you typed
\. ?  (I can bet we'd get bug reports about that.)

			regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

               http://archives.postgresql.org
Re: [HACKERS] bug fix request
> Hmm.  This error is not coming from a line of the copy, it is
> occurring because the COPY command itself fails, and so the server
> never tells psql to shift into COPY mode.  I'm not sure that a
> reasonable fix for this is possible.
>
> As a counterexample, if you misspelled COPY as COPZ, would you expect
> the software to decide that following lines up to \. should be
> ignored?  If you manually misentered a COPY command and got an error,
> would you be surprised to have psql ignore everything you typed until
> you typed \. ?  (I can bet we'd get bug reports about that.)

Hmmm... doesn't stop it being annoying, however.  I presumed I was
replicating the same problem I get when running SQL scripts that insert
a few million rows.  Basically I start it running, then maybe some
command before the COPY fails, then it gets to the COPY anyway and
starts barfing millions of lines.  Then I have to change my terminal
settings to record heaps of lines and then try to ctrl-C the query
before it scrolls too far off, just to find out the line that caused
the error.

I guess the kind of difference in this case to me is that it's in a
transaction, and the only error that the COPY command causes is that
it's running in a rolled-back transaction.

Low tech I know... but it's kind of annoying :)

Chris

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

               http://www.postgresql.org/docs/faqs/FAQ.html