Re: [HACKERS] Non-C locale and LIKE

2004-11-28 Thread Tatsuo Ishii
> I know we can't currently use an index with non-C locales and LIKE
> except when we create a special type of index for LIKE indexing
> (text_pattern_ops).
> 
> However, I am wondering if we should create a character lookup during
> initdb that has the characters ordered so we can do:
> 
>   col LIKE 'ha%' AND col >= 'ha' AND col < 'hb'
> 
> Could we do this easily for single-character encodings?  We could have:
> 
>   A   1
>   B   2
>   C   3
> 
> and a non-C locale could be:
> 
>   A   1
>   A`  2
>   B   3
> 
> We can't handle multi-byte encodings because the number of combinations
> is too large or not known.
> 
> Also, we mention you should use the "C" locale to use normal indexes for
> LIKE but isn't it more correct to say the encoding has to be SQL_ASCII?

Why? "C" locale works well for multibyte encodings such as EUC-JP too.
--
Tatsuo Ishii

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Fix for NLS in pgport

2004-11-28 Thread Peter Eisentraut
Bruce Momjian wrote:
> I saw Peter's commit to allow NLS lookups from libpgport functions.
> Here is the change to pg_ctl/nls.mk:
>
>  < GETTEXT_FILES := pg_ctl.c
>
>  > GETTEXT_FILES := pg_ctl.c ../../port/exec.c
>
> Peter, do you have to know the C file used by pg_ctl to make these
> adjustments?

Yes.

> This seems pretty hard to do and maintain.

True, but it's only a workaround that gets the job done without moving 
around too much code at this point.  Clearly, this will have to be 
revisited later.

> Do your  
> tools do checks to make sure all the nls.mk files are properly
> modified?  Is there a cleaner way to do this like putting all the
> strings in single C file and including that somehow?

I don't think that will make it substantially better.  In fact, putting 
everything in one file would create more useless work for translators.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] Non-C locale and LIKE

2004-11-28 Thread Peter Eisentraut
Bruce Momjian wrote:
> However, I am wondering if we should create a character lookup during
> initdb that has the characters ordered so we can do:

That won't work.  Real-life collations are too complicated.

> Also, we mention you should use the "C" locale to use normal indexes
> for LIKE but isn't it more correct to say the encoding has to be
> SQL_ASCII?

No, the locale decides the ordering.
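
For reference, the text_pattern_ops workaround mentioned in the original mail
looks roughly like this (table and column names made up):

  -- a plain btree index on col is not usable for LIKE under a non-C locale,
  -- but an index built with the pattern operator class is:
  CREATE INDEX test_col_like_idx ON test (col text_pattern_ops);
  SELECT * FROM test WHERE col LIKE 'ha%';  -- can now use the index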

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


[HACKERS] Status of server side Large Object support?

2004-11-28 Thread Thomas Hallgren
Hi,
I'm in the phase of implementing CLOB's and BLOB's in PL/Java. I found 
the inv_api.c and will use that as the base for my implementation. I 
lack two calls:

int inv_length(LargeObjectDesc* lo);
void inv_truncate(LargeObjectDesc* lo, int position);
Searching the archives I found some questions concerning this, but there 
seems to be no resolution at this point. Is this correct? If so I might 
want to give it a try. I didn't find the topic on the TODO list though. 
Shouldn't it be there?

Has anyone done some work on this already?
Regards,
Thomas Hallgren
---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [HACKERS] Status of server side Large Object support?

2004-11-28 Thread Peter Eisentraut
On Sunday, 28 November 2004 10:22, Thomas Hallgren wrote:
> I'm in the phase of implementing CLOB's and BLOB's in PL/Java. I found
> the inv_api.c and will use that as the base for my implementation.

The "inv_api" large objects are deprecated.  CLOBs and BLOBs should be based 
on text and bytea, respectively.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] Status of server side Large Object support?

2004-11-28 Thread Thomas Hallgren
Peter Eisentraut wrote:
On Sunday, 28 November 2004 10:22, Thomas Hallgren wrote:
 

I'm in the phase of implementing CLOB's and BLOB's in PL/Java. I found
the inv_api.c and will use that as the base for my implementation.
   

The "inv_api" large objects are deprecated.  CLOBs and BLOBs should be based 
on text and bytea, respectively.
 

Hmm, ok. But there's no way to stream them in and out from disk. From 
what I can see, you have to bring all of it into memory. Not so ideal 
perhaps if you want to provide streaming media for thousands of users.

Hopefully I'm missing something.
Regards,
Thomas Hallgren

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] Status of server side Large Object support?

2004-11-28 Thread Tatsuo Ishii
> On Sunday, 28 November 2004 10:22, Thomas Hallgren wrote:
> > I'm in the phase of implementing CLOB's and BLOB's in PL/Java. I found
> > the inv_api.c and will use that as the base for my implementation.
> 
> The "inv_api" large objects are deprecated.  CLOBs and BLOBs should be based 
> on text and bytea, respectively.

How do you solve the problem that text and bytea require a large amount of
RAM?
--
Tatsuo Ishii

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] Status of server side Large Object support?

2004-11-28 Thread Peter Eisentraut
On Sunday, 28 November 2004 12:33, Thomas Hallgren wrote:
> Hmm, ok. But there's no way to stream them in and out from disk. From
> what I can see, you have to bring all of it into memory. Not so ideal
> perhaps if you want to provide streaming media for thousands of users.

You can use the substring function to read the pieces you need.
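
For example, roughly (assuming a hypothetical table "images" with a bytea
column "data"):

  -- fetch a 4 kB slice starting at byte offset 1000001 (substring is 1-based)
  SELECT substring(data FROM 1000001 FOR 4096) FROM images WHERE id = 42;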

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] Status of server side Large Object support?

2004-11-28 Thread Thomas Hallgren
Peter Eisentraut wrote:
On Sunday, 28 November 2004 12:33, Thomas Hallgren wrote:
Hmm, ok. But there's no way to stream them in and out from disk. From
what I can see, you have to bring all of it into memory. Not so ideal
perhaps if you want to provide streaming media for thousands of users.

You can use the substring function to read the pieces you need.
Won't the substring function bring the whole thing into memory in the 
backend before it passes you the piece you need? Let's assume I want to 
stream 4k at a time of a 40mb image, that's a whole lot of byte swapping 
if that's the case.

How do you handle writes without first creating the whole image in memory?
From what I can see, the current JDBC driver uses the lo_* client 
APIs, and they seem to map to the inv_* server APIs.

Regards,
Thomas Hallgren

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] Non-C locale and LIKE

2004-11-28 Thread Bruce Momjian
Peter Eisentraut wrote:
> Bruce Momjian wrote:
> > However, I am wondering if we should create a character lookup during
> > initdb that has the characters ordered so we can do:
> 
> That won't work.  Real-life collations are too complicated.

OK.

> > Also, we mention you should use the "C" locale to use normal indexes
> > for LIKE but isn't it more correct to say the encoding has to be
> > SQL_ASCII?
> 
> No, the locale decides the ordering.

Oh, OK.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Fix for "q" with psql display paging dumps out of psql

2004-11-28 Thread Jim Seymour
Hi,

I'm kind of wondering if anybody on the dev team noticed this and
what, if anything, they planned to do with it?

Jim

[EMAIL PROTECTED] (Jim Seymour) wrote:
> 
> 
> Hi,
> 
> Environment:
> 
> SunOS 5.7 Generic_106541-29 sun4u sparc SUNW,UltraSPARC-IIi-Engine
> Postgresql-7.4.6
> Build config: --with-java --enable-thread-safety
> gcc version 3.3.1
> less-381
> readline-4.3
> 
> $ echo $PAGER
> /usr/local/bin/less
> $ echo $LESS
> -e
> 
> I recently upgraded from 7.4.2 to 7.4.6 and ran into an annoying
> problem.  Thereafter, most times, "q"uitting out of paged display
> output would dump me straight out of psql if the query result was
> sufficiently large.
> 
> A bit of debugging revealed psql receiving a SIGPIPE to be the
> problem.
> 
> Building pgsql w/o --enable-thread-safety eliminated the problem.
> 
> It looks like the culprit is asynchronous SIGPIPE signals under Solaris
> 7 when using thread-safe libraries.  Here's a reference:
> .
> 
> I do not know if Solaris 8 and beyond behave this way.  (Apparently
> async SIGPIPE is not POSIX-compliant, so one hopes Sun has fixed this
> broken behaviour.)
> 
> Here's a fix that's simple, effective and doesn't hurt anything else:
> 
> - begin included text --
> *** src/bin/psql/print.c-orig   Wed Nov 17 08:04:47 2004
> --- src/bin/psql/print.cSat Nov 20 10:43:22 2004
> ***
> *** 1119,1124 
> --- 1119,1128 
> {
> pclose(output);
>   #ifndef WIN32
> +   /* The SIG_IGN is to compensate for broken Solaris 7
> +* (async) SIGPIPE handling with --enable-thread-safety
> +*/
> +   pqsignal(SIGPIPE, SIG_IGN);
> pqsignal(SIGPIPE, SIG_DFL);
>   #endif
> }
> -- end included text ---
> 
> Thanks to Andrew, over at SuperNews, for the idea.
> 
> Regards,
> Jim
> 
> ---(end of broadcast)---
> TIP 9: the planner will ignore your desire to choose an index scan if your
>   joining column's datatypes do not match
> 

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] Error: column "nsptablespace" does not exist

2004-11-28 Thread Christopher Kings-Lynne
with the new Beta5 you will receive an error "column "nsptablespace" 
does not exist" in phpPgAdmin and EMS PostgreSQL-Manager. Perhaps there 
will be some more applications around which are broken now.

What is the future in this area? Back to the schema of Beta4, or must all 
the utilities be ported to the new behaviour?
You are using a pre-release version of a database server, and 
phpPgAdmin's behaviour has had to change _several_ times to track it.

Don't expect a pre-release to work in any way.
We'll fix up phpPgAdmin CVS sometime this week.  We might even do a 
point release.

No other applications will be broken because no other application is 
crazy enough to worry about displaying the tablespace on a schema just yet.

Chris
---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] Status of server side Large Object support?

2004-11-28 Thread Bernd Helmle
--On Sunday, November 28, 2004 14:55:29 +0100 Thomas Hallgren 
<[EMAIL PROTECTED]> wrote:

From what I can see, the current JDBC driver uses the lo_* client
APIs, and they seem to map to the inv_* server APIs.
Huh, does that mean that libpq's lo_*() API is deprecated, too? That would 
be bad news.



--
 Bernd
---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Status of server side Large Object support?

2004-11-28 Thread Joshua D. Drake

The "inv_api" large objects are deprecated.  CLOBs and BLOBs should be based 
on text and bytea, respectively.
 

Until bytea is actually useful with large scale binaries I would
say that large objects are far from deprecated. You can't reasonably
store large binary data in bytea.
Large objects are currently much more useful and easier to work with.
Sincerely,
Joshua D. Drake


--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - [EMAIL PROTECTED] - http://www.commandprompt.com
PostgreSQL Replicator -- production quality replication for PostgreSQL


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] Stopgap solution for table-size-estimate updating

2004-11-28 Thread Simon Riggs
On Sat, 2004-11-27 at 00:54, Tom Lane wrote:
> "Zeugswetter Andreas DAZ SD" <[EMAIL PROTECTED]> writes:
> >> rel->pages = RelationGetNumberOfBlocks(relation);
> 
> > Is RelationGetNumberOfBlocks cheap enough that you can easily use it for the
> > optimizer ?
> 
> It's basically going to cost one extra lseek() kernel call ... per
> query, per table referenced in the query.  I no longer think this is
> a killer argument.  lseek isn't going to induce any I/O, so it should
> be cheap as kernel calls go.
> 

OK, didn't believe it at first, but it is just one call, since md.c's
mdnblocks() is already optimized to avoid O(N) behaviour in terms of
kernel calls. Cool.

> > I myself have always preferred more stable estimates that only change
> > when told to. I never liked that vacuum (without analyze) and create index
> > change those values, imho only analyze should.
> 
> Indeed, this is probably the most significant argument in favor of
> leaving things as they are.  But the other side of the coin is: why
> shouldn't a bunch of inserts or updates cause the plan to change?
> The people who are complaining about it have certainly posted plenty
> of examples where it would help to change the plans.
> 

ISTM that you both have good arguments. Andreas' argument is what most
people expect to happen, if they come from other commercial RDBMS. This
was my starting place, though upon reflection I think Tom's proposal
seems to be the way of the future, even if it does seem now to be a more
dynamic approach with less direct control for the DBA.

If we look at this from the perspective of how many people will post
problems about this issue, I'd say this late in the beta cycle there's a
big risk that it will cause much grief. However, the issue already does
cause much grief to those who don't manage it well (as Richard Huxton
points out on a previous thread today). There doesn't seem to be any benefit
in staying where we are today, apart from those few who already precisely
and accurately control statistics collection.

Andreas called for a GUC to control that behaviour. Given that the more
dynamic behaviour suggested needs to be turned on by default, it does
seem reasonable to have a GUC that allows you to turn it off. There may
be other side effects discovered later that require more manual control
and it would make sense at that point to have a switch to turn it off if
not required. 

So, I vote in favour of the new dynamic estimation method to be added to
8.0, on by default, but with a GUC to turn off if problems arise. 

... enable_dynamic_statistics=true

If it holds good, as I'm sure it will, then this can be deprecated
later. Many other aspects of statistics collection can occur dynamically
also, such as post-execution cardinality statistics.

Or perhaps some_default_estimate could itself be a GUC that would turn
this feature off when set to 0...? If not, what value is
proposed?


On the topic of accuracy of the estimate: Updates cause additional data
to be written to the table, so tables get bigger until vacuumed. Tables
with many Inserts are also regularly trimmed with Deletes. With a
relatively static workload and a regular vacuum cycle, the table size
for many major tables eventually levels off, remaining roughly constant
but the number of non-zero pages will vary over time in a saw-tooth
curve. Estimating the cardinality by using the number of blocks would
ignore the fact that many of them are empty for much of the time. That
would then lead to a systematic over-estimate of the cardinality of the
regularly updated tables. You have to take the estimate from somewhere,
but I note that current practice of using a VACUUM ANALYZE would mean
that the statistics would be collected when free space in the table was
highest. That estimate would differ from the dynamic method suggested
since this would lead to a calculation equivalent to taking an
ANALYZE immediately before a VACUUM, rather than after it. 
How easy would it be to take into account the length of the FSM for the
relation also?

[...IIRC DB2 has a VOLATILE option at table level, which enables dynamic
estimation of statistics.]

-- 
Best Regards, Simon Riggs


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] Error: column "nsptablespace" does not exist

2004-11-28 Thread Dave Page



-Original Message-
From: [EMAIL PROTECTED] on behalf of Christopher Kings-Lynne
Sent: Sun 11/28/2004 2:57 PM
To: Roland Volkmann
Cc: PostgreSQL Developers
Subject: Re: [HACKERS] Error: column "nsptablespace" does not exist
 
> No other applications will be broken because no other application is 
> crazy enough to worry about displaying the tablespace on a schema just yet.

Sorry Chris - obviously the pgAdmin team are just a bit crazier than your lot 
:-)

/D


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] Status of server side Large Object support?

2004-11-28 Thread Joe Conway
Thomas Hallgren wrote:
Peter Eisentraut wrote:
On Sunday, 28 November 2004 12:33, Thomas Hallgren wrote:
Hmm, ok. But there's no way to stream them in and out from disk. From
what I can see, you have to bring all of it into memory. Not so ideal
perhaps if you want to provide streaming media for thousands of users.
You can use the substring function to read the pieces you need.
Won't the substring function bring the whole thing into memory in the 
backend before it passes you the piece you need? Let's assume I want to 
stream 4k at a time of a 40mb image, that's a whole lot of byte swapping 
if that's the case.
Not if the column is storage type EXTERNAL. See a past discussion here:
http://archives.postgresql.org/pgsql-general/2003-07/msg01447.php
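
Something along these lines, with made-up table and column names:

  -- EXTERNAL keeps the value out of line in the TOAST table but uncompressed,
  -- so substring() can fetch just the slice it needs
  ALTER TABLE images ALTER COLUMN data SET STORAGE EXTERNAL;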

How do you handle writes without first creating the whole image in memory?
You can't currently, but it would be a nice addition ;-)
I agree with Peter -- I think effort is better spent improving bytea.
BTW, someone on GENERAL just started a very similar thread 
("implementing a lo_truncate operation, and possibly a lo_size"). I've 
cc'd him here.

Joe
---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Stopgap solution for table-size-estimate updating problem

2004-11-28 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes:
> On the topic of accuracy of the estimate: Updates cause additional data
> to be written to the table, so tables get bigger until vacuumed. Tables
> with many Inserts are also regularly trimmed with Deletes. With a
> relatively static workload and a regular vacuum cycle, the table size
> for many major tables eventually levels off, remaining roughly constant
> but the number of non-zero pages will vary over time in a saw-tooth
> curve. Estimating the cardinality by using the number of blocks would
> ignore the fact that many of them are empty for much of the time. That
> would then lead to a systematic over-estimate of the cardinality of the
> regularly updated tables.

You mean underestimate.  After a VACUUM, the tuples-per-page figure
would be set to a relatively low value, and then subsequent inserts
would fill in the free space, causing the actual density to rise
while the physical number of blocks stays more or less constant.
So the proposed method would always give an accurate number of blocks,
but it would tend to underestimate the number of tuples in a dynamic
situation.

Still, it's better than the current method, which is likely to
underestimate both parameters.  I believe that having an accurate block
count and an underestimated tuple count would tend to favor choosing
indexscans over seqscans, which is probably a good thing --- when was
the last time you saw someone complaining that the planner had
improperly chosen an indexscan over a seqscan?

> How easy would it be to take into account the length of the FSM for the
> relation also?

Don't think this would help; the FSM doesn't really track number of
tuples.  Free space isn't a good guide to number of tuples because you
can't distinguish inserts from updates at that level.  (I'm also a bit
concerned about turning the FSM into a source of contention.)

regards, tom lane

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] Fix for "q" with psql display paging dumps out of psql

2004-11-28 Thread Tom Lane
[EMAIL PROTECTED] (Jim Seymour) writes:
> I'm kind of wondering if anybody on the dev team noticed this and
> what, if anything, they planned to do with it?

Can we make it "#ifdef SOLARIS7" somehow?  I'm uneager to put a
performance penalty on every platform because of an admittedly
broken signal implementation on one.

Also, you didn't explain what changed between 7.4.2 and 7.4.6 to
make this necessary when it wasn't before.

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Error: column "nsptablespace" does not exist

2004-11-28 Thread Roland Volkmann
Hello Christopher, Hello Tom,
thank you for your answers.

You are using a pre-release version of a database server, and 
phpPgAdmin's behaviour has had to change _several_ times to track it.

Don't expect a pre-release to work in any way.
I know that PostgreSQL 8.0 isn't ready for use in a production 
environment. But it's the first native win32 version, so I don't have 
any option for my current project, except using another database ;-)
By the time my project is ready, I'm sure PostgreSQL will be stable enough.


We'll fix up phpPgAdmin CVS sometime this week.  We might even do a 
point release.
that's good news - thank you.
With best regards,
Roland
---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


Re: [HACKERS] Adding a suffix array index

2004-11-28 Thread Troels Arvin
On Fri, 19 Nov 2004 10:35:20 -0500, Tom Lane wrote:

>> 2. Does someone know of interesting documentation (perhaps
>>in the form of interesting code comments) which I should
>>read, as a basis for creating a non-standard index type
>>in PostgreSQL?
> 
> There's not a whole lot :-( and you should definitely expect to have to
> read code, not just comments.

I have read some code, and have gained some understanding. I think/hope.
However:

For the suffix array to work, it needs to store positions from the base
table (among other pieces of information): A suffix array stores the
complete set of suffixes from the indexed string, in sorted order. Storage
is in the form of pointers to the indexed string.

What kind of (logical) block identifier should I point to in my index? Can
I be guaranteed that the block will not move, leaving dangling pointers in
the index?

Am I right that the answer is related to BlockIdData? If I store
BlockIds in an index, do I then have to worry about physical blocks on
the disk which might somehow move?

-- 
Greetings from Troels Arvin, Copenhagen, Denmark



---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] Adding a suffix array index

2004-11-28 Thread Tom Lane
Troels Arvin <[EMAIL PROTECTED]> writes:
> What kind of (logical) block identifier should I point to in my index?

CTID (block # + line #) is the only valid pointer from an index to a
table.  It doesn't change over the life of an index entry.  I think
though that you'd be making a serious mistake by not duplicating the
suffixes into the index (rather than expecting to retrieve them from the
table every time, as you seem to be contemplating).  You need to be able
to scan the index and identify rows matching a query without making lots
of probes into the table.

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


[HACKERS] SQL:2003 TODO items

2004-11-28 Thread Simon Riggs

Could I suggest that we re-work the TODO list slightly, with regard to
mentioning the ANSI SQL standard?

As of last count, we have 14 items listed which would take PostgreSQL
into full SQL:2003 Core conformance. Troels Arvin has recently listed
some of these as "low hanging fruit" in the recent thread about an SQL
conformance related patch.

Some SQL:2003 features are mentioned in other parts of the TODO list,
though we don't refer to whether these are Mandatory or Optional (e.g.
MERGE is Optional). ISTM that it would be useful to group the Mandatory
ones together into a new section of the TODO list: "SQL:2003 Core".

I'm aware that a number of those are possibly deliberately not
supported, in which case, we could mention those in brackets or simply
leave them off the list, etc.

-- 
Best Regards, Simon Riggs


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] Adding a suffix array index

2004-11-28 Thread Troels Arvin
On Sun, 28 Nov 2004 16:52:47 -0500, Tom Lane wrote:

> CTID (block # + line #) is the only valid pointer from an index to a
> table.

Thanks.

> I think
> though that you'd be making a serious mistake by not duplicating the
> suffixes into the index (rather than expecting to retrieve them from the
> table every time, as you seem to be contemplating).

Yes, I've thought about this, and I may end up doing that.

> You need to be able
> to scan the index and identify rows matching a query without making lots
> of probes into the table.

But is it cheaper, I/O-wise, to "jump" around in an index than to go back
and forth between index and tuple blocks?

-- 
Greetings from Troels Arvin, Copenhagen, Denmark



---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] SQL:2003 TODO items

2004-11-28 Thread Bruce Momjian
Simon Riggs wrote:
> 
> Could I suggest that we re-work the TODO list slightly, with regard to
> mentioning the ANSI SQL standard?
> 
> As of last count, we have 14 items listed which would take PostgreSQL
> into full SQL:2003 Core conformance. Troels Arvin has recently listed
> some of these as "low hanging fruit" in the recent thread about an SQL
> conformance related patch.
> 
> Some SQL:2003 features are mentioned in other parts of the TODO list,
> though we don't refer to whether these are Mandatory or Optional (e.g.
> MERGE is Optional). ISTM that it would be useful to group the Mandatory
> ones together into a new section of the TODO list: "SQL:2003 Core".
> 
> I'm aware that a number of those are possibly deliberately not
> supported, in which case, we could mention those in brackets or simply
> leave them off the list etc..

Yes, we could create a new section or add SQL:2003 identifiers as part
of the description for certain items.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Stopgap solution for table-size-estimate updating

2004-11-28 Thread Simon Riggs
On Sun, 2004-11-28 at 18:52, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > On the topic of accuracy of the estimate: Updates cause additional data
> > to be written to the table, so tables get bigger until vacuumed. Tables
> > with many Inserts are also regularly trimmed with Deletes. With a
> > relatively static workload and a regular vacuum cycle, the table size
> > for many major tables eventually levels off, remaining roughly constant
> > but the number of non-zero pages will vary over time in a saw-tooth
> > curve. Estimating the cardinality by using the number of blocks would
> > ignore the fact that many of them are empty for much of the time. That
> > would then lead to a systematic over-estimate of the cardinality of the
> > regularly updated tables.
> 
> You mean underestimate.  After a VACUUM, the tuples-per-page figure
> would be set to a relatively low value, and then subsequent inserts
> would fill in the free space, causing the actual density to rise
> while the physical number of blocks stays more or less constant.
> So the proposed method would always give an accurate number of blocks,
> but it would tend to underestimate the number of tuples in a dynamic
> situation.

OK, just *wrong* then (the estimate, as well as myself) :-)

> Still, it's better than the current method, which is likely to
> underestimate both parameters.  I believe that having an accurate block
> count and an underestimated tuple count would tend to favor choosing
> indexscans over seqscans, which is probably a good thing --- when was
> the last time you saw someone complaining that the planner had
> improperly chosen an indexscan over a seqscan?

Well, yes, but user perception is not always right.

Given we expect an underestimate, can we put in a correction factor
should the estimate get really low...sounds like we could end up
choosing nested joins more often when we should have chosen merge joins.
That is something that people do regularly complain about (indirectly).

> > How easy would it be to take into account the length of the FSM for the
> > relation also?
> 
> Don't think this would help; the FSM doesn't really track number of
> tuples.  Free space isn't a good guide to number of tuples because you
> can't distinguish inserts from updates at that level.  (I'm also a bit
> concerned about turning the FSM into a source of contention.)

Agreed.

-- 
Best Regards, Simon Riggs


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] [JDBC] Strange server error with current 8.0beta driver

2004-11-28 Thread Tom Lane
Oliver Jowett <[EMAIL PROTECTED]> writes:
>> Perhaps PerformCursorOpen should copy the query tree before planning, or 
>> plan in a different memory context?

> Patch attached. It moves query planning inside the new portal's memory 
> context. With this applied I can run Barry's testcase without errors, 
> and valgrind seems OK with it too.

I think the better solution is the first way (copy the querytree first).
The problem with the way you did it is that all the temporary structures
built by the planner will be left behind in the cursor's memory context,
and can't be reclaimed until the cursor is destroyed.  In the case of a
complex query that could represent a pretty serious memory leak.  It
seems better to eat the cost of copying the querytree an extra time,
especially since this way forms a patch that's easy to reverse whenever
we fix the planner to be less cavalier about scribbling on its input.

I've applied the attached patch instead (and analogously in 7.4 branch).
Would you confirm it fixes the problem you see?

regards, tom lane

*** src/backend/commands/portalcmds.c.orig  Thu Sep 16 12:58:28 2004
--- src/backend/commands/portalcmds.c   Sun Nov 28 17:02:22 2004
***
*** 62,73 
RequireTransactionChain((void *) stmt, "DECLARE CURSOR");
  
/*
 * The query has been through parse analysis, but not rewriting or
 * planning as yet.  Note that the grammar ensured we have a SELECT
 * query, so we are not expecting rule rewriting to do anything
 * strange.
 */
!   rewritten = QueryRewrite((Query *) stmt->query);
if (list_length(rewritten) != 1 || !IsA(linitial(rewritten), Query))
elog(ERROR, "unexpected rewrite result");
query = (Query *) linitial(rewritten);
--- 62,82 
RequireTransactionChain((void *) stmt, "DECLARE CURSOR");
  
/*
+* Because the planner is not cool about not scribbling on its input,
+* we make a preliminary copy of the source querytree.  This prevents
+* problems in the case that the DECLARE CURSOR is in a portal and is
+* executed repeatedly.  XXX the planner really shouldn't modify its
+* input ... FIXME someday.
+*/
+   query = copyObject(stmt->query);
+ 
+   /*
 * The query has been through parse analysis, but not rewriting or
 * planning as yet.  Note that the grammar ensured we have a SELECT
 * query, so we are not expecting rule rewriting to do anything
 * strange.
 */
!   rewritten = QueryRewrite(query);
if (list_length(rewritten) != 1 || !IsA(linitial(rewritten), Query))
elog(ERROR, "unexpected rewrite result");
query = (Query *) linitial(rewritten);

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] Status of server side Large Object support?

2004-11-28 Thread Thomas Hallgren
Joe Conway wrote:
Thomas Hallgren wrote:
Peter Eisentraut wrote:
On Sunday, 28 November 2004 12:33, Thomas Hallgren wrote:
Hmm, ok. But there's no way to stream them in and out from disk. From
what I can see, you have to bring all of it into memory. Not so ideal
perhaps if you want to provide streaming media for thousands of users.

You can use the substring function to read the pieces you need.
Won't the substring function bring the whole thing into memory in the 
backend before it passes you the piece you need? Let's assume I want to 
stream 4k at a time of a 40mb image, that's a whole lot of byte 
swapping if that's the case.

Not if the column is storage type EXTERNAL. See a past discussion here:
http://archives.postgresql.org/pgsql-general/2003-07/msg01447.php

How do you handle writes without first creating the whole image in 
memory?

You can't currently, but it would be a nice addition ;-)
I agree with Peter -- I think effort is better spent improving bytea.
BTW, someone on GENERAL just started a very similar thread 
("implementing a lo_truncate operation, and possibly a lo_size"). I've 
cc'd him here.

Seems to me both solutions have things (and people) speaking for them.
My goal is to provide a handle to a large amount of data. In Java, this 
is a java.sql.Blob/Clob and it maps to an SQL locator. This object 
supports random access so you can do the equivalent of seek, tell, read, 
write, and truncate. A Blob/Clob object must be valid for the duration 
of a transaction.

Mapping this object to a LargeObjectDesc and getting full functionality 
seems to be fairly easy. The only thing missing is the "truncate", since 
a "size" function can use seek with SEEK_END on a temporary descriptor copy.

Another fairly significant advantage of using large objects is that the 
client JDBC driver is already using them. I'd like to keep my driver as 
similar as possible.

Mapping a Blob to a bytea appears to be much more difficult and the 
result is more limited (half the size).

I understand from Peter and Joe that they consider bytea to be the 
future. One thing to really consider for the future is the current 1 GB 
limit. I'd like to see a future where bytea remains more or less as it 
is today, i.e. optimal for sizes 1 GB and less, and LargeObjectDesc is 
rewritten to use 64 bit quantities for seek, size, and tell, thus 
allowing for much larger objects to be stored.

What is the quality of the large object solution today? Does it have 
known flaws that nobody cares about since it's discontinued, or is it 
considered a maintained and worthy part of the overall solution?

Regards,
Thomas Hallgren

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [HACKERS] Stopgap solution for table-size-estimate updating problem

2004-11-28 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes:
> Given we expect an underestimate, can we put in a correction factor
> should the estimate get really low...sounds like we could end up
> choosing nested joins more often when we should have chosen merge joins.

One possibility: vacuum already knows how many tuples it removed.  We
could set reltuples equal to, say, the mean of the number-of-tuples-
after-vacuuming and the number-of-tuples-before.  In a steady state
situation this would represent a fairly reasonable choice.  In cases
where the table size has actually decreased permanently, it'd take a few
cycles of vacuuming before reltuples converges to the new value, but that
doesn't seem too bad.

A standalone ANALYZE should still do what it does now, though, I think;
namely set reltuples to its best estimate of the current value.

regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Adding a suffix array index

2004-11-28 Thread Tom Lane
Troels Arvin <[EMAIL PROTECTED]> writes:
> On Sun, 28 Nov 2004 16:52:47 -0500, Tom Lane wrote:
>> You need to be able
>> to scan the index and identify rows matching a query without making lots
>> of probes into the table.

> But is it cheaper, I/O-wise, to "jump" around in an index than to go back
> and forth between index and tuple blocks?

Perhaps not --- but why would you be "jumping around"?  Wouldn't the
needed info appear in consecutive locations in the index?

Even if that's not the case, the index should be much denser than the
table because it's only storing the keys and not the rest of the
columns.  So I'd expect less net I/O even if the access pattern is just
as random.

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


[HACKERS] Adding Reply-To: to Lists configuration ...

2004-11-28 Thread Marc G. Fournier
What is the general opinion of this?  I'd like to implement it, but not so 
much so that I'm going to beat my head against a brick wall on it ...


Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664
---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


Re: [HACKERS] psql and schemas

2004-11-28 Thread Neil Conway
On Sat, 2004-11-27 at 23:11 -0500, Bruce Momjian wrote:
> Is there a TODO here?  Or a few?

Sure: you could add a TODO item like "Improve psql schema behavior", and
assign it to me. I'll send in a patch that implements the behavior I
proposed for 8.1.

-Neil



---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] [GENERAL] Adding Reply-To: to Lists configuration ...

2004-11-28 Thread Tom Lane
"Marc G. Fournier" <[EMAIL PROTECTED]> writes:
> What is the general opinion of this?  I'd like to implement it, but not so 
> much so that I'm going to beat my head against a brick wall on it ...

I think we've discussed this in the past, and the consensus has always
been that more people like it as-is than want to change it.  I'm
certainly in the leave-it-as-is camp.

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] Stopgap solution for table-size-estimate updating

2004-11-28 Thread Simon Riggs
On Sun, 2004-11-28 at 22:35, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > Given we expect an underestimate, can we put in a correction factor
> > should the estimate get really low...sounds like we could end up
> > choosing nested joins more often when we should have chosen merge joins.
> 
> One possibility: vacuum already knows how many tuples it removed.  We
> could set reltuples equal to, say, the mean of the number-of-tuples-
> after-vacuuming and the number-of-tuples-before.  In a steady state
> situation this would represent a fairly reasonable choice.  In cases
> where the table size has actually decreased permanently, it'd take a few
> cycles of vacuuming before reltuples converges to the new value, but that
> doesn't seem too bad.

That sounds good to me.  Covers all cases I can see from here.

> A standalone ANALYZE should still do what it does now, though, I think;
> namely set reltuples to its best estimate of the current value.

A GUC-free solution... yet manual control is still possible. Sounds good to
me - and to you too, Andreas?

-- 
Best Regards, Simon Riggs


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [HACKERS] Status of server side Large Object support?

2004-11-28 Thread David Garamond
Joe Conway wrote:
Not if the column is storage type EXTERNAL. See a past discussion here:
http://archives.postgresql.org/pgsql-general/2003-07/msg01447.php
What is the reasoning behind this syntax?
 ALTER TABLE [ ONLY ] table [ * ]
 ALTER [ COLUMN ] column SET STORAGE
 { PLAIN | EXTERNAL | EXTENDED | MAIN }
I find it nonintuitive and hard to remember. Perhaps something like this 
is better (I know, it's probably too late):

 ALTER [ COLUMN ] column SET STORAGE { INLINE | EXTERNAL }
 ALTER [ COLUMN ] column SET COMPRESSION { YES | NO }
--
dave
---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] Status of server side Large Object support?

2004-11-28 Thread Tom Lane
David Garamond <[EMAIL PROTECTED]> writes:
> I find it nonintuitive and hard to remember. Perhaps something like this 
> is better (I know, it's probably too late):

>   ALTER [ COLUMN ] column SET STORAGE { INLINE | EXTERNAL }
>   ALTER [ COLUMN ] column SET COMPRESSION { YES | NO }

The semantics are not simply two independent variables, however.
In particular, IIRC the precedence of different possible actions
is such that you couldn't cleanly express it that way.
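
For reference, the four modes behave roughly like this:

  -- PLAIN    : keep the value inline, never compress, never move out of line
  -- EXTENDED : allow both compression and out-of-line storage (the usual default)
  -- EXTERNAL : allow out-of-line storage, but no compression
  -- MAIN     : allow compression, move out of line only as a last resort
  ALTER TABLE t ALTER COLUMN col SET STORAGE MAIN;  -- made-up table/column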

regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] [pgsql-www] pg_autovacuum is nice ... but ...

2004-11-28 Thread Bruce Momjian

Should I add a TODO to warn if FSM values are too small?  Is that doable?

---

Marc G. Fournier wrote:
> 
> Moved to -hackers where this belongs :)
> 
> On Fri, 5 Nov 2004, Justin Clift wrote:
> 
> > Tom Lane wrote:
> > 
> >> Yup. 2 < 23072, so you're losing some proportion of FSM entries.
> >> What's worse, the FSM relation table is maxed out (1000 = 1000) which
> >> suggests that there are relations not being tracked at all; you have
> >> no idea how much space is getting leaked in those.
> >> 
> >> You can determine the number of relations potentially needing FSM
> >> entries by
> >>select count(*) from pg_class where relkind in ('r','i','t');
> >> --- sum over all databases in the cluster to get the right result.
> >> 
> >> Once you've fixed max_fsm_relations, do vacuums in all databases, and
> >> then vacuum verbose should give you a usable lower bound for
> >> max_fsm_pages.
> >
> > Would making max_fsm_relations and max_fsm_pages dynamically update 
> > themselves whilst PostgreSQL runs be useful?  Sounds like they're the 
> > kind of things from which many people would receive maximum benefit if 
> > PostgreSQL altered these settings as needed itself.
> 
> I'm not sure if I like this one too much ... but it would be nice if 
> something like this triggered a warning in the logs, maybe a feature of 
> pg_autovacuum itself?
> 
> 
> Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
> Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664
> 
> ---(end of broadcast)---
> TIP 6: Have you searched our list archives?
> 
>http://archives.postgresql.org
> 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] unnest

2004-11-28 Thread Bruce Momjian

I assume this is not something for our PostgreSQL CVS, even the later
SRF implementation.
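
For reference, a rough SQL-level equivalent for one-dimensional integer
arrays (function name made up) could look like:

  CREATE OR REPLACE FUNCTION unnest_int(int[]) RETURNS SETOF int AS '
      SELECT ($1)[i]
      FROM generate_series(array_lower($1, 1), array_upper($1, 1)) AS g(i)
  ' LANGUAGE sql IMMUTABLE;

  SELECT * FROM unnest_int(ARRAY[1,2,3,4,5]);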

---

John Hansen wrote:
> Attached, array -> rows iterator.
> 
> select * from unnest(array[1,2,3,4,5]);
> 
> Unnest
> ---
>  1
>  2
>  3
>  4
>  5
> 5 rows
> 
> The switch statement could probably be done in a different way, but
> there doesn't seem to be any good examples of how to return anyitem. If
> anyone have a better way, please let me know.
> 
> Does anyone know how to check individual array elements for NULL values?
> PG_ARGISNULL() seems to return true if ANY array element is null; e.g.
> array[1,2,3,null,4,5]
> 
> Comments / improvements welcome.
> 
> Kind regards,
> 
> John
> 

Content-Description: Makefile

[ Attachment, skipping... ]

Content-Description: unnest.c

[ Attachment, skipping... ]

Content-Description: unnest.sql

[ Attachment, skipping... ]

> 
> ---(end of broadcast)---
> TIP 2: you can get off all lists at once with the unregister command
> (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html



[HACKERS] Auto Vacuum

2004-11-28 Thread Russell Smith
Hi All,

I am doing serious thinking about the implementation of Auto Vacuum as part of 
the backend, not using libpq, but calling internal functions directly.
It appears to me that calling internal functions directly is a better 
implementation than using the external library to do the job.

I know I might be stepping on Matthew's toes, but I don't really want to.  I am 
a complete newbie to the postgresql code, however I am trying.
Vacuum appears to be one of the bigger sore points, with administrators having to 
configure it via scheduled tasks.

I have outlined things I have thought about below.  I've surely missed a lot, 
and am open to feedback.  Others may like the current tuning used
by the external autovacuum, however to have stats that are collected at vacuum, 
and used for later vacuum would mean we don't need the
stats collector running.

The major autovacuum issues 

1. Transaction Wraparound
2. Vacuum of relations
3. Tracking of when to do vacuums
4. Where to store information needed by auto vacuum

I would suggest the first step is to replicate the wraparound and relation 
vacuuming from the current code first. But I could be wrong.
Currently there are a lot of tuning options in the external autovacuum, and the 
best way to run vacuum internally will need to be thought out.

1. Transaction Wraparound

It appears from the code, that the best time to do a transaction wraparound db 
wide vacuum is when the
frozenXID has wrapped, TransactionIdPrecedes(myXID, dbform->datfrozenxid).  And 
is probably the most
simple vacuum to implement.


2. Vacuuming of relations

Currently, the entire heap must be vacuumed at one time.  It would possibly be 
desirable to have only part of the relation vacuumed at
a time.  If you can find out which parts of the relation have the most slack 
space.  There is a todo item regarding tracking recent deletions
so they can be reused.  Some form of this would be helpful to work out what to 
vacuum.  Performance issues for this type of activity 
may be a concern.  But I have no experience to be able to make comment on them. 
 So I welcome yours.

3. Tracking of when to vacuum

The current autovacuum relies on the stats collector running.  I would like to 
only use the stats if they are available,
and have an option to be able to vacuum accurately without having to have stats 
running.

By adding certain histograms on tuples, filesize and slack space we can gauge 
the time between vacuums

number of tuples will show the inserts as opposed to updates.
file size will show that the file is growing and by how much between vacuums.
slack space will show the delete/updated records.

A new GUC and relation option would need to be implemented to give a target 
slack space in a file.
This will help to reduce growth in relations if vacuum happens to not run 
frequently enough.  This information
can also inform autovacuum that it should be vacuuming more frequently.  The 
number would be a percentage,
e.g. 10% of the total file size is allowed to be unused.  Also, ALTER TABLE would 
allow users to set levels of slackness
for each relation.  If the table gets too much more than the target slack 
space, a "partial" vacuum full could be run
to reduce the size of the table by moving tuples at the end of the table to 
slack space near the beginning and 
shortening the table length.  It would require a full table lock, but you may 
be able to space it out, to only do a page
at a time.

/* target percentage of slack space */
vacuum_default_target_slack_space = 10

ALTER TABLE SET TARGET SLACK SPACE = 10;

4. Where to store information required by auto vacuum.

Auto vacuum needs somewhere to store the information it needs about current and 
future vacuuming.  I am unsure of where
to put this.  It appears as if it will have a number of fields.  I feel like 
pg_class is the best place to put the information, but 
again I have no idea.

That's the best I can do for now.  I can clarify things further if required.

Regards

Russell.

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Documentation on PITR still scarce

2004-11-28 Thread Bruce Momjian

Is this a TODO?

---

Greg Stark wrote:
> 
> Tom Lane <[EMAIL PROTECTED]> writes:
> 
> > I suppose it might be useful to have some kind of "suspended animation"
> > behavior where you could bring up a backend and look at the database in
> > a strict read-only fashion, not really executing transactions at all,
> > just to see what you had.  Then you could end the recovery and go to
> > normal operations, or allow the recovery to proceed further if you
> > decided this wasn't where you wanted to be yet.  However that would
> > require a great deal of mechanism we haven't got (yet).  In particular
> > there is no such thing as strict read-only examination of the database.
> 
> That would be a great thing to have one day for other reasons aside from the
> ability to test out a recovered database. It makes warm standby databases much
> more useful.
> 
> A warm standby is when you keep a second machine constantly up to date by
> applying the archived PITR logs as soon as they come off your server. You're
> ready to switch over at the drop of a hat and don't have to go through the
> whole recovery process, you just switch the database from recovery mode to
> active mode and make it your primary database. But until then the
> backup hardware languishes, completely useless.
> 
> Oracle has had a feature for a long time that you can actually open the
> standby database in a strict read-only mode and run queries. This is great for
> a data warehouse situation where you want to run long batch jobs against
> recent data.
> 
> -- 
> greg
> 
> 
> ---(end of broadcast)---
> TIP 7: don't forget to increase your free space map settings
> 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Status of server side Large Object support?

2004-11-28 Thread Tom Lane
Thomas Hallgren <[EMAIL PROTECTED]> writes:
> What is the quality of the large object solution today? Does it have 
> known flaws that nobody cares about since it's discontinued, or is it 
> considered a maintained and worthy part of the overall solution?

More the former than the latter, I think, at least in the minds of
the usual suspects for backend work.

The main problem I'd see with the idea of supporting over-2GB LOs is
that we store all LOs in a database in the same table (pg_largeobject)
and so you would run into the table size limit (around 16TB IIRC) with
not an amazingly large number of such LOs.  We used to store each LO in
its own table but that was not better, as a few thousand LOs could
easily bring the filesystem to its knees (on platforms where the
directory lookup mechanism doesn't scale to huge numbers of entries in
a single directory).  I don't think there'd be any point in upgrading
the LO support to 64 bits without some rethinking of the underlying
storage structure.

A generic issue with LOs is the extreme pain involved in dump/reload;
not only the difficulty of transporting the LOs themselves, but that
of updating references to them from the database.  Vacuuming
no-longer-referenced LOs is a serious problem too.  If LOs were
considered a first-class feature then I'd want to see more interest
in dealing with those problems.

Lesser issues with LOs are protection (there isn't any), user-accessible
locking (there isn't any), MVCC (there isn't any).  The latter has been
on the to-do list since
http://archives.postgresql.org/pgsql-hackers/2002-05/msg00875.php
I think it could actually be fixed now without too much pain because
there is a mechanism for finding out the surrounding query's snapshot,
which functions could not do before 8.0.

regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Auto Vacuum

2004-11-28 Thread Bruce Momjian

I have added an auto-vacuum TODO item:

* Auto-vacuum
o Move into the backend code
o Scan the buffer cache to find free space or use background writer
o Use free-space map information to guide refilling


---

Russell Smith wrote:
> Hi All,
> 
> I am doing serious thinking about the implementation of Auto Vacuum as part 
> of the backend, not using libpq, but calling internal functions directly.
> It appears to me that calling internal functions directly is a better 
> implementation than using the external library to do the job.
> 
> I know I might be stepping on Matthew's toes, but I don't really want to.  I 
> am a complete newbie to the postgresql code, however I am trying.
> Vacuum appears to be one of the bigger sore points, with administrators having 
> to configure it via scheduled tasks.
> 
> I have outlined things I have thought about below.  I've surely missed a lot, 
> and am open to feedback.  Others may like the current tuning used
> by the external autovacuum, however to have stats that are collected at 
> vacuum, and used for later vacuum would mean we don't need the
> stats collector running.
> 
> The major autovacuum issues 
> 
> 1. Transaction Wraparound
> 2. Vacuum of relations
> 3. Tracking of when to do vacuums
> 4. Where to store information needed by auto vacuum
> 
> I would suggest the first step is to replicate the wraparound and relation 
> vacuuming from the current code first. But I could be wrong.
> Currently there are a lot of tuning options in the external autovacuum, and 
> the best way to run vacuum internally will need to be thought out.
> 
> 1. Transaction Wraparound
> 
> It appears from the code, that the best time to do a transaction wraparound 
> db wide vacuum is when the
> frozenXID has wrapped, TransactionIdPrecedes(myXID, dbform->datfrozenxid).  
> And is probably the most
> simple vacuum to implement.
> 
> 
> 2. Vacuuming of relations
> 
> Currently, the entire heap must be vacuumed at one time.  It would possibly be 
> desirable to have only part of the relation vacuumed at
> a time.  If you can find out which parts of the relation have the most slack 
> space.  There is a todo item regarding tracking recent deletions
> so they can be reused.  Some form of this would be helpful to work out what 
> to vacuum.  Performance issues for this type of activity 
> may be a concern.  But I have no experience to be able to make comment on 
> them.  So I welcome yours.
> 
> 3. Tracking of when to vacuum
> 
> The current autovacuum relies on the stats collector running.  I would like to 
> only use the stats if they are available,
> and have an option to be able to vacuum accurately without having to have 
> stats running.
> 
> By adding certain histograms on tuples, filesize and slack space we can 
> gauge the time between vacuums
> 
> number of tuples will show the inserts as opposed to updates.
> file size will show that the file is growing and by how much between vacuums.
> slack space will show the delete/updated records.
> 
> A new GUC and relation option would need to be implemented to give a target 
> slack space in a file.
> This will help to reduce growth in relations if vacuum happens to not run 
> frequently enough.  This information
> can also inform autovacuum that it should be vacuuming more frequently.  The 
> number would be a percentage,
> e.g. 10% of the total file size is allowed to be unused.  Also, ALTER TABLE 
> would allow users to set levels of slackness
> for each relation.  If the table gets too much more than the target slack 
> space, a "partial" vacuum full could be run
> to reduce the size of the table by moving tuples at the end of the table to 
> slack space near the beginning and 
> shortening the table length.  It would require a full table lock, but you may 
> be able to space it out, to only do a page
> at a time.
> 
> /* target percentage of slack space */
> vacuum_default_target_slack_space = 10
> 
> ALTER TABLE SET TARGET SLACK SPACE = 10;
> 
> 4. Where to store information required by auto vacuum.
> 
> Auto vacuum needs somewhere to store the information it needs about current 
> and future vacuuming.  I am unsure of where
> to put this.  It appears as if it will have a number of fields.  I feel like 
> pg_class is the best place to put the information, but 
> again I have no idea.
> 
> That's the best I can do for now.  I can clarify things further if required.
> 
> Regards
> 
> Russell.
> 
> ---(end of broadcast)---
> TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
> 
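
A purely hypothetical sketch of the kind of per-relation bookkeeping that
point 4 of the quoted proposal is asking about -- every table and column
name below is invented for illustration, and nothing like it exists in the
current catalogs:

CREATE TABLE autovacuum_bookkeeping (
    vacrelid       oid PRIMARY KEY,  -- relation this row describes
    last_vacuum    timestamptz,      -- when the relation was last vacuumed
    last_reltuples float8,           -- tuple count observed at that vacuum
    last_relpages  integer,          -- file size (in pages) at that vacuum
    slack_target   real              -- target slack space, as a percentage
);

Whether something like this would live in pg_class, in a separate catalog,
or in the stats collector is exactly the open question in point 4.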

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---

Re: [HACKERS] Auto Vacuum

2004-11-28 Thread Gavin Sherry
Matthew T. O'Connor looked at this fairly closely leading up to 8.0
feature freeze. There was a long discussion earlier this year with respect
to libpq vs. using backend functions directly to vacuum multiple
databases.

http://archives.postgresql.org/pgsql-hackers/2004-03/msg00931.php

This should be mandatory reading before proceeding. Also, you want to talk
to Matthew and review his patch (submitted early July, I think).

Thanks,

Gavin

On Sun, 28 Nov 2004, Bruce Momjian wrote:

>
> I have added an auto-vacuum TODO item:
>
> * Auto-vacuum
> o Move into the backend code
> o Scan the buffer cache to find free space or use background writer
> o Use free-space map information to guide refilling
>
>
> ---
>
> Russell Smith wrote:
> > Hi All,
> >
> > I am doing serious thinking about the implementation of Auto Vacuum as part 
> > of the backend, not using libpq, but calling internal functions directly.
> > It appears to me that calling internal functions directly is a better 
> > implementation than using the external library to do the job.
> >
> > I know I might be stepping on Matthew's toes, but I don't really want to.  
> > I am a complete newbie to the postgresql code, however I am trying.
> > Vacuum appears to be one of the bigger sore points, with administrators having 
> > to configure it via scheduled tasks.
> >
> > I have outlined things I have thought about below.  I've surely missed a 
> > lot, and am open to feedback.  Others may like the current tuning used
> > by the external autovacuum, however to have stats that are collected at 
> > vacuum, and used for later vacuum would mean we don't need the
> > stats collector running.
> >
> > The major autovacuum issues
> >
> > 1. Transaction Wraparound
> > 2. Vacuum of relations
> > 3. Tracking of when to do vacuums
> > 4. Where to store information needed by auto vacuum
> >
> > I would suggest the first step is to replicate the wraparound and relation 
> > vacuuming from the current code first. But I could be wrong.
> > Currently there are a lot of tuning options in the external autovacuum, and 
> > the best way to run vacuum internally will need to be thought out.
> >
> > 1. Transaction Wraparound
> >
> > It appears from the code, that the best time to do a transaction wraparound 
> > db wide vacuum is when the
> > frozenXID has wrapped, TransactionIdPrecedes(myXID, dbform->datfrozenxid).  
> > And is probably the most
> > simple vacuum to implement.
> >
> >
> > 2. Vacuuming of relations
> >
> > Currently, the entire heap must be vacuumed at one time.  It would possibly 
> > be desirable to have only part of the relation vacuumed at
> > a time, if you can find out which parts of the relation have the most 
> > slack space.  There is a TODO item regarding tracking recent deletions
> > so that space can be reused.  Some form of this would be helpful to work out 
> > what to vacuum.  Performance issues for this type of activity
> > may be a concern.  But I have no experience to be able to make comment on 
> > them.  So I welcome yours.
> >
> > 3. Tracking of when to vacuum
> >
> > The current autovacuum relies on the stats collector being running.  I would like 
> > to use the stats only if they are available,
> > and have an option to be able to vacuum accurately without having to have 
> > the stats collector running.
> >
> > By adding certain histograms on tuples, file size and slack space, we can 
> > gauge the time between vacuums:
> >
> > number of tuples will show the inserts as opposed to updates.
> > file size will show that the file is growing and by how much between 
> > vacuums.
> > slack space will show the delete/updated records.
> >
> > A new guc and relation option would need to be implemented to give a target 
> > slack space in a file.
> > this will help to reduce growth in relations if vacuum happens to not run 
> > frequently enough.  This information
> > can also inform autovacuum that it should be vacuuming more frequently.  
> > The number would be a percentage,
> > eg 10% of the total file size is allowed to be unused.  Also alter table 
> > would allow users to set levels of slackness
> > for each relation.  If the table gets too much more than the target slack 
> > space, a "partial" vacuum full could be run
> > to reduce the size of the table by moving tuples at the end of the table to 
> > slack space near the beginning and
> > shortening the table length.  It would require a full table lock, but you 
> > may be able to space it out, to only do a page
> > at a time.
> >
> > /* target percentage of slack space */
> > vacuum_default_target_slack_space = 10
> >
> > ALTER TABLE SET TARGET SLACK SPACE = 10;
> >
> > 4. Where to store information required by auto vacuum.
> >
> > Auto vacuum needs somewhere to store the information it needs about current 
> > and future vacuuming.  I am unsure of where
> > to put this.  It appears as if it will have a number of fields. 

Re: [HACKERS] Adding Reply-To: to Lists configuration ...

2004-11-28 Thread Jim Seymour
"Marc G. Fournier" <[EMAIL PROTECTED]> wrote:
> 
> 
> What is the general opinion of this?  I'd like to implement it, but not so 
> much so that I'm going to beat my head against a brick wall on it ...

The procmail rules I set up for each mailing list to which I subscribe
sets Reply-To to the mailing list *unless* there's already a Reply-To
set.  If you're going to do it, that's what I'd recommend doing.

Jim

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Adding Reply-To: to Lists configuration ...

2004-11-28 Thread Bruce Momjian

Are you saying this is going to make it impossible for me to reply just
to the poster, or is this an option that is set by the user via majordomo?

---

Jim Seymour wrote:
> "Marc G. Fournier" <[EMAIL PROTECTED]> wrote:
> > 
> > 
> > What is the general opinion of this?  I'd like to implement it, but not so 
> > much so that I'm going to beat my head against a brick wall on it ...
> 
> The procmail rules I set up for each mailing list to which I subscribe
> sets Reply-To to the mailing list *unless* there's already a Reply-To
> set.  If you're going to do it, that's what I'd recommend doing.
> 
> Jim
> 
> ---(end of broadcast)---
> TIP 3: if posting/reading through Usenet, please send an appropriate
>   subscribe-nomail command to [EMAIL PROTECTED] so that your
>   message can get through to the mailing list cleanly
> 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Adding Reply-To: to Lists configuration

2004-11-28 Thread Marc G. Fournier
On Sun, 28 Nov 2004, Bruce Momjian wrote:
Are you saying this is going to make it impossible for me to reply just
to the poster, or is this an option that is set by the user via majordomo?
No, the poster will still be included as part of the headers ... what 
happens, at least under Pine, is that I am prompted whether I want to honor 
the reply-to, if I hit 'y', then the other headers *are* strip'd and the 
mail is sent right back to the list ...


Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664
---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] multiline CSV fields

2004-11-28 Thread Bruce Momjian

OK, what solutions do we have for this?  Not being able to load dumped
data is a serious bug.  I have added this to the open items list:

* fix COPY CSV with \r,\n in data

My feeling is that if we are in a quoted string we just process whatever
characters we find, even passing through an EOL.  I realize it might not
mark missing quote errors well but that seems minor compared to not
loading valid data.
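
For concreteness, a minimal sketch of the case under discussion (the file
path and table name are invented for illustration).  The CSV file contains
a quoted field whose embedded line ending differs from the file's own line
endings; under the proposed behaviour, once COPY is inside the quotes it
would simply keep consuming characters, EOLs included:

-- /tmp/multiline.csv (hypothetical) contains, byte for byte:
--   1,"foo<CR><LF>bar",2<LF>
CREATE TABLE csvtest (a int, b text, c int);
COPY csvtest FROM '/tmp/multiline.csv' CSV;
-- expected result under the proposal: one row with b = 'foo<CR><LF>bar'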

---

Andrew Dunstan wrote:
> 
> This example should fail  on data line 2 or 3 on any platform, 
> regardless of the platform's line-end convention, although I haven't 
> tested on Windows.
> 
> cheers
> 
> andrew
> 
> [EMAIL PROTECTED] inst]$ bin/psql -e -f csverr.sql ; od -c 
> /tmp/csverrtest.csv
> create table csverrtest (a int, b text, c int);
> CREATE TABLE
> insert into csverrtest values(1,'a',1);
> INSERT 122471 1
> insert into csverrtest values(2,'foo\r\nbar',2);
> INSERT 122472 1
> insert into csverrtest values(3,'baz\nblurfl',3);
> INSERT 122473 1
> insert into csverrtest values(4,'d',4);
> INSERT 122474 1
> insert into csverrtest values(5,'e',5);
> INSERT 122475 1
> copy csverrtest to '/tmp/csverrtest.csv' csv;
> COPY
> truncate csverrtest;
> TRUNCATE TABLE
> copy csverrtest from '/tmp/csverrtest.csv' csv;
> psql:cvserr.sql:9: ERROR:  literal carriage return found in data
> HINT:  Use "\r" to represent carriage return.
> CONTEXT:  COPY csverrtest, line 2: "2,"foo"
> drop table csverrtest;
> DROP TABLE
> 000   1   ,   a   ,   1  \n   2   ,   "   f   o   o  \r  \n   b   a
> 020   r   "   ,   2  \n   3   ,   "   b   a   z  \n   b   l   u   r
> 040   f   l   "   ,   3  \n   4   ,   d   ,   4  \n   5   ,   e   ,
> 060   5  \n
> 062
> [EMAIL PROTECTED] inst]$
> 
> Bruce Momjian wrote:
> 
> >Can I see an example of such a failure line?
> >
> >---
> >
> >Andrew Dunstan wrote:
> >  
> >
> >>Darcy Buskermolen has drawn my attention to unfortunate behaviour of 
> >>COPY CSV with fields containing embedded line end chars if the embedded 
> >>sequence isn't the same as those of the file containing the CSV data. In 
> >>that case we error out when reading the data in. This means there are 
> >>cases where we can produce a CSV data file which we can't read in, which 
> >>is not at all pleasant.
> >>
> >>Possible approaches to the problem:
> >>. make it a documented limitation
> >>. have a "csv read" mode for backend/commands/copy.c:CopyReadLine() that 
> >>relaxes some of the restrictions on inconsistent line endings
> >>. escape embedded line end chars
> >>
> >>The last really isn't an option, because the whole point of CSVs is to 
> >>play with other programs, and my understanding is that those that 
> >>understand multiline fields (e.g. Excel) expect them not to be escaped, 
> >>and do not produce them escaped.
> >>
> >>So right now I'm tossing up in my head between the first two options. Or 
> >>maybe there's another solution I haven't thought of.
> >>
> >>Thoughts?
> >>
> >>cheers
> >>
> >>andrew
> >>
> >>---(end of broadcast)---
> >>TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
> >>
> >>
> >>
> >
> >  
> >
> 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Adding Reply-To: to Lists configuration ...

2004-11-28 Thread Bruce Momjian
Marc G. Fournier wrote:
> On Sun, 28 Nov 2004, Bruce Momjian wrote:
> 
> >
> > Are you saying this is going to make it impossible for me to reply just
> > to the poster, or is this an option that is set by the user via majordomo?
> 
> No, the poster will still be included as part of the headers ... what 
> happens, at least under Pine, is that I am prompted whether I want to honor 
> the reply-to, if I hit 'y', then the other headers *are* strip'd and the 
> mail is sent right back to the list ...

I think my mail reader, 'elm', will always honor the reply-to, which is
bad I think.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [GENERAL] [HACKERS] Adding Reply-To: to Lists

2004-11-28 Thread Doug McNaught
"Marc G. Fournier" <[EMAIL PROTECTED]> writes:

> No, the poster will still be included as part of the headers ... what
> happens, at least under Pine, is that I am prompted whether I want to
> honor the reply-to, if I hit 'y', then the other headers *are* strip'd
> and the mail is sent right back to the list ...

I'm in the "Reply-To considered harmful" camp.  I also don't see any
real evidence that the current setup is causing problems. 

-Doug

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] multiline CSV fields

2004-11-28 Thread Tom Lane
Bruce Momjian <[EMAIL PROTECTED]> writes:
> OK, what solutions do we have for this?  Not being able to load dumped
> data is a serious bug.

Which we do not have, because pg_dump doesn't use CSV.  I do not think
this is a must-fix, especially not if the proposed fix introduces
inconsistencies elsewhere.

regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] -V, --version -- deprecated?

2004-11-28 Thread Neil Conway
On Wed, 2004-11-24 at 20:25 -0500, Bruce Momjian wrote:
> FreeBSD had a problem with double-dash args but I thought that related
> to getopt, and I can't remember how that fits in.  Maybe we defined '-'
> in getopt and said it took an argument and tested for '-help' and
> '-verbose', but now we just check argv right inside main.  I can't
> remember totally.

ISTM that port/getopt_long.c ought to allow long options to work even if
the platform doesn't provide a getopt_long() itself.

BTW, pg_dump's "-X ..." options seem weird. Why is the "-X" prefix
necessary? ISTM pg_dump would be more consistent with standard
command-line tools if we just provided the long options (such as
--disable-triggers and so on) and did away with the "-X" prefixes.

I'd like to propose these changes:

(1) remove documentation for "-V", declare it deprecated. I don't see
any reason to actually remove it, but this should at least make the
current status quo more consistent.

(2) add documentation for "--help" and "--version" flags, where
appropriate

(3) remove documentation for pg_dump's "-X ..." flags, just document the
--long-option variant. Again, I don't see a need to remove support for
the -X options, but we should declare them deprecated.

Comments?

> Agreed.  psql --help certainly looks inconsistent --- only --help and
> --version are long.

Well, perhaps, but I don't think that's a problem (there is no reason
that _every_ command-line flag needs to have both long and short
options).

-Neil



---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


[HACKERS] bug fix request

2004-11-28 Thread Christopher Kings-Lynne
I would like to beg for some sort of fix to psql's handling of COPY data 
if the COPY fails.

Presently you get a million lines of '\N command not recognised' and 
various other random things because if a line of the copy fails due to 
say a FK constraint, or even if the COPY is run in an aborted 
transaction, it tries to execute all the stdin data as actual 
statements.  It's really, really, really annoying.

Also, sometimes when you copy and paste SQL into a psql window, it 
executes help on commands for each line, although it doesn't affect the 
paste.  That is also really annoying.  I'll add to this email when it 
happens to me again, cos I tried a few pastes and couldn't reproduce it :/

Chris
---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] multiline CSV fields

2004-11-28 Thread Bruce Momjian
Tom Lane wrote:
> Bruce Momjian <[EMAIL PROTECTED]> writes:
> > OK, what solutions do we have for this?  Not being able to load dumped
> > data is a serious bug.
> 
> Which we do not have, because pg_dump doesn't use CSV.  I do not think
> this is a must-fix, especially not if the proposed fix introduces
> inconsistencies elsewhere.

Sure, pg_dump doesn't use it but COPY should be able to load anything it
output.

Can this be fixed if we ignore the problem with reporting errors?

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] bug fix request

2004-11-28 Thread Tim Allen
Christopher Kings-Lynne wrote:
Also, sometimes when you copy and paste SQL into a psql window, it 
executes help on commands for each line, although it doesn't affect the 
paste.  That is also really annoying.  I'll add to this email when it 
happens to me again, cos I tried a few pastes and couldn't reproduce it :/
That last one is probably due to tabs in the pasted text. Happens to me 
all the time...

Chris
Tim
--
---
Tim Allen  [EMAIL PROTECTED]
Proximity Pty Ltd  http://www.proximity.com.au/
  http://www4.tpg.com.au/users/rita_tim/
---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] [GENERAL] Adding Reply-To: to Lists configuration ...

2004-11-28 Thread Peter Eisentraut
Marc G. Fournier wrote:
> What is the general opinion of this?  I'd like to implement it, but
> not so much so that I'm going to beat my head against a brick wall on
> it ...

Please, please, please, please don't.  The choice of the reply path lies 
with the author or the replier, not with an intermediate party.  Both 
of the former two parties have adequate technical means to achieve 
their preferred choice automatically [sender: Mail-Followup-To; 
receiver: reply-to functions in the mail client].  Writing "please Cc 
me, I'm not subscribed" is not one of them.

We have many people writing to the lists while not being subscribed.  
Consider people writing to pgsql-bugs and not getting replies.  I've 
had that happen to me too many times in other forums.

Also, I *want* to be in the recipient list of replies to my posts, so 
it's easier to find these posts.

We've done quite well with the current setup, so I don't see a need to 
tinker with it.  I've always found the Reply-to-enabled lists I'm on to 
be a more lossy medium.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [GENERAL] [HACKERS] Adding Reply-To: to Lists

2004-11-28 Thread Andrew Dunstan
Doug McNaught said:
> "Marc G. Fournier" <[EMAIL PROTECTED]> writes:
>
>> No, the poster will still be included as part of the headers ... what
>> happens, at least under Pine, is that I am prompted whether I want to
>> honor the reply-to, if I hit 'y', then the other headers *are* strip'd
>> and the mail is sent right back to the list ...
>
> I'm in the "Reply-To considered harmful" camp.  I also don't see any
> real evidence that the current setup is causing problems.
>


And the historical document referred to can be found here:

http://www.unicom.com/pw/reply-to-harmful.html

and an opposing view here:

http://www.metasystema.net/essays/reply-to.mhtml


cheers

andrew



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [HACKERS] bug fix request

2004-11-28 Thread Tom Lane
Christopher Kings-Lynne <[EMAIL PROTECTED]> writes:
> I would like to beg for some sort of fix to psql's handling of COPY data 
> if the COPY fails.

> Presently you get a million lines of '\N command not recognised' and 
> various other random things because if a line of the copy fails due to 
> say a FK constraint, or even if the COPY is run in an aborted 
> transaction, it tries to execute all the stdin data as actual 
> statements.

I'd like to see a test case for this in v3 protocol.  It was definitely
a problem before that, but I thought I fixed it.

> Also, sometimes when you copy and paste SQL into a psql window, it 
> executes help on commands for each line, although it doesn't affect the 
> paste.  That is also really annoying.

Disable tab completion, or don't paste tabs.  I don't think psql can be
expected to recognize that a tab is coming from pasted input.

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] multiline CSV fields

2004-11-28 Thread Andrew Dunstan
Bruce Momjian said:
> Tom Lane wrote:
>> Bruce Momjian <[EMAIL PROTECTED]> writes:
>> > OK, what solutions do we have for this?  Not being able to load
>> > dumped data is a serious bug.
>>
>> Which we do not have, because pg_dump doesn't use CSV.  I do not think
>> this is a must-fix, especially not if the proposed fix introduces
>> inconsistencies elsewhere.
>
> Sure, pg_dump doesn't use it but COPY should be able to load anything
> it output.
>
> Can this be fixed if we ignore the problem with reporting errors?
>


When I looked at it I could not see any simple fix that was not worse than
the symptom. If the asymmetry offends you, then we could do as Tom suggested
and rip out the multiline processing completely for now. Personally I would
regard that as a pity, as it would disallow many cases that will work quite
happily as we are, and because this is a feature that was requested when we
did this work. The limitation has been documented - my inclination would be
to revisit this during the 8.1 dev cycle.

cheers

andrew



---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] bug fix request

2004-11-28 Thread Christopher Kings-Lynne
Presently you get a million lines of '\N command not recognised' and 
various other random things because if a line of the copy fails due to 
say a FK constraint, or even if the COPY is run in an aborted 
transaction, it tries to execute all the stdin data as actual 
statements.
I'd like to see a test case for this in v3 protocol.  It was definitely
a problem before that, but I thought I fixed it.
This is with 7.4 and it still does it.  Here is an example:
Make file called test.sql:
BEGIN;
CREATE TABLE blah (a int4);
COPY blah FROM STDIN;
1
2
3
\.
COMMIT;
Now, execute it TWICE on a clean database:
-bash-2.05b$ psql -f test.sql test
BEGIN
CREATE TABLE
COMMIT
-bash-2.05b$ psql -f test.sql test
BEGIN
psql:test.sql:2: ERROR:  relation "blah" already exists
psql:test.sql:3: ERROR:  current transaction is aborted, commands 
ignored until end of transaction block
psql:test.sql:7: invalid command \.
psql:test.sql:8: ERROR:  syntax error at or near "1" at character 1
-bash-2.05b$

Also, sometimes when you copy and paste SQL into a psql window, it 
executes help on commands for each line, although it doesn't affect the 
paste.  That is also really annoying.
Disable tab completion, or don't paste tabs.  I don't think psql can be
expected to recognize that a tab is coming from pasted input.
How about if it's in a literal, e.g. a function definition?
Chris
---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] [GENERAL] Adding Reply-To: to Lists configuration ...

2004-11-28 Thread Tom Lane
Peter Eisentraut <[EMAIL PROTECTED]> writes:
> We've done quite well with the current setup, so I don't see a need to 
> tinker with it.  I've always found the Reply-to-enabled lists I'm on to 
> be a more lossy medium.

The basic issue is that the current setup encourages
reply-to-author-and-list, while adding Reply-To encourages
reply-to-list-only (at least when the replier is using one of the mail
clients I'm used to).  Peter correctly notes that reply-to-list-only
creates problems for authors who aren't subscribed.  The other point
that looms large for me is that reply-to-list-only forces every
conversation to occur just at the speed and reliability of the list
'bot.  Without wishing to tread on anyone's toes, it's undeniable that
we have a long history of slow and unreliable forwarding through the PG
list server.  I'd rather have contributors to a thread converse among
themselves, and let the list server catch up when it can.

Personally: if Reply-To is added to the list headers, I can and will
reprogram my mail software to ignore it.  But I doubt that most
contributors to the lists have that option.

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] bug fix request

2004-11-28 Thread Tom Lane
Christopher Kings-Lynne <[EMAIL PROTECTED]> writes:
> Presently you get a million lines of '\N command not recognised' and 
> various other random things because if a line of the copy fails due to 
> say a FK constraint, or even if the COPY is run in an aborted 
> transaction, it tries to execute all the stdin data as actual 
> statements.
>> 
>> I'd like to see a test case for this in v3 protocol.  It was definitely
>> a problem before that, but I thought I fixed it.

> This is with 7.4 and it still does it.  Here is an example:

Hmm.  This error is not coming from "a line of the copy", it is occurring
because the COPY command itself fails, and so the server never tells
psql to shift into COPY mode.  I'm not sure that a reasonable fix for
this is possible.  As a counterexample, if you misspelled COPY as COPZ,
would you expect the software to decide that following lines up to
\. should be ignored?  If you manually misentered a COPY command and got
an error, would you be surprised to have psql ignore everything you
typed until you typed \. ?  (I can bet we'd get bug reports about that.)
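
To illustrate that counterexample, a short script of the sort being described
(deliberately broken, names invented): because the command itself is rejected,
the server never switches the session into COPY mode, so psql has no way of
knowing the following lines were meant as data rather than SQL:

-- copz.sql (hypothetical)
COPZ blah FROM STDIN;  -- the command errors out immediately
1
2
\.
-- psql then runs '1', '2' and '\.' as ordinary input, producing the
-- familiar stream of syntax errors and "invalid command" messages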

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] bug fix request

2004-11-28 Thread Christopher Kings-Lynne
Hmm.  This error is not coming from "a line of the copy", it is occurring
because the COPY command itself fails, and so the server never tells
psql to shift into COPY mode.  I'm not sure that a reasonable fix for
this is possible.  As a counterexample, if you misspelled COPY as COPZ,
would you expect the software to decide that following lines up to
\. should be ignored?  If you manually misentered a COPY command and got
an error, would you be surprised to have psql ignore everything you
typed until you typed \. ?  (I can bet we'd get bug reports about that.)
Hmmm...doesn't stop it being annoying, however.
I presumed I was replicating the same problem I get when running SQL 
scripts that insert a few million rows.  Basically I start it running, 
then maybe some command before the COPY fails, then it gets to the COPY 
anyway and start barfing millions of lines.  Then I have to change my 
terminal settings to record heaps of lines and then try to ctrl-C the 
query before it scrolls too far off, just to find out the line that 
caused the error.
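
One possible low-tech mitigation for the scrolling problem (not for the
COPY-mode issue itself) is to have psql abort at the first error when running
a script; ON_ERROR_STOP is a standard psql variable, and the script name below
is only an example:

\set ON_ERROR_STOP on
\i load_data.sql  -- psql stops at the first failing statement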

I guess the kind of difference in this case to me is that it's in a 
transaction, and the only "error" that the COPY command causes is that 
it's running in a rolled-back transaction.

Low tech I know...but it's kind of annoying :)
Chris
---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faqs/FAQ.html