Re: [HACKERS] buildenv.pl/buildenv.bat

2007-12-11 Thread Magnus Hagander
On Sun, Dec 09, 2007 at 02:40:37PM -0500, Andrew Dunstan wrote:
 
 
 Magnus Hagander wrote:
 
 You seem to have misunderstood what I am suggesting. Of course we should 
 document use of buildenv.pl in addition to the hacky fix to the .bat 
 files. The hack is the part that would be invisible. The docs would be 
 visible and contain what would be our ongoing practice.
 
 
 Correct, I was misunderstanding it :-)
 
 I still can't say I like that hack though. I'd rather document that you
 have to do it in the .bat file for docs + gui build. 
 
 But it's better than what I thought you were proposing :-)
 
 
   
 
 Well, I honestly think we can live with it for one cycle. As soon as 8.4 
 opens I'll get to work converting these .bat files to pure one line 
 wrappers.

Ok. We obviously don't agree on what to do here, so let's open it up for 
somebody
else to comment on what they think is best.

My take is to document the fact that you have to do it twice. Andrew's is this
patch.

 Meanwhile, here's the proposed patch. If you want something else you'll 
 have to do it.
 
 I don't know how many people regularly build on Windows other than you, 
 me and Dave.

I know a couple of others. But - it was Dave that got bitten by this thing
first, which is when I noticed it and brought it up :-P

//Magnus



Re: [HACKERS] Problem of a server gettext message.

2007-12-11 Thread Zeugswetter Andreas ADI SD

  GetText is converting the po file (EUC_JP) to SJIS.

Yes.

 Are you sure about that?  Why would gettext be converting to SJIS,
when
 SJIS is nowhere in the environment it can see?

gettext is using GetACP() on Windows, wherever that gets its info from
...
chcp did change the GetACP codepage in Hiroshi's example, but a chcp change
is not reflected in LC_*.

Seems we may want to use bind_textdomain_codeset.

Andreas



Re: [HACKERS] VACUUM ANALYZE out of memory

2007-12-11 Thread Stefan Kaltenbrunner

Michael Akinde wrote:

Hi,

I am encountering problems when trying to run VACUUM FULL ANALYZE on a 
particular table in my database; namely that the process crashes out 
with the following problem:


INFO:  vacuuming pg_catalog.pg_largeobject
ERROR:  out of memory
DETAIL:  Failed on request of size 536870912.

INFO:  vacuuming pg_catalog.pg_largeobject
ERROR:  out of memory
DETAIL:  Failed on request of size 32.

Granted, our largeobject table is a bit large:

INFO:  analyzing pg_catalog.pg_largeobject
INFO:  pg_largeobject: scanned 3000 of 116049431 pages, containing 
18883 live rows and 409 dead rows; 3000 rows in sample, 730453802 
estimated total rows


...but I trust that VACUUM ANALYZE doesn't try to read the entire table 
into memory at once. :-) The machine was set up with 1.2 GB shared 
memory and 1 GB maintenance memory, so I would have expected this to be 
sufficient for the task (we will eventually set this up on a 64-bit 
machine with 16 GB memory, but at the moment we are restricted to 32 bit).


This is currently running on PostgreSQL 8.3beta2, but since I haven't 
seen this problem reported before, I guess this will also be a problem 
in earlier versions. Have we run into a bug/limitation of the Postgres 
VACUUM or is this something we might be able to solve via reconfiguring 
the server/database, or downgrading the DBMS version?


This seems simply a problem of setting maintenance_work_mem too high (i.e. 
higher than what your OS can support - maybe an ulimit/process limit is 
in effect?). Try reducing maintenance_work_mem to, say, 128MB and retry.
If you promise PostgreSQL that it can get 1GB, it will happily try to use 
it ...
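
For illustration, a minimal sketch of what that retry could look like from psql
(the 128MB value is just a starting point to test with, as suggested above):

-- Lower the per-backend maintenance memory for this session only, then retry.
SET maintenance_work_mem = '128MB';
SHOW maintenance_work_mem;   -- confirm the value actually in effect
VACUUM FULL ANALYZE pg_catalog.pg_largeobject;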



Stefan



Re: [HACKERS] VACUUM ANALYZE out of memory

2007-12-11 Thread Simon Riggs
On Tue, 2007-12-11 at 10:59 +0100, Michael Akinde wrote:

 I am encountering problems when trying to run VACUUM FULL ANALYZE on a 
 particular table in my database; namely that the process crashes out 
 with the following problem:

Probably just as well, since a VACUUM FULL on an 800GB table is going to
take a rather long time, so you are saved from discovering just how
excessively long it will run for. But it seems like a bug. This happens
consistently, I take it?

Can you run ANALYZE and then VACUUM VERBOSE, both on just
pg_largeobject, please? It will be useful to know whether they succeed.
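
A minimal sketch of the two commands being asked for, run against just that one
catalog table:

ANALYZE pg_catalog.pg_largeobject;
VACUUM VERBOSE pg_catalog.pg_largeobject;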

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com




[HACKERS] VACUUM ANALYZE out of memory

2007-12-11 Thread Michael Akinde

Hi,

I am encountering problems when trying to run VACUUM FULL ANALYZE on a 
particular table in my database; namely that the process crashes out 
with the following problem:


INFO:  vacuuming pg_catalog.pg_largeobject
ERROR:  out of memory
DETAIL:  Failed on request of size 536870912.

INFO:  vacuuming pg_catalog.pg_largeobject
ERROR:  out of memory
DETAIL:  Failed on request of size 32.

Granted, our largeobject table is a bit large:

INFO:  analyzing pg_catalog.pg_largeobject
INFO:  pg_largeobject: scanned 3000 of 116049431 pages, containing 
18883 live rows and 409 dead rows; 3000 rows in sample, 730453802 
estimated total rows


...but I trust that VACUUM ANALYZE doesn't try to read the entire table 
into memory at once. :-) The machine was set up with 1.2 GB shared 
memory and 1 GB maintenance memory, so I would have expected this to be 
sufficient for the task (we will eventually set this up on a 64-bit 
machine with 16 GB memory, but at the moment we are restricted to 32 bit).


This is currently running on PostgreSQL 8.3beta2, but since I haven't 
seen this problem reported before, I guess this will also be a problem 
in earlier versions. Have we run into a bug/limitation of the Postgres 
VACUUM or is this something we might be able to solve via reconfiguring 
the server/database, or downgrading the DBMS version?


I shall be trying to run a simple VACUUM later this evening, in order to 
see whether that manages to complete. Unfortunately, due to the time it 
takes to load data, it's not really practicable to shift servers at the 
moment.


A little background on the application:
We are building a raster-database to be used for storing weather and 
water data. The raster data (2D matrices of floating points) are stored 
using large objects and indexed using a values table (with multiple 
dimensions: time, parameter, altitudes, etc). This is a technique I've 
worked with successfully in the past, though in that case using an 
Informix DBMS. My current employer is a strong proponent of Open Source 
software, which has led to our implementation of the current system on a 
PostgreSQL DBMS (we will also be releasing our system as GPL in the near 
future).


The test instance we are working on now is about 1 TB; we expect to 
increase that by a factor of at least 5 within the first year of 
operation, so we'd really like to ensure that we can get VACUUM working 
(although the data is mostly going to be static on this installation, we 
will have others that won't be).


Anyone with some insights on VACUUM FULL ANALYZE who can weigh in on 
what is going wrong?


Regards,

Michael Akinde

Database Architect,
met.no





[HACKERS] VLDB Features

2007-12-11 Thread Simon Riggs
I'm starting work on next projects for 8.4.

Many applications have the need to store very large data volumes for
both archival and analysis. The analytic databases are commonly known as
Data Warehouses, though there isn't a common term for large archival
data stores. The use cases for those can often be blurred and many
people see those as only one use case. My initial interest is in the
large archival data stores.

One of the main issues to be faced is simply data maintenance and
management. Loading, deleting, vacuuming data all takes time. Those
issues relate mainly to the size of the data store rather than any
particular workload, so I'm calling that set of required features Very
Large Database (or VLDB) features.

VLDB Features I'm expecting to work on are
- Read Only Tables/WORM tables
- Advanced Partitioning
- Compression
plus related performance features

Details of those will be covered in separate mails over the next few weeks
and months. This is just to let everybody know where I'm headed, so you can
see the big picture with me.

I'll be working on other projects as well, many of which I've listed here:
http://developer.postgresql.org/index.php/Simon_Riggs%27_Development_Projects
I expect the list is too long to complete for 8.4, but I'm allowing for
various issues arising during development.

So please keep specific discussion to the other mails as they arrive.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com




[HACKERS] WORM and Read Only Tables (v0.1)

2007-12-11 Thread Simon Riggs

Many applications have the need to archive data after it has been
through the initial flurry of reads and updates that follows its
original insertion. Currently there is no specific feature support to
meet this requirement, so I propose to add this for 8.4.

Use Case: VLDB with tons of (now) read only data, some not. Data needs
to be accessible, but data itself is rarely touched, allowing storage
costs to be minimised via a storage hierarchy of progressively cheaper
storage.

Features
- Read Only Tables
- Compressed Tablespaces
- Attaching table
- Per-Tablespace Access Costing
- Performance Tuning


Read-Only Tables

Postgres supports the concept of freezing tuples, so they can live
forever within the database without needing further writes. Currently
there is no command that will guarantee that a table has been completely
frozen. This makes it difficult to reliably write data files to WORM
media for longer term archiving. (WORM means Write-Once, Read-Many).
It's also a pain having to VACUUM a large table again just because a
small number of rows need to be frozen.

So we need a DDL command that will ensure all tuples are frozen and then
mark the table as read-only. Ideally, we would like to do this in a way
that doesn't hold long full table locks, since we want the data to
remain accessible at all times.

So... VACUUM FREEZE table SET READ ONLY;

would be my first thought, but I'm guessing everybody will press me
towards supporting the more obvious

ALTER TABLE table SET READ ONLY;

This command will place a ShareLock (only) on the table, preventing
anybody from writing to the table while we freeze it. The ShareLock is
incompatible with any transaction that has written to the table, so when
we acquire the lock all writers to the table will have completed. We
then run the equivalent of a VACUUM FREEZE which will then be able to
freeze *all* rows in one pass (rather than all except the most recent).
On completion of the freeze pass we will then update the pg_class entry
to show that it is now read-only, so we will emulate the way VACUUM does
this.

This form of the ALTER TABLE command will need to be mangled so it can
only run outside of a transaction block and also so it takes only a
ShareLock rather than an AccessExclusiveLock.

Reversing the process is simpler, since we only have to turn off the
flag in pg_class:

ALTER TABLE table SET READ WRITE;

possibly able to do this without grabbing an AccessExclusiveLock, though
that isn't an important part of this implementation.

Read-only tables never need VACUUMing, so we would be able to make
autovacuum and explicit vacuum ignore them.

Read-only tables may not be written to, yet would still allow implicit
or explicit INSERT, UPDATE and DELETE privileges to be held on the
table. Attempts to write to the table will result in a specific "read-only
table cannot be modified" ERROR. This allows a table to be placed 
into read-only mode for long periods of time and flipped back to
read-write if some updates are required. That is useful for various
performance reasons, see later. We can't use the privilege mechanism to
prevent writes since superusers bypass them. (Thoughts?)

Requests for tuple-level SHARE locks (e.g. FOR SHARE) against read-only
tables will be ignored, since they are effectively already there. So we
don't need to change the internals of the locking, nor edit the RI code
to remove the call to SHARE lock referenced tables. Do this during
post-parse analysis.

Tables can be copied to WORM media by using

ALTER TABLE table SET TABLESPACE tblspc;

This would also use a ShareLock rather than an AccessExclusiveLock,
piggy-backing off the work mentioned above.

Running SET TABLESPACE and SET READ ONLY at the same time might sound
like a good plan, but ISTM it will require two fairly different code paths,
so if we do it at all it will be a later addition.
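
To make the intended workflow concrete, here is a sketch of how the commands
discussed above might be used together on a hypothetical archive table and
tablespace (the SET READ ONLY / SET READ WRITE forms are the proposal, not
existing 8.3 DDL; VACUUM FREEZE and SET TABLESPACE already exist):

VACUUM FREEZE archive_2006;                        -- freeze remaining tuples
ALTER TABLE archive_2006 SET READ ONLY;            -- proposed: mark table read-only
ALTER TABLE archive_2006 SET TABLESPACE worm_ts;   -- move it onto cheap/WORM storage
-- and later, if updates are ever needed again:
ALTER TABLE archive_2006 SET READ WRITE;           -- proposed: flip it back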

Compressed Tablespaces
--

Frequently with large data archives there is a requirement to reduce the
footprint of the data to allow longer term storage costs to be reduced.

For Insert-only data we might imagine we can reduce the size of tables
by removing unused tuple header information. Although that is possible,
repeated headers compress fairly well, so it seems easier to tackle the
problem directly by having compressed tables.

Using a streaming library like zlib, it will be easy to read/write data
files into a still-usable form but with much reduced size. Access to a
compressed table only makes sense as a SeqScan. That would be handled by
introducing tablespace-specific access costs, discussed below. Indexes
on compressed tables would still be allowed, but would hardly ever be
used.
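
As a rough illustration of what tablespace-specific access costs might look
like (purely speculative syntax, since per-tablespace cost settings do not
exist at the time of writing), the planner could be told that random access
into a compressed tablespace is disproportionately expensive:

-- Speculative syntax: discourage index (random) access into the compressed tablespace.
ALTER TABLESPACE compressed_ts SET (seq_page_cost = 2.0, random_page_cost = 50.0);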

Access would probably be via tablespace-specific storage managers. So
implement mdcompress.c alongside md.c in src/backend/storage/smgr. If
that implementation route was chosen, it would then allow the
compression option to be made at tablespace level, so commands would be:

CREATE 

Re: [HACKERS] VACUUM ANALYZE out of memory

2007-12-11 Thread Michael Akinde

Thanks for the rapid responses.

Stefan Kaltenbrunner wrote:
this seems simply a problem of setting maintenance_work_mem too high 
(ie higher than what your OS can support - maybe an 
ulimit/processlimit is in effect?) . Try reducing maintenance_work_mem 
to say 128MB and retry.
If you promise postgresql that it can get 1GB it will happily try to 
use it ...
I set up the system together with one of our Linux sysOps, so I think 
the settings should be OK. Kernel.shmmax is set to 1.2 GB, but I'll get 
him to recheck if there could be any other limits he has forgotten to 
increase.


The way the process was running, it seems to have basically just 
continually allocated memory until (presumably) it broke through the  
slightly less than 1.2 GB shared memory allocation we had provided for 
PostgreSQL (at least the postgres process was still running by the time 
resident size had reached 1.1 GB).


Incidentally, in the first error of the two I posted, the shared memory 
setting was significantly lower (24 MB, I believe). I'll try with 128 MB 
before I leave in the evening, though (assuming the other tests I'm 
running complete by then).


Simon Riggs wrote:

On Tue, 2007-12-11 at 10:59 +0100, Michael Akinde wrote:

  
I am encountering problems when trying to run VACUUM FULL ANALYZE on a 
particular table in my database; namely that the process crashes out 
with the following problem:



Probably just as well, since a VACUUM FULL on an 800GB table is going to
take a rather long time, so you are saved from discovering just how
excessively long it will run for. But it seems like a bug. This happens
consistently, I take it?
  
I suspect so, though it has only happened a couple of times yet (as it 
does take a while) before it hits that 1.1 GB roof. But part of the 
reason for running the VACUUM FULL was of course to find out how long 
it would take. Reliability is always a priority for us,
so I like to know what (useful) tools we have available and stress the 
system as much as possible...  :-)

Can you run ANALYZE and then VACUUM VERBOSE, both on just
pg_largeobject, please? It will be useful to know whether they succeed.
  
I ran just ANALYZE on the entire database yesterday, and that worked 
without any problems.


I am currently running a VACUUM VERBOSE on the database. It isn't done 
yet, but it is running with a steady (low) resource usage.


Regards,

Michael A.





Re: [HACKERS] VACUUM ANALYZE out of memory

2007-12-11 Thread Stefan Kaltenbrunner

Michael Akinde wrote:

Thanks for the rapid responses.

Stefan Kaltenbrunner wrote:
this seems simply a problem of setting maintenance_work_mem too high 
(ie higher than what your OS can support - maybe an 
ulimit/processlimit is in effect?) . Try reducing maintenance_work_mem 
to say 128MB and retry.
If you promise postgresql that it can get 1GB it will happily try to 
use it ...
I set up the system together with one of our Linux sysOps, so I think 
the settings should be OK. Kernel.shmmax is set to 1.2 GB, but I'll get 
him to recheck if there could be any other limits he has forgotten to 
increase.


The way the process was running, it seems to have basically just 
continually allocated memory until (presumably) it broke through the  
slightly less than 1.2 GB shared memory allocation we had provided for 
PostgreSQL (at least the postgres process was still running by the time 
resident size had reached 1.1 GB).


Incidentally, in the first error of the two I posted, the shared memory 
setting was significantly lower (24 MB, I believe). I'll try with 128 MB 
before I leave in the evening, though (assuming the other tests I'm 
running complete by then).


This is most likely not related to your shared memory settings at all, but 
to your setting of maintenance_work_mem, which is the amount of memory a 
single backend(!) can use for maintenance operations (VACUUM, for example).
Notice that your first error refers to an allocation of about 500MB, which 
your ulimit/kernel process limit simply might not be able to give a single 
process.
And for very large tables VACUUM FULL is generally not a good idea at all - 
either look into regular normal vacuum scheduling, or if you need to recover 
from a bloated database, use a command that forces a rewrite of the table 
(like CLUSTER), which will be heaps faster but also requires about twice the 
amount of disk space.
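
As a sketch, the rewrite-based alternative mentioned above would look roughly
like this (table and index names are hypothetical):

-- Rewrites the whole table in index order; needs roughly 2x the table's disk
-- space while it runs, but is usually much faster than VACUUM FULL on a
-- badly bloated table.
CLUSTER bloated_table USING bloated_table_pkey;
ANALYZE bloated_table;   -- refresh planner statistics after the rewrite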



Stefan



Re: [HACKERS] WORM and Read Only Tables (v0.1)

2007-12-11 Thread Gregory Stark
Simon Riggs [EMAIL PROTECTED] writes:

 So... VACUUM FREEZE table SET READ ONLY;

 would be my first thought, but I'm guessing everybody will press me
 towards supporting the more obvious

 ALTER TABLE table SET READ ONLY;

 This command will place a ShareLock (only) on the table, preventing
 anybody from writing to the table while we freeze it. The ShareLock is
 incompatible with any transaction that has written to the table, so when
 we acquire the lock all writers to the table will have completed. We
 then run the equivalent of a VACUUM FREEZE which will then be able to
 freeze *all* rows in one pass (rather than all except the most recent).
 On completion of the freeze pass we will then update the pg_class entry
 to show that it is now read-only, so we will emulate the way VACUUM does
 this.

To be clear, if it meets a block in which a tuple is not freezable -- that
is, it has an xmin or xmax more recent than the global xmin -- then it needs to
block waiting for the backend with that recent xmin. Then presumably it needs
to update its concept of the recent global xmin going forward.

You might be best off grabbing a list of txid-xmin when you start and sorting
them by xmin so you can loop through them sleeping until you reach the first
txid with an xmin large enough to continue.

 Reversing the process is simpler, since we only have to turn off the
 flag in pg_class:

I'm not sure how this interacts with:

 Requests for tuple-level SHARE locks (e.g. FOR SHARE) against read-only
 tables will be ignored, since they are effectively already there. So we
 don't need to change the internals of the locking, nor edit the RI code
 to remove the call to SHARE lock referenced tables. Do this during
 post-parse analysis.

Since queries which think they hold FOR SHARE tuple locks will be magically
losing their share locks if you turn off the read-only flag. Do you need to
obtain an exclusive lock on the table to turn it read-write?

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL 
training!



Re: [HACKERS] VACUUM ANALYZE out of memory

2007-12-11 Thread Alvaro Herrera
Michael Akinde wrote:
 Thanks for the rapid responses.

 Stefan Kaltenbrunner wrote:
 this seems simply a problem of setting maintenance_work_mem too high (ie 
 higher than what your OS can support - maybe an ulimit/processlimit is in 
 effect?) . Try reducing maintenance_work_mem to say 128MB and retry.
 If you promise postgresql that it can get 1GB it will happily try to use 
 it ...
 I set up the system together with one of our Linux sysOps, so I think the 
 settings should be OK. Kernel.shmmax is set to 1.2 GB, but I'll get him to 
 recheck if there could be any other limits he has forgotten to increase.

You are confusing shared memory (shared_buffers and kernel.shmmax) with
local memory (work_mem and maintenance_work_mem).  The error you got is
about the latter kind.
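
A quick way to see the two kinds of settings side by side from psql:

SHOW shared_buffers;          -- shared memory, allocated once at server start
SHOW maintenance_work_mem;    -- local (per-backend) memory, used by VACUUM etc.
SHOW work_mem;                -- local memory for sorts/hashes in ordinary queries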

-- 
Alvaro Herrera http://www.amazon.com/gp/registry/CTMLCN8V17R4
La soledad es compañía



Re: [HACKERS] VACUUM ANALYZE out of memory

2007-12-11 Thread Martijn van Oosterhout
On Tue, Dec 11, 2007 at 12:30:43PM +0100, Michael Akinde wrote:
 The way the process was running, it seems to have basically just 
 continually allocated memory until (presumably) it broke through the  
 slightly less than 1.2 GB shared memory allocation we had provided for 
 PostgreSQL (at least the postgres process was still running by the time 
 resident size had reached 1.1 GB).

I think you're slightly confused. The VACUUM isn't going to use much of
the shared memory anyway. Shared memory is mostly just disk buffers and
is all allocated at startup. The memory being allocated by VACUUM is
the maintenance work_mem *in addition* to any shared memory.

Also, depending on what's happening it may be allocating maintenance
work_mem more than once.

 Incidentally, in the first error of the two I posted, the shared memory 
 setting was significantly lower (24 MB, I believe). I'll try with 128 MB 
 before I leave in the evening, though (assuming the other tests I'm 
 running complete by then).

What you want is a reasonable shared mem, maybe 0.5GB, and a smaller
maintenance work_mem, since the latter is probably what's killing you.

Have a nice day,
-- 
Martijn van Oosterhout   [EMAIL PROTECTED]   http://svana.org/kleptog/
 Those who make peaceful revolution impossible will make violent revolution 
 inevitable.
  -- John F Kennedy




Re: [HACKERS] partitioned table query question

2007-12-11 Thread Gregory Stark

[moved from -general]

Tom Lane [EMAIL PROTECTED] writes:

 The bottom line here is that we have built a partitioning facility
 out of spare parts, ie, a very generalized contradiction-proving
 section of the planner.  

That's true but I think the conclusions you reach are a stretch. The main
problem with our existing approach is that there's no way to make the kind of
deductions we want which depend on a holistic view of all the constraints. So
you can't sort the ranges and search them using a binary search, or join
matching partitions from separate tables before appending them, or any number
of similar ideas.

But I don't see that handling hash or bin partitioning is really going to be
so hard with our current scheme. I think we need to have some form of
understanding of which functions preserve which btree ordering information in
order to make better use of expression indexes anyways.

 Eventually we'll have to push an understanding of partitioning down to
 some lower level of the system --- that is, if we think it's critical
 enough to justify that much effort.

If by a lower level of the system you mean the storage manager or anything
like that then I definitely don't agree. If on the other hand you just mean a
simpler, more regular structure than Turing-complete constraints then I would
agree, but for the reasons above -- not for the problems with hash/bin
partitioning and equality semantics.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!



Re: [HACKERS] WORM and Read Only Tables (v0.1)

2007-12-11 Thread Csaba Nagy
On Tue, 2007-12-11 at 11:12 +, Simon Riggs wrote:
 Features
 - Read Only Tables
 - Compressed Tablespaces

I wonder whether, instead of read-only tables, it wouldn't be better to have
some kind of automatic partitioning which permits having different
chunks of the table data in different tablespaces, and a freeze command
which effectively moves the data from the (normally small) active chunk
to the archive chunk when its transaction ID is older than a predefined
threshold?

Then put the active chunk on a high performance file system and the
archive tablespace on a compressed/slow/cheap file system and you're
done. Allow even the archive chunk to be updateable, and put new tuple
data in the active chunk. It would work just fine for cases where the
old data is rarely updated/deleted...
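
Something close to this can already be sketched by hand with constraint-based
inheritance partitioning and two tablespaces (all names below are made up; the
automatic "freeze and move" step is the part that would need new support):

CREATE TABLE measurements (id bigint, taken timestamptz, val float8);

CREATE TABLE measurements_active (
    CHECK (taken >= DATE '2007-12-01')
) INHERITS (measurements) TABLESPACE fast_ts;

CREATE TABLE measurements_archive (
    CHECK (taken < DATE '2007-12-01')
) INHERITS (measurements) TABLESPACE cheap_ts;

-- With constraint_exclusion = on, queries against measurements are pruned by
-- the CHECK constraints. "Freezing" a chunk today is a manual move, e.g.:
--   INSERT INTO measurements_archive
--     SELECT * FROM measurements_active WHERE taken < DATE '2007-12-01';
--   DELETE FROM measurements_active WHERE taken < DATE '2007-12-01';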

Another advantage I guess would be that active data would more likely
stay in cache, as updated records would stay together and not spread
over the inactive.

Cheers,
Csaba.





Re: [HACKERS] WORM and Read Only Tables (v0.1)

2007-12-11 Thread Csaba Nagy
On Tue, 2007-12-11 at 13:44 +0100, Csaba Nagy wrote:
 Another advantage I guess would be that active data would more likely
 stay in cache, as updated records would stay together and not spread
 over the inactive.

And I forgot to mention that vacuum could mostly skip the archive part
and only vacuum the active part, which would drastically reduce the cost
of vacuuming big, active tables.

Cheers,
Csaba.





Re: [HACKERS] WORM and Read Only Tables (v0.1)

2007-12-11 Thread Hannu Krosing

On Tue, 2007-12-11 at 13:44, Csaba Nagy wrote:
 On Tue, 2007-12-11 at 11:12 +, Simon Riggs wrote:
  Features
  - Read Only Tables
  - Compressed Tablespaces
 
 I wonder if instead of read-only tables wouldn't it be better to have
 some kind of automatic partitioning which permits to have different
 chunks of the table data in different tablespaces, and a freeze command
 which effectively moves the data from the (normally small) active chunk
 to the archive chunk when it's transaction id is older than a predefined
 threshold ?

This would be doable using Simon's proposed commands.

 Then put the active chunk on a high performance file system and the
 archive tablespace on a compressed/slow/cheap file system and you're
 done. Allow even the archive chunk to be updateable, and put new tuple
 data in the active chunk. It would work just fine for cases where the
 old data is rarely updated/deleted...

You can't update a table on a read-only (write-once) partition, at least
not with current header structure.

 Another advantage I guess would be that active data would more likely
 stay in cache, as updated records would stay together and not spread
 over the inactive.





Re: [HACKERS] WORM and Read Only Tables (v0.1)

2007-12-11 Thread Csaba Nagy
On Tue, 2007-12-11 at 14:58 +0200, Hannu Krosing wrote:
 On Tue, 2007-12-11 at 13:44, Csaba Nagy wrote:
  Then put the active chunk on a high performance file system and the
  archive tablespace on a compressed/slow/cheap file system and you're
  done. Allow even the archive chunk to be updateable, and put new tuple
  data in the active chunk. It would work just fine for cases where the
  old data is rarely updated/deleted...
 
 You can't update a table on a read-only (write-once) partition, at least
 not with current header structure.

OK, but that's what I'm challenging: why do you need a write-once
partition? Do you mean tapes by that? OK, then it means I was thinking of
completely different usage scenarios...

Cheers,
Csaba.





Re: [HACKERS] quotas once again

2007-12-11 Thread Zdenek Kotala

Gevik Babakhani wrote:

After reading the thread of 2004 regarding user quotas, I understand
why the discussion moved towards having a tablespace quota as a
solution.

My reason to start this discussion was due the need of controlling
database size. Having tablespace quotas could allow one to create a 
database in a given tablespace and then limit the size of the tablespace.


You can control tablespace size with filesystem quotas when you put each 
tablespace on a separate FS. However, the disadvantage is that you need admin 
access to the machine and it is not controlled by Postgres.  ZFS works 
fine in this case, because you can delegate volume/filesystem 
administration to any user and FS creation is very easy.


Zdenek



[HACKERS] Re: Document how to turn off disk write cache on popular operating

2007-12-11 Thread Bruce Momjian
Magnus Hagander wrote:
 On Mon, Dec 10, 2007 at 02:05:05PM +, Bruce Momjian wrote:
  Log Message:
  ---
  Document how to turn off disk write cache on popular operating systems.
  
  Modified Files:
  --
  pgsql/doc/src/sgml:
  wal.sgml (r1.46 - r1.47)
  
  (http://developer.postgresql.org/cvsweb.cgi/pgsql/doc/src/sgml/wal.sgml?r1=1.46r2=1.47)
 
 Should this mention that you don't need to turn it off at the disk level if
 you use fsync_writethrough? 

Uh, I remember we looked at this checkbox before but I don't remember
the details, and I can't find a comment about it.  Was the issue that
writethrough always forces through the disk cache?  Is that the default
on Win32?  Did we comment this somewhere?

-- 
  Bruce Momjian  [EMAIL PROTECTED]http://momjian.us
  EnterpriseDB http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] WORM and Read Only Tables (v0.1)

2007-12-11 Thread Trevor Talbot
On 12/11/07, Simon Riggs [EMAIL PROTECTED] wrote:

 Compressed Tablespaces

 Using a streaming library like zlib, it will be easy to read/write data
 files into a still-usable form but with much reduced size. Access to a
 compressed table only makes sense as a SeqScan. That would be handled by
 introducing tablespace-specific access costs, discussed below. Indexes
 on compressed tables would still be allowed, but would hardly ever be
 used.

I've actually been wanting this lately, for a couple reasons. One is
reduced disk footprint, but the other is reduced I/O, similar to how
TOAST helps with large fields now. (In my particular scenario, TOAST
can't help due to small field sizes.) It would be useful to have
available even on read/write data. To that end, it would probably make
more sense to use a block compression algorithm rather than a
streaming one. Block-based algorithms can generally get better
compression than streaming ones as well, at least when fed large
enough blocks.

I'm not familiar with the implementation issues, other than the
obvious one that variable block sizes make the I/O subsystem look very
different, so I don't know if there's a major tradeoff between the
two strategies (even just for read-only).

 I'm open to arguments that we don't need this at all because filesystem
 utilities exist that do everything we need. Your experience will be
 good to hear about in regard to this feature.

Some filesystems do support transparent compression, but they're not
always available. It would be nice to have compression on
unsophisticated systems with cheap hardware.



Re: [HACKERS] WORM and Read Only Tables (v0.1)

2007-12-11 Thread Peter Childs
On 11/12/2007, Csaba Nagy [EMAIL PROTECTED] wrote:

 On Tue, 2007-12-11 at 14:58 +0200, Hannu Krosing wrote:
   On Tue, 2007-12-11 at 13:44, Csaba Nagy wrote:
   Then put the active chunk on a high performance file system and the
   archive tablespace on a compressed/slow/cheap file system and you're
   done. Allow even the archive chunk to be updateable, and put new tuple
   data in the active chunk. It would work just fine for cases where the
   old data is rarely updated/deleted...
 
  You can't update a table on a read-only (write-once) partition, at least
  not with current header structure.

 OK, but that's what I'm challenging, why do you need a write once
 partition ? You mean by that tapes ? OK, it means I was thinking in
 completely different usage scenarios then...

 Cheers,
 Csaba.



I think DVD or CD would make sense; tapes have the added limitation of being
sequential access only.

Peter Childs


Re: [HACKERS] [BUGS] BUG #3799: csvlog skips some logs

2007-12-11 Thread Alvaro Herrera
Hmm, I'm now wondering if the log line number is correctly positioned.
Right now we have it just after the PID, which suggests that following the
PID and log line number is enough for tracking what a session does.
While this is not entirely incorrect, ISTM it would be more logical to put it
closer to the session ID, and change the name so it is less misleading
in that sense.

Currently we have

session_id | 475e91da.291f
connection_from| [local]
process_id | 10527
process_line_num   | 3

I propose we change it to

process_id | 10527
connection_from| [local]
session_id | 475e91da.291f
session_line_num   | 3

Note the changed column name.  I also suggest we change the description of
%l in log_line_prefix to

    Number of the log line for each session, starting at 1

where the original is

    Number of the log line for each process, starting at 1

Thoughts?

-- 
Alvaro Herrera http://www.amazon.com/gp/registry/CTMLCN8V17R4
No necesitamos banderas
 No reconocemos fronteras  (Jorge González)



Re: [HACKERS] VACUUM ANALYZE out of memory

2007-12-11 Thread Michael Akinde

Stefan Kaltenbrunner wrote:

Michael Akinde wrote:
Incidentally, in the first error of the two I posted, the shared 
memory setting was significantly lower (24 MB, I believe). I'll try 
with 128 MB before I leave in the evening, though (assuming the other 
tests I'm running complete by then).


this is most likely not at all related to your shared memory settings 
but to your setting of maintenance_work_mem which is the amount of 
memory a single backend(!) can use for maintainance operations (which 
VACUUM is for example).
notice that your first error refers to an allocation of about 500MB 
which your ulimit/kernel process limit simply might not be able to 
give a single process.


Yes - in the first case, the maintenance_work_mem was at the default (so I 
wasn't surprised to see it fail to allocate half a gigabyte). In the 
second case, though, maintenance_work_mem was set at 1024 MB (where it 
then gives the slightly odd error "Failed on request of size 32").


The server has 4 GB RAM available, so even if it was trying to use 1.2 
GB shared memory + 1 GB for maintenance_work_mem all at once, it still seems 
odd that the process would fail. As far as I can tell (running ulimit -a), 
the limits look pretty OK to me.


core file size  (blocks, -c) 0
data seg size   (kbytes, -d) unlimited
max nice(-e) 0
file size   (blocks, -f) unlimited
pending signals (-i) unlimited
max locked memory   (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files  (-n) 1024
pipe size(512 bytes, -p) 8
POSIX message queues (bytes, -q) unlimited
max rt priority (-r) 0
stack size  (kbytes, -s) 8192
cpu time   (seconds, -t) unlimited
max user processes  (-u) unlimited
virtual memory  (kbytes, -v) unlimited
file locks  (-x) unlimited

Being unable to run VACUUM FULL isn't a problem for the current 
configuration of our application (as it will mostly be large amounts of 
static data), but we're likely to have an application working with the 
database next year where we'd move around 100 GB through the database on 
a daily basis. At least based on the documentation of the various 
commands, I would expect that one would want to perform VACUUM FULL 
every once in a while.


Again, thanks for the feedback.

Regards,

Michael Akinde
Database Architect, met.no





Re: [HACKERS] WORM and Read Only Tables (v0.1)

2007-12-11 Thread Peter Childs
On 11/12/2007, Simon Riggs [EMAIL PROTECTED] wrote:

 Attach
 --

 Writing tables on one system and then moving that data to other systems
 is fairly common. If we supported read-only tables then you might
 consider how you would publish new versions to people.

 For now, we just want to consider how we will upgrade from one release
 to another without needing to unload and reload potentially many
 Terabytes of data. We can't delete the old data until the new data is
 successfully loaded, so we will have a huge temporary storage cost. This
 could go very slowly if we use cheaper storage, plus reloading the data
 means we have to re-freeze it again also.

 So we need a way of attaching the old tables to the new database. We
 might call this binary upgrade, or we might be slightly less ambitious
 and talk about just moving the old read-only data. That's all I want to
 do at this stage.

 I'm mentioning this here now to see what comes out in debate, and what
 others are planning to work on in this area.


This sounds like allowing new versions to read old versions' file structure,
probably, I guess, on a tablespace-by-tablespace basis.

Another advantage might be being able to load an old PITR backup into a new
version across major versions, but I'm not sure that would work.

Maybe we need a command to upgrade a tablespace to a new version's file
format?

We need to be careful we don't stunt future progress by fixing the file format
too much.

Peter Childs



Re: [HACKERS] There's random access and then there's random access

2007-12-11 Thread Gregory Stark
Gregory Stark [EMAIL PROTECTED] writes:

 I think this will be easiest to do for bitmap index scans. Since we gather up
 all the pages we'll need before starting the heap scan we can easily skim
 through them, issue posix_fadvises for at least a certain number ahead of the
 actual read point and then proceed with the rest of the scan unchanged. 

I've written up a simple test implementation of prefetching using
posix_fadvise(). Here are some nice results on a query accessing 1,000 records
from a 10G table with 300 million records:


postgres=# set preread_pages=0;   explain analyze select (select count(*) from 
h where h = any (x)) from (select random_array(1000,1,3) as x)x;
SET
                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Subquery Scan x  (cost=0.00..115.69 rows=1 width=32) (actual time=6069.505..6069.509 rows=1 loops=1)
   ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.058..0.061 rows=1 loops=1)
   SubPlan
     ->  Aggregate  (cost=115.66..115.67 rows=1 width=0) (actual time=6069.425..6069.426 rows=1 loops=1)
           ->  Bitmap Heap Scan on h  (cost=75.49..115.63 rows=10 width=0) (actual time=3543.107..6068.335 rows=1000 loops=1)
                 Recheck Cond: (h = ANY ($0))
                 ->  Bitmap Index Scan on hi  (cost=0.00..75.49 rows=10 width=0) (actual time=3542.220..3542.220 rows=1000 loops=1)
                       Index Cond: (h = ANY ($0))
 Total runtime: 6069.632 ms
(9 rows)


postgres=# set preread_pages=300;   explain analyze select (select count(*) 
from h where h = any (x)) from (select random_array(1000,1,3) as x)x;
SET
                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Subquery Scan x  (cost=0.00..115.69 rows=1 width=32) (actual time=3945.602..3945.607 rows=1 loops=1)
   ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.060..0.064 rows=1 loops=1)
   SubPlan
     ->  Aggregate  (cost=115.66..115.67 rows=1 width=0) (actual time=3945.520..3945.521 rows=1 loops=1)
           ->  Bitmap Heap Scan on h  (cost=75.49..115.63 rows=10 width=0) (actual time=3505.546..3944.817 rows=1000 loops=1)
                 Recheck Cond: (h = ANY ($0))
                 ->  Bitmap Index Scan on hi  (cost=0.00..75.49 rows=10 width=0) (actual time=3452.759..3452.759 rows=1000 loops=1)
                       Index Cond: (h = ANY ($0))
 Total runtime: 3945.730 ms
(9 rows)


Note that while the query itself is only 50% faster, the bitmap heap scan
specifically is actually 575% faster than without readahead.

It would be nice to optimize the bitmap index scan as well but that will be a
bit trickier and it probably won't be able to cover as many pages. As a result
it probably won't be a 5x speedup like the heap scan.

Also, this is with a fairly aggressive readahead which only makes sense for
queries that look a lot like this and will read all the tuples. For a more
general solution I think it would make sense to water down the performance a
bit in exchange for some protection against doing unnecessary I/O in cases
where the query isn't actually going to read all the tuples.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's 24x7 Postgres support!



Re: [HACKERS] VACUUM ANALYZE out of memory

2007-12-11 Thread Martijn van Oosterhout
On Tue, Dec 11, 2007 at 03:18:54PM +0100, Michael Akinde wrote:
 The server has 4 GB RAM available, so even if it was trying to use 1.2 
 GB shared memory + 1 GB for maintenance_mem all at once, it still seems  
 odd that the process would fail. As far as I can tell (running ulimit -a 
 ),  the limits look pretty OK to me.

IIRC you said you're on a 32-bit architecture? Which means any single
process only has 4GB of address space. Take off 1GB for the kernel, 1GB
shared memory, 1GB maintenance work_mem and a collection of libraries,
stack space and general memory fragmentation, and I can absolutely
believe you've run into the limit of *address* space.

On a 64-bit machine it doesn't matter so much, but on a 32-bit machine
using 1GB for shared memory severely cuts the amount of auxiliary
memory the server can use. Unless you've shown a measurable difference
between 256MB and 1GB shared memory, I'd say you're better off using the
smaller amount so you can have a higher maintenance work_mem.

VACUUM doesn't benefit much from lots of shared buffers, but it does
benefit from maintenance work_mem.

Have a nice day,
-- 
Martijn van Oosterhout   [EMAIL PROTECTED]   http://svana.org/kleptog/
 Those who make peaceful revolution impossible will make violent revolution 
 inevitable.
  -- John F Kennedy




Re: [HACKERS] Problem with ControlFileData structure being ABI depe ndent

2007-12-11 Thread Magnus Hagander
On Mon, Dec 10, 2007 at 09:56:39AM +, Dave Page wrote:
 Dave Page wrote:
  Tom Lane wrote:
  Dave Page [EMAIL PROTECTED] writes:
  Gregory Stark wrote:
  An alternative is leaving it in the project file but putting
  something like
  this in c.h:
 
  Put it in win32.h, please.  c.h shouldn't get cluttered with
  platform-specific kluges when there's no need for it.
 
  Is there a good reason not to just #define _USE_32BIT_TIME_T in win32.h?
  
  Yeah, the fact that addons may then end up partially compiled with and
  partially without it being defined. It we just have it error as Greg
  suggested, then it will force the authors to define it themselves, and
  if they get that wrong it's their fault not ours.
 
 Patch attached.

Applied with two tiny modifications - the missing quote that you mentioned,
and changing the #ifdef to only affect MSVC and not MinGW.

//Magnus



Re: [HACKERS] Document how to turn off disk write cache on popular operating

2007-12-11 Thread Magnus Hagander
On Tue, Dec 11, 2007 at 08:18:42AM -0500, Bruce Momjian wrote:
 Magnus Hagander wrote:
  On Mon, Dec 10, 2007 at 02:05:05PM +, Bruce Momjian wrote:
   Log Message:
   ---
   Document how to turn off disk write cache on popular operating systems.
   
   Modified Files:
   --
   pgsql/doc/src/sgml:
   wal.sgml (r1.46 - r1.47)
   
   (http://developer.postgresql.org/cvsweb.cgi/pgsql/doc/src/sgml/wal.sgml?r1=1.46r2=1.47)
  
  Should this mention that you don't need to turn it off at the disk level if
  you use fsync_writethrough? 
 
 Uh, I remember we looked at this checkbox before but I don't remember
 the details, and I can't find a comment about it.  Was the issue that
 writethrough always forces through the disk cache?  Is that the default
 on Win32?  Did we comment this somewhere?

If you set it to fsync or fsync_writethrough it will write through the
cache. (fsync is just an alias)
If you set it to OPEN_DATASYNC, it will respond to the checkbox you are
referring to.

OPEN_DATASYNC is the default, IIRC.
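
For reference, a sketch of how one would check and change this setting (the
GUC in question is wal_sync_method; it is changed via postgresql.conf plus a
reload, not per session, and the default noted below is as reported above):

SHOW wal_sync_method;    -- open_datasync is reportedly the default on Windows
-- To force WAL writes through the disk cache without touching the drive's
-- own cache setting, put this in postgresql.conf and reload:
--     wal_sync_method = fsync_writethrough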

//Magnus



Re: [HACKERS] [BUGS] BUG #3799: csvlog skips some logs

2007-12-11 Thread Alvaro Herrera

Another problem I just noticed is that it seems the bgwriter is
inheriting the session id from Postmaster; it doesn't have one of its
own.

-- 
Alvaro Herrera  Developer, http://www.PostgreSQL.org/
People get annoyed when you try to debug them.  (Larry Wall)



Re: [HACKERS] [BUGS] BUG #3799: csvlog skips some logs

2007-12-11 Thread Alvaro Herrera
Alvaro Herrera wrote:
 
 Another problem I just noticed is that it seems the bgwriter is
 inheriting the session id from Postmaster; it doesn't have one of its
 own.

Huh, sorry, I'm just blind, I neglected to look at the last digit ;-)

-- 
Alvaro Herrera   Valdivia, Chile   ICBM: S 39º 49' 18.1, W 73º 13' 56.4
I must say, I am absolutely impressed with what pgsql's implementation of
VALUES allows me to do. It's kind of ridiculous how much work goes away in
my code.  Too bad I can't do this at work (Oracle 8/9).   (Tom Allison)
   http://archives.postgresql.org/pgsql-general/2007-06/msg00016.php



Re: [HACKERS] VACUUM ANALYZE out of memory

2007-12-11 Thread Michael Akinde

Martijn van Oosterhout wrote:

IIRC you said you're on a 32-bit architecture? Which means any single
process only has 4GB address space. Take off 1GB for the kernel, 1GB
shared memory, 1 GB maintainence workmem and a collection of libraries,
stack space and general memory fragmentation and I can absolutly
beleive you've run into the limit of *address* space.
  
Should have been 64-bit, but a foul-up means it is running in 32-bit at 
the moment.

On a 64-bit machine it doesn't matter so much but on a 32-bit machine
using 1GB for shared memory severely cuts the amount of auxilliary
memory the server can use. Unless you've shown a measuable difference
between 256MB and 1G shared memory, I'd say you're better off using the
smaller amount so you can have higher maintainence work mem.
  
We're still in the process of testing and tuning (which takes its sweet 
time), so at the moment I cannot tell what benefits we get from the 
different settings in practice. But I'll try to set shared_buffers down 
to 128-256 MB and maintenance_work_mem to 512-1024 MB when I next 
have a time slot where I can run the server into the ground.


However, the problem also occurred with the shared_buffers limit set at 
24 MB and maintenance_work_mem was at its default setting (16 MB?), so I 
would be rather surprised if the problem did not repeat itself.


Regards,

Michael Akinde
Database Architect, met.no





Re: [HACKERS] Document how to turn off disk write cache on popular operating

2007-12-11 Thread Tom Lane
Magnus Hagander [EMAIL PROTECTED] writes:
 If you set it to fsync or fsync_writethrough it will write through the
 cache.

Really?  How much should we trust that?

regards, tom lane



Re: [HACKERS] Document how to turn off disk write cache on popular operating

2007-12-11 Thread Magnus Hagander
On Tue, Dec 11, 2007 at 10:09:51AM -0500, Tom Lane wrote:
 Magnus Hagander [EMAIL PROTECTED] writes:
  If you set it to fsync or fsync_writethrough it will write through the
  cache.
 
 Really?  How much should we trust that?

I'd say as much as we should trust that checkbox in the Windows settings
page...

I haven't come across a case yet where it doesn't work, but that doesn't
mean there isn't one. It certainly writes through the writeback cache of
shiny expensive RAID controllers :-) And it does write through the IDE
drives that I've tested.

Bottom line is, I think that in the cases where fsync_writethrough doesn't do
it, that checkbox isn't going to work either.

//Magnus



Re: [HACKERS] WORM and Read Only Tables (v0.1)

2007-12-11 Thread Simon Riggs
On Tue, 2007-12-11 at 11:49 +, Gregory Stark wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
 
  So... VACUUM FREEZE table SET READ ONLY;
 
  would be my first thought, but I'm guessing everybody will press me
  towards supporting the more obvious
 
  ALTER TABLE table SET READ ONLY;
 
  This command will place a ShareLock (only) on the table, preventing
  anybody from writing to the table while we freeze it. The ShareLock is
  incompatible with any transaction that has written to the table, so when
  we acquire the lock all writers to the table will have completed. We
  then run the equivalent of a VACUUM FREEZE which will then be able to
  freeze *all* rows in one pass (rather than all except the most recent).
  On completion of the freeze pass we will then update the pg_class entry
  to show that it is now read-only, so we will emulate the way VACUUM does
  this.
 
 To be clear it if it meets a block for which a tuple is not freezable -- that
 is, it has an xmin or xmax more recent than the global xmin then it needs to
 block waiting for the backend which that recent xmin. Then presumably it needs
 to update its concept of recent global xmin going forward.
 
 You might be best off grabbing a list of txid-xmin when you start and sorting
 them by xmin so you can loop through them sleeping until you reach the first
 txid with an xmin large enough to continue.

D'oh. Completely agreed. Mea culpa.

I had that bit in my original design, but I was clearly looking elsewhere on
this one. I'd been trying to think about how to do this since about
2 years ago, and it was only the CREATE INDEX CONCURRENTLY stuff that
showed me how. Thanks for nudging me.

  Reversing the process is simpler, since we only have to turn off the
  flag in pg_class:
 
 I'm not sure how this interacts with:
 
  Requests for tuple-level SHARE locks (e.g. FOR SHARE) against read-only
  tables will be ignored, since they are effectively already there. So we
  don't need to change the internals of the locking, nor edit the RI code
  to remove the call to SHARE lock referenced tables. Do this during
  post-parse analysis.
 
 Since queries which think they hold FOR SHARE tuple locks will be magically
 losing their share locks if you turn off the read-only flag. Do you need to
 obtain an exclusive lock on the table to turn it read-write?

Agreed. I wasn't suggesting implementing it without one, just noting that it
might have been possible, but it seems not, as you say. I don't think it's
important to be able to do that with less than an AccessExclusiveLock.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com




Re: [HACKERS] WORM and Read Only Tables (v0.1)

2007-12-11 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 This command will place a ShareLock (only) on the table, preventing
 anybody from writing to the table while we freeze it. The ShareLock is
 incompatible with any transaction that has written to the table, so when
 we acquire the lock all writers to the table will have completed. We
 then run the equivalent of a VACUUM FREEZE which will then be able to
 freeze *all* rows in one pass (rather than all except the most
 recent).

This breaks MVCC.  The fact that a transaction has completed is not
license to discard tuple xmin immediately.

regards, tom lane



Re: [HACKERS] [BUGS] BUG #3799: csvlog skips some logs

2007-12-11 Thread Alvaro Herrera
Tom Lane wrote:
 Alvaro Herrera [EMAIL PROTECTED] writes:
  Another change I did was to change a %.*s to %*s.  The precision
  marker seems useless AFAICT.
 
 This is wrong, broken, will cause crashes on platforms where the PS
 string is not null-terminated.  (Hint: .* is a maximum width, * is a
 minimum width.)

Oh, OK.

 Another thing I thought strange was the order of the added columns;
 why isn't it more like the order in which they appear in the text
 version?  In particular hint should probably come just after detail
 and before context, and internal-query should also come before context
 because when relevant it's usually more closely nested than the context
 stack.

Ok, I changed it like you suggest.  I didn't do any other order changes.
I still propose that the log line number should be moved w.r.t. session
identifier.

I changed two more things: the VXID is not reported if not in a backend
(because AuxiliaryProcesses are said to never have one), and added
quotes surrounding the hostname, because in a test here it seems
supported to create an alias for my loopback interface with a name like
"a,b".

-- 
Alvaro Herrera http://www.amazon.com/gp/registry/DXLWNGRJD34J
All rings of power are equal,
But some rings of power are more equal than others.
 (George Orwell's The Lord of the Rings)

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] PGparam proposal

2007-12-11 Thread Andrew Chernow
For starters, if binary results is a feature you wish you could uninvent 
then we are probably dead in the water for that reason.  This goes to 
the core of our concept.  If there is no desire to synch client & server 
in regards to type handling, then this is a waste of time.  I think all of 
this would make libpq more powerful.  With that said, my follow up:


 I think a printf-style API is fundamentally a bad idea in this
 context. printf only works well when the set of concepts (datatypes,
 format specifiers, etc) is small and fixed;

As of now, including numbers and alpha chars, there are 62 classes of 
which we used 11 (a class is the first char).  That leaves 51 classes 
with 62 types per class.  Where I would agree with you, is that over 
time things can become cryptic.


There are only a few ways to do this.

1. Use Oids - doesn't work because they are not publicly exposed by libpq.

2. Create a function for each type.  This idea was rejected because it 
bloated the API (I'll buy this).


3. Type aliasing schema - naturally, printf-style comes to mind but may 
become cryptic.


I should mention that we were only trying to support built-in pgtypes in 
libpq.  If you are looking to support all external types, then we 
propose the below:


For #3, maybe it would be better to abandon %c or %cc type encoding and 
move into something more verbose.  We could just spell out the type: 
%int4, %point, etc...  Maybe for built-in types you could prepend 'pg': 
%pgint4, %pgpoint. This opens up the namespace and removes scalability 
and cryptic issues.


Expanding on %pgint4 idea, 3rd party types can supply their own %typname 
handlers (a more modular approach).  You can install them at runtime, 
PQinstallTypeHandler(typname, etc..), or something like that.  When a 
3rd party %typname is encountered, the appropriate handler would be 
used.  Standard pgtypes would be installed by default.


PQinstallTypeHandler(... gisbox2d ...);
PQputf(... %gisbox2d %pgpolygon %pgint4 ...);
//PQgetf would use the same %typname

The only thing libpq should support by default is the built-in pgtypes. 
 A handler can expand on this.


 I find the idea of embedding state like that into the PGconn to be
 pretty horrid, as well.  It makes the design non-reentrant

 You can't just randomly change the behavior of existing API functions.

Okay.  We initially had the PGparam as a public opaque, but changed it. 
 We will stop piggybacking off the existing parameterized functions. 
Instead, we will supply a PGparam exec/send functions.


typedef struct pg_param PGparam;//opaque

param = PQparamCreate(conn);
PQputf(param, "%pgint4 %pgtimestamptz", 62, tstz);
res = PQparamExec(conn, param, command, resfmt);
//PQparamExec will always PQparamClear(param), whether it failed or not
//That means after an exec/send, the param object is ready for puts
PQparamFinish(param); // free it

// This causes the below sequence of function calls:
// PQparamCreate, PQputf, PQexecParams(... VALUES ($1)), PQparamFinish
res = PQparamExecf(conn, resfmt, "INSERT INTO t VALUES (%pgint)", 62);

 * the 8.2-to-8.3 change in the width of type money

We have code comments throughout the patches, as well as documented in 
the release notes.  At this point, we solved getf by checking 
PQgetlength.  If it's 4, read 4 bytes, otherwise read 8.  For putf, we would have 
to check the server version.
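
A minimal sketch of that length check on the getf side, assuming the field was fetched in binary format (the 8-byte composition is illustrative):

#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>		/* ntohl */
#include <libpq-fe.h>

/*
 * Sketch: decode a binary "money" field by looking at its wire length.
 * An 8.2 server sends 4 bytes, 8.3 sends 8; PQgetlength() tells us which.
 */
static int64_t
money_from_result(const PGresult *res, int tup, int fld)
{
	const char *raw = PQgetvalue(res, tup, fld);
	int			len = PQgetlength(res, tup, fld);

	if (len == 4)
	{
		uint32_t	v;

		memcpy(&v, raw, 4);
		return (int32_t) ntohl(v);		/* old 4-byte format */
	}
	else
	{
		uint32_t	hi, lo;

		memcpy(&hi, raw, 4);
		memcpy(&lo, raw + 4, 4);
		return (int64_t) (((uint64_t) ntohl(hi) << 32) | ntohl(lo));
	}
}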


 * the likely future change to type timestamptz to store original
   timezone explicitly

We would have to change how timestamptz handles the binary format from 
that version forward, looks like a switch on sversion for back+forwards 
compatibility.


 * the likely future change to type text to store encoding/collation
   info explicitly

Would the server do text conversion and then pass the converted text 
back to the client?  Or, would it choose a common encoding like UTF-8 
and return the text with encoding and let the client convert?  How does 
this affect text format?


In the end, some of these changes would change the text format right? 
That would push these changes into the API user's lap, to parse and fuss 
with.  I just think it is cleaner to synch the binary and/or text 
formats with the server.  If you are looking for ways to change the 
binary/text format of types w/o having to make the most recent clients 
aware of this, then I think we have lost this battle.


Another solution is revamping utils/adt so that it is a shared API for 
client & server.  If you upgrade a client, you would automatically get 
the latest formatting functions.  Just like libpq checks protocol 
version in several places, conn->sversion would have to be checked, or 
maybe have a typefmt_api_version.
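
A minimal sketch of such a client-side gate, using libpq's existing PQserverVersion() call (the 80400 cutoff and the idea of a per-type format revision are purely hypothetical):

#include <libpq-fe.h>

/*
 * Sketch: choose a type-format revision based on the connected server,
 * in the spirit of the conn->sversion checks libpq already does for
 * protocol details.  The cutoff below is hypothetical.
 */
static int
timestamptz_format_revision(const PGconn *conn)
{
	int			sversion = PQserverVersion(conn);	/* e.g. 80300 for 8.3.0 */

	return (sversion >= 80400) ? 2 : 1;
}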


andrew & merlin


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] WORM and Read Only Tables (v0.1)

2007-12-11 Thread Simon Riggs
On Tue, 2007-12-11 at 10:19 -0500, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  This command will place a ShareLock (only) on the table, preventing
  anybody from writing to the table while we freeze it. The ShareLock is
  incompatible with any transaction that has written to the table, so when
  we acquire the lock all writers to the table will have completed. We
  then run the equivalent of a VACUUM FREEZE which will then be able to
  freeze *all* rows in one pass (rather than all except the most
  recent).
 
 This breaks MVCC.  The fact that a transaction has completed is not
 license to discard tuple xmin immediately.

Yeh, agreed. I knew I'd solved that bit, so I was focused elsewhere.
Sloppy, so apologies.

I was originally planning to put a wait in at the beginning, as is used
by CREATE INDEX CONCURRENTLY, though I prefer Greg's variant because
it's more forgiving.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] WORM and Read Only Tables (v0.1)

2007-12-11 Thread Simon Riggs
On Tue, 2007-12-11 at 13:44 +0100, Csaba Nagy wrote:
 On Tue, 2007-12-11 at 11:12 +, Simon Riggs wrote:
  Features
  - Read Only Tables
  - Compressed Tablespaces
 
 I wonder if instead of read-only tables wouldn't it be better to have
 some kind of automatic partitioning 

That's definitely on my list of requirements for partitioning.

 which permits to have different
 chunks of the table data in different tablespaces, and a freeze command
 which effectively moves the data from the (normally small) active chunk
 to the archive chunk when it's transaction id is older than a predefined
 threshold ?

As Hannu says, this is exactly what the other features will allow, so
Yes!

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [BUGS] BUG #3799: csvlog skips some logs

2007-12-11 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 I still propose that the log line number should be moved w.r.t. session
 identifier.

No objection here.

 I changed two more things: the VXID is not reported if not in a backend
 (because AuxiliaryProcesses are said to never have one), and added
 quotes surrounding the hostname, because in a test here it seems
 supported to create an alias for my loopback interface with a name like
 a,b.

Sounds reasonable, as long as autovac processes still report VXID.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] archive_command failures report confusing exit status

2007-12-11 Thread Peter Eisentraut
Am Montag, 10. Dezember 2007 schrieb Alvaro Herrera:
 Peter Eisentraut wrote:
  I figured it would make sense if pgarch.c used the same mechanism that
  postmaster.c uses to report the various variants of regular and signal
  exits.

 Hmm.  Getting rid of the (PID 0) is going to be a mess enough for
 translations that I think it is worth pgarch.c having its own routine
 for this.  Furthermore I think the detailed archive command should be
 reported in an errdetail() field, which makes it even farther off.

Better patch.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/
diff -cr ../cvs-pgsql/src/backend/postmaster/pgarch.c ./src/backend/postmaster/pgarch.c
*** ../cvs-pgsql/src/backend/postmaster/pgarch.c	2007-11-26 13:29:36.0 +0100
--- ./src/backend/postmaster/pgarch.c	2007-12-11 17:10:50.0 +0100
***
*** 474,494 ****
  	rc = system(xlogarchcmd);
  	if (rc != 0)
  	{
! 		/*
! 		 * If either the shell itself, or a called command, died on a signal,
! 		 * abort the archiver.	We do this because system() ignores SIGINT and
! 		 * SIGQUIT while waiting; so a signal is very likely something that
! 		 * should have interrupted us too.	If we overreact it's no big deal,
! 		 * the postmaster will just start the archiver again.
! 		 *
! 		 * Per the Single Unix Spec, shells report exit status > 128 when a
! 		 * called command died on a signal.
! 		 */
! 		bool		signaled = WIFSIGNALED(rc) || WEXITSTATUS(rc) > 128;
! 
! 		ereport(signaled ? FATAL : LOG,
! 				(errmsg("archive command \"%s\" failed: return code %d",
! 						xlogarchcmd, rc)));
  
  		return false;
  	}
--- 474,514 ----
  	rc = system(xlogarchcmd);
  	if (rc != 0)
  	{
! 		if (WIFEXITED(rc))
! 		{
! 			ereport(LOG,
! 					(errmsg("archive command failed with exit code %d", WEXITSTATUS(rc)),
! 					 errdetail("The archive command was \"%s\".", xlogarchcmd)));
! 		}
! 		else if (WIFSIGNALED(rc))
! 		{
! 			/*
! 			 * If either the shell itself, or a called command, died
! 			 * on a signal, abort the archiver. We do this because
! 			 * system() ignores SIGINT and SIGQUIT while waiting; so a
! 			 * signal is very likely something that should have
! 			 * interrupted us too. If we overreact it's no big deal,
! 			 * the postmaster will just start the archiver again.
! 			 */
! 			ereport(FATAL, (
! #if defined(WIN32)
! 		errmsg("archive command was terminated by exception 0x%X", WTERMSIG(rc)),
! 		errhint("See C include file \"ntstatus.h\" for a description of the hexadecimal value."),
! #elif defined(HAVE_DECL_SYS_SIGLIST) && HAVE_DECL_SYS_SIGLIST
! 		errmsg("archive command was terminated by signal %d: %s",
! 			   WTERMSIG(rc),
! 			   WTERMSIG(rc) < NSIG ? sys_siglist[WTERMSIG(rc)] : "(unknown)"),
! #else
! 		errmsg("archive command was terminated by signal %d", WTERMSIG(exitstatus)),
! #endif
! 		errdetail("The archive command was \"%s\".", xlogarchcmd)));
! 		}
! 		else
! 		{
! 			ereport(LOG,
! 	(errmsg("archive command exited with unrecognized status %d", rc),
! 	 errdetail("The archive command was \"%s\".", xlogarchcmd)));
! 		}
  
  		return false;
  	}

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] PGparam proposal

2007-12-11 Thread Tom Lane
Andrew Chernow [EMAIL PROTECTED] writes:
 For starters, if binary results is a feature you wish you could uninvent 
 then we are probably dead in the water for that reason.  This goes to 
 the core of our concept.

Not really: AFAICS you could implement exactly the API you have sketched
without any reliance on binary data transmission whatsoever.  As long
as PGparam is an opaque struct, library users would have no idea whether
the values they provide are being sent as text or binary.

You should probably take two steps back and think about what aspects of
what you want to do are really involved with providing an easier-to-use
API for PQexecParams and friends, and what parts are actually interested
in binary data transmission (and why).  Separating those issues in your
mind might produce more clarity.

 In the end, some of these changes would change the text format right? 

I'd consider that fairly unlikely.  For instance, the money width change
didn't impact the text format (except to the extent that longer values
are now legal), and remembering a timestamptz's zone wouldn't impact
the text representation either.  Another example is that any significant
re-implementation of type NUMERIC (say, as a bignum integer plus
exponent instead of the current BCD-ish format) would probably change
its binary representation, but there'd be no need for a text change.

The bottom line to me is that binary representations are inherently a
lot more fragile and version-dependent than text representations.
Our attitude so far has been that client-side code that wants to use
binary data transmission is taking all the risk of changes on itself.
(If it breaks, you get to keep both pieces.)  It's not clear to me
what we gain by making libpq subject to those risks.  If we could
have libpq insulate client apps from these kinds of changes, that would
be one thing; but AFAICS, with these more complex types, a binary format
change would usually also dictate a change in what the library exposes
to clients.  As far as I saw, your proposal completely glossed over the
issue of exactly what data structure would be exposed to clients for
anything more complex than an integer.  I'm afraid that that structure
would be subject to change, and then we'd just have two layers of
brokenness on our hands instead of only one.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


[HACKERS] Why my C function is called twice?

2007-12-11 Thread Billow Gao
A very simple C function which I copied from the manual.
And I found that it's called twice.

Even in the function:

if (SRF_IS_FIRSTCALL()) {
ereport(INFO, (errcode(ERRCODE_IO_ERROR), errmsg("1")));
}

An example output. You will find two "INFO:  1" lines there.

Why is the function called twice, and how can I prevent this problem?

Thanks

Billow

test=# select * from retcomposite(1,48);
INFO:  1
INFO:  2
INFO:  3
INFO:  1
INFO:  4
INFO:  2
INFO:  2
INFO:  5
INFO:  3
INFO:  4
INFO:  2
INFO:  5
 f1 | f2 | f3
++-
 48 | 96 | 144
(1 row)



===
Code..
===
/*
CREATE OR REPLACE FUNCTION retcomposite(integer, integer)
RETURNS SETOF __retcomposite
AS 'test.so', 'retcomposite'
LANGUAGE C IMMUTABLE STRICT;
*/

// PostgreSQL includes
#include "postgres.h"
#include "fmgr.h"

// Tuple building functions and macros
#include "access/heapam.h"
#include "funcapi.h"
#include "utils/builtins.h"

#include <sys/socket.h>
#include <fcntl.h>
#include <errno.h>
#include <sys/select.h>

#define _textout(str) DatumGetPointer(DirectFunctionCall1(textout, \
	PointerGetDatum(str)))

#ifdef PG_MODULE_MAGIC
PG_MODULE_MAGIC;
#endif


PG_FUNCTION_INFO_V1(retcomposite);

Datum
retcomposite(PG_FUNCTION_ARGS)
{
  FuncCallContext *funcctx;
  int call_cntr;
  int max_calls;
  TupleDesc tupdesc;
  AttInMetadata *attinmeta;

  /* stuff done only on the first call of the function */
  if (SRF_IS_FIRSTCALL()) {
MemoryContext oldcontext;

    ereport(INFO, (errcode(ERRCODE_IO_ERROR), errmsg("1")));

/* create a function context for cross-call persistence
 */
funcctx = SRF_FIRSTCALL_INIT();

/* switch to memory context appropriate for multiple
   function calls */
    oldcontext =
        MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);

/* total number of tuples to be returned */
    funcctx->max_calls = PG_GETARG_UINT32(0);

/* Build a tuple descriptor for our result type */
    if (get_call_result_type(fcinfo, NULL, &tupdesc) !=
TYPEFUNC_COMPOSITE)
      ereport(ERROR,
              (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
               errmsg("function returning record called in context "
                      "that cannot accept type record")));

/* generate attribute metadata needed later to produce
   tuples from raw C strings */
attinmeta = TupleDescGetAttInMetadata(tupdesc);
    funcctx->attinmeta = attinmeta;

MemoryContextSwitchTo(oldcontext);
  }

  ereport(INFO, (errcode(ERRCODE_IO_ERROR), errmsg("2")));


  /* stuff done on every call of the function */
  funcctx = SRF_PERCALL_SETUP();

  call_cntr = funcctx->call_cntr;
  max_calls = funcctx->max_calls;
  attinmeta = funcctx->attinmeta;

  if (call_cntr < max_calls) {  /* do when there is more
   left to send */
char **values;
HeapTuple tuple;
Datum result;
    ereport(INFO, (errcode(ERRCODE_IO_ERROR), errmsg("3")));

/* Prepare a values array for building the returned
   tuple. This should be an array of C strings which
   will be processed later by the type input functions. */
values = (char **) palloc(3 * sizeof(char *));
values[0] = (char *) palloc(16 * sizeof(char));
values[1] = (char *) palloc(16 * sizeof(char));
values[2] = (char *) palloc(16 * sizeof(char));

    snprintf(values[0], 16, "%d", 1 * PG_GETARG_INT32(1));
    snprintf(values[1], 16, "%d", 2 * PG_GETARG_INT32(1));
    snprintf(values[2], 16, "%d", 3 * PG_GETARG_INT32(1));

/* build a tuple */
tuple = BuildTupleFromCStrings(attinmeta, values);

/* make the tuple into a datum */
result = HeapTupleGetDatum(tuple);

/* clean up (this is not really necessary) */
pfree(values[0]);
pfree(values[1]);
pfree(values[2]);
pfree(values);

    ereport(INFO, (errcode(ERRCODE_IO_ERROR), errmsg("4")));


SRF_RETURN_NEXT(funcctx, result);
  } else {  /* do when there is no more left */
    ereport(INFO, (errcode(ERRCODE_IO_ERROR), errmsg("5")));

SRF_RETURN_DONE(funcctx);
  }
}


Re: [HACKERS] archive_command failures report confusing exit status

2007-12-11 Thread Simon Riggs
On Tue, 2007-12-11 at 17:27 +0100, Peter Eisentraut wrote:
 Am Montag, 10. Dezember 2007 schrieb Alvaro Herrera:
  Peter Eisentraut wrote:
   I figured it would make sense if pgarch.c used the same mechanism that
   postmaster.c uses to report the various variants of regular and signal
   exits.
 
  Hmm.  Getting rid of the (PID 0) is going to be a mess enough for
  translations that I think it is worth pgarch.c having its own routine
  for this.  Furthermore I think the detailed archive command should be
  reported in an errdetail() field, which makes it even farther off.
 
 Better patch.

Looks much easier to understand.

I prefer "archive_command = ..." for the errdetail though. The commands
can be quite long so the extra words don't really add anything, plus
they require translation.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] archive_command failures report confusing exit status

2007-12-11 Thread Tom Lane
Peter Eisentraut [EMAIL PROTECTED] writes:
 Better patch.

Doesn't this patch break the behavior that is documented in the comment?
Specifically, the case where the restore_command dies on a signal and
this is reported to us by the controlling shell as exitcode > 128.
We want the archiver to die, but this patch makes it not do so.

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] archive_command failures report confusing exit status

2007-12-11 Thread Peter Eisentraut
Am Dienstag, 11. Dezember 2007 schrieb Tom Lane:
 Peter Eisentraut [EMAIL PROTECTED] writes:
  Better patch.

 Doesn't this patch break the behavior that is documented in the comment?
 Specifically, the case where the restore_command dies on a signal and
 this is reported to us by the controlling shell as exitcode > 128.
 We want the archiver to die, but this patch makes it not do so.

AFAICT, the coding

WIFSIGNALED(rc) || WEXITSTATUS(rc) > 128

is simply redundant, because a signal happened exactly when WIFSIGNALED(rc) is 
true.

I have tested this:

LOG:  database system was shut down at 2007-12-11 17:15:43 CET
LOG:  autovacuum launcher started
LOG:  database system is ready to accept connections
FATAL:  archive command was terminated by signal 1: Hangup
DETAIL:  The archive command was "kill -1 $$".
LOG:  archiver process (PID 22572) exited with exit code 1

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] PGparam proposal

2007-12-11 Thread Andrew Chernow

library users would have no idea whether
 the values they provide are being sent as text or binary.

The putf interface currently abstracts how it actually sends it. 
Although, you do put a C type rather than a string.  There is a putstr 
%pqtypstr, which puts the string representation of a type.


 providing an easier-to-use
 API for PQexecParams and friends, and what parts are actually
 interested in binary data transmission (and why)

There are two things happening here and two things we are trying to solve:

1. putf and PGparam - simpler interface for executing queries using 
parameterized functions.


2. getf - doesn't use PGparam at all.  You pass it a PGresult.  The goal 
of this is to translate either text or binary results into a uniform C 
type or structure.  For instance, we expose the below structure for inet.


/* This struct works with CIDR as well. */
typedef struct
{
  /* convenience, value the same as first 2 bytes of sa_buf */
  unsigned short sa_family;

  /* mask, ie. /24 */
  int mask;
  int is_cidr;

  /* Cast to: sockaddr, sockaddr_in, sockaddr_in6, sockaddr_storage */
  int sa_len;
  char sa_buf[128]; /* avoid referencing sockaddr structs */
} PGinet;

// res could be text or binary results, but PGinet remains the same.
PGinet inet;
PQgetf(res, tup_num, "%pginet", field_num, &inet);
connect(sock,
  (const struct sockaddr *)inet.sa_buf,
  (socklen_t)inet.sa_len);

The above is simpler than examining 10.0.0.1/32 and converting it to a 
struct sockaddr.  The same struct is used when putting a type as a C 
type rather than as a string.  You can use getf without ever using 
putf+PGparam, and vice versa.


 your proposal completely glossed over the
 issue of exactly what data structure would be exposed to clients for
 anything more complex than an integer

Complex types require a receiving structure on the client side, thus the 
types we added to libpq-fe.h: PGinet, PGpolygon, PGarray, PGtimestamp, 
etc...  But keep in mind that these structs are the result of libpq 
extracting the data from text/binary formats and assigning data to 
struct members.  It does not expose raw format, the API user can already 
get that via PQgetvalue().


 If we could have libpq insulate client apps from these kinds
 of changes, that would be one thing;

This is the goal of getf.  The API user interfaces through PGinet, not 
through the output of PQgetvalue().  We propose that its libpq's job 
internally to handle changes to text or binary formats and expose 
consistent types/structures.


 type NUMERIC (say, as a bignum integer plus
 exponent instead of the current BCD-ish format)

This is what we want to hide inside libpq's getf.  Just expose a 
PGnumeric that has been translated into c types.  We never expose the 
wire format, only the C translated version of it.


 without any reliance on binary data transmission whatsoever.

Yes this is possible, but at a performance & ease-of-use cost.  Our 
performance tests didn't show much gain with strings or ints, but 
complex types were 10 times faster (putting/getting arrays, polygons, 
paths, etc...).  So, the trade-off is a little more maintenance 
overhead on libpq.


BTW, previously we mentioned a 3rd party handler api concept.  This is 
not needed to get libpq up and going with built-in types (which is all 
we feel it should be responsible for).  The handler API can always be 
added later w/o changing existing functions ... have to add a couple though.


andrew & merlin



---(end of broadcast)---
TIP 6: explain analyze is your friend


[HACKERS] psql \dFp's behavior

2007-12-11 Thread Guillaume Lelarge
Hi all,

I'm not sure psql handles \dFp the right way. The query allows
translators to translate some columns' values but forgets to escape the
strings. So, here is a patch that escapes these translated strings.

Regards.


-- 
Guillaume.
 http://www.postgresqlfr.org
 http://dalibo.com
Index: src/bin/psql/describe.c
===
RCS file: /projects/cvsroot/pgsql/src/bin/psql/describe.c,v
retrieving revision 1.162
diff -c -r1.162 describe.c
*** src/bin/psql/describe.c	15 Nov 2007 21:14:42 -	1.162
--- src/bin/psql/describe.c	11 Dec 2007 18:04:09 -
***
*** 2069,2112 ****
  	printQueryOpt myopt = pset.popt;
  
  	initPQExpBuffer(&buf);
! 
! 	printfPQExpBuffer(&buf,
! 					  "SELECT '%s' AS \"%s\", \n"
  					  "   p.prsstart::pg_catalog.regproc AS \"%s\", \n"
! 					  "   pg_catalog.obj_description(p.prsstart, 'pg_proc') as \"%s\" \n"
  					  " FROM pg_catalog.pg_ts_parser p \n"
  					  " WHERE p.oid = '%s' \n"
  					  "UNION ALL \n"
! 					  "SELECT '%s', \n"
  					  "   p.prstoken::pg_catalog.regproc, \n"
  					  "   pg_catalog.obj_description(p.prstoken, 'pg_proc') \n"
  					  " FROM pg_catalog.pg_ts_parser p \n"
  					  " WHERE p.oid = '%s' \n"
  					  "UNION ALL \n"
! 					  "SELECT '%s', \n"
  					  "   p.prsend::pg_catalog.regproc, \n"
  					  "   pg_catalog.obj_description(p.prsend, 'pg_proc') \n"
  					  " FROM pg_catalog.pg_ts_parser p \n"
  					  " WHERE p.oid = '%s' \n"
  					  "UNION ALL \n"
! 					  "SELECT '%s', \n"
  					  "   p.prsheadline::pg_catalog.regproc, \n"
  					  "   pg_catalog.obj_description(p.prsheadline, 'pg_proc') \n"
  					  " FROM pg_catalog.pg_ts_parser p \n"
  					  " WHERE p.oid = '%s' \n"
  					  "UNION ALL \n"
! 					  "SELECT '%s', \n"
  					  "   p.prslextype::pg_catalog.regproc, \n"
  					  "   pg_catalog.obj_description(p.prslextype, 'pg_proc') \n"
  					  " FROM pg_catalog.pg_ts_parser p \n"
! 					  " WHERE p.oid = '%s' \n",
! 					  _("Start parse"),
! 					  _("Method"), _("Function"), _("Description"),
! 					  oid,
! 					  _("Get next token"), oid,
! 					  _("End parse"), oid,
! 					  _("Get headline"), oid,
! 					  _("Get token types"), oid
  		);
  
  	res = PSQLexec(buf.data, false);
--- 2069,2114 ----
  	printQueryOpt myopt = pset.popt;
  
  	initPQExpBuffer(&buf);
! 	appendPQExpBuffer(&buf, "SELECT ");
! 	appendStringLiteralConn(&buf, _("Start parse"), pset.db);
! 	appendPQExpBuffer(&buf, " AS \"%s\", \n"
  					  "   p.prsstart::pg_catalog.regproc AS \"%s\", \n"
! 					  "   pg_catalog.obj_description(p.prsstart, 'pg_proc') AS \"%s\" \n"
  					  " FROM pg_catalog.pg_ts_parser p \n"
  					  " WHERE p.oid = '%s' \n"
  					  "UNION ALL \n"
! 					  "SELECT ",
! 					  _("Method"), _("Function"), _("Description"), oid);
! 	appendStringLiteralConn(&buf, _("Get next token"), pset.db);
! 	appendPQExpBuffer(&buf, ", \n"
  					  "   p.prstoken::pg_catalog.regproc, \n"
  					  "   pg_catalog.obj_description(p.prstoken, 'pg_proc') \n"
  					  " FROM pg_catalog.pg_ts_parser p \n"
  					  " WHERE p.oid = '%s' \n"
  					  "UNION ALL \n"
! 					  "SELECT ", oid);
! 	appendStringLiteralConn(&buf, _("End parse"), pset.db);
! 	appendPQExpBuffer(&buf, ", \n"
  					  "   p.prsend::pg_catalog.regproc, \n"
  					  "   pg_catalog.obj_description(p.prsend, 'pg_proc') \n"
  					  " FROM pg_catalog.pg_ts_parser p \n"
  					  " WHERE p.oid = '%s' \n"
  					  "UNION ALL \n"
! 					  "SELECT ", oid);
! 	appendStringLiteralConn(&buf, _("Get headline"), pset.db);
! 	appendPQExpBuffer(&buf, ", \n"
  					  "   p.prsheadline::pg_catalog.regproc, \n"
  					  "   pg_catalog.obj_description(p.prsheadline, 'pg_proc') \n"
  					  " FROM pg_catalog.pg_ts_parser p \n"
  					  " WHERE p.oid = '%s' \n"
  					  "UNION ALL \n"
! 					  "SELECT ", oid);
! 	appendStringLiteralConn(&buf, _("Get token types"), pset.db);
! 	appendPQExpBuffer(&buf, ", \n"
  					  "   p.prslextype::pg_catalog.regproc, \n"
  					  "   pg_catalog.obj_description(p.prslextype, 'pg_proc') \n"
  					  " FROM pg_catalog.pg_ts_parser p \n"
! 					  " WHERE p.oid = '%s' \n", oid
  		);
  
  	res = PSQLexec(buf.data, false);

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] VLDB Features

2007-12-11 Thread Josh Berkus
Simon.

 VLDB Features I'm expecting to work on are
 - Read Only Tables/WORM tables
 - Advanced Partitioning
 - Compression
 plus related performance features

Just so you don't lose sight of it, one of the biggest VLDB features we're 
missing is fault-tolerant bulk load.  Unfortunately, I don't know anyone 
who's working on it.

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] PGparam proposal

2007-12-11 Thread Andrew Chernow

 your proposal completely glossed over the
 issue of exactly what data structure would be exposed to clients for
 anything more complex than an integer

Yeah.  Forgot to include the below in the proposal and the last email. 
Here is the latest list for complex types.  Most have been around since 
the last 0.5 patch.  Still need numeric, date, time and a couple others.


typedef struct
{
  double x;
  double y;
} PGpoint;

typedef struct
{
  PGpoint pts[2];
} PGlseg;

typedef struct
{
  PGpoint high;
  PGpoint low;
} PGbox;

typedef struct
{
  PGpoint center;
  double radius;
} PGcircle;

/* When used with PQgetf, 'pts' must be freed with PQfreemem(). */
typedef struct
{
  int npts;
  int closed;
  PGpoint *pts;
} PGpath;

/* When used with PQgetf, 'pts' must be freed with PQfreemem(). */
typedef struct
{
  int npts;
  PGpoint *pts;
} PGpolygon;

/* This struct works with CIDR as well. */
typedef struct
{
  /* here for convenience, value the same as first 2 bytes of sa_buf */
  unsigned short sa_family;

  /* mask, ie. /24 */
  int mask;
  int is_cidr;

  /* Cast to: sockaddr, sockaddr_in, sockaddr_in6, sockaddr_storage */
  int sa_len;
  char sa_buf[128];
} PGinet;

typedef struct
{
  int a;
  int b;
  int c;
  int d;
  int e;
  int f;
} PGmacaddr;

/* main problem with this type is that it can be a double
 * or int64 and that is a compile-time decision.  This means
 * the client has a 50% chance of getting it wrong.  It would
 * be nice if the server included an indicator or converted
 * the external format to one or the other.  OR, make client
 * aware of server's timestamp encoding when it connects.
 */
typedef struct
{
  /* When non-zero, this is a TIMESTAMP WITH TIME ZONE.  When zero,
   * this is a TIMESTAMP WITHOUT TIME ZONE.  When WITHOUT, gmtoff
   * will always be 0 and tzn will be GMT.
   */
  int withtz;

  /* binary timestamp from server (in host order).  If haveint64 is
   * non-zero, use the 'ts.asint64' value otherwise use 'ts.asdouble'.
   */
  int haveint64;
  union
  {
struct {
  unsigned int a;
  signed int b;
} asint64;

double asdouble;
  } ts;

  /* microseconds */
  int usec;
  /* GMT offset, some systems don't have this in struct tm */
  int gmtoff;
  /* time zone name, some systems don't have this in struct tm */
  char *tzn;
  /* broken-down time */
  struct tm tm;
} PGtimestamp;


/* still working on this, may need access macros */

typedef struct
{
  int dim;
  int lbound;
} PGarraydim;

typedef struct
{
  int len;

  /* already in PGtype format.  For instance:
   *   (PGpolygon *)arr->items[i].data
   * or
   *   printf("text=%s\n", arr->items[i].data);
   *
   * Could have a union of all types here but that
   * doesn't help much for 3rd party types.
   */
  char *data;
} PGarrayitem;

typedef struct
{
  Oid oid; /* type of items[].data */
  int ndim;
  PGarraydim dims[MAXDIM];
  int nitems;
  PGarrayitem *items;
} PGarray;

andrew & merlin

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] WORM and Read Only Tables (v0.1)

2007-12-11 Thread Andrew Sullivan
On Tue, Dec 11, 2007 at 11:12:46AM +, Simon Riggs wrote:
 
 Read-Only Tables
 

In the past when this topic came up, there was some discussion of doing this
at a level somewhere below the table horizon.  There are a number of nasty
limitations for partitions currently (not the least of which is that real
uniqueness guarantees are impractical), so allowing users to specify some
segment of the table to be read only without imposing it on the whole
table would be awful nice.  I seem to recall Jan had an idea on how to do
it, but I could be wrong.

Also, doing this at the tuple, rather than table-wide, level might lead to
additional capabilities in this area:

 Attach
 --
 
 Writing tables on one system and then moving that data to other systems
 is fairly common. If we supported read-only tables then you might
 consider how you would publish new versions to people.

Some time ago I was speculating on pie-in-the-sky features I might like in
Postgres, and it was something like this attach.  But the idea was somehow
related to the read-only tuples.

In my specific case, I have piles and piles of mostly useless data.
Sometimes, however, some of that data is possibly useful in retrospect.  So
the suggestion was to have tables that could be mostly offline -- archived
somewhere -- but for which we had enough metadata online to say, "You have
some data that might match in catalog C.  Go mount it, and I'll check."  I
think this is subtly different from the attach case you're outlining?

A

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] archive_command failures report confusing exit status

2007-12-11 Thread Tom Lane
Peter Eisentraut [EMAIL PROTECTED] writes:
 Am Dienstag, 11. Dezember 2007 schrieb Tom Lane:
 Doesn't this patch break the behavior that is documented in the comment?
 Specifically, the case where the restore_command dies on a signal and
 this is reported to us by the controlling shell as exitcode  128.
 We want the archiver to die, but this patch makes it not do so.

 AFAICT, the coding

 WIFSIGNALED(rc) || WEXITSTATUS(rc)  128

 is simply redundant, because a signal happened exactly when WIFSIGNALED(rc) 
 is 
 true.

No, you are confusing the cases called "shell was killed by a signal"
and "called command was killed by a signal, which the shell then turned
around and reported to us as exit > 128".

 I have tested this:

A single test case proves little.  You need to consider race conditions
and cases where the shell is ignoring the particular signal.  I'm fairly
certain that both of these exit statuses can happen in practice, at
least with some shells/platforms.
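
For reference, a small standalone sketch, separate from either patch, of how those two cases surface in the status that system() returns:

#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/*
 * Sketch: classify the status returned by system(cmd).  The shell itself
 * dying on a signal shows up via WIFSIGNALED(); a called command dying on
 * a signal is usually reported by the shell as an exit code > 128.
 */
static void
classify(int rc)
{
	if (rc == 0)
		printf("command succeeded\n");
	else if (WIFSIGNALED(rc))
		printf("shell was killed by signal %d\n", WTERMSIG(rc));
	else if (WIFEXITED(rc) && WEXITSTATUS(rc) > 128)
		printf("called command likely died on signal %d\n",
			   WEXITSTATUS(rc) - 128);
	else if (WIFEXITED(rc))
		printf("command failed with exit code %d\n", WEXITSTATUS(rc));
	else
		printf("unrecognized status %d\n", rc);
}

int
main(void)
{
	classify(system("kill -TERM $$"));					/* shell itself dies on the signal */
	classify(system("sh -c 'kill -TERM $$'; exit $?"));	/* called command dies; shell reports > 128 */
	classify(system("exit 3"));							/* ordinary failure */
	return 0;
}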

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] psql \dFp's behavior

2007-12-11 Thread Tom Lane
Guillaume Lelarge [EMAIL PROTECTED] writes:
 I'm not sure psql handles \dFp the right way. The query allows
 translators to translate some columns' values but forgets to escape the
 strings. So, here is a patch that escapes these translated strings.

This seems mighty ugly, and it's not the way we handle any other \d
command.  Why is it needed for \dFp (and only that)?

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] psql \dFp's behavior

2007-12-11 Thread Guillaume Lelarge
Tom Lane wrote:
 Guillaume Lelarge [EMAIL PROTECTED] writes:
 I'm not sure psql handles \dFp the right way. The query allows
 translators to translate some columns' values but forgets to escape the
 strings. So, here is a patch that escapes these translated strings.
 
 This seems mighty ugly, and it's not the way we handle any other \d
 command.  Why is it needed for \dFp (and only that)?
 

Oh I didn't say only \dFp needs this kind of fix. I've found an issue
with \dFp+ today, so I'm trying to fix it. Here is the issue I found :

[EMAIL PROTECTED]:/opt/postgresql-head$ LANG=en psql postgres
Welcome to psql 8.3beta4, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
   \h for help with SQL commands
   \? for help with psql commands
   \g or terminate with semicolon to execute query
   \q to quit

Text search parser pg_catalog.default
 Method  |Function| Description
-++-
 Start parse | prsd_start |
 Get next token  | prsd_nexttoken |
 End parse   | prsd_end   |
 Get headline| prsd_headline  |
 Get token types | prsd_lextype   |

Token types for parser pg_catalog.default
[...]

OK, it works great. Now, in french :

[EMAIL PROTECTED]:/opt/postgresql-head$ LANG=fr_FR.UTF-8 psql postgres
Bienvenue dans psql 8.3beta4, l'interface interactive de PostgreSQL.

Saisissez:
\copyright pour les termes de distribution
\h pour l'aide-mémoire des commandes SQL
\? pour l'aide-mémoire des commandes psql
\g ou point-virgule en fin d'instruction pour exécuter la requête
\q pour quitter

postgres=# set lc_messages to 'C';
SET
postgres=# \dFp+
ERROR:  syntax error at or near "analyse"
LIGNE 1 : SELECT 'Début de l'analyse' AS "Méthode",
 ^

The problem here is that "Start parse" is translated with "Début de
l'analyse" (which is a bad translation but that's not the point). The
point is that the query is not protected against quotes and backslashes
and this is bad. My code is probably ugly, I trust you on this, but I
think we need to fix this. I think we have two ways to do it :
 - escaping the translated words ;
 - removing the call to gettext (so no translations for these strings).

I found \dFp but we could have the same problems with \dp because it
puts directly in the query the translations of some words (table, view,
sequence, aggregate, function, operator, datatype, rule, trigger) which
is not a problem in French but can be one in another language. \du, \dg,
\d seem to have problems too.

Regards.


-- 
Guillaume.
 http://www.postgresqlfr.org
 http://dalibo.com

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] psql \dFp's behavior

2007-12-11 Thread Tom Lane
Guillaume Lelarge [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 This seems mighty ugly, and it's not the way we handle any other \d
 command.  Why is it needed for \dFp (and only that)?

 The problem here is that "Start parse" is translated with "Début de
 l'analyse" (which is a bad translation but that's not the point).

Well, that particular issue could be fixed if the translated string
doubled the quote mark.  Which I agree is a pretty fragile solution.

describe.c's whole approach to this has always been pretty thoroughly
broken in my mind, because it makes untenable assumptions about the
client-side gettext() producing strings that are in the current
client_encoding.  If they are not, the server will probably reject
the SQL query as failing encoding verification.

We should be fixing it so that the translated strings never go to the
server and back at all.  This doesn't seem amazingly hard for column
headings --- it'd take some API additions in print.c, I think.
If we are actually embedding translated words in the data
then it'd be a bigger problem.

 I found \dFp but we could have the same problems with \dp

IIRC, *all* the \d commands do this.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


[HACKERS] Dumb question about binary cursors and #ifdef HAVE_INT64_TIMESTAMP

2007-12-11 Thread Dann Corbit
If I create a binary cursor on a recent version of PostgreSQL, how can I
tell if the timestamp data internally is an 8 byte double or an 8 byte
integer?

I see an #ifdef that changes the code path to compute timestamps as one
type or the other, but I do not know how to recognize the internal
format of the type that will be returned in a binary cursor.

How can I do that?

 



Re: [HACKERS] VLDB Features

2007-12-11 Thread Hannu Krosing

One fine day, Tue, 2007-12-11 at 10:53, Josh Berkus wrote:
 Simon.
 
  VLDB Features I'm expecting to work on are
  - Read Only Tables/WORM tables
  - Advanced Partitioning
  - Compression
  plus related performance features
 
 Just so you don't lose sight of it, one of the biggest VLDB features we're 
 missing is fault-tolerant bulk load. 

What do you mean by fault-tolerant here?

Just 

COPY ... WITH ERRORS TO ...

or something more advanced, like a bulk load which can be continued after a
crash?

--
Hannu



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] [GENERAL] Dumb question about binary cursors and #ifdefHAVE_INT64_TIMESTAMP

2007-12-11 Thread Dann Corbit
 -Original Message-
 From: Alvaro Herrera [mailto:[EMAIL PROTECTED]
 Sent: Tuesday, December 11, 2007 1:11 PM
 To: Dann Corbit
 Cc: pgsql-hackers@postgresql.org; [EMAIL PROTECTED]
 Subject: Re: [GENERAL] Dumb question about binary cursors and
 #ifdefHAVE_INT64_TIMESTAMP
 
 Dann Corbit wrote:
  If I create a binary cursor on a recent version of PostgreSQL, how
can I
  tell if the timestamp data internally is an 8 byte double or an 8
byte
  integer?
 
  I see an #ifdef that changes the code path to compute timestamps as
one
  type or the other, but I do not know how to recognize the internal
  format of the type that will be returned in a binary cursor.
 
  How can I do that?
 
 SHOW integer_timestamp;
 
 (actually, IIRC, this is one of the params that the server will send
you
 at session start).

I guess that I am supposed to check for error on the statement?  What
does it look like when the query works?

This is what I get against PostgreSQL 8.2.5 using PG Admin III query
tool:

ERROR:  unrecognized configuration parameter "integer_timestamp"

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[HACKERS] Re: [GENERAL] Dumb question about binary cursors and #ifdef HAVE_INT64_TIMESTAMP

2007-12-11 Thread Alvaro Herrera
Dann Corbit wrote:
 If I create a binary cursor on a recent version of PostgreSQL, how can I
 tell if the timestamp data internally is an 8 byte double or an 8 byte
 integer?
 
 I see an #ifdef that changes the code path to compute timestamps as one
 type or the other, but I do not know how to recognize the internal
 format of the type that will be returned in a binary cursor.
 
 How can I do that?

SHOW integer_timestamp;

(actually, IIRC, this is one of the params that the server will send you
at session start).

-- 
Alvaro Herrera http://www.flickr.com/photos/alvherre/
Y eso te lo doy firmado con mis lágrimas (Fiebre del Loco)

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [GENERAL] Dumb question about binary cursors and #ifdef HAVE_INT64_TIMESTAMP

2007-12-11 Thread Tom Lane
Dann Corbit [EMAIL PROTECTED] writes:
 If I create a binary cursor on a recent version of PostgreSQL, how can I
 tell if the timestamp data internally is an 8 byte double or an 8 byte
 integer?

PQparameterStatus(conn, "integer_datetimes")
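
A minimal sketch of checking that from libpq (the helper name is illustrative); an 8-byte binary timestamp is then microseconds in an int64 when this reports true, or seconds in a double otherwise:

#include <string.h>
#include <libpq-fe.h>

/* Sketch: true if the server sends binary timestamps as 64-bit integers. */
static int
timestamps_are_int64(const PGconn *conn)
{
	const char *val = PQparameterStatus(conn, "integer_datetimes");

	return val != NULL && strcmp(val, "on") == 0;
}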

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] VLDB Features

2007-12-11 Thread Simon Riggs
On Tue, 2007-12-11 at 10:53 -0800, Josh Berkus wrote:
 Simon.
 
  VLDB Features I'm expecting to work on are
  - Read Only Tables/WORM tables
  - Advanced Partitioning
  - Compression
  plus related performance features
 
 Just so you don't lose sight of it, one of the biggest VLDB features we're 
 missing is fault-tolerant bulk load.  Unfortunately, I don't know anyone 
 who's working on it.

Not lost sight of it; I have a design, but I have to prioritise also.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] psql \dFp's behavior

2007-12-11 Thread Guillaume Lelarge
Tom Lane wrote:
 Guillaume Lelarge [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 This seems mighty ugly, and it's not the way we handle any other \d
 command.  Why is it needed for \dFp (and only that)?
 
 The problem here is that Start parse is translated with Début de
 l'analyse (which is a bad translation but that's not the point).
 
 Well, that particular issue could be fixed if the translated string
 doubled the quote mark.  Which I agree is a pretty fragile solution.
 

Which is why I choose to look at the code and write a patch.

 describe.c's whole approach to this has always been pretty thoroughly
 broken in my mind, because it makes untenable assumptions about the
 client-side gettext() producing strings that are in the current
 client_encoding.  If they are not, the server will probably reject
 the SQL query as failing encoding verification.
 

Oh, that's true. I didn't think about that but I understand your concerns.

 We should be fixing it so that the translated strings never go to the
 server and back at all.  This doesn't seem amazingly hard for column
 headings --- it'd take some API additions in print.c, I think.
 If we are actually embedding translated words in the data
 then it'd be a bigger problem.
 

I'll take a look at this.

Thanks for your answer.

Regards.


-- 
Guillaume.
 http://www.postgresqlfr.org
 http://dalibo.com

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [GENERAL] Dumb question about binary cursors and #ifdefHAVE_INT64_TIMESTAMP

2007-12-11 Thread Dann Corbit
 -Original Message-
 From: Alvaro Herrera [mailto:[EMAIL PROTECTED]
 Sent: Tuesday, December 11, 2007 1:11 PM
 To: Dann Corbit
 Cc: pgsql-hackers@postgresql.org; [EMAIL PROTECTED]
 Subject: Re: [GENERAL] Dumb question about binary cursors and
 #ifdefHAVE_INT64_TIMESTAMP
 
 Dann Corbit wrote:
  If I create a binary cursor on a recent version of PostgreSQL, how
can I
  tell if the timestamp data internally is an 8 byte double or an 8
byte
  integer?
 
  I see an #ifdef that changes the code path to compute timestamps as
one
  type or the other, but I do not know how to recognize the internal
  format of the type that will be returned in a binary cursor.
 
  How can I do that?
 
 SHOW integer_timestamp;
 
 (actually, IIRC, this is one of the params that the server will send
you
 at session start).

Tom's post clued me in.
It's:
show integer_datetimes;

Or (in my case):
PQparameterStatus(conn, "integer_datetimes")


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] psql \dFp's behavior

2007-12-11 Thread Tom Lane
Guillaume Lelarge [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 We should be fixing it so that the translated strings never go to the
 server and back at all.  This doesn't seem amazingly hard for column
 headings --- it'd take some API additions in print.c, I think.

 I'll take a look at this.

I'm already looking ...

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] archive_command failures report confusing exit status

2007-12-11 Thread Peter Eisentraut
Simon Riggs wrote:
 I prefer archive_command = ... for the errdetail though. The commands
 can be quite long so the extra words don't really add anything, plus
 they require translation.

I did like this suggestion, but then I noticed we don't actually report the 
value of the archive_command setting but the actual command that was 
executed, with the placeholders filled out.  So I think the way I posted it 
is more correct.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


printQuery API change proposal (was Re: [HACKERS] psql \dFp's behavior)

2007-12-11 Thread Tom Lane
I wrote:
 describe.c's whole approach to this has always been pretty thoroughly
 broken in my mind, because it makes untenable assumptions about the
 client-side gettext() producing strings that are in the current
 client_encoding.  If they are not, the server will probably reject
 the SQL query as failing encoding verification.

 We should be fixing it so that the translated strings never go to the
 server and back at all.  This doesn't seem amazingly hard for column
 headings --- it'd take some API additions in print.c, I think.
 If we are actually embedding translated words in the data
 then it'd be a bigger problem.

I looked at the code a bit closer, and my vague memory was correct:
describe.c mostly uses translated strings for column headers, eg

printfPQExpBuffer(&buf,
				  "SELECT spcname AS \"%s\",\n"
				  "  pg_catalog.pg_get_userbyid(spcowner) AS \"%s\",\n"
				  "  spclocation AS \"%s\",",
				  _("Name"), _("Owner"), _("Location"));

but there are also a few places where it wants a column to contain
translated values, for example

				  "CAST(\n"
				  "  CASE c.relkind WHEN 'r' THEN '%s' WHEN 'v' THEN '%s' "
				  "WHEN 'i' THEN '%s' WHEN 'S' THEN '%s' END"
				  "  AS pg_catalog.text) as object\n"
				  ...
				  _("table"), _("view"), _("index"), _("sequence")

It would be reasonably straightforward to get rid of sending the column
headers to the server, since the underlying printTable function already
accepts column headers as a separate array argument; we could ignore
the column headers coming back from the server and just inject correctly
translated strings instead.  However the data values are a bit harder.

What I'm tempted to do is add a couple of optional fields to struct
printQueryOpt that specify translatable strings in column headers and
column contents, respectively:

	bool		translate_headers;
	bool	   *translate_columns;	/* translate_columns[i-1] applies to column i */

If these are set then printQuery would run through the headers and/or
contents of specific columns and apply gettext() on the indicated
strings, after it had finished disassembling the PGresult into arrays.
(Since we don't want to be doing gettext() on random strings, we need
to indicate exactly which columns should be processed.)  To ensure that
the strings are available for translation, all the _(x) instances in
describe.c would change to gettext_noop(x), but otherwise that code
would only need to change to the extent of setting the new option fields
in printQueryOpt.  This means the server sees only untranslated
plain-ASCII strings and shouldn't get upset about encoding issues.
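
A minimal sketch of that pass, assuming the two proposed fields and that the PGresult has already been disassembled into header and cell string arrays (the names here are illustrative, not the actual print.c internals):

#include <libintl.h>
#include "print.h"		/* psql's printQueryOpt, with the proposed fields added */

/* Sketch: translate column headers and flagged columns after disassembly. */
static void
translate_output(const printQueryOpt *opt,
				 const char **headers, const char **cells,
				 int ncolumns, int nrows)
{
	int			r, c;

	if (opt->translate_headers)
		for (c = 0; c < ncolumns; c++)
			headers[c] = gettext(headers[c]);

	if (opt->translate_columns)
		for (r = 0; r < nrows; r++)
			for (c = 0; c < ncolumns; c++)
				if (opt->translate_columns[c])
					cells[r * ncolumns + c] = gettext(cells[r * ncolumns + c]);
}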

We'd still have a problem if we wanted to put a single-quote mark in an
untranslated string (or a double-quote, in the case of a column header),
but so far there's been no need for that.  If it did come up, we could
handle it in the style Guillaume suggested, that is
appendStringLiteral(gettext_noop("foo's a problem")).  So I think it's
not necessary to contort the general solution to make that case easier.

printQueryOpt isn't exported anywhere but bin/psql and bin/scripts,
so changing it doesn't create an ABI break.

Objections, better ideas?

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] archive_command failures report confusing exit status

2007-12-11 Thread Peter Eisentraut
Tom Lane wrote:
 No, you are confusing the cases called shell was killed by a signal
 and called command was killed by a signal, which the shell then turned
 around and reported to us as exit  128.

Yes, I had missed that difference.  Next try ...

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/
diff -cr ../cvs-pgsql/src/backend/postmaster/pgarch.c ./src/backend/postmaster/pgarch.c
*** ../cvs-pgsql/src/backend/postmaster/pgarch.c	2007-11-25 12:39:56.0 +0100
--- ./src/backend/postmaster/pgarch.c	2007-12-11 23:28:43.0 +0100
***
*** 484,494 ****
  		 * Per the Single Unix Spec, shells report exit status > 128 when a
  		 * called command died on a signal.
  		 */
! 		bool		signaled = WIFSIGNALED(rc) || WEXITSTATUS(rc) > 128;
  
! 		ereport(signaled ? FATAL : LOG,
! 				(errmsg("archive command \"%s\" failed: return code %d",
! 						xlogarchcmd, rc)));
  
  		return false;
  	}
--- 484,518 ----
  		 * Per the Single Unix Spec, shells report exit status > 128 when a
  		 * called command died on a signal.
  		 */
! 		int		lev = (WIFSIGNALED(rc) || WEXITSTATUS(rc) > 128) ? FATAL : LOG;
  
! 		if (WIFEXITED(rc))
! 		{
! 			ereport(lev,
! 					(errmsg("archive command failed with exit code %d", WEXITSTATUS(rc)),
! 					 errdetail("The archive command was \"%s\".", xlogarchcmd)));
! 		}
! 		else if (WIFSIGNALED(rc))
! 		{
! 			ereport(lev, (
! #if defined(WIN32)
! 		errmsg("archive command was terminated by exception 0x%X", WTERMSIG(rc)),
! 		errhint("See C include file \"ntstatus.h\" for a description of the hexadecimal value."),
! #elif defined(HAVE_DECL_SYS_SIGLIST) && HAVE_DECL_SYS_SIGLIST
! 		errmsg("archive command was terminated by signal %d: %s",
! 			   WTERMSIG(rc),
! 			   WTERMSIG(rc) < NSIG ? sys_siglist[WTERMSIG(rc)] : "(unknown)"),
! #else
! 		errmsg("archive command was terminated by signal %d", WTERMSIG(exitstatus)),
! #endif
! 		errdetail("The archive command was \"%s\".", xlogarchcmd)));
! 		}
! 		else
! 		{
! 			ereport(lev,
! 	(errmsg("archive command exited with unrecognized status %d", rc),
! 	 errdetail("The archive command was \"%s\".", xlogarchcmd)));
! 		}
  
  		return false;
  	}

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] WORM and Read Only Tables (v0.1)

2007-12-11 Thread Simon Riggs
On Tue, 2007-12-11 at 14:25 -0500, Andrew Sullivan wrote:
 On Tue, Dec 11, 2007 at 11:12:46AM +, Simon Riggs wrote:
  
  Read-Only Tables
  
 
 In the past when this topic came up, there was some discussion of doing this
 at a level somewhere below the table horizon.  There are a number of nasty
 limitations for partitions currently (not the least of which is that real
 uniqueness guarantees are impractical), so allowing users to specify some
 segment of the table to be read only without imposing it on the whole
 table would be awful nice.  I seem to recall Jan had an idea on how to do
 it, but I could be wrong.
 
 Also, doing this at the tuple, rather than table-wide, level might lead to
 additional capabilities in this area:

Seems fair comment.

I'll move forward my thoughts on partitioning, so we can decide whether
to do things this way, or below the table horizon as you say. I've got
some detailed notes on this already which showed it was roughly balanced
between the two ways. I'll write it up so we can see; the answer might
change during the writing.

  Attach
  --
  
  Writing tables on one system and then moving that data to other systems
  is fairly common. If we supported read-only tables then you might
  consider how you would publish new versions to people.
 
 Some time ago I was speculating on pie-in-the-sky features I might like in
 Postgres, and it was something like this attach.  But the idea was somehow
 related to the read-only tuples.
 
 In my specific case, I have piles and piles of mostly useless data.
 Sometimes, however, some of that data is possibly useful in retrospect.  So
 the suggestion was to have tables that could be mostly offline -- archived
 somewhere -- but for which we had enough metadata online to say, You have
 some data that might match in catalog C.  Go mount it, and I'll check.  I
 think this is subtly different from the attach case you're outlining?

Yes it is, but I had hoped that what you're asking for is catered for
here.

If you have a hierarchical storage manager, you can just access the
file, whereupon the actual file will be de-archived to allow access. Or
maybe you have it on MAID storage, so we spin up the disks to allow
access to the files. So I guess I was expecting the de-archive to be
automated at the file system level.
http://en.wikipedia.org/wiki/Hierarchical_Storage_Management
Regrettably, I'm not aware of an open source HSM, though XFS has some
hooks in it that mention this. That's an old IBM term, so some people
might refer to this concept as tiered storage. 

Since I was planning to modify smgr to allow different kinds of
tablespaces it should be possible to make the file issue some kind of a
call out to mount the appropriate files. What would that call out look
like? Would that be a DMAPI/XDSM implementation, or something simpler
as was used for PITR, or a roll-your-own plug-in hook?

I can see I'll have to re-wire smgr_nblocks() for non-md smgrs to access
pg_class.relpages rather than issue a seek, which will need to issue an
open. That way we can do planning without actually accessing the table.
(Maybe that's presuming we're dealing with tables, oh well).
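
A minimal sketch of what that re-wiring might look like; RelationUsesArchiveSmgr() is an assumed predicate, the rest is the existing backend API:

#include "postgres.h"
#include "storage/smgr.h"
#include "utils/rel.h"

/*
 * Sketch only: answer "how many blocks?" for planning without opening the
 * underlying file when the relation lives on an archive-style smgr, falling
 * back to the pg_class.relpages estimate instead of an lseek via md.c.
 */
static BlockNumber
relation_nblocks_for_planning(Relation rel)
{
	if (RelationUsesArchiveSmgr(rel))		/* assumed predicate */
		return (BlockNumber) rel->rd_rel->relpages;

	RelationOpenSmgr(rel);					/* normal path: ask the storage manager */
	return smgrnblocks(rel->rd_smgr);
}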

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] archive_command failures report confusing exit status

2007-12-11 Thread Simon Riggs
On Tue, 2007-12-11 at 23:31 +0100, Peter Eisentraut wrote:
 Simon Riggs wrote:
  I prefer archive_command = ... for the errdetail though. The commands
  can be quite long so the extra words don't really add anything, plus
  they require translation.
 
 I did like this suggestion, but then I noticed, we don't actually report the 
 setting of the archive_command setting but the actual command that was 
 executed, with the placeholders filled out.  So I think the way I posted it 
 is more correct.

I think you should lose the "The", if nothing else. Most Postgres
messages I recall say "return code = xx", not "The return code ..."

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com




Re: [HACKERS] archive_command failures report confusing exit status

2007-12-11 Thread Alvaro Herrera
Simon Riggs wrote:
 On Tue, 2007-12-11 at 23:31 +0100, Peter Eisentraut wrote:
  Simon Riggs wrote:
   I prefer archive_command = ... for the errdetail though. The commands
   can be quite long so the extra words don't really add anything, plus
   they require translation.
  
  I did like this suggestion, but then I noticed, we don't actually report 
  the 
  setting of the archive_command setting but the actual command that was 
  executed, with the placeholders filled out.  So I think the way I posted it 
  is more correct.
 
 I think you should lose the "The", if nothing else. Most Postgres
 messages I recall say "return code = xx", not "The return code ..."

Right, that's because they are in errmsg() and not errdetail().  The
guidelines are different.  (The reason they are different really escapes
me, but you have to note that (1) Peter invented them, and (2) this
grammatical difference always makes sense when you read the logs.)

-- 
Alvaro Herrera   http://www.PlanetPostgreSQL.org/
There is evil in the world. There are dark, awful things. Occasionally, we get
a glimpse of them. But there are dark corners; horrors almost impossible to
imagine... even in our worst nightmares. (Van Helsing, Dracula A.D. 1972)



Re: [HACKERS] VLDB Features

2007-12-11 Thread Josh Berkus
Hannu,

 COPY ... WITH ERRORS TO ...

Yeah, that's a start.

 or something more advanced, like bulkload which can be continued after
 crash ?

Well, we could also use a loader which automatically parallelizes, but that 
functionality can be done at the middleware level.  WITH ERRORS is the 
most critical part.

Here's the other VLDB features we're missing:

Parallel Query
Windowing Functions
Parallel Index Build (not sure how this works exactly, but it speeds Oracle 
up considerably)
On-disk Bitmap Index (anyone game to finish GP patch?)

Simon, we should start a VLDB-Postgres developer wiki page.

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco



Re: [HACKERS] VLDB Features

2007-12-11 Thread Neil Conway
On Tue, 2007-12-11 at 10:53 -0800, Josh Berkus wrote:
 Just so you don't lose sight of it, one of the biggest VLDB features we're 
 missing is fault-tolerant bulk load.

I actually had to cook up a version of this for Truviso recently. I'll
take a look at submitting a cleaned-up implementation for 8.4.

-Neil





Re: [HACKERS] VLDB Features

2007-12-11 Thread Simon Riggs
On Tue, 2007-12-11 at 15:31 -0800, Josh Berkus wrote:

 Here's the other VLDB features we're missing:
 
 Parallel Query
 Windowing Functions
 Parallel Index Build (not sure how this works exactly, but it speeds Oracle 
 up considerably)
 On-disk Bitmap Index (anyone game to finish GP patch?)

I would call those VLDB Data Warehousing features to differentiate
between that and the use of VLDBs for other purposes.

I'd add Materialized View support in the planner, as well as saying it's
more important than parallel query, IMHO. MVs are to DW what indexes are
to OLTP. It's the same as indexes vs. seqscan; you can speed up the seq
scan or you can avoid it. Brute force is cool, but being smarter is even
better. 

The reason they don't normally show up high on anybody's feature list is
that the TPC benchmarks specifically disallow them, which as I once
observed is very good support for them being a useful feature in
practice. (Oracle originally brought out MV support as a way of
improving their TPC scores at a time when Teradata was wiping the floor
with parallel query implementation). 

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com




Re: [HACKERS] whats the deal with -u ?

2007-12-11 Thread Kevin Grittner
 On Sun, Dec 9, 2007 at  6:37 PM, in message
[EMAIL PROTECTED], Alvaro Herrera
[EMAIL PROTECTED] wrote: 
 
 I have never understood what's the point of having an option to force a
 password prompt.  I wonder why don't we deprecate -W?
 
I occasionally find it useful for situations where I have a .pgpass
entry which would normally cover a database, but I have temporarily
changed the password to prevent conflicting usage during maintenance
or testing.  (For example, while borrowing a machine which is
normally part of the production load for a series of benchmarks
under the beta release.)
 
There would be other ways to deal with it if this were gone, but
it is convenient.
 
-Kevin
 





Re: [HACKERS] VLDB Features

2007-12-11 Thread Greg Smith

On Tue, 11 Dec 2007, Josh Berkus wrote:


Just so you don't lose sight of it, one of the biggest VLDB features we're
missing is fault-tolerant bulk load.  Unfortunately, I don't know anyone
who's working on it.


I'm curious what you feel is missing such that pgloader doesn't fill that 
requirement:  http://pgfoundry.org/projects/pgloader/


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



[HACKERS] 8.3beta4 space reduction

2007-12-11 Thread Kevin Grittner
I'm still working on getting through our test cases and benchmarks,
but one thing I can say for sure -- a freshly loaded database under
8.3beta4 is over 13% smaller than the exact same database freshly
loaded into 8.2.5.  It went from 216GB to 187GB.
 
Of course, the down side of this is that it'll be that much longer
before we can brag about having a TB database
 
-Kevin
 





Re: [HACKERS] archive_command failures report confusing exit status

2007-12-11 Thread Tom Lane
Peter Eisentraut [EMAIL PROTECTED] writes:
 Yes, I had missed that difference.  Next try ...

Looks sane to me.

regards, tom lane



Re: [HACKERS] archive_command failures report confusing exit status

2007-12-11 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 Simon Riggs wrote:
 I think you should lose the "The", if nothing else. Most Postgres
 messages I recall say "return code = xx", not "The return code ..."

 Right, that's because they are in errmsg() and not errdetail().  The
 guidelines are different.

Right --- errdetail is supposed to be a complete sentence.  (The
rationale for that is explained in TFM.)  I do find the current phrasing
a bit awkward, though; what's bugging me is putting quotes around the
command string.  It seems highly likely that the command string will
itself contain quotes and thus that the added quotes will be confusing.
I wonder if it'd be OK to do

The failed archive command was: %s

which is stretching the style guidelines by omitting a trailing period,
but I think that might be justifiable in this context.
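
For illustration only, here is roughly how that wording could look at the
call site, using the usual ereport() machinery; the function and variable
names are placeholders, not the actual pgarch.c code.

#include "postgres.h"        /* assumed backend context: ereport() etc. */

/*
 * Placeholder sketch: terse errmsg() plus an errdetail() that carries the
 * expanded command with no added quotes, per the proposal above.
 */
static void
report_archive_failure(int rc, const char *archive_cmd)
{
    ereport(LOG,
            (errmsg("archive command failed with exit code %d", rc),
             errdetail("The failed archive command was: %s", archive_cmd)));
}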

regards, tom lane



[HACKERS] [EMAIL PROTECTED]: Re: [pgsql-es-ayuda] SLL error 100% cpu]

2007-12-11 Thread Alvaro Herrera
Hi,

Here's another problem report on Windows.  This time it is usage of SSL
connections and NOTIFY.  I talked to Magnus on IRC and he directed me to
bug #2829:
http://archives.postgresql.org/pgsql-bugs/2006-12/msg00122.php

This report seems to be a little different, if only because the reported
error string from SSL mentions an Unknown winsock error 10004.

This guy is using 8.2.5.  SSL seems to be able to fill his log files at
full speed.

Is this an issue we can do something about?


- Forwarded message from Henry [EMAIL PROTECTED] -

From: Henry [EMAIL PROTECTED]
To: Alvaro Herrera [EMAIL PROTECTED]
Cc: Postgres [EMAIL PROTECTED]
Date: Wed, 12 Dec 2007 03:34:04 +0100 (CET)
Subject: Re: [pgsql-es-ayuda] SLL error 100% cpu
Message-ID: [EMAIL PROTECTED]


--- Alvaro Herrera [EMAIL PROTECTED] wrote:

 Henry wrote:
  Hello to everyone on the list.
  
  I have now put SSL with PostgreSQL into production, and performance
  degrades the longer it runs; CPU usage reaches 100%, and when I stop
  the service some postgres.exe processes are left hanging.  I disabled
  log writing because too many log files were being created containing
  the text "SYSCALL ERROR ...".  Strangely, a 14MB file was even created
  (odd, since it is configured for a maximum of 10MB).

- 
 Can you send an excerpt from that huge file?  A few lines of that
 SYSCALL ERROR.
--

here it is:
LOG:  SSL SYSCALL error: Unknown winsock error 10004


regards
 



   

- End forwarded message -


-- 
Alvaro Herrera  http://www.amazon.com/gp/registry/5ZYLFMCVHXC
Schwern It does it in a really, really complicated way
crab why does it need to be complicated?
Schwern Because it's MakeMaker.



Re: [HACKERS] Problem of a server gettext message.

2007-12-11 Thread Hiroshi Saito

Hi.

Yeah, the part where the problem happens is the one you suggest.

This is only a check: 
http://winpg.jp/~saito/pg83/message_check/gtext2.c

With it, the needed message can be obtained by the following operation:
gtext2 C UTF-8
http://winpg.jp/~saito/pg83/message_check/codeset_utf8_msg.txt
gtext2 C EUC_JP
http://winpg.jp/~saito/pg83/message_check/codeset_eucjp_msg.txt

However, the accuracy check is not finished yet. If all server encodings 
can be handled, I would like to work on that. Still, it is not desirable 
for several encodings to be intermingled in the log messages, and there 
is no good method for that yet. A better solution is probably needed.


Thanks!

Regards,
Hiroshi Saito

- Original Message - 
From: Zeugswetter Andreas ADI SD [EMAIL PROTECTED]



 GetText converts the po (EUC_JP) text to SJIS.


Yes.


Are you sure about that?  Why would gettext be converting to SJIS, when
SJIS is nowhere in the environment it can see?


gettext is using GetACP() on Windows, wherever that gets its info from
...
chcp did change the GetACP codepage in Hiroshi's example, but chcp
is not reflected in LC_*

Seems we may want to use bind_textdomain_codeset.

Andreas
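
For reference, a minimal standalone sketch of the bind_textdomain_codeset()
idea, pinning gettext's output to an explicit codeset instead of whatever
GetACP() reports; the domain name, locale directory, and sample msgid below
are assumptions, not taken from the backend.

#include <libintl.h>
#include <locale.h>
#include <stdio.h>

int
main(void)
{
    setlocale(LC_ALL, "");
    bindtextdomain("postgres", "/usr/local/pgsql/share/locale");

    /* ask libintl to convert the .po text (EUC_JP here) to UTF-8 */
    bind_textdomain_codeset("postgres", "UTF-8");
    textdomain("postgres");

    /* sample msgid only -- prints its translation in UTF-8 */
    printf("%s\n", gettext("could not open file"));
    return 0;
}

The same call made from the backend, with the codeset set to the database
encoding, is presumably what would keep EUC_JP, SJIS and friends from being
intermingled in the log.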



Re: [HACKERS] [EMAIL PROTECTED]: Re: [pgsql-es-ayuda] SLL error 100% cpu]

2007-12-11 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 This guy is using 8.2.5.  SSL seems to be able to fill his log files at
 full speed.

Are you *sure* the server is 8.2.5?  8.2.5 shouldn't emit duplicate
messages, but 8.2.4 and before would:

2007-05-17 21:20  tgl

* src/backend/libpq/: be-secure.c (REL7_4_STABLE), be-secure.c
(REL8_1_STABLE), be-secure.c (REL8_0_STABLE), be-secure.c
(REL8_2_STABLE), be-secure.c: Remove redundant logging of send
failures when SSL is in use.  While pqcomm.c had been taught not to
do that ages ago, the SSL code was helpfully bleating anyway. 
Resolves some recent reports such as bug #3266; however the
underlying cause of the related bug #2829 is still unclear.

Furthermore, it looks to me like "SSL SYSCALL error: %m" doesn't
exist anymore since that patch, so my bogometer is buzzing loudly.

I dunno anything about how to fix the real problem (what's winsock error
10004?), but I don't think he'd be seeing full speed log filling in
8.2.5.

regards, tom lane



Re: [HACKERS] Why my C function is called twice?

2007-12-11 Thread Tom Lane
Billow Gao [EMAIL PROTECTED] writes:
 A very simple C function which I copied from the manual.
 And I found that it's called twice.

You do realize that it's *supposed* to be called twice?  Once to
return the first row, and again to say it's done returning rows.

But the info messages you show are strange anyway.  I tried your example
here and got the results I'd expect:

regression=# select * from retcomposite(1,48);
INFO:  1
INFO:  2
INFO:  3
INFO:  4
INFO:  2
INFO:  5
 f1 | f2 | f3  
++-
 48 | 96 | 144
(1 row)

I think the code you showed must not quite match what you're actually
executing.  Maybe you recompiled the code and forgot to do a LOAD or
start a fresh session to bring in the new .so file?
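
For anyone else puzzled by the call count, here is a minimal value-per-call
SRF in the same style as the manual's example; it is a simplified stand-in,
not the retcomposite() code itself. For SELECT * FROM countdown(3) the
function body is entered four times: three SRF_RETURN_NEXT calls and one
final SRF_RETURN_DONE.

#include "postgres.h"
#include "fmgr.h"
#include "funcapi.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(countdown);

/* returns n, n-1, ..., 1 as a set of int4 */
Datum
countdown(PG_FUNCTION_ARGS)
{
    FuncCallContext *funcctx;

    if (SRF_IS_FIRSTCALL())
    {
        funcctx = SRF_FIRSTCALL_INIT();
        funcctx->max_calls = PG_GETARG_INT32(0);
    }

    funcctx = SRF_PERCALL_SETUP();

    if (funcctx->call_cntr < funcctx->max_calls)
        SRF_RETURN_NEXT(funcctx,
                        Int32GetDatum((int32) (funcctx->max_calls -
                                               funcctx->call_cntr)));
    else
        SRF_RETURN_DONE(funcctx);
}

It would be declared as RETURNS SETOF integer, LANGUAGE C; the one-row
result plus the "done" call is exactly the two invocations seen above.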

regards, tom lane



Re: [HACKERS] VLDB Features

2007-12-11 Thread Josh Berkus
Greg,

 I'm curious what you feel is missing that pgloader doesn't fill that
 requirement:  http://pgfoundry.org/projects/pgloader/

Because pgloader is implemented in middleware, it carries a very high overhead 
if you have bad rows.  As little as 1% bad rows will slow down loading by 20% 
due to retries.

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco



Re: [HACKERS] [EMAIL PROTECTED]: Re: [pgsql-es-ayuda] SLL error 100% cpu]

2007-12-11 Thread Trevor Talbot
On 12/11/07, Tom Lane [EMAIL PROTECTED] wrote:
 Alvaro Herrera [EMAIL PROTECTED] writes:

 I dunno anything about how to fix the real problem (what's winsock error
 10004?), but I don't think he'd be seeing full speed log filling in
 8.2.5.

WSAEINTR, "A blocking operation was interrupted by a call to
WSACancelBlockingCall."

Offhand I'd take it as either not entirely sane usage of a network
API, or one of the so very many broken software firewalls / network
security products.



Re: [HACKERS] [EMAIL PROTECTED]: Re: [pgsql-es-ayuda] SLL error 100% cpu]

2007-12-11 Thread Tom Lane
Trevor Talbot [EMAIL PROTECTED] writes:
 On 12/11/07, Tom Lane [EMAIL PROTECTED] wrote:
 I dunno anything about how to fix the real problem (what's winsock error
 10004?),

 WSAEINTR, A blocking operation was interrupted by a call to
 WSACancelBlockingCall.

Oh, then it's exactly the same thing as our bug #2829.

I opined in that thread that OpenSSL was broken because it failed to
treat this as a retryable case like EINTR.  But not being much of a
Windows person, that might be mere hot air.  Someone with a Windows
build environment should try patching OpenSSL to treat WSAEINTR
the same as Unix EINTR and see what happens ...
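
To make that concrete, here is a rough sketch of the retry idea at the
caller level -- not the actual be-secure.c or OpenSSL patch -- treating
WSAEINTR from SSL_read() the way Unix callers treat EINTR, i.e. by simply
retrying:

#include <winsock2.h>
#include <openssl/ssl.h>

/*
 * Sketch only: WSAEINTR (10004) means a blocking call was interrupted,
 * which is the retryable case being discussed.
 */
static int
ssl_read_retry_eintr(SSL *ssl, void *buf, int len)
{
    for (;;)
    {
        int     n = SSL_read(ssl, buf, len);

        if (n > 0)
            return n;

        if (SSL_get_error(ssl, n) == SSL_ERROR_SYSCALL &&
            WSAGetLastError() == WSAEINTR)
            continue;           /* retry, as Unix code does for EINTR */

        return n;               /* genuine error or clean shutdown */
    }
}

Whether the right place for this is the OpenSSL BIO layer or the backend's
own read loop is exactly the open question above.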

regards, tom lane



Re: [HACKERS] There's random access and then there's random access

2007-12-11 Thread Bruce Momjian
Gregory Stark wrote:
 I could swear this has been discussed in the past too. I seem to recall Luke
 disparaging Postgres on the same basis but proposing an immensely complicated
 solution. posix_fadvise or using libaio in a simplistic fashion as a kind of
 fadvise would be fairly lightweight way to get most of the benefit of the more
 complex solutions.

It has been on the TODO list for a long time:

* Do async I/O for faster random read-ahead of data

  Async I/O allows multiple I/O requests to be sent to the disk with
  results coming back asynchronously.

  http://archives.postgresql.org/pgsql-hackers/2006-10/msg00820.php

I have added your thread URL to this.
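
For reference, a minimal sketch of the fadvise-style read-ahead that TODO
item describes -- hinting the kernel about heap blocks that will be wanted
shortly so it can fetch them while the backend works on the current one.
Illustrative only, not backend code; the 8192-byte block size is assumed.

#define _XOPEN_SOURCE 600       /* for posix_fadvise() on glibc */
#include <fcntl.h>
#include <unistd.h>

#define BLCKSZ 8192

/* issue WILLNEED hints for a list of block numbers, then return at once */
static void
prefetch_blocks(int fd, const unsigned int *blocks, int nblocks)
{
    int     i;

    for (i = 0; i < nblocks; i++)
        (void) posix_fadvise(fd,
                             (off_t) blocks[i] * BLCKSZ,
                             BLCKSZ,
                             POSIX_FADV_WILLNEED);
}

A bitmap heap scan, for example, could push its next batch of block numbers
through something like this before reading them synchronously.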

-- 
  Bruce Momjian  [EMAIL PROTECTED]http://momjian.us
  EnterpriseDB http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
