date:20070228

Re: [HACKERS] Resumable vacuum proposal and design overview

2007-02-28 Thread tomas

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On Mon, Feb 26, 2007 at 01:39:40PM -0500, Tom Lane wrote:
[...]
 Or were you speaking of the pg_class.reltuples count?

Yes (modulo my warning, that is)

Regards
- -- tomás
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.1 (GNU/Linux)

iD8DBQFF5T2SBcgs9XrR2kYRAndUAJoDG+5zqk0PxOI5GUM68GKW7+NdRgCfVB5p
6eod6gx21tgOciSKXAuuCvA=
=3Oz7
-END PGP SIGNATURE-


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Re: [HACKERS] Resumable vacuum proposal and design overview

2007-02-28 Thread Simon Riggs

On Wed, 2007-02-28 at 13:53 +0900, Galy Lee wrote:
 
 Tom Lane wrote:
  Huh?  There is no extra cost in what I suggested; it'll perform
  exactly the same number of index scans that it would do anyway.
 
 The things I wanted to say is that:
 If we can stop at any point, we can make maintenance memory large
 sufficient to contain all of the dead tuples, then we only need to
 clean index for once. No matter how many times vacuum stops,
 indexes are cleaned for once.

I agree that the cycle-at-a-time approach could perform more poorly with
repeated stop-start. The reason for the suggestion was robustness, not
performance. If you provide the wrong dead-tuple-list to VACUUM, you
will destroy the integrity of a table, which can result in silent data
loss. You haven't explained how saving the dead-tuple-list could be done
in a safe mannner and it seems risky to me.

 But in your proposal, indexes will be scan as many as vacuum stops.
 Those extra indexes cleaning are thought as the extra cost compared
 with stop-on-dime approach. To vacuum a large table by stopping 8
 times, tests show the extra cost can be one third of the stop-on-dime
  approach.

But the VACUUM is being run during your maintenance window, so why do
you care about performance of VACUUM during that time? There is some
inefficiency in the VACUUM process, but seems like a high price to pay
for more robustness. 

Does the loss of efficiency during VACUUM translate directly into
reduced performance during operational periods? I think not.

Deferring completion of VACUUM means deferring refreshing the FSM.
Allowing cycle-at-a-time VACUUM would allow the FSM to be updated after
each run, thus releasing space for reuse again. ISTM that the
saving-dead-list approach would defer the updating of the FSM for many
days in your situation.

If you would like to reduce VACUUM times have you considered
partitioning? It can be very effective at isolating changes and is
designed specifically to cope with large data maintenance issues. If
there are issues that prevent the use of partitioning in your case,
perhaps we should be discussing those instead?

Migration from a non-partitioned environment to a partitioned one is
quite simple from 8.2 onwards.

 So I'm not really convinced that being able to stop a table
  vacuum halfway is critical.
 To run vacuum on the same table for a long period, it is critical
 to be sure:
 1. not to eat resources that foreground processes needs
 2. not to block vacuuming of hot-updated tables
 3. not to block any transaction, not to block any backup activities
 
 In the current implementation of concurrent vacuum, the third is not
 satisfied obviously, the first issue comes to my mind is the
 lazy_truncate_heap, it takes AccessExclusiveLock for a long time,
 that is problematic. 

Are you saying you know for certain this lock is held for a long time,
or are you just saying you think it is? If you have some evidence for
long truncation times then that would be a separate issue of concern,
since that might starve out normal users. Please say more?

ISTM that if you can refresh the FSM more frequently you will have less
need to truncate the relation at the end of each run. After some time, I
would expect that no truncation would be required because of the cyclic
reuse of space within the table, rather than extension/truncation.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] Resumable vacuum proposal and design overview

2007-02-28 Thread Tom Lane

Simon Riggs [EMAIL PROTECTED] writes:
 On Wed, 2007-02-28 at 13:53 +0900, Galy Lee wrote:
 In the current implementation of concurrent vacuum, the third is not
 satisfied obviously, the first issue comes to my mind is the
 lazy_truncate_heap, it takes AccessExclusiveLock for a long time,
 that is problematic. 

 Are you saying you know for certain this lock is held for a long time,
 or are you just saying you think it is? If you have some evidence for
 long truncation times then that would be a separate issue of concern,
 since that might starve out normal users. Please say more?

lazy_truncate_heap does a ConditionalLockAcquire, that is, it won't
succeed in acquiring the exclusive lock if there is any competition.
And I find it hard to believe that it will hold the lock very long
if it does get it --- in most scenarios it won't be possible to remove
very many pages, so the scan won't take long.  (Of course that is
arguably a bug, but until you can fix things so that an average VACUUM
*can* remove a lot of pages, it's hardly profitable to worry about
whether this step creates a concurrency problem.)

So I agree with Simon: if you want us to believe there's a performance
issue here, please present some evidence.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] Resumable vacuum proposal and design overview

2007-02-28 Thread Heikki Linnakangas


Tom Lane wrote:

Galy Lee [EMAIL PROTECTED] writes:

If we can stop at any point, we can make maintenance memory large
sufficient to contain all of the dead tuples, then we only need to
clean index for once. No matter how many times vacuum stops,
indexes are cleaned for once.


I beg your pardon?  You're the one who's been harping on the
table-so-large-it-takes-days-to-vacuum scenario.  How you figure that
you can store all the dead TIDs in working memory?


This reminds me of an idea I had while looking at the bitmap index 
patch: We could store the dead TIDs more efficiently in a bitmap, 
allowing tables to be vacuumed in lesser cycles.


Of course, that's orthogonal to the above discussion.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] Dead Space Map version 2

2007-02-28 Thread Heikki Linnakangas


Tom Lane wrote:

Heikki Linnakangas [EMAIL PROTECTED] writes:

Tom Lane wrote:

I'd be happier if the DSM content could be
treated as just a hint.


If we don't have a frozen state, we can't use the DSM to implement 
index-only scans.


To implement index-only scans, the DSM would have to be expected to
provide 100% reliable coverage, which will increase its cost and
complexity by orders of magnitude.  If you insist on that, I will bet
you lunch at a fine restaurant that it doesn't make it into 8.3.


:)

While I understand that 100% reliable coverage is a significantly 
stronger guarantee, I don't see any particular problems in implementing 
that. WAL logging isn't that hard.


I won't insist, I'm not the one doing the programming after all. 
Anything is better than what we have now. However, I do hope that 
whatever is implemented doesn't need a complete rewrite to make it 100% 
reliable in the future.


The basic wish I have is to not use a fixed size shared memory area like 
FSM for the DSM. I'd like it to use the shared buffers instead, which 
makes the memory management and tuning easier. And it also makes it 
easier to get the WAL logging right, even if it's not done for 8.3 but 
added later.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate

Re: [HACKERS] Resumable vacuum proposal and design overview

2007-02-28 Thread Galy Lee



Simon Riggs wrote:
 You haven't explained how saving the dead-tuple-list could be done
 in a safe mannner and it seems risky to me.

The files are placed in a new directory $PGDATA/pg_vacuum
with the name: spcNode.dbNode.relNode for each relations
which have been interrupted during vacuum.

It has the format likes:

1. VacStateFileHeader
2. VacStateData
3. Dead Tuple list
4. CRC32

The files are removed
 - when original physical heap files are removed,
 - when vacuum full have been issued,
 - or after the content has been read in memory.
 - etc.

Is there any potential big risk there? Correct me if I am
wrong.

 Deferring completion of VACUUM means deferring refreshing the FSM.

I borrow the code from DSM patch to merge free space info
 into FSM when vacuum stops.

 Are you saying you know for certain this lock is held for a long time,
 or are you just saying you think it is? If you have some evidence for
 long truncation times then that would be a separate issue of concern,
 since that might starve out normal users. Please say more?

Sorry. I *thought* it is. The benchmark has not shown such
kind of problem anyway. Thanks for the clarification for me. :)

Regards,
-- 
Galy Lee
lee.galy _at_ oss.ntt.co.jp
NTT Open Source Software Center

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Re: [HACKERS] Resumable vacuum proposal and design overview

2007-02-28 Thread Zeugswetter Andreas ADI SD


  The things I wanted to say is that:
  If we can stop at any point, we can make maintenance memory large 
  sufficient to contain all of the dead tuples, then we only need to 
  clean index for once. No matter how many times vacuum 
 stops, indexes 
  are cleaned for once.
 
 I agree that the cycle-at-a-time approach could perform more 
 poorly with repeated stop-start. The reason for the 
 suggestion was robustness, not performance. If you provide 

It performs more poorly, but it also gives immediate gain, since part of
the table is readily vacuumed. If you do it all in one pass with stop
resume, the first visible effect may be several days after you start
vacuuming. And, basically you need to pretend the vacuum transaction is
still running after the first stop. Else dead tuple reuse ala HOT is not
possible (or the ctid list needs to be reevaluated during resume, which
per se is not efficient). 

 the wrong dead-tuple-list to VACUUM, you will destroy the 
 integrity of a table, which can result in silent data loss. 
 You haven't explained how saving the dead-tuple-list could be 
 done in a safe mannner and it seems risky to me.

Agreed. It seems not efficiently possible.

Andreas


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] Resumable vacuum proposal and design overview

2007-02-28 Thread Zeugswetter Andreas ADI SD


  You haven't explained how saving the dead-tuple-list could be done
in 
  a safe mannner and it seems risky to me.
 
 The files are placed in a new directory $PGDATA/pg_vacuum 
 with the name: spcNode.dbNode.relNode for each relations 
 which have been interrupted during vacuum.
 
 It has the format likes:
 
 1. VacStateFileHeader
 2. VacStateData
 3. Dead Tuple list
 4. CRC32
 
 The files are removed
  - when original physical heap files are removed,
  - when vacuum full have been issued,
  - or after the content has been read in memory.
  - etc.
 
 Is there any potential big risk there? Correct me if I am wrong.

The main risc is not a corrupt file or broken list. The risc is, that a
ctid in the list points at a tuple that is not dead anymore. To avoid
that risc you would need to:
1. keep the vacuum lock open
2. leave the vacuum tx open

(or reevaluate visibility of list members upon resume)

Andreas

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] COMMIT NOWAIT Performance Option

2007-02-28 Thread Zeugswetter Andreas ADI SD


But we do don't we? fsync = off, full_page_writes = off?
  
  BTW, our testing seems to indicate that full_page_writes = off is
safe 
  on Solaris 10 on good hardware.  At least, we haven't been able to
break it yet.
  
 
 Is that an OS-dependent parameter? I always assumed it depended
entirely
 on hardware. I have no way to test it for myself though, so I just
leave
 full_page_writes=on to be safe.

It also depends on the FS implementation. The OS/FS must guarantee, that
it does not chunk single data page write calls. Usually that is the
case, when OS/FS pagesize and pg pagesize are identical. And the HW
needs to guarantee atomicity for single calls. e.g. on AIX you need to
reduce the pg page size to 4k to be able to give those guarantees.

Andreas


---(end of broadcast)---
TIP 6: explain analyze is your friend

[HACKERS] VACUUM and spoiling the buffer manager cache

2007-02-28 Thread Simon Riggs

Round about v.161 of src/storage/buffer/bufmgr.c, during the development
of 8.0 a change was introduced to prevent VACUUM from changing the state
of the Adaptive Replacement Cache buffer management strategy. At the
time that change made lots of sense. Since then we have changed the
buffer management strategy and this behaviour of VACUUM may no longer
make as much sense as it did then.

VACUUM's current behaviour is to take blocks it has touched and place
them on the head of the freelist, allowing them to be reused. This is a
good strategy with clean blocks, but it is a poor strategy for dirty
blocks. Once a dirty block has been placed on the freelist, the very
next request for a free buffer will need to both write the block to disk
*and* this will typically require a WAL flush to occur also.

The WAL flushing behaviour has been described in detail on this thread:
http://archives.postgresql.org/pgsql-hackers/2006-12/msg00674.php
though this proposal has nothing to do with FREEZEing rows.

The effects of this behaviour are that when VACUUM is running alone it
has to make more WAL flushes than it really needs to, so is slightly
slower. That could be improved, but isn't my priority on this post.

When VACUUM operates alongside a concurrent workload the other
non-VACUUM backends become involved in cleaning the VACUUM's dirty
blocks. This slows the non-VACUUM backends down and effectively favours
the VACUUM rather than masking its effects, as we were trying to
achieve. This behaviour noticeably increases normal transaction response
time for extended periods, with noticeable WAL spikes as the WAL drive
repeatedly fsyncs, much more than without the VACUUM workload.

The proposal would be to stop VACUUM from putting its blocks onto the
freelist if they are dirty. This then allows the bgwriter to write the
VACUUM's dirty blocks, which avoids the increased response times due to
WAL flushing. It also incidentally improves a lone VACUUM, since the
bgwriter is able to help write out the dirty blocks. VACUUM pays the
cost to test if they are dirty, but its minor anyway.

The clock cycle buffer management strategy is less prone to cache
spoiling behaviour than was the earlier LRU methods, fixed or adaptive.
A simple solution does effectively smooth out the poor response times
seen while a VACUUM is in progress.

The in-line patch is a one-line change to the buffer manager code, and
is one of a few versions experimented with. The additional line is a
simple test to see whether the VACUUM'd block is dirty before deciding
what to do with it. [A separate patch is available, if requested,
identified as vacstrategy.v2.patch]

Independent verification of test results is requested. 



Index: src/backend/storage/buffer/bufmgr.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/storage/buffer/bufmgr.c,v
retrieving revision 1.215
diff -c -r1.215 bufmgr.c
*** src/backend/storage/buffer/bufmgr.c 1 Feb 2007 19:10:27 -
1.215
--- src/backend/storage/buffer/bufmgr.c 26 Feb 2007 13:09:35 -
***
*** 907,913 
else
{
/* VACUUM accesses don't bump usage
count, instead... */
!   if (buf-refcount == 0 
buf-usage_count == 0)
immed_free_buffer = true;
}
}
--- 907,914 
else
{
/* VACUUM accesses don't bump usage
count, instead... */
!   if (buf-refcount == 0 
buf-usage_count == 0 
!   !(buf-flags  BM_DIRTY))
immed_free_buffer = true;
}
}


-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] COMMIT NOWAIT Performance Option

2007-02-28 Thread Gregory Stark

Jonah H. Harris [EMAIL PROTECTED] writes:

 First, rather than using 16-bytes per page and having to deal with
 handling the non-contiguous space, why not just use a page-level
 checksum like everyone else?  Most of the systems I've seen seem to
 employ a simple CRC16 or CRC32.

I think a CRC would be a useful feature for people who want an extra degree of
protection from faulty hardware.

But we've already seen that CRC checks can be expensive. Not everyone will
want to take the cpu hit. Storing a byte counter in every block is cheap. 

And the idea came from what someone said MSSQL does, so like everyone else
-- which isn't a very compelling argument to begin with -- doesn't argue
against it.

 Second, unless I'm missing something, I don't see how your algorithm
 is going to work as each 512 byte chunk of the block will *always*
 have the same sequential byte value.  That is, unless you have some
 way of preventing wraparound at 255 without adding additional block
 overhead.

I think the way you would work is to have the smgr note the sequential value
it found when it read in a page and then when it writes it out increment that
value by one. Conveniently the pages would be 16 bytes shorter than an 8kb
page so you have 16 bytes available with every buffer to note information like
the last sequential tag the buffer used.

 Lastly, from a performance perspective, it's going to be faster to
 compute the entire page's checksum than it would be to check the
 sequence every 512 bytes and perform the space adjustment.

That seems pretty unlikely. CRC checks are expensive cpu-wise, we're already
suffering a copy due to our use of read/write the difference between
read/write of 8192 bytes and readv/writev of 511b*16+1*6 is going to be
non-zero but very small. Thousands of times quicker than the CRC.

If we went to direct-io then it would entail an additional memory-copy which
would be annoying. But that would still be much much cheaper than a CRC check.
The best we could do in that case would be to do a CRC check at the same time
as the memory move.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] Packed short varlenas, what next?

2007-02-28 Thread Gregory Stark

Tom Lane [EMAIL PROTECTED] writes:

 I've committed this, but in testing with a hack that does ntohl() in the
 VARSIZE macro and vice-versa in SET_VARSIZE, I find that core passes
 regression but several contrib modules do not.  It looks like the
 contrib modules were depending on various random structs being
 compatible with varlena, while not exposing that dependence in ways that
 either of us caught :-(.

I just noticed that last night myself. In particular the GIST modules seems to
be a major problem. they define dozens of new objects, many of which are just
passing around C data structures internally but some of which are objects
which get stored in the database. I have no idea which are which and which
ones are varlenas.

Worse, it uses PG_GETARG_POINTER() and explicitly calls PG_DETOAST_DATUM() in
the few places it assumes finding toasted data is possible. That's even harder
to track down.

I can send up a patch for the data types I fixed last night.

 I'll work on cleaning up the remaining mess tomorrow, but I think that
 we may need to think twice about whether it's OK to expect that every
 datatype with typlen = -1 will be compatible with the New Rules.  I'm
 back to wondering if maybe only types with typalign 'c' should get
 caught up in the changes.


I don't think we can key off typalign='c'. That would entail changing varlenas
to typalign 'c' which would throw off other consumers of the typalign which
expect it to be the alignment of the detoasted datum. Moreover I still align
them when they have the full 4-byte header by using the typalign.

I think we would want to introduce a new column, or maybe a new attlen value,
or a new typalign value.

I was thinking about that though and it's not so simple. It's easy enough not
to convert to short varlena for data types that don't assert that they support
the packed format. That's not a problem. That takes care of data types which
don't call pg_detoast_datum().

But not storing the varlena header in network byte order sometimes would be
quite tricky. There are a great many places that call VARSIZE that don't look
at the attalign or even have it handy.

If we made it a new attlen value we could have two different macros, but that
will be another quite large patch. It would mean hitting all those datatypes
all over again to change every instance of VARSIZE into NEWVARSIZE or
something like that. Plus all the sites in the core that call VARSIZE would
need to check attlen and call the right one.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] Resumable vacuum proposal and design overview

2007-02-28 Thread Simon Riggs

On Wed, 2007-02-28 at 11:19 +0100, Zeugswetter Andreas ADI SD wrote:
   The things I wanted to say is that:
   If we can stop at any point, we can make maintenance memory large 
   sufficient to contain all of the dead tuples, then we only need to 
   clean index for once. No matter how many times vacuum 
  stops, indexes 
   are cleaned for once.
  
  I agree that the cycle-at-a-time approach could perform more 
  poorly with repeated stop-start. The reason for the 
  suggestion was robustness, not performance. If you provide 
 
 It performs more poorly, but it also gives immediate gain, since part of
 the table is readily vacuumed. If you do it all in one pass with stop
 resume, the first visible effect may be several days after you start
 vacuuming. 

I think that in itself is enough to tip the scales.

 And, basically you need to pretend the vacuum transaction is
 still running after the first stop. Else dead tuple reuse ala HOT is not
 possible (or the ctid list needs to be reevaluated during resume, which
 per se is not efficient). 

Ah, I see you got there ahead of me. Yes, it would prevent HOT from
performing retail VACUUMs on heap blocks. (I'm not saying HOT will be
accepted/acceptable, but I'd rather not have its acceptability hinge on
a use case that seems so rare).

One proposal that we do still have in discussion is Heikki's patch to
re-evaluate the OldestXmin while the VACUUM runs. That's something we'd
definitely want to do in a restartable VACUUM anyway. But my thought is
that it actually increases quite dramatically the number of dead rows
harvested during VACUUM (a good thing), which is likely to increase the
number of cycles required to complete a large table (no problem, because
of the increased efficiency of the VACUUM). I think there's a strong
argument to make VACUUM refresh rather than rebuild the FSM after each
cycle rather than wait until the end, whether or not we stop/start the
VACUUM. In any long running VACUUM that seems very worthwhile.

Big VACUUM needs big memory. Using huge settings of maintenance_work_mem
dedicated solely to VACUUM seems like it could be a waste of resources
in many cases. It may be much better to allow 1 GB of memory to be used
to cache indexes better, which would improve performance of other
applications, as well as improving the index scan time during VACUUM. So
scanning indexes more often during VACUUM isn't necessarily bad either,
unless your table is truly huge, in which case you should use
partitioning to reduce it.

Galy, please hear that people like your idea and understand your use
case, but just don't like all of the proposal, just the main thrust of
it. The usual way is that 
(people that agree + amount of your exact idea remaining) = 100%

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] Resumable vacuum proposal and design overview

2007-02-28 Thread Simon Riggs

On Wed, 2007-02-28 at 09:38 +, Heikki Linnakangas wrote:
 Tom Lane wrote:
  Galy Lee [EMAIL PROTECTED] writes:
  If we can stop at any point, we can make maintenance memory large
  sufficient to contain all of the dead tuples, then we only need to
  clean index for once. No matter how many times vacuum stops,
  indexes are cleaned for once.
  
  I beg your pardon?  You're the one who's been harping on the
  table-so-large-it-takes-days-to-vacuum scenario.  How you figure that
  you can store all the dead TIDs in working memory?
 
 This reminds me of an idea I had while looking at the bitmap index 
 patch: We could store the dead TIDs more efficiently in a bitmap, 
 allowing tables to be vacuumed in lesser cycles.
 
 Of course, that's orthogonal to the above discussion.

I like the idea. 

How much memory would it save during VACUUM on a 1 billion row table
with 200 million dead rows? Would that reduce the number of cycles a
normal non-interrupted VACUUM would perform?

Would it work efficiently for all of the current index AMs? Each index
might use the index slightly differently during cleanup, I'm not sure.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

[HACKERS] psql problem querying relations

2007-02-28 Thread Zoltan Boszormenyi


Hi,

this is with current CVS code:

# \dt
ERROR:  did not find '}' at end of input node

Server log:

ERROR:  did not find '}' at end of input node
STATEMENT:  SELECT n.nspname as Schema,
 c.relname as Name,
 CASE c.relkind WHEN 'r' THEN 'table' WHEN 'v' THEN 'view' WHEN 
'i' THEN 'index' WHEN 'S' THEN 'sequence' WHEN 's' THEN 'special' END as 
Type,

 r.rolname as Owner
   FROM pg_catalog.pg_class c
JOIN pg_catalog.pg_roles r ON r.oid = c.relowner
LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
   WHERE c.relkind IN ('r','')
 AND n.nspname NOT IN ('pg_catalog', 'pg_toast')
 AND pg_catalog.pg_table_is_visible(c.oid)
   ORDER BY 1,2;



Best regards,
Zoltán Böszörményi


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] Implicit casts with generic arrays

2007-02-28 Thread Peter Eisentraut

Am Dienstag, 27. Februar 2007 19:50 schrieb Tom Lane:
 Seems basically we'd want to not cast unknown to anyarray unless there
 is some additional bit of context suggesting that that's the right thing.
 But what should that extra requirement be?  Can we go as far as not
 doing this cast implicitly at all?

We could say that unknown is not taken as anyarray input if the entire 
function/operator argument list consists of anyelement or anyarray.  But that 
might be even harder to comprehend.  With the ARRAY[...] syntax available, 
converting unknown to anyarray might be altogether unnecessary.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] psql problem querying relations

2007-02-28 Thread Andrew Dunstan


Zoltan Boszormenyi wrote:

Hi,

this is with current CVS code:

# \dt
ERROR:  did not find '}' at end of input node

Server log:

ERROR:  did not find '}' at end of input node


It's working for me. Have you tried with a fresh checkout or after 
running make clean before you build?


cheers

andrew

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

[HACKERS] Major Feature Interactions

2007-02-28 Thread Simon Riggs

There's a number of major features all coming together over the next
weeks, so I wanted to raise some awareness of possible areas of
interaction. My concern is more than just will multiple patches apply
together?, but extends into whether the features interact with each
other with synergy, ignore each other or have negative impact.

The projects that concern me are the following internals projects

Plan Invalidation
Autovacuum changes
Bitmap indexes
GIT indexes
Dead Space Map (DSM)
HOT Updating
Updateable cursors
Restartable VACUUM

The potential interactions as I understand them are these: I'm not
saying this list is either complete or accurate; checking whether it is
or not is exactly the reason for this post.

Anyway, just trying to raise awareness to look for potential issues.

UpdCursors
--
Syntax changes only, no impact on other projects mentioned.

HOT
---
GIT
Used to be some conflicts at patch level, now gone?
Positive impact on GIT performance
PlanInval
heap_check_hotupdate() test can performed in planner, so
will no longer be a heap function
DSM
HOT updates need not be logged, since they will be cleared
by retail VACUUMs
BitmapIdx
Potential issue with chilling tuples during CREATE INDEX
AutoVac
HOT will change how UPDATEs are reported to pg_stat, by
adding additional fields.
UpdCursors
Should tidscan follow chain to correct tuple? That seems to
change current behaviour even if it might be useful.

Plan Invalidation
-
No known changes, other than the integrations already mentioned

Autovacuum changes
--
DSM
Is implementing new forms of VACUUM: VACUUM and VACUUM ALL.
Does reporting of row stats change? 

Bitmap indexes
--
GIT
Do we need both? I think yes, but need to see some info on
when crossover occurs between two index methods.

Restartable VACUUM
--
DSM
Interactions with DSM?
HOT
Must use full cycles only

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] Resumable vacuum proposal and design overview

2007-02-28 Thread Gregory Stark

Simon Riggs [EMAIL PROTECTED] writes:

 How much memory would it save during VACUUM on a 1 billion row table
 with 200 million dead rows? Would that reduce the number of cycles a
 normal non-interrupted VACUUM would perform?

It would depend on how many dead tuples you have per-page. If you have a very
large table with only one dead tuple per page then it might even be larger.
But in the usual case it would be smaller.

Also note that it would have to be non-lossy.

My only objection to this idea, and it's not really an objection at all, is
that I think we want to head in the direction of making indexes cheaper to
scan and doing the index scan phase more often. That reduces the need for
multiple concurrent vacuums and makes the problem of busy tables getting
starved less of a concern.

That doesn't mean there's any downside to making the dead tuple list take less
memory but I think the upside is limited. As we optimize our index
representations with GII and bitmapped indexes scanning them gets easier and
easier anyways. And you don't really want to wait too long before you get the
benefit of the recovered space in the table.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] psql problem querying relations

2007-02-28 Thread Zoltan Boszormenyi


Andrew Dunstan írta:

Zoltan Boszormenyi wrote:

Hi,

this is with current CVS code:

# \dt
ERROR:  did not find '}' at end of input node

Server log:

ERROR:  did not find '}' at end of input node


It's working for me. Have you tried with a fresh checkout or after 
running make clean before you build?


cheers

andrew


I have to reinitdb my test database then tried again.
It's working regardless of my IDENTITY patch is applied or not.
Thanks.

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

[HACKERS] Compilation errors

2007-02-28 Thread Gregory Stark


Does anyone happen to know what it is about my build environment that causes
these errors? I get a similar slew of errors from each flex lexer we built
including the plpgsql lexer, the ecpg lexer, etc. Do I have the wrong version
of flex installed? Something else? It's quite annoying, especially as the beta
version of Emacs I'm using seems to not be able to parse these lines.

In file included from gram.y:9493:
scan.c:7050: warning: no previous prototype for 'base_yyget_lineno'
scan.c:7059: warning: no previous prototype for 'base_yyget_in'
scan.c:7067: warning: no previous prototype for 'base_yyget_out'
scan.c:7075: warning: no previous prototype for 'base_yyget_leng'
scan.c:7084: warning: no previous prototype for 'base_yyget_text'
scan.c:7093: warning: no previous prototype for 'base_yyset_lineno'
scan.c:7105: warning: no previous prototype for 'base_yyset_in'
scan.c:7110: warning: no previous prototype for 'base_yyset_out'
scan.c:7115: warning: no previous prototype for 'base_yyget_debug'
scan.c:7120: warning: no previous prototype for 'base_yyset_debug'
scan.c:7154: warning: no previous prototype for 'base_yylex_destroy'


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] COMMIT NOWAIT Performance Option

2007-02-28 Thread Jonah H. Harris


On 2/28/07, Gregory Stark [EMAIL PROTECTED] wrote:

But we've already seen that CRC checks can be expensive. Not everyone will
want to take the cpu hit. Storing a byte counter in every block is cheap.


CRC checking a page is most certainly the simplest.  And, I disagree
that it would be worse than either a sequence counter or the full page
write.  Block checksumming is done at read/write time... which is
something that needs to be improved anyway.  With a properly tuned
bgwriter, the write itself should barely be noticeable.  How fast is a
CRC of 8K?  Last time I checked it was something on the scale of ~95
usec for CRC32 and ~33 usec for sb8.


And the idea came from what someone said MSSQL does, so like everyone else
-- which isn't a very compelling argument to begin with -- doesn't argue
against it.


Rather than basing designs on poor second-hand information, maybe you
and the person who mentioned this idea should get up-to-date and read
the SQL Server storage engine architecture.

As of SQL Server 2005, blocks *are* checksummed with CRC32.  And, just
for the record, previous versions of SQL server performed a bit
flipping technique for every 512 bytes in the page header; it did
*not* waste a byte for every 512 bytes written.


I think the way you would work is to have the smgr note the sequential value
it found when it read in a page and then when it writes it out increment that
value by one. Conveniently the pages would be 16 bytes shorter than an 8kb
page so you have 16 bytes available with every buffer to note information like
the last sequential tag the buffer used.


This proposed design is overcomplicated and a waste of space.  I mean,
we reduce storage overhead using phantom command id and variable
varlena, but let's just fill it up again with unnecessary junk bytes.


That seems pretty unlikely. CRC checks are expensive cpu-wise, we're already
suffering a copy due to our use of read/write the difference between
read/write of 8192 bytes and readv/writev of 511b*16+1*6 is going to be
non-zero but very small. Thousands of times quicker than the CRC.


Prove it.

--
Jonah H. Harris, Software Architect | phone: 732.331.1324
EnterpriseDB Corporation| fax: 732.331.1301
33 Wood Ave S, 3rd Floor| [EMAIL PROTECTED]
Iselin, New Jersey 08830| http://www.enterprisedb.com/

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match

Re: [HACKERS] Compilation errors

2007-02-28 Thread Andrew Dunstan


Gregory Stark wrote:

Does anyone happen to know what it is about my build environment that causes
these errors? I get a similar slew of errors from each flex lexer we built
including the plpgsql lexer, the ecpg lexer, etc. Do I have the wrong version
of flex installed? Something else? It's quite annoying, especially as the beta
version of Emacs I'm using seems to not be able to parse these lines.

In file included from gram.y:9493:
scan.c:7050: warning: no previous prototype for 'base_yyget_lineno'

  

[snip]

What versions of flex and bison are you using? It's a bit hard to 
diagnose without knowing that ;-)


cheers

andrew

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] COMMIT NOWAIT Performance Option

2007-02-28 Thread Gregory Stark


Jonah H. Harris [EMAIL PROTECTED] writes:

 This proposed design is overcomplicated and a waste of space.  I mean,
 we reduce storage overhead using phantom command id and variable
 varlena, but let's just fill it up again with unnecessary junk bytes.

We reduced storage overhead using phantom command id by 8 bytes *per tuple*. I
hardly think 8 bytes per page is much of a concern. You're already losing an
average of 1/2 a tuple per page to rounding and that's a minimum of 16 bytes
for the narrowest of tuples.

 That seems pretty unlikely. CRC checks are expensive cpu-wise, we're already
 suffering a copy due to our use of read/write the difference between
 read/write of 8192 bytes and readv/writev of 511b*16+1*6 is going to be
 non-zero but very small. Thousands of times quicker than the CRC.

 Prove it.

We've already seen wal CRC checking show up at the top of profiles.

Do you really doubt that memcpy is faster than CRC32 checking? Especially when
you're already doing memcpy anyways and the only overhead is the few unaligned
bytes at the end and the 8 one-byte copies?

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] Compilation errors

2007-02-28 Thread Peter Eisentraut

Am Mittwoch, 28. Februar 2007 16:23 schrieb Gregory Stark:
 Does anyone happen to know what it is about my build environment that
 causes these errors?

Nothing.  Everybody gets them.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] COMMIT NOWAIT Performance Option

2007-02-28 Thread Stefan Kaltenbrunner


Gregory Stark wrote:

Jonah H. Harris [EMAIL PROTECTED] writes:


This proposed design is overcomplicated and a waste of space.  I mean,
we reduce storage overhead using phantom command id and variable
varlena, but let's just fill it up again with unnecessary junk bytes.


We reduced storage overhead using phantom command id by 8 bytes *per tuple*. I
hardly think 8 bytes per page is much of a concern. You're already losing an
average of 1/2 a tuple per page to rounding and that's a minimum of 16 bytes
for the narrowest of tuples.


That seems pretty unlikely. CRC checks are expensive cpu-wise, we're already
suffering a copy due to our use of read/write the difference between
read/write of 8192 bytes and readv/writev of 511b*16+1*6 is going to be
non-zero but very small. Thousands of times quicker than the CRC.

Prove it.


We've already seen wal CRC checking show up at the top of profiles.


yeah - on fast boxes (diskio wise) wal-crc checking is nearly always on 
the very top of wal-intensive workloads.



Stefan

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match

Re: [HACKERS] Compilation errors

2007-02-28 Thread Andrew Dunstan


Peter Eisentraut wrote:

Am Mittwoch, 28. Februar 2007 16:23 schrieb Gregory Stark:
  

Does anyone happen to know what it is about my build environment that
causes these errors?



Nothing.  Everybody gets them.

  


That's not what the buildfarm shows.

example: 
http://www.pgbuildfarm.org/cgi-bin/show_stage_log.pl?nm=spongedt=2007-02-28%20113002stg=make


cheers

andrew


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

Re: [HACKERS] VACUUM and spoiling the buffer manager cache

2007-02-28 Thread Tom Lane

Simon Riggs [EMAIL PROTECTED] writes:
 VACUUM's current behaviour is to take blocks it has touched and place
 them on the head of the freelist, allowing them to be reused.

No, it puts them at the tail of the freelist.  So I am unconvinced by
the rest of your argument.

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Re: [HACKERS] [PATCHES]

2007-02-28 Thread korryd

  Not that I think that anyone owning both a law degree and a computer
  in 2007 should legitimately be able to plead innocence here.  FAST
  Australia's lawyers are making themselves look like idiots, and the
  same for every other company tacking on such notices.  I think the
  real bottom line here is we don't accept patches from idiots.
 
 I think we don't accept patches from idiots is a bit harsh.


I agree, after all, you've accepted some of my patches and... oh,
wait...


-- Korry

Re: [HACKERS] Packed short varlenas, what next?

2007-02-28 Thread Tom Lane

Gregory Stark [EMAIL PROTECTED] writes:
 I just noticed that last night myself. In particular the GIST modules seems to
 be a major problem. they define dozens of new objects, many of which are just
 passing around C data structures internally but some of which are objects
 which get stored in the database. I have no idea which are which and which
 ones are varlenas.

FWIW, when I went to bed last night I had hstore and intarray working,
but was still fooling with ltree.  Didn't get to the others yet.

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Re: [HACKERS] COMMIT NOWAIT Performance Option

2007-02-28 Thread Jonah H. Harris


On 2/28/07, Gregory Stark [EMAIL PROTECTED] wrote:

We reduced storage overhead using phantom command id by 8 bytes *per tuple*. I
hardly think 8 bytes per page is much of a concern. You're already losing an
average of 1/2 a tuple per page to rounding and that's a minimum of 16 bytes
for the narrowest of tuples.


Again, it goes back to competent design.  Save space here, waste it
there.  SQL Server's bit-flipping technique is still *much* better
than wasting 1 byte for every 512.


We've already seen wal CRC checking show up at the top of profiles.


Well, when you consider we're performing a CRC for every log record
rather than at the block level, like most other systems, I wouldn't be
that surprised.  Don't try and use that example as a reason to kill
the checksum; it's a completely different use case.


Do you really doubt that memcpy is faster than CRC32 checking? Especially when
you're already doing memcpy anyways and the only overhead is the few unaligned
bytes at the end and the 8 one-byte copies?


I'm saying the complexity and implementation of it is going to get you
a bit more than you think.  If it didn't, you'd already have the thing
coded and would be proving me wrong with the patch rather than theory.
I can code up a checksum version in an hour or less if you want to
give it a go.

--
Jonah H. Harris, Software Architect | phone: 732.331.1324
EnterpriseDB Corporation| fax: 732.331.1301
33 Wood Ave S, 3rd Floor| [EMAIL PROTECTED]
Iselin, New Jersey 08830| http://www.enterprisedb.com/

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq

[HACKERS] SOC user quotas

2007-02-28 Thread Sergey E. Koposov


Hello hackers,

I was starting to think about next SOC and the project for it. And for a 
long time I wanted to implement the user quotas in PG.
So, I'll try to explain my understanding of the implementation, and I'll 
be happy to hear any comments, objections, or pointings to my 
misunderstanding. This is very first very rough idea, but I still would 
like to hear whether it contains some obvious flaws...


1) The main idea is to implement the per-user quota (not per tablespace 
for example). So, during the creation of the new user some quota can be 
specified, and after that the size of all the relations *owned* by that 
user should be limited by that number.


2) I looked into the code, and from my understanding, the main part of the 
code which should be affected by the quotas is storage/smgr/md.c. If I 
understand correctly, only functions like mdcreate  mdextend really 
change the size of the user relations (I don't consider things like WAL, 
and I don't think it should be subject for quota). And it seems to me, 
that the machinery of smgr/md is moreless enough to control the space 
occupied by the relations (within some 1 block size precision).


3) How the quota should be controlled: I think, that generally, for all 
the users which have quotas, the shared memory should contain the number 
of blocks left from the quota. And each backend extending or truncating 
the relations owned by the user should appropriately change that number of 
blocks left in the shared memory. As soon as this number is equal to 
zero, all the mdcreate, mdextend functions shouldn't do anything but 
return the error. I don't know, but I hope these functions won't be 
invoked  if the user will do DELETE and/or VACUUM to recover the space ?
Also, I'm not completely sure that refusing the call of the mdextend 
function in the case of quota excess won't lead to any corruption ? (in 
the case of Btree splits for example ).


Any comments ?
Thank you.

Regards,
Sergey

***
Sergey E. Koposov
Max Planck Institute for Astronomy/Cambridge Institute for Astronomy/Sternberg 
Astronomical Institute
Tel: +49-6221-528-349
Web: http://lnfm1.sai.msu.ru/~math
E-mail: [EMAIL PROTECTED]

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match

Re: [HACKERS] VACUUM and spoiling the buffer manager cache

2007-02-28 Thread Simon Riggs

On Wed, 2007-02-28 at 11:09 -0500, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  VACUUM's current behaviour is to take blocks it has touched and place
  them on the head of the freelist, allowing them to be reused.
 
 No, it puts them at the tail of the freelist.

That's a minor point because the freelist is mostly empty, so head ==
tail in 99.9% of cases.

The freelist does start full, but we only *put* things on the freelist
when we call InvalidateBuffer() after a DROP TABLE etc, or when we
finish with a VACUUM buffer. So after the first few minutes of server
operation we hardly ever use the freelist and so will be empty when we
start dropping VACUUM'd buffers on it, head or tail.

The negative effect on response times and overall performance is clearly
noticeable in tests. 

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] Compilation errors

2007-02-28 Thread Tom Lane

Peter Eisentraut [EMAIL PROTECTED] writes:
 Am Mittwoch, 28. Februar 2007 16:23 schrieb Gregory Stark:
 Does anyone happen to know what it is about my build environment that
 causes these errors?

 Nothing.  Everybody gets them.

I don't.  What version of flex are you guys using?

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] COMMIT NOWAIT Performance Option

2007-02-28 Thread Joshua D. Drake


 Do you really doubt that memcpy is faster than CRC32 checking?
 Especially when
 you're already doing memcpy anyways and the only overhead is the few
 unaligned
 bytes at the end and the 8 one-byte copies?
 
 I'm saying the complexity and implementation of it is going to get you
 a bit more than you think.  If it didn't, you'd already have the thing
 coded and would be proving me wrong with the patch rather than theory.
 I can code up a checksum version in an hour or less if you want to
 give it a go.

LIVE FROM THE WWE, CAGE MATCH!

Jonah (the Theorist) Harris versus Greg (the Brain) Stark.

What is going to happen between these two brothers in arms when they
must both prove their theory!

Sincerely,

Joshua D. Drake




-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] SOC user quotas

2007-02-28 Thread Tom Lane

Sergey E. Koposov [EMAIL PROTECTED] writes:
 1) The main idea is to implement the per-user quota (not per tablespace 
 for example). So, during the creation of the new user some quota can be 
 specified, and after that the size of all the relations *owned* by that 
 user should be limited by that number.

This seems impractical as stated; there is no way to determine what a
user owns in some other database.  Possibly you could do it if the quota
were both per-user and per-database.

 2) I looked into the code, and from my understanding, the main part of the 
 code which should be affected by the quotas is storage/smgr/md.c.

md.c is too low level to do catalog accesses and thus too low level to
know who owns what.

 3) How the quota should be controlled: I think, that generally, for all 
 the users which have quotas, the shared memory should contain the number 
 of blocks left from the quota. And each backend extending or truncating 
 the relations owned by the user should appropriately change that number of 
 blocks left in the shared memory.

What will you do with ALTER TABLE OWNER?  What if such a command is
rolled back?  (Likewise for some other commands such as TRUNCATE, or
even just DROP TABLE.)  What if there are too many users to fit in your
(necessarily fixed size) shared memory area?  What sort of contention
will there be for access to this area?

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] SOC user quotas

2007-02-28 Thread Joshua D. Drake

Sergey E. Koposov wrote:
 Hello hackers,
 
 I was starting to think about next SOC and the project for it. And for a
 long time I wanted to implement the user quotas in PG.
 So, I'll try to explain my understanding of the implementation, and I'll
 be happy to hear any comments, objections, or pointings to my
 misunderstanding. This is very first very rough idea, but I still would
 like to hear whether it contains some obvious flaws...
 
 1) The main idea is to implement the per-user quota (not per tablespace
 for example). So, during the creation of the new user some quota can be
 specified, and after that the size of all the relations *owned* by that
 user should be limited by that number.

I could see this being useful per database, maybe. It seems like kind of
an odd feature.

Sincerely,

Joshua D. Drake


-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] SOC user quotas

2007-02-28 Thread Sergey E. Koposov


On Wed, 28 Feb 2007, Tom Lane wrote:


Sergey E. Koposov [EMAIL PROTECTED] writes:

1) The main idea is to implement the per-user quota (not per tablespace
for example). So, during the creation of the new user some quota can be
specified, and after that the size of all the relations *owned* by that
user should be limited by that number.


This seems impractical as stated; there is no way to determine what a
user owns in some other database.  Possibly you could do it if the quota
were both per-user and per-database.


yes, agreed. I didn't think of that.


3) How the quota should be controlled: I think, that generally, for all
the users which have quotas, the shared memory should contain the number
of blocks left from the quota. And each backend extending or truncating
the relations owned by the user should appropriately change that number of
blocks left in the shared memory.


What will you do with ALTER TABLE OWNER?  What if such a command is
rolled back?


I don't know, but I guess the ALTER OWNER  should be considered 
differently. It probably should proceed only if it sees that there are 
enough place to perform the whole operation. If there are, then it should 
block any writing to the tables of the user, perform the alter owner and 
unblock everything again.



(Likewise for some other commands such as TRUNCATE, or
even just DROP TABLE.)


I didn't think of yet, but I will.


What if there are too many users to fit in your
(necessarily fixed size) shared memory area?


We really don't need to create the array for all users. We only need to 
create that array for users 1) having quotas 2) the users,

whose tables are accessed at the moment
So I don't think that in that case the amount of required space is a 
problem here.



What sort of contention
will there be for access to this area?


I think, that the only requirement is that the incrementation or
decrementation of number of blocks left for each user 
should be atomic operation.


regards,
Sergey

***
Sergey E. Koposov
Max Planck Institute for Astronomy/Cambridge Institute for Astronomy/Sternberg 
Astronomical Institute
Tel: +49-6221-528-349
Web: http://lnfm1.sai.msu.ru/~math
E-mail: [EMAIL PROTECTED]

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq

Re: [HACKERS] COMMIT NOWAIT Performance Option

2007-02-28 Thread Bruce Momjian


I think we need to think about when these CRCs would be read and
written.  It would be written when it hits the disk, hopefully by the
background writer, and I think after a server crash, all pages would
have to be read and checked.  The good news is that both of these are
non-critical paths.

---

Jonah H. Harris wrote:
 On 2/28/07, Gregory Stark [EMAIL PROTECTED] wrote:
  But we've already seen that CRC checks can be expensive. Not everyone will
  want to take the cpu hit. Storing a byte counter in every block is cheap.
 
 CRC checking a page is most certainly the simplest.  And, I disagree
 that it would be worse than either a sequence counter or the full page
 write.  Block checksumming is done at read/write time... which is
 something that needs to be improved anyway.  With a properly tuned
 bgwriter, the write itself should barely be noticeable.  How fast is a
 CRC of 8K?  Last time I checked it was something on the scale of ~95
 usec for CRC32 and ~33 usec for sb8.
 
  And the idea came from what someone said MSSQL does, so like everyone else
  -- which isn't a very compelling argument to begin with -- doesn't argue
  against it.
 
 Rather than basing designs on poor second-hand information, maybe you
 and the person who mentioned this idea should get up-to-date and read
 the SQL Server storage engine architecture.
 
 As of SQL Server 2005, blocks *are* checksummed with CRC32.  And, just
 for the record, previous versions of SQL server performed a bit
 flipping technique for every 512 bytes in the page header; it did
 *not* waste a byte for every 512 bytes written.
 
  I think the way you would work is to have the smgr note the sequential value
  it found when it read in a page and then when it writes it out increment 
  that
  value by one. Conveniently the pages would be 16 bytes shorter than an 8kb
  page so you have 16 bytes available with every buffer to note information 
  like
  the last sequential tag the buffer used.
 
 This proposed design is overcomplicated and a waste of space.  I mean,
 we reduce storage overhead using phantom command id and variable
 varlena, but let's just fill it up again with unnecessary junk bytes.
 
  That seems pretty unlikely. CRC checks are expensive cpu-wise, we're already
  suffering a copy due to our use of read/write the difference between
  read/write of 8192 bytes and readv/writev of 511b*16+1*6 is going to be
  non-zero but very small. Thousands of times quicker than the CRC.
 
 Prove it.
 
 -- 
 Jonah H. Harris, Software Architect | phone: 732.331.1324
 EnterpriseDB Corporation| fax: 732.331.1301
 33 Wood Ave S, 3rd Floor| [EMAIL PROTECTED]
 Iselin, New Jersey 08830| http://www.enterprisedb.com/
 
 ---(end of broadcast)---
 TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] COMMIT NOWAIT Performance Option

2007-02-28 Thread Gregory Stark

Gregory Stark [EMAIL PROTECTED] writes:

 We've already seen wal CRC checking show up at the top of profiles.

 Do you really doubt that memcpy is faster than CRC32 checking? Especially when
 you're already doing memcpy anyways and the only overhead is the few unaligned
 bytes at the end and the 8 one-byte copies?

Well color me surprised, writev is not nearly so much faster than CRC as I had
expected:

lseek+write syscall overhead:   7.95  us
CRC32   32.54 us
writev  26.56 us

The reason there's lseek overhead in there is because I had it seek back to
the same block repeatedly to (hopefully) avoid any i/o. It seems to have
worked as I find it hard to believe these numbers could be so low if there's
any i/o being included.

I think part of the reason writev is slow is because I'm including the time it
took to set up the iovec array. That's 64 word copies write there. And then
writev has to read those 64 words back and do 64 extra branches and so on...

This is on an Intel T2500 (2Ghz).

There is a side issue that tagging each sector is 100% guaranteed to detect
torn pages whereas checksums still have a chance of missing them. But usually
the scenario where that comes into play is where you have many checksum
failures and are ignoring them assuming they never fail. In the case of torn
pages there'll only be one torn page and we're going to scream bloody murder
if we see it so I don't think that's a big issue.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] SOC user quotas

2007-02-28 Thread Sergey E. Koposov


On Wed, 28 Feb 2007, Joshua D. Drake wrote:


I could see this being useful per database, maybe. It seems like kind of
an odd feature.


Per user AND per database (as Tom noted).  But I dont see what's odd in 
it... It exists in Oracle, and I need quotas in the project on which I'm 
working. And I remember user requests for quotas in the mailing lists ...


regards,
Sergey

***
Sergey E. Koposov
Max Planck Institute for Astronomy/Cambridge Institute for Astronomy/Sternberg 
Astronomical Institute
Tel: +49-6221-528-349
Web: http://lnfm1.sai.msu.ru/~math
E-mail: [EMAIL PROTECTED]

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

Re: [HACKERS] COMMIT NOWAIT Performance Option

2007-02-28 Thread Gregory Stark

Joshua D. Drake [EMAIL PROTECTED] writes:

 LIVE FROM THE WWE, CAGE MATCH!

 Jonah (the Theorist) Harris versus Greg (the Brain) Stark.

 What is going to happen between these two brothers in arms when they
 must both prove their theory!

Darn, I wish I had seen this post before I posted the results of my testing.

So what happens to the loser in WWE cage matches?

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] COMMIT NOWAIT Performance Option

2007-02-28 Thread Jonah H. Harris


On 2/28/07, Joshua D. Drake [EMAIL PROTECTED] wrote:

LIVE FROM THE WWE, CAGE MATCH!

Jonah (the Theorist) Harris versus Greg (the Brain) Stark.

What is going to happen between these two brothers in arms when they
must both prove their theory!


Heh, I like it :)

--
Jonah H. Harris, Software Architect | phone: 732.331.1324
EnterpriseDB Corporation| fax: 732.331.1301
33 Wood Ave S, 3rd Floor| [EMAIL PROTECTED]
Iselin, New Jersey 08830| http://www.enterprisedb.com/

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] [PATCHES]

2007-02-28 Thread Bruce Momjian


I have added this to the developer's FAQ to clarify the situtation of
posting a patch:

liPostgreSQL is licensed under a BSD license.  By posting a patch
to the public PostgreSQL mailling lists, you are giving the PostgreSQL
Global Development Group the non-revokable right to distribute your
patch under the BSD license.  If you use code that is available under
some other license that is BSD compatible (eg. public domain), please
note that in your email submission./li


---

Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  Neil Conway wrote:
  For the case in question, sure, requiring some clarification from FJ
  would be reasonable. But more broadly, my point is that I think you're
  fooling yourself if you think that requiring a disclaimer or explicit
  transfer of copyright for this *one* particular patch is likely to make
  any material difference to the overall copyright status of the code
  base.
 
  Yes, I do.  If there is an explicit claim, like an email footer or a
  copyright in the code, we do try to nail that down.
 
 AFAICT, the footer in question tries to make it illegal for us even to
 have the message in our mail archives.  If I were running the PG lists,
 I would install filters that automatically reject mails containing such
 notices, with a message like Your corporate lawyers do not deserve to
 have access to the internet.  Go away until you've acquired a clue.
 
 I fully support Bruce's demand that patches be submitted with no such
 idiocy attached.
 
   regards, tom lane

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] COMMIT NOWAIT Performance Option

2007-02-28 Thread Gregory Stark

Bruce Momjian [EMAIL PROTECTED] writes:

 I think we need to think about when these CRCs would be read and
 written.  It would be written when it hits the disk, hopefully by the
 background writer, and I think after a server crash, all pages would
 have to be read and checked.  The good news is that both of these are
 non-critical paths.

If you're protecting against torn pages then yes, if the system is shut down
uncleanly by a system crash or power failure you would in theory have to scan
every page of every table and index before starting up.

But if the system was shut down uncleanly as the result of a Postgres crash or
fast shutdown of Postgres then that isn't an issue. And many users may prefer
to bring the system up as soon as possible as long as they know any corrupt
pages will be spotted and throw errors as soon as it's seen.

So I think you need a mode that only checks checksums when a page is read from
disk. That would protect against torn pages (but not necessarily before
bringing up the system) and against bad i/o hardware.

Unfortunately memory errors are far more common than disk errors and I it
would be much harder to protect against them. You can't check it when someone
may be writing to the buffer, which limits you to checking it only when you
acquire some form of lock on the buffer. It also means you would have to write
it before you release a lock if you've made any changes.

Worse, I'm not sure how to handle hint bits though. We currently don't require
any lock at all to set hint bits which means someone may think they can check
a checksum while or after you've fiddled some bits.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] COMMIT NOWAIT Performance Option

2007-02-28 Thread Joshua D. Drake

Gregory Stark wrote:
 Joshua D. Drake [EMAIL PROTECTED] writes:
 
 LIVE FROM THE WWE, CAGE MATCH!

 Jonah (the Theorist) Harris versus Greg (the Brain) Stark.

 What is going to happen between these two brothers in arms when they
 must both prove their theory!
 
 Darn, I wish I had seen this post before I posted the results of my testing.
 
 So what happens to the loser in WWE cage matches?

Usually the are left bloody and staring at the winner through the cage.
Occasionally they are carried out on a stretcher.

The good news is, the loser gets a rematch at the next pay per view.

Sincerely,

Joshua D. Drake


 


-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] Compilation errors

2007-02-28 Thread Gregory Stark

Tom Lane [EMAIL PROTECTED] writes:

 Peter Eisentraut [EMAIL PROTECTED] writes:
 Am Mittwoch, 28. Februar 2007 16:23 schrieb Gregory Stark:
 Does anyone happen to know what it is about my build environment that
 causes these errors?

 Nothing.  Everybody gets them.

 I don't.  What version of flex are you guys using?

flex 2.5.33

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] SOC user quotas

2007-02-28 Thread Joshua D. Drake

Sergey E. Koposov wrote:
 On Wed, 28 Feb 2007, Joshua D. Drake wrote:
 
 I could see this being useful per database, maybe. It seems like kind of
 an odd feature.
 
 Per user AND per database (as Tom noted).  But I dont see what's odd in
 it... It exists in Oracle, and I need quotas in the project on which I'm
 working. And I remember user requests for quotas in the mailing lists ...

Well Oracle isn't really our goal is it? I am not questioning that you
are well intended but I just don't see a use case.

For example, what happens if I hit my quota?

Joshua D. Drake

 
 regards,
 Sergey
 
 ***
 Sergey E. Koposov
 Max Planck Institute for Astronomy/Cambridge Institute for
 Astronomy/Sternberg Astronomical Institute
 Tel: +49-6221-528-349
 Web: http://lnfm1.sai.msu.ru/~math
 E-mail: [EMAIL PROTECTED]
 
 ---(end of broadcast)---
 TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly
 


-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] [PATCHES]

2007-02-28 Thread Joshua D. Drake

Bruce Momjian wrote:
 I have added this to the developer's FAQ to clarify the situtation of
 posting a patch:
 
 liPostgreSQL is licensed under a BSD license.  By posting a patch
 to the public PostgreSQL mailling lists, you are giving the PostgreSQL
 Global Development Group the non-revokable right to distribute your
 patch under the BSD license.  If you use code that is available under
 some other license that is BSD compatible (eg. public domain), please
 note that in your email submission./li


We should add this to the mailing list signup pages and the welcome
pages to the lists.

Joshua D. Drake


 
 
 ---
 
 Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
 Neil Conway wrote:
 For the case in question, sure, requiring some clarification from FJ
 would be reasonable. But more broadly, my point is that I think you're
 fooling yourself if you think that requiring a disclaimer or explicit
 transfer of copyright for this *one* particular patch is likely to make
 any material difference to the overall copyright status of the code
 base.
 Yes, I do.  If there is an explicit claim, like an email footer or a
 copyright in the code, we do try to nail that down.
 AFAICT, the footer in question tries to make it illegal for us even to
 have the message in our mail archives.  If I were running the PG lists,
 I would install filters that automatically reject mails containing such
 notices, with a message like Your corporate lawyers do not deserve to
 have access to the internet.  Go away until you've acquired a clue.

 I fully support Bruce's demand that patches be submitted with no such
 idiocy attached.

  regards, tom lane
 


-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Re: [HACKERS] COMMIT NOWAIT Performance Option

2007-02-28 Thread Bruce Momjian

Gregory Stark wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
 
  I think we need to think about when these CRCs would be read and
  written.  It would be written when it hits the disk, hopefully by the
  background writer, and I think after a server crash, all pages would
  have to be read and checked.  The good news is that both of these are
  non-critical paths.
 
 If you're protecting against torn pages then yes, if the system is shut down
 uncleanly by a system crash or power failure you would in theory have to scan
 every page of every table and index before starting up.
 
 But if the system was shut down uncleanly as the result of a Postgres crash or
 fast shutdown of Postgres then that isn't an issue. And many users may prefer
 to bring the system up as soon as possible as long as they know any corrupt
 pages will be spotted and throw errors as soon as it's seen.

I don't think we should start up a system and only detect the errors
later.

 So I think you need a mode that only checks checksums when a page is read from
 disk. That would protect against torn pages (but not necessarily before
 bringing up the system) and against bad i/o hardware.
 
 Unfortunately memory errors are far more common than disk errors and I it
 would be much harder to protect against them. You can't check it when someone
 may be writing to the buffer, which limits you to checking it only when you
 acquire some form of lock on the buffer. It also means you would have to write
 it before you release a lock if you've made any changes.
 
 Worse, I'm not sure how to handle hint bits though. We currently don't require
 any lock at all to set hint bits which means someone may think they can check
 a checksum while or after you've fiddled some bits.

Yep, a problem.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] [PATCHES]

2007-02-28 Thread Bruce Momjian

Joshua D. Drake wrote:
 Bruce Momjian wrote:
  I have added this to the developer's FAQ to clarify the situtation of
  posting a patch:
  
  liPostgreSQL is licensed under a BSD license.  By posting a patch
  to the public PostgreSQL mailling lists, you are giving the PostgreSQL
  Global Development Group the non-revokable right to distribute your
  patch under the BSD license.  If you use code that is available under
  some other license that is BSD compatible (eg. public domain), please
  note that in your email submission./li
 
 
 We should add this to the mailing list signup pages and the welcome
 pages to the lists.

Yep, good idea.  Marc?

---


 
 Joshua D. Drake
 
 
  
  
  ---
  
  Tom Lane wrote:
  Bruce Momjian [EMAIL PROTECTED] writes:
  Neil Conway wrote:
  For the case in question, sure, requiring some clarification from FJ
  would be reasonable. But more broadly, my point is that I think you're
  fooling yourself if you think that requiring a disclaimer or explicit
  transfer of copyright for this *one* particular patch is likely to make
  any material difference to the overall copyright status of the code
  base.
  Yes, I do.  If there is an explicit claim, like an email footer or a
  copyright in the code, we do try to nail that down.
  AFAICT, the footer in question tries to make it illegal for us even to
  have the message in our mail archives.  If I were running the PG lists,
  I would install filters that automatically reject mails containing such
  notices, with a message like Your corporate lawyers do not deserve to
  have access to the internet.  Go away until you've acquired a clue.
 
  I fully support Bruce's demand that patches be submitted with no such
  idiocy attached.
 
 regards, tom lane
  
 
 
 -- 
 
   === The PostgreSQL Company: Command Prompt, Inc. ===
 Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
 Providing the most comprehensive  PostgreSQL solutions since 1997
  http://www.commandprompt.com/
 
 Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
 PostgreSQL Replication: http://www.commandprompt.com/products/

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] SOC user quotas

2007-02-28 Thread Sergey E. Koposov


On Wed, 28 Feb 2007, Joshua D. Drake wrote:


Sergey E. Koposov wrote:

On Wed, 28 Feb 2007, Joshua D. Drake wrote:
Per user AND per database (as Tom noted).  But I dont see what's odd in
it... It exists in Oracle, and I need quotas in the project on which I'm
working. And I remember user requests for quotas in the mailing lists ...


Well Oracle isn't really our goal is it? I am not questioning that you
are well intended but I just don't see a use case.

For example, what happens if I hit my quota?


Then you cannot run any queries that extend the size of your relations 
(for example INSERT, UPDATE etc.). Unless you drop your tables or DELETE 
something


The use case for that is the situation when you provide the access to 
different people to do something on the DB. The real world example (in 
which I'm interested) is when the large science project produce a huge 
amount of data, store it in large database, and let different scientists 
work on that data, having their little accounts there. (example 
http://casjobs.sdss.org/CasJobs/Guide.aspx ). That's the way how most of 
large astronomical projects start to work now.


Regards,
Sergey

***
Sergey E. Koposov
Max Planck Institute for Astronomy/Cambridge Institute for Astronomy/Sternberg 
Astronomical Institute
Tel: +49-6221-528-349
Web: http://lnfm1.sai.msu.ru/~math
E-mail: [EMAIL PROTECTED]

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match

Re: [HACKERS] SOC user quotas

2007-02-28 Thread Joshua D. Drake


 Then you cannot run any queries that extend the size of your relations
 (for example INSERT, UPDATE etc.). Unless you drop your tables or DELETE
 something

Interesting. Well my two cents is don't go any deeper than database.
I.e; don't try and track to the individual relation.

Joshua D. Drake


 
 The use case for that is the situation when you provide the access to
 different people to do something on the DB. The real world example (in
 which I'm interested) is when the large science project produce a huge
 amount of data, store it in large database, and let different scientists
 work on that data, having their little accounts there. (example
 http://casjobs.sdss.org/CasJobs/Guide.aspx ). That's the way how most of
 large astronomical projects start to work now.
 
 Regards,
 Sergey
 
 ***
 Sergey E. Koposov
 Max Planck Institute for Astronomy/Cambridge Institute for
 Astronomy/Sternberg Astronomical Institute
 Tel: +49-6221-528-349
 Web: http://lnfm1.sai.msu.ru/~math
 E-mail: [EMAIL PROTECTED]
 
 ---(end of broadcast)---
 TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match
 


-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] SOC user quotas

2007-02-28 Thread Tom Lane

Sergey E. Koposov [EMAIL PROTECTED] writes:
 Per user AND per database (as Tom noted).  But I dont see what's odd in 
 it... It exists in Oracle, and I need quotas in the project on which I'm 
 working. And I remember user requests for quotas in the mailing lists ...

It hasn't ever made it onto the TODO list, which means there's not a
consensus that we need it.  If it were a simple, small, low-impact patch
then you probably wouldn't need to do much convincing that it's an
important feature to have, but I'm afraid the patch will be none of
those things.

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Re: [HACKERS] SOC user quotas

2007-02-28 Thread Joshua D. Drake

Tom Lane wrote:
 Sergey E. Koposov [EMAIL PROTECTED] writes:
 Per user AND per database (as Tom noted).  But I dont see what's odd in 
 it... It exists in Oracle, and I need quotas in the project on which I'm 
 working. And I remember user requests for quotas in the mailing lists ...
 
 It hasn't ever made it onto the TODO list, which means there's not a
 consensus that we need it.  If it were a simple, small, low-impact patch
 then you probably wouldn't need to do much convincing that it's an
 important feature to have, but I'm afraid the patch will be none of
 those things.

Tom what about at just the DB level?

E.g; if user foo then pg_database_size may not be  than X?

I guess the big question would be when do we check though? At each
transaction seems like it would add significant overhead, especially if
we had to rollback the transaction because it was going to go over their
quota.

Egad.

Joshua D. Drake



 
   regards, tom lane
 


-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] [PATCHES]

2007-02-28 Thread Tom Lane

Bruce Momjian [EMAIL PROTECTED] writes:
 Joshua D. Drake wrote:
 We should add this to the mailing list signup pages and the welcome
 pages to the lists.

 Yep, good idea.  Marc?

For -patches and -hackers, I agree.  It seems a bit legalistic and
off-putting for the general lists, though.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] SOC user quotas

2007-02-28 Thread Sergey E. Koposov


On Wed, 28 Feb 2007, Tom Lane wrote:

2) I looked into the code, and from my understanding, the main part of the
code which should be affected by the quotas is storage/smgr/md.c.


md.c is too low level to do catalog accesses and thus too low level to
know who owns what.


That's probably a dumb question(I dont know the PG infrastructrure that 
well), but Is it possible to put the information about the owner into 
SMgrRelation/Relation structures? As I see the smgrextend() in smgr.c get 
the SMgrRelation agrument...


regards,
Sergey

***
Sergey E. Koposov
Max Planck Institute for Astronomy/Cambridge Institute for Astronomy/Sternberg 
Astronomical Institute
Tel: +49-6221-528-349
Web: http://lnfm1.sai.msu.ru/~math
E-mail: [EMAIL PROTECTED]

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq

Re: [HACKERS] Compilation errors

2007-02-28 Thread Andrew Dunstan


Gregory Stark wrote:

Tom Lane [EMAIL PROTECTED] writes:

  

Peter Eisentraut [EMAIL PROTECTED] writes:


Am Mittwoch, 28. Februar 2007 16:23 schrieb Gregory Stark:
  

Does anyone happen to know what it is about my build environment that
causes these errors?


Nothing.  Everybody gets them.
  

I don't.  What version of flex are you guys using?



flex 2.5.33

  


Aha! Known to be broken, iirc. Use flex 2.5.4a

cheers

andrew

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match

[HACKERS] Possible Bug: high CPU usage for stats collector in 8.2

2007-02-28 Thread Joshua D. Drake

Darcy Buskermolen wrote:
 I'm observing high CPU usage (95%) of a 2.6GHz opteron by the stats collector 
 on an 8.2.3 box  investigation has lead me to belive that the stats file is 
 written a lot more often that once every 500ms  the following shows this 
 behavior.
 
 PostgreSQL 8.2.3 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 3.4.6 
 20060404 (Red Hat 3.4.6-3)
 
 I ran a 
 time for i in `psql -qtc select generate_series(1,1)`; do psql  -qtc 
 select 1 from test where msg_id=$i ; done
 
 which took
 real1m23.288s
 user0m24.142s
 sys 0m21.536s
 
 
 to execute, during which time I ran a strace on the stats collector which 
 produces the following output.  From this it looks like the stats file is 
 getting rewritten for each connection teardown, not just every 500ms.
 
 Process 10061 attached - interrupt to quit
 Process 10061 detached
 % time seconds  usecs/call callserrors syscall
 -- --- --- - - 
  68.14   28.811963  17   1663827   write
  18.227.701885 123 62808 12793 poll
  11.314.783082 365 13101   rename
   0.580.246169   5 50006   recvfrom
   0.570.241073  18 13101   open
   0.430.182816  14 13101   munmap
   0.180.076176   6 13101   mmap
   0.170.072746   6 13101   close
   0.140.060483   5 13101   setitimer
   0.100.041344   3 13101 12793 rt_sigreturn
   0.090.039240   3 13101   fstat
   0.060.024041   2 13110   getppid
 -- --- --- - - 
 100.00   42.281018   1894559 25586 total
 
 As you can see rename was called more than the theroitcal 167  times for 
 500ms 
 slices that elapsed during the test
 
 Compared to  PostgreSQL 8.1.8 on x86_64-pc-linux-gnu, compiled by GCC 
 gcc-4.0.gcc-opt (GCC) 4.0.3 (Ubuntu 4.0.3-1ubuntu5) 
 [All be it this is slower hardware..]
 
 time for i in `psql -qtc select generate_series(1,1)`; do psql  -qtc 
 select 1 from test where msg_id=$i ; done
 
 which took
 real9m25.380s
 user6m51.254s
 sys 1m47.687s
 (and therefor should be about 1130 stat write cycles)
 
 and yielded the following strace
 
 % time seconds  usecs/call callserrors syscall
 -- --- --- - - 
  93.64   20.422006 334 61212   select
   3.490.760963   7110192   read
   1.820.396654  19 21128   write
   0.640.139679 126  1112   rename
   0.270.057970  52  1112   open
   0.060.012177  11  1112   munmap
   0.040.008901   8  1112   mmap
   0.030.006402   6  1112   close
   0.020.004282   4  1112   fstat
 -- --- --- - - 
 100.00   21.809034199204   total
 
 
 
 During this run the stats collector does not even show and CPU usage 
 according 
 to top.
 
 
 both 8.1 and 8.2 have the following postgresql.conf parameters
 
 stats_command_string =  off
 stats_start_collector = on
 stats_block_level = on
 stats_row_level = on
 
 
 


-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] SOC user quotas

2007-02-28 Thread Martijn van Oosterhout

On Wed, Feb 28, 2007 at 09:58:52AM -0800, Joshua D. Drake wrote:
 E.g; if user foo then pg_database_size may not be  than X?
 
 I guess the big question would be when do we check though? At each
 transaction seems like it would add significant overhead, especially if
 we had to rollback the transaction because it was going to go over their
 quota.

Generally, rolling back a transaction doesn't reduce the amount of disk
used. Only VACUUM FULL actually shrinks relations.

Seem to me if the RelationOpen stores a pointer to a counter that gets
incremented on mdextend, it should work reasonably well. Extending
doesn't happen that often relative to other database activity.

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.


signature.asc
Description: Digital signature

Re: [HACKERS] SOC user quotas

2007-02-28 Thread Joshua D. Drake

Martijn van Oosterhout wrote:
 On Wed, Feb 28, 2007 at 09:58:52AM -0800, Joshua D. Drake wrote:
 E.g; if user foo then pg_database_size may not be  than X?

 I guess the big question would be when do we check though? At each
 transaction seems like it would add significant overhead, especially if
 we had to rollback the transaction because it was going to go over their
 quota.
 
 Generally, rolling back a transaction doesn't reduce the amount of disk
 used. Only VACUUM FULL actually shrinks relations.

Right, but what I mean was -- if we rollback because we hit quota we
could potentially cause even more maintenance to have to happen (vacuum).

J


 
 Seem to me if the RelationOpen stores a pointer to a counter that gets
 incremented on mdextend, it should work reasonably well. Extending
 doesn't happen that often relative to other database activity.
 
 Have a nice day,


-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] COMMIT NOWAIT Performance Option

2007-02-28 Thread Jonah H. Harris


On 2/28/07, Bruce Momjian [EMAIL PROTECTED] wrote:

 But if the system was shut down uncleanly as the result of a Postgres crash or
 fast shutdown of Postgres then that isn't an issue. And many users may prefer
 to bring the system up as soon as possible as long as they know any corrupt
 pages will be spotted and throw errors as soon as it's seen.

I don't think we should start up a system and only detect the errors
later.


Which is, of course, how everyone else does it.  On block access, the
checksum is verified (if you've turned checksum checking on).  I
*really* doubt you want to pull in every page in the database at
startup time to verify the checksum or sequence.  Even pages from the
last checkpoint would be a killer.

All of the databases (Oracle, SQL Server, DB2) have a way to perform a
database corruption check which does go out and verify all checksums.

If consistency is stored at the block-level, which is pretty much the
only way to avoid full page writes, you have to accept some level of
possible corruption.

--
Jonah H. Harris, Software Architect | phone: 732.331.1324
EnterpriseDB Corporation| fax: 732.331.1301
33 Wood Ave S, 3rd Floor| [EMAIL PROTECTED]
Iselin, New Jersey 08830| http://www.enterprisedb.com/

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq

Re: [HACKERS] COMMIT NOWAIT Performance Option

2007-02-28 Thread Bruce Momjian

Jonah H. Harris wrote:
 On 2/28/07, Bruce Momjian [EMAIL PROTECTED] wrote:
   But if the system was shut down uncleanly as the result of a Postgres 
   crash or
   fast shutdown of Postgres then that isn't an issue. And many users may 
   prefer
   to bring the system up as soon as possible as long as they know any 
   corrupt
   pages will be spotted and throw errors as soon as it's seen.
 
  I don't think we should start up a system and only detect the errors
  later.
 
 Which is, of course, how everyone else does it.  On block access, the
 checksum is verified (if you've turned checksum checking on).  I
 *really* doubt you want to pull in every page in the database at
 startup time to verify the checksum or sequence.  Even pages from the
 last checkpoint would be a killer.
 
 All of the databases (Oracle, SQL Server, DB2) have a way to perform a
 database corruption check which does go out and verify all checksums.
 
 If consistency is stored at the block-level, which is pretty much the
 only way to avoid full page writes, you have to accept some level of
 possible corruption.

Am am not comfortable starting and having something fail later.  How
other databases do it is not an issue for me.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] SOC user quotas

2007-02-28 Thread Oleg Bartunov


On Wed, 28 Feb 2007, Joshua D. Drake wrote:


Tom Lane wrote:

Sergey E. Koposov [EMAIL PROTECTED] writes:

Per user AND per database (as Tom noted).  But I dont see what's odd in
it... It exists in Oracle, and I need quotas in the project on which I'm
working. And I remember user requests for quotas in the mailing lists ...


It hasn't ever made it onto the TODO list, which means there's not a
consensus that we need it.  If it were a simple, small, low-impact patch
then you probably wouldn't need to do much convincing that it's an
important feature to have, but I'm afraid the patch will be none of
those things.


We need this kind of feature in our scientific project I and Sergey are
working on. We provide access to big pool of astronomical catalogs and
ability to match users data with these huge catalogs and we want to
be able to provide sort of QoS.

Pg became very popular in Russia, especially after the biggest
accounting and enteprize management software developer 1C 
(about 800,000 installations) has been supporting Pg and I expect

a large interest to Pg this year, especially from the application providers,
shared environment.

btw, this  should be announced in -advocacy, I and Teodor 
worked on Pg port, some patches we have submitted was grown from that work.



Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] COMMIT NOWAIT Performance Option

2007-02-28 Thread Bruce Momjian

Jonah H. Harris wrote:
 On 2/28/07, Bruce Momjian [EMAIL PROTECTED] wrote:
  Am am not comfortable starting and having something fail later.
 
 Then do you have some other idea for protecting pages from being torn
 without storing an entire backup copy or performing a block-level
 consistency check?

We have several methods suggested to check the blocks, like CRC.  My
point was that, whatever check method we use, we should be prepared to
check on startup, or at least make it the default for a crash restart.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] COMMIT NOWAIT Performance Option

2007-02-28 Thread Jonah H. Harris


On 2/28/07, Bruce Momjian [EMAIL PROTECTED] wrote:

We have several methods suggested to check the blocks, like CRC.  My
point was that, whatever check method we use, we should be prepared to
check on startup, or at least make it the default for a crash restart.


Sounds like it should be a guc.  I most certainly wouldn't check the
entire database, especially if it was over a gigabyte.

--
Jonah H. Harris, Software Architect | phone: 732.331.1324
EnterpriseDB Corporation| fax: 732.331.1301
33 Wood Ave S, 3rd Floor| [EMAIL PROTECTED]
Iselin, New Jersey 08830| http://www.enterprisedb.com/

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match

Re: [HACKERS] COMMIT NOWAIT Performance Option

2007-02-28 Thread Bruce Momjian

Jonah H. Harris wrote:
 On 2/28/07, Bruce Momjian [EMAIL PROTECTED] wrote:
  We have several methods suggested to check the blocks, like CRC.  My
  point was that, whatever check method we use, we should be prepared to
  check on startup, or at least make it the default for a crash restart.
 
 Sounds like it should be a guc.  I most certainly wouldn't check the
 entire database, especially if it was over a gigabyte.

Keep in mind if you don't check it on startup, you will be checking it
for every read, which for rare crashes, might not be wise.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] COMMIT NOWAIT Performance Option

2007-02-28 Thread Jeff Davis

On Wed, 2007-02-28 at 14:10 -0500, Jonah H. Harris wrote:
 On 2/28/07, Bruce Momjian [EMAIL PROTECTED] wrote:
   But if the system was shut down uncleanly as the result of a Postgres 
   crash or
   fast shutdown of Postgres then that isn't an issue. And many users may 
   prefer
   to bring the system up as soon as possible as long as they know any 
   corrupt
   pages will be spotted and throw errors as soon as it's seen.
 
  I don't think we should start up a system and only detect the errors
  later.
 
 Which is, of course, how everyone else does it.  On block access, the
 checksum is verified (if you've turned checksum checking on).  I
 *really* doubt you want to pull in every page in the database at
 startup time to verify the checksum or sequence.  Even pages from the
 last checkpoint would be a killer.
 

Under normal operations, shutting down the database does a checkpoint,
right? So unless you're in recovery mode, there's no additional cost.
And I can't think of any reason you'd need to see any pages before the
last checkpoint (unless you don't trust your disk and just want to check
all the pages, which is more than we can do now anyway).

So the additional cost of doing CRCs every time would be the CPU cost,
and also the cost during recovery of reading in all the data pages since
the last checkpoint. That's 5 minutes of data, in the default
configuration.

Regards,
Jeff Davis


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] COMMIT NOWAIT Performance Option

2007-02-28 Thread Jonah H. Harris


On 2/28/07, Bruce Momjian [EMAIL PROTECTED] wrote:

Keep in mind if you don't check it on startup, you will be checking it
for every read, which for rare crashes, might not be wise.


Well understood.  That's how most everyone configures their database
systems; they certainly don't optimize for torn page detection on a
crash.

--
Jonah H. Harris, Software Architect | phone: 732.331.1324
EnterpriseDB Corporation| fax: 732.331.1301
33 Wood Ave S, 3rd Floor| [EMAIL PROTECTED]
Iselin, New Jersey 08830| http://www.enterprisedb.com/

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq

Re: [HACKERS] COMMIT NOWAIT Performance Option

2007-02-28 Thread Jonah H. Harris


On 2/28/07, Jeff Davis [EMAIL PROTECTED] wrote:

That's 5 minutes of data, in the default configuration.


Right, but I don't know anyone that keeps checkpoints at 5 minutes.
At least not on OLTP configurations.

--
Jonah H. Harris, Software Architect | phone: 732.331.1324
EnterpriseDB Corporation| fax: 732.331.1301
33 Wood Ave S, 3rd Floor| [EMAIL PROTECTED]
Iselin, New Jersey 08830| http://www.enterprisedb.com/

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] COMMIT NOWAIT Performance Option

2007-02-28 Thread Heikki Linnakangas


Jonah H. Harris wrote:

On 2/28/07, Bruce Momjian [EMAIL PROTECTED] wrote:
 But if the system was shut down uncleanly as the result of a 
Postgres crash or
 fast shutdown of Postgres then that isn't an issue. And many users 
may prefer
 to bring the system up as soon as possible as long as they know any 
corrupt

 pages will be spotted and throw errors as soon as it's seen.

I don't think we should start up a system and only detect the errors
later.


Which is, of course, how everyone else does it.  On block access, the
checksum is verified (if you've turned checksum checking on).  I
*really* doubt you want to pull in every page in the database at
startup time to verify the checksum or sequence.  Even pages from the
last checkpoint would be a killer.


AFAICS pages from the last checkpoint would be sufficient, and not that 
expensive given that we have to pull all the pages touched since last 
checkpoint from disk to do the WAL replay anyway.



All of the databases (Oracle, SQL Server, DB2) have a way to perform a
database corruption check which does go out and verify all checksums.


I think that's to protect from random disk errors rather than just torn 
pages. Which might be a useful thing but I think that level of 
protection belongs to the filesystem or storage hardware.


What's the use case for the torn-page detection, anyway? If you don't 
trust your OS/hardware to protect you from torn-pages, but you do care 
about your data, surely you would use full_pages_writes = on. If you do 
trust your OS/hardware, just turn it off. And if you don't care about 
your data, what's the point?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] COMMIT NOWAIT Performance Option

2007-02-28 Thread Joshua D. Drake

Jonah H. Harris wrote:
 On 2/28/07, Jeff Davis [EMAIL PROTECTED] wrote:
 That's 5 minutes of data, in the default configuration.
 
 Right, but I don't know anyone that keeps checkpoints at 5 minutes.
 At least not on OLTP configurations.

Uhmm... most do because most don't ever touch the postgresql.conf and
those that do, don't touch checkpoints because they don't understand it.

Joshua D. Drake


 


-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

[HACKERS] 8.1.8 Installer Fails on Win2k Server

2007-02-28 Thread Saqib Awan

I am installing on a Cisco Media Server 7800 running Win2k Server and am
noticing that the installer fails every time with the error dialog saying
Failed to create process for initdb: Access is denied. It looks like that
I need to change some permission in the registry allowing other users spawn
processes. Any pointer in this respect is highly appreciated.



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] COMMIT NOWAIT Performance Option

2007-02-28 Thread Jeff Davis

On Wed, 2007-02-28 at 14:54 -0500, Jonah H. Harris wrote:
 On 2/28/07, Jeff Davis [EMAIL PROTECTED] wrote:
  That's 5 minutes of data, in the default configuration.
 
 Right, but I don't know anyone that keeps checkpoints at 5 minutes.
 At least not on OLTP configurations.
 

It's got a hard maximum of 1 hour. Also, the need to make checkpoints
far apart has been reduced with bgwriter. It will be further reduced
with the patch that allows people to tune the bgwriter for their needs.

Recovery has to run through all those WAL segments anyway. It's not like
we're making a 5 second recovery take 20 minutes, we'd be taking an
already long recovery and making it longer (I'm not sure how much
longer, but it can't be more than twice as long).

I'm not saying there's no cost, but the extra recovery cost seems lower
to me than the CRC cost on every data page read during operation.

Also, if we find an error, do we even have the ability to correct it? A
CRC doesn't tell us which pages were written and which weren't, so we
could detect the error but not correct it, right?

Regards,
Jeff Davis


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] COMMIT NOWAIT Performance Option

2007-02-28 Thread Jonah H. Harris


On 2/28/07, Jeff Davis [EMAIL PROTECTED] wrote:

I'm not saying there's no cost, but the extra recovery cost seems lower
to me than the CRC cost on every data page read during operation.


I agree, I just think it should be configurable.


Also, if we find an error, do we even have the ability to correct it? A
CRC doesn't tell us which pages were written and which weren't, so we
could detect the error but not correct it, right?


Correct.

--
Jonah H. Harris, Software Architect | phone: 732.331.1324
EnterpriseDB Corporation| fax: 732.331.1301
33 Wood Ave S, 3rd Floor| [EMAIL PROTECTED]
Iselin, New Jersey 08830| http://www.enterprisedb.com/

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org

Re: [HACKERS] COMMIT NOWAIT Performance Option

2007-02-28 Thread Jonah H. Harris


On 2/28/07, Joshua D. Drake [EMAIL PROTECTED] wrote:

 Right, but I don't know anyone that keeps checkpoints at 5 minutes.
 At least not on OLTP configurations.

Uhmm... most do because most don't ever touch the postgresql.conf and
those that do, don't touch checkpoints because they don't understand it.


Yes, I guess I should've qualified that to specify people who do
understand the system better.

--
Jonah H. Harris, Software Architect | phone: 732.331.1324
EnterpriseDB Corporation| fax: 732.331.1301
33 Wood Ave S, 3rd Floor| [EMAIL PROTECTED]
Iselin, New Jersey 08830| http://www.enterprisedb.com/

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org

Re: [HACKERS] COMMIT NOWAIT Performance Option

2007-02-28 Thread Joshua D. Drake

Jonah H. Harris wrote:
 On 2/28/07, Joshua D. Drake [EMAIL PROTECTED] wrote:
  Right, but I don't know anyone that keeps checkpoints at 5 minutes.
  At least not on OLTP configurations.

 Uhmm... most do because most don't ever touch the postgresql.conf and
 those that do, don't touch checkpoints because they don't understand it.
 
 Yes, I guess I should've qualified that to specify people who do
 understand the system better.

I am not trying to pick on the issue but I do think it is important to
recognize that literally only those in the know, are going to ever touch
the postgresql.conf.

I deal with lots of customers, who employ lots of programmers that I
have often heard, There is a postgresql.conf?. Why? Because they are
user space programmers using an ORM or other such tech that allows them
to never actually login to postgresql or a shell or anything once they
edit the pg_hba.conf.

Joshua D. Drake


 


-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] COMMIT NOWAIT Performance Option

2007-02-28 Thread Jonah H. Harris


On 2/28/07, Joshua D. Drake [EMAIL PROTECTED] wrote:

I am not trying to pick on the issue but I do think it is important to
recognize that literally only those in the know, are going to ever touch
the postgresql.conf.


I agree.

--
Jonah H. Harris, Software Architect | phone: 732.331.1324
EnterpriseDB Corporation| fax: 732.331.1301
33 Wood Ave S, 3rd Floor| [EMAIL PROTECTED]
Iselin, New Jersey 08830| http://www.enterprisedb.com/

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq

Re: [HACKERS] SOC user quotas

2007-02-28 Thread Tom Lane

Joshua D. Drake [EMAIL PROTECTED] writes:
 Martijn van Oosterhout wrote:
 Generally, rolling back a transaction doesn't reduce the amount of disk
 used. Only VACUUM FULL actually shrinks relations.

 Right, but what I mean was -- if we rollback because we hit quota we
 could potentially cause even more maintenance to have to happen (vacuum).

It's worse than that, because VACUUM FULL will actually bloat the
indexes on the way to being able to reduce the table size (since it has
to make new index entries for rows it moves).  If the limit is strictly
enforced then a user who has reached his quota is really totally
screwed: the only easy way to get back under quota will be to completely
drop tables, ie, discard data.  VACUUM probably won't reduce the
physical table size much, and VACUUM FULL will fail, and other
approaches such as CLUSTER won't work either.

[ thinks for a bit... ]  Possibly you could drop all your indexes,
VACUUM FULL, reconstruct indexes.  But it would be painful and would
certainly prevent you from working normally until you finish that
maintenance.  If the quota limit includes temp files you might find that
rebuilding the indexes fails, too, because of the transient space needed
to rebuild.

Plus, all that forced maintenance activity will be degrading response
for other users while it happens.

On the whole I'm not convinced that a quota is a good idea.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] Proposal for Implenting read-only queries during wal replay (SoC 2007)

2007-02-28 Thread plabrh1

Thanks Josh,

I'll look for the earlier one and try to add it there...

-Paul



-Original Message-
From: Joshua D. Drake [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, February 28, 2007 12:09 AM
To: Paul Silveira
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Proposal for Implenting read-only queries during wal
replay (SoC 2007)

Paul Silveira wrote:
 Hello,
 
 I just wanted to voice my opinion for this feature...  I've implemented a
 few Production applicaitons with PostgreSQL now and would die for that
 feature.  Right now, I am constantly trying to find way's to make my data
 more available. 

Paul unfortunately you have responded to a hijacked thread. Jonah was
speaking about a project that he wishes would have been accepted which
was called Full Disjunctions.

I have not read the read-only queries during wal replay thread but I can
assure you that Jonah's response had nothing to do with it.

Joshua D. Drake



 I've even resulted to using pg_dump to create read only
 copies of the database and placed them behind load balancers to make the
 data more available.  Something like this would allow me to quickly
leverage
 a read only node to scale out the applicaiton...  If it can at all be
built,
 it would get my first, second and third vote. :)
 
 Regards,
 
 Paul Silveira
 
 
 
 
 Jonah H. Harris-2 wrote:
 On 2/26/07, Bruce Momjian [EMAIL PROTECTED] wrote:
 Jonah, I have no idea what fault you are trying to blame on the
 community in the above statement.  The author didn't discuss the idea
 with the community before spending months on it so we have no obligation
 to accept it in the core.
 You're missing the point entirely.  The majority of the (vocal)
 community didn't even want the feature and as such, failed to provide
 viable suggestions for him to move forward.  As the majority of the
 community didn't want the feature, it wouldn't have made a difference
 when he proposed it; which would have remained negative nonetheless.

 -- 
 Jonah H. Harris, Software Architect | phone: 732.331.1324
 EnterpriseDB Corporation| fax: 732.331.1301
 33 Wood Ave S, 3rd Floor| [EMAIL PROTECTED]
 Iselin, New Jersey 08830| http://www.enterprisedb.com/

 ---(end of broadcast)---
 TIP 7: You can help support the PostgreSQL project by donating at

 http://www.postgresql.org/about/donate


 


-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] Compilation errors

2007-02-28 Thread Tom Lane

Andrew Dunstan [EMAIL PROTECTED] writes:
 Gregory Stark wrote:
 flex 2.5.33

 Aha! Known to be broken, iirc. Use flex 2.5.4a

No, the known breakages with flex were years ago; 2.5.33 has only been
out a year.  I think 2.5.31 might have been the one we saw big problems
with (there's a note warning against using it on the flex sourceforge
page).

I think most of us do still use 2.5.4a, but it'd probably be a good idea
to check out 2.5.33 and see if it can be made to not generate warnings.
I'm certainly tired of seeing the warnings 2.5.4a creates ...

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Re: [HACKERS] COMMIT NOWAIT Performance Option

2007-02-28 Thread Gregory Stark

Jonah H. Harris [EMAIL PROTECTED] writes:

 Which is, of course, how everyone else does it.  

I happen to agree with your conclusion but this line of argument is
exceptionally unconvincing. In fact in this crowd you'll tend to turn people
off and lose people if you say things like that rather than convince anyone of
anything.

 Even pages from the last checkpoint would be a killer.

Hm that's an interesting thought. We only really have to check pages that
would have received a full page write since the last checkpoint. So if we made
turning full page writes off still record the page ids of the pages it *would*
have written then we just need the code that normally replays full page writes
to check the checksum if the page data isn't available.

I can't see how that would be a killer. No matter how large a system you're
talking about you're going to tune checkpoints to be occurring at about the
same interval anyways. So the amount of time the wal replay checksum checking
takes will be more or less constant.

In fact we're already reading in most, if not all, of those pages anyways
since we're replaying wal records that touch them after all. Would we even
have to do anything extra? If we check checksums whenever we read in a page
surely the wal replay code would automatically detect any torn pages without
any special attention.

That also makes it clear just how awful full page writes are for scalability.
As you scale up the system but try to keep checkpoint intervals constant
you're less and less likely to ever see the same page twice between two
checkpoints. So as you scale the system up more and more of the wal will
consist of full page writes.

 All of the databases (Oracle, SQL Server, DB2) have a way to perform a
 database corruption check which does go out and verify all checksums.

Which is pretty poor design. If we implemented a fsck-like tool I would be far
more interested in checking things like tuples don't overlap or hint bits
are set correctly and so on. Checksums do nothing to protect against software
failures which is the only kind of failure with a good rationale for being in
an external tool.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] SOC user quotas

2007-02-28 Thread Oleg Bartunov


On Wed, 28 Feb 2007, Tom Lane wrote:


Joshua D. Drake [EMAIL PROTECTED] writes:

Martijn van Oosterhout wrote:

Generally, rolling back a transaction doesn't reduce the amount of disk
used. Only VACUUM FULL actually shrinks relations.



Right, but what I mean was -- if we rollback because we hit quota we
could potentially cause even more maintenance to have to happen (vacuum).


It's worse than that, because VACUUM FULL will actually bloat the
indexes on the way to being able to reduce the table size (since it has
to make new index entries for rows it moves).  If the limit is strictly
enforced then a user who has reached his quota is really totally
screwed: the only easy way to get back under quota will be to completely
drop tables, ie, discard data.  VACUUM probably won't reduce the
physical table size much, and VACUUM FULL will fail, and other
approaches such as CLUSTER won't work either.

[ thinks for a bit... ]  Possibly you could drop all your indexes,
VACUUM FULL, reconstruct indexes.  But it would be painful and would
certainly prevent you from working normally until you finish that
maintenance.  If the quota limit includes temp files you might find that
rebuilding the indexes fails, too, because of the transient space needed
to rebuild.

Plus, all that forced maintenance activity will be degrading response
for other users while it happens.

On the whole I'm not convinced that a quota is a good idea.


On database level it's possible to have soft user quote, just measure
disk usage and warn user if database size is over. This could be realized
using external tools. But Sergey wanted finer granulation. As a workaround,
we could have function which return size of db objects owned by user and
let administrator run cron job to realize soft quota. This will not provide
foundation for enterprize level of QoS, but we certainly don't want to
introduce too much overhead. It's interesting and challenging task though.

Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

Re: [HACKERS] SOC user quotas

2007-02-28 Thread Sergey E. Koposov


On Wed, 28 Feb 2007, Tom Lane wrote:


Joshua D. Drake [EMAIL PROTECTED] writes:

Martijn van Oosterhout wrote:

Generally, rolling back a transaction doesn't reduce the amount of disk
used. Only VACUUM FULL actually shrinks relations.



Right, but what I mean was -- if we rollback because we hit quota we
could potentially cause even more maintenance to have to happen (vacuum).


It's worse than that, because VACUUM FULL will actually bloat the
indexes on the way to being able to reduce the table size (since it has
to make new index entries for rows it moves).  If the limit is strictly
enforced then a user who has reached his quota is really totally
screwed: the only easy way to get back under quota will be to completely
drop tables, ie, discard data.  VACUUM probably won't reduce the
physical table size much, and VACUUM FULL will fail, and other
approaches such as CLUSTER won't work either.


I don't know, but in my opinion, I don't see anything bad in requiring 
dropping the data if the quota is full. That's what usually occurs in the 
case of normal filesystem quota... If you don't have a space there, you 
cannot edit files, copy them etc...
And that solution should be definitely better than the filesystem quota 
for the PostgreSQL user for example.


regards,
Sergey
***
Sergey E. Koposov
Max Planck Institute for Astronomy/Cambridge Institute for Astronomy/Sternberg 
Astronomical Institute
Tel: +49-6221-528-349
Web: http://lnfm1.sai.msu.ru/~math
E-mail: [EMAIL PROTECTED]

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] COMMIT NOWAIT Performance Option

2007-02-28 Thread Jonah H. Harris


On 2/28/07, Gregory Stark [EMAIL PROTECTED] wrote:

Jonah H. Harris [EMAIL PROTECTED] writes:

 Which is, of course, how everyone else does it.

I happen to agree with your conclusion but this line of argument is
exceptionally unconvincing. In fact in this crowd you'll tend to turn people
off and lose people if you say things like that rather than convince anyone of
anything.


Rather than reinventing the wheel, it often pays to piggyback on the
solutions others in similar situations have encountered.  I'm just
stating how others provide similar functionality or capabilities.  If
someone dislikes an idea just because the major vendors have done it
that way, that's their own problem.  It's up to the community to
decide how to proceed given the information at hand.


 Even pages from the last checkpoint would be a killer.

Hm that's an interesting thought. We only really have to check pages that
would have received a full page write since the last checkpoint.


That's the only way I see that it could possibly be acceptable from a
time-to-recover performance standpoint.  I would still prefer a guc.


Which is pretty poor design. If we implemented a fsck-like tool I would be far
more interested in checking things like tuples don't overlap or hint bits
are set correctly and so on. Checksums do nothing to protect against software
failures which is the only kind of failure with a good rationale for being in
an external tool.


Regardless of whether it's better as a separate tool or in the
database itself, they provide a
corruption-finding/consistency-checking capability.  As far as other
checks that could be performed, SQL Server and Oracle do have their
own internal structure checks; many of which execute at runtime, not
as a separate tool or process.

--
Jonah H. Harris, Software Architect | phone: 732.331.1324
EnterpriseDB Corporation| fax: 732.331.1301
33 Wood Ave S, 3rd Floor| [EMAIL PROTECTED]
Iselin, New Jersey 08830| http://www.enterprisedb.com/

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate

Re: [HACKERS] COMMIT NOWAIT Performance Option

2007-02-28 Thread Jonah H. Harris


On 2/28/07, Bruce Momjian [EMAIL PROTECTED] wrote:

Am am not comfortable starting and having something fail later.


Then do you have some other idea for protecting pages from being torn
without storing an entire backup copy or performing a block-level
consistency check?


How other databases do it is not an issue for me.


Not saying it is, just stating the facts.

--
Jonah H. Harris, Software Architect | phone: 732.331.1324
EnterpriseDB Corporation| fax: 732.331.1301
33 Wood Ave S, 3rd Floor| [EMAIL PROTECTED]
Iselin, New Jersey 08830| http://www.enterprisedb.com/

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org

Re: [HACKERS] Compilation errors

2007-02-28 Thread Andrew Dunstan


Tom Lane wrote:

Andrew Dunstan [EMAIL PROTECTED] writes:
  

Gregory Stark wrote:


flex 2.5.33
  


  

Aha! Known to be broken, iirc. Use flex 2.5.4a



No, the known breakages with flex were years ago; 2.5.33 has only been
out a year.  I think 2.5.31 might have been the one we saw big problems
with (there's a note warning against using it on the flex sourceforge
page).

I think most of us do still use 2.5.4a, but it'd probably be a good idea
to check out 2.5.33 and see if it can be made to not generate warnings.
I'm certainly tired of seeing the warnings 2.5.4a creates ...
  




It gives me the same warnings that Greg reported.

I guess we could conditionally add prototypes for those functions to all 
the .l files if you really want to move to 2.5.33. Kinda yucky, though.


cheers

andrew

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] Compilation errors

2007-02-28 Thread Andrew Dunstan


I wrote:



I guess we could conditionally add prototypes for those functions to 
all the .l files if you really want to move to 2.5.33. Kinda yucky, 
though.





Actually, we couldn't.The definition section from the .l file gets 
included after these functions. So we'd need to include something in 
gram.y before including scan.c.


cheers

andrew


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate

Re: [HACKERS] Resumable vacuum proposal and design overview

2007-02-28 Thread Heikki Linnakangas


Gregory Stark wrote:

Simon Riggs [EMAIL PROTECTED] writes:


How much memory would it save during VACUUM on a 1 billion row table
with 200 million dead rows? Would that reduce the number of cycles a
normal non-interrupted VACUUM would perform?


It would depend on how many dead tuples you have per-page. If you have a very
large table with only one dead tuple per page then it might even be larger.
But in the usual case it would be smaller.


FWIW, there's some unused bits in current representation, so it might 
actually be possible to design it so that it's never larger.


One optimization to the current structure, instead of switching to a 
bitmap, would be to store the block number just once for each block, 
followed by a variable length list of offsets. It'd complicate the 
binary search though.



Also note that it would have to be non-lossy.


Yep. Or actually, it might be useful to forget some dead tids if it 
allowed you to memorize a larger number of other dead tids. Hmm, what a 
weird thought :).


Another insight I had while thinking about this is that the dead tid 
list behaves quite nicely from a OS memory management point of view. In 
the 1st vacuum phase, the array is filled in sequence, which means that 
the OS can swap out the early parts of it and use the memory for buffer 
cache instead. In the index scan phase, it's randomly accessed, but if 
the table is clustered, it's in fact not completely random access. In 
the 2nd vacuum pass, the array is scanned sequentially again. I'm not 
sure how that works out in practice, but you might want to use a larger 
maintenance_work_mem than you'd think.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate

Re: [HACKERS] SOC user quotas

2007-02-28 Thread Martijn van Oosterhout

On Wed, Feb 28, 2007 at 03:57:56PM -0500, Tom Lane wrote:
 It's worse than that, because VACUUM FULL will actually bloat the
 indexes on the way to being able to reduce the table size (since it has
 to make new index entries for rows it moves).  If the limit is strictly

I was thinking that indexes and temp tables wouldn't be counted. I
thought it was more of a stop people using up lots of disk space
rather than specifically stopping at a hard limit.

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.


signature.asc
Description: Digital signature

Re: [HACKERS] SOC user quotas

2007-02-28 Thread Joshua D. Drake


 I don't know, but in my opinion, I don't see anything bad in requiring
 dropping the data if the quota is full. That's what usually occurs in
 the case of normal filesystem quota... If you don't have a space there,
 you cannot edit files, copy them etc...
 And that solution should be definitely better than the filesystem quota
 for the PostgreSQL user for example.

The bad point is not that we would rollback the transaction. The bad
point is what happens when you need to rollback a transaction and in
your scenario it is quite plausible that a large rollback could occur,
more than once, causing the requirement of something like a vacuum full
to clean things up.

Sincerely,

Joshua D. Drake



 
 regards,
 Sergey
 ***
 Sergey E. Koposov
 Max Planck Institute for Astronomy/Cambridge Institute for
 Astronomy/Sternberg Astronomical Institute
 Tel: +49-6221-528-349
 Web: http://lnfm1.sai.msu.ru/~math
 E-mail: [EMAIL PROTECTED]
 


-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Re: [HACKERS] Possible Bug: high CPU usage for stats collector in 8.2

2007-02-28 Thread Joshua D. Drake

Joshua D. Drake wrote:
 Darcy Buskermolen wrote:
 I'm observing high CPU usage (95%) of a 2.6GHz opteron by the stats 
 collector 
 on an 8.2.3 box  investigation has lead me to belive that the stats file is 
 written a lot more often that once every 500ms  the following shows this 
 behavior.

I have just done a test separate from darcy on my workstation:


[EMAIL PROTECTED]:~/82$ strace -c -p 16130
Process 16130 attached - interrupt to quit
Process 16130 detached
% time seconds  usecs/call callserrors syscall
-- --- --- - - 
 50.000.167103   3 60264 10225 poll
 28.310.094600   9 10398   rename
 16.890.056457   5 10398   open
  2.480.008278   0 50023   recvfrom
  1.140.003804   0 10398   setitimer
  0.530.001781   0 20796   write
  0.430.001432   0 10398   close
  0.210.000690   0 10398   munmap
  0.020.57   0 10398   mmap
  0.000.00   0 10398   fstat
  0.000.00   0 10398 10225 rt_sigreturn
  0.000.00   0 10414   getppid
-- --- --- - - 
100.000.334202224681 20450 total


Query:

time for i in `bin/psql -p8000 -d postgres -c select
generate_series(1,1)`; do bin/psql -p8000 -d postgres -qc select 1
from pg_database where datname = 'postgres'; done;

Time:

real2m5.077s
user0m28.414s
sys 0m39.762s


PostgreSQL 8.2.0, Ubuntu Edgy 64bit.

Seems like something is extremely wonky here.

Joshua D. Drake







 PostgreSQL 8.2.3 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 3.4.6 
 20060404 (Red Hat 3.4.6-3)

 I ran a 
 time for i in `psql -qtc select generate_series(1,1)`; do psql  -qtc 
 select 1 from test where msg_id=$i ; done

 which took
 real1m23.288s
 user0m24.142s
 sys 0m21.536s


 to execute, during which time I ran a strace on the stats collector which 
 produces the following output.  From this it looks like the stats file is 
 getting rewritten for each connection teardown, not just every 500ms.

 Process 10061 attached - interrupt to quit
 Process 10061 detached
 % time seconds  usecs/call callserrors syscall
 -- --- --- - - 
  68.14   28.811963  17   1663827   write
  18.227.701885 123 62808 12793 poll
  11.314.783082 365 13101   rename
   0.580.246169   5 50006   recvfrom
   0.570.241073  18 13101   open
   0.430.182816  14 13101   munmap
   0.180.076176   6 13101   mmap
   0.170.072746   6 13101   close
   0.140.060483   5 13101   setitimer
   0.100.041344   3 13101 12793 rt_sigreturn
   0.090.039240   3 13101   fstat
   0.060.024041   2 13110   getppid
 -- --- --- - - 
 100.00   42.281018   1894559 25586 total

 As you can see rename was called more than the theroitcal 167  times for 
 500ms 
 slices that elapsed during the test

 Compared to  PostgreSQL 8.1.8 on x86_64-pc-linux-gnu, compiled by GCC 
 gcc-4.0.gcc-opt (GCC) 4.0.3 (Ubuntu 4.0.3-1ubuntu5) 
 [All be it this is slower hardware..]

 time for i in `psql -qtc select generate_series(1,1)`; do psql  -qtc 
 select 1 from test where msg_id=$i ; done

 which took
 real9m25.380s
 user6m51.254s
 sys 1m47.687s
 (and therefor should be about 1130 stat write cycles)

 and yielded the following strace

 % time seconds  usecs/call callserrors syscall
 -- --- --- - - 
  93.64   20.422006 334 61212   select
   3.490.760963   7110192   read
   1.820.396654  19 21128   write
   0.640.139679 126  1112   rename
   0.270.057970  52  1112   open
   0.060.012177  11  1112   munmap
   0.040.008901   8  1112   mmap
   0.030.006402   6  1112   close
   0.020.004282   4  1112   fstat
 -- --- --- - - 
 100.00   21.809034199204   total



 During this run the stats collector does not even show and CPU usage 
 according 
 to top.


 both 8.1 and 8.2 have the following postgresql.conf parameters

 stats_command_string =  off
 stats_start_collector = on
 stats_block_level = on
 stats_row_level = on



 
 


-- 

  === The PostgreSQL

Re: [HACKERS] COMMIT NOWAIT Performance Option

2007-02-28 Thread Jeff Davis

On Wed, 2007-02-28 at 21:13 +, Gregory Stark wrote:
 Hm that's an interesting thought. We only really have to check pages that
 would have received a full page write since the last checkpoint. So if we made

Do we ever do a partial page write, or is what you're saying equivalent
to we only have to check pages that have been written to since the last
checkpoint?

And if it is the same, can't we get the pages that were written to from
the ctids in the wal records?

Regards,
Jeff Davis


---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] SOC user quotas

2007-02-28 Thread Joachim Wieland

On Wed, Feb 28, 2007 at 12:56:13PM -0500, Tom Lane wrote:
 It hasn't ever made it onto the TODO list, which means there's not a
 consensus that we need it.

Such a patch could improve the acceptance of PostgreSQL in shared hosting
environments. Note that a database without quotas can be filled up easily
and the database will stop serving requests to other users' databases.

There is a quota implementation already in the archives but I don't know
more about it than that it exists:

http://archives.postgresql.org/pgsql-hackers/2004-07/msg00392.php


Joachim



---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] SOC user quotas

2007-02-28 Thread Sergey E. Koposov


On Wed, 28 Feb 2007, Joshua D. Drake wrote:


The bad point is not that we would rollback the transaction. The bad
point is what happens when you need to rollback a transaction and in
your scenario it is quite plausible that a large rollback could occur,
more than once, causing the requirement of something like a vacuum full
to clean things up.


Yes, I understand, but I think, the scenario of working and always 
hitting a quota is not normal. I think you shouldn't expect perfect, 
stable performance etc. in that case. It's similar to if you'll try to 
work with PG with very little memory and/or shared memory. You can work, 
but you can easily hit these limits, and than something will not work...


regards,
Sergey

***
Sergey E. Koposov
Max Planck Institute for Astronomy/Cambridge Institute for Astronomy/Sternberg 
Astronomical Institute
Tel: +49-6221-528-349
Web: http://lnfm1.sai.msu.ru/~math
E-mail: [EMAIL PROTECTED]

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] SOC user quotas

2007-02-28 Thread Sergey E. Koposov


On Wed, 28 Feb 2007, Joachim Wieland wrote:


On Wed, Feb 28, 2007 at 12:56:13PM -0500, Tom Lane wrote:

It hasn't ever made it onto the TODO list, which means there's not a
consensus that we need it.


Such a patch could improve the acceptance of PostgreSQL in shared hosting
environments. Note that a database without quotas can be filled up easily
and the database will stop serving requests to other users' databases.


Yes, I agree.


There is a quota implementation already in the archives but I don't know
more about it than that it exists:

http://archives.postgresql.org/pgsql-hackers/2004-07/msg00392.php


I remember that thread, but I think there was no patch at all, at least I 
didn't see it.


Sergey

***
Sergey E. Koposov
Max Planck Institute for Astronomy/Cambridge Institute for Astronomy/Sternberg 
Astronomical Institute
Tel: +49-6221-528-349
Web: http://lnfm1.sai.msu.ru/~math
E-mail: [EMAIL PROTECTED]

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match

Re: [HACKERS] Compilation errors

2007-02-28 Thread Tom Lane

Andrew Dunstan [EMAIL PROTECTED] writes:
 Actually, we couldn't.The definition section from the .l file gets 
 included after these functions. So we'd need to include something in 
 gram.y before including scan.c.

Actually, since we don't use any of those functions, the following
advice from the 2.5.33 flex manual seems useful:

   A number of options are available for lint purists who want to
suppress the appearance of unneeded routines in the generated scanner.
Each of the following, if unset (e.g., `%option nounput'), results in
the corresponding routine not appearing in the generated scanner:

 input, unput
 yy_push_state, yy_pop_state, yy_top_state
 yy_scan_buffer, yy_scan_bytes, yy_scan_string
 
 yyget_extra, yyset_extra, yyget_leng, yyget_text,
 yyget_lineno, yyset_lineno, yyget_in, yyset_in,
 yyget_out, yyset_out, yyget_lval, yyset_lval,
 yyget_lloc, yyset_lloc, yyget_debug, yyset_debug

   (though `yy_push_state()' and friends won't appear anyway unless you
use `%option stack)'.

However, I'm not sure whether 2.5.4 will choke on unrecognized %option
commands --- if it has to be conditional on version it'll be a PITA.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] COMMIT NOWAIT Performance Option

2007-02-28 Thread Simon Riggs

On Tue, 2007-02-27 at 09:32 -0800, Josh Berkus wrote:
 Simon,

I think if you address me in a mail, it would be best not to explicitly
*remove* my name from the address list.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] COMMIT NOWAIT Performance Option

2007-02-28 Thread Bruce Momjian

Jonah H. Harris wrote:
 On 2/28/07, Bruce Momjian [EMAIL PROTECTED] wrote:
  Am am not comfortable starting and having something fail later.
 
 Then do you have some other idea for protecting pages from being torn
 without storing an entire backup copy or performing a block-level
 consistency check?

I want the startup to fail.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] COMMIT NOWAIT Performance Option

2007-02-28 Thread Gregory Stark

Jonah H. Harris [EMAIL PROTECTED] writes:

 On 2/28/07, Gregory Stark [EMAIL PROTECTED] wrote:
 Jonah H. Harris [EMAIL PROTECTED] writes:

  Which is, of course, how everyone else does it.

 I happen to agree with your conclusion but this line of argument is
 exceptionally unconvincing. In fact in this crowd you'll tend to turn people
 off and lose people if you say things like that rather than convince anyone 
 of
 anything.

 Rather than reinventing the wheel, it often pays to piggyback on the
 solutions others in similar situations have encountered.  I'm just
 stating how others provide similar functionality or capabilities.  If
 someone dislikes an idea just because the major vendors have done it
 that way, that's their own problem.  It's up to the community to
 decide how to proceed given the information at hand.

Except that's not what you're doing. There's nothing wrong with saying foo
does this clever thing I think we should copy because insert argument. Nor
even foo does this thing, would that help us? But what you seem to be saying
is *Because* foo does this thing we can conclude it's a clever thing and we
should do it.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

1 2 >

1 - 100 of 132 matches

Mail list logo