Re: [HACKERS] Resumable vacuum proposal and design overview
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On Mon, Feb 26, 2007 at 01:39:40PM -0500, Tom Lane wrote: [...] Or were you speaking of the pg_class.reltuples count? Yes (modulo my warning, that is) Regards - -- tomás -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.1 (GNU/Linux) iD8DBQFF5T2SBcgs9XrR2kYRAndUAJoDG+5zqk0PxOI5GUM68GKW7+NdRgCfVB5p 6eod6gx21tgOciSKXAuuCvA= =3Oz7 -END PGP SIGNATURE- ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Resumable vacuum proposal and design overview
On Wed, 2007-02-28 at 13:53 +0900, Galy Lee wrote: Tom Lane wrote: Huh? There is no extra cost in what I suggested; it'll perform exactly the same number of index scans that it would do anyway. The things I wanted to say is that: If we can stop at any point, we can make maintenance memory large sufficient to contain all of the dead tuples, then we only need to clean index for once. No matter how many times vacuum stops, indexes are cleaned for once. I agree that the cycle-at-a-time approach could perform more poorly with repeated stop-start. The reason for the suggestion was robustness, not performance. If you provide the wrong dead-tuple-list to VACUUM, you will destroy the integrity of a table, which can result in silent data loss. You haven't explained how saving the dead-tuple-list could be done in a safe mannner and it seems risky to me. But in your proposal, indexes will be scan as many as vacuum stops. Those extra indexes cleaning are thought as the extra cost compared with stop-on-dime approach. To vacuum a large table by stopping 8 times, tests show the extra cost can be one third of the stop-on-dime approach. But the VACUUM is being run during your maintenance window, so why do you care about performance of VACUUM during that time? There is some inefficiency in the VACUUM process, but seems like a high price to pay for more robustness. Does the loss of efficiency during VACUUM translate directly into reduced performance during operational periods? I think not. Deferring completion of VACUUM means deferring refreshing the FSM. Allowing cycle-at-a-time VACUUM would allow the FSM to be updated after each run, thus releasing space for reuse again. ISTM that the saving-dead-list approach would defer the updating of the FSM for many days in your situation. If you would like to reduce VACUUM times have you considered partitioning? It can be very effective at isolating changes and is designed specifically to cope with large data maintenance issues. If there are issues that prevent the use of partitioning in your case, perhaps we should be discussing those instead? Migration from a non-partitioned environment to a partitioned one is quite simple from 8.2 onwards. So I'm not really convinced that being able to stop a table vacuum halfway is critical. To run vacuum on the same table for a long period, it is critical to be sure: 1. not to eat resources that foreground processes needs 2. not to block vacuuming of hot-updated tables 3. not to block any transaction, not to block any backup activities In the current implementation of concurrent vacuum, the third is not satisfied obviously, the first issue comes to my mind is the lazy_truncate_heap, it takes AccessExclusiveLock for a long time, that is problematic. Are you saying you know for certain this lock is held for a long time, or are you just saying you think it is? If you have some evidence for long truncation times then that would be a separate issue of concern, since that might starve out normal users. Please say more? ISTM that if you can refresh the FSM more frequently you will have less need to truncate the relation at the end of each run. After some time, I would expect that no truncation would be required because of the cyclic reuse of space within the table, rather than extension/truncation. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Resumable vacuum proposal and design overview
Simon Riggs [EMAIL PROTECTED] writes: On Wed, 2007-02-28 at 13:53 +0900, Galy Lee wrote: In the current implementation of concurrent vacuum, the third is not satisfied obviously, the first issue comes to my mind is the lazy_truncate_heap, it takes AccessExclusiveLock for a long time, that is problematic. Are you saying you know for certain this lock is held for a long time, or are you just saying you think it is? If you have some evidence for long truncation times then that would be a separate issue of concern, since that might starve out normal users. Please say more? lazy_truncate_heap does a ConditionalLockAcquire, that is, it won't succeed in acquiring the exclusive lock if there is any competition. And I find it hard to believe that it will hold the lock very long if it does get it --- in most scenarios it won't be possible to remove very many pages, so the scan won't take long. (Of course that is arguably a bug, but until you can fix things so that an average VACUUM *can* remove a lot of pages, it's hardly profitable to worry about whether this step creates a concurrency problem.) So I agree with Simon: if you want us to believe there's a performance issue here, please present some evidence. regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Resumable vacuum proposal and design overview
Tom Lane wrote: Galy Lee [EMAIL PROTECTED] writes: If we can stop at any point, we can make maintenance memory large sufficient to contain all of the dead tuples, then we only need to clean index for once. No matter how many times vacuum stops, indexes are cleaned for once. I beg your pardon? You're the one who's been harping on the table-so-large-it-takes-days-to-vacuum scenario. How you figure that you can store all the dead TIDs in working memory? This reminds me of an idea I had while looking at the bitmap index patch: We could store the dead TIDs more efficiently in a bitmap, allowing tables to be vacuumed in lesser cycles. Of course, that's orthogonal to the above discussion. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Dead Space Map version 2
Tom Lane wrote: Heikki Linnakangas [EMAIL PROTECTED] writes: Tom Lane wrote: I'd be happier if the DSM content could be treated as just a hint. If we don't have a frozen state, we can't use the DSM to implement index-only scans. To implement index-only scans, the DSM would have to be expected to provide 100% reliable coverage, which will increase its cost and complexity by orders of magnitude. If you insist on that, I will bet you lunch at a fine restaurant that it doesn't make it into 8.3. :) While I understand that 100% reliable coverage is a significantly stronger guarantee, I don't see any particular problems in implementing that. WAL logging isn't that hard. I won't insist, I'm not the one doing the programming after all. Anything is better than what we have now. However, I do hope that whatever is implemented doesn't need a complete rewrite to make it 100% reliable in the future. The basic wish I have is to not use a fixed size shared memory area like FSM for the DSM. I'd like it to use the shared buffers instead, which makes the memory management and tuning easier. And it also makes it easier to get the WAL logging right, even if it's not done for 8.3 but added later. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Resumable vacuum proposal and design overview
Simon Riggs wrote: You haven't explained how saving the dead-tuple-list could be done in a safe mannner and it seems risky to me. The files are placed in a new directory $PGDATA/pg_vacuum with the name: spcNode.dbNode.relNode for each relations which have been interrupted during vacuum. It has the format likes: 1. VacStateFileHeader 2. VacStateData 3. Dead Tuple list 4. CRC32 The files are removed - when original physical heap files are removed, - when vacuum full have been issued, - or after the content has been read in memory. - etc. Is there any potential big risk there? Correct me if I am wrong. Deferring completion of VACUUM means deferring refreshing the FSM. I borrow the code from DSM patch to merge free space info into FSM when vacuum stops. Are you saying you know for certain this lock is held for a long time, or are you just saying you think it is? If you have some evidence for long truncation times then that would be a separate issue of concern, since that might starve out normal users. Please say more? Sorry. I *thought* it is. The benchmark has not shown such kind of problem anyway. Thanks for the clarification for me. :) Regards, -- Galy Lee lee.galy _at_ oss.ntt.co.jp NTT Open Source Software Center ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Resumable vacuum proposal and design overview
The things I wanted to say is that: If we can stop at any point, we can make maintenance memory large sufficient to contain all of the dead tuples, then we only need to clean index for once. No matter how many times vacuum stops, indexes are cleaned for once. I agree that the cycle-at-a-time approach could perform more poorly with repeated stop-start. The reason for the suggestion was robustness, not performance. If you provide It performs more poorly, but it also gives immediate gain, since part of the table is readily vacuumed. If you do it all in one pass with stop resume, the first visible effect may be several days after you start vacuuming. And, basically you need to pretend the vacuum transaction is still running after the first stop. Else dead tuple reuse ala HOT is not possible (or the ctid list needs to be reevaluated during resume, which per se is not efficient). the wrong dead-tuple-list to VACUUM, you will destroy the integrity of a table, which can result in silent data loss. You haven't explained how saving the dead-tuple-list could be done in a safe mannner and it seems risky to me. Agreed. It seems not efficiently possible. Andreas ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Resumable vacuum proposal and design overview
You haven't explained how saving the dead-tuple-list could be done in a safe mannner and it seems risky to me. The files are placed in a new directory $PGDATA/pg_vacuum with the name: spcNode.dbNode.relNode for each relations which have been interrupted during vacuum. It has the format likes: 1. VacStateFileHeader 2. VacStateData 3. Dead Tuple list 4. CRC32 The files are removed - when original physical heap files are removed, - when vacuum full have been issued, - or after the content has been read in memory. - etc. Is there any potential big risk there? Correct me if I am wrong. The main risc is not a corrupt file or broken list. The risc is, that a ctid in the list points at a tuple that is not dead anymore. To avoid that risc you would need to: 1. keep the vacuum lock open 2. leave the vacuum tx open (or reevaluate visibility of list members upon resume) Andreas ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] COMMIT NOWAIT Performance Option
But we do don't we? fsync = off, full_page_writes = off? BTW, our testing seems to indicate that full_page_writes = off is safe on Solaris 10 on good hardware. At least, we haven't been able to break it yet. Is that an OS-dependent parameter? I always assumed it depended entirely on hardware. I have no way to test it for myself though, so I just leave full_page_writes=on to be safe. It also depends on the FS implementation. The OS/FS must guarantee, that it does not chunk single data page write calls. Usually that is the case, when OS/FS pagesize and pg pagesize are identical. And the HW needs to guarantee atomicity for single calls. e.g. on AIX you need to reduce the pg page size to 4k to be able to give those guarantees. Andreas ---(end of broadcast)--- TIP 6: explain analyze is your friend
[HACKERS] VACUUM and spoiling the buffer manager cache
Round about v.161 of src/storage/buffer/bufmgr.c, during the development of 8.0 a change was introduced to prevent VACUUM from changing the state of the Adaptive Replacement Cache buffer management strategy. At the time that change made lots of sense. Since then we have changed the buffer management strategy and this behaviour of VACUUM may no longer make as much sense as it did then. VACUUM's current behaviour is to take blocks it has touched and place them on the head of the freelist, allowing them to be reused. This is a good strategy with clean blocks, but it is a poor strategy for dirty blocks. Once a dirty block has been placed on the freelist, the very next request for a free buffer will need to both write the block to disk *and* this will typically require a WAL flush to occur also. The WAL flushing behaviour has been described in detail on this thread: http://archives.postgresql.org/pgsql-hackers/2006-12/msg00674.php though this proposal has nothing to do with FREEZEing rows. The effects of this behaviour are that when VACUUM is running alone it has to make more WAL flushes than it really needs to, so is slightly slower. That could be improved, but isn't my priority on this post. When VACUUM operates alongside a concurrent workload the other non-VACUUM backends become involved in cleaning the VACUUM's dirty blocks. This slows the non-VACUUM backends down and effectively favours the VACUUM rather than masking its effects, as we were trying to achieve. This behaviour noticeably increases normal transaction response time for extended periods, with noticeable WAL spikes as the WAL drive repeatedly fsyncs, much more than without the VACUUM workload. The proposal would be to stop VACUUM from putting its blocks onto the freelist if they are dirty. This then allows the bgwriter to write the VACUUM's dirty blocks, which avoids the increased response times due to WAL flushing. It also incidentally improves a lone VACUUM, since the bgwriter is able to help write out the dirty blocks. VACUUM pays the cost to test if they are dirty, but its minor anyway. The clock cycle buffer management strategy is less prone to cache spoiling behaviour than was the earlier LRU methods, fixed or adaptive. A simple solution does effectively smooth out the poor response times seen while a VACUUM is in progress. The in-line patch is a one-line change to the buffer manager code, and is one of a few versions experimented with. The additional line is a simple test to see whether the VACUUM'd block is dirty before deciding what to do with it. [A separate patch is available, if requested, identified as vacstrategy.v2.patch] Independent verification of test results is requested. Index: src/backend/storage/buffer/bufmgr.c === RCS file: /projects/cvsroot/pgsql/src/backend/storage/buffer/bufmgr.c,v retrieving revision 1.215 diff -c -r1.215 bufmgr.c *** src/backend/storage/buffer/bufmgr.c 1 Feb 2007 19:10:27 - 1.215 --- src/backend/storage/buffer/bufmgr.c 26 Feb 2007 13:09:35 - *** *** 907,913 else { /* VACUUM accesses don't bump usage count, instead... */ ! if (buf-refcount == 0 buf-usage_count == 0) immed_free_buffer = true; } } --- 907,914 else { /* VACUUM accesses don't bump usage count, instead... */ ! if (buf-refcount == 0 buf-usage_count == 0 ! !(buf-flags BM_DIRTY)) immed_free_buffer = true; } } -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] COMMIT NOWAIT Performance Option
Jonah H. Harris [EMAIL PROTECTED] writes: First, rather than using 16-bytes per page and having to deal with handling the non-contiguous space, why not just use a page-level checksum like everyone else? Most of the systems I've seen seem to employ a simple CRC16 or CRC32. I think a CRC would be a useful feature for people who want an extra degree of protection from faulty hardware. But we've already seen that CRC checks can be expensive. Not everyone will want to take the cpu hit. Storing a byte counter in every block is cheap. And the idea came from what someone said MSSQL does, so like everyone else -- which isn't a very compelling argument to begin with -- doesn't argue against it. Second, unless I'm missing something, I don't see how your algorithm is going to work as each 512 byte chunk of the block will *always* have the same sequential byte value. That is, unless you have some way of preventing wraparound at 255 without adding additional block overhead. I think the way you would work is to have the smgr note the sequential value it found when it read in a page and then when it writes it out increment that value by one. Conveniently the pages would be 16 bytes shorter than an 8kb page so you have 16 bytes available with every buffer to note information like the last sequential tag the buffer used. Lastly, from a performance perspective, it's going to be faster to compute the entire page's checksum than it would be to check the sequence every 512 bytes and perform the space adjustment. That seems pretty unlikely. CRC checks are expensive cpu-wise, we're already suffering a copy due to our use of read/write the difference between read/write of 8192 bytes and readv/writev of 511b*16+1*6 is going to be non-zero but very small. Thousands of times quicker than the CRC. If we went to direct-io then it would entail an additional memory-copy which would be annoying. But that would still be much much cheaper than a CRC check. The best we could do in that case would be to do a CRC check at the same time as the memory move. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Packed short varlenas, what next?
Tom Lane [EMAIL PROTECTED] writes: I've committed this, but in testing with a hack that does ntohl() in the VARSIZE macro and vice-versa in SET_VARSIZE, I find that core passes regression but several contrib modules do not. It looks like the contrib modules were depending on various random structs being compatible with varlena, while not exposing that dependence in ways that either of us caught :-(. I just noticed that last night myself. In particular the GIST modules seems to be a major problem. they define dozens of new objects, many of which are just passing around C data structures internally but some of which are objects which get stored in the database. I have no idea which are which and which ones are varlenas. Worse, it uses PG_GETARG_POINTER() and explicitly calls PG_DETOAST_DATUM() in the few places it assumes finding toasted data is possible. That's even harder to track down. I can send up a patch for the data types I fixed last night. I'll work on cleaning up the remaining mess tomorrow, but I think that we may need to think twice about whether it's OK to expect that every datatype with typlen = -1 will be compatible with the New Rules. I'm back to wondering if maybe only types with typalign 'c' should get caught up in the changes. I don't think we can key off typalign='c'. That would entail changing varlenas to typalign 'c' which would throw off other consumers of the typalign which expect it to be the alignment of the detoasted datum. Moreover I still align them when they have the full 4-byte header by using the typalign. I think we would want to introduce a new column, or maybe a new attlen value, or a new typalign value. I was thinking about that though and it's not so simple. It's easy enough not to convert to short varlena for data types that don't assert that they support the packed format. That's not a problem. That takes care of data types which don't call pg_detoast_datum(). But not storing the varlena header in network byte order sometimes would be quite tricky. There are a great many places that call VARSIZE that don't look at the attalign or even have it handy. If we made it a new attlen value we could have two different macros, but that will be another quite large patch. It would mean hitting all those datatypes all over again to change every instance of VARSIZE into NEWVARSIZE or something like that. Plus all the sites in the core that call VARSIZE would need to check attlen and call the right one. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Resumable vacuum proposal and design overview
On Wed, 2007-02-28 at 11:19 +0100, Zeugswetter Andreas ADI SD wrote: The things I wanted to say is that: If we can stop at any point, we can make maintenance memory large sufficient to contain all of the dead tuples, then we only need to clean index for once. No matter how many times vacuum stops, indexes are cleaned for once. I agree that the cycle-at-a-time approach could perform more poorly with repeated stop-start. The reason for the suggestion was robustness, not performance. If you provide It performs more poorly, but it also gives immediate gain, since part of the table is readily vacuumed. If you do it all in one pass with stop resume, the first visible effect may be several days after you start vacuuming. I think that in itself is enough to tip the scales. And, basically you need to pretend the vacuum transaction is still running after the first stop. Else dead tuple reuse ala HOT is not possible (or the ctid list needs to be reevaluated during resume, which per se is not efficient). Ah, I see you got there ahead of me. Yes, it would prevent HOT from performing retail VACUUMs on heap blocks. (I'm not saying HOT will be accepted/acceptable, but I'd rather not have its acceptability hinge on a use case that seems so rare). One proposal that we do still have in discussion is Heikki's patch to re-evaluate the OldestXmin while the VACUUM runs. That's something we'd definitely want to do in a restartable VACUUM anyway. But my thought is that it actually increases quite dramatically the number of dead rows harvested during VACUUM (a good thing), which is likely to increase the number of cycles required to complete a large table (no problem, because of the increased efficiency of the VACUUM). I think there's a strong argument to make VACUUM refresh rather than rebuild the FSM after each cycle rather than wait until the end, whether or not we stop/start the VACUUM. In any long running VACUUM that seems very worthwhile. Big VACUUM needs big memory. Using huge settings of maintenance_work_mem dedicated solely to VACUUM seems like it could be a waste of resources in many cases. It may be much better to allow 1 GB of memory to be used to cache indexes better, which would improve performance of other applications, as well as improving the index scan time during VACUUM. So scanning indexes more often during VACUUM isn't necessarily bad either, unless your table is truly huge, in which case you should use partitioning to reduce it. Galy, please hear that people like your idea and understand your use case, but just don't like all of the proposal, just the main thrust of it. The usual way is that (people that agree + amount of your exact idea remaining) = 100% -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Resumable vacuum proposal and design overview
On Wed, 2007-02-28 at 09:38 +, Heikki Linnakangas wrote: Tom Lane wrote: Galy Lee [EMAIL PROTECTED] writes: If we can stop at any point, we can make maintenance memory large sufficient to contain all of the dead tuples, then we only need to clean index for once. No matter how many times vacuum stops, indexes are cleaned for once. I beg your pardon? You're the one who's been harping on the table-so-large-it-takes-days-to-vacuum scenario. How you figure that you can store all the dead TIDs in working memory? This reminds me of an idea I had while looking at the bitmap index patch: We could store the dead TIDs more efficiently in a bitmap, allowing tables to be vacuumed in lesser cycles. Of course, that's orthogonal to the above discussion. I like the idea. How much memory would it save during VACUUM on a 1 billion row table with 200 million dead rows? Would that reduce the number of cycles a normal non-interrupted VACUUM would perform? Would it work efficiently for all of the current index AMs? Each index might use the index slightly differently during cleanup, I'm not sure. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
[HACKERS] psql problem querying relations
Hi, this is with current CVS code: # \dt ERROR: did not find '}' at end of input node Server log: ERROR: did not find '}' at end of input node STATEMENT: SELECT n.nspname as Schema, c.relname as Name, CASE c.relkind WHEN 'r' THEN 'table' WHEN 'v' THEN 'view' WHEN 'i' THEN 'index' WHEN 'S' THEN 'sequence' WHEN 's' THEN 'special' END as Type, r.rolname as Owner FROM pg_catalog.pg_class c JOIN pg_catalog.pg_roles r ON r.oid = c.relowner LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace WHERE c.relkind IN ('r','') AND n.nspname NOT IN ('pg_catalog', 'pg_toast') AND pg_catalog.pg_table_is_visible(c.oid) ORDER BY 1,2; Best regards, Zoltán Böszörményi ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Implicit casts with generic arrays
Am Dienstag, 27. Februar 2007 19:50 schrieb Tom Lane: Seems basically we'd want to not cast unknown to anyarray unless there is some additional bit of context suggesting that that's the right thing. But what should that extra requirement be? Can we go as far as not doing this cast implicitly at all? We could say that unknown is not taken as anyarray input if the entire function/operator argument list consists of anyelement or anyarray. But that might be even harder to comprehend. With the ARRAY[...] syntax available, converting unknown to anyarray might be altogether unnecessary. -- Peter Eisentraut http://developer.postgresql.org/~petere/ ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] psql problem querying relations
Zoltan Boszormenyi wrote: Hi, this is with current CVS code: # \dt ERROR: did not find '}' at end of input node Server log: ERROR: did not find '}' at end of input node It's working for me. Have you tried with a fresh checkout or after running make clean before you build? cheers andrew ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
[HACKERS] Major Feature Interactions
There's a number of major features all coming together over the next weeks, so I wanted to raise some awareness of possible areas of interaction. My concern is more than just will multiple patches apply together?, but extends into whether the features interact with each other with synergy, ignore each other or have negative impact. The projects that concern me are the following internals projects Plan Invalidation Autovacuum changes Bitmap indexes GIT indexes Dead Space Map (DSM) HOT Updating Updateable cursors Restartable VACUUM The potential interactions as I understand them are these: I'm not saying this list is either complete or accurate; checking whether it is or not is exactly the reason for this post. Anyway, just trying to raise awareness to look for potential issues. UpdCursors -- Syntax changes only, no impact on other projects mentioned. HOT --- GIT Used to be some conflicts at patch level, now gone? Positive impact on GIT performance PlanInval heap_check_hotupdate() test can performed in planner, so will no longer be a heap function DSM HOT updates need not be logged, since they will be cleared by retail VACUUMs BitmapIdx Potential issue with chilling tuples during CREATE INDEX AutoVac HOT will change how UPDATEs are reported to pg_stat, by adding additional fields. UpdCursors Should tidscan follow chain to correct tuple? That seems to change current behaviour even if it might be useful. Plan Invalidation - No known changes, other than the integrations already mentioned Autovacuum changes -- DSM Is implementing new forms of VACUUM: VACUUM and VACUUM ALL. Does reporting of row stats change? Bitmap indexes -- GIT Do we need both? I think yes, but need to see some info on when crossover occurs between two index methods. Restartable VACUUM -- DSM Interactions with DSM? HOT Must use full cycles only -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Resumable vacuum proposal and design overview
Simon Riggs [EMAIL PROTECTED] writes: How much memory would it save during VACUUM on a 1 billion row table with 200 million dead rows? Would that reduce the number of cycles a normal non-interrupted VACUUM would perform? It would depend on how many dead tuples you have per-page. If you have a very large table with only one dead tuple per page then it might even be larger. But in the usual case it would be smaller. Also note that it would have to be non-lossy. My only objection to this idea, and it's not really an objection at all, is that I think we want to head in the direction of making indexes cheaper to scan and doing the index scan phase more often. That reduces the need for multiple concurrent vacuums and makes the problem of busy tables getting starved less of a concern. That doesn't mean there's any downside to making the dead tuple list take less memory but I think the upside is limited. As we optimize our index representations with GII and bitmapped indexes scanning them gets easier and easier anyways. And you don't really want to wait too long before you get the benefit of the recovered space in the table. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] psql problem querying relations
Andrew Dunstan írta: Zoltan Boszormenyi wrote: Hi, this is with current CVS code: # \dt ERROR: did not find '}' at end of input node Server log: ERROR: did not find '}' at end of input node It's working for me. Have you tried with a fresh checkout or after running make clean before you build? cheers andrew I have to reinitdb my test database then tried again. It's working regardless of my IDENTITY patch is applied or not. Thanks. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[HACKERS] Compilation errors
Does anyone happen to know what it is about my build environment that causes these errors? I get a similar slew of errors from each flex lexer we built including the plpgsql lexer, the ecpg lexer, etc. Do I have the wrong version of flex installed? Something else? It's quite annoying, especially as the beta version of Emacs I'm using seems to not be able to parse these lines. In file included from gram.y:9493: scan.c:7050: warning: no previous prototype for 'base_yyget_lineno' scan.c:7059: warning: no previous prototype for 'base_yyget_in' scan.c:7067: warning: no previous prototype for 'base_yyget_out' scan.c:7075: warning: no previous prototype for 'base_yyget_leng' scan.c:7084: warning: no previous prototype for 'base_yyget_text' scan.c:7093: warning: no previous prototype for 'base_yyset_lineno' scan.c:7105: warning: no previous prototype for 'base_yyset_in' scan.c:7110: warning: no previous prototype for 'base_yyset_out' scan.c:7115: warning: no previous prototype for 'base_yyget_debug' scan.c:7120: warning: no previous prototype for 'base_yyset_debug' scan.c:7154: warning: no previous prototype for 'base_yylex_destroy' -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] COMMIT NOWAIT Performance Option
On 2/28/07, Gregory Stark [EMAIL PROTECTED] wrote: But we've already seen that CRC checks can be expensive. Not everyone will want to take the cpu hit. Storing a byte counter in every block is cheap. CRC checking a page is most certainly the simplest. And, I disagree that it would be worse than either a sequence counter or the full page write. Block checksumming is done at read/write time... which is something that needs to be improved anyway. With a properly tuned bgwriter, the write itself should barely be noticeable. How fast is a CRC of 8K? Last time I checked it was something on the scale of ~95 usec for CRC32 and ~33 usec for sb8. And the idea came from what someone said MSSQL does, so like everyone else -- which isn't a very compelling argument to begin with -- doesn't argue against it. Rather than basing designs on poor second-hand information, maybe you and the person who mentioned this idea should get up-to-date and read the SQL Server storage engine architecture. As of SQL Server 2005, blocks *are* checksummed with CRC32. And, just for the record, previous versions of SQL server performed a bit flipping technique for every 512 bytes in the page header; it did *not* waste a byte for every 512 bytes written. I think the way you would work is to have the smgr note the sequential value it found when it read in a page and then when it writes it out increment that value by one. Conveniently the pages would be 16 bytes shorter than an 8kb page so you have 16 bytes available with every buffer to note information like the last sequential tag the buffer used. This proposed design is overcomplicated and a waste of space. I mean, we reduce storage overhead using phantom command id and variable varlena, but let's just fill it up again with unnecessary junk bytes. That seems pretty unlikely. CRC checks are expensive cpu-wise, we're already suffering a copy due to our use of read/write the difference between read/write of 8192 bytes and readv/writev of 511b*16+1*6 is going to be non-zero but very small. Thousands of times quicker than the CRC. Prove it. -- Jonah H. Harris, Software Architect | phone: 732.331.1324 EnterpriseDB Corporation| fax: 732.331.1301 33 Wood Ave S, 3rd Floor| [EMAIL PROTECTED] Iselin, New Jersey 08830| http://www.enterprisedb.com/ ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Compilation errors
Gregory Stark wrote: Does anyone happen to know what it is about my build environment that causes these errors? I get a similar slew of errors from each flex lexer we built including the plpgsql lexer, the ecpg lexer, etc. Do I have the wrong version of flex installed? Something else? It's quite annoying, especially as the beta version of Emacs I'm using seems to not be able to parse these lines. In file included from gram.y:9493: scan.c:7050: warning: no previous prototype for 'base_yyget_lineno' [snip] What versions of flex and bison are you using? It's a bit hard to diagnose without knowing that ;-) cheers andrew ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] COMMIT NOWAIT Performance Option
Jonah H. Harris [EMAIL PROTECTED] writes: This proposed design is overcomplicated and a waste of space. I mean, we reduce storage overhead using phantom command id and variable varlena, but let's just fill it up again with unnecessary junk bytes. We reduced storage overhead using phantom command id by 8 bytes *per tuple*. I hardly think 8 bytes per page is much of a concern. You're already losing an average of 1/2 a tuple per page to rounding and that's a minimum of 16 bytes for the narrowest of tuples. That seems pretty unlikely. CRC checks are expensive cpu-wise, we're already suffering a copy due to our use of read/write the difference between read/write of 8192 bytes and readv/writev of 511b*16+1*6 is going to be non-zero but very small. Thousands of times quicker than the CRC. Prove it. We've already seen wal CRC checking show up at the top of profiles. Do you really doubt that memcpy is faster than CRC32 checking? Especially when you're already doing memcpy anyways and the only overhead is the few unaligned bytes at the end and the 8 one-byte copies? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Compilation errors
Am Mittwoch, 28. Februar 2007 16:23 schrieb Gregory Stark: Does anyone happen to know what it is about my build environment that causes these errors? Nothing. Everybody gets them. -- Peter Eisentraut http://developer.postgresql.org/~petere/ ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] COMMIT NOWAIT Performance Option
Gregory Stark wrote: Jonah H. Harris [EMAIL PROTECTED] writes: This proposed design is overcomplicated and a waste of space. I mean, we reduce storage overhead using phantom command id and variable varlena, but let's just fill it up again with unnecessary junk bytes. We reduced storage overhead using phantom command id by 8 bytes *per tuple*. I hardly think 8 bytes per page is much of a concern. You're already losing an average of 1/2 a tuple per page to rounding and that's a minimum of 16 bytes for the narrowest of tuples. That seems pretty unlikely. CRC checks are expensive cpu-wise, we're already suffering a copy due to our use of read/write the difference between read/write of 8192 bytes and readv/writev of 511b*16+1*6 is going to be non-zero but very small. Thousands of times quicker than the CRC. Prove it. We've already seen wal CRC checking show up at the top of profiles. yeah - on fast boxes (diskio wise) wal-crc checking is nearly always on the very top of wal-intensive workloads. Stefan ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Compilation errors
Peter Eisentraut wrote: Am Mittwoch, 28. Februar 2007 16:23 schrieb Gregory Stark: Does anyone happen to know what it is about my build environment that causes these errors? Nothing. Everybody gets them. That's not what the buildfarm shows. example: http://www.pgbuildfarm.org/cgi-bin/show_stage_log.pl?nm=spongedt=2007-02-28%20113002stg=make cheers andrew ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] VACUUM and spoiling the buffer manager cache
Simon Riggs [EMAIL PROTECTED] writes: VACUUM's current behaviour is to take blocks it has touched and place them on the head of the freelist, allowing them to be reused. No, it puts them at the tail of the freelist. So I am unconvinced by the rest of your argument. regards, tom lane ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] [PATCHES]
Not that I think that anyone owning both a law degree and a computer in 2007 should legitimately be able to plead innocence here. FAST Australia's lawyers are making themselves look like idiots, and the same for every other company tacking on such notices. I think the real bottom line here is we don't accept patches from idiots. I think we don't accept patches from idiots is a bit harsh. I agree, after all, you've accepted some of my patches and... oh, wait... -- Korry
Re: [HACKERS] Packed short varlenas, what next?
Gregory Stark [EMAIL PROTECTED] writes: I just noticed that last night myself. In particular the GIST modules seems to be a major problem. they define dozens of new objects, many of which are just passing around C data structures internally but some of which are objects which get stored in the database. I have no idea which are which and which ones are varlenas. FWIW, when I went to bed last night I had hstore and intarray working, but was still fooling with ltree. Didn't get to the others yet. regards, tom lane ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] COMMIT NOWAIT Performance Option
On 2/28/07, Gregory Stark [EMAIL PROTECTED] wrote: We reduced storage overhead using phantom command id by 8 bytes *per tuple*. I hardly think 8 bytes per page is much of a concern. You're already losing an average of 1/2 a tuple per page to rounding and that's a minimum of 16 bytes for the narrowest of tuples. Again, it goes back to competent design. Save space here, waste it there. SQL Server's bit-flipping technique is still *much* better than wasting 1 byte for every 512. We've already seen wal CRC checking show up at the top of profiles. Well, when you consider we're performing a CRC for every log record rather than at the block level, like most other systems, I wouldn't be that surprised. Don't try and use that example as a reason to kill the checksum; it's a completely different use case. Do you really doubt that memcpy is faster than CRC32 checking? Especially when you're already doing memcpy anyways and the only overhead is the few unaligned bytes at the end and the 8 one-byte copies? I'm saying the complexity and implementation of it is going to get you a bit more than you think. If it didn't, you'd already have the thing coded and would be proving me wrong with the patch rather than theory. I can code up a checksum version in an hour or less if you want to give it a go. -- Jonah H. Harris, Software Architect | phone: 732.331.1324 EnterpriseDB Corporation| fax: 732.331.1301 33 Wood Ave S, 3rd Floor| [EMAIL PROTECTED] Iselin, New Jersey 08830| http://www.enterprisedb.com/ ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
[HACKERS] SOC user quotas
Hello hackers, I was starting to think about next SOC and the project for it. And for a long time I wanted to implement the user quotas in PG. So, I'll try to explain my understanding of the implementation, and I'll be happy to hear any comments, objections, or pointings to my misunderstanding. This is very first very rough idea, but I still would like to hear whether it contains some obvious flaws... 1) The main idea is to implement the per-user quota (not per tablespace for example). So, during the creation of the new user some quota can be specified, and after that the size of all the relations *owned* by that user should be limited by that number. 2) I looked into the code, and from my understanding, the main part of the code which should be affected by the quotas is storage/smgr/md.c. If I understand correctly, only functions like mdcreate mdextend really change the size of the user relations (I don't consider things like WAL, and I don't think it should be subject for quota). And it seems to me, that the machinery of smgr/md is moreless enough to control the space occupied by the relations (within some 1 block size precision). 3) How the quota should be controlled: I think, that generally, for all the users which have quotas, the shared memory should contain the number of blocks left from the quota. And each backend extending or truncating the relations owned by the user should appropriately change that number of blocks left in the shared memory. As soon as this number is equal to zero, all the mdcreate, mdextend functions shouldn't do anything but return the error. I don't know, but I hope these functions won't be invoked if the user will do DELETE and/or VACUUM to recover the space ? Also, I'm not completely sure that refusing the call of the mdextend function in the case of quota excess won't lead to any corruption ? (in the case of Btree splits for example ). Any comments ? Thank you. Regards, Sergey *** Sergey E. Koposov Max Planck Institute for Astronomy/Cambridge Institute for Astronomy/Sternberg Astronomical Institute Tel: +49-6221-528-349 Web: http://lnfm1.sai.msu.ru/~math E-mail: [EMAIL PROTECTED] ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] VACUUM and spoiling the buffer manager cache
On Wed, 2007-02-28 at 11:09 -0500, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: VACUUM's current behaviour is to take blocks it has touched and place them on the head of the freelist, allowing them to be reused. No, it puts them at the tail of the freelist. That's a minor point because the freelist is mostly empty, so head == tail in 99.9% of cases. The freelist does start full, but we only *put* things on the freelist when we call InvalidateBuffer() after a DROP TABLE etc, or when we finish with a VACUUM buffer. So after the first few minutes of server operation we hardly ever use the freelist and so will be empty when we start dropping VACUUM'd buffers on it, head or tail. The negative effect on response times and overall performance is clearly noticeable in tests. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Compilation errors
Peter Eisentraut [EMAIL PROTECTED] writes: Am Mittwoch, 28. Februar 2007 16:23 schrieb Gregory Stark: Does anyone happen to know what it is about my build environment that causes these errors? Nothing. Everybody gets them. I don't. What version of flex are you guys using? regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] COMMIT NOWAIT Performance Option
Do you really doubt that memcpy is faster than CRC32 checking? Especially when you're already doing memcpy anyways and the only overhead is the few unaligned bytes at the end and the 8 one-byte copies? I'm saying the complexity and implementation of it is going to get you a bit more than you think. If it didn't, you'd already have the thing coded and would be proving me wrong with the patch rather than theory. I can code up a checksum version in an hour or less if you want to give it a go. LIVE FROM THE WWE, CAGE MATCH! Jonah (the Theorist) Harris versus Greg (the Brain) Stark. What is going to happen between these two brothers in arms when they must both prove their theory! Sincerely, Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/ ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] SOC user quotas
Sergey E. Koposov [EMAIL PROTECTED] writes: 1) The main idea is to implement the per-user quota (not per tablespace for example). So, during the creation of the new user some quota can be specified, and after that the size of all the relations *owned* by that user should be limited by that number. This seems impractical as stated; there is no way to determine what a user owns in some other database. Possibly you could do it if the quota were both per-user and per-database. 2) I looked into the code, and from my understanding, the main part of the code which should be affected by the quotas is storage/smgr/md.c. md.c is too low level to do catalog accesses and thus too low level to know who owns what. 3) How the quota should be controlled: I think, that generally, for all the users which have quotas, the shared memory should contain the number of blocks left from the quota. And each backend extending or truncating the relations owned by the user should appropriately change that number of blocks left in the shared memory. What will you do with ALTER TABLE OWNER? What if such a command is rolled back? (Likewise for some other commands such as TRUNCATE, or even just DROP TABLE.) What if there are too many users to fit in your (necessarily fixed size) shared memory area? What sort of contention will there be for access to this area? regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] SOC user quotas
Sergey E. Koposov wrote: Hello hackers, I was starting to think about next SOC and the project for it. And for a long time I wanted to implement the user quotas in PG. So, I'll try to explain my understanding of the implementation, and I'll be happy to hear any comments, objections, or pointings to my misunderstanding. This is very first very rough idea, but I still would like to hear whether it contains some obvious flaws... 1) The main idea is to implement the per-user quota (not per tablespace for example). So, during the creation of the new user some quota can be specified, and after that the size of all the relations *owned* by that user should be limited by that number. I could see this being useful per database, maybe. It seems like kind of an odd feature. Sincerely, Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/ ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] SOC user quotas
On Wed, 28 Feb 2007, Tom Lane wrote: Sergey E. Koposov [EMAIL PROTECTED] writes: 1) The main idea is to implement the per-user quota (not per tablespace for example). So, during the creation of the new user some quota can be specified, and after that the size of all the relations *owned* by that user should be limited by that number. This seems impractical as stated; there is no way to determine what a user owns in some other database. Possibly you could do it if the quota were both per-user and per-database. yes, agreed. I didn't think of that. 3) How the quota should be controlled: I think, that generally, for all the users which have quotas, the shared memory should contain the number of blocks left from the quota. And each backend extending or truncating the relations owned by the user should appropriately change that number of blocks left in the shared memory. What will you do with ALTER TABLE OWNER? What if such a command is rolled back? I don't know, but I guess the ALTER OWNER should be considered differently. It probably should proceed only if it sees that there are enough place to perform the whole operation. If there are, then it should block any writing to the tables of the user, perform the alter owner and unblock everything again. (Likewise for some other commands such as TRUNCATE, or even just DROP TABLE.) I didn't think of yet, but I will. What if there are too many users to fit in your (necessarily fixed size) shared memory area? We really don't need to create the array for all users. We only need to create that array for users 1) having quotas 2) the users, whose tables are accessed at the moment So I don't think that in that case the amount of required space is a problem here. What sort of contention will there be for access to this area? I think, that the only requirement is that the incrementation or decrementation of number of blocks left for each user should be atomic operation. regards, Sergey *** Sergey E. Koposov Max Planck Institute for Astronomy/Cambridge Institute for Astronomy/Sternberg Astronomical Institute Tel: +49-6221-528-349 Web: http://lnfm1.sai.msu.ru/~math E-mail: [EMAIL PROTECTED] ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] COMMIT NOWAIT Performance Option
I think we need to think about when these CRCs would be read and written. It would be written when it hits the disk, hopefully by the background writer, and I think after a server crash, all pages would have to be read and checked. The good news is that both of these are non-critical paths. --- Jonah H. Harris wrote: On 2/28/07, Gregory Stark [EMAIL PROTECTED] wrote: But we've already seen that CRC checks can be expensive. Not everyone will want to take the cpu hit. Storing a byte counter in every block is cheap. CRC checking a page is most certainly the simplest. And, I disagree that it would be worse than either a sequence counter or the full page write. Block checksumming is done at read/write time... which is something that needs to be improved anyway. With a properly tuned bgwriter, the write itself should barely be noticeable. How fast is a CRC of 8K? Last time I checked it was something on the scale of ~95 usec for CRC32 and ~33 usec for sb8. And the idea came from what someone said MSSQL does, so like everyone else -- which isn't a very compelling argument to begin with -- doesn't argue against it. Rather than basing designs on poor second-hand information, maybe you and the person who mentioned this idea should get up-to-date and read the SQL Server storage engine architecture. As of SQL Server 2005, blocks *are* checksummed with CRC32. And, just for the record, previous versions of SQL server performed a bit flipping technique for every 512 bytes in the page header; it did *not* waste a byte for every 512 bytes written. I think the way you would work is to have the smgr note the sequential value it found when it read in a page and then when it writes it out increment that value by one. Conveniently the pages would be 16 bytes shorter than an 8kb page so you have 16 bytes available with every buffer to note information like the last sequential tag the buffer used. This proposed design is overcomplicated and a waste of space. I mean, we reduce storage overhead using phantom command id and variable varlena, but let's just fill it up again with unnecessary junk bytes. That seems pretty unlikely. CRC checks are expensive cpu-wise, we're already suffering a copy due to our use of read/write the difference between read/write of 8192 bytes and readv/writev of 511b*16+1*6 is going to be non-zero but very small. Thousands of times quicker than the CRC. Prove it. -- Jonah H. Harris, Software Architect | phone: 732.331.1324 EnterpriseDB Corporation| fax: 732.331.1301 33 Wood Ave S, 3rd Floor| [EMAIL PROTECTED] Iselin, New Jersey 08830| http://www.enterprisedb.com/ ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] COMMIT NOWAIT Performance Option
Gregory Stark [EMAIL PROTECTED] writes: We've already seen wal CRC checking show up at the top of profiles. Do you really doubt that memcpy is faster than CRC32 checking? Especially when you're already doing memcpy anyways and the only overhead is the few unaligned bytes at the end and the 8 one-byte copies? Well color me surprised, writev is not nearly so much faster than CRC as I had expected: lseek+write syscall overhead: 7.95 us CRC32 32.54 us writev 26.56 us The reason there's lseek overhead in there is because I had it seek back to the same block repeatedly to (hopefully) avoid any i/o. It seems to have worked as I find it hard to believe these numbers could be so low if there's any i/o being included. I think part of the reason writev is slow is because I'm including the time it took to set up the iovec array. That's 64 word copies write there. And then writev has to read those 64 words back and do 64 extra branches and so on... This is on an Intel T2500 (2Ghz). There is a side issue that tagging each sector is 100% guaranteed to detect torn pages whereas checksums still have a chance of missing them. But usually the scenario where that comes into play is where you have many checksum failures and are ignoring them assuming they never fail. In the case of torn pages there'll only be one torn page and we're going to scream bloody murder if we see it so I don't think that's a big issue. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] SOC user quotas
On Wed, 28 Feb 2007, Joshua D. Drake wrote: I could see this being useful per database, maybe. It seems like kind of an odd feature. Per user AND per database (as Tom noted). But I dont see what's odd in it... It exists in Oracle, and I need quotas in the project on which I'm working. And I remember user requests for quotas in the mailing lists ... regards, Sergey *** Sergey E. Koposov Max Planck Institute for Astronomy/Cambridge Institute for Astronomy/Sternberg Astronomical Institute Tel: +49-6221-528-349 Web: http://lnfm1.sai.msu.ru/~math E-mail: [EMAIL PROTECTED] ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] COMMIT NOWAIT Performance Option
Joshua D. Drake [EMAIL PROTECTED] writes: LIVE FROM THE WWE, CAGE MATCH! Jonah (the Theorist) Harris versus Greg (the Brain) Stark. What is going to happen between these two brothers in arms when they must both prove their theory! Darn, I wish I had seen this post before I posted the results of my testing. So what happens to the loser in WWE cage matches? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] COMMIT NOWAIT Performance Option
On 2/28/07, Joshua D. Drake [EMAIL PROTECTED] wrote: LIVE FROM THE WWE, CAGE MATCH! Jonah (the Theorist) Harris versus Greg (the Brain) Stark. What is going to happen between these two brothers in arms when they must both prove their theory! Heh, I like it :) -- Jonah H. Harris, Software Architect | phone: 732.331.1324 EnterpriseDB Corporation| fax: 732.331.1301 33 Wood Ave S, 3rd Floor| [EMAIL PROTECTED] Iselin, New Jersey 08830| http://www.enterprisedb.com/ ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] [PATCHES]
I have added this to the developer's FAQ to clarify the situtation of posting a patch: liPostgreSQL is licensed under a BSD license. By posting a patch to the public PostgreSQL mailling lists, you are giving the PostgreSQL Global Development Group the non-revokable right to distribute your patch under the BSD license. If you use code that is available under some other license that is BSD compatible (eg. public domain), please note that in your email submission./li --- Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: Neil Conway wrote: For the case in question, sure, requiring some clarification from FJ would be reasonable. But more broadly, my point is that I think you're fooling yourself if you think that requiring a disclaimer or explicit transfer of copyright for this *one* particular patch is likely to make any material difference to the overall copyright status of the code base. Yes, I do. If there is an explicit claim, like an email footer or a copyright in the code, we do try to nail that down. AFAICT, the footer in question tries to make it illegal for us even to have the message in our mail archives. If I were running the PG lists, I would install filters that automatically reject mails containing such notices, with a message like Your corporate lawyers do not deserve to have access to the internet. Go away until you've acquired a clue. I fully support Bruce's demand that patches be submitted with no such idiocy attached. regards, tom lane -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] COMMIT NOWAIT Performance Option
Bruce Momjian [EMAIL PROTECTED] writes: I think we need to think about when these CRCs would be read and written. It would be written when it hits the disk, hopefully by the background writer, and I think after a server crash, all pages would have to be read and checked. The good news is that both of these are non-critical paths. If you're protecting against torn pages then yes, if the system is shut down uncleanly by a system crash or power failure you would in theory have to scan every page of every table and index before starting up. But if the system was shut down uncleanly as the result of a Postgres crash or fast shutdown of Postgres then that isn't an issue. And many users may prefer to bring the system up as soon as possible as long as they know any corrupt pages will be spotted and throw errors as soon as it's seen. So I think you need a mode that only checks checksums when a page is read from disk. That would protect against torn pages (but not necessarily before bringing up the system) and against bad i/o hardware. Unfortunately memory errors are far more common than disk errors and I it would be much harder to protect against them. You can't check it when someone may be writing to the buffer, which limits you to checking it only when you acquire some form of lock on the buffer. It also means you would have to write it before you release a lock if you've made any changes. Worse, I'm not sure how to handle hint bits though. We currently don't require any lock at all to set hint bits which means someone may think they can check a checksum while or after you've fiddled some bits. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] COMMIT NOWAIT Performance Option
Gregory Stark wrote: Joshua D. Drake [EMAIL PROTECTED] writes: LIVE FROM THE WWE, CAGE MATCH! Jonah (the Theorist) Harris versus Greg (the Brain) Stark. What is going to happen between these two brothers in arms when they must both prove their theory! Darn, I wish I had seen this post before I posted the results of my testing. So what happens to the loser in WWE cage matches? Usually the are left bloody and staring at the winner through the cage. Occasionally they are carried out on a stretcher. The good news is, the loser gets a rematch at the next pay per view. Sincerely, Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Compilation errors
Tom Lane [EMAIL PROTECTED] writes: Peter Eisentraut [EMAIL PROTECTED] writes: Am Mittwoch, 28. Februar 2007 16:23 schrieb Gregory Stark: Does anyone happen to know what it is about my build environment that causes these errors? Nothing. Everybody gets them. I don't. What version of flex are you guys using? flex 2.5.33 -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] SOC user quotas
Sergey E. Koposov wrote: On Wed, 28 Feb 2007, Joshua D. Drake wrote: I could see this being useful per database, maybe. It seems like kind of an odd feature. Per user AND per database (as Tom noted). But I dont see what's odd in it... It exists in Oracle, and I need quotas in the project on which I'm working. And I remember user requests for quotas in the mailing lists ... Well Oracle isn't really our goal is it? I am not questioning that you are well intended but I just don't see a use case. For example, what happens if I hit my quota? Joshua D. Drake regards, Sergey *** Sergey E. Koposov Max Planck Institute for Astronomy/Cambridge Institute for Astronomy/Sternberg Astronomical Institute Tel: +49-6221-528-349 Web: http://lnfm1.sai.msu.ru/~math E-mail: [EMAIL PROTECTED] ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/ ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] [PATCHES]
Bruce Momjian wrote: I have added this to the developer's FAQ to clarify the situtation of posting a patch: liPostgreSQL is licensed under a BSD license. By posting a patch to the public PostgreSQL mailling lists, you are giving the PostgreSQL Global Development Group the non-revokable right to distribute your patch under the BSD license. If you use code that is available under some other license that is BSD compatible (eg. public domain), please note that in your email submission./li We should add this to the mailing list signup pages and the welcome pages to the lists. Joshua D. Drake --- Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: Neil Conway wrote: For the case in question, sure, requiring some clarification from FJ would be reasonable. But more broadly, my point is that I think you're fooling yourself if you think that requiring a disclaimer or explicit transfer of copyright for this *one* particular patch is likely to make any material difference to the overall copyright status of the code base. Yes, I do. If there is an explicit claim, like an email footer or a copyright in the code, we do try to nail that down. AFAICT, the footer in question tries to make it illegal for us even to have the message in our mail archives. If I were running the PG lists, I would install filters that automatically reject mails containing such notices, with a message like Your corporate lawyers do not deserve to have access to the internet. Go away until you've acquired a clue. I fully support Bruce's demand that patches be submitted with no such idiocy attached. regards, tom lane -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/ ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] COMMIT NOWAIT Performance Option
Gregory Stark wrote: Bruce Momjian [EMAIL PROTECTED] writes: I think we need to think about when these CRCs would be read and written. It would be written when it hits the disk, hopefully by the background writer, and I think after a server crash, all pages would have to be read and checked. The good news is that both of these are non-critical paths. If you're protecting against torn pages then yes, if the system is shut down uncleanly by a system crash or power failure you would in theory have to scan every page of every table and index before starting up. But if the system was shut down uncleanly as the result of a Postgres crash or fast shutdown of Postgres then that isn't an issue. And many users may prefer to bring the system up as soon as possible as long as they know any corrupt pages will be spotted and throw errors as soon as it's seen. I don't think we should start up a system and only detect the errors later. So I think you need a mode that only checks checksums when a page is read from disk. That would protect against torn pages (but not necessarily before bringing up the system) and against bad i/o hardware. Unfortunately memory errors are far more common than disk errors and I it would be much harder to protect against them. You can't check it when someone may be writing to the buffer, which limits you to checking it only when you acquire some form of lock on the buffer. It also means you would have to write it before you release a lock if you've made any changes. Worse, I'm not sure how to handle hint bits though. We currently don't require any lock at all to set hint bits which means someone may think they can check a checksum while or after you've fiddled some bits. Yep, a problem. -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] [PATCHES]
Joshua D. Drake wrote: Bruce Momjian wrote: I have added this to the developer's FAQ to clarify the situtation of posting a patch: liPostgreSQL is licensed under a BSD license. By posting a patch to the public PostgreSQL mailling lists, you are giving the PostgreSQL Global Development Group the non-revokable right to distribute your patch under the BSD license. If you use code that is available under some other license that is BSD compatible (eg. public domain), please note that in your email submission./li We should add this to the mailing list signup pages and the welcome pages to the lists. Yep, good idea. Marc? --- Joshua D. Drake --- Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: Neil Conway wrote: For the case in question, sure, requiring some clarification from FJ would be reasonable. But more broadly, my point is that I think you're fooling yourself if you think that requiring a disclaimer or explicit transfer of copyright for this *one* particular patch is likely to make any material difference to the overall copyright status of the code base. Yes, I do. If there is an explicit claim, like an email footer or a copyright in the code, we do try to nail that down. AFAICT, the footer in question tries to make it illegal for us even to have the message in our mail archives. If I were running the PG lists, I would install filters that automatically reject mails containing such notices, with a message like Your corporate lawyers do not deserve to have access to the internet. Go away until you've acquired a clue. I fully support Bruce's demand that patches be submitted with no such idiocy attached. regards, tom lane -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/ -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] SOC user quotas
On Wed, 28 Feb 2007, Joshua D. Drake wrote: Sergey E. Koposov wrote: On Wed, 28 Feb 2007, Joshua D. Drake wrote: Per user AND per database (as Tom noted). But I dont see what's odd in it... It exists in Oracle, and I need quotas in the project on which I'm working. And I remember user requests for quotas in the mailing lists ... Well Oracle isn't really our goal is it? I am not questioning that you are well intended but I just don't see a use case. For example, what happens if I hit my quota? Then you cannot run any queries that extend the size of your relations (for example INSERT, UPDATE etc.). Unless you drop your tables or DELETE something The use case for that is the situation when you provide the access to different people to do something on the DB. The real world example (in which I'm interested) is when the large science project produce a huge amount of data, store it in large database, and let different scientists work on that data, having their little accounts there. (example http://casjobs.sdss.org/CasJobs/Guide.aspx ). That's the way how most of large astronomical projects start to work now. Regards, Sergey *** Sergey E. Koposov Max Planck Institute for Astronomy/Cambridge Institute for Astronomy/Sternberg Astronomical Institute Tel: +49-6221-528-349 Web: http://lnfm1.sai.msu.ru/~math E-mail: [EMAIL PROTECTED] ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] SOC user quotas
Then you cannot run any queries that extend the size of your relations (for example INSERT, UPDATE etc.). Unless you drop your tables or DELETE something Interesting. Well my two cents is don't go any deeper than database. I.e; don't try and track to the individual relation. Joshua D. Drake The use case for that is the situation when you provide the access to different people to do something on the DB. The real world example (in which I'm interested) is when the large science project produce a huge amount of data, store it in large database, and let different scientists work on that data, having their little accounts there. (example http://casjobs.sdss.org/CasJobs/Guide.aspx ). That's the way how most of large astronomical projects start to work now. Regards, Sergey *** Sergey E. Koposov Max Planck Institute for Astronomy/Cambridge Institute for Astronomy/Sternberg Astronomical Institute Tel: +49-6221-528-349 Web: http://lnfm1.sai.msu.ru/~math E-mail: [EMAIL PROTECTED] ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/ ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] SOC user quotas
Sergey E. Koposov [EMAIL PROTECTED] writes: Per user AND per database (as Tom noted). But I dont see what's odd in it... It exists in Oracle, and I need quotas in the project on which I'm working. And I remember user requests for quotas in the mailing lists ... It hasn't ever made it onto the TODO list, which means there's not a consensus that we need it. If it were a simple, small, low-impact patch then you probably wouldn't need to do much convincing that it's an important feature to have, but I'm afraid the patch will be none of those things. regards, tom lane ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] SOC user quotas
Tom Lane wrote: Sergey E. Koposov [EMAIL PROTECTED] writes: Per user AND per database (as Tom noted). But I dont see what's odd in it... It exists in Oracle, and I need quotas in the project on which I'm working. And I remember user requests for quotas in the mailing lists ... It hasn't ever made it onto the TODO list, which means there's not a consensus that we need it. If it were a simple, small, low-impact patch then you probably wouldn't need to do much convincing that it's an important feature to have, but I'm afraid the patch will be none of those things. Tom what about at just the DB level? E.g; if user foo then pg_database_size may not be than X? I guess the big question would be when do we check though? At each transaction seems like it would add significant overhead, especially if we had to rollback the transaction because it was going to go over their quota. Egad. Joshua D. Drake regards, tom lane -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/ ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] [PATCHES]
Bruce Momjian [EMAIL PROTECTED] writes: Joshua D. Drake wrote: We should add this to the mailing list signup pages and the welcome pages to the lists. Yep, good idea. Marc? For -patches and -hackers, I agree. It seems a bit legalistic and off-putting for the general lists, though. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] SOC user quotas
On Wed, 28 Feb 2007, Tom Lane wrote: 2) I looked into the code, and from my understanding, the main part of the code which should be affected by the quotas is storage/smgr/md.c. md.c is too low level to do catalog accesses and thus too low level to know who owns what. That's probably a dumb question(I dont know the PG infrastructrure that well), but Is it possible to put the information about the owner into SMgrRelation/Relation structures? As I see the smgrextend() in smgr.c get the SMgrRelation agrument... regards, Sergey *** Sergey E. Koposov Max Planck Institute for Astronomy/Cambridge Institute for Astronomy/Sternberg Astronomical Institute Tel: +49-6221-528-349 Web: http://lnfm1.sai.msu.ru/~math E-mail: [EMAIL PROTECTED] ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Compilation errors
Gregory Stark wrote: Tom Lane [EMAIL PROTECTED] writes: Peter Eisentraut [EMAIL PROTECTED] writes: Am Mittwoch, 28. Februar 2007 16:23 schrieb Gregory Stark: Does anyone happen to know what it is about my build environment that causes these errors? Nothing. Everybody gets them. I don't. What version of flex are you guys using? flex 2.5.33 Aha! Known to be broken, iirc. Use flex 2.5.4a cheers andrew ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[HACKERS] Possible Bug: high CPU usage for stats collector in 8.2
Darcy Buskermolen wrote: I'm observing high CPU usage (95%) of a 2.6GHz opteron by the stats collector on an 8.2.3 box investigation has lead me to belive that the stats file is written a lot more often that once every 500ms the following shows this behavior. PostgreSQL 8.2.3 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 3.4.6 20060404 (Red Hat 3.4.6-3) I ran a time for i in `psql -qtc select generate_series(1,1)`; do psql -qtc select 1 from test where msg_id=$i ; done which took real1m23.288s user0m24.142s sys 0m21.536s to execute, during which time I ran a strace on the stats collector which produces the following output. From this it looks like the stats file is getting rewritten for each connection teardown, not just every 500ms. Process 10061 attached - interrupt to quit Process 10061 detached % time seconds usecs/call callserrors syscall -- --- --- - - 68.14 28.811963 17 1663827 write 18.227.701885 123 62808 12793 poll 11.314.783082 365 13101 rename 0.580.246169 5 50006 recvfrom 0.570.241073 18 13101 open 0.430.182816 14 13101 munmap 0.180.076176 6 13101 mmap 0.170.072746 6 13101 close 0.140.060483 5 13101 setitimer 0.100.041344 3 13101 12793 rt_sigreturn 0.090.039240 3 13101 fstat 0.060.024041 2 13110 getppid -- --- --- - - 100.00 42.281018 1894559 25586 total As you can see rename was called more than the theroitcal 167 times for 500ms slices that elapsed during the test Compared to PostgreSQL 8.1.8 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.0.gcc-opt (GCC) 4.0.3 (Ubuntu 4.0.3-1ubuntu5) [All be it this is slower hardware..] time for i in `psql -qtc select generate_series(1,1)`; do psql -qtc select 1 from test where msg_id=$i ; done which took real9m25.380s user6m51.254s sys 1m47.687s (and therefor should be about 1130 stat write cycles) and yielded the following strace % time seconds usecs/call callserrors syscall -- --- --- - - 93.64 20.422006 334 61212 select 3.490.760963 7110192 read 1.820.396654 19 21128 write 0.640.139679 126 1112 rename 0.270.057970 52 1112 open 0.060.012177 11 1112 munmap 0.040.008901 8 1112 mmap 0.030.006402 6 1112 close 0.020.004282 4 1112 fstat -- --- --- - - 100.00 21.809034199204 total During this run the stats collector does not even show and CPU usage according to top. both 8.1 and 8.2 have the following postgresql.conf parameters stats_command_string = off stats_start_collector = on stats_block_level = on stats_row_level = on -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/ ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] SOC user quotas
On Wed, Feb 28, 2007 at 09:58:52AM -0800, Joshua D. Drake wrote: E.g; if user foo then pg_database_size may not be than X? I guess the big question would be when do we check though? At each transaction seems like it would add significant overhead, especially if we had to rollback the transaction because it was going to go over their quota. Generally, rolling back a transaction doesn't reduce the amount of disk used. Only VACUUM FULL actually shrinks relations. Seem to me if the RelationOpen stores a pointer to a counter that gets incremented on mdextend, it should work reasonably well. Extending doesn't happen that often relative to other database activity. Have a nice day, -- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/ From each according to his ability. To each according to his ability to litigate. signature.asc Description: Digital signature
Re: [HACKERS] SOC user quotas
Martijn van Oosterhout wrote: On Wed, Feb 28, 2007 at 09:58:52AM -0800, Joshua D. Drake wrote: E.g; if user foo then pg_database_size may not be than X? I guess the big question would be when do we check though? At each transaction seems like it would add significant overhead, especially if we had to rollback the transaction because it was going to go over their quota. Generally, rolling back a transaction doesn't reduce the amount of disk used. Only VACUUM FULL actually shrinks relations. Right, but what I mean was -- if we rollback because we hit quota we could potentially cause even more maintenance to have to happen (vacuum). J Seem to me if the RelationOpen stores a pointer to a counter that gets incremented on mdextend, it should work reasonably well. Extending doesn't happen that often relative to other database activity. Have a nice day, -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/ ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] COMMIT NOWAIT Performance Option
On 2/28/07, Bruce Momjian [EMAIL PROTECTED] wrote: But if the system was shut down uncleanly as the result of a Postgres crash or fast shutdown of Postgres then that isn't an issue. And many users may prefer to bring the system up as soon as possible as long as they know any corrupt pages will be spotted and throw errors as soon as it's seen. I don't think we should start up a system and only detect the errors later. Which is, of course, how everyone else does it. On block access, the checksum is verified (if you've turned checksum checking on). I *really* doubt you want to pull in every page in the database at startup time to verify the checksum or sequence. Even pages from the last checkpoint would be a killer. All of the databases (Oracle, SQL Server, DB2) have a way to perform a database corruption check which does go out and verify all checksums. If consistency is stored at the block-level, which is pretty much the only way to avoid full page writes, you have to accept some level of possible corruption. -- Jonah H. Harris, Software Architect | phone: 732.331.1324 EnterpriseDB Corporation| fax: 732.331.1301 33 Wood Ave S, 3rd Floor| [EMAIL PROTECTED] Iselin, New Jersey 08830| http://www.enterprisedb.com/ ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] COMMIT NOWAIT Performance Option
Jonah H. Harris wrote: On 2/28/07, Bruce Momjian [EMAIL PROTECTED] wrote: But if the system was shut down uncleanly as the result of a Postgres crash or fast shutdown of Postgres then that isn't an issue. And many users may prefer to bring the system up as soon as possible as long as they know any corrupt pages will be spotted and throw errors as soon as it's seen. I don't think we should start up a system and only detect the errors later. Which is, of course, how everyone else does it. On block access, the checksum is verified (if you've turned checksum checking on). I *really* doubt you want to pull in every page in the database at startup time to verify the checksum or sequence. Even pages from the last checkpoint would be a killer. All of the databases (Oracle, SQL Server, DB2) have a way to perform a database corruption check which does go out and verify all checksums. If consistency is stored at the block-level, which is pretty much the only way to avoid full page writes, you have to accept some level of possible corruption. Am am not comfortable starting and having something fail later. How other databases do it is not an issue for me. -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] SOC user quotas
On Wed, 28 Feb 2007, Joshua D. Drake wrote: Tom Lane wrote: Sergey E. Koposov [EMAIL PROTECTED] writes: Per user AND per database (as Tom noted). But I dont see what's odd in it... It exists in Oracle, and I need quotas in the project on which I'm working. And I remember user requests for quotas in the mailing lists ... It hasn't ever made it onto the TODO list, which means there's not a consensus that we need it. If it were a simple, small, low-impact patch then you probably wouldn't need to do much convincing that it's an important feature to have, but I'm afraid the patch will be none of those things. We need this kind of feature in our scientific project I and Sergey are working on. We provide access to big pool of astronomical catalogs and ability to match users data with these huge catalogs and we want to be able to provide sort of QoS. Pg became very popular in Russia, especially after the biggest accounting and enteprize management software developer 1C (about 800,000 installations) has been supporting Pg and I expect a large interest to Pg this year, especially from the application providers, shared environment. btw, this should be announced in -advocacy, I and Teodor worked on Pg port, some patches we have submitted was grown from that work. Regards, Oleg _ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83 ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] COMMIT NOWAIT Performance Option
Jonah H. Harris wrote: On 2/28/07, Bruce Momjian [EMAIL PROTECTED] wrote: Am am not comfortable starting and having something fail later. Then do you have some other idea for protecting pages from being torn without storing an entire backup copy or performing a block-level consistency check? We have several methods suggested to check the blocks, like CRC. My point was that, whatever check method we use, we should be prepared to check on startup, or at least make it the default for a crash restart. -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] COMMIT NOWAIT Performance Option
On 2/28/07, Bruce Momjian [EMAIL PROTECTED] wrote: We have several methods suggested to check the blocks, like CRC. My point was that, whatever check method we use, we should be prepared to check on startup, or at least make it the default for a crash restart. Sounds like it should be a guc. I most certainly wouldn't check the entire database, especially if it was over a gigabyte. -- Jonah H. Harris, Software Architect | phone: 732.331.1324 EnterpriseDB Corporation| fax: 732.331.1301 33 Wood Ave S, 3rd Floor| [EMAIL PROTECTED] Iselin, New Jersey 08830| http://www.enterprisedb.com/ ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] COMMIT NOWAIT Performance Option
Jonah H. Harris wrote: On 2/28/07, Bruce Momjian [EMAIL PROTECTED] wrote: We have several methods suggested to check the blocks, like CRC. My point was that, whatever check method we use, we should be prepared to check on startup, or at least make it the default for a crash restart. Sounds like it should be a guc. I most certainly wouldn't check the entire database, especially if it was over a gigabyte. Keep in mind if you don't check it on startup, you will be checking it for every read, which for rare crashes, might not be wise. -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] COMMIT NOWAIT Performance Option
On Wed, 2007-02-28 at 14:10 -0500, Jonah H. Harris wrote: On 2/28/07, Bruce Momjian [EMAIL PROTECTED] wrote: But if the system was shut down uncleanly as the result of a Postgres crash or fast shutdown of Postgres then that isn't an issue. And many users may prefer to bring the system up as soon as possible as long as they know any corrupt pages will be spotted and throw errors as soon as it's seen. I don't think we should start up a system and only detect the errors later. Which is, of course, how everyone else does it. On block access, the checksum is verified (if you've turned checksum checking on). I *really* doubt you want to pull in every page in the database at startup time to verify the checksum or sequence. Even pages from the last checkpoint would be a killer. Under normal operations, shutting down the database does a checkpoint, right? So unless you're in recovery mode, there's no additional cost. And I can't think of any reason you'd need to see any pages before the last checkpoint (unless you don't trust your disk and just want to check all the pages, which is more than we can do now anyway). So the additional cost of doing CRCs every time would be the CPU cost, and also the cost during recovery of reading in all the data pages since the last checkpoint. That's 5 minutes of data, in the default configuration. Regards, Jeff Davis ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] COMMIT NOWAIT Performance Option
On 2/28/07, Bruce Momjian [EMAIL PROTECTED] wrote: Keep in mind if you don't check it on startup, you will be checking it for every read, which for rare crashes, might not be wise. Well understood. That's how most everyone configures their database systems; they certainly don't optimize for torn page detection on a crash. -- Jonah H. Harris, Software Architect | phone: 732.331.1324 EnterpriseDB Corporation| fax: 732.331.1301 33 Wood Ave S, 3rd Floor| [EMAIL PROTECTED] Iselin, New Jersey 08830| http://www.enterprisedb.com/ ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] COMMIT NOWAIT Performance Option
On 2/28/07, Jeff Davis [EMAIL PROTECTED] wrote: That's 5 minutes of data, in the default configuration. Right, but I don't know anyone that keeps checkpoints at 5 minutes. At least not on OLTP configurations. -- Jonah H. Harris, Software Architect | phone: 732.331.1324 EnterpriseDB Corporation| fax: 732.331.1301 33 Wood Ave S, 3rd Floor| [EMAIL PROTECTED] Iselin, New Jersey 08830| http://www.enterprisedb.com/ ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] COMMIT NOWAIT Performance Option
Jonah H. Harris wrote: On 2/28/07, Bruce Momjian [EMAIL PROTECTED] wrote: But if the system was shut down uncleanly as the result of a Postgres crash or fast shutdown of Postgres then that isn't an issue. And many users may prefer to bring the system up as soon as possible as long as they know any corrupt pages will be spotted and throw errors as soon as it's seen. I don't think we should start up a system and only detect the errors later. Which is, of course, how everyone else does it. On block access, the checksum is verified (if you've turned checksum checking on). I *really* doubt you want to pull in every page in the database at startup time to verify the checksum or sequence. Even pages from the last checkpoint would be a killer. AFAICS pages from the last checkpoint would be sufficient, and not that expensive given that we have to pull all the pages touched since last checkpoint from disk to do the WAL replay anyway. All of the databases (Oracle, SQL Server, DB2) have a way to perform a database corruption check which does go out and verify all checksums. I think that's to protect from random disk errors rather than just torn pages. Which might be a useful thing but I think that level of protection belongs to the filesystem or storage hardware. What's the use case for the torn-page detection, anyway? If you don't trust your OS/hardware to protect you from torn-pages, but you do care about your data, surely you would use full_pages_writes = on. If you do trust your OS/hardware, just turn it off. And if you don't care about your data, what's the point? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] COMMIT NOWAIT Performance Option
Jonah H. Harris wrote: On 2/28/07, Jeff Davis [EMAIL PROTECTED] wrote: That's 5 minutes of data, in the default configuration. Right, but I don't know anyone that keeps checkpoints at 5 minutes. At least not on OLTP configurations. Uhmm... most do because most don't ever touch the postgresql.conf and those that do, don't touch checkpoints because they don't understand it. Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/ ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
[HACKERS] 8.1.8 Installer Fails on Win2k Server
I am installing on a Cisco Media Server 7800 running Win2k Server and am noticing that the installer fails every time with the error dialog saying Failed to create process for initdb: Access is denied. It looks like that I need to change some permission in the registry allowing other users spawn processes. Any pointer in this respect is highly appreciated. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] COMMIT NOWAIT Performance Option
On Wed, 2007-02-28 at 14:54 -0500, Jonah H. Harris wrote: On 2/28/07, Jeff Davis [EMAIL PROTECTED] wrote: That's 5 minutes of data, in the default configuration. Right, but I don't know anyone that keeps checkpoints at 5 minutes. At least not on OLTP configurations. It's got a hard maximum of 1 hour. Also, the need to make checkpoints far apart has been reduced with bgwriter. It will be further reduced with the patch that allows people to tune the bgwriter for their needs. Recovery has to run through all those WAL segments anyway. It's not like we're making a 5 second recovery take 20 minutes, we'd be taking an already long recovery and making it longer (I'm not sure how much longer, but it can't be more than twice as long). I'm not saying there's no cost, but the extra recovery cost seems lower to me than the CRC cost on every data page read during operation. Also, if we find an error, do we even have the ability to correct it? A CRC doesn't tell us which pages were written and which weren't, so we could detect the error but not correct it, right? Regards, Jeff Davis ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] COMMIT NOWAIT Performance Option
On 2/28/07, Jeff Davis [EMAIL PROTECTED] wrote: I'm not saying there's no cost, but the extra recovery cost seems lower to me than the CRC cost on every data page read during operation. I agree, I just think it should be configurable. Also, if we find an error, do we even have the ability to correct it? A CRC doesn't tell us which pages were written and which weren't, so we could detect the error but not correct it, right? Correct. -- Jonah H. Harris, Software Architect | phone: 732.331.1324 EnterpriseDB Corporation| fax: 732.331.1301 33 Wood Ave S, 3rd Floor| [EMAIL PROTECTED] Iselin, New Jersey 08830| http://www.enterprisedb.com/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] COMMIT NOWAIT Performance Option
On 2/28/07, Joshua D. Drake [EMAIL PROTECTED] wrote: Right, but I don't know anyone that keeps checkpoints at 5 minutes. At least not on OLTP configurations. Uhmm... most do because most don't ever touch the postgresql.conf and those that do, don't touch checkpoints because they don't understand it. Yes, I guess I should've qualified that to specify people who do understand the system better. -- Jonah H. Harris, Software Architect | phone: 732.331.1324 EnterpriseDB Corporation| fax: 732.331.1301 33 Wood Ave S, 3rd Floor| [EMAIL PROTECTED] Iselin, New Jersey 08830| http://www.enterprisedb.com/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] COMMIT NOWAIT Performance Option
Jonah H. Harris wrote: On 2/28/07, Joshua D. Drake [EMAIL PROTECTED] wrote: Right, but I don't know anyone that keeps checkpoints at 5 minutes. At least not on OLTP configurations. Uhmm... most do because most don't ever touch the postgresql.conf and those that do, don't touch checkpoints because they don't understand it. Yes, I guess I should've qualified that to specify people who do understand the system better. I am not trying to pick on the issue but I do think it is important to recognize that literally only those in the know, are going to ever touch the postgresql.conf. I deal with lots of customers, who employ lots of programmers that I have often heard, There is a postgresql.conf?. Why? Because they are user space programmers using an ORM or other such tech that allows them to never actually login to postgresql or a shell or anything once they edit the pg_hba.conf. Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/ ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] COMMIT NOWAIT Performance Option
On 2/28/07, Joshua D. Drake [EMAIL PROTECTED] wrote: I am not trying to pick on the issue but I do think it is important to recognize that literally only those in the know, are going to ever touch the postgresql.conf. I agree. -- Jonah H. Harris, Software Architect | phone: 732.331.1324 EnterpriseDB Corporation| fax: 732.331.1301 33 Wood Ave S, 3rd Floor| [EMAIL PROTECTED] Iselin, New Jersey 08830| http://www.enterprisedb.com/ ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] SOC user quotas
Joshua D. Drake [EMAIL PROTECTED] writes: Martijn van Oosterhout wrote: Generally, rolling back a transaction doesn't reduce the amount of disk used. Only VACUUM FULL actually shrinks relations. Right, but what I mean was -- if we rollback because we hit quota we could potentially cause even more maintenance to have to happen (vacuum). It's worse than that, because VACUUM FULL will actually bloat the indexes on the way to being able to reduce the table size (since it has to make new index entries for rows it moves). If the limit is strictly enforced then a user who has reached his quota is really totally screwed: the only easy way to get back under quota will be to completely drop tables, ie, discard data. VACUUM probably won't reduce the physical table size much, and VACUUM FULL will fail, and other approaches such as CLUSTER won't work either. [ thinks for a bit... ] Possibly you could drop all your indexes, VACUUM FULL, reconstruct indexes. But it would be painful and would certainly prevent you from working normally until you finish that maintenance. If the quota limit includes temp files you might find that rebuilding the indexes fails, too, because of the transient space needed to rebuild. Plus, all that forced maintenance activity will be degrading response for other users while it happens. On the whole I'm not convinced that a quota is a good idea. regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Proposal for Implenting read-only queries during wal replay (SoC 2007)
Thanks Josh, I'll look for the earlier one and try to add it there... -Paul -Original Message- From: Joshua D. Drake [mailto:[EMAIL PROTECTED] Sent: Wednesday, February 28, 2007 12:09 AM To: Paul Silveira Cc: pgsql-hackers@postgresql.org Subject: Re: [HACKERS] Proposal for Implenting read-only queries during wal replay (SoC 2007) Paul Silveira wrote: Hello, I just wanted to voice my opinion for this feature... I've implemented a few Production applicaitons with PostgreSQL now and would die for that feature. Right now, I am constantly trying to find way's to make my data more available. Paul unfortunately you have responded to a hijacked thread. Jonah was speaking about a project that he wishes would have been accepted which was called Full Disjunctions. I have not read the read-only queries during wal replay thread but I can assure you that Jonah's response had nothing to do with it. Joshua D. Drake I've even resulted to using pg_dump to create read only copies of the database and placed them behind load balancers to make the data more available. Something like this would allow me to quickly leverage a read only node to scale out the applicaiton... If it can at all be built, it would get my first, second and third vote. :) Regards, Paul Silveira Jonah H. Harris-2 wrote: On 2/26/07, Bruce Momjian [EMAIL PROTECTED] wrote: Jonah, I have no idea what fault you are trying to blame on the community in the above statement. The author didn't discuss the idea with the community before spending months on it so we have no obligation to accept it in the core. You're missing the point entirely. The majority of the (vocal) community didn't even want the feature and as such, failed to provide viable suggestions for him to move forward. As the majority of the community didn't want the feature, it wouldn't have made a difference when he proposed it; which would have remained negative nonetheless. -- Jonah H. Harris, Software Architect | phone: 732.331.1324 EnterpriseDB Corporation| fax: 732.331.1301 33 Wood Ave S, 3rd Floor| [EMAIL PROTECTED] Iselin, New Jersey 08830| http://www.enterprisedb.com/ ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Compilation errors
Andrew Dunstan [EMAIL PROTECTED] writes: Gregory Stark wrote: flex 2.5.33 Aha! Known to be broken, iirc. Use flex 2.5.4a No, the known breakages with flex were years ago; 2.5.33 has only been out a year. I think 2.5.31 might have been the one we saw big problems with (there's a note warning against using it on the flex sourceforge page). I think most of us do still use 2.5.4a, but it'd probably be a good idea to check out 2.5.33 and see if it can be made to not generate warnings. I'm certainly tired of seeing the warnings 2.5.4a creates ... regards, tom lane ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] COMMIT NOWAIT Performance Option
Jonah H. Harris [EMAIL PROTECTED] writes: Which is, of course, how everyone else does it. I happen to agree with your conclusion but this line of argument is exceptionally unconvincing. In fact in this crowd you'll tend to turn people off and lose people if you say things like that rather than convince anyone of anything. Even pages from the last checkpoint would be a killer. Hm that's an interesting thought. We only really have to check pages that would have received a full page write since the last checkpoint. So if we made turning full page writes off still record the page ids of the pages it *would* have written then we just need the code that normally replays full page writes to check the checksum if the page data isn't available. I can't see how that would be a killer. No matter how large a system you're talking about you're going to tune checkpoints to be occurring at about the same interval anyways. So the amount of time the wal replay checksum checking takes will be more or less constant. In fact we're already reading in most, if not all, of those pages anyways since we're replaying wal records that touch them after all. Would we even have to do anything extra? If we check checksums whenever we read in a page surely the wal replay code would automatically detect any torn pages without any special attention. That also makes it clear just how awful full page writes are for scalability. As you scale up the system but try to keep checkpoint intervals constant you're less and less likely to ever see the same page twice between two checkpoints. So as you scale the system up more and more of the wal will consist of full page writes. All of the databases (Oracle, SQL Server, DB2) have a way to perform a database corruption check which does go out and verify all checksums. Which is pretty poor design. If we implemented a fsck-like tool I would be far more interested in checking things like tuples don't overlap or hint bits are set correctly and so on. Checksums do nothing to protect against software failures which is the only kind of failure with a good rationale for being in an external tool. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] SOC user quotas
On Wed, 28 Feb 2007, Tom Lane wrote: Joshua D. Drake [EMAIL PROTECTED] writes: Martijn van Oosterhout wrote: Generally, rolling back a transaction doesn't reduce the amount of disk used. Only VACUUM FULL actually shrinks relations. Right, but what I mean was -- if we rollback because we hit quota we could potentially cause even more maintenance to have to happen (vacuum). It's worse than that, because VACUUM FULL will actually bloat the indexes on the way to being able to reduce the table size (since it has to make new index entries for rows it moves). If the limit is strictly enforced then a user who has reached his quota is really totally screwed: the only easy way to get back under quota will be to completely drop tables, ie, discard data. VACUUM probably won't reduce the physical table size much, and VACUUM FULL will fail, and other approaches such as CLUSTER won't work either. [ thinks for a bit... ] Possibly you could drop all your indexes, VACUUM FULL, reconstruct indexes. But it would be painful and would certainly prevent you from working normally until you finish that maintenance. If the quota limit includes temp files you might find that rebuilding the indexes fails, too, because of the transient space needed to rebuild. Plus, all that forced maintenance activity will be degrading response for other users while it happens. On the whole I'm not convinced that a quota is a good idea. On database level it's possible to have soft user quote, just measure disk usage and warn user if database size is over. This could be realized using external tools. But Sergey wanted finer granulation. As a workaround, we could have function which return size of db objects owned by user and let administrator run cron job to realize soft quota. This will not provide foundation for enterprize level of QoS, but we certainly don't want to introduce too much overhead. It's interesting and challenging task though. Regards, Oleg _ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83 ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] SOC user quotas
On Wed, 28 Feb 2007, Tom Lane wrote: Joshua D. Drake [EMAIL PROTECTED] writes: Martijn van Oosterhout wrote: Generally, rolling back a transaction doesn't reduce the amount of disk used. Only VACUUM FULL actually shrinks relations. Right, but what I mean was -- if we rollback because we hit quota we could potentially cause even more maintenance to have to happen (vacuum). It's worse than that, because VACUUM FULL will actually bloat the indexes on the way to being able to reduce the table size (since it has to make new index entries for rows it moves). If the limit is strictly enforced then a user who has reached his quota is really totally screwed: the only easy way to get back under quota will be to completely drop tables, ie, discard data. VACUUM probably won't reduce the physical table size much, and VACUUM FULL will fail, and other approaches such as CLUSTER won't work either. I don't know, but in my opinion, I don't see anything bad in requiring dropping the data if the quota is full. That's what usually occurs in the case of normal filesystem quota... If you don't have a space there, you cannot edit files, copy them etc... And that solution should be definitely better than the filesystem quota for the PostgreSQL user for example. regards, Sergey *** Sergey E. Koposov Max Planck Institute for Astronomy/Cambridge Institute for Astronomy/Sternberg Astronomical Institute Tel: +49-6221-528-349 Web: http://lnfm1.sai.msu.ru/~math E-mail: [EMAIL PROTECTED] ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] COMMIT NOWAIT Performance Option
On 2/28/07, Gregory Stark [EMAIL PROTECTED] wrote: Jonah H. Harris [EMAIL PROTECTED] writes: Which is, of course, how everyone else does it. I happen to agree with your conclusion but this line of argument is exceptionally unconvincing. In fact in this crowd you'll tend to turn people off and lose people if you say things like that rather than convince anyone of anything. Rather than reinventing the wheel, it often pays to piggyback on the solutions others in similar situations have encountered. I'm just stating how others provide similar functionality or capabilities. If someone dislikes an idea just because the major vendors have done it that way, that's their own problem. It's up to the community to decide how to proceed given the information at hand. Even pages from the last checkpoint would be a killer. Hm that's an interesting thought. We only really have to check pages that would have received a full page write since the last checkpoint. That's the only way I see that it could possibly be acceptable from a time-to-recover performance standpoint. I would still prefer a guc. Which is pretty poor design. If we implemented a fsck-like tool I would be far more interested in checking things like tuples don't overlap or hint bits are set correctly and so on. Checksums do nothing to protect against software failures which is the only kind of failure with a good rationale for being in an external tool. Regardless of whether it's better as a separate tool or in the database itself, they provide a corruption-finding/consistency-checking capability. As far as other checks that could be performed, SQL Server and Oracle do have their own internal structure checks; many of which execute at runtime, not as a separate tool or process. -- Jonah H. Harris, Software Architect | phone: 732.331.1324 EnterpriseDB Corporation| fax: 732.331.1301 33 Wood Ave S, 3rd Floor| [EMAIL PROTECTED] Iselin, New Jersey 08830| http://www.enterprisedb.com/ ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] COMMIT NOWAIT Performance Option
On 2/28/07, Bruce Momjian [EMAIL PROTECTED] wrote: Am am not comfortable starting and having something fail later. Then do you have some other idea for protecting pages from being torn without storing an entire backup copy or performing a block-level consistency check? How other databases do it is not an issue for me. Not saying it is, just stating the facts. -- Jonah H. Harris, Software Architect | phone: 732.331.1324 EnterpriseDB Corporation| fax: 732.331.1301 33 Wood Ave S, 3rd Floor| [EMAIL PROTECTED] Iselin, New Jersey 08830| http://www.enterprisedb.com/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Compilation errors
Tom Lane wrote: Andrew Dunstan [EMAIL PROTECTED] writes: Gregory Stark wrote: flex 2.5.33 Aha! Known to be broken, iirc. Use flex 2.5.4a No, the known breakages with flex were years ago; 2.5.33 has only been out a year. I think 2.5.31 might have been the one we saw big problems with (there's a note warning against using it on the flex sourceforge page). I think most of us do still use 2.5.4a, but it'd probably be a good idea to check out 2.5.33 and see if it can be made to not generate warnings. I'm certainly tired of seeing the warnings 2.5.4a creates ... It gives me the same warnings that Greg reported. I guess we could conditionally add prototypes for those functions to all the .l files if you really want to move to 2.5.33. Kinda yucky, though. cheers andrew ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Compilation errors
I wrote: I guess we could conditionally add prototypes for those functions to all the .l files if you really want to move to 2.5.33. Kinda yucky, though. Actually, we couldn't.The definition section from the .l file gets included after these functions. So we'd need to include something in gram.y before including scan.c. cheers andrew ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Resumable vacuum proposal and design overview
Gregory Stark wrote: Simon Riggs [EMAIL PROTECTED] writes: How much memory would it save during VACUUM on a 1 billion row table with 200 million dead rows? Would that reduce the number of cycles a normal non-interrupted VACUUM would perform? It would depend on how many dead tuples you have per-page. If you have a very large table with only one dead tuple per page then it might even be larger. But in the usual case it would be smaller. FWIW, there's some unused bits in current representation, so it might actually be possible to design it so that it's never larger. One optimization to the current structure, instead of switching to a bitmap, would be to store the block number just once for each block, followed by a variable length list of offsets. It'd complicate the binary search though. Also note that it would have to be non-lossy. Yep. Or actually, it might be useful to forget some dead tids if it allowed you to memorize a larger number of other dead tids. Hmm, what a weird thought :). Another insight I had while thinking about this is that the dead tid list behaves quite nicely from a OS memory management point of view. In the 1st vacuum phase, the array is filled in sequence, which means that the OS can swap out the early parts of it and use the memory for buffer cache instead. In the index scan phase, it's randomly accessed, but if the table is clustered, it's in fact not completely random access. In the 2nd vacuum pass, the array is scanned sequentially again. I'm not sure how that works out in practice, but you might want to use a larger maintenance_work_mem than you'd think. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] SOC user quotas
On Wed, Feb 28, 2007 at 03:57:56PM -0500, Tom Lane wrote: It's worse than that, because VACUUM FULL will actually bloat the indexes on the way to being able to reduce the table size (since it has to make new index entries for rows it moves). If the limit is strictly I was thinking that indexes and temp tables wouldn't be counted. I thought it was more of a stop people using up lots of disk space rather than specifically stopping at a hard limit. Have a nice day, -- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/ From each according to his ability. To each according to his ability to litigate. signature.asc Description: Digital signature
Re: [HACKERS] SOC user quotas
I don't know, but in my opinion, I don't see anything bad in requiring dropping the data if the quota is full. That's what usually occurs in the case of normal filesystem quota... If you don't have a space there, you cannot edit files, copy them etc... And that solution should be definitely better than the filesystem quota for the PostgreSQL user for example. The bad point is not that we would rollback the transaction. The bad point is what happens when you need to rollback a transaction and in your scenario it is quite plausible that a large rollback could occur, more than once, causing the requirement of something like a vacuum full to clean things up. Sincerely, Joshua D. Drake regards, Sergey *** Sergey E. Koposov Max Planck Institute for Astronomy/Cambridge Institute for Astronomy/Sternberg Astronomical Institute Tel: +49-6221-528-349 Web: http://lnfm1.sai.msu.ru/~math E-mail: [EMAIL PROTECTED] -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/ ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Possible Bug: high CPU usage for stats collector in 8.2
Joshua D. Drake wrote: Darcy Buskermolen wrote: I'm observing high CPU usage (95%) of a 2.6GHz opteron by the stats collector on an 8.2.3 box investigation has lead me to belive that the stats file is written a lot more often that once every 500ms the following shows this behavior. I have just done a test separate from darcy on my workstation: [EMAIL PROTECTED]:~/82$ strace -c -p 16130 Process 16130 attached - interrupt to quit Process 16130 detached % time seconds usecs/call callserrors syscall -- --- --- - - 50.000.167103 3 60264 10225 poll 28.310.094600 9 10398 rename 16.890.056457 5 10398 open 2.480.008278 0 50023 recvfrom 1.140.003804 0 10398 setitimer 0.530.001781 0 20796 write 0.430.001432 0 10398 close 0.210.000690 0 10398 munmap 0.020.57 0 10398 mmap 0.000.00 0 10398 fstat 0.000.00 0 10398 10225 rt_sigreturn 0.000.00 0 10414 getppid -- --- --- - - 100.000.334202224681 20450 total Query: time for i in `bin/psql -p8000 -d postgres -c select generate_series(1,1)`; do bin/psql -p8000 -d postgres -qc select 1 from pg_database where datname = 'postgres'; done; Time: real2m5.077s user0m28.414s sys 0m39.762s PostgreSQL 8.2.0, Ubuntu Edgy 64bit. Seems like something is extremely wonky here. Joshua D. Drake PostgreSQL 8.2.3 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 3.4.6 20060404 (Red Hat 3.4.6-3) I ran a time for i in `psql -qtc select generate_series(1,1)`; do psql -qtc select 1 from test where msg_id=$i ; done which took real1m23.288s user0m24.142s sys 0m21.536s to execute, during which time I ran a strace on the stats collector which produces the following output. From this it looks like the stats file is getting rewritten for each connection teardown, not just every 500ms. Process 10061 attached - interrupt to quit Process 10061 detached % time seconds usecs/call callserrors syscall -- --- --- - - 68.14 28.811963 17 1663827 write 18.227.701885 123 62808 12793 poll 11.314.783082 365 13101 rename 0.580.246169 5 50006 recvfrom 0.570.241073 18 13101 open 0.430.182816 14 13101 munmap 0.180.076176 6 13101 mmap 0.170.072746 6 13101 close 0.140.060483 5 13101 setitimer 0.100.041344 3 13101 12793 rt_sigreturn 0.090.039240 3 13101 fstat 0.060.024041 2 13110 getppid -- --- --- - - 100.00 42.281018 1894559 25586 total As you can see rename was called more than the theroitcal 167 times for 500ms slices that elapsed during the test Compared to PostgreSQL 8.1.8 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.0.gcc-opt (GCC) 4.0.3 (Ubuntu 4.0.3-1ubuntu5) [All be it this is slower hardware..] time for i in `psql -qtc select generate_series(1,1)`; do psql -qtc select 1 from test where msg_id=$i ; done which took real9m25.380s user6m51.254s sys 1m47.687s (and therefor should be about 1130 stat write cycles) and yielded the following strace % time seconds usecs/call callserrors syscall -- --- --- - - 93.64 20.422006 334 61212 select 3.490.760963 7110192 read 1.820.396654 19 21128 write 0.640.139679 126 1112 rename 0.270.057970 52 1112 open 0.060.012177 11 1112 munmap 0.040.008901 8 1112 mmap 0.030.006402 6 1112 close 0.020.004282 4 1112 fstat -- --- --- - - 100.00 21.809034199204 total During this run the stats collector does not even show and CPU usage according to top. both 8.1 and 8.2 have the following postgresql.conf parameters stats_command_string = off stats_start_collector = on stats_block_level = on stats_row_level = on -- === The PostgreSQL
Re: [HACKERS] COMMIT NOWAIT Performance Option
On Wed, 2007-02-28 at 21:13 +, Gregory Stark wrote: Hm that's an interesting thought. We only really have to check pages that would have received a full page write since the last checkpoint. So if we made Do we ever do a partial page write, or is what you're saying equivalent to we only have to check pages that have been written to since the last checkpoint? And if it is the same, can't we get the pages that were written to from the ctids in the wal records? Regards, Jeff Davis ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] SOC user quotas
On Wed, Feb 28, 2007 at 12:56:13PM -0500, Tom Lane wrote: It hasn't ever made it onto the TODO list, which means there's not a consensus that we need it. Such a patch could improve the acceptance of PostgreSQL in shared hosting environments. Note that a database without quotas can be filled up easily and the database will stop serving requests to other users' databases. There is a quota implementation already in the archives but I don't know more about it than that it exists: http://archives.postgresql.org/pgsql-hackers/2004-07/msg00392.php Joachim ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] SOC user quotas
On Wed, 28 Feb 2007, Joshua D. Drake wrote: The bad point is not that we would rollback the transaction. The bad point is what happens when you need to rollback a transaction and in your scenario it is quite plausible that a large rollback could occur, more than once, causing the requirement of something like a vacuum full to clean things up. Yes, I understand, but I think, the scenario of working and always hitting a quota is not normal. I think you shouldn't expect perfect, stable performance etc. in that case. It's similar to if you'll try to work with PG with very little memory and/or shared memory. You can work, but you can easily hit these limits, and than something will not work... regards, Sergey *** Sergey E. Koposov Max Planck Institute for Astronomy/Cambridge Institute for Astronomy/Sternberg Astronomical Institute Tel: +49-6221-528-349 Web: http://lnfm1.sai.msu.ru/~math E-mail: [EMAIL PROTECTED] ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] SOC user quotas
On Wed, 28 Feb 2007, Joachim Wieland wrote: On Wed, Feb 28, 2007 at 12:56:13PM -0500, Tom Lane wrote: It hasn't ever made it onto the TODO list, which means there's not a consensus that we need it. Such a patch could improve the acceptance of PostgreSQL in shared hosting environments. Note that a database without quotas can be filled up easily and the database will stop serving requests to other users' databases. Yes, I agree. There is a quota implementation already in the archives but I don't know more about it than that it exists: http://archives.postgresql.org/pgsql-hackers/2004-07/msg00392.php I remember that thread, but I think there was no patch at all, at least I didn't see it. Sergey *** Sergey E. Koposov Max Planck Institute for Astronomy/Cambridge Institute for Astronomy/Sternberg Astronomical Institute Tel: +49-6221-528-349 Web: http://lnfm1.sai.msu.ru/~math E-mail: [EMAIL PROTECTED] ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Compilation errors
Andrew Dunstan [EMAIL PROTECTED] writes: Actually, we couldn't.The definition section from the .l file gets included after these functions. So we'd need to include something in gram.y before including scan.c. Actually, since we don't use any of those functions, the following advice from the 2.5.33 flex manual seems useful: A number of options are available for lint purists who want to suppress the appearance of unneeded routines in the generated scanner. Each of the following, if unset (e.g., `%option nounput'), results in the corresponding routine not appearing in the generated scanner: input, unput yy_push_state, yy_pop_state, yy_top_state yy_scan_buffer, yy_scan_bytes, yy_scan_string yyget_extra, yyset_extra, yyget_leng, yyget_text, yyget_lineno, yyset_lineno, yyget_in, yyset_in, yyget_out, yyset_out, yyget_lval, yyset_lval, yyget_lloc, yyset_lloc, yyget_debug, yyset_debug (though `yy_push_state()' and friends won't appear anyway unless you use `%option stack)'. However, I'm not sure whether 2.5.4 will choke on unrecognized %option commands --- if it has to be conditional on version it'll be a PITA. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] COMMIT NOWAIT Performance Option
On Tue, 2007-02-27 at 09:32 -0800, Josh Berkus wrote: Simon, I think if you address me in a mail, it would be best not to explicitly *remove* my name from the address list. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] COMMIT NOWAIT Performance Option
Jonah H. Harris wrote: On 2/28/07, Bruce Momjian [EMAIL PROTECTED] wrote: Am am not comfortable starting and having something fail later. Then do you have some other idea for protecting pages from being torn without storing an entire backup copy or performing a block-level consistency check? I want the startup to fail. -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] COMMIT NOWAIT Performance Option
Jonah H. Harris [EMAIL PROTECTED] writes: On 2/28/07, Gregory Stark [EMAIL PROTECTED] wrote: Jonah H. Harris [EMAIL PROTECTED] writes: Which is, of course, how everyone else does it. I happen to agree with your conclusion but this line of argument is exceptionally unconvincing. In fact in this crowd you'll tend to turn people off and lose people if you say things like that rather than convince anyone of anything. Rather than reinventing the wheel, it often pays to piggyback on the solutions others in similar situations have encountered. I'm just stating how others provide similar functionality or capabilities. If someone dislikes an idea just because the major vendors have done it that way, that's their own problem. It's up to the community to decide how to proceed given the information at hand. Except that's not what you're doing. There's nothing wrong with saying foo does this clever thing I think we should copy because insert argument. Nor even foo does this thing, would that help us? But what you seem to be saying is *Because* foo does this thing we can conclude it's a clever thing and we should do it. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate