Re: [PATCHES] Maintaining cluster order on insert
On 5/27/07, Jim C. Nasby <[EMAIL PROTECTED]> wrote:
> On Mon, May 21, 2007 at 10:48:59AM +0100, Heikki Linnakangas wrote:
> > IOW it's working as designed. But maybe it's not the desired behavior.
> > Should we have a special case and always respect the fillfactor when
> > inserting to the last page of the heap? I think that would follow the
> > principle of least surprise.
>
> What's the status of this patch? Are we waiting for an update?

AFAIU, it's not fair to say that the patch maintains cluster order... it just tries to keep similar rows on the same page if possible (which is not the same thing)... if it can't, then it simply inserts at the last page as usual, but we have wasted time in the attempt... so the real question is whether there is any performance win from this... do you have any numbers?

Another question: if the fillfactor is 100%, then it is a complete waste of time to look for a suggested block. Maybe we could check for that?

-- 
regards,
Jaime Casanova

"Programming today is a race between software engineers striving to build bigger and better idiot-proof programs and the universe trying to produce bigger and better idiots. So far, the universe is winning."
Richard Cook

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
Re: [PATCHES] Load Distributed Checkpoints, revised patch
Heikki Linnakangas wrote:
> Alvaro Herrera wrote:
> >     if (BgWriterShmem->ckpt_time_warn && elapsed_secs < CheckPointWarning)
> >         ereport(LOG,
> >                 (errmsg("checkpoints are occurring too frequently (%d seconds apart)",
> >                         elapsed_secs),
> >                  errhint("Consider increasing the configuration parameter \"checkpoint_segments\".")));
> >     BgWriterShmem->ckpt_time_warn = false;
>
> In the extremely unlikely event that RequestCheckpoint sets
> ckpt_time_warn right before it's cleared, after the test in the
> if-statement, the warning is missed.

I think this code should look like this:

	if (BgWriterShmem->ckpt_time_warn)
	{
		BgWriterShmem->ckpt_time_warn = false;
		if (elapsed_secs < CheckPointWarning)
			ereport(LOG,
					(errmsg("checkpoints are occurring too frequently (%d seconds apart)",
							elapsed_secs),
					 errhint("Consider increasing the configuration parameter \"checkpoint_segments\".")));
	}

That way seems safer. (I am assuming that a process other than the bgwriter is able to set the ckpt_time_warn bit; otherwise there is no point.) This pattern is also used in pmsignal.c. Of course, as you say, this case is probably very harmless, but in the other case it is important to get it right.

-- 
Alvaro Herrera                http://www.PlanetPostgreSQL.org/
"Hackers share the surgeon's secret pleasure in poking about in gross innards, the teenager's secret pleasure in popping zits." (Paul Graham)
Re: [PATCHES] Load Distributed Checkpoints, revised patch
Alvaro Herrera wrote:
> Heikki Linnakangas wrote:
> > - The signaling between RequestCheckpoint and bgwriter is a bit tricky.
> > Bgwriter now needs to deal with immediate checkpoint requests, like
> > those coming from explicit CHECKPOINT or CREATE DATABASE commands,
> > differently from those triggered by checkpoint_segments. I'm afraid
> > there might be race conditions when a CHECKPOINT is issued at the same
> > instant as checkpoint_segments triggers one. What might happen then is
> > that the checkpoint is performed lazily, spreading the writes, and the
> > CHECKPOINT command has to wait for that to finish, which might take a
> > long time. I have not been able to convince myself either that the
> > race condition exists or that it doesn't.
>
> Isn't it just a matter of having a flag to tell whether the checkpoint
> should be quick or spread out, and having a command set the flag if a
> checkpoint is already running?

Hmm. Thinking about this some more, the core problem is that when starting the checkpoint, bgwriter needs to read and clear the flag, which is not atomic as the patch stands.

I think we already have a race condition with ckpt_time_warn. The code to test and clear the flag does this:

	if (BgWriterShmem->ckpt_time_warn && elapsed_secs < CheckPointWarning)
		ereport(LOG,
				(errmsg("checkpoints are occurring too frequently (%d seconds apart)",
						elapsed_secs),
				 errhint("Consider increasing the configuration parameter \"checkpoint_segments\".")));
	BgWriterShmem->ckpt_time_warn = false;

In the extremely unlikely event that RequestCheckpoint sets ckpt_time_warn right before it's cleared, after the test in the if-statement, the warning is missed. That's a very harmless and theoretical event: you'd have to run CHECKPOINT (or another command that triggers a checkpoint) at the same instant that an xlog switch triggers one, and all that happens is that you don't get the message in the log when you should.

So this is not something to worry about in this case, but it would be more severe if we had the same problem in deciding whether a checkpoint should be spread out or not. I think we just have to protect those signaling flags with a lock. It's not like it's on a critical path, and though we don't know what locks the callers of RequestCheckpoint hold, as long as we don't acquire any other locks while holding the new proposed lock, there's no danger of deadlocks.

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [PATCHES] Load Distributed Checkpoints, revised patch
Michael Paesold wrote:
> Heikki Linnakangas wrote:
> > Here's an updated WIP version of the LDC patch. It just spreads the
> > writes, which achieves the goal of smoothing the checkpoint I/O
> > spikes. I think sorting the writes etc. is interesting but falls into
> > the category of further development and should be pushed to 8.4.
>
> Why do you think so? Is it too much risk to adopt the sorted writes?
> The numbers shown by ITAGAKI Takahiro looked quite impressive, at least
> for large shared_buffers configurations. The reactions were rather
> positive, too.

Well, it is a very recent idea, and it's not clear that it's a win under all circumstances. Adopting it would need more testing, and at this late stage I'd like to just wrap up what we have and come back to this idea for 8.4. If someone performs the tests with different hardware and workloads, I would be willing to consider it; the patch is still at an early stage, but it's a very isolated change and should therefore be easy to review. But if someone has the hardware and time to perform those tests, I'd like them to do more testing with just LDC first. As Josh pointed out, it would be good to test it with more oddball workloads; all the tests I've done this far have been with DBT-2.

> In general, I am hoping that this patch, together with "Automatic
> adjustment of bgwriter_lru_maxpages", will finally make default
> postgresql configurations experience much less impact from checkpoints.
> For my taste, postgresql has recently got way too many knobs which one
> must tweak by hand... I welcome any approach to auto-tuning (and auto
> vacuum!).

Sure, but that's another topic.

> Patch status says "waiting on update from author":
> http://archives.postgresql.org/pgsql-patches/2007-04/msg00331.php
> Any updates on this?

No. I'm not actually clear what we're waiting for and from whom; I know I haven't had the time to review that in detail yet. IIRC we've seen two very similar patches in the discussions, one from Itagaki-san and one from Greg Smith. I'm not sure which one of them we should use; they both implement roughly the same thing. But the biggest thing needed for that patch is testing with different workloads.

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [PATCHES] Load Distributed Checkpoints, revised patch
Heikki Linnakangas wrote:
> - The signaling between RequestCheckpoint and bgwriter is a bit tricky.
> Bgwriter now needs to deal with immediate checkpoint requests, like those
> coming from explicit CHECKPOINT or CREATE DATABASE commands, differently
> from those triggered by checkpoint_segments. I'm afraid there might be
> race conditions when a CHECKPOINT is issued at the same instant as
> checkpoint_segments triggers one. What might happen then is that the
> checkpoint is performed lazily, spreading the writes, and the CHECKPOINT
> command has to wait for that to finish, which might take a long time. I
> have not been able to convince myself either that the race condition
> exists or that it doesn't.

Isn't it just a matter of having a flag to tell whether the checkpoint should be quick or spread out, and having a command set the flag if a checkpoint is already running?

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [PATCHES] WIP: script binaries renaming
Zdenek Kotala <[EMAIL PROTECTED]> writes:
> I think this patch has no (or small) impact on functionality and it
> should be committed to 8.3

This missed the feature freeze deadline by well over two months. It is not a candidate for 8.3.

			regards, tom lane
Re: [PATCHES] WIP: script binaries renaming
Zdenek Kotala wrote:
> I think this patch has no (or small) impact on functionality and it
> should be committed to 8.3

That's really not the test we apply. We are in feature freeze, which means the only things not on the queue that should get in are bug fixes. If we don't stick to that, we'll never get a release out.

cheers

andrew
[PATCHES] WIP: script binaries renaming
I attach a patch which renames the following binaries:

  createdb createlang createuser dropdb droplang dropuser clusterdb vacuumdb reindexdb

to

  pg_createdb pg_createlang pg_createuser pg_dropdb pg_droplang pg_dropuser pg_clusterdb pg_vacuumdb pg_reindexdb

Symlinks (or copies on win32) are created for backward compatibility.

This renaming was discussed here:
http://archives.postgresql.org/pgsql-hackers/2007-06/msg00145.php

I have not yet implemented the WARNING message when a non-pg_* command is used, and I'm going to modify the documentation. Is there any place other than http://www.postgresql.org/docs/8.2/interactive/reference-client.html?

I think this patch has no (or only a small) impact on functionality and it should be committed to 8.3.

Please let me know any comments.

Index: src/bin/scripts/Makefile
===================================================================
RCS file: /projects/cvsroot/pgsql/src/bin/scripts/Makefile,v
retrieving revision 1.37
diff -c -r1.37 Makefile
*** src/bin/scripts/Makefile	5 Jan 2007 22:19:50 -	1.37
--- src/bin/scripts/Makefile	15 Jun 2007 13:51:20 -
***************
*** 14,37 ****
  top_builddir = ../../..
  include $(top_builddir)/src/Makefile.global
  
! PROGRAMS = createdb createlang createuser dropdb droplang dropuser clusterdb vacuumdb reindexdb
  
  override CPPFLAGS := -DFRONTEND -I$(top_srcdir)/src/bin/pg_dump -I$(top_srcdir)/src/bin/psql -I$(libpq_srcdir) $(CPPFLAGS)
  
  all: submake-libpq submake-backend $(PROGRAMS)
  
! %: %.o $(WIN32RES)
  	$(CC) $(CFLAGS) $^ $(libpq_pgport) $(LDFLAGS) $(LIBS) -o [EMAIL PROTECTED](X)
  
! createdb: createdb.o common.o dumputils.o $(top_builddir)/src/backend/parser/keywords.o
! createlang: createlang.o common.o print.o mbprint.o
! createuser: createuser.o common.o dumputils.o $(top_builddir)/src/backend/parser/keywords.o
! dropdb: dropdb.o common.o dumputils.o $(top_builddir)/src/backend/parser/keywords.o
! droplang: droplang.o common.o print.o mbprint.o
! dropuser: dropuser.o common.o dumputils.o $(top_builddir)/src/backend/parser/keywords.o
! clusterdb: clusterdb.o common.o dumputils.o $(top_builddir)/src/backend/parser/keywords.o
! vacuumdb: vacuumdb.o common.o
! reindexdb: reindexdb.o common.o dumputils.o $(top_builddir)/src/backend/parser/keywords.o
  
  dumputils.c: % : $(top_srcdir)/src/bin/pg_dump/%
  	rm -f $@ && $(LN_S) $< .
--- 14,37 ----
  top_builddir = ../../..
  include $(top_builddir)/src/Makefile.global
  
! PROGRAMS = pg_createdb pg_createlang pg_createuser pg_dropdb pg_droplang pg_dropuser pg_clusterdb pg_vacuumdb pg_reindexdb
  
  override CPPFLAGS := -DFRONTEND -I$(top_srcdir)/src/bin/pg_dump -I$(top_srcdir)/src/bin/psql -I$(libpq_srcdir) $(CPPFLAGS)
  
  all: submake-libpq submake-backend $(PROGRAMS)
  
! pg_%: %.o $(WIN32RES)
  	$(CC) $(CFLAGS) $^ $(libpq_pgport) $(LDFLAGS) $(LIBS) -o [EMAIL PROTECTED](X)
  
! pg_createdb: createdb.o common.o dumputils.o $(top_builddir)/src/backend/parser/keywords.o
! pg_createlang: createlang.o common.o print.o mbprint.o
! pg_createuser: createuser.o common.o dumputils.o $(top_builddir)/src/backend/parser/keywords.o
! pg_dropdb: dropdb.o common.o dumputils.o $(top_builddir)/src/backend/parser/keywords.o
! pg_droplang: droplang.o common.o print.o mbprint.o
! pg_dropuser: dropuser.o common.o dumputils.o $(top_builddir)/src/backend/parser/keywords.o
! pg_clusterdb: clusterdb.o common.o dumputils.o $(top_builddir)/src/backend/parser/keywords.o
! pg_vacuumdb: vacuumdb.o common.o
! pg_reindexdb: reindexdb.o common.o dumputils.o $(top_builddir)/src/backend/parser/keywords.o
  
  dumputils.c: % : $(top_srcdir)/src/bin/pg_dump/%
  	rm -f $@ && $(LN_S) $< .
***************
*** 45,59 ****
  install: all installdirs
! 	$(INSTALL_PROGRAM) createdb$(X) '$(DESTDIR)$(bindir)'/createdb$(X)
! 	$(INSTALL_PROGRAM) dropdb$(X) '$(DESTDIR)$(bindir)'/dropdb$(X)
! 	$(INSTALL_PROGRAM) createlang$(X) '$(DESTDIR)$(bindir)'/createlang$(X)
! 	$(INSTALL_PROGRAM) droplang$(X) '$(DESTDIR)$(bindir)'/droplang$(X)
! 	$(INSTALL_PROGRAM) createuser$(X) '$(DESTDIR)$(bindir)'/createuser$(X)
! 	$(INSTALL_PROGRAM) dropuser$(X) '$(DESTDIR)$(bindir)'/dropuser$(X)
! 	$(INSTALL_PROGRAM) clusterdb$(X) '$(DESTDIR)$(bindir)'/clusterdb$(X)
! 	$(INSTALL_PROGRAM) vacuumdb$(X) '$(DESTDIR)$(bindir)'/vacuumdb$(X)
! 	$(INSTALL_PROGRAM) reindexdb$(X) '$(DESTDIR)$(bindir)'/reindexdb$(X)
  
  installdirs:
  	$(mkinstalldirs) '$(DESTDIR)$(bindir)'
--- 45,90 ----
  install: all installdirs
! 	$(INSTALL_PROGRAM) pg_createdb$(X) '$(DESTDIR)$(bindir)'/pg_createdb$(X)
! 	$(INSTALL_PROGRAM) pg_dropdb$(X) '$(DESTDIR)$(bindir)'/pg_dropdb$(X)
! 	$(INSTALL_PROGRAM) pg_createlang$(X) '$(DESTDIR)$(bindir)'/pg_createlang$(X)
! 	$(INSTALL_PROGRAM) pg_droplang$(X) '$(DESTDIR)$(bindir)'/pg_droplang$(X)
! 	$(INSTALL_PROGRAM) pg_createuser$(X) '$(DESTDIR)$(bindir)'/pg_createuser$(X)
! 	$(INSTALL_PROGRAM) pg_dropuser$(X) '$(DESTDIR)$(bindir)'/pg_dropuser$(X)
! 	$(INSTALL_PROGRAM) pg_clusterdb$(X) '$(DESTDIR)
Re: [PATCHES] Load Distributed Checkpoints, revised patch
Heikki Linnakangas wrote:
> Here's an updated WIP version of the LDC patch. It just spreads the
> writes, which achieves the goal of smoothing the checkpoint I/O spikes.
> I think sorting the writes etc. is interesting but falls into the
> category of further development and should be pushed to 8.4.

Why do you think so? Is it too much risk to adopt the sorted writes? The numbers shown by ITAGAKI Takahiro looked quite impressive, at least for large shared_buffers configurations. The reactions were rather positive, too.

In general, I am hoping that this patch, together with "Automatic adjustment of bgwriter_lru_maxpages", will finally make default postgresql configurations experience much less impact from checkpoints. For my taste, postgresql has recently got way too many knobs which one must tweak by hand... I welcome any approach to auto-tuning (and auto vacuum!).

Patch status says "waiting on update from author":
http://archives.postgresql.org/pgsql-patches/2007-04/msg00331.php
Any updates on this?

Best Regards
Michael Paesold
[PATCHES] Load Distributed Checkpoints, revised patch
Here's an updated WIP version of the LDC patch. It just spreads the writes, which achieves the goal of smoothing the checkpoint I/O spikes. I think sorting the writes etc. is interesting but falls into the category of further development and should be pushed to 8.4.

The documentation changes are not complete, GUC variables need descriptions, and some of the DEBUG elogs will go away in favor of the separate checkpoint logging patch that's in the queue.

I'm fairly happy with the code now, but there are a few minor open issues:

- What units should we use for the new GUC variables? From an implementation point of view, it would be simplest if checkpoint_write_rate were given in pages per bgwriter_delay, similarly to bgwriter_*_maxpages. I never liked those *_maxpages settings, though; a more natural unit from the user's perspective would be KB/s.

- The signaling between RequestCheckpoint and bgwriter is a bit tricky. Bgwriter now needs to deal with immediate checkpoint requests, like those coming from explicit CHECKPOINT or CREATE DATABASE commands, differently from those triggered by checkpoint_segments. I'm afraid there might be race conditions when a CHECKPOINT is issued at the same instant as checkpoint_segments triggers one. What might happen then is that the checkpoint is performed lazily, spreading the writes, and the CHECKPOINT command has to wait for that to finish, which might take a long time. I have not been able to convince myself either that the race condition exists or that it doesn't.

A few notes about the implementation:

- In the bgwriter loop, CheckArchiveTimeout always calls time(NULL), while previously it used the value returned by another call earlier in the same codepath. That means we now call time(NULL) twice instead of once per bgwriter iteration when archive_timeout is set. That doesn't seem significant to me, so I didn't try to optimize it.

- Because of a small change in the meaning of the force_checkpoint flag in the bgwriter loop, checkpoints triggered by reaching checkpoint_segments call CreateCheckPoint(false, false) instead of CreateCheckPoint(false, true). The second argument is the "force" flag. If it's false, CreateCheckPoint skips the checkpoint if there's been no WAL activity since the last checkpoint. It doesn't matter in this case: there surely has been WAL activity if we reach checkpoint_segments, and doing the check isn't that expensive.

- To coordinate the writes with checkpoint_segments, we need to read the WAL insertion location. To do that, we need to acquire the WALInsertLock. That means that in the worst case, WALInsertLock is acquired every bgwriter_delay while a checkpoint is in progress. I don't think that's a problem, since it's only held for a very short duration, but I thought I'd mention it.

- How should we deal with changing GUC variables that affect LDC on the fly, when a checkpoint is in progress? The attached patch finishes the in-progress checkpoint ASAP and reloads the config after that. We could reload the config immediately, but making the new settings effective immediately is not trivial.

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com

Index: doc/src/sgml/config.sgml
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/doc/src/sgml/config.sgml,v
retrieving revision 1.126
diff -c -r1.126 config.sgml
*** doc/src/sgml/config.sgml	7 Jun 2007 19:19:56 -	1.126
--- doc/src/sgml/config.sgml	15 Jun 2007 09:35:32 -
***************
*** 1565,1570 ****
--- 1565,1586 ----
+ 
+      checkpoint_write_percent (floating point)
+ 
+        checkpoint_write_percent configuration parameter
+ 
+        To spread works in checkpoints, each checkpoint spends the specified
+        time and delays to write out all dirty buffers in the shared buffer
+        pool. The default value is 50.0 (50% of checkpoint_timeout).
+        This parameter can only be set in the postgresql.conf
+        file or on the server command line.
+ 
+ 
+      checkpoint_warning (integer)

Index: src/backend/access/transam/xlog.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/transam/xlog.c,v
retrieving revision 1.272
diff -c -r1.272 xlog.c
*** src/backend/access/transam/xlog.c	31 May 2007 15:13:01 -	1.272
--- src/backend/access/transam/xlog.c	15 Jun 2007 08:14:18 -
***************
*** 398,404 ****
  static void exitArchiveRecovery(TimeLineID endTLI, uint32 endLogId, uint32 endLogSeg);
  static bool recoveryStopsHere(XLogRecord *record, bool *includeThis);
! static void CheckPointGuts(XLogRecPtr checkPointRedo);
  static bool XLogCheckBuffer(XLogRecData *rdata, bool doPageWrites,
  			XLogRecPtr *lsn, BkpBlock *bkpb);
--- 398,404 ----
  static void exitArchiveRecovery(TimeLineID endTLI,