Re: [HACKERS] Scalable postgresql using sys_epoll
On Wed, 10 Mar 2004, Shachar Shemesh wrote:

>> IBM has rewritten their Domino database system to use the new sys_epoll call available in the Linux 2.6 kernel. Would Postgresql benefit from using this API? Is anyone looking at this?
>
> I'm not familiar enough with the postgres internals, but is using libevent (http://monkey.org/~provos/libevent/) an option? It uses a state-triggered, rather than edge-triggered, interface, and it automatically selects the best API for the job (epoll, poll, select). I'm not sure whether it's available for all the platforms postgres is available for.

libevent is cool, but postgres uses a process-per-client model, so the number of file descriptors of active interest to a backend at any given time is low.

Matthew.

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [HACKERS] Named arguments in function calls
On Mon, 26 Jan 2004, Tom Lane wrote:

> If that was IS, then foo(x is 13) makes sense.
>
> I like that syntax. For example
>
>     select interest(amount is 500.00, rate is 1.3)
>
> is very readable, yet brief. On second thought though, it doesn't work.
>
>     select func(x is null);
>
> is ambiguous, especially if func() accepts boolean.

You're unlikely to care, but Oracle's syntax is Perlish:

    select interest(amount => 500.0, rate => 1.3);

That'd be ambiguous again, though. Perhaps:

    select interest(amount := 500.0, rate := 1.3);

?

Matthew.
Re: [HACKERS] Preventing stack-overflow crashes (improving on
On Wed, 31 Dec 2003, Tom Lane wrote:

>> Is ABS enough on a 64-bit architecture ?
>
> That was pseudocode, I wasn't actually planning to rely on a function. Something more like
>
>     long diff;

FWIW, ISO C has a ptrdiff_t, which may be useful here.

Matthew.

>     diff = stack_base_ptr - stack_top_loc;
>     if (diff < 0) diff = -diff;
>     if (diff > max) elog ...
>
> regards, tom lane
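For what it's worth, here is a minimal sketch of the quoted pseudocode with ptrdiff_t in place of long. The function name stack_depth_exceeded and the max_depth parameter are made up for illustration and are not from any patch. Strictly speaking, ISO C only defines subtraction of pointers into the same object, so a real check would be relying on the platform's flat address space.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 when the distance between the recorded
 * stack base and the current stack location exceeds max_depth bytes,
 * and 0 otherwise.
 */
int
stack_depth_exceeded(const char *stack_base_ptr,
                     const char *stack_top_loc,
                     ptrdiff_t max_depth)
{
    ptrdiff_t diff;

    assert(max_depth >= 0);

    /* ptrdiff_t is the signed type the standard provides for this result */
    diff = stack_base_ptr - stack_top_loc;
    if (diff < 0)
        diff = -diff;       /* the stack may grow in either direction */

    return diff > max_depth;
}
```

The caller would record stack_base_ptr once at startup and pass the address of a local variable as stack_top_loc.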
Re: [HACKERS] Pre-allocation of shared memory ...
On Sat, 14 Jun 2003, Andrew Dunstan wrote:

> The trouble with this advice is that if I am an SA wanting to run a DBMS server, I will want to run a kernel supplied by a vendor, not an arbitrary kernel released by a developer, even one as respected as Alan Cox.

Like, say, Red Hat:

$ ls -l /proc/sys/vm/overcommit_memory
-rw-r--r--    1 root     root            0 Jun 14 18:58 /proc/sys/vm/overcommit_memory
$ uname -a
Linux stinky.hoopy.net 2.4.20-20.1.1995.2.2.nptl #1 Fri May 23 12:18:31 EDT 2003 i686 i686 i386 GNU/Linux

(This is a Rawhide kernel, but I think that control has been in stock RH kernels for some time now.)

Matthew.
Re: [HACKERS] Pre-allocation of shared memory ...
On Sat, 14 Jun 2003, Kurt Roeckx wrote:

>> $ ls -l /proc/sys/vm/overcommit_memory
>> -rw-r--r--    1 root     root            0 Jun 14 18:58 /proc/sys/vm/overcommit_memory
>> $ uname -a
>> Linux stinky.hoopy.net 2.4.20-20.1.1995.2.2.nptl #1 Fri May 23 12:18:31 EDT 2003 i686 i686 i386 GNU/Linux
>
> I also got that /proc/sys/vm/overcommit_memory on a plain 2.4.21.

This might also be interesting:

http://www.cs.helsinki.fi/linux/linux-kernel/2002-33/0826.html

I couldn't say how much of it is in the stock RH kernels, or how successful the heuristic is.

Matthew.
Re: [HACKERS] Suggestion; WITH VACUUM option
On Tue, 17 Dec 2002, mlw wrote:

> update largetable set foo=bar;
>
> Let's also assume that largetable has tens of millions of rows.
[..]
> On some of my databases a statement which updates all the rows is unworkable in PostgreSQL, on Oracle, however, there is no problem.

.. provided you have a lot of rollback space, which is essentially what the datafile growth here is providing.

Matthew.
Re: [HACKERS] HEADS UP: Win32/OS2/BeOS native ports
On Mon, 6 May 2002, Tom Lane wrote:

>> As a backend is started up, connect to that socket ... if socket is open when trying to start a new frontend, fail as there are currently other connections attached to it?
>
> But the backends would only have the socket open, they'd not be actively listening to it. So how could you tell whether anyone had the socket open or not?

It's easy. At startup, the postmaster (or standalone backend) creates a Unix socket, binds it to the filename and calls listen on it. If another backend is running, it'll get EADDRINUSE from the bind or listen.

Nobody actually needs to connect to the socket. Simple, race-free, 10 lines of code.

Matthew.
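A rough sketch of the bind-and-listen interlock described above, using plain POSIX calls. The function name acquire_socket_lock and the path argument are invented for illustration; this is not the postmaster's actual code. One caveat worth keeping in mind: bind() on a Unix socket also reports EADDRINUSE when a stale socket file has been left behind by a process that died, so this sketch alone cannot tell a live holder from a dead one.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a Unix socket, bind it to 'path' and listen
 * on it. Returns the listening descriptor, or -1 with errno set; errno is
 * EADDRINUSE when the name is already taken.
 */
int
acquire_socket_lock(const char *path)
{
    struct sockaddr_un addr;
    int         fd;
    int         saved_errno;

    assert(path != NULL);

    if (strlen(path) >= sizeof(addr.sun_path))
    {
        errno = ENAMETOOLONG;
        return -1;
    }

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    memcpy(addr.sun_path, path, strlen(path) + 1);

    /* Nobody ever has to connect; holding the bound socket is the lock. */
    if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0 ||
        listen(fd, 1) < 0)
    {
        saved_errno = errno;
        close(fd);
        errno = saved_errno;
        return -1;
    }

    return fd;
}
```

The holder keeps the descriptor open for its lifetime and unlinks the path on clean shutdown.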
Re: [HACKERS] HEADS UP: Win32/OS2/BeOS native ports
On Tue, 7 May 2002, Tom Lane wrote:

>> Nobody actually needs to connect to the socket. Simple, race-free, 10 lines of code.
>
> ... and we already do it. But it protects the port number, not the data directory.

If I understood him correctly, Marc was suggesting a further domain socket inside the data directory.

Matthew.
Re: [HACKERS] HEADS UP: Win32/OS2/BeOS native ports
On Fri, 3 May 2002, Tom Lane wrote:

> But what we must *not* do is allow a new postmaster to start while the old backends are still running; that would mean two sets of backends running without contact with each other, which would be fatal for data integrity. The SysV API lets us detect that case, but I don't see any equally good way to do it if we are using anonymous shared memory.

It's a hack (and has slight security implications), but you could just allow the postgres backends to keep the listening socket(s) open.

Matthew.
Re: [HACKERS] Bitmap indexes?
On Tue, 19 Mar 2002, Oleg Bartunov wrote:

Sorry to reply over you, Oleg.

> On 13 Mar 2002, Greg Copeland wrote:
>
>> One of the reasons why I originally started following the hackers list is because I wanted to implement bitmap indexes. I found in the archives, the following link, http://www.it.iitb.ernet.in/~rvijay/dbms/proj/, which was extracted from this, http://groups.google.com/groups?hl=enthreadm=01C0EF67.5105D2E0.mascarm%40mascari.comrnum=1prev=/groups%3Fq%3Dbitmap%2Bindex%2Bgroup:comp.databases.postgresql.hackers%26hl%3Den%26selm%3D01C0EF67.5105D2E0.mascarm%2540mascari.com%26rnum%3D1, archive thread.

For every case I have used a bitmap index on Oracle, a partial index[0] made more sense (especially since it could usefully be compound).

Our troublesome case (on Oracle) is a table of events where maybe fifty to a couple of hundred are published (i.e. web-visible) at any time. The events are categorised by sport (about a dozen) and by event type (about five). We never really query events except by PK or by sport/type/published.

We make a bitmap index on published, and trust Oracle to use it correctly, and hope that our other indexes are also useful.

On Postgres[1] we would make a partial compound index:

create index ... on events(sport_id,event_type_id) where published='Y';

Matthew.

[0] Is this a postgres-only feature? My tame Oracle and Sybase DBAs had never heard of such a thing, but were rather impressed at the idea.

[1] Disclaimer. Our system doesn't run on PG, though I do have a nearly equivalent prototype system which does. I'd love to hear any success (or otherwise) stories about PG partial indexes.
Re: [HACKERS] Survey results on Oracle/M$NT4 to PG72/RH72 migration
On Thu, 14 Mar 2002, Jean-Paul ARGUDO wrote:

> This daemon wakes up every 5 seconds. It scans (SELECT...) for new inserts in a table (like a trigger). When new tuples are found, it launches the work. The work consists in computing total sales of a big store...

You might find it worthwhile to investigate listen and notify -- combined with a rule or trigger, you can get this effect in near-real-time.

You'll probably still want a sleep(5) at the end of the loop so you can batch a reasonable number of updates if there's a lot going on.

Matthew.
Re: [HACKERS] anoncvs and CVS link off developers.postgresql.org
On Sat, 6 Oct 2001, Larry Rosenman wrote:

> If I try:
>
> cvs -d :pserver:[EMAIL PROTECTED]:/cvsroot login
>
> I get a time out

Me too. I can't reach www.postgresql.org either. It doesn't obviously seem to be a routing problem.

Matthew.
Re: [HACKERS] Notes about int8 sequences
On Mon, 6 Aug 2001, Tom Lane wrote:

> * How should one invoke nextval() and friends on such a sequence?
>
> Perhaps we could allow people to write nextval(sequencename) and/or sequencename.nextval, which would expose the sequence object to the parser so that datatype overloading could occur.

I'm not worried about the size of the return type of a sequence, but I like the idea of Oracle-compatible seq.nextval syntax.

Matthew.
Re: [HACKERS] Notes about int8 sequences
On Tue, 7 Aug 2001, Tom Lane wrote:

>> I'm not worried about the size of the return type of a sequence, but I like the idea of Oracle-compatible seq.nextval syntax.
>
> I didn't realize we had any Oracle-compatibility issues here. What exactly does Oracle's sequence facility look like?

It's exactly seqname.nextval. It seems that it can be used in exactly the places where PG allows nextval(seqname) (subject to the usual sprinkling of "from dual"s, of course).

Matthew.
Re: [HACKERS] Performance TODO items
On Mon, 30 Jul 2001, Bruce Momjian wrote:

> * Improve spinlock code, perhaps with OS semaphores, sleeper queue, or spinning to obtain lock on multi-cpu systems

You may be interested in a discussion which happened over on linux-kernel a few months ago.

Quite a lot of people want a lightweight userspace semaphore, and for pretty much the same reasons.

Linus proposed a pretty interesting solution which has the same minimal overhead as the current spinlocks in the non-contention case, but avoids the spin where there's contention:

http://www.mail-archive.com/linux-kernel%40vger.kernel.org/msg39615.html

Matthew.
Re: [HACKERS] Re: New Linux xfs/reiser file systems
On Thu, 3 May 2001, mlw wrote:

> I would bet it is a huge amount of work to use a table space system and no one wants that.

From some stracing of 7.1, the most common syscall issued by postgres is an lseek() to the end of the file, presumably to find its length, which seems to happen up to about a dozen times per (pgbench) transaction.

Tablespaces would solve this (not that lseek is a particularly expensive operation, of course).

> Perhaps we can convince the Linux community to create a dbfs which is a stripped down simple no nonsense file system designed for applications like databases?

Sync-metadata ext2 should be fine. Filesystems fsck pretty quickly when they contain only a few large files.

Otherwise, something like smugfs (now obsolete) might do.

Matthew.
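For anyone who hasn't seen the idiom, this is roughly what an lseek()-to-end length probe looks like. It is only a sketch: file_length and file_length_in_blocks are made-up names, the block size is a parameter chosen for the example, and none of this is copied from the backend.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: length of an open file in bytes, found by seeking
 * to its end. Note that this moves the descriptor's file offset.
 */
off_t
file_length(int fd)
{
    return lseek(fd, 0, SEEK_END);
}

/*
 * Hypothetical helper: number of whole blocks in the file at 'path',
 * or -1 on error.
 */
long
file_length_in_blocks(const char *path, long block_size)
{
    int         fd;
    off_t       len;

    assert(block_size > 0);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    len = file_length(fd);
    close(fd);
    if (len < 0)
        return -1;

    return (long) (len / block_size);
}
```

Each probe is one cheap syscall, which fits the observation that it shows up often in strace without being a real cost.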
[HACKERS] Archived redo logs / Managed recovery mode?
Hi,

Firstly, the attached patch implements archiving of off-line redo logs, via the wal_archive_dir GUC option. It builds and appears to work (though it looks like guc-file.l has some problems with unquoted strings containing slashes).

TODO: handle EXDEV from link/rename, and copy rather than renaming.

Clearly this isn't a lot of use at the moment, but what I'd really like would be a way to implement what our (Oracle) DBA calls "managed recovery". Essentially, the standby database is opened in read-only mode (since PG seems to lack this, having it not open at all should suffice :) and archived redo logs are copied over from the live database (we do it via rsync, every 5 minutes) and rolled forward.

(Note: for what it's worth, we're using this because Oracle's Advanced Replication is too unstable.)

Is there an easy way to do this? I suppose that while there isn't a readonly option, it might be best done with an external tool, not unlike resetxlog.

What are the plans for replication in 7.2 (assuming that is what's next)? The rserv stuff looks neat, but rather intricate. A cheap, out-of-band replication system would make me very happy.

Matthew.

Index: src/backend/access/transam/xlog.c
===================================================================
RCS file: /home/projects/pgsql/cvsroot/pgsql/src/backend/access/transam/xlog.c,v
retrieving revision 1.65
diff -u -r1.65 xlog.c
--- src/backend/access/transam/xlog.c 2001/04/05 16:55:21 1.65
+++ src/backend/access/transam/xlog.c 2001/04/27 14:49:44
@@ -97,7 +97,7 @@
 int XLOG_DEBUG = 0;
 char *XLOG_sync_method = NULL;
 const char XLOG_sync_method_default[] = DEFAULT_SYNC_METHOD_STR;
-char XLOG_archive_dir[MAXPGPATH]; /* null string means
+char *XLOG_archive_dir = NULL; /* null string means
  * delete 'em */
 
 /* these are derived from XLOG_sync_method by assign_xlog_sync_method */
@@ -1476,9 +1476,7 @@
     DIR *xldir;
     struct dirent *xlde;
     char lastoff[32];
-    char path[MAXPGPATH];
-
-    Assert(XLOG_archive_dir[0] == 0); /* ! implemented yet */
+    char path[MAXPGPATH], arcpath[MAXPGPATH];
 
     xldir = opendir(XLogDir);
     if (xldir == NULL)
@@ -1493,11 +1491,25 @@
             strspn(xlde->d_name, "0123456789ABCDEF") == 16 &&
             strcmp(xlde->d_name, lastoff) <= 0)
         {
-            elog(LOG, "MoveOfflineLogs: %s %s", (XLOG_archive_dir[0]) ?
+            elog(LOG, "MoveOfflineLogs: %s %s", XLOG_archive_dir ?
                  "archive" : "remove", xlde->d_name);
             sprintf(path, "%s%c%s", XLogDir, SEP_CHAR, xlde->d_name);
-            if (XLOG_archive_dir[0] == 0)
+            if (XLOG_archive_dir == NULL)
                 unlink(path);
+            else {
+                sprintf(arcpath, "%s%c%s", XLOG_archive_dir, SEP_CHAR, xlde->d_name);
+#ifndef __BEOS__
+                if (link(path, arcpath) < 0)
+                    elog(STOP, "MoveOfflineLogs: %s => %s failed: %m",
+                         path, arcpath);
+                else
+                    unlink(path);
+#else
+                if (rename(path, arcpath) < 0)
+                    elog(STOP, "MoveOfflineLogs: %s => %s failed: %m",
+                         path, arcpath);
+#endif
+            }
         }
         errno = 0;
     }
Index: src/backend/utils/misc/guc.c
===================================================================
RCS file: /home/projects/pgsql/cvsroot/pgsql/src/backend/utils/misc/guc.c,v
retrieving revision 1.35
diff -u -r1.35 guc.c
--- src/backend/utils/misc/guc.c 2001/03/22 17:41:47 1.35
+++ src/backend/utils/misc/guc.c 2001/04/27 14:49:48
@@ -13,6 +13,9 @@
 
 #include "postgres.h"
 
+#include <sys/types.h>
+#include <sys/stat.h>
+
 #include <errno.h>
 #include <float.h>
 #include <limits.h>
@@ -41,6 +44,8 @@
 extern int CommitSiblings;
 extern bool FixBTree;
 
+static bool check_dirname(const char *dirname);
+
 #ifdef ENABLE_SYSLOG
 extern char *Syslog_facility;
 extern char *Syslog_ident;
@@ -351,6 +356,9 @@
     XLOG_sync_method_default, check_xlog_sync_method,
     assign_xlog_sync_method},
 
+    {"wal_archive_dir", PGC_SUSET, &XLOG_archive_dir,
+    "", check_dirname, NULL},
+
     {NULL, 0, NULL, NULL, NULL, NULL}
 };
 
@@ -869,6 +877,17 @@
             *cp = '_';
 }
+
+static bool
Re: [HACKERS] RE: xlog checkpoint depends on sync() ... seems unsafe
On Tue, 13 Mar 2001, Tom Lane wrote:

>> I was told the same a long ago about FreeBSD. How much can we count on this undocumented sync() feature?
>
> Sounds quite unreliable to me. Unless there's some interlock ... like, say, the second sync not being able to advance past a buffer page that's as yet unwritten by the first sync. But would all Unixen share such a strange detail of implementation?

The Linux manpage says:

NAME
    sync - commit buffer cache to disk.
[..]
DESCRIPTION
    sync first commits inodes to buffers, and then buffers to disk.
[..]
CONFORMING TO
    SVr4, SVID, X/OPEN, BSD 4.3
BUGS
    According to the standard specification (e.g., SVID), sync() schedules the writes, but may return before the actual writing is done. However, since version 1.3.20 Linux does actually wait. (This still does not guarantee data integrity: modern disks have large caches.)

And it's still true. On a fast system, if you do:

$ cp /dev/zero /tmp & sleep 1; sync

the sync will often never finish. (Of course, that's just an implementation detail really.)

Matthew.
Re: [HACKERS] WAL SHM principles
On Tue, 13 Mar 2001, Ken Hirsch wrote:

>> mlock() guarantees that the locked address space is in memory. This doesn't imply that updates are not written to the backing file.
>
> I've wondered about this myself. It _is_ true on Linux that mlock prevents writes to the backing store,

I don't believe that this is true. The manpage offers no such promises, and the semantics are not useful.

> and this is used as a security feature for cryptography software.

mlock() is used to prevent pages being swapped out. Its use for crypto software is essentially restricted to anon memory (allocated via brk() or mmap() of /dev/zero).

If my understanding is accurate, before 2.4 Linux would never swap out pages which had a backing store. It would simply write them back or drop them (if clean). (This is why you need around twice as much swap with 2.4.)

> The code for gnupg assumes that if you have mlock() on any operating system, it does mean this--which doesn't mean it's true, but perhaps whoever wrote it does have good reason to think so.

strace on gpg startup says:

mmap(0, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40015000
getuid() = 500
mlock(0x40015000) = -1 EPERM (Operation not permitted)

so whatever the authors think, it does not require this semantic.

Matthew.
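To make the anon-memory point concrete, here is a small sketch of the pattern the strace shows: map an anonymous region, try to mlock() it, and carry on if locking is refused. The names alloc_secret_pool and free_secret_pool are invented for this example and this is not gnupg's code; it only assumes the standard mmap(), mlock(), munlock() and munmap() calls.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON
#endif

/*
 * Hypothetical helper: allocate 'len' bytes of anonymous memory and try
 * to lock it in RAM. '*locked' is set to 1 if mlock() succeeded and 0 if
 * it failed (for example with EPERM); the memory is usable either way.
 * Returns NULL only if the mapping itself fails.
 */
void *
alloc_secret_pool(size_t len, int *locked)
{
    void       *pool;

    assert(len > 0 && locked != NULL);

    *locked = 0;
    pool = mmap(NULL, len, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (pool == MAP_FAILED)
        return NULL;

    /* Anonymous pages have no backing file; locking only keeps them out of swap. */
    *locked = (mlock(pool, len) == 0);
    return pool;
}

/* Hypothetical helper: undo alloc_secret_pool(). */
void
free_secret_pool(void *pool, size_t len, int locked)
{
    if (locked)
        munlock(pool, len);
    munmap(pool, len);
}
```

Since there is no file behind the mapping, the question of writes to a backing store never comes up for this use.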
Re: [HACKERS] WAL SHM principles
On Tue, 13 Mar 2001, Alfred Perlstein wrote:

[..]
> Linux does not filesystem-sync file-backed writable mmap pages on a regular basis.

Very interesting. I'm not sure that is necessarily the case in 2.4, though -- my understanding is that the new all-singing, all-dancing page cache makes very little distinction between mapped and unmapped dirty pages.

> Basically any mmap'd data doesn't seem to get sync()'d out on a regular basis.

Hmm.. I'd call that a bug, anyway.

>>> and this is used as a security feature for cryptography software.
>>
>> mlock() is used to prevent pages being swapped out. Its use for crypto software is essentially restricted to anon memory (allocated via brk() or mmap() of /dev/zero).
>
> What about userland device drivers that want to send parts of a disk backed file to a driver's dma routine?

And realtime software. I'm not disputing that mlock is useful, but what it can do for security software is not that huge. The Linux manpage says:

    Memory locking has two main applications: real-time algorithms and high-security data processing.

Matthew.
[HACKERS] Multi-process pgbench?
Hi, Did I read allegations here a while ago that someone had a multi-process version of pgbench? I've poked around the website and mail archives, but couldn't find it. I have access to a couple of 4-CPU boxes, and reckon that a single-process benching tool could well prove a bottleneck. Matthew.
[HACKERS] Re: mmap for zeroing WAL log
On Tue, 27 Feb 2001, Tom Lane wrote:

> Matthew Kirkwood [EMAIL PROTECTED] writes:
>> I had assumed that the overhead would come from synchronous metadata incurring writes of at least the inode, block bitmap and probably an indirect block for each syscall.
>
> No Unix that I've ever heard of forces metadata to disk after each "write" call; anyone who tried it would have abysmal performance. That's what fsync and the syncer daemon are for.

My understanding was that that's exactly what ffs' synchronous metadata writes do.

Am I missing something here? Do they just schedule I/O, but return without waiting for its completion?

Matthew.
[HACKERS] Re: mmap for zeroing WAL log
On Sat, 24 Feb 2001, Tom Lane wrote:

>>> I am confused why mmap() is better than writing to a real file.
>>
>> It isn't, except that it allows to initialise the logfile in one syscall, without first allocating and zeroing (and hence dirtying) 16Mb of memory.
>
> Uh, the existing code does not zero 16Mb of memory... it zeroes 8K and then writes that block repeatedly.

See the "one syscall" bit above.

> It's possible that the overhead of a syscall for each 8K block is significant,

I had assumed that the overhead would come from synchronous metadata incurring writes of at least the inode, block bitmap and probably an indirect block for each syscall.

> but on the other hand writing a block at a time is a heavily used and heavily optimized path in all Unixen. It's at least as plausible that the mmap-as-source-of-zeroes path will be slower!

Results:

On Linux/ext2, it appears good for a gain of 3-5% for log creations (via a fairly minimal test program).

On FreeBSD 4.1-RELEASE/ffs (with all of sync/async/softupdates) it is a couple of percent worse in elapsed time, but consumes around a third more system CPU time (12sec vs 9sec on one test system).

I am awaiting numbers from reiserfs but, for now, it looks like I am far from vindicated.

Matthew.
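For reference, the block-at-a-time approach under discussion (zero one 8K buffer, write it repeatedly, then fsync) can be sketched as below. This is an illustration under assumptions, not the xlog.c code: create_zeroed_file and ZERO_BLOCK_SIZE are names invented here, and the caller picks the total size (16Mb in the case above).

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define ZERO_BLOCK_SIZE 8192    /* hypothetical name; 8K as in the thread */

/*
 * Hypothetical helper: create 'path' and fill it with 'total_bytes' of
 * zeroes, one ZERO_BLOCK_SIZE write at a time, then fsync it.
 * Returns 0 on success, -1 on failure (the partial file is removed).
 */
int
create_zeroed_file(const char *path, size_t total_bytes)
{
    char        zbuffer[ZERO_BLOCK_SIZE];
    size_t      written = 0;
    int         fd;

    assert(total_bytes % ZERO_BLOCK_SIZE == 0);

    fd = open(path, O_RDWR | O_CREAT | O_EXCL, S_IRUSR | S_IWUSR);
    if (fd < 0)
        return -1;

    /* Only one 8K buffer is ever dirtied in user space. */
    memset(zbuffer, 0, sizeof(zbuffer));

    while (written < total_bytes)
    {
        ssize_t     n = write(fd, zbuffer, sizeof(zbuffer));

        if (n != (ssize_t) sizeof(zbuffer))
        {
            close(fd);
            unlink(path);
            return -1;
        }
        written += (size_t) n;
    }

    if (fsync(fd) != 0)
    {
        close(fd);
        unlink(path);
        return -1;
    }

    return close(fd);
}
```

A 16Mb file costs 2048 write() calls this way, which is the per-syscall overhead being weighed against the single-syscall mmap route.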
Re: [HACKERS] WAL and commit_delay
On Sun, 18 Feb 2001, Tom Lane wrote:

> I think that there may be a performance advantage to pre-filling the logfile even so, assuming that file allocation info is stored in a Berkeley/McKusick-like fashion (note: I have no idea what ext2 or reiserfs actually do).

ext2 is a lot like [UF]FS. reiserfs is very different, but does have similar hole semantics.

BTW, I have attached two patches which streamline log initialisation a little. The first (xlog-sendfile.diff) adds support for Linux's sendfile system call. FreeBSD and HP/UX have sendfile() too, but the prototype is different. If it's interesting, someone will have to come up with a configure test, as autoconf scares me.

The second removes a further three syscalls from the log init path. There are a couple of things to note here:

* I don't know why link/unlink is currently preferred over rename. POSIX offers strong guarantees on the semantics of the latter.
* I have assumed that the close/rename/reopen stuff is only there for the benefit of Windows users, and ifdeffed it for everyone else.

Matthew.

--- xlog.c.old Mon Feb 19 12:35:53 2001
+++ xlog.c Mon Feb 19 13:05:23 2001
@@ -24,6 +24,10 @@
 #include <locale.h>
 #endif
 
+#ifdef _HAVE_LINUX_SENDFILE
+#include <sys/sendfile.h>
+#endif
+
 #include "access/transam.h"
 #include "access/xact.h"
 #include "catalog/catversion.h"
@@ -962,6 +966,24 @@
         elog(STOP, "InitCreate(logfile %u seg %u) failed: %m",
              logId, logSeg);
 
+#ifdef _HAVE_LINUX_SENDFILE
+    {
+        static int zfd = -1;
+        ssize_t len;
+
+        if (zfd < 0) {
+            zfd = BasicOpenFile("/dev/zero", O_RDONLY, 0);
+            if (zfd < 0)
+                elog(STOP, "Can't open /dev/zero: %m");
+        }
+        len = sendfile(fd, zfd, NULL, XLogSegSize);
+        if (len < 0)
+            /* XXX - header support sendfile, but kernel doesn't? Fall back */
+            elog(STOP, "sendfile failed: %m");
+        if (len < XLogSegSize)
+            elog(STOP, "short read on sendfile: %m");
+    }
+#else
     if (lseek(fd, XLogSegSize - 1, SEEK_SET) != (off_t) (XLogSegSize - 1))
         elog(STOP, "lseek(logfile %u seg %u) failed: %m",
              logId, logSeg);
@@ -969,6 +991,7 @@
     if (write(fd, "", 1) != 1)
         elog(STOP, "write(logfile %u seg %u) failed: %m",
              logId, logSeg);
+#endif
 
     if (pg_fsync(fd) != 0)
         elog(STOP, "fsync(logfile %u seg %u) failed: %m",

--- xlog.c.sf Mon Feb 19 13:10:38 2001
+++ xlog.c Mon Feb 19 13:13:55 2001
@@ -1001,22 +1001,20 @@
         elog(STOP, "lseek(logfile %u seg %u off %u) failed: %m",
              log, seg, 0);
 
+#ifndef WIN32
     close(fd);
+#endif
 
-#ifndef __BEOS__
-    if (link(tpath, path) < 0)
-#else
     if (rename(tpath, path) < 0)
-#endif
         elog(STOP, "InitRelink(logfile %u seg %u) failed: %m",
              logId, logSeg);
 
-    unlink(tpath);
-
+#ifndef WIN32
     fd = BasicOpenFile(path, O_RDWR | PG_BINARY, S_IRUSR | S_IWUSR);
     if (fd < 0)
         elog(STOP, "InitReopen(logfile %u seg %u) failed: %m",
              logId, logSeg);
+#endif
 
     return (fd);
 }
Re: [HACKERS] WAL and commit_delay
On Mon, 19 Feb 2001, Matthew Kirkwood wrote:

> BTW, I have attached two patches which streamline log initialisation a little. The first (xlog-sendfile.diff) adds support for Linux's sendfile system call.

Whoops, don't use this. It looks like Linux won't sendfile() from /dev/zero. I'll endeavour to get this fixed, but it looks like it'll be rather harder to use sendfile for this.

Bah.

Matthew.
[HACKERS] beta4 RPM bug
Hi,

There seems to be a teeny-tiny bug in the beta4 RPMS. /etc/rc.d/init.d/postgresql contains:

# PGVERSION is:
PGVERSION=7.1beta3

Matthew.
Re: [HACKERS] Linux 2.2 vs 2.4
On Sat, 17 Feb 2001, Tom Lane wrote:

> the default -B is way too small for WAL.

OK, here are some 2.4 numbers with 1K transactions/client and -B10240.

> Huh? With the exception of the 16-user case (possibly measurement noise), 2.4 looks better across the board, AFAICS. But see below.

OK. Rough methodology:

# service postgresql stop
# rpm -e postgresql-server
# rm -fr /var/lib/pgsql
# service postgresql start
# reboot
# sysctl -w kernel.shmmax=186048768
pg$ createuser matthew
pg$ createdb matthew
me$ ./pgbench -i -s5 -t$T -c$N

Does this look fairly immune to troubles?

>> Secondly, in both occasions after a run, performance has been more than 20% lower.
>
> I find that pgbench's reported performance can vary quite a bit from run to run, at least with smaller values of total transactions. I think this is because it's a bit of a crapshoot how many WAL logfile initializations occur during the run and get charged against the total time. Not to mention whatever else the machine might be doing. With longer runs (say at least 1 total transactions) the numbers should stabilize. I wouldn't put any faith at all in tests involving less than about 1000 total transactions...

Ah, good point. Here are some with 2.4.2pre2 and 1000 transactions. I'll try to find time tomorrow to do some batch benching with 10K transactions on various kernels.

I hear allegations that the 2.4.1 disk elevator and VM are subject to investigation so I'll try to keep some up-to-date numbers if anyone is interested.

Matthew.

--
Numbers:

2.4.2-pre2 (-B10240):

pgbench -s5 -i: 1:13:02 elapsed

pgbench -s5 -t1000
 1: 40.06 / 40.10 TPS
 2: 53.01 / 53.08
 4: 57.14 / 57.23
 8: 62.82 / 62.92
16: 62.46 / 62.56
32: 43.15 / 43.20
 1: 23.48 / 26.05
 1: 30.85 / 30.88

pgbench -v -s5 -t1000
 1: 26.37 / 26.39
[HACKERS] Linux 2.2 vs 2.4
Hi,

Not sure if anyone will find this of interest, but I ran pgbench on my main Linux box to see what sort of performance difference might be visible between 2.2 and 2.4 kernels.

Hardware: A dual P3-450 with 384Mb of RAM and 3 SCSI disks. The pg datafiles live in a half-gig partition on the first one.

Software: Red Hat 6.1 plus all sorts of bits and pieces. PostgreSQL 7.1beta4 RPMs. pgbench hand-compiled from source for same. No options changed from defaults. (I'll look at that tomorrow -- is there anything worth changing other than commit_delay and fsync?)

Kernels: 2.2.15 + software RAID patches, 2.4.2-pre2

With 2.2.15:

pgbench -s5 -i: 1.27.78 elapsed

pgbench -s5 -t100:
clients: TPS / TPS (excluding connection establishment)
 1: 39.66 / 40.08 TPS
 2: 60.77 / 61.64 TPS
 4: 76.15 / 77.42
 8: 90.99 / 92.73
16: 71.10 / 72.15
32: 49.20 / 49.70
 1: 27.76 / 28.00
 1: 27.82 / 28.03

pgbench -v -s5 -t100:
 1: 30.73 / 30.98

And with 2.4.2-pre2:

pgbench -s5 -i: 1:17.46 elapsed

pgbench -s5 -t100
 1: 43.57 / 44.11 TPS
 2: 62.85 / 63.86 TPS
 4: 87.24 / 89.08 TPS
 8: 86.60 / 88.38 TPS
16: 53.22 / 53.88 TPS
32: 60.28 / 61.10 TPS
 1: 35.93 / 36.33
 1: 34.82 / 35.18

pgbench -v -s5 -t100:
 1: 35.70 / 36.01

Overall, two things jump out at me.

Firstly, it looks like 2.4 is mixed news for heavy pgbench users :) Low-utilisation numbers are better, but the sweet spot seems lower and narrower.

Secondly, on both occasions after a run, performance has been more than 20% lower. Restarting or performing a full vacuum does not seem to help. Is there some sort of fragmentation issue here?

Matthew.
Re: [HACKERS] SSL Connections
On Wed, 20 Dec 2000, Oliver Elphick wrote:

> To create a quick self-signed certificate, use the CA.pl script included in OpenSSL:
>
> CA.pl -newcert

Or you can do it manually:

openssl req -new -text -out cert.req
    (you will have to enter a password)
mv privkey.pem cert.pem.pw
openssl rsa -in cert.pem.pw -out cert.pem
    (this removes the password)
openssl req -x509 -in cert.req -text -key cert.pem -out cert.cert

Matthew.