Re: [HACKERS] Brain dump: btree collapsing

2003-02-18 Thread Curtis Faith
> tom lane wrote: > > Hm. A single lock that must be grabbed for operations anywhere in > > the index is a concurrency bottleneck. I replied a bit later: > I don't think it would be that bad. It's not a lock but a > mutex that would only be held for relatively brief duration. > It looks like

Re: [HACKERS] Brain dump: btree collapsing

2003-02-14 Thread Curtis Faith
tom lane wrote: > Hmmm ... that might be made to work, but it would complicate > inserts. By the time an insert navigates to the page it > should insert on, it might find the page is dead, and then it > would have no easy way to get to the replacement page (unless > you want to dedicate another

Re: [HACKERS] Brain dump: btree collapsing

2003-02-14 Thread Curtis Faith
I previously wrote: > 5) A mutex/spinlock that was stored in the index could be > acquired by the scan code like this: > > buf = ReadBuffer(rel, blkno); /* pin next page > */ > > SpinLockAcquire( indexSpecificMutex );/* lock the index > reorg mutex */ > >

Re: [HACKERS] Brain dump: btree collapsing

2003-02-14 Thread Curtis Faith
tom lane wrote: > How does that help? The left-moving indexscan still has no > way to recover. It can't go back to the page it was on > before and try to determine which entries you added there, > because it has no reliable reference point to do so. The > entry it previously returned might n

Re: [HACKERS] Brain dump: btree collapsing

2003-02-13 Thread Curtis Faith
tom lane wrote: > "Curtis Faith" <[EMAIL PROTECTED]> writes: > > I don't dispute their conclusions in that context and under the > > circumstances they outline of random distribution of deletion and > > insertion values for the index keys. [But the r

Re: [HACKERS] Brain dump: btree collapsing

2003-02-13 Thread Curtis Faith
tom lane wrote: > Got any evidence to back that up? I was relying on > > [Johnson89] Johnson, T. and Shasha, D. Utilization of > B-trees with Inserts, Deletes and Modifies ACM Symp. on > PODS, 235-246, 1989. > > which provides a ton of math and simulations leading up to > the conclusion tha

Re: [HACKERS] Brain dump: btree collapsing

2003-02-13 Thread Curtis Faith
tom lane initially wrote: > Restructuring the tree during page deletion > --- > > We will delete only completely-empty pages. If we were to > merge nearly-empty pages by moving data items from one page > to an adjacent one, this would imply changing the pa

Re: [mail] Re: [HACKERS] Windows Build System

2003-01-31 Thread Curtis Faith
Christopher Browne wrote: > > >> From the MySQL site's page about MySQL vs PostgreSQL: > >>http://www.mysql.com/doc/en/MySQL-PostgreSQL_features.html > >> > >>"MySQL Server works better on Windows than PostgreSQL does. MySQL > >>Server runs as a native Windows application (a service on > >>NT/2

Re: [mail] Re: [HACKERS] Windows Build System

2003-01-29 Thread Curtis Faith
tom lane wrote: > > In all honesty, I do not *want* Windows people to think that > they're not running on the "poor stepchild" platform. We should distinguish between "poor stepchild" from a client support perspective and a production environment perspective. What is the downside to supporting d

Re: [mail] Re: [HACKERS] Windows Build System

2003-01-28 Thread Curtis Faith
> - Original Message - Bruce Momjian replied > > Are there no already-written converters from Makefile to VC project > > files? The only ones I've seen convert from Unix make files to Windows NMAKE make files. This does not really do everything you want for several reasons: 1) Building w

Re: [HACKERS] Windows Build System

2003-01-23 Thread Curtis Faith
Curtis Faith wrote: > > The Visual C++ Workspaces and Projects files are actually > > text files that have a defined format. I don't think the format is > > published but it looks pretty easy to figure out. Hannu Krosing wrote: > will probably change between releases

Re: [HACKERS] Windows Build System

2003-01-22 Thread Curtis Faith
Hannu Krosing asked: > Does anyone know how MySQL and interbase/firebird do it? From the MySQL web site for version 4.0: "The Windows binaries use the Cygwin library. Source code for the version of Cygwin we have used is available on this page." I think this offers a very big opportunity to

Re: [HACKERS] Windows Build System

2003-01-22 Thread Curtis Faith
I (Curtis Faith) previously wrote: > > The Visual C++ Workspaces and Projects files are actually > > text files that have a defined format. I don't think the format is > > published but it looks pretty easy to figure out. Hannu Krosing replied: > will probably change

Re: Windows Build System was: [HACKERS] Win32 port patches submitted

2003-01-22 Thread Curtis Faith
tom lane writes: > You think we should drive away our existing unix developers > in the mere hope of attracting windows developers? Sorry, it > isn't going to happen. Tom brings up a good point, that changes to support Windows should not add to the tasks of those who are doing the bulk of the w

[PERFORM] [HACKERS] Realtime VACUUM, was: performance of insert/delete/update

2002-11-26 Thread Curtis Faith
tom lane wrote: > Sure, it's just shuffling the housekeeping work from one place to > another. The thing that I like about Postgres' approach is that we > put the housekeeping in a background task (VACUUM) rather than in the > critical path of foreground transaction commit. Thinking with my marke

Re: [HACKERS] btree shrinking again

2002-11-18 Thread Curtis Faith
> Alvaro Herrera <[EMAIL PROTECTED]> writes: > > + Deletions are handled by getting a super-exclusive lock on the target > >page, so that no other backend has a pin on the page when the deletion > >starts. This means no scan is pointing at the page. This is OK for > >deleting leaf it

[HACKERS] 500 tpsQL + WAL log implementation

2002-11-15 Thread Curtis Faith
I have been experimenting with empirical tests of file system and device level writes to determine the actual constraints in order to speed up the WAL logging code. Using a raw file partition and a time-based technique for determining the optimal write position, I am able to get 8K writes physical

[HACKERS] Prepare enabled pgbench

2002-11-12 Thread Curtis Faith
Tatsuo, are you or anyone else working on adding PREPARE, EXECUTE support to pgbench? If not, I can do it myself and if you are interested, I'll send you the patch. - Curtis

Re: [HACKERS] 500 tpsQL + WAL log implementation

2002-11-12 Thread Curtis Faith
tom lane wrote: > What can you do *without* using a raw partition? > > I dislike that idea for two reasons: portability and security. The > portability disadvantages are obvious. And in ordinary system setups > Postgres would have to run as root in order to write on a raw partition. > > It occurs

[HACKERS] 500 tpsQL + WAL log implementation

2002-11-11 Thread Curtis Faith
I have been experimenting with empirical tests of file system and device level writes to determine the actual constraints in order to speed up the WAL logging code. Using a raw file partition and a time-based technique for determining the optimal write position, I am able to get 8K writes physical

Re: [HACKERS] Postgresql and multithreading

2002-10-16 Thread Curtis Faith
Bruce Momjian wrote: > Let me add one more thing on this "thread". This is one email in a long > list of "Oh, gee, you aren't using that wizz-bang new > sync/thread/aio/raid/raw feature" discussion where someone shows up and > wants to know why. Does anyone know how to address these, efficiently

Re: [HACKERS] Analysis of ganged WAL writes

2002-10-08 Thread Curtis Faith
> "Curtis Faith" <[EMAIL PROTECTED]> writes: > > I'm not really worried about doing page-in reads because the disk's internal > > buffers should contain most of the blocks surrounding the end of the log > > file. If the successive partial write

Re: [HACKERS] Analysis of ganged WAL writes

2002-10-08 Thread Curtis Faith
> "Curtis Faith" <[EMAIL PROTECTED]> writes: > > Successive writes would write different NON-OVERLAPPING sections of the > > same log buffer. It wouldn't make sense to send three separate > copies of > > the entire block. That could indeed cause pro

Re: [HACKERS] Analysis of ganged WAL writes

2002-10-08 Thread Curtis Faith
> > Since in your case all transactions A-E want the same buffer written, > > the memory (not its content) will also be the same. > > But no, it won't: the successive writes will ask to write different > snapshots of the same buffer. Successive writes would write different NON-OVERLAPPING sectio

Re: [HACKERS] Analysis of ganged WAL writes

2002-10-08 Thread Curtis Faith
> Your example of >1 trx/proc/rev will work _only_ if no more and no less > than 1/4 of platter is filled by _other_ log writers. Not really, if 1/2 the platter has been filled we'll still get one more commit in for a given rotation. If more than a rotation's worth of writing has occurred that m

Re: [HACKERS] Dirty Buffer Writing [was Proposed LogWriter Scheme]

2002-10-08 Thread Curtis Faith
> So you think if I try to write a 1 gig file, it will write enough to > fill up the buffers, then wait while the sync'er writes out a few blocks > every second, free up some buffers, then write some more? > > Take a look at vfs_bio::getnewbuf() on *BSD and you will see that when > it can't get a

Re: [HACKERS] Dirty Buffer Writing [was Proposed LogWriter Scheme]

2002-10-07 Thread Curtis Faith
> Greg Copeland <[EMAIL PROTECTED]> writes: > > Doesn't this also increase the likelihood that people will be > > running in a buffer-poor environment more frequently than I > > previously asserted, especially in very heavily I/O bound > > systems? Unless I'm mistaken, that opens the door for a >

Re: [HACKERS] Analysis of ganged WAL writes

2002-10-07 Thread Curtis Faith
> I may be missing something obvious, but I don't see a way to get more > than 1 trx/process/revolution, as each previous transaction in that > process must be written to disk before the next can start, and the only > way it can be written to the disk is when the disk heads are on the > right plac
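
The one-commit-per-revolution ceiling described in the quoted text is plain arithmetic, and a small sketch may make it concrete. The function name `max_commits_per_sec` and the 7200 RPM figure mentioned afterwards are assumptions for illustration only; neither appears in the thread.

```c
#include <assert.h>

/*
 * Hypothetical helper (not PostgreSQL code): upper bound on commits per
 * second for a single backend when every commit has to wait for the log's
 * position on the platter to come around again.
 *
 *   rpm             - rotational speed of the log disk (assumed value)
 *   commits_per_rev - commits that reach disk per revolution; 1.0 models
 *                     the "1 trx/process/revolution" limit from the thread
 */
double max_commits_per_sec(double rpm, double commits_per_rev)
{
    assert(rpm > 0.0 && commits_per_rev > 0.0);

    /* revolutions per second, times commits completed in each revolution */
    return (rpm / 60.0) * commits_per_rev;
}
```

Under these assumptions a 7200 RPM disk completes 120 revolutions per second, so one commit per revolution caps a single backend at 120 commits per second; getting two commits into each revolution would double that.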

Re: [HACKERS] Analysis of ganged WAL writes

2002-10-07 Thread Curtis Faith
> Well, too bad. If you haven't gotten your commit record down to disk, > then *you have not committed*. This is not negotiable. (If you think > it is, then turn off fsync and quit worrying ;-)) I've never disputed this, so if I seem to be suggesting that, I've been unclear. I'm just assuming

Re: [HACKERS] Dirty Buffer Writing [was Proposed LogWriter Scheme]

2002-10-07 Thread Curtis Faith
> This is the trickle syncer. It prevents bursts of disk activity every > 30 seconds. It is for non-fsync writes, of course, and I assume if the > kernel buffers get low, it starts to flush faster. AFAICT, the syncer only speeds up when virtual memory paging fills the buffers past a threshold a

Re: [HACKERS] Analysis of ganged WAL writes

2002-10-07 Thread Curtis Faith
Tom, first of all, excellent job improving the current algorithm. I'm glad you looked at the WALCommitLock code. > This must be so because the backends that are > released at the end of any given disk revolution will not be able to > participate in the next group commit, if there is already at leas

[HACKERS] Dirty Buffer Writing [was Proposed LogWriter Scheme]

2002-10-07 Thread Curtis Faith
> On Sun, 2002-10-06 at 11:46, Tom Lane wrote: > > I can't personally get excited about something that only helps if your > > server is starved for RAM --- who runs servers that aren't fat on RAM > > anymore? But give it a shot if you like. Perhaps your analysis is > > pessimistic. > > I don't

Re: Parallel Executors [was RE: [HACKERS] Threaded Sorting]

2002-10-07 Thread Curtis Faith
> Curtis Faith wrote: > > > The current transaction/user state seems to be stored in process > > global space. This could be changed to be a pointer to a struct > > stored in a back-end specific shared memory area which would be > > accessed by the executor process a

Parallel Executors [was RE: [HACKERS] Threaded Sorting]

2002-10-06 Thread Curtis Faith
tom lane wrote: > "Curtis Faith" <[EMAIL PROTECTED]> writes: > > What about splitting out parsing, optimization and plan generation from > > execution and having a separate pool of executor processes. > > > As an optimizer finished with a query plan it

Re: [HACKERS] Proposed LogWriter Scheme, WAS: Potential Large Performance

2002-10-05 Thread Curtis Faith
> No question about that! The sooner we can get stuff to the WAL buffers, > the more likely we will get some other transaction to do our fsync work. > Any ideas on how we can do that? More like the sooner we get stuff out of the WAL buffers and into the disk's buffers whether by write or aio_wri

Re: [HACKERS] Proposed LogWriter Scheme, WAS: Potential Large Performance

2002-10-05 Thread Curtis Faith
> So, you are saying that we may get back aio confirmation quicker than if > we issued our own write/fsync because the OS was able to slip our flush > to disk in as part of someone else's or a general fsync? > > I don't buy that because it is possible our write() gets in as part of > someone else

Re: [HACKERS] Proposed LogWriter Scheme, WAS: Potential Large Performance

2002-10-05 Thread Curtis Faith
>In particular, it would seriously degrade performance if the WAL file > isn't on its own spindle but has to share bandwidth with > data file access. If the OS is stupid I could see this happening. But if there are buffers and some sort of elevator algorithm the I/O won't happen at bad times. I

Re: [HACKERS] Proposed LogWriter Scheme, WAS: Potential Large Performance Gain in WAL synching

2002-10-05 Thread Curtis Faith
> You are confusing WALWriteLock with WALInsertLock. A > transaction-committing flush operation only holds the former. > XLogInsert only needs the latter --- at least as long as it > doesn't need to write. Well, that makes things better than I thought. We still end up with a disk write for each tr

Re: [HACKERS] Threaded Sorting

2002-10-05 Thread Curtis Faith
tom lane writes: >The notion of a sort process pool seems possibly attractive. I'm >unconvinced that it's going to be a win though because of the cost of >shoving data across address-space boundaries. What about splitting out parsing, optimization and plan generation from execution and having

[HACKERS] Anyone else having list server problems?

2002-10-05 Thread Curtis Faith
I've been getting only about 60% of the emails sent to the list. I see many emails in the archives that I never got via email. Is anyone else having this problem? - Curtis

Re: [HACKERS] Proposed LogWriter Scheme, WAS: Potential Large PerformanceGain in WAL synching

2002-10-05 Thread Curtis Faith
Bruce Momjian wrote: > So every backend is going to wait around until its fsync gets done by > the backend process? How is that a win? This is just another version > of our GUC parameters: > > #commit_delay = 0 # range 0-10, in microseconds > #commit_sibli

[HACKERS] Proposed LogWriter Scheme, WAS: Potential Large Performance Gain in WAL synching

2002-10-04 Thread Curtis Faith
It appears the fsync problem is pervasive. Here's Linux 2.4.19's version from fs/buffer.c: lock-> down(&inode->i_sem); ret = filemap_fdatasync(inode->i_mapping); err = file->f_op->fsync(file, dentry, 1); if (err && !ret) ret = err;

Re: [HACKERS] Potential Large Performance Gain in WAL synching

2002-10-04 Thread Curtis Faith
I re-sent this since it didn't seem to get to the list. After some research I still hold that fsync blocks, at least on FreeBSD. Am I missing something? Here's the evidence: Code from: /usr/src/sys/syscalls/vfs_syscalls int fsync(p, uap) struct proc *p; struct fsync_args /* {

[HACKERS] fsync exclusive lock evidence WAS: Potential Large Performance Gain in WAL synching

2002-10-04 Thread Curtis Faith
After some research I still hold that fsync blocks, at least on FreeBSD. Am I missing something? Here's the evidence: Code from: /usr/src/sys/syscalls/vfs_syscalls int fsync(p, uap) struct proc *p; struct fsync_args /* { syscallarg(int) fd; } */ *uap; {

Re: [HACKERS] Potential Large Performance Gain in WAL synching

2002-10-04 Thread Curtis Faith
I wrote: > > ... most file systems can't process fsync's > > simultaneous with other writes, so those writes block because the file > > system grabs its own internal locks. > tom lane replies: > Oh? That would be a serious problem, but I've never heard that asserted > before. Please provide som

Re: [HACKERS] Potential Large Performance Gain in WAL synching

2002-10-04 Thread Curtis Faith
Bruce Momjian wrote: > I am again confused. When we do write(), we don't have to lock > anything, do we? (Multiple processes can write() to the same file just > fine.) We do block the current process, but we have nothing else to do > until we know it is written/fsync'ed. Does aio more easily

Re: [HACKERS] Potential Large Performance Gain in WAL synching

2002-10-04 Thread Curtis Faith
I wrote: > > I'm no Unix filesystem expert but I don't see how the OS can > > handle multiple writes and fsyncs to the same file descriptors without > > blocking other processes from writing at the same time. It may be that > > there are some clever data structures they use but I've not seen huge

Re: [HACKERS] Potential Large Performance Gain in WAL synching

2002-10-03 Thread Curtis Faith
I wrote: > > The REAL issue and the one that will greatly affect total system > > throughput is that of contention on the file locks. Since fsync needs to > > obtain a write lock on the file descriptor, as do the write calls which > > originate from XLogWrite as the writes are written to th

Re: [HACKERS] Potential Large Performance Gain in WAL synching

2002-10-03 Thread Curtis Faith
file system could minimize this contention but I'll bet it's there with most of the ones that PostgreSQL most commonly runs on. I'll have to write a test and see if there really is a problem. - Curtis > -Original Message- > From: Bruce Momjian [mailto:[EMAIL PROTECTED]

Re: [HACKERS] Advice: Where could I be of help?

2002-10-03 Thread Curtis Faith
I wrote: > > My modification was to use access counts to increase the > durability of the > > more accessed blocks. > tom lane replies: > You could do it that way too, but I'm unsure whether the extra > complexity will buy anything. Ultimately, I think an LRU-anything > algorithm is equivalent

Re: [HACKERS] Potential Large Performance Gain in WAL synching

2002-10-03 Thread Curtis Faith
tom lane replies: > "Curtis Faith" <[EMAIL PROTECTED]> writes: > > So, why don't we use files opened with O_DSYNC | O_APPEND for > the WAL log > > and then use aio_write for all log writes? > > We already offer an O_DSYNC option. It's not

[HACKERS] Potential Large Performance Gain in WAL synching

2002-10-03 Thread Curtis Faith
mal fdatasync/fsync, O_SYNC/O_DSYNC options Allow multiple blocks to be written to WAL with one write() Am I missing something? Curtis Faith Principal Galt Capital, LLP -- Galt Capitalhttp://www.galtca

Re: [HACKERS] Advice: Where could I be of help?

2002-10-03 Thread Curtis Faith
tom lane wrote: > But more globally, I think that our worst problems these days have to do > with planner misestimations leading to bad plans. The planner is > usually *capable* of generating a good plan, but all too often it picks > the wrong one. We need work on improving the cost modeling equ

FW: [HACKERS] Advice: Where could I be of help?

2002-10-02 Thread Curtis Faith
Forgot to cc the list. -Original Message- From: Curtis Faith [mailto:[EMAIL PROTECTED]] Sent: Wednesday, October 02, 2002 10:59 PM To: Tom Lane Subject: RE: [HACKERS] Advice: Where could I be of help? Tom, Here are the things that I think look interesting: 1) Eliminate unch

[HACKERS] Advice: Where could I be of help?

2002-10-02 Thread Curtis Faith
especially slow? I've read the TODO's, and the last five months of the archives for this list, so I have some general ideas. I've also had a lot of experience marketing to I.T. organizations so I'd be happy to help out on the Product Marketing for PostgreSQL advocacy, i.e. develop