> tom lane wrote:
> > Hm. A single lock that must be grabbed for operations anywhere in
> > the index is a concurrency bottleneck.
I replied a bit later:
> I don't think it would be that bad. It's not a lock but a
> mutex that would only be held for a relatively brief duration.
> It looks like
tom lane wrote:
> Hmmm ... that might be made to work, but it would complicate
> inserts. By the time an insert navigates to the page it
> should insert on, it might find the page is dead, and then it
> would have no easy way to get to the replacement page (unless
> you want to dedicate another
I previously wrote:
> 5) A mutex/spinlock that was stored in the index could be
> acquired by the scan code like this:
>
> buf = ReadBuffer(rel, blkno);          /* pin next page */
>
> SpinLockAcquire(indexSpecificMutex);   /* lock the index reorg mutex */
tom lane wrote:
> How does that help? The left-moving indexscan still has no
> way to recover. It can't go back to the page it was on
> before and try to determine which entries you added there,
> because it has no reliable reference point to do so. The
> entry it previously returned might n
tom lane wrote:
> "Curtis Faith" <[EMAIL PROTECTED]> writes:
> > I don't dispute their conclusions in that context and under the
> > circumstances they outline of random distribution of deletion and
> > insertion values for the index keys. [But the r
tom lane wrote:
> Got any evidence to back that up? I was relying on
>
> [Johnson89] Johnson, T. and Shasha, D. Utilization of
> B-trees with Inserts, Deletes and Modifies ACM Symp. on
> PODS, 235-246, 1989.
>
> which provides a ton of math and simulations leading up to
> the conclusion tha
tom lane initially wrote:
> Restructuring the tree during page deletion
> ---
>
> We will delete only completely-empty pages. If we were to
> merge nearly-empty pages by moving data items from one page
> to an adjacent one, this would imply changing the pa
Christopher Browne wrote:
>
> >> From the MySQL site's page about MySQL vs PostgreSQL:
> >>http://www.mysql.com/doc/en/MySQL-PostgreSQL_features.html
> >>
> >>"MySQL Server works better on Windows than PostgreSQL does. MySQL
> >>Server runs as a native Windows application (a service on
> >>NT/2
tom lane wrote:
>
> In all honesty, I do not *want* Windows people to think that
> they're not running on the "poor stepchild" platform.
We should distinguish between "poor stepchild" status from a client-support
perspective and from a production-environment perspective.
What is the downside to supporting d
> - Original Message -
Bruce Momjian replied
> > Are there no already-written converters from Makefile to VC project
> > files?
The only ones I've seen convert from Unix make files to Windows NMAKE
make files. This does not really do everything you want for several
reasons:
1) Building w
Curtis Faith wrote:
> > The Visual C++ Workspaces and Projects files are actually
> > text files that have a defined format. I don't think the format is
> > published but it looks pretty easy to figure out.
Hannu Krosing wrote:
> will probably change between releases
Hannu Krosing asked:
> Does anyone know how MySQL and interbase/firebird do it ?
>
From the MySQL web site for version 4.0:
"The Windows binaries use the Cygwin library. Source code for the
version of Cygwin we have used is available on this page."
I think this offers a very big opportunity to
tom lane writes:
> You think we should drive away our existing unix developers
> in the mere hope of attracting windows developers? Sorry, it
> isn't going to happen.
Tom brings up a good point, that changes to support Windows should not
add to the tasks of those who are doing the bulk of the w
tom lane wrote:
> Sure, it's just shuffling the housekeeping work from one place to
> another. The thing that I like about Postgres' approach is that we
> put the housekeeping in a background task (VACUUM) rather than in the
> critical path of foreground transaction commit.
Thinking with my marke
> Alvaro Herrera <[EMAIL PROTECTED]> writes:
> > + Deletions are handled by getting a super-exclusive lock on the target
> >page, so that no other backend has a pin on the page when the deletion
> >starts. This means no scan is pointing at the page. This is OK for
> >deleting leaf it
I have been experimenting with empirical tests of file system and device
level writes to determine the actual constraints in order to speed up the WAL
logging code.
Using a raw file partition and a time-based technique for determining the
optimal write position, I am able to get 8K writes physical
Tatsuo, are you or anyone else working on adding PREPARE, EXECUTE support to
pgbench?
If not, I can do it myself and if you are interested, I'll send you the
patch.
- Curtis
---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
http
tom lane wrote:
> What can you do *without* using a raw partition?
>
> I dislike that idea for two reasons: portability and security. The
> portability disadvantages are obvious. And in ordinary system setups
> Postgres would have to run as root in order to write on a raw partition.
>
> It occurs
Bruce Momjian wrote:
> Let me add one more thing on this "thread". This is one email in a long
> list of "Oh, gee, you aren't using that wizz-bang new
> sync/thread/aio/raid/raw feature" discussion where someone shows up and
> wants to know why. Does anyone know how to address these, efficiently
> "Curtis Faith" <[EMAIL PROTECTED]> writes:
> > I'm not really worried about doing page-in reads because the
> disks internal
> > buffers should contain most of the blocks surrounding the end
> of the log
> > file. If the successive partial write
> "Curtis Faith" <[EMAIL PROTECTED]> writes:
> > Successive writes would write different NON-OVERLAPPING sections of the
> > same log buffer. It wouldn't make sense to send three separate
> copies of
> > the entire block. That could indeed cause pro
> > Since in your case all transactions A-E want the same buffer written,
> > the memory (not its content) will also be the same.
>
> But no, it won't: the successive writes will ask to write different
> snapshots of the same buffer.
Successive writes would write different NON-OVERLAPPING sectio
> Your example of >1 trx/proc/rev will work _only_ if no more and no less
> than 1/4 of platter is filled by _other_ log writers.
Not really, if 1/2 the platter has been filled we'll still get in one more
commit in for a given rotation. If more than a rotation's worth of writing
has occurred that m
> So you think if I try to write a 1 gig file, it will write enough to
> fill up the buffers, then wait while the sync'er writes out a few blocks
> every second, free up some buffers, then write some more?
>
> Take a look at vfs_bio::getnewbuf() on *BSD and you will see that when
> it can't get a
> Greg Copeland <[EMAIL PROTECTED]> writes:
> > Doesn't this also increase the likelihood that people will be
> > running in a buffer-poor environment more frequently that I
> > previously asserted, especially in very heavily I/O bound
> > systems? Unless I'm mistaken, that opens the door for a
>
> I may be missing something obvious, but I don't see a way to get more
> than 1 trx/process/revolution, as each previous transaction in that
> process must be written to disk before the next can start, and the only
> way it can be written to the disk is when the disk heads are on the
> right plac
> Well, too bad. If you haven't gotten your commit record down to disk,
> then *you have not committed*. This is not negotiable. (If you think
> it is, then turn off fsync and quit worrying ;-))
I've never disputed this, so if I seem to be suggesting that, I've been
unclear. I'm just assuming
> This is the trickle syncer. It prevents bursts of disk activity every
> 30 seconds. It is for non-fsync writes, of course, and I assume if the
> kernel buffers get low, it starts to flush faster.
AFAICT, the syncer only speeds up when virtual memory paging fills the
buffers past
a threshold a
Tom, first of all, excellent job improving the current algorithm. I'm glad
you looked at the WALCommitLock code.
> This must be so because the backends that are
> released at the end of any given disk revolution will not be able to
> participate in the next group commit, if there is already at leas
> On Sun, 2002-10-06 at 11:46, Tom Lane wrote:
> > I can't personally get excited about something that only helps if your
> > server is starved for RAM --- who runs servers that aren't fat on RAM
> > anymore? But give it a shot if you like. Perhaps your analysis is
> > pessimistic.
>
> I don't
> Curtis Faith wrote:
>
> > The current transaction/user state seems to be stored in process
> > global space. This could be changed to be a pointer to a struct
> > stored in a back-end specific shared memory area which would be
> > accessed by the executor process a
tom lane wrote:
> "Curtis Faith" <[EMAIL PROTECTED]> writes:
> > What about splitting out parsing, optimization and plan generation from
> > execution and having a separate pool of executor processes.
>
> > As an optimizer finished with a query plan it
> No question about that! The sooner we can get stuff to the WAL buffers,
> the more likely we will get some other transaction to do our fsync work.
> Any ideas on how we can do that?
More like the sooner we get stuff out of the WAL buffers and into the
disk's buffers whether by write or aio_wri
> So, you are saying that we may get back aio confirmation quicker than if
> we issued our own write/fsync because the OS was able to slip our flush
> to disk in as part of someone else's or a general fsync?
>
> I don't buy that because it is possible our write() gets in as part of
> someone else
>In particular, it would seriously degrade performance if the WAL file
> isn't on its own spindle but has to share bandwidth with
> data file access.
If the OS is stupid I could see this happening. But if there are
buffers and some sort of elevator algorithm the I/O won't happen
at bad times.
I
> You are confusing WALWriteLock with WALInsertLock. A
> transaction-committing flush operation only holds the former.
> XLogInsert only needs the latter --- at least as long as it
> doesn't need to write.
Well, that makes things better than I thought. We still end up with
a disk write for each tr
tom lane writes:
>The notion of a sort process pool seems possibly attractive. I'm
>unconvinced that it's going to be a win though because of the cost of
>shoving data across address-space boundaries.
What about splitting out parsing, optimization and plan generation from
execution and having
I've been getting only about 60% of the emails sent to the list.
I see many emails in the archives that I never got via email.
Is anyone else having this problem?
- Curtis
Bruce Momjian wrote:
> So every backend is to going to wait around until its fsync gets done by
> the backend process? How is that a win? This is just another version
> of our GUC parameters:
>
> #commit_delay = 0 # range 0-10, in microseconds
> #commit_sibli
It appears the fsync problem is pervasive. Here's Linux 2.4.19's
version from fs/buffer.c:
    down(&inode->i_sem);                      /* <-- the lock */
    ret = filemap_fdatasync(inode->i_mapping);
    err = file->f_op->fsync(file, dentry, 1);
    if (err && !ret)
        ret = err;
I resent this since it didn't seem to get to the list.
After some research I still hold that fsync blocks, at least on
FreeBSD. Am I missing something?
Here's the evidence:
Code from: /usr/src/sys/syscalls/vfs_syscalls
int
fsync(p, uap)
struct proc *p;
struct fsync_args /* {
I wrote:
> > ... most file systems can't process fsync's
> > simultaneous with other writes, so those writes block because the file
> > system grabs its own internal locks.
>
tom lane replies:
> Oh? That would be a serious problem, but I've never heard that asserted
> before. Please provide som
Bruce Momjian wrote:
> I am again confused. When we do write(), we don't have to lock
> anything, do we? (Multiple processes can write() to the same file just
> fine.) We do block the current process, but we have nothing else to do
> until we know it is written/fsync'ed. Does aio more easily
I wrote:
> > I'm no Unix filesystem expert but I don't see how the OS can
> > handle multiple writes and fsyncs to the same file descriptors without
> > blocking other processes from writing at the same time. It may be that
> > there are some clever data structures they use but I've not seen huge
I wrote:
> > The REAL issue and the one that will greatly affect total system
> > throughput is that of contention on the file locks. Since
> fsynch needs to
> > obtain a write lock on the file descriptor, as does the write
> calls which
> > originate from XLogWrite as the writes are written to th
The file system could
minimize this contention but I'll bet it's there with most of the ones
that PostgreSQL most commonly runs on.
I'll have to write a test and see if there really is a problem.
- Curtis
> -Original Message-
> From: Bruce Momjian [mailto:[EMAIL PROTECTED]
I wrote:
> > My modification was to use access counts to increase the
> durability of the
> > more accessed blocks.
>
tom lane replies:
> You could do it that way too, but I'm unsure whether the extra
> complexity will buy anything. Ultimately, I think an LRU-anything
> algorithm is equivalent
tom lane replies:
> "Curtis Faith" <[EMAIL PROTECTED]> writes:
> > So, why don't we use files opened with O_DSYNC | O_APPEND for
> the WAL log
> > and then use aio_write for all log writes?
>
> We already offer an O_DSYNC option. It's not
normal fdatasync/fsync, O_SYNC/O_DSYNC options
Allow multiple blocks to be written to WAL with one write()
Am I missing something?
Curtis Faith
Principal
Galt Capital, LLP
--
Galt Capital    http://www.galtca
tom lane wrote:
> But more globally, I think that our worst problems these days have to do
> with planner misestimations leading to bad plans. The planner is
> usually *capable* of generating a good plan, but all too often it picks
> the wrong one. We need work on improving the cost modeling equ
Forgot to cc' the list.
-Original Message-
From: Curtis Faith [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, October 02, 2002 10:59 PM
To: Tom Lane
Subject: RE: [HACKERS] Advice: Where could I be of help?
Tom,
Here are the things that I think look interesting:
1) Eliminate unch
especially slow?
I've read the TODO's, and the last five months of the archives for this
list, so I have some general ideas.
I've also had a lot of experience marketing to I.T. organizations so I'd be
happy to help out on the Product Marketing for PostgreSQL advocacy, i.e.
develop