Tom Lane wrote:
Not really: it only solves the problem *if you change the application*,
which is IMHO not acceptable. In particular, why should a non-threaded
app expect to have to change to deal with this issue? But we can't
safely build a thread-safe libpq.so for general use if it breaks
Bruce Momjian wrote:
Comments? This seems like our only solution.
This would be a transparent solution. Another approach would be:
- Use the old 7.3 approach by default. This means perfect backward
compatibility for single-threaded apps and broken multithreaded apps.
- Add a new PQinitDB(int
Tom Lane wrote:
Bruce Momjian [EMAIL PROTECTED] writes:
His idea of pthread_sigmask/send/sigpending/sigwait/restore-mask. Seems
we could also check errno for SIGPIPE rather than calling sigpending.
He has a concern about an application that already blocked SIGPIPE and
has a pending
[EMAIL PROTECTED] wrote a few months ago:
PostgreSQL's behavior on these cases is poor. I don't think anyone who has
tried to use PG for this sort of thing will disagree, and yes it is
getting better. Does anyone else consider this to be a problem? If so, I'm
open for suggestions on what can be
[EMAIL PROTECTED] wrote:
If the memset
bypasses the cache then the following access will cause a cache line
miss, which can be so slow that using the faster memset can result in a
net performance loss.
Could you suggest some structs to test? If I get your meaning, I would make a loop that
Marc Colosimo wrote:
Oops, I used the same setting as in the old hacking message (-O2, gcc
3.3). If I understand what you are saying, then it turns out yes, PG's
MemSet is faster for smaller blocksizes (see below, between 32 and
64). I just replaced the whole MemSet with memset and it is not
Josh Berkus wrote:
Gaetano,
I knew there was an evaluation on the futex vs spinlock,
and Josh Berkus on IRC told me that there was only a 20%
performance increase; is that increase something to throw away?
Before we get totally off track here
I evaluated futexes strictly as an attempt to solve
[EMAIL PROTECTED] wrote:
Something to think about:
if you run PostgreSQL with fsync on, but you use the hardware write cache
on your disk drives, how likely are you to lose data? Obviously, this is a
fairly limited problem, as it only applies to power down (which you can
control) or power loss
[EMAIL PROTECTED] wrote:
Tom Lane wrote
NOT LOGGED options on CREATE INDEX and COPY, to allow users to take
advantage of the no logging optimization without turning off PITR system
wide. (Just as this is possible in Oracle and Teradata).
Isn't this in direct conflict with your opinion
Gaetano Mendola wrote:
a1) If it exists, check that it is a 16MB file (the request can
arrive during the copy),
I think this will fail under windows: copy first sets the file size
and then transfers the data. I wouldn't rule out that some Unices use
the same implementation.
[EMAIL PROTECTED] wrote:
I have been considering a full sweep in my test lab off client time later on.
ext2, ext3, jfs, xfs, and ReiserFS, fsync on with fdatasync or open_sync,
and fsync off.
Before you start: double check that the disks are not lying:
At least the suse 2.4 kernel send cache
Andreas Pflug wrote:
Tom Lane wrote:
Do we have a TODO for allowing users to
force switching to a new WAL file segment?
Together with PITR, this might make sense?
Another idea:
Has anyone tried to put the WAL segment directory on a cluster
filesystem and use that for cold (perhaps even hot)
Tom Lane wrote:
[EMAIL PROTECTED] writes:
The improvements were REALLY astounding, and I would like to know if other
Linux users see this performance increase, I mean, it is almost 8~10 times
faster than using fsync.
Furthermore, it seems to also have the added benefit of reducing the I/O
storm
Christopher Browne wrote:
The fix for this problem is to rewrite all of your applications so
that they become conscious of which bits of memory they're using so
they can tune their own behaviour. This, of course, requires
discarding useful notions such as virtual memory that are _assumed_
by most
[EMAIL PROTECTED] wrote:
What is the recommended way to create mutex objects (CreateMutex) from
Win32 libraries? There must be a clean way like there is in pthreads.
A mutex is inherently a global object. CreateMutex(NULL, FALSE, NULL) will
return a handle to an unowned mutex.
That's not
Bruce Momjian wrote:
The only downside to removal is that folks without symlinks (I believe
Win32 only) will lose that functionality with nothing to replace it.
However, I think the clarity of removing it is worth it. Also, I think
someone had a special way to do symlinks on Win32 and we should
Gregory Stark wrote:
This patch also looks relevant to Postgres for two reasons.
This part seems like it might expose some bugs that otherwise might have
remained hidden:
This affects I/O scheduling potentially quite significantly. It is no
longer the case that the kernel
Diego Montenegro wrote:
Hello all,
Can anyone point me to where in the code does Postgres Flush all the
Data to disk???
When XLogFlush is called, it only flushes the XLOG to disk, right? Does
the entire Data get flushed at the same time as the Log?
in src/backend/storage/smgr/md.c, mdsync():
[EMAIL PROTECTED] wrote:
Compare file sync methods with one 8k write:
(o_dsync unavailable)
open o_sync, write 6.270724
write, fdatasync 13.275225
write, fsync, 13.359847
Odd. Which filesystem, which kernel? It seems fdatasync is broken and
Tom Lane wrote:
[EMAIL PROTECTED] writes:
I could certainly do some testing if you want to see how DBT-2 does.
Just tell me what to do. ;)
Just do some runs that are identical except for the wal_sync_method
setting. Note that this should not have any impact on SELECT
performance, only
Yusuf Goolamabbas wrote:
I sent this to Bruce but forgot to cc pgsql-hackers, The patches are
likely to go into 2.6.6. People interested in extremely safe fsync
writes should also follow the IDE barrier thread and the true fsync() in
Linux on IDE thread
Actually the most interesting part of
Marty Scholes wrote:
2. Put them on an actual (or mirrored actual) spindle
Pros:
* Keeps WAL and data file I/O separate
Cons:
* All of the non array drives are still slower than the array
Are you sure this is a problem? The dbt-2 benchmarks from osdl run on an
8-way Intel computer with several
Bruce Momjian wrote:
Your patch has been added to the PostgreSQL unapplied patches list at:
http://momjian.postgresql.org/cgi-bin/pgpatches
I will try to apply it within the next 48 hours.
You are too fast: the patch was a proof of concept, not really tested
(actually quite buggy).
Bruce Momjian wrote:
How can we test if libpq needs to call that? Seems that is an issue
whether we are threaded or not, no?
I think it's always an issue: in the non-threaded case, it's just not
fatal. At least some openssl init functions are protected with if
(done) return; done = 1;, and
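
The `if (done) return; done = 1;` guard mentioned above is not safe when two threads enter at the same time. A minimal sketch of a thread-safe alternative using pthread_once follows; do_library_init is a hypothetical stand-in for the real initialization call, and the counter exists only to make the behaviour visible.

```c
#include <pthread.h>

static pthread_once_t init_once = PTHREAD_ONCE_INIT;
static int init_calls = 0;

/* Hypothetical stand-in for the library's one-time initialization. */
static void do_library_init(void)
{
    init_calls++;
}

/* Safe to call from any thread, any number of times:
 * do_library_init() runs exactly once. */
void init_library_once(void)
{
    pthread_once(&init_once, do_library_init);
}

/* Reports how many times the initializer actually ran. */
int library_init_calls(void)
{
    return init_calls;
}
```

This only protects callers that go through init_library_once; it does not help if other code in the same process initializes the library directly, which is the problem raised in the message.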
Bruce Momjian wrote:
Which basically shows one fsync, no O_SYNC's, and setting of the flag
only for klog reads.
Which sysklogd do you look at? The version from RedHat 9 contains this
block:
/*
* Crack a configuration file line
*/
void cfline(line, f)
char *line;
register
Bruce Momjian wrote:
What killed the idea of doing ssl or kerberos locking inside libpq was
that there was no way to be sure that outside code didn't also access
those routines.
A callback based implementation can handle that: libpq has a default
implementation for apps that do not use openssl
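
A sketch of what such a callback-based scheme could look like, assuming pthreads. All names here (lock_handler, register_lock_handler, library_section, app_lock_handler) are hypothetical and only illustrate the idea; they are not the API that was eventually adopted.

```c
#include <pthread.h>
#include <stddef.h>

/* Handler is called with 1 to acquire the lock and 0 to release it. */
typedef void (*lock_handler)(int acquire);

/* Library side: default implementation for apps that do not care. */
static pthread_mutex_t default_mutex = PTHREAD_MUTEX_INITIALIZER;

static void default_lock_handler(int acquire)
{
    if (acquire)
        pthread_mutex_lock(&default_mutex);
    else
        pthread_mutex_unlock(&default_mutex);
}

static lock_handler current_handler = default_lock_handler;

/* Install a new handler (NULL restores the default); returns the old one. */
lock_handler register_lock_handler(lock_handler new_handler)
{
    lock_handler prev = current_handler;
    current_handler = new_handler ? new_handler : default_lock_handler;
    return prev;
}

/* Library side: a section that touches the non-thread-safe routines. */
static int section_runs = 0;

void library_section(void)
{
    current_handler(1);
    section_runs++;          /* stands in for the ssl/kerberos calls */
    current_handler(0);
}

/* Application side: one mutex shared with the app's own use of the
 * same routines, so both parties serialize on the same lock. */
static pthread_mutex_t app_mutex = PTHREAD_MUTEX_INITIALIZER;
static int app_handler_calls = 0;

static void app_lock_handler(int acquire)
{
    if (acquire) {
        pthread_mutex_lock(&app_mutex);
        app_handler_calls++;
    } else {
        app_handler_calls++;
        pthread_mutex_unlock(&app_mutex);
    }
}
```

The application would call register_lock_handler(app_lock_handler) once, before opening any connection, and take app_mutex around its own calls into the shared library.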
zohn_ming wu wrote:
swap_free: Bad swap file entry 0004
Do you use ECC memory, is ECC enabled in the BIOS [and does it work -
some vendors lie about ECC support]?
I would bet that it's a soft memory error: means not used. One
bit differs, and the kernel complains about the
Bruce Momjian wrote:
However, we really have two types of function tested.
The first, strerror, can be thread safe by using thread-local storage
_or_ by returning pointers to static strings. The other two function
tests require thread-local storage to be thread-safe.
You are completely
Bruce Momjian wrote:
Woh, as far as I know, any application should run fine with -lpthread,
threaded or not. What OS are you on? This is the first I have heard of
this problem.
Perhaps we should try to figure out how other packages handle
multithreaded/singlethreaded libraries? I'm looking
Greg Stark wrote:
I do know that AFS returns quota failures on close. This was unusual enough
that when AFS was deployed at school unix tools failed left and right over
precisely this issue. Though it mostly just meant they returned the wrong exit
status.
That means
open();
write();
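
A minimal sketch of the open/write/close sequence with every return value checked, including close(), which is where a filesystem such as AFS may report the failure. The function name write_all_checked is hypothetical, and EINTR handling is left out for brevity.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Writes len bytes to path; returns 0 only if open, write AND close succeed. */
int write_all_checked(const char *path, const void *buf, size_t len)
{
    const char *p = buf;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n <= 0) {            /* error, or no progress */
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t) n;
    }

    /* A deferred error (quota, network filesystem) can surface here. */
    if (close(fd) != 0)
        return -1;

    return 0;
}
```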
Tom Lane wrote:
Claudio Natoli [EMAIL PROTECTED] writes:
Or, maybe we'll just use the tas() implementation that already exists for
__i386__/__x86_64__ in s_lock.h. How did I miss that?
Move along. Nothing to see here.
Actually, I was expecting you to complain that the s_lock.h coding is
Tom Lane wrote:
Manfred Spraul [EMAIL PROTECTED] writes:
What are the chances for Win64 support? sizeof(unsigned long) remains 4,
sizeof(void*) is 8.
If you can tell me what type Datum should be (unsigned long long
maybe?), we could probably handle that.
Probably uintptr_t: That's
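
To illustrate the uintptr_t suggestion: on an LLP64 platform such as Win64, unsigned long is 4 bytes and cannot hold a pointer, while uintptr_t is defined to round-trip one. The typedef and helper names below are hypothetical and are not the actual PostgreSQL definitions.

```c
#include <stdint.h>

/* Hypothetical definition: an integer type wide enough for a pointer. */
typedef uintptr_t Datum;

/* Pointer -> Datum. */
static inline Datum pointer_to_datum(const void *p)
{
    return (Datum) p;
}

/* Datum -> pointer; yields the original pointer for values
 * produced by pointer_to_datum(). */
static inline void *datum_to_pointer(Datum d)
{
    return (void *) d;
}

/* Small integers fit as well, since Datum is at least pointer-sized. */
static inline Datum int32_to_datum(int32_t v)
{
    return (Datum) (uint32_t) v;
}

static inline int32_t datum_to_int32(Datum d)
{
    return (int32_t) (uint32_t) d;
}
```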
libpq needs additional changes for complete thread safety:
- openssl needs different initialization.
- kerberos is not thread safe.
- functions such as gethostbyname are not thread safe, and could be used
by kerberos. Right now protected with a libpq specific mutex.
- ditto for getpwuid and
From fe-secure.c:
/*
* Indicates whether the current thread is in send()
* For use by SIGPIPE signal handlers; they should
* ignore SIGPIPE when libpq is in send(). This means
* that the backend has died unexpectedly.
*/
pqbool
PQinSend(void)
{
#ifdef
Tom Lane wrote:
Manfred Spraul [EMAIL PROTECTED] writes:
But what about kerberos: I'm a bit reluctant to add a fourth mutex: what
if kerberos calls gethostbyname or getpwuid internally?
Wouldn't help anyway, if some other part of the app also calls kerberos.
That's why I've proposed
Tom Lane wrote:
Personally I find diff -u format completely unreadable :-(. Send
diff -c if you want useful commentary.
diff -c is attached. I've removed the signal changes, they are
unrelated. I'll resend them separately.
--
Manfred
Index: src/interfaces/libpq/libpq-fe.h
Tom Lane wrote:
Wait a minute. I am *not* buying into any proposal that we need to
support ENABLE_THREAD_SAFETY on machines where libc is not thread-safe.
We have other things to do than adopt an open-ended commitment to work
around threading bugs on obsolete platforms. I don't believe that any
Bruce Momjian wrote:
[EMAIL PROTECTED] wrote:
Hi Manfred,
Just wanted to let you know I tried your patch-spinlock-i386 patch on
our STP (our automated test platform) 8-way systems and saw a 5.5%
improvement with Pentium III Xeons. If you want to see those results:
PostgreSQL 7.4.1:
Bruce Momjian wrote:
Anyone see an attack path here?
Should we have one lock per hash bucket rather than one for the entire
hash?
That's the simple part. The problem is the aging strategy: we need a
strategy that doesn't rely on a global list that's updated after every
lookup. If I
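
A sketch of the "simple part" only, one lock per hash bucket, assuming pthreads and a fixed-size chained table. All names are hypothetical, and the aging strategy, which the message identifies as the real problem, is deliberately not addressed.

```c
#include <pthread.h>
#include <stdlib.h>

#define NBUCKETS 16

struct entry {
    unsigned key;
    int value;
    struct entry *next;
};

struct bucket {
    pthread_mutex_t lock;    /* protects only this bucket's chain */
    struct entry *head;
};

static struct bucket table[NBUCKETS];

void table_init(void)
{
    for (int i = 0; i < NBUCKETS; i++) {
        pthread_mutex_init(&table[i].lock, NULL);
        table[i].head = NULL;
    }
}

/* Insert or update; returns 0 on success, -1 if out of memory. */
int table_put(unsigned key, int value)
{
    struct bucket *b = &table[key % NBUCKETS];
    struct entry *e;
    int rc = 0;

    pthread_mutex_lock(&b->lock);
    for (e = b->head; e != NULL; e = e->next)
        if (e->key == key)
            break;
    if (e == NULL) {
        e = malloc(sizeof *e);
        if (e != NULL) {
            e->key = key;
            e->next = b->head;
            b->head = e;
        } else {
            rc = -1;
        }
    }
    if (e != NULL)
        e->value = value;
    pthread_mutex_unlock(&b->lock);
    return rc;
}

/* Returns 1 and stores the value if found, 0 otherwise. */
int table_get(unsigned key, int *value)
{
    struct bucket *b = &table[key % NBUCKETS];
    struct entry *e;
    int found = 0;

    pthread_mutex_lock(&b->lock);
    for (e = b->head; e != NULL; e = e->next) {
        if (e->key == key) {
            *value = e->value;
            found = 1;
            break;
        }
    }
    pthread_mutex_unlock(&b->lock);
    return found;
}
```

Lookups in different buckets never contend. A global MRU list updated on every hit would reintroduce a single point of contention, which is why the aging strategy is the hard part.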
Jan Wieck wrote:
Moving the Cache Directory Block (cdb) on a hit to the MRU position of
the appropriate queue is the bookkeeping of this strategy. The whole
algorithm is based on it, and I don't see yet how to avoid that without
opening a huge can of worms that look like deadlocks. But I'll think
[EMAIL PROTECTED] wrote:
Hi Manfred,
I'm using unixware 7 but couldn't compile your source with native cc, I
had to compile it with gcc.
here are the results:
Thanks. The test app compares the time needed for three different short
loops: a loop with six empty function calls, a loop with six
Josh Berkus wrote:
Initial debug logging of a test on one Xeon system demonstrating this issue
showed a very large number of unattributed semop() calls. We are still
following up on this.
Postgres has its own user space spinlock and semaphore implementation.
Both fall back to semop if
Bruce Momjian wrote:
write 0.000360
write fsync 0.001391
write, close fsync 0.001308
open o_fsync, write 0.000924
That's 1 millisecond vs. 1.3 milliseconds. Neither value is realistic -
I guess the hw cache is on and the os doesn't issue cache flush
Tom Lane wrote:
Greg Stark [EMAIL PROTECTED] writes:
Treating pointers as integers is technically nonportable but
realistically you would be pretty hard pressed to find any
architecture anyone runs postgres on where there isn't some integer
datatype that you can cast both directions from
Hi,
I've searched through libpq and looked for global or static variables as
indicators of non-threadsafe code. I found:
- Win32 and BeOS: there is a global ioctlsocket_ret variable, but it
seems to be a dummy variable that is always discarded.
- pg_krb4_init(): Are the kerberos libraries
Greg Stark wrote:
I'm assuming fsync syncs writes issued by other processes on the same file,
which isn't necessarily true though.
It was already pointed out that we can't rely on that assumption.
So the NetBSD and Sun developers I checked with both asserted fsync does in
fact
Jan Wieck wrote:
_Vacuum page delay_:
Tom Lane's napping during vacuums with another tuning option. I
replaced the usleep() call with a PG_DELAY(msec) macro in miscadmin.h,
which does use select(2) instead. That should address the possible
portability problems.
What about skipping the delay
Tom Lane wrote:
Manfred's idea is interesting but AFAICS completely unimplementable
in any portable fashion. You'd have to have hooks into the kernel.
I thought about outstanding operations from postgres - I don't know
enough about the buffer layer if it's possible to keep a counter of the
[EMAIL PROTECTED] wrote:
On 1 Nov, Tom Lane wrote:
Manfred Spraul [EMAIL PROTECTED] writes:
signal handlers are a process property, not a thread property - that
code is broken for multi-threaded apps.
Yeah, that's been mentioned before, but I don't see any way around it.
What we
Tom Lane wrote:
Manfred Spraul [EMAIL PROTECTED] writes:
For multithreaded apps, this is not possible: sigaction is per process.
Thus the calling application must handle the SIGPIPE signals for libpq -
either by blocking or ignoring them. We are still discussing the exact
API. Probably
Neil Conway wrote:
The present Linux implementation doesn't do this, AFAICS -- all it does
is increase the readahead for this file:
AFAIK Linux uses a modified LRU that automatically puts pages that were
touched only once at a lower priority than frequently accessed pages.
Neil: what about
Tom Lane wrote:
It strikes me that sigpipe handling will be a global affair in any
particular application --- it's unlikely that it would be correct for
some PG connections and wrong for others. So one possibility is to make
the control variable be global (static) and thus it could be set before
Mark Wong wrote:
On Sat, Nov 01, 2003 at 10:29:34PM +0100, Manfred Spraul wrote:
Mark Wong wrote:
Yeah, my dbt2 applications are multithreaded.
Do you need SIGPIPE delivery in your app? If no, could you try what
happens if you apply the attached patch to postgres, and perform
AgentM wrote:
That wouldn't offer a solution for people who use SIGPIPE for other
things during the lifetime of the program (after creating the
connection) and if a SIGPIPE handler is called due to the connection,
the handler won't be expecting the source, and polling signal for
state is
[EMAIL PROTECTED] wrote:
Results from 7.4beta5
http://developer.osdl.org/markw/dbt2-pgsql/188/
- metric 1446.01
CPU: P4 / Xeon with 2 hyper-threads, speed 1497.51 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) with a
unit mask of 0x01 (count
I've straced
$ pgbench -c 5 -s 6 -t 1000
total 157k syscalls, 70k of them are rt_sigaction(SIGPIPE):
1754 poll([{fd=3, events=POLLOUT|POLLERR, revents=POLLOUT}], 1, -1) = 1
1754 rt_sigaction(SIGPIPE, {SIG_IGN}, {SIG_DFL}, 8) = 0
1754 send(3, \0\0\0%\0\3\0\0user\0postgres\0database\0t..., 37,
Tom Lane wrote:
Manfred Spraul [EMAIL PROTECTED] writes:
signal handlers are a process property, not a thread property - that
code is broken for multi-threaded apps.
Yeah, that's been mentioned before, but I don't see any way around it.
Do not handle SIGPIPE on multithreaded apps
Tom Lane wrote:
A bigger objection is that we couldn't get libssl to use it (AFAIK).
The flag really needs to be settable on the socket (eg, via fcntl),
not per-send.
It's a per-send flag, it's not possible to force it on with a fcntl :-(
What about an option to skip the sigaction calls for apps
Mark Wong wrote:
Yeah, my dbt2 applications are multithreaded.
Do you need SIGPIPE delivery in your app? If no, could you try what
happens if you apply the attached patch to postgres, and perform the
signal(SIGPIPE, SIG_IGN);
once in your dbt2 app?
--
Manfred
Tom Lane wrote:
Manfred Spraul [EMAIL PROTECTED] writes:
What about an option to skip the sigaction calls for apps that can
handle SIGPIPE?
If the app is ignoring SIGPIPE globally, then our calls will have no
effect anyway.
Wrong. From the opengroup manpage:
SIG_IGN - ignore signal
Greg Stark wrote:
Manfred Spraul [EMAIL PROTECTED] writes:
One problem for WAL is that O_DIRECT would disable the write cache -
each operation would block until the data arrived on disk, and that might block
other backends that try to access WALWriteLock.
Perhaps a dedicated backend that does
Tom Lane wrote:
Not for WAL --- we never read the WAL at all in normal operation. (If
it works for writes, then we would want to use it for writing WAL, but
that's not apparent from what Christopher quoted.)
At least under Linux, it works for writes. Oracle uses O_DIRECT to
access (both read
Andrew Dunstan wrote:
I have wondered (somewhat fruitlessly) for several years about the
possibilities of special purpose lightweight file systems that could
relax some of the assumptions and checks used in general purpose file
systems. Such a thing might provide most of the benefits of a
Andrew Dunstan wrote:
Bruce Momjian wrote:
This seems to be a bug in gcc-3.3.1. -fstrict-aliasing is enabled by
-O2 or higher optimization in gcc 3.3.1.
According to the C standard, it's illegal to access data with a
pointer of the wrong type. The only exception is char *.
This can be used
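
A small illustration of the aliasing rule described above. Reading a float through a uint32_t pointer is the kind of access that -fstrict-aliasing allows the compiler to miscompile; copying the bytes with memcpy is well defined. The function name float_bits is hypothetical, and the sketch assumes a 32-bit IEEE 754 float.

```c
#include <stdint.h>
#include <string.h>

/*
 * Undefined under strict aliasing (shown for contrast, do not use):
 *
 *     uint32_t bits = *(uint32_t *) &f;
 *
 * The compiler may assume a uint32_t pointer never points at a float.
 */

/* Well-defined alternative: copy the object representation byte-wise. */
uint32_t float_bits(float f)
{
    uint32_t bits;

    /* Assumes sizeof(float) == sizeof(uint32_t). */
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Compilers typically turn the fixed-size memcpy into a single move, so the safe form costs nothing at -O2.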
scott.marlowe wrote:
OK, I've done some more testing on our IDE drive machine.
First, some background. The hard drives we're using are Seagate
drives, model number ST380023A. Firmware version is 3.33. The machine
they are in is running RH9. The setup string I'm feeding them on startup
Peter Eisentraut wrote:
Tom Lane writes:
No. The real problem with 2PC in my mind is that its failure modes
occur *after* you have promised commit to one or more parties. In
multi-master, if you fail you know it before you have told the client
his data is committed.
I have a book here
Tom Lane wrote:
Claudio Natoli [EMAIL PROTECTED] writes:
How are you dealing with the issue of wanting some static variables to
be per-thread and others not?
To be perfectly honest, I'm still trying to familiarize myself with the code
sufficiently well so that I can tell which
Tom Lane wrote:
AFAIK, semops are not done unless we actually have to yield the
processor, so saving a syscall or two in that path doesn't sound like a
big win. I'd be more interested in asking why you're seeing long series
of semops in the first place.
Virtually all semops yield the
Tom Lane wrote:
Oh, pgbench ;-). Are you aware that you need a scale factor (-s)
larger than the number of clients to avoid unreasonable levels of
contention in pgbench?
No. What about adding a few reasonable examples to README? I've switched
to pgbench -c 10 -s 11 -t 1000 test. Is that ok?
Now
Tom Lane wrote:
Manfred Spraul [EMAIL PROTECTED] writes:
... Initially I tried to increase MAX_ALIGNOF to 16, but
the result didn't work:
You would need to do a full recompile and initdb to alter MAX_ALIGNOF.
I think I did that, but it still failed. 7.4cvs works, I'll ignore
I've noticed that postgres strace output contains long groups of
setitimer/semop/setitimer.
Just FYI: semtimedop is a special syscall that implements a semop with
a timeout. It was added just for the purpose of avoiding the setitimer
calls.
I know that it's supported by Solaris and recent
Hi,
When analyzing the kernel profile from osdl dbt benchmarks, I noticed
that around 50% of the kernel time is spent in __copy_user_intel.
http://khack.osdl.org/stp/280060/profile/
This function is one of two functions that do the actual memory copy
from/to kernel space to/from user space.
Bruce Momjian wrote:
Tom Lane wrote:
Bruce Momjian [EMAIL PROTECTED] writes:
He is uncomfortable with the port/*.h changes at this point, so it seems
I am going to have to add Itanium/Opteron tests to most of those files.
Why don't you try to put together a proposed patch of that
Manfred Spraul wrote:
Is the Itanium tas implementation correct? I think it should be
xchg4.aqv instead of just xchg4 - as far as I know a normal atomic
exchange is not a memory barrier on Itanium. At least the Linux
kernel version contains cmpxchg4.aqv.
Sorry for the noise, I'm wrong
Jeroen Ruigrok/asmodai wrote:
-On [20030908 23:52], Peter Eisentraut ([EMAIL PROTECTED]) wrote:
Why would FreeBSD have a library of thread-safe libc functions (libc_r)
if the functions weren't thread-safe? I think the test is faulty.
A thread-safe library has a per-thread errno value
Another question:
Is it possible to apply patches to postgresql before a DBT-2 run, or is
only patching the kernel supported?
--
Manfred
[EMAIL PROTECTED] wrote:
http://developer.osdl.org/markw/44/
I threw together (kind of sloppily) a web page of the data I was
starting to collect for our DBT-2 workload (TPC-C derivative) on
PostgreSQL 7.3.4. Keep in mind not much database tuning has been done
yet. Feel free to ask any
Bruce Momjian wrote:
if test $enable_debug = yes test $ac_cv_prog_cc_g = yes; then
CFLAGS=$CFLAGS -g
fi
+
+ /* Compile AMD Opteron using gcc in 64-bit mode */
+ if test $GCC = yes; then
+ case $host in
+ ia64-*) CFLAGS=$CFLAGS -m64
+LDFLAGS=$LDFLAGS -melf_x86_64;;
+ esac
+
Shridhar Daithankar wrote:
2) Native freeBSD threads
pthread.h in /usr/include and lc_r
Do you know if FreeBSD supports pthread_rwlock with
PTHREAD_PROCESS_SHARED? I'm trying to replace the LWLocks with
pthread_rwlocks.
What about other Unices?
--
Manfred