Re: [HACKERS] pgbench vs. wait events

2016-10-07 Thread Alfred Perlstein



On 10/7/16 10:42 AM, Andres Freund wrote:

Hi,

On 2016-10-06 20:52:22 -0700, Alfred Perlstein wrote:

This contention on WAL reminds me of another scenario I've heard about that
was similar.

To fix things, what happened was that the first person to block
would be responsible for writing out all buffers for anyone blocked behind
"him".

We pretty much do that already. But while that's happening, the other
would-be-writers show up as blocking on the lock.  We use kind of
an odd locking model for the waiters (LWLockAcquireOrWait()), which
waits for the lock to be released, but doesn't try to acquire it
afterwards. Instead the WAL position is rechecked, and in many cases
we'll be done afterwards, because enough has been written out.

Greetings,

Andres Freund
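As a rough illustration of the LWLockAcquireOrWait() behaviour described above (wait for the current holder to release, then recheck the flush position instead of acquiring), here is a toy sketch. The class and all names are invented for illustration; the real implementation is C code in PostgreSQL's lwlock.c and is considerably more subtle.

```python
import threading

class WALFlushGate:
    """Toy model of the LWLockAcquireOrWait() idea: a would-be flusher
    waits for the current lock holder to finish, then RECHECKS whether
    its WAL position is already flushed rather than taking the lock."""

    def __init__(self):
        self.cond = threading.Condition()
        self.held = False
        self.flushed_upto = 0   # highest WAL position known to be on disk

    def ensure_flushed(self, target, do_flush):
        """Return True if we did the flush ourselves, False if someone
        else's flush already covered our target position."""
        with self.cond:
            while True:
                if self.flushed_upto >= target:
                    return False        # recheck succeeded: nothing to do
                if not self.held:
                    break               # lock is free: we will flush
                self.cond.wait()        # wait for release, then recheck
            self.held = True
        do_flush()                      # the actual I/O, outside the lock
        with self.cond:
            self.flushed_upto = max(self.flushed_upto, target)
            self.held = False
            self.cond.notify_all()
        return True
```

A backend whose recheck succeeds returns without ever owning the lock, which is why blocked writers often disappear once the current holder's write completes.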



Are the batched writes all done before fsync is called?

Are you sure that A only calls fsync after flushing all the buffers from 
B, C, and D?  Or will it fsync twice?  Is there instrumentation to show 
that?


I know there's a tremendous level of skill involved in this code, but 
I'm simply asking in case there are some tricks.


Another strategy that may work is intentionally waiting/buffering a 
few ms between flushes/fsyncs: for example, make sure that the number 
of flushes per second doesn't exceed some configurable amount.  Each 
flush likely eats at least one IOP from the disk, and there is a 
maximum IOPS per disk, so you might as well buffer more if you're 
exceeding that IOPS count.  You'll trade some latency, but gain 
throughput for doing that.
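A minimal sketch of that rate-limiting idea, with all names and numbers invented for illustration. This is not how PostgreSQL schedules WAL flushes; the existing commit_delay/commit_siblings settings batch commits in a loosely related spirit.

```python
import time

class RateLimitedFlusher:
    """Cap flushes at max_flushes_per_sec, buffering writes in between.
    Trades a little commit latency for fewer IOPS (sketch only)."""

    def __init__(self, flush_fn, max_flushes_per_sec):
        self.flush_fn = flush_fn                # persists a list of records
        self.min_interval = 1.0 / max_flushes_per_sec
        self.buffer = []
        self.last_flush = float("-inf")         # so the first write flushes

    def write(self, record):
        self.buffer.append(record)
        now = time.monotonic()
        if now - self.last_flush >= self.min_interval:
            self.flush_fn(self.buffer)          # one combined I/O
            self.buffer = []
            self.last_flush = now
```

A real version would also need a timer to bound how long a lone buffered record can wait, which is exactly the latency/throughput trade-off described above.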


Does this make sense?  Again, apologies if this has been covered.  Is 
there a whitepaper, blog post, or other clear way I can examine the 
algorithm with respect to locks/buffering for flushing the WAL?


-Alfred






--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench vs. wait events

2016-10-07 Thread Alfred Perlstein

Robert,

This contention on WAL reminds me of another scenario I've heard about 
that was similar.


To fix things, what happened was that the first person to block would 
be responsible for writing out all buffers for anyone blocked behind 
"him".


Then, for example, say you have four threads: A, B, C, and D.

While A is writing to WAL and holding the lock, B arrives and of 
course blocks; then C comes along and blocks as well, then D.


Finally A finishes its write.

Now you have two options for resolving this, either

1) A drops its lock, B picks up the lock... B writes its buffer and then 
drops the lock.  Then C gets the lock, writes its buffer, drops the 
lock, then finally D gets the lock, writes its buffer and then drops the 
lock.


2) A then writes out B's, C's, and D's buffers, then A drops the lock; 
B, C and D wake up, note that their respective buffers are written, and 
just return.  This greatly speeds up the system.  (Just be careful to 
make sure A doesn't do "too much work", otherwise you can get a sort of 
livelock if too many threads are blocked behind it; generally, only 
issue one additional flush on behalf of other threads, and do not "loop 
until the queue is empty".)


I'm not sure if this is actually possible with the way WAL is 
implemented (or perhaps this strategy is already implemented), but 
it's definitely worth doing if not done already, as it can speed 
things up enormously.
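Under the stated caveats, option 2 can be sketched like this. It is a toy model, not PostgreSQL's actual XLogFlush() logic, and every name in it is invented; the anti-livelock caveat above is honored by flushing exactly one batch per leadership turn.

```python
import threading

class GroupWAL:
    """Toy model of option 2: the thread holding the lock (the "leader")
    flushes the buffers queued by everyone blocked behind it."""

    def __init__(self, device):
        self.device = device        # callable persisting a list of buffers
        self.mutex = threading.Lock()
        self.done = threading.Condition(self.mutex)
        self.queue = []             # (seq, buf) pairs awaiting flush
        self.flushed_upto = -1
        self.next_seq = 0
        self.flushing = False

    def write(self, buf):
        with self.mutex:
            seq = self.next_seq
            self.next_seq += 1
            self.queue.append((seq, buf))
            while self.flushed_upto < seq:
                if self.flushing:
                    self.done.wait()        # the leader may flush us too
                else:
                    self._lead_one_batch()  # become leader ourselves

    def _lead_one_batch(self):
        # Called with self.mutex held.  Take ONE batch only: do not
        # "loop until the queue is empty" (the livelock caveat above).
        self.flushing = True
        batch, self.queue = self.queue, []
        self.mutex.release()                # do the I/O without the lock
        try:
            self.device([b for _, b in batch])  # single combined write
        finally:
            self.mutex.acquire()
        self.flushed_upto = max(self.flushed_upto, max(s for s, _ in batch))
        self.flushing = False
        self.done.notify_all()              # waiters recheck flushed_upto
```

Waiters that wake up and find their buffer already flushed simply return, which is the speedup described; a waiter whose buffer arrived mid-flush takes over as the next leader.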


On 10/6/16 11:38 AM, Robert Haas wrote:

Hi,

I decided to do some testing on hydra (IBM-provided community
resource, POWER, 16 cores/64 threads, kernel 3.2.6-3.fc16.ppc64) using
the newly-enhanced wait event stuff to try to get an idea of what
we're waiting for during pgbench.  I did 30-minute pgbench runs with
various configurations, but all had max_connections = 200,
shared_buffers = 8GB, maintenance_work_mem = 4GB, synchronous_commit =
off, checkpoint_timeout = 15min, checkpoint_completion_target = 0.9,
log_line_prefix = '%t [%p] ', max_wal_size = 40GB, log_checkpoints =
on.  During each run, I ran this psql script in another window and
captured the output:

\t
select wait_event_type, wait_event from pg_stat_activity where pid !=
pg_backend_pid()
\watch 0.5

Then, I used a little shell-scripting to count up the number of times
each wait event occurred in the output.  First, I tried scale factor
3000 with 32 clients and got these results:

   1  LWLockTranche   | buffer_mapping
   9  LWLockNamed | CLogControlLock
  14  LWLockNamed | ProcArrayLock
  16  Lock| tuple
  25  LWLockNamed | CheckpointerCommLock
  49  LWLockNamed | WALBufMappingLock
 122  LWLockTranche   | clog
 182  Lock| transactionid
 287  LWLockNamed | XidGenLock
1300  Client  | ClientRead
1375  LWLockTranche   | buffer_content
3990  Lock| extend
   21014  LWLockNamed | WALWriteLock
   28497  |
   58279  LWLockTranche   | wal_insert

tps = 1150.803133 (including connections establishing)
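[Counts of the shape above can be reproduced from the captured \watch output with a pipeline along these lines; the actual "little shell-scripting" isn't shown in the message, so this is only a guess, and watch_output.txt is an assumed filename.]

```shell
# Tally wait_event_type | wait_event pairs captured by the \watch loop.
sed 's/^ *//' watch_output.txt | grep '|' | sort | uniq -c | sort -n
```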

What jumps out here, at least to me, is that there is furious
contention on both the wal_insert locks and on WALWriteLock.
Apparently, the system simply can't get WAL on disk fast enough to
keep up with this workload.  Relation extension locks and
buffer_content locks are also pretty common, both ahead of
ClientRead, a relatively uncommon wait event on this test.  The load
average on the system was only about 3 during this test, indicating
that most processes are in fact spending most of their time off-CPU.
The first thing I tried was switching to unlogged tables, which
produces these results:

   1  BufferPin   | BufferPin
   1  LWLockTranche   | lock_manager
   2  LWLockTranche   | buffer_mapping
   8  LWLockNamed | ProcArrayLock
   9  LWLockNamed | CheckpointerCommLock
   9  LWLockNamed | CLogControlLock
  11  LWLockTranche   | buffer_content
  37  LWLockTranche   | clog
 153  Lock| tuple
 388  LWLockNamed | XidGenLock
 827  Lock| transactionid
1267  Client  | ClientRead
   20631  Lock| extend
   91767  |

tps = 1223.239416 (including connections establishing)

If you don't look at the TPS number, these results look like a vast
improvement.  The overall amount of time spent not waiting for
anything is now much higher, and the problematic locks have largely
disappeared from the picture.  However, the load average now shoots up
to about 30, because most of the time that the backends are "not
waiting for anything" they are in fact in kernel wait state D; that
is, they're stuck doing I/O.  This suggests that we might want to
consider advertising a wait state when a backend is doing I/O, so we
can measure this sort of thing.

Next, I tried lowering the scale factor to something that fits in
shared buffers.  Here are the results at scale factor 300:

  14  Lock| 


Re: [HACKERS] Why we lost Uber as a user

2016-08-16 Thread Alfred Perlstein



On 8/3/16 3:29 AM, Greg Stark wrote:


Honestly the take-away I see in the Uber story is that they apparently
had nobody on staff that was on -hackers or apparently even -general
and tried to go it alone rather than involve experts from outside
their company. As a result they misdiagnosed their problems based on
prejudices seeing what they expected to see rather than what the real
problem was.


Agree strongly, but there are still lessons to be learned on the PostgreSQL side.

-Alfred




Re: [HACKERS] Why we lost Uber as a user

2016-08-16 Thread Alfred Perlstein



On 8/2/16 10:02 PM, Mark Kirkwood wrote:

On 03/08/16 02:27, Robert Haas wrote:


Personally, I think that incremental surgery on our current heap
format to try to fix this is not going to get very far.  If you look
at the history of this, 8.3 was a huge release for timely cleanup of
dead tuple.  There was also significant progress in 8.4 as a result of
5da9da71c44f27ba48fdad08ef263bf70e43e689.   As far as I can recall, we
then made no progress at all in 9.0 - 9.4.  We made a very small
improvement in 9.5 with 94028691609f8e148bd4ce72c46163f018832a5b, but
that's pretty niche.  In 9.6, we have "snapshot too old", which I'd
argue is potentially a large improvement, but it was big and invasive
and will no doubt pose code maintenance hazards in the years to come;
also, many people won't be able to use it or won't realize that they
should use it.  I think it is likely that further incremental
improvements here will be quite hard to find, and the amount of effort
will be large relative to the amount of benefit.  I think we need a
new storage format where the bloat is cleanly separated from the data
rather than intermingled with it; every other major RDMS works that
way.  Perhaps this is a case of "the grass is greener on the other
side of the fence", but I don't think so.


Yeah, I think this is a good summary of the state of play.

The only other new db development to use a non-overwriting design like 
ours that I know of was Jim Starky's Falcon engine for (ironically) 
Mysql 6.0. Not sure if anyone is still progressing that at all now.


I do wonder if Uber could have successfully tamed dead tuple bloat 
with aggressive per-table autovacuum settings (and if in fact they 
tried), but as I think Robert said earlier, it is pretty easy to come 
up with a highly update (or insert + delete) workload that makes for a 
pretty ugly bloat component even with real aggressive autovacuuming.
I also wonder if they had used a "star schema", which to my understanding 
would mean multiple tables replacing the single table with multiple 
indices, to work around the write amplification problem in PostgreSQL.




Cheers

Mark









Re: [HACKERS] Why we lost Uber as a user

2016-08-04 Thread Alfred Perlstein



On 8/4/16 2:00 AM, Torsten Zuehlsdorff wrote:



On 03.08.2016 21:05, Robert Haas wrote:

On Wed, Aug 3, 2016 at 2:23 PM, Tom Lane  wrote:

Robert Haas  writes:

I don't think they are saying that logical replication is more
reliable than physical replication, nor do I believe that to be true.
I think they are saying that if logical corruption happens, you can
fix it by typing SQL statements to UPDATE, INSERT, or DELETE the
affected rows, whereas if physical corruption happens, there's no
equally clear path to recovery.


Well, that's not an entirely unreasonable point, but I dispute the
implication that it makes recovery from corruption an easy thing to do.
How are you going to know what SQL statements to issue?  If the master
database is changing 24x7, how are you going to keep up with that?


I think in many cases people fix their data using business logic.  For
example, suppose your database goes down and you have to run
pg_resetxlog to get it back up.  You dump-and-restore, as one does,
and find that you can't rebuild one of your unique indexes because
there are now two records with that same PK.  Well, what you do is you
look at them and judge which one has the correct data, often the one
that looks more complete or the one with the newer timestamp. Or,
maybe you need to merge them somehow.  In my experience helping users
through problems of this type, once you explain the problem to the
user and tell them they have to square it on their end, the support
call ends.  The user may not always be entirely thrilled about having
to, say, validate a problematic record against external sources of
truth, but they usually know how to do it.  Database bugs aren't the
only way that databases become inaccurate.  If the database that they
use to keep track of land ownership in the jurisdiction where I live
says that two different people own the same piece of property,
somewhere there is a paper deed in a filing cabinet.  Fishing that out
to understand what happened may not be fun, but a DBA can explain that
problem to other people in the organization and those people can get
it fixed.  It's a problem, but it's fixable.

On the other hand, if a heap tuple contains invalid infomask bits that
cause an error every time you read the page (this actually happened to
an EnterpriseDB customer!), the DBA can't tell other people how to fix
it and can't fix it personally either.  Instead, the DBA calls me.


After reading this statement the ZFS filesystem pops into my mind. It 
has protection built in against various problems (data degradation, 
current spikes, phantom writes, etc.).


For me this raises two questions:

1) would the usage of ZFS prevent such errors?

My feeling would say yes, but I have no idea how an invalid 
infomask bit could occur.


2) would it be possible to add such prevention to PostgreSQL

I know this could add massive overhead, but if it's optional this 
could be a fine thing?

PostgreSQL is very "zfs-like" in its internals.  The problem was a bug 
in PostgreSQL that caused it to just write data to the wrong place.


Some vendors use ZFS under databases to provide very cool services such 
as backup snapshots, test snapshots and other such uses.  I think Joyent 
is one such vendor but I'm not 100% sure.


-Alfred




Re: [HACKERS] Why we lost Uber as a user

2016-08-03 Thread Alfred Perlstein


> On Aug 3, 2016, at 3:29 AM, Greg Stark  wrote:
> 
>> 
> 
> Honestly the take-away I see in the Uber story is that they apparently
> had nobody on staff that was on -hackers or apparently even -general
> and tried to go it alone rather than involve experts from outside
> their company. As a result they misdiagnosed their problems based on
> prejudices seeing what they expected to see rather than what the real
> problem was.
> 

+1 very true. 

At the same time there are some lessons to be learned. At the very least, 
putting in big bold letters where to come for help is one.







Re: [HACKERS] Why we lost Uber as a user

2016-08-02 Thread Alfred Perlstein



On 8/2/16 2:14 PM, Tom Lane wrote:

Stephen Frost  writes:

With physical replication, there is the concern that a bug in *just* the
physical (WAL) side of things could cause corruption.

Right.  But with logical replication, there's the same risk that the
master's state could be fine but a replication bug creates corruption on
the slave.

Assuming that the logical replication works by issuing valid SQL commands
to the slave, one could hope that this sort of "corruption" only extends
to having valid data on the slave that fails to match the master.
But that's still not a good state to be in.  And to the extent that
performance concerns lead the implementation to bypass some levels of the
SQL engine, you can easily lose that guarantee too.

In short, I think Uber's position that logical replication is somehow more
reliable than physical is just wishful thinking.  If anything, my money
would be on the other way around: there's a lot less mechanism that can go
wrong in physical replication.  Which is not to say there aren't good
reasons to use logical replication; I just do not believe that one.

regards, tom lane


The reason it can be less catastrophic is that with logical replication 
you may futz up your data, but you are safe from corrupting your entire 
db.  Meaning if an update is missed or doubled, that may be addressed by 
a fixup SQL stmt; however, if the replication causes a write to the 
entirely wrong place in the db file, then you need to "fsck" your db and 
hope that nothing super critical was blown away.


The impact across a cluster is potentially magnified by physical 
replication.


So for instance, let's say there is a bug in the master's write to 
disk.  Logical replication acts as a barrier against that bad write 
going to the slaves.  With physical replication, bad writes go to the 
slaves, so any corruption experienced on the master will quickly reach 
the slaves and they too will be corrupted.


With logical replication a bug may be stopped at the replication layer.  
At that point you can resync the slave from the master.


Now in the case of physical replication all your base are belong to zuul 
and you are in a very bad state.


That said, with logical replication, who's to say that if the statement 
is replicated to a slave, the slave won't experience the same bug 
and also corrupt itself.


We may be saying the same thing, but still there is something to be said 
for logical replication... also, didn't they show that logical 
replication was faster for some use cases at Uber?


-Alfred









Re: [HACKERS] Why we lost Uber as a user

2016-08-02 Thread Alfred Perlstein


> On Aug 2, 2016, at 2:33 AM, Geoff Winkless <pgsqlad...@geoff.dj> wrote:
> 
>> On 2 August 2016 at 08:11, Alfred Perlstein <alf...@freebsd.org> wrote:
>>> On 7/2/16 4:39 AM, Geoff Winkless wrote:
>>> I maintain that this is a nonsense argument. Especially since (as you 
>>> pointed out and as I missed first time around) the bug actually occurred at 
>>> different records on different slaves, so he invalidates his own point.
> 
>> Seriously?
> 
> No, I make a habit of spouting off random arguments to a list full of
> people whose opinions I massively respect purely for kicks. What do
> you think?
> 
>> There's a valid point here, you're sending over commands at the block level, 
>> effectively "write to disk at this location" versus "update this record 
>> based on PK", obviously this has some drawbacks that are reason for concern.
> 
> Writing values directly into file offsets is only problematic if
> something else has failed that has caused the file to be an inexact
> copy. If a different bug occurred that caused the primary key to be
> corrupted on the slave (or indeed the master), PK-based updates would
> exhibit similar propagation errors.
> 
> To reiterate my point, uber's described problem came about because of
> a bug. Every software has bugs at some point in its life, to pretend
> otherwise is simply naive. I'm not trying to excuse the bug, or to
> belittle the impact that such a bug has on data integrity or on uber
> or indeed on the reputation of PostgreSQL. While I'm prepared to
> accept (because I have a job that requires I spend time on things
> other than digging through obscure reddits and mailing lists to
> understand more fully the exact cause) that in _this particular
> instance_ the bug was propagated because of the replication mechanism
> (although I'm still dubious about that, as per my comment above), that
> does _not_ preclude other bugs propagating in a statement-based
> replication. That's what I said is a nonsense argument, and no-one has
> yet explained in what way that's incorrect.
> 
> Geoff


Geoff,

You are quite technical; my feeling is that you will understand it, however it 
will need to be a self-learned lesson.

-Alfred






Re: [HACKERS] Why we lost Uber as a user

2016-08-02 Thread Alfred Perlstein



On 7/26/16 9:54 AM, Joshua D. Drake wrote:

Hello,

The following article is a very good look at some of our limitations 
and highlights some of the pains many of us have been working "around" 
since we started using the software.


https://eng.uber.com/mysql-migration/

Specifically:

* Inefficient architecture for writes
* Inefficient data replication
* Issues with table corruption
* Poor replica MVCC support
* Difficulty upgrading to newer releases

It is a very good read and I encourage our hackers to do so with an 
open mind.


Sincerely,

JD


It was a good read.

Having based a high performance web tracking service as well as a high 
performance security appliance on Postgresql I too have been bitten by 
these issues.


I had a few questions that maybe the folks with core knowledge can answer:

1) Would it be possible to create a "star-like" schema to fix this 
problem?  For example, let's say you have a table that is similar to Uber's:

col0pk, col1, col2, col3, col4, col5

All cols are indexed.
Assuming that updates happen to only 1 column at a time.
Why not figure out some way to encourage or automate the splitting of 
this table into multiple tables that present themselves as a single table?


What I mean is that you would then wind up with the following tables:
table1: col0pk, col1
table2: col0pk, col2
table3: col0pk, col3
table4: col0pk, col4
table5: col0pk, col5

Now when you update "col5" on a row, you only have to update the index 
on table5:col5 and table5:col0pk, as opposed to beforehand where you 
would have to update more indices.  In addition, I believe that vacuum 
would be somewhat mitigated as well in this case.
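As a concrete (if simplified) illustration of the split: the idea expressed against SQLite via Python's stdlib, purely to show the shape. Table, column, and view names are invented, SQLite stands in for PostgreSQL, and the view is one way such tables could "present themselves as a single table".

```python
import sqlite3

# One narrow table per indexed column, sharing a primary key,
# unified behind a view (all names illustrative).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (col0pk INTEGER PRIMARY KEY, col1 TEXT);
    CREATE TABLE t2 (col0pk INTEGER PRIMARY KEY, col2 TEXT);
    CREATE INDEX i1 ON t1(col1);
    CREATE INDEX i2 ON t2(col2);
    CREATE VIEW wide AS
        SELECT t1.col0pk, t1.col1, t2.col2
        FROM t1 JOIN t2 USING (col0pk);
    INSERT INTO t1 VALUES (1, 'a');
    INSERT INTO t2 VALUES (1, 'b');
""")
# Updating col2 touches only t2 and its index; t1 and i1 are untouched.
conn.execute("UPDATE t2 SET col2 = 'c' WHERE col0pk = 1")
print(conn.execute("SELECT col1, col2 FROM wide").fetchone())  # ('a', 'c')
```

The cost, of course, is that reading the full row now requires a join per split-off column.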


2) Why not have a look at how InnoDB does its storage; would it be 
possible to do this?


3) For the small-ish table that Uber mentioned, is there a way to "have 
it in memory" yet provide some level of sync to disk so that it is 
consistent?


thanks!
-Alfred






Re: [HACKERS] Why we lost Uber as a user

2016-08-02 Thread Alfred Perlstein



On 7/28/16 7:08 AM, Merlin Moncure wrote:


*) postgres may not be the ideal choice for those who want a thin and
simple database

This is a huge market; addressing it will bring mindshare and more jobs, 
code, and braintrust to PostgreSQL.


-Alfred




Re: [HACKERS] Why we lost Uber as a user

2016-08-02 Thread Alfred Perlstein



On 7/28/16 4:39 AM, Geoff Winkless wrote:


On 28 Jul 2016 12:19, "Vitaly Burovoy" wrote:

> On 7/28/16, Geoff Winkless wrote:
> > On 27 July 2016 at 17:04, Bruce Momjian wrote:
> >
> >> Well, their big complaint about binary replication is that a bug can
> >> spread from a master to all slaves, which doesn't happen with
> >> statement-level replication.
> >
> > I'm not sure that that makes sense to me. If there's a database bug
> > that occurs when you run a statement on the master, it seems there's
> > a decent chance that that same bug is going to occur when you run the
> > same statement on the slave.
> >
> > Obviously it depends on the type of bug and how identical the slave
> > is, but statement-level replication certainly doesn't preclude such a
> > bug from propagating.
> >
> > Geoff
>
> Please, read the article first! The bug is about wrong visibility of
> tuples after applying WAL at slaves.
> For example, you can see two different records selecting from a table
> by a primary key (moreover, their PKs are the same, but other columns
> differ).

I read the article. It affected slaves as well as the master.

I quote:
"because of the way replication works, this issue has the potential to 
spread into all of the databases in a replication hierarchy"

I maintain that this is a nonsense argument. Especially since (as you 
pointed out and as I missed first time around) the bug actually 
occurred at different records on different slaves, so he invalidates 
his own point.

Geoff


Seriously?

There's a valid point here: you're sending over commands at the block 
level, effectively "write to disk at this location", versus "update this 
record based on PK"; obviously this has some drawbacks that are reason 
for concern.  Does it validate the move on its own?  No.  Does it add to 
the reasons to move away?  Yes, that much is obvious.


Please read this thread:
https://www.reddit.com/r/programming/comments/4vms8x/why_we_lost_uber_as_a_user_postgresql_mailing_list/d5zx82n

Do I love postgresql?  Yes.
Have I been bitten by things such as this?  Yes.
Should the community learn from these things and think of ways to avoid 
it?  Absolutely!


-Alfred


[HACKERS] Question about durability and postgresql.

2015-02-20 Thread Alfred Perlstein
Hello,

We have a combination of 9.3 and 9.4 databases used for logging of data.

We do not need a strong durability guarantee, meaning it is ok if on crash a 
minute or two of data is lost from our logs.  (This is just stats for our 
internal tool).

I am looking at this page:
http://www.postgresql.org/docs/9.4/static/non-durability.html

And it's not clear which setting I should turn on.

What we do NOT want is to lose the entire table or corrupt the database.  We do 
want to gain speed though by not making DATA writes durable.

Which setting is appropriate for this use case?

At a glance it looks like a combination of
1) Turn off synchronous_commit
and possibly:
2)  Increase checkpoint_segments and checkpoint_timeout ; this reduces the 
frequency of checkpoints, but increases the storage requirements of /pg_xlog.
3) Turn off full_page_writes; there is no need to guard against partial page 
writes.

The point here is to never get a corrupt database, but in case of crash we 
might lose a few minutes of last transactions.
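In postgresql.conf terms, that combination looks roughly like the following (values are illustrative). One caution: strictly speaking, full_page_writes = off can itself risk corruption after a crash unless the storage guarantees atomic page writes, so it sits uneasily with the "never corrupt" goal, whereas synchronous_commit = off alone is safe in that respect (it only loses recently committed transactions).

synchronous_commit = off      # lose recent commits on crash, never corrupt
checkpoint_segments = 64      # 9.4 and earlier; fewer, larger checkpoints
checkpoint_timeout = 30min    # more /pg_xlog space, less checkpoint I/O
# full_page_writes = off      # only if storage guarantees atomic page writes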

Any suggestions please?

thank you,
-Alfred




Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-26 Thread Alfred Perlstein
JFYI we have 3 or 4 machines racked for the pgsql project in our DC. 

Tom informed me he would be lighting them up this week time permitting.  

Sent from my iPhone

 On Apr 26, 2014, at 6:15 PM, Stephen Frost sfr...@snowman.net wrote:
 
 Jim,
 
 * Jim Nasby (j...@nasby.net) wrote:
 On 4/22/14, 5:01 PM, Alfred Perlstein wrote:
 We also have colo space and power, etc.  So this would be the whole deal.  
 The cluster would be up for as long as needed.
 
 Are the machine specs sufficient?  Any other things we should look for?
 
 CC'd Tom on this email.
 
 Did anyone respond to this off-list?
 
 Yes, I did follow-up with Tom.  I'll do so again, as the discussion had
 died down.
 
 Would these machines be more useful as dedicated performance test servers 
 for the community or generic BenchFarm members?
 
 I don't believe they would be terribly useful as buildfarm systems; we
 could set up similar systems with VMs to just run the regression tests.
 Where I see these systems being particularly valuable would be as the
 start of our performance farm, and perhaps one of the systems as a PG
 infrastructure server.
 
Thanks!
 
Stephen


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-22 Thread Alfred Perlstein


On 4/22/14, 8:26 AM, Andrew Dunstan wrote:


On 04/22/2014 01:36 AM, Joshua D. Drake wrote:


On 04/21/2014 06:19 PM, Andrew Dunstan wrote:



If we never start we'll never get there.

I can think of several organizations that might be approached to donate
hardware.


Like .Org?

We have a hardware farm, a rack full of hardware and spindles. It 
isn't the most current but it is there.






I'm going away tomorrow for a few days RR. when I'm back next week I 
will set up a demo client running this module. If you can have a 
machine prepped for this purpose by then so much the better, otherwise 
I will have to drag out a box I recently rescued and have been waiting 
for something to use it with. It's more important that it's stable 
(i.e. nothing else running on it) than that it's very powerful. It 
could be running Ubuntu or some Redhattish variant or, yes, even FreeBSD.


cheers

andrew




Hey folks, I just spoke with our director of netops Tom Sparks here at 
Norse and we have a vested interest in Postgresql.  We can throw 
together a cluster of 4 machines with specs approximately in the range 
of dual quad core westmere with ~64GB of ram running FreeBSD 10 or 11.  
We can also do an Ubuntu install as well, or another Linux distro.  Please 
let me know if this would be something that the project could 
make use of.


We also have colo space and power, etc.  So this would be the whole 
deal.  The cluster would be up for as long as needed.


Are the machine specs sufficient?  Any other things we should look for?

CC'd Tom on this email.

-Alfred




Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Alfred Perlstein

On 4/21/14 4:10 AM, Andres Freund wrote:

Hi,

On 2014-04-20 11:24:38 +0200, Palle Girgensohn wrote:

I see performance degradation with PostgreSQL 9.3 vs 9.2 on FreeBSD, and I'm 
wondering who to poke to mitigate the problem. In reference to this thread [1], 
who were the FreeBSD people that Francois mentioned? If mmap needs to perform 
well in the kernel, I'd like to know of someone with FreeBSD kernel knowledge 
who is interested in working on mmap performance. If mmap is indeed the 
culprit (I've just tested 9.2.8 vs 9.3.4), I never isolated the mmap patch, 
although I believe Francois did just that with similar results.

If there are indeed such large regressions on FreeBSD we need to treat
them as postgres regressions. It's nicer not to add config options for
things that don't need it, but apparently that's not the case here.

Imo this means we need to add a GUC to control whether anon mmap() or sysv
shmem is to be used. In 9.3.

Greetings,

Andres Freund

Andres, thank you.  Speaking as a FreeBSD developer that would be a good 
idea.


-Alfred




Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Alfred Perlstein

On 4/21/14 8:45 AM, Andrew Dunstan wrote:


On 04/21/2014 11:39 AM, Magnus Hagander wrote:
On Mon, Apr 21, 2014 at 4:51 PM, Andres Freund 
and...@2ndquadrant.com wrote:


On 2014-04-21 10:45:24 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  If there are indeed such large regressions on FreeBSD we need
to treat
  them as postgres regressions. It's nicer not to add config
options for
  things that don't need it, but apparently that's not the case
here.

  Imo this means we need to add a GUC to control whether anon
mmap() or sysv
  shmem is to be used. In 9.3.

 I will resist this mightily.  One of the main reasons to switch
to mmap
 was so we would no longer have to explain about SysV shm
configuration.

It's still explained in the docs and one of the dynshm 
implementations

is based on sysv shmem. So I don't see this as a convincing reason.

Regressing installed OSs by 15-20% just to save a couple of lines of
docs and code seems rather unconvincing to me.


There's also the fact that even if it's changed in FreeBSD, that 
might be something that takes years to trickle out to whatever stable 
release people are actually using.


But do we really want a *guc* for it though? Isn't it enough (and in 
fact better) with a configure switch to pick the implementation when 
multiple are available, that could then be set by default for example 
by the freebsd ports build? That's a lot less overhead to keep 
dragging around...





That seems to make more sense. I can't imagine why this would be a 
runtime parameter as opposed to build time.


I am unsure of the true overhead of making this a runtime tunable so 
pardon if I'm asking for a lot.  From the perspective of both an OS 
developer and postgresql user (I am both) it really makes more sense to 
have it a runtime tunable for the following reasons:


From an OS developer's perspective, making this a runtime tunable lets us do 
the testing much more easily (instead of needing two compiled versions).
From a sysadmin's perspective, it makes switching to/from it a LOT easier in 
case the new mmap code exposes a stability or performance bug.


-Alfred








Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Alfred Perlstein

On 4/21/14 8:58 AM, Tom Lane wrote:

Andres Freund and...@2ndquadrant.com writes:

On 2014-04-21 11:45:49 -0400, Andrew Dunstan wrote:

That seems to make more sense. I can't imagine why this would be a runtime
parameter as opposed to build time.

Because that implies that packagers and porters need to make that
decision. If it's a GUC people can benchmark it and decide.

As against that, the packager would be more likely to get it right
(or even to know that there's an issue).


Can the package builder not set the default for the runtime tunable?

Honestly, we're about to select a db platform for another FreeBSD based 
system we are building; I'm strongly hoping that we can get back to 
sysvshm easily, otherwise we may have to select another store.


-Alfred (who still remembers back when Tom had a login on our primary db 
to help us. :) )






Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Alfred Perlstein

On 4/21/14 9:13 AM, Stephen Frost wrote:

* Alfred Perlstein (alf...@freebsd.org) wrote:

Can the package builder not set the default for the runtime tunable?

Yeah, I was thinking about that also, but at least in this case it seems
pretty clear that the 'right' answer is known at build time.


Honestly we're about to select a db platform for another FreeBSD
based system we are building; I'm strongly hoping that we can get back
to sysvshm easily, otherwise we may have to select another store.

Is there no hope of this getting fixed in FreeBSD..?  PG wouldn't be the
only application impacted by this, I'm sure...
There is definitely hope; however, changes to the FreeBSD vm are taken as 
seriously as changes to the core of Postgresql's store.  In addition, changes 
to the vm are somewhat in the same realm of complexity as Postgresql's store, 
so a fix may not be coming in the next few days/weeks, but rather a month or 
two.  I am not sure if an easy fix is available in FreeBSD but we will see in 
short order.


I need to do some research.  I work with Adrian (FreeBSD kernel dev 
mentioned earlier in the thread), I'll grab him today and discuss what 
the issue may be.


-Alfred





Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Alfred Perlstein

On 4/21/14 9:24 AM, Andrew Dunstan wrote:


On 04/21/2014 11:59 AM, Alfred Perlstein wrote:

On 4/21/14 8:45 AM, Andrew Dunstan wrote:


On 04/21/2014 11:39 AM, Magnus Hagander wrote:
On Mon, Apr 21, 2014 at 4:51 PM, Andres Freund 
and...@2ndquadrant.com wrote:


On 2014-04-21 10:45:24 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  If there are indeed such large regressions on FreeBSD we need
to treat
  them as postgres regressions. It's nicer not to add config
options for
  things that don't need it, but apparently that's not the case
here.

  Imo this means we need to add a GUC to control whether anon
mmap() or sysv
  shmem is to be used. In 9.3.

 I will resist this mightily.  One of the main reasons to switch
to mmap
 was so we would no longer have to explain about SysV shm
configuration.

It's still explained in the docs and one of the dynshm 
implementations
is based on sysv shmem. So I don't see this as a convincing 
reason.


Regressing installed OSs by 15-20% just to save a couple of 
lines of

docs and code seems rather unconvincing to me.


There's also the fact that even if it's changed in FreeBSD, that 
might be something that takes years to trickle out to whatever 
stable release people are actually using.


But do we really want a *guc* for it though? Isn't it enough (and 
in fact better) with a configure switch to pick the implementation 
when multiple are available, that could then be set by default for 
example by the freebsd ports build? That's a lot less overhead to 
keep dragging around...





That seems to make more sense. I can't imagine why this would be a 
runtime parameter as opposed to build time.


I am unsure of the true overhead of making this a runtime tunable so 
pardon if I'm asking for a lot.  From the perspective of both an OS 
developer and postgresql user (I am both) it really makes more sense 
to have it a runtime tunable for the following reasons:


From an OS developer's perspective, making this a runtime tunable lets us 
do the testing much more easily (instead of needing two compiled versions).
From a sysadmin's perspective, it makes switching to/from it a LOT easier 
in case the new mmap code exposes a stability or performance bug.





1. OS developers are not the target audience for GUCs. If the OS 
developers want to test and can't be bothered with building with a 
couple of different parameters then I'm not very impressed.


2. We should be trying to get rid of GUCs where possible, and only add 
them when we must. The more there are the more we confuse users. If a 
packager can pick a default surely they can pick build options too.
Thank you for the lecture Andrew!  Really pleasant way to treat a user 
and a fan of the system. :)




-Alfred




Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Alfred Perlstein

On 4/21/14 9:34 AM, Stephen Frost wrote:

* Alfred Perlstein (alf...@freebsd.org) wrote:

There is definitely hope; however, changes to the FreeBSD vm are
taken as seriously as changes to the core of Postgresql's store.
In addition, changes to the vm are somewhat in the same realm of
complexity as Postgresql's store, so a fix may not be coming in the
next few days/weeks, but rather a month or two.  I am not sure if an
easy fix is available in FreeBSD but we will see in short order.

This has been known for over a year.. :(

I know!  I remember warning y'all about it back at pgcon last year. :)



I need to do some research.  I work with Adrian (FreeBSD kernel dev
mentioned earlier in the thread), I'll grab him today and discuss
what the issue may be.

Hopefully that'll get things moving in the right direction, finally..
Sure, to be fair, we are under the gun here for a product; it may just 
mean that the end result of that conversation is mysql.


I'm hoping we can use Postgresql as I've been a huge fan since 1999.  I 
based my first successful project on it and had a LOT of help from the 
pgsql community, Tom, Bruce and we even contracted Vadim for some work 
on incremental vacuums!


-Alfred




Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Alfred Perlstein

On 4/21/14 9:38 AM, Andrew Dunstan wrote:


On 04/21/2014 12:25 PM, Alfred Perlstein wrote:






1. OS developers are not the target audience for GUCs. If the OS 
developers want to test and can't be bothered with building with a 
couple of different parameters then I'm not very impressed.


2. We should be trying to get rid of GUCs where possible, and only 
add them when we must. The more there are the more we confuse users. 
If a packager can pick a default surely they can pick build options 
too.
Thank you for the lecture Andrew!  Really pleasant way to treat a 
user and a fan of the system. :)





I confess to being mightily confused.


Sure, to clarify:

Andrew, you just told someone who in a db stack sits both below (as a 
pgsql user 15 years) and above (as a FreeBSD kernel dev 15 years) your 
software what they really need.


-Alfred





Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Alfred Perlstein


On 4/21/14, 9:51 AM, Andrew Dunstan wrote:


On 04/21/2014 12:44 PM, Alfred Perlstein wrote:

On 4/21/14 9:38 AM, Andrew Dunstan wrote:


On 04/21/2014 12:25 PM, Alfred Perlstein wrote:






1. OS developers are not the target audience for GUCs. If the OS 
developers want to test and can't be bothered with building with a 
couple of different parameters then I'm not very impressed.


2. We should be trying to get rid of GUCs where possible, and only 
add them when we must. The more there are the more we confuse 
users. If a packager can pick a default surely they can pick build 
options too.
Thank you for the lecture Andrew!  Really pleasant way to treat a 
user and a fan of the system. :)





I confess to being mightily confused.


Sure, to clarify:

Andrew, you just told someone who in a db stack sits both below (as a 
pgsql user 15 years) and above (as a FreeBSD kernel dev 15 years) 
your software what they really need.






I told you what *we* (i.e. the PostgreSQL community) need, IMNSHO (and 
speaking as a Postgres developer and consultant of 10 or so years 
standing).


How high on the hierarchy of PostgreSQL's needs is making a single 
option a runtime tunable versus a compile-time thing?  I mean, seriously, 
you mean to stick on this one point when one of your users is asking you 
about this?  That is pretty concerning to me.


-Alfred







Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Alfred Perlstein


On 4/21/14, 9:51 AM, Andres Freund wrote:

On 2014-04-21 09:42:06 -0700, Alfred Perlstein wrote:

Sure, to be fair, we are under the gun here for a product, it may just mean
that the end result of that conversation is mysql.

Personally, arguments in that vein are removing just about any incentive
I have to work on the problem.


I was just explaining that we have a timeline over here, and while that 
may disincentivize you from providing what we need, it would be very unfair.


In that I mean sometimes the reality of a situation can be inconvenient 
and for that I do apologize.


What I am seeing here is unfortunately a very strong departure from 
FreeBSD support by several of the developers in the community.  In 
fact, over drinks at pgcon last year there were a TON of jokes making fun 
of FreeBSD users and developers, which I took in stride as professional 
joking with alcohol involved.  I thought it was pretty funny.  However, a 
year later I realize that there appears to be a real problem with 
FreeBSD in the pgsql community.


There are other Linux centric dbs to pick from.  If pgsql is just 
another Linux centric DB then that is unfortunate but something I can 
deal with.


-Alfred





Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Alfred Perlstein


On 4/21/14, 9:52 AM, Alvaro Herrera wrote:

Alfred Perlstein wrote:


I am unsure of the true overhead of making this a runtime tunable so
pardon if I'm asking for a lot.  From the perspective of both an
OS developer and postgresql user (I am both) it really makes more
sense to have it a runtime tunable for the following reasons:

 From an OS developer making this a runtime allows us to much more
easily do the testing (instead of needing two compiled versions).
 From a sysadmin perspective it makes switching to/from a LOT easier
in case the new mmap code exposes a stability or performance bug.

In this case, AFAICS the only overhead of a runtime option (what we call
a GUC) is the added potential for user confusion, and the necessary
documentation.  If we instead go for a compile-time option, both items
become smaller.

In any case, I don't see that there's much need for a runtime option,
really; you already know that the mmap code path is slower in FreeBSD.
You only need to benchmark both options once the FreeBSD vm code itself
is fixed, right?

In fact, it might not even need to be a configure option; I would
suggest a pg_config_manual.h setting instead, and perhaps tweaks to the
src/template/freebsd file to enable it automatically on the broken
FreeBSD releases.  We could then, in the future, have the template
itself turn the option off for the future FreeBSD release that fixes the
problem.

That is correct, until you're in prod and suddenly one option becomes 
unstable, or you want to try a quick kernel patch without rebooting.


Look, this is an argument I've lost time and time again in open source 
software communities: the idea of a runtime option as opposed to a 
compile-time one really seems to hit people the wrong way.


I think from now on it just makes sense to sit back and let whatever 
happens happen.


-Alfred




Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Alfred Perlstein


On 4/21/14, 11:14 AM, Stephen Frost wrote:

Alfred,

* Alfred Perlstein (alf...@freebsd.org) wrote:

On 4/21/14, 9:51 AM, Andres Freund wrote:

On 2014-04-21 09:42:06 -0700, Alfred Perlstein wrote:

Sure, to be fair, we are under the gun here for a product, it may just mean
that the end result of that conversation is mysql.

Personally, arguments in that vein are removing just about any incentive
I have to work on the problem.

I was just explaining that we have a timeline over here and while
that may disincentivize you from providing what we need, it would be
very unfair.

I'm pretty sure Andres was referring to the part where there's a
'threat' to move to some other platform due to a modest performance
degradation, as if it's the only factor involved in making a decision
among the various RDBMS options.  If that's really your deciding
criteria instead of the myriad of other factors, I daresay you have your
priorities mixed up.


There are other Linux centric dbs to pick from.  If pgsql is just
another Linux centric DB then that is unfortunate but something I
can deal with.

These attacks really aren't going to get you anywhere.  We're talking
about a specific performance issue that FreeBSD has and how much PG
(surely not the only application impacted by this issue) should bend
to address it, even though the FreeBSD folks were made aware of the
issue over a year ago and have done nothing to address it.

Moreover, you'd like to also define the way we deal with the issue as
being to make it runtime configurable rather than as a compile-time
option, even though 90% of the users out there won't understand the
difference nor would know how to correctly set it (and, in many cases,
may end up making the wrong decision because it's the default for
other platforms, unless we add more code to address this at initdb
time).

Basically, it doesn't sound like you're terribly concerned with the
majority of our user base, even on FreeBSD, and would prefer to try
and browbeat us into doing what you've decided is the correct solution
because it'd work better for you.

I've been guiltly of the same in the past and it's not fun having to
back off from a proposal when it's pointed out that there's a better
option, particularly when it doesn't seem like the alternative is
better for me, but that's just part of working in any large project.

Stephen, please calm down on the hyperbole, seriously, picking another 
db is not an attack.


I was simply asking for a feature that would make my life easier as both 
an admin deploying postgresql and a kernel dev attempting to fix a 
problem.  I'm one guy, probably the only guy asking right now.  Honestly, 
the thought of needing to compile two versions of postgresql to compare sysv 
vs mmap performance would take more time than I would like to devote 
to the issue when my time is already limited.


Again, it was an ask; you are free to do what you like, the same way you 
were free to ignore my advice at pgcon about mmap being less efficient.  
That does not make what I'm saying an attack.  Just like when 
interviewing people, choosing a different candidate for a job is not an 
attack on the other candidates!


-Alfred




Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Alfred Perlstein


On 4/21/14, 12:47 PM, Stephen Frost wrote:
  Asking for help to address the FreeBSD performance would have been 
much better received. Thanks, Stephen 


That is exactly what I did: I asked for a version of postgresql that was 
easy to switch at runtime between two behaviors.


That would make it a LOT easier to run a few scripts and make sure I got 
the correct binary without having to munge PREFIX and a bunch of PATH 
and other tools to get my test harness to DTRT.


-Alfred





Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-21 Thread Alfred Perlstein


On 4/21/14, 2:23 PM, Stephen Frost wrote:

Alfred,

* Alfred Perlstein (alf...@freebsd.org) wrote:

On 4/21/14, 12:47 PM, Stephen Frost wrote:

  Asking for help to address the FreeBSD performance would have
been much better received. Thanks, Stephen

That is exactly what I did, I asked for a version of postgresql that
was easy to switch at runtime between two behaviors.

That would make it a LOT easier to run a few scripts and make sure I
got the correct binary without having to munge PREFIX and a bunch of
PATH and other tools to get my test harness to DTRT.

I'm sure one of the hackers would be happy to provide you with a patch
to help you with your testing.

That would be fine.

That's quite a different thing from asking for a GUC to be provided and
then supported over the next 5 years as part of the core release, which
is what I believe we all thought you were asking for.
I did not know that GUCs were not classified into 
experimental/non-experimental.  The fact that a single GUC would need 
to be supported for 5 years is definitely something to consider.  Now I 
understand the pushback a little more.


-Alfred





[HACKERS] PGCON meetup FreeNAS/FreeBSD: In Ottawa Tue Wed.

2013-05-20 Thread Alfred Perlstein

Hello PostgreSQL Hackers,

I am now in Ottawa, last week we wrapped up the BSDCon and I was hoping 
to chat with a few Postgresql developers in person about using 
Postgresql in FreeNAS and offering it as an extension to the platform as 
a plug-in technology.  Unfortunately due to time constraints I can not 
attend the entire conference and I am only in town until Wednesday at noon.


I'm hoping there's a good time to talk to a few developers about 
Postgresql + FreeNAS before I have to depart back to the bay area.


Some info on me:  My name is Alfred Perlstein, I am a FreeBSD developer 
and FreeNAS project lead.  I am the VP of Software Engineering at 
iXsystems.  I have been a fan of Postgresql for many years.  In the 
early 2000s we built a high speed web tracking application on top of 
Postgresql and worked closely with the community to shake out 
performance issues and bugs, so closely that Tom Lane and Vadim Mikheev had 
logins on our box.  Since that time I have tried to get Postgresql into 
as many places as possible.


Some info on the topics I wanted to briefly discuss:

1) Using Postgresql as the config store for FreeNAS.
We currently use SQLite; it fits our needs until we get to the point 
of replication between HA (high availability) units.  Then we are forced 
to manually sync data between configurations.  A discussion on how we 
might do this better using Postgresql, while still maintaining our ease 
of config export (single file) and small footprint, would be interesting.


2) Postgresql plugin for FreeNAS.
Flip a switch and suddenly your file server is also serving enterprise 
data.  We currently have a plug-in architecture, but would like to 
discuss the possibility of a tighter integration so that Postgresql 
looks like a more cohesive addition to FreeNAS.


3) Statistic monitoring / EagleEye
In FreeBSD/FreeNAS I have developed a system called EagleEye.  EagleEye 
is a system where all mibs are easily exportable with timestamps in a 
common format (for now YAML and modified CSV), which is then consumed by a 
utility that provides graphs.  The entire point of EagleEye is 
to eventually upstream the modifications, to future-proof statistics 
tracking in the FreeBSD and FreeNAS systems.  I have spoken with some 
illumos/ZFS developers and they are interested as well.


I think that is all I have, please drop me a note if you'll have some 
time in Ottawa today, tomorrow or early Wednesday.  I'd love to discuss 
and buy some beers for the group.


thank you,
-Alfred Perlstein
VP Software Engineering, iXsystems.





Re: [HACKERS] New Linux xfs/reiser file systems

2001-05-02 Thread Alfred Perlstein

* Bruce Momjian [EMAIL PROTECTED] [010502 14:01] wrote:
 I was talking to a Linux user yesterday, and he said that performance
 using the xfs file system is pretty bad.  He believes it has to do with
 the fact that fsync() on log-based file systems requires more writes.
 
 With a standard BSD/ext2 file system, WAL writes can stay on the same
 cylinder to perform fsync.  Is that true of log-based file systems?
 
 I know xfs and reiser are both log based.  Do we need to be concerned
 about PostgreSQL performance on these file systems?  I use BSD FFS with
 soft updates here, so it doesn't affect me.

The problem with log-based filesystems is that they most likely
do not know the consequences of a write, so an fsync on a file may
require double writing, to both the log and the real portion of
the disk.  They can also exhibit the problem that an fsync may
cause all pending writes to require scheduling, unless the log is
constructed on the fly rather than incrementally.

There was also the problem, brought up recently, that
certain versions (maybe all?) of Linux perform fsync() in a very
non-optimal manner; if the user is able to use the O_FSYNC option
rather than fsync() he may see a performance increase.

But his guess is probably nearly as good as mine. :)
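As a footnote on the fsync-vs-O_FSYNC point: PostgreSQL exposes this choice 
through the wal_sync_method setting, so the two approaches can be compared 
without patching anything.  A hedged sketch (the available option names vary 
by platform and version):

```ini
# wal_sync_method selects how WAL writes are forced to disk.
wal_sync_method = open_sync   # open the WAL file with O_SYNC, skipping the
                              # separate fsync() call
#wal_sync_method = fsync      # write normally, then call fsync()
```

Later releases also ship a pg_test_fsync utility that benchmarks each 
available method on the local disk.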


-- 
-Alfred Perlstein - [[EMAIL PROTECTED]]
http://www.egr.unlv.edu/~slumos/on-netbsd.html

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] New Linux xfs/reiser file systems

2001-05-02 Thread Alfred Perlstein

* Bruce Momjian [EMAIL PROTECTED] [010502 15:20] wrote:
  The problem with log based filesystems is that they most likely
  do not know the consequences of a write so an fsync on a file may
  require double writing to both the log and the real portion of
  the disk.  They can also exhibit the problem that an fsync may
  cause all pending writes to require scheduling unless the log is
  constructed on the fly rather than incrementally.
 
 Yes, this double-writing is a problem.  Suppose you have your WAL on a
 separate drive.  You can fsync() WAL with zero head movement.  With a
 log based file system, you need two head movements, so you have gone
 from zero movements to two.

It may be worse depending on how the filesystem actually does
journalling.  I wonder if an fsync() may cause ALL pending
metadata to be updated (even metadata not related to the
PostgreSQL files).

Do you know if reiser or xfs have this problem?

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]]
Daemon News Magazine in your snail-mail! http://magazine.daemonnews.org/




[HACKERS] COPY commands could use an enhancement.

2001-04-30 Thread Alfred Perlstein

It would be very helpful if the COPY command could be expanded
in order to provide positional parameters.

I noticed a while back that it didn't, and it can really hurt
someone who tries to use pg_dump to move data from one database
to another when the fields in the tables happen to have been
created in different orders.

Basically:
COPY webmaster FROM stdin;

could become:
COPY webmaster FIELDS id, name, ssn FROM stdin;

this way when sourcing it would know where to place the
fields.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]]
Daemon News Magazine in your snail-mail! http://magazine.daemonnews.org/




Re: [HACKERS] COPY commands could use an enhancement.

2001-04-30 Thread Alfred Perlstein

* Tom Lane [EMAIL PROTECTED] [010430 08:37] wrote:
 Alfred Perlstein [EMAIL PROTECTED] writes:
  It would be very helpful if the COPY command could be expanded
  in order to provide positional parameters.
 
 I think it's a bad idea to try to expand COPY into a full-tilt data
 import/conversion utility, which is the direction that this sort of
 suggestion is headed in.  COPY is designed as a simple, fast, reliable,
 low-overhead data transfer mechanism for backup and restore.  The more
 warts we add to it, the less well it will serve that purpose.

Honestly, it would be hard for COPY to serve people's needs any
less than it does now; it really makes sense for it to be able to
parse positional parameters, for both speed and correctness.

 Example: if we allow selective column import, what do we do with missing
 columns?

The same as is already done: if you initiate a COPY into a 5-column
table using only 4 columns of copy data, the fifth is left empty.

 Must COPY now be able to handle insertion of default-value
 expressions?

No, COPY should stay what it is: simple, but at the same time
useful enough for bulk transfer without painful contortions and
fear of modifying tables.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]]
Represent yourself, show up at BABUG http://www.babug.org/




Re: [HACKERS] Re: SAP-DB

2001-04-29 Thread Alfred Perlstein

* Bruce Momjian [EMAIL PROTECTED] [010429 10:44] wrote:
   I swore I'd never post to the hackers list again, but this is an amazing
   statement by Bruce.
   
   Boy, the robustness of the software is determined by the number of characters
   in the directory name?
   
   By the languages used?
  
  [Snip]
  
  My guess is that Bruce was implying that the code was obfuscated. It is a
  common trick for closed source to be open but not really.
  
  I don't think it was any sort of technology snobbery. Far be it for me to
  suggest an explanation to the words of others, that is just how I read it.
 
 I don't think they intentionally confused the code.
 
 The real problem I see in that it was very hard for me to find anything
 in the code.  I would be interested to see if others can find stuff.

I think this is a general problem in a lot of projects: you open
up foo.c and say "what the heck is this?"; after a few hours of
studying the source you finally figure out it is something that
does a minuscule part X of massive part Y, and by then you're too
engrossed to write a little banner for the file or directory
explaining what it's for, and you incorrectly assume that even if
you did, it wouldn't help a user unless he went through the same
painful steps that you did.

Been there, done that.. er, actually, still there, mostly still
doing that.  :)

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]]
http://www.egr.unlv.edu/~slumos/on-netbsd.html




Re: [HACKERS] Thanks, naming conventions, and count()

2001-04-29 Thread Alfred Perlstein

* Bruce Momjian [EMAIL PROTECTED] [010429 20:14] wrote:

 Yes, I like that idea, but the problem is that it is hard to update just
 one table in the file.  You sort of have to update the entire file each
 time a table changes.  That is why I liked symlinks because they are
 per-table, but you are right that the symlink creation could fail
 because the new table file was never created or something, leaving the
 symlink pointing to nothing.  Not sure how to address this.  Is there a
 way to update a flat file when a single table changes?

Sort of, if that flat file is in the form of:
123456;tablename   
33;another_table   

i.e., each line is padded to a fixed length.
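To make that concrete, here is a minimal sketch (the names and the record length are my own, not anything PostgreSQL-specific) of why fixed-length lines help: record i lives at byte offset i * RECLEN, so one table's entry can be rewritten in place with a single pwrite(), without touching the rest of the file.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define RECLEN 32       /* fixed line length, newline included (illustrative) */

/* Rewrite record `recno` in place; every record is exactly RECLEN bytes. */
int
update_record(const char *path, int recno, long relfilenode, const char *relname)
{
        char rec[RECLEN];
        char buf[RECLEN + 1];
        int fd;
        ssize_t n;

        snprintf(rec, sizeof(rec), "%ld;%s", relfilenode, relname);
        /* Pad with trailing spaces to RECLEN - 1 chars, then a newline. */
        snprintf(buf, sizeof(buf), "%-*s\n", RECLEN - 1, rec);

        if ((fd = open(path, O_WRONLY)) < 0)
                return -1;
        n = pwrite(fd, buf, RECLEN, (off_t)recno * RECLEN);
        close(fd);
        return n == RECLEN ? 0 : -1;
}
```

With variable-length lines you would have to rewrite everything after the changed record; fixed-length records make the single-table update O(1).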


-- 
-Alfred Perlstein - [[EMAIL PROTECTED]]
Daemon News Magazine in your snail-mail! http://magazine.daemonnews.org/




Re: [HACKERS] Thanks, naming conventions, and count()

2001-04-29 Thread Alfred Perlstein

* Tom Lane [EMAIL PROTECTED] [010429 23:12] wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  big problem is that there is no good way to make the symlinks reliable
  because in a crash, the symlink could point to a table creation that got
  rolled back or the renaming of a table that got rolled back.
 
 Yes.  Have you already forgotten the very long discussion we had about
 this some months back?  There is no way to provide a reliable symlink
 mapping without re-introducing all the same problems that we went to
 numeric filenames to avoid.  Now if you want an *UNRELIABLE* symlink
 mapping, maybe we could talk about it ... but IMHO such a feature would
 be worse than useless.  Murphy's law says that the symlinks would be
 right often enough to mislead dbadmins into trusting them, and wrong
 exactly when it would do the most damage to trust them.  The same goes
 for other methods of unreliably exporting the name-to-number mapping,
 such as dumping it into a flat file.
 
 We do need to document how to get the mapping (ie, select relfilenode,
 relname from pg_class).  But I really doubt that an automated method
 for exporting the mapping would be worth the cycles it would cost,
 even if it could be made reliable which it can't.

Perhaps an external tool to rebuild the symlink state that could be
run on an offline database.  But I'm sure you have more important
things to do. :)

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]]
Daemon News Magazine in your snail-mail! http://magazine.daemonnews.org/




Re: [HACKERS] 7.1 vacuum

2001-04-27 Thread Alfred Perlstein

* Magnus Naeslund(f) [EMAIL PROTECTED] [010426 21:17] wrote:
 How does 7.1 work now with the vacuum and all?
 
 Does it go for indexes by default, even when i haven't run a vacuum at all?
 Does vacuum lock up postgres? It says the analyze part shouldn't, but how's
 that for all of the vacuum?
 
 A 7.0.3 db we have here we are forced to run vacuum every hour to get an
 acceptable speed, and while doing that vacuum (5-10 minutes) it totally
 blocks our application that's mucking with the db.

http://people.freebsd.org/~alfred/vacfix/

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]]
Instead of asking why a piece of software is using 1970s technology,
start asking why software is ignoring 30 years of accumulated wisdom.




[HACKERS] Re: 7.1 vacuum

2001-04-27 Thread Alfred Perlstein

* mlw [EMAIL PROTECTED] [010427 05:50] wrote:
 Alfred Perlstein wrote:
  
  * Magnus Naeslund(f) [EMAIL PROTECTED] [010426 21:17] wrote:
   How does 7.1 work now with the vacuum and all?
  
   Does it go for indexes by default, even when i haven't run a vacuum at all?
   Does vacuum lock up postgres? It says the analyze part shouldn't, but how's
   that for all of the vacuum?
  
   A 7.0.3 db we have here we are forced to run vacuum every hour to get an
   acceptable speed, and while doing that vacuum (5-10 minutes) it totally
   blocks our application that's mucking with the db.
  
  http://people.freebsd.org/~alfred/vacfix/
 
 What's the deal with vacuum lazy in 7.1? I was looking forward to it. It was
 never clear whether or not you guys decided to put it in.
 
 If it is in as a feature, how does one use it?
 If it is a patch, how does one get it?

If you actually download and read the enclosed READMEs, it's pretty
clear.

 If it is neither a patch nor an existing feature, has development stopped?

I have no idea; I haven't been tracking PostgreSQL all that much
since leaving the place where we contracted that work.


-- 
-Alfred Perlstein - [[EMAIL PROTECTED]]
Represent yourself, show up at BABUG http://www.babug.org/




Re: [HACKERS] CVS tags for betas and release candidate

2001-03-27 Thread Alfred Perlstein

* The Hermit Hacker [EMAIL PROTECTED] [010327 04:53] wrote:
 On Mon, 26 Mar 2001, Matthias Juchem wrote:
 
  Hi there.
 
  I was just looking for the CVS tags for downloading the beta6 and the
  RC1 of 7.1 but there are only the following tags:
 
  REL_7_1_BETA2
  REL_7_1_BETA3
  REL_7_1
 
  Aren't there tags for the versions I am looking for?
 
 Nope ... doing the tags didn't work as well as was hoped, so we've just
 been using date ranges instead ... release itself will be tag'd ...

You know you can nuke tags, right?

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
Daemon News Magazine in your snail-mail! http://magazine.daemonnews.org/




Re: [HACKERS] pgindent run?

2001-03-21 Thread Alfred Perlstein

* Bruce Momjian [EMAIL PROTECTED] [010321 21:14] wrote:
  The Hermit Hacker [EMAIL PROTECTED] writes:
   and most times, those have to be merged into the source tree due to
   extensive changes anyway ... maybe we should just get rid of the use of
   pgindent altogether?
  
  I think pgindent is a good thing; the style of different parts of the
  code would vary too much without it.  I'm only unhappy about the risk
  issues of running it at this late stage of the release cycle.
 
 This is the usual discussion.  Some like it, some don't like the risk,
 some don't like the timing.  I don't think we ever came up with a better
 time than before RC, though I think we could do it a little earlier in
 beta if people were not holding patches during that period.  It is the
 beta patching folks that we have the most control over.

It seems that you guys are dead set on using this pgindent tool,
this is cool, we'd probably use some indentation tool on the FreeBSD
sources if there was one that met our code style(9) guidelines.

With that said, it really scares the crud out of me to see those
massive pgindent runs right before you guys do a release.

It would make a lot more sense to run pgindent after applying
each patch.  This way you don't lose the history.

You want to be upset with yourself Bruce?  Go into a directory and type:

cvs annotate any file that's been pgindented

cvs annotate is a really, really handy tool; unfortunately these
indent runs defeat this very useful tool, as well as doing a major
job of obfuscating the code changes.

It's not like you guys have a massive devel team with new people
each week facing a steep committer learning curve, so running
pgindent as patches are applied should work.

There's also the argument that a developer's pgindent run may force
a contributor to resolve conflicts; while this is true, it's also
true that you guys expect diffs to be in context format, comments
to be in English, function prototypes to be new style, etc., etc.

I think contributors can deal with this.

just my usual 20 cents. :)

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
Daemon News Magazine in your snail-mail! http://magazine.daemonnews.org/




Re: [HACKERS] Fw: [vorbis-dev] ogg123: shared memory by mmap()

2001-03-20 Thread Alfred Perlstein

* Bruce Momjian [EMAIL PROTECTED] [010320 14:10] wrote:
The patch below adds:
   
- acinclude.m4:  A new macro A_FUNC_SMMAP to check that sharing
   pages
  through mmap() works.  This is taken from Joerg Schilling's star.
- configure.in:  A_FUNC_SMMAP
- ogg123/buffer.c:  If we have a working mmap(), use it to create
  a region of shared memory instead of using System V IPC.
   
Works on BSD.  Should also work on SVR4 and offspring (Solaris),
and Linux.
  
  This is a really bad idea performance wise.  Solaris has a special
  code path for SYSV shared memory that doesn't require tons of swap
  tracking structures per-page/per-process.  FreeBSD also has this
  optimization (it's off by default, but should work since FreeBSD
  4.2 via the sysctl kern.ipc.shm_use_phys=1)
 
  
  Both OSes use a trick of making the pages non-pageable; this allows
  significant savings in kernel space required for each attached
  process, as well as the use of large pages, which reduce the number
  of TLB faults your processes will incur.
 
 That is interesting.  BSDi has SysV shared memory as non-pagable, and I
 always thought of that as a bug.  Seems you are saying that having it
 pagable has a significant performance penalty.  Interesting.

Yes, having it pageable is actually sort of bad.

It doesn't allow you to do several important optimizations.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]





Re: [HACKERS] Final Call: RC1 about to go out the door ...

2001-03-20 Thread Alfred Perlstein

* Tom Lane [EMAIL PROTECTED] [010320 10:21] wrote:
 The Hermit Hacker [EMAIL PROTECTED] writes:
  Speak now, or forever hold your piece (where forever is the time
  between now and RC1 is packaged) ...
 
 I rather hope it's *NOT* 

And still no LAZY vacuum.  *sigh*

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]





Re: [HACKERS] Fw: [vorbis-dev] ogg123: shared memory by mmap()

2001-03-19 Thread Alfred Perlstein

WOOT WOOT! DANGER WILL ROBINSON!

 - Original Message -
 From: "Christian Weisgerber" [EMAIL PROTECTED]
 Newsgroups: list.vorbis.dev
 To: [EMAIL PROTECTED]
 Sent: Saturday, March 17, 2001 12:01 PM
 Subject: [vorbis-dev] ogg123: shared memory by mmap()
 
 
  The patch below adds:
 
  - acinclude.m4:  A new macro A_FUNC_SMMAP to check that sharing
 pages
through mmap() works.  This is taken from Joerg Schilling's star.
  - configure.in:  A_FUNC_SMMAP
  - ogg123/buffer.c:  If we have a working mmap(), use it to create
a region of shared memory instead of using System V IPC.
 
  Works on BSD.  Should also work on SVR4 and offspring (Solaris),
  and Linux.

This is a really bad idea performance wise.  Solaris has a special
code path for SYSV shared memory that doesn't require tons of swap
tracking structures per-page/per-process.  FreeBSD also has this
optimization (it's off by default, but should work since FreeBSD
4.2 via the sysctl kern.ipc.shm_use_phys=1)

Both OSes use a trick of making the pages non-pageable; this allows
significant savings in kernel space required for each attached
process, as well as the use of large pages, which reduce the number
of TLB faults your processes will incur.

Anyhow, if you could make this a runtime option it wouldn't be so
evil, but as a compile time option, it's a really bad idea for
Solaris and FreeBSD.

--
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]




Re: [HACKERS] ODBC/FreeBSD/LinuxEmulation/RPM?

2001-03-19 Thread Alfred Perlstein

* Larry Rosenman [EMAIL PROTECTED] [010319 10:35] wrote:
 
 Is there any way to get just the ODBC RPM to install with OUT
 installing the whole DB? 
 
 I have a strange situation:
 
 StarOffice 5.2 (Linux) Running under FreeBSD Linux Emulation
 PG running NATIVE.
 
 I want the two to talk, using ODBC.
 
 How do I make this happen?

rpm2cpio pg_rpmfile.rpm > pg_rpmfile.cpio
cpio -i < pg_rpmfile.cpio
tar xzvf pg_rpmfile.tgz

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]





Re: [HACKERS] ODBC/FreeBSD/LinuxEmulation/RPM?

2001-03-19 Thread Alfred Perlstein

* Alfred Perlstein [EMAIL PROTECTED] [010319 11:27] wrote:
 * Larry Rosenman [EMAIL PROTECTED] [010319 10:35] wrote:
  
  Is there any way to get just the ODBC RPM to install with OUT
  installing the whole DB? 
  
  I have a strange situation:
  
  StarOffice 5.2 (Linux) Running under FreeBSD Linux Emulation
  PG running NATIVE.
  
  I want the two to talk, using ODBC.
  
  How do I make this happen?
 
 rpm2cpio pg_rpmfile.rpm > pg_rpmfile.cpio
 cpio -i < pg_rpmfile.cpio
 tar xzvf pg_rpmfile.tgz

Sorry, I was just waking up when I wrote this... the idea is to
extract the rpm and then just grab the required ODBC files.

best of luck,
-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]





Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-18 Thread Alfred Perlstein

* William K. Volkman [EMAIL PROTECTED] [010318 11:56] wrote:
 The Hermit Hacker wrote:
  
  But, with shared libraries, are you really pulling in a "whole
  thread-support library"?  My understanding of shared libraries (altho it
  may be totally off) was that instead of pulling in a whole library, you
  pulled in the bits that you needed, pretty much as you needed them ...
 
 Just by making a thread call libc changes personality to use thread
 safe routines (I.E. add mutex locking).  Use one thread feature, get
 the whole set...which may not be that bad.

Actually it can be pretty bad.  The locked bus cycles needed for
mutex operations are very, very expensive; not something you want
to do unless you really, really need to.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]





Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-18 Thread Alfred Perlstein

* Larry Rosenman [EMAIL PROTECTED] [010318 14:17] wrote:
 * Tom Lane [EMAIL PROTECTED] [010318 14:55]:
  Alfred Perlstein [EMAIL PROTECTED] writes:
   Just by making a thread call libc changes personality to use thread
   safe routines (I.E. add mutex locking).  Use one thread feature, get
   the whole set...which may not be that bad.
  
   Actually it can be pretty bad.  Locked bus cycles needed for mutex
   operations are very, very expensive, not something you want to do
   unless you really really need to do it.
  
  It'd be interesting to try to get some numbers about the actual cost
  of using a thread-aware libc, on platforms where there's a difference.
  Shouldn't be that hard to build a postgres executable with the proper
  library and run some benchmarks ... anyone care to try?
 I can get the code compiled, but don't have the skills to generate
 a test case worthy of anything

There's a 'make test' (or maybe 'regression'?) target that runs a
suite of tests on the database; you could use that as a
bench/timer.  You could also try MySQL's "crashme" script.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]





Re: Re[4]: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-16 Thread Alfred Perlstein

* Xu Yifeng [EMAIL PROTECTED] [010316 01:15] wrote:
 Hello Alfred,
 
 Friday, March 16, 2001, 3:21:09 PM, you wrote:
 
 AP * Xu Yifeng [EMAIL PROTECTED] [010315 22:25] wrote:
 
  Could anyone consider fork a syncer process to sync data to disk ?
  build a shared sync queue, when a daemon process want to do sync after
  write() is called, just put a sync request to the queue. this can release
   process from blocked on writing as soon as possible. multiple sync
   requests for one file can be merged when the request is being inserted into
  the queue.
 
 AP I suggested this about a year ago. :)
 
 AP The problem is that you need that process to potentially open and close
 AP many files over and over.
 
 AP I still think it's somewhat of a good idea.
 
 I am not a DBMS guru.

Hah, same here. :)

 couldn't the syncer process cache opened files? is there any problem I
 didn't consider ?

1) IPC latency: the amount of time it takes to call fsync will
   increase by at least two context switches.

2) a working set (the number of files needing an fsync) that
   is larger than the number of files you wish to keep open.
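For reference, the queue-merging half of the proposal is cheap to sketch. This is a toy in-process model with names of my own invention, not PostgreSQL code: duplicate requests for the same file collapse into one pending entry, so a drain issues one fsync per distinct file no matter how many writers asked. The two costs above are exactly what a real out-of-process syncer would add on top of this.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

#define QMAX 64

static int sync_queue[QMAX];    /* fds with an fsync pending */
static int sync_qlen;

/* Enqueue an fsync request; a duplicate fd merges with the pending one. */
static void
sync_request(int fd)
{
        for (int i = 0; i < sync_qlen; i++)
                if (sync_queue[i] == fd)
                        return;                 /* already queued: merged */
        if (sync_qlen < QMAX)
                sync_queue[sync_qlen++] = fd;
}

/* Drain the queue: one fsync per distinct file; returns the fsync count. */
static int
sync_drain(void)
{
        int done = 0;

        for (int i = 0; i < sync_qlen; i++) {
                fsync(sync_queue[i]);
                done++;
        }
        sync_qlen = 0;
        return done;
}
```

A real syncer would also have to ack each waiting backend after its file's fsync completes, which is where the context-switch latency comes in.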

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]





Re: Re[2]: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-16 Thread Alfred Perlstein

* Bruce Momjian [EMAIL PROTECTED] [010316 07:11] wrote:
   Could anyone consider fork a syncer process to sync data to disk ?
   build a shared sync queue, when a daemon process want to do sync after
   write() is called, just put a sync request to the queue. this can release
   process from blocked on writing as soon as possible. multiple sync
   requests for one file can be merged when the request is being inserted into
   the queue.
  
  I suggested this about a year ago. :)
  
  The problem is that you need that process to potentially open and close
  many files over and over.
  
  I still think it's somewhat of a good idea.
 
 I like the idea too, but people want the transaction to return COMMIT
 only after data has been fsync'ed so I don't see a big win.

This isn't simply handing off the sync to the other process; it
requires an ack from the syncer before returning 'COMMIT'.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]





Re: Re[4]: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-16 Thread Alfred Perlstein

* Tom Lane [EMAIL PROTECTED] [010316 08:16] wrote:
 Alfred Perlstein [EMAIL PROTECTED] writes:
  couldn't the syncer process cache opened files? is there any problem I
  didn't consider ?
 
  1) IPC latency, the amount of time it takes to call fsync will
 increase by at least two context switches.
 
  2) a working set (number of files needed to be fsync'd) that
 is larger than the amount of files you wish to keep open.
 
 These days we're really only interested in fsync'ing the current WAL
 log file, so working set doesn't seem like a problem anymore.  However
 context-switch latency is likely to be a big problem.  One thing we'd
 definitely need before considering this is to replace the existing
 spinlock mechanism with something more efficient.

What sort of problems are you seeing with the spinlock code?

 Vadim has designed the WAL stuff in such a way that a separate
 writer/syncer process would be easy to add; in fact it's almost that way
 already, in that any backend can write or sync data that's been added
 to the queue by any other backend.  The question is whether it'd
 actually buy anything to have another process.  Good stuff to experiment
 with for 7.2.

The delayed/coalesced fsync looked interesting.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]





Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Alfred Perlstein

* Tom Lane [EMAIL PROTECTED] [010315 09:35] wrote:
 
 BTW, are there any platforms where O_DSYNC exists but has a different
 spelling?

Yes: FreeBSD has only O_FSYNC;
it has neither O_SYNC nor O_DSYNC.
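One common way to paper over the different spellings is a compile-time fallback chain. `WAL_SYNC_FLAG` here is a name I made up for illustration, not PostgreSQL's actual macro:

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Pick the best available synchronous-write open(2) flag. */
#if defined(O_DSYNC)
#define WAL_SYNC_FLAG   O_DSYNC         /* data-only sync, usually cheapest */
#elif defined(O_SYNC)
#define WAL_SYNC_FLAG   O_SYNC
#elif defined(O_FSYNC)
#define WAL_SYNC_FLAG   O_FSYNC        /* the FreeBSD spelling */
#else
#define WAL_SYNC_FLAG   0              /* none: fall back to explicit fsync() */
#endif
```

The log file would then be opened with something like `open(walfile, O_WRONLY | O_APPEND | WAL_SYNC_FLAG)`, with the zero case meaning "call fsync() yourself after each write".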

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]





Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Alfred Perlstein

* Tom Lane [EMAIL PROTECTED] [010315 11:07] wrote:
 "Mikheev, Vadim" [EMAIL PROTECTED] writes:
  ... I would either
  use fsync as default or don't deal with O_SYNC at all.
  But if O_DSYNC is defined and O_DSYNC != O_SYNC then we should
  use O_DSYNC by default.
 
 Hm.  We could do that reasonably painlessly as a compile-time test in
 xlog.c, but I'm not clear on how it would play out as a GUC option.
 Peter, what do you think about configuration-dependent defaults for
 GUC variables?

Sorry, what's a GUC? :)

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]





Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Alfred Perlstein

* Tom Lane [EMAIL PROTECTED] [010315 11:45] wrote:
 Alfred Perlstein [EMAIL PROTECTED] writes:
  And since we're sorta on the topic of IO, I noticed that it looks
  like (at least in 7.0.3) that vacuum and certain other routines
  read files in reverse order.
 
 Vacuum does that because it's trying to push tuples down from the end
 into free space in earlier blocks.  I don't see much way around that
 (nor any good reason to think that it's a critical part of vacuum's
 performance anyway).  Where else have you seen such behavior?

Just vacuum, but the source is large, and I'm sort of lacking
on database-foo so I guessed that it may be done elsewhere.

You can optimize this out by implementing the read-behind
yourselves, sorta like this:

/* Pseudocode: if the caller is walking the file backwards, cluster
 * the I/O by reading THRESHOLD bytes ending at the previous offset. */
struct sglist *
read(fd, len)
{
        if (fd.lastpos - fd.curpos <= THRESHOLD) {
                fd.curpos = fd.lastpos - THRESHOLD;
                len = THRESHOLD;
        }
        return (do_read(fd, len));
}

of course this is entirely wrong, but illustrates what
would/could help.

I would fix FreeBSD, but it's sort of a mess and beyond what
I've got time to do ATM.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]





Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Alfred Perlstein

* Mikheev, Vadim [EMAIL PROTECTED] [010315 13:52] wrote:
  I believe that we don't know enough yet to nail down a hard-wired
  decision.  Vadim's idea of preferring O_DSYNC if it appears to be
  different from O_SYNC is a good first cut, but I think we'd 
  better make it possible to override that, at least for testing purposes.
 
 So let's leave fsync as default and add option to open log files
 with O_DSYNC/O_SYNC.

I have a weird and untested suggestion:

How many files need to be fsync'd?

If it's more than one, what might work is using mmap() to map the
files in adjacent areas and then calling msync() on the entire range;
this would allow you to batch-fsync the data.

The only problem is that I'm not sure:

1) how portable msync() is.
2) if msync guarantees metadata consistency.

Another benefit of mmap() is its 'zero copy' nature.
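A sketch of the idea, with the same hedge as above: msync() is in POSIX, but whether it covers metadata is exactly the open question, so treat this as an illustration rather than a drop-in fsync replacement. The function name is illustrative.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Dirty `len` bytes of `path` through a shared mapping, then flush
 * them all with a single msync() call. */
int
flush_via_msync(const char *path, size_t len)
{
        int fd, rc;
        char *p;

        if ((fd = open(path, O_RDWR)) < 0)
                return -1;
        if (ftruncate(fd, (off_t)len) < 0) {
                close(fd);
                return -1;
        }
        p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) {
                close(fd);
                return -1;
        }
        memset(p, 'x', len);                    /* dirty the pages */
        rc = msync(p, len, MS_SYNC);            /* one call, whole range */
        munmap(p, len);
        close(fd);
        return rc;
}
```

The "zero copy" point is visible here too: the writes go straight into the page cache through the mapping, with no intermediate user buffer handed to write().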

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]





Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Alfred Perlstein

* Tom Lane [EMAIL PROTECTED] [010315 14:54] wrote:
 Alfred Perlstein [EMAIL PROTECTED] writes:
  How many files need to be fsync'd?
 
 Only one.
 
  If it's more than one, what might work is using mmap() to map the
  files in adjacent areas, then calling msync() on the entire range,
  this would allow you to batch fsync the data.
 
 Interesting thought, but mmap to a prespecified address is most
 definitely not portable, whether or not you want to assume that
 plain mmap is ...

Yeah... :(

Evil thought though (for reference):

mmap(anon memory) returns addr1
addr2 = addr1 + maplen
split addr1-addr2 on points A B and C
mmap(file1 over addr1 to A)
mmap(file2 over A to B)
mmap(file3 over B to C)
mmap(file4 over C to addr2)

It _should_ work, but there's probably some corner cases where it
doesn't.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]





Re: [HACKERS] Performance monitor signal handler

2001-03-15 Thread Alfred Perlstein

* Philip Warner [EMAIL PROTECTED] [010315 16:14] wrote:
 At 06:57 15/03/01 -0500, Jan Wieck wrote:
 
 And  shared  memory has all the interlocking problems we want
 to avoid.
 
 I suspect that if we keep per-backend data in a separate area, then we
 don;t need locking since there is only one writer. It does not matter if a
 reader gets an inconsistent view, the same as if you drop a few UDP packets.

No, this is completely different.

Lost data is probably better than incorrect data.  Either use locks
or a copying mechanism.  People will depend on the data returned
making sense.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]





Re: [HACKERS] Performance monitor signal handler

2001-03-15 Thread Alfred Perlstein

* Philip Warner [EMAIL PROTECTED] [010315 16:46] wrote:
 At 16:17 15/03/01 -0800, Alfred Perlstein wrote:
 
 Lost data is probably better than incorrect data.  Either use locks
 or a copying mechanism.  People will depend on the data returned
 making sense.
 
 
 But with per-backend data, there is only ever *one* writer to a given set
 of counters. Everyone else is a reader.

This doesn't prevent a reader from getting an inconsistent view.

Think about a 64-bit counter on a 32-bit machine.  If you charged per
megabyte, wouldn't it upset you to have a small chance of losing
4 billion units of sale?

(i.e., doing a read after an addition that wraps the low 32 bits
but before the carry reaches the most significant 32 bits?)

OK, what if everything can be read atomically by itself?

You're still busted the minute you need to export any sort of
compound stat.

If A, B, and C need to add up to 100, you have a read race.
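One standard fix that keeps the single-writer, no-blocking flavor is a sequence counter (a "seqlock", sketched here from memory with names of my own): the writer bumps the counter before and after each update, and a reader retries whenever it sees an odd value (update in progress) or a value that changed underneath it, so a compound stat like A + B + C = 100 is always observed whole.

```c
#include <assert.h>

struct stats {
        volatile unsigned seq;  /* odd while an update is in progress */
        long a, b, c;           /* compound invariant: read as one unit */
};

/* Single writer: publish a new consistent (a, b, c). */
void
stats_update(struct stats *s, long a, long b, long c)
{
        s->seq++;                       /* now odd */
        __sync_synchronize();           /* GCC full memory barrier (assumed) */
        s->a = a; s->b = b; s->c = c;
        __sync_synchronize();
        s->seq++;                       /* even again */
}

/* Any reader: retry until a stable, even sequence brackets the read. */
void
stats_read(const struct stats *s, long *a, long *b, long *c)
{
        unsigned before;

        do {
                while ((before = s->seq) & 1)
                        ;               /* writer mid-update: spin */
                __sync_synchronize();
                *a = s->a; *b = s->b; *c = s->c;
                __sync_synchronize();
        } while (s->seq != before);     /* raced with the writer: retry */
}
```

The writer never blocks on readers, which fits the "backends only maintain raw data" design; the cost is that a reader may loop under heavy write traffic.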

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]





Re: [HACKERS] Performance monitor signal handler

2001-03-15 Thread Alfred Perlstein

* Philip Warner [EMAIL PROTECTED] [010315 17:08] wrote:
 At 16:55 15/03/01 -0800, Alfred Perlstein wrote:
 * Philip Warner [EMAIL PROTECTED] [010315 16:46] wrote:
  At 16:17 15/03/01 -0800, Alfred Perlstein wrote:
  
  Lost data is probably better than incorrect data.  Either use locks
  or a copying mechanism.  People will depend on the data returned
  making sense.
  
  
  But with per-backend data, there is only ever *one* writer to a given set
  of counters. Everyone else is a reader.
 
 This doesn't prevent a reader from getting an inconsistent view.
 
 Think about a 64-bit counter on a 32-bit machine.  If you charged per
 megabyte, wouldn't it upset you to have a small chance of losing
 4 billion units of sale?
 
 (i.e., doing a read after an addition that wraps the low 32 bits
 but before the carry is done to the most significant 32 bits?)
 
 I assume this means we cannot rely on the existence of any kind of
 interlocked add on 64-bit machines?
 
 
 Ok, what if everything can be read atomically by itself?
 
 You're still busted the minute you need to export any sort of
 compound stat.
 
 Which is why the backends should not do anything other than maintain the
 raw data. If there is atomic data than can cause inconsistency, then a
 dropped UDP packet will do the same.

The UDP packet (a COPY) can contain a consistent snapshot of the data.
If you have dependencies, you can fit a consistent snapshot into a single
packet.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]





Re: Re[2]: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Alfred Perlstein

* Xu Yifeng [EMAIL PROTECTED] [010315 22:25] wrote:
 Hello Tom,
 
 Friday, March 16, 2001, 6:54:22 AM, you wrote:
 
 TL Alfred Perlstein [EMAIL PROTECTED] writes:
  How many files need to be fsync'd?
 
 TL Only one.
 
  If it's more than one, what might work is using mmap() to map the
  files in adjacent areas, then calling msync() on the entire range,
  this would allow you to batch fsync the data.
 
 TL Interesting thought, but mmap to a prespecified address is most
 TL definitely not portable, whether or not you want to assume that
 TL plain mmap is ...
 
 TL regards, tom lane
 
 Could anyone consider forking a syncer process to sync data to disk?
 Build a shared sync queue: when a daemon process wants to sync after
 write() is called, it just puts a sync request on the queue.  This can
 unblock the writing process as soon as possible.  Multiple sync
 requests for one file can be merged as they are inserted into
 the queue.

I suggested this about a year ago. :)

The problem is that you need that process to potentially open and close
many files over and over.

I still think it's somewhat of a good idea.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]





Re: [HACKERS] Performance monitor signal handler

2001-03-13 Thread Alfred Perlstein

* Philip Warner [EMAIL PROTECTED] [010312 18:56] wrote:
 At 13:34 12/03/01 -0800, Alfred Perlstein wrote:
 Is it possible
 to have a spinlock over it so that an external utility can take a snapshot
 of it with the spinlock held?
 
 I'd suggest that locking the stats area might be a bad idea; there is only
 one writer for each backend-specific chunk, and it won't matter a hell of a
 lot if a reader gets inconsistent views (since I assume they will be
 re-reading every second or so). All the stats area should contain is
 a bunch of counters with timestamps, I think, and the cost of writing to it
 should be kept to an absolute minimum.
 
 
 
 just some ideas..
 
 
 Unfortunately, based on prior discussions, Bruce seems quite opposed to a
 shared memory solution.

Ok, here's another nifty idea.

On receipt of the info signal, the backends collaborate to piece
together a status file.  The status file is given a temporary name.
When complete, the status file is rename(2)'d over a well-known
file.

This ought to always give a consistent snapshot of the file to
whoever opens it.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
Daemon News Magazine in your snail-mail! http://magazine.daemonnews.org/





Re: [HACKERS] Performance monitor signal handler

2001-03-13 Thread Alfred Perlstein

* Philip Warner [EMAIL PROTECTED] [010313 06:42] wrote:
 
 This ought to always give a consistent snapshot of the file to
 whoever opens it.
 
 
 I think Tom has previously stated that there are technical reasons not to
 do IO in signal handlers, and I have philosophical problems with
 performance monitors that ask 50 backends to do file IO. I really do think
 shared memory is TWTG.

I wasn't really suggesting any of those courses of action; all I
suggested was using rename(2) to give a separate application a
consistent snapshot of the stats.

Actually, what makes the most sense (although it may be a performance
killer) is to have the backends update a system table that the external
app can query.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
Daemon News Magazine in your snail-mail! http://magazine.daemonnews.org/





Re: [HACKERS] WAL SHM principles

2001-03-13 Thread Alfred Perlstein

* Matthew Kirkwood [EMAIL PROTECTED] [010313 13:12] wrote:
 On Tue, 13 Mar 2001, Ken Hirsch wrote:
 
   mlock() guarantees that the locked address space is in memory.  This
   doesn't imply that updates are not written to the backing file.
 
  I've wondered about this myself.  It _is_ true on Linux that mlock
  prevents writes to the backing store,
 
 I don't believe that this is true.  The manpage offers no
 such promises, and the semantics are not useful.

Afaik FreeBSD's Linux emulator:

revision 1.13
date: 2001/02/28 04:30:27;  author: dillon;  state: Exp;  lines: +3 -1
Linux does not filesystem-sync file-backed writable mmap pages on
a regular basis.  Adjust our linux emulation to conform.  This will
cause more dirty pages to be left for the pagedaemon to deal with,
but our new low-memory handling code can deal with it.   The linux
way appears to be a trend, and we may very well make MAP_NOSYNC the
default for FreeBSD as well (once we have reasonable sequential
write-behind heuristics for random faults).
(will be MFC'd prior to 4.3 freeze)

Suggested by: Andrew Gallatin

Basically any mmap'd data doesn't seem to get sync()'d out on
a regular basis.

  and this is used as a security feature for cryptography software.
 
 mlock() is used to prevent pages being swapped out.  Its
 use for crypto software is essentially restricted to anon
 memory (allocated via brk() or mmap() of /dev/zero).

What about userland device drivers that want to send parts
of a disk-backed file to a driver's DMA routine?

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
Daemon News Magazine in your snail-mail! http://magazine.daemonnews.org/





[HACKERS] Re: Performance monitor signal handler

2001-03-13 Thread Alfred Perlstein

* Thomas Swan [EMAIL PROTECTED] [010313 13:37] wrote:
 
 On receipt of the info signal, the backends collaborate to piece
 together a status file.  The status file is given a temporary name.
 When complete, the status file is rename(2)'d over a well-known
 file.
 
 Reporting to files, particularly well known ones, could lead to race 
 conditions.
 
 All in all, I think you're better off passing messages through pipes or a 
 similar communication method.
 
 I really liked the idea of a "server" that could parse/analyze data from 
 multiple backends.
 
 My 2/100 worth...

Take a few moments to think about the semantics of rename(2).

Yes, you would still need synchronization between the backend
processes to do this correctly, but not in any external app.

The external app can just open the file; assuming it exists, it
will always have a complete and consistent snapshot of whatever
the backends agreed on.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
Daemon News Magazine in your snail-mail! http://magazine.daemonnews.org/





Re: [HACKERS] Performance monitor signal handler

2001-03-12 Thread Alfred Perlstein

* Bruce Momjian [EMAIL PROTECTED] [010312 12:12] wrote:
 I was going to implement the signal handler like we do with Cancel,
 where the signal sets a flag and we check the status of the flag in
 various _safe_ places.
 
 Can anyone think of a better way to get information out of a backend?

Why not use a static area of the shared memory segment?  Is it possible
to have a spinlock over it so that an external utility can take a snapshot
of it with the spinlock held?

Also, this could work for other stuff as well, instead of overloading
a lot of signal handlers one could just periodically poll a region of
the shared segment.

just some ideas..

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
Daemon News Magazine in your snail-mail! http://magazine.daemonnews.org/





Re: [HACKERS] How to shoot yourself in the foot: kill -9 postmaster

2001-03-06 Thread Alfred Perlstein

* Tom Lane [EMAIL PROTECTED] [010306 10:10] wrote:
 Alfred Perlstein [EMAIL PROTECTED] writes:
  I'm sure some sort of encoding of the PGDATA directory along with
  the pids stored in the shm segment...
 
 I thought about this too, but it strikes me as not very trustworthy.
 The problem is that there's no guarantee that the new postmaster will
 even notice the old shmem segment: it might select a different shmem
 key.  (The 7.1 coding of shmem key selection makes this more likely
 than it used to be, but even under 7.0, it will certainly fail to work
 if I choose to start the new postmaster using a different port number
 than the old one had.  The shmem key is driven primarily by port number
 not data directory ...)

This seems like a mistake.  

I'm surprised you guys aren't just using some form of the FreeBSD
ftok() algorithm for this:

FTOK(3)FreeBSD Library Functions ManualFTOK(3)

...

 The ftok() function attempts to create a unique key suitable for use with
 the msgget(3), semget(2) and shmget(2) functions given the path of an
 existing file and a user-selectable id.

 The specified path must specify an existing file that is accessible to
 the calling process or the call will fail.  Also, note that links to
 files will return the same key, given the same id.

BUGS
 The returned key is computed based on the device minor number and inode
 of the specified path in combination with the lower 8 bits of the given
 id.  Thus it is quite possible for the routine to return duplicate keys.

The "BUGS" seems to be exactly what you guys are looking for, a somewhat
reliable method of obtaining a system id.  If that sounds evil, read 
below for an alternate suggestion.

 The interlock has to be tightly tied to the PGDATA directory, because
 what we're trying to protect is the files in and under that directory.
 It seems that something based on file(s) in that directory is the way
 to go.
 
 The best idea I've seen so far is Hiroshi's idea of having all the
 backends hold fcntl locks on the same file (probably postmaster.pid
 would do fine).  Then the new postmaster can test whether any backends
 are still alive by trying to lock the old postmaster.pid file.
 Unfortunately, I read in the fcntl man page:
 
 Locks are not inherited by a child process in a fork(2) system call.
 
 This makes the idea much less attractive than I originally thought:
 a new backend would not automatically inherit a lock on the
 postmaster.pid file from the postmaster, but would have to open/lock it
 for itself.  That means there's a window where the new backend exists
 but would be invisible to a hypothetical new postmaster.
 
 We could work around this with the following, very ugly protocol:
 
 1. Postmaster normally maintains fcntl read lock on its postmaster.pid
 file.  Each spawned backend immediately opens and read-locks
 postmaster.pid, too, and holds that file open until it dies.  (Thus
 wasting a kernel FD per backend, which is one of the less attractive
 things about this.)  If the backend is unable to obtain read lock on
 postmaster.pid, then it complains and dies.  We must use read locks
 here so that all these processes can hold them separately.
 
 2. If a newly started postmaster sees a pre-existing postmaster.pid
 file, it tries to obtain a *write* lock on that file.  If it fails,
 conclude that an old postmaster or backend is still alive; complain
 and quit.  If it succeeds, sit for say 1 second before deleting the file
 and creating a new one.  (The delay here is to allow any just-started
 old backends to fail to acquire read lock and quit.  A possible
 objection is that we have no way to guarantee 1 second is enough, though
 it ought to be plenty if the lock acquisition is just after the fork.)
 
 One thing that worries me a little bit is that this means an fcntl
 read-lock request will exist inside the kernel for each active backend.
 Does anyone know of any performance problems or hard kernel limits we
 might run into with large numbers of backends (lots and lots of fcntl
 locks)?  At least the locks are on a file that we don't actually touch
 in the normal course of business.
 
 A small savings is that the backends don't actually need to open new FDs
 for the postmaster.pid file; they can use the one they inherit from the
 postmaster, even though they do need to lock it again.  I'm not sure how
 much that saves inside the kernel, but at least something.
 
 There are also the usual set of concerns about portability of flock,
 though this time we're locking a plain file and not a socket, so it
 shouldn't be as much trouble as it was before.
 
 Comments?  Does anyone see a better way to do it?

Possibly...

What about encoding the shm id in the pidfile?  Then one can just ask
how many processes are attached to that segment.  (If the segment
doesn't exist, one can assume all backends have exited.)

you want the field 'shm_nattch'

 The shmid

Re: [HACKERS] How to shoot yourself in the foot: kill -9 postmaster

2001-03-06 Thread Alfred Perlstein

* Tom Lane [EMAIL PROTECTED] [010306 10:35] wrote:
 Alfred Perlstein [EMAIL PROTECTED] writes:
 
  What about encoding the shm id in the pidfile?  Then one can just ask
  how many processes are attached to that segment?  (if it doesn't
  exist, one can assume all backends have exited)
 
 Hmm ... that might actually be a pretty good idea.  A small problem is
 that the shm key isn't yet selected at the time we initially create the
 lockfile, but I can't think of any reason that we could not go back and
 append the key to the lockfile afterwards.
 
  you want the field 'shm_nattch'
 
 Are there any portability problems with relying on shm_nattch to be
 available?  If not, I like this a lot...

Well, it's available on FreeBSD and Solaris; I'm sure Red Hat has
some daemon that resets the value to 0 periodically just for kicks,
so it might not be viable... :)

Seriously, there's some dispute about the type of 'shm_nattch':
under Solaris it's "shmatt_t" (unsigned long afaik), under FreeBSD
it's 'short' (I should fix this. :)).

But since you're really only testing for 0-ness, it shouldn't
really be a problem.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]




Re: [HACKERS] How to shoot yourself in the foot: kill -9 postmaster

2001-03-06 Thread Alfred Perlstein

* Tom Lane [EMAIL PROTECTED] [010306 11:03] wrote:
 Alfred Perlstein [EMAIL PROTECTED] writes:
  Are there any portability problems with relying on shm_nattch to be
  available?  If not, I like this a lot...
 
  Well it's available on FreeBSD and Solaris, I'm sure Redhat has
  some daemon that resets the value to 0 periodically just for kicks
  so it might not be viable... :)
 
 I notice that our BeOS and QNX emulations of shmctl() don't support
 IPC_STAT, but that could be dealt with, at least to the extent of
 stubbing it out.

Well, since we already have spinlocks, I can't see why we couldn't
keep the refcount and spinlock in a special place in the shm
for all cases.

 This does raise the question of what to do if shmctl(IPC_STAT) fails
 for a reason other than EINVAL.  I think the conservative thing to do
 is refuse to start up.  On EPERM, for example, it's possible that there
 is a postmaster running in your PGDATA but with a different userid.

Yes, if possible, a more meaningful error message and a pointer to
some documentation would be nice, or even an "I don't care, I killed
all the backends, just start, darnit" flag; it's really no fun at
all to have to attempt to decipher some cryptic error message at
3am when the database/system is acting up. :)

  Seriously, there's some dispute about the type of 'shm_nattch':
  under Solaris it's "shmatt_t" (unsigned long afaik), under FreeBSD
  it's 'short' (I should fix this. :)).
 
  But since you're really only testing for 0-ness, it shouldn't
  really be a problem.
 
 We need not copy the value anywhere, so as long as the struct is
 correctly declared in the system header files I don't think it matters
 what the field type is ...

Yup, my point exactly.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]




Re: [HACKERS] How to shoot yourself in the foot: kill -9 postmaster

2001-03-06 Thread Alfred Perlstein

* Tom Lane [EMAIL PROTECTED] [010306 11:30] wrote:
 Alfred Perlstein [EMAIL PROTECTED] writes:
  * Tom Lane [EMAIL PROTECTED] [010306 11:03] wrote:
  I notice that our BeOS and QNX emulations of shmctl() don't support
  IPC_STAT, but that could be dealt with, at least to the extent of
  stubbing it out.
 
  Well since we already have spinlocks, I can't see why we can't
  keep the refcount and spinlock in a special place in the shm
  for all cases?
 
 No, we mustn't go there.  If the kernel isn't keeping the refcount
 then it's worse than useless: as soon as some process crashes without
 decrementing its refcount, you have a condition that you can't recover
 from without reboot.

Not if the postmaster outputs the following:

 What I'm currently imagining is that the stub implementations will just
 return a failure code for IPC_STAT, and the outer code will in turn fail
 with a message along the lines of "It looks like there's a pre-existing
 shmem block (id XXX) still in use.  If you're sure there are no old
 backends still running, remove the shmem block with ipcrm(1), or just
 delete $PGDATA/postmaster.pid."  I dunno what shmem management tools
 exist on BeOS/QNX, but deleting the lockfile will definitely suppress
 the startup interlock ;-).
 
  Yes, if possible, a more meaningful error message and a pointer to
  some documentation would be nice
 
 Is the above good enough?

Sure. :)

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]




Re: [HACKERS] How to shoot yourself in the foot: kill -9 postmaster

2001-03-06 Thread Alfred Perlstein

* Lamar Owen [EMAIL PROTECTED] [010306 11:39] wrote:
 Peter Eisentraut wrote:
  Not only note the shm_nattch type, but also shm_segsz, and the "unused"
  fields in between.  I don't know a thing about the Linux kernel sources,
  but this doesn't seem right.
 
 Red Hat 7, right?  My RedHat 7 system isn't running RH 7 right now (it's
 this notebook that I'm running Win95 on right now), but see which RPM's
 own the two headers.  You may be in for a shock.  IIRC, the first system
 include is from the 2.4 kernel, and the second in the kernel source tree
 is from the 2.2 kernel.
 
 Odd, but not really broken.  Should be fixed in the latest public beta
 of RedHat, that actually has the 2.4 kernel.  I can't really say any
 more about that, however.

Y'know, I was only kidding about Linux going out of its way to
defeat the 'shm_nattch' trick... *sigh*

As a FreeBSD developer I'm wondering if Linux keeps compatibility
calls around for old binaries or not.  Any idea?

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]




Re: [HACKERS] How to shoot yourself in the foot: kill -9 postmaster

2001-03-06 Thread Alfred Perlstein

* Tom Lane [EMAIL PROTECTED] [010306 11:49] wrote:
 Peter Eisentraut [EMAIL PROTECTED] writes:
  What I don't like is that my /usr/include/sys/shm.h (through other
  headers) has [foo]
  whereas /usr/src/linux/include/shm.h has [bar]
 
 Are those declarations perhaps bit-compatible?  Looks a tad endian-
 dependent, though ...

Of course not; the size of the struct changed (short -> unsigned
long, basically int16_t -> uint32_t).  Because the kernel and userland
in Linux are hardly in sync, you get the fun of guessing which of
these you have:

old struct + old syscall (ok)
new struct + old syscall (boom)
old struct + new syscall (boom)
new struct + new syscall (ok)

Honestly, I think this problem should be left to the vendor to fix
properly (if it needs fixing); the SysV API was published at least
6 years ago, so they ought to have it mostly correct by now.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]




Re: [HACKERS] How to shoot yourself in the foot: kill -9 postmaster

2001-03-06 Thread Alfred Perlstein

* Lamar Owen [EMAIL PROTECTED] [010306 13:27] wrote:
 Nathan Myers wrote:
  That is why there is no problem with version skew in the syscall
  argument structures on a correctly-configured Linux system.  (On a
  Red Hat system it is very easy to get them out of sync, but RH fans
  are used to problems.)
 
 Is RedHat bashing really necessary here?  At least they are payrolling
 Second Chair on the Linux kernel hierarchy.  And they are very
 supportive of PostgreSQL (by shipping us with their distribution).

Just because they do some really nice things and have some really
nice stuff doesn't mean they should really get cut any slack for
doing things like shipping out of sync kernel/system headers, kill
-9'ing databases and having programs like 'tmpwatch' running on
the boxes.  It really shows a lack of understanding of how Unix is
supposed to run.

What they really need to do is hire some grey beards (old school
Unix folks) to QA the releases and keep stuff like this from
happening/shipping. 

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]




Re: [HACKERS] How to shoot yourself in the foot: kill -9 postmaster

2001-03-06 Thread Alfred Perlstein

 Alfred Perlstein [EMAIL PROTECTED] writes:
  Are there any portability problems with relying on shm_nattch to be
  available?  If not, I like this a lot...
 
  Well it's available on FreeBSD and Solaris, I'm sure Redhat has
  some daemon that resets the value to 0 periodically just for kicks
  so it might not be viable... :)
 
 I notice that our BeOS and QNX emulations of shmctl() don't support
 IPC_STAT, but that could be dealt with, at least to the extent of
 stubbing it out.

* Cyril VELTER [EMAIL PROTECTED] [010306 16:15] wrote:
 
BeOS doesn't have this stat (I have a bunch of others, but not this one).
 
If I understand correctly, you want to check whether some backend is
still attached to the shared memory segment of a given key?  In that case,
I have an easy solution to fake the stat, because all segments have an
encoded name containing this key, so I can count them.

We need to be able to take a single shared memory segment and
determine if any other process is using it.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]




Re: [HACKERS] How to shoot yourself in the foot: kill -9 postmaster

2001-03-05 Thread Alfred Perlstein

* Tom Lane [EMAIL PROTECTED] [010305 14:51] wrote:
 
 I think we need a stronger interlock to prevent this scenario, but I'm
 unsure what it should be.  Ideas?

Re having multiple postmasters active by accident.

The sysV IPC stuff has some hooks in it that may help you.

One idea is to check the 'struct shmid_ds' field 'shm_nattch':
basically, at startup, if it's not 1 (or 0) then you have more than
one PostgreSQL instance messing with it and it should not proceed.

I'd also suggest looking into using SysV semaphores and the semundo
stuff; afaik it can be used to track the number of consumers of
a resource.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]




Re: [HACKERS] How to shoot yourself in the foot: kill -9 postmaster

2001-03-05 Thread Alfred Perlstein

* Tom Lane [EMAIL PROTECTED] [010305 19:13] wrote:
 Lamar Owen [EMAIL PROTECTED] writes:
  Tom Lane wrote:
  Postmaster down, backends alive is not a scenario we're currently
  prepared for.  We need a way to plug that gap.
 
  Postmaster can easily enough find out if zombie backends are 'out there'
  during startup, right?
 
 If you think it's easy enough, enlighten the rest of us ;-).  Be sure
 your solution only finds leftover backends from the previous instance of
 the same postmaster, else it will prevent running multiple postmasters
 on one system.

I'm sure some sort of encoding of the PGDATA directory along with
the pids stored in the shm segment...

  What can postmaster _do_ about it, though?  It
  won't necessarily be able to kill them -- but it also can't control
  them.  If it _can_ kill them, should it try?
 
 I think refusal to start is sufficient.  They should go away by
 themselves as their clients disconnect, and forcing the issue doesn't
 seem like it will improve matters.  The admin can kill them (hopefully
 with just a SIGTERM ;-)) if he wants to move things along ... but I'd
 not like to see a newly-starting postmaster do that automatically.

I agree; shooting down processes incorrectly should be left up to
vendors' braindead scripts. :)

  Should the backend look for the presence of its parent postmaster
  periodically and gracefully come down if postmaster goes away without
  the proper handshake?
 
 Unless we checked just before every disk write, this wouldn't represent
 a safe failure mode.  The onus has to be on the newly-starting
 postmaster, I think, not on the old backends.
 
  Should a set of backends detect a new postmaster coming up and try to
  'sync up' with that postmaster,
 
 Nice try ;-).  How will you persuade the kernel that these processes are
 now children of the new postmaster?

Oh, easy, use ptrace. :)

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]




Re: [HACKERS] preproc.y error

2001-02-07 Thread Alfred Perlstein

* Tom Lane [EMAIL PROTECTED] [010207 17:24] wrote:
 Vince Vielhaber [EMAIL PROTECTED] writes:
  Now I get:
  byacc -d  preproc.y
  byacc: f - maximum table size exceeded
  gmake[4]: *** [preproc.c] Error 2
 
 Better install bison if you want to work with CVS sources ...
 the lack of bison probably explains why it's failing for you on
 this system when it's OK on other FreeBSD boxes.
 
 I wonder if we ought not accept byacc as a suitable yacc in configure?
 Peter, what do you think?

I think I reported this as broken a couple of months ago, but it was
too late to add the check to configure for 7.0.  byacc doesn't work;
you need bison (or maybe some special flags to byacc).

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] Auto-indexing

2001-02-06 Thread Alfred Perlstein

* Christopher Kings-Lynne [EMAIL PROTECTED] [010206 18:29] wrote:
 Is it a feasible idea that PostgreSQL could detect when an index would be
 handy, and create it itself, or at least log that a table is being queried
 but the indices are not appropriate?
 
 I suggest this as it's a feature of most windows databases, and MySQL does
 it.  I think it would be a great timesaver as we have hundreds of different
 queries, and it's a real pain to have to EXPLAIN them all, etc.   Is that
 possible?  Feasible?

Probably both, but if it's done there should be options to:

.) disable it completely, or by table/database, or even by threshold or
   disk-free parameters (indices can be large)
.) log any auto-created indices to inform the DBA.
.) if disabled, optionally log when it would have created an index on
   the fly (suggest an index).
.) expire old and unused auto-created indices.

Generally PostgreSQL assumes the user knows what he's doing, but
it couldn't hurt too much to provide an option to have it assist
the user.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] RE: Index grows huge, possible leakage?

2001-02-05 Thread 'Alfred Perlstein'

* Mikheev, Vadim [EMAIL PROTECTED] [010202 10:39] wrote:
  After several weeks our indices grow very large (in one case to
  4-5 gigabytes).  After dropping and recreating the indices they
  shrink back to something more reasonable (500 megs, same case).
  
  We are currently using Vadim's vacuum patches for VLAZY and MMNB, 
  against 7.0.3.  We are using a LAZY vacuum on these tables.
  
  However a normal (non-lazy) vacuum doesn't shrink the index; the
  only thing that helps reduce the size is dropping and recreating.
  
  Is this a bug in 7.0.3?  A possible bug in Vadim's patches?  Or is
  this somewhat expected behavior that we have to cope with?
 
 When an index is created its pages are filled completely, so any insert
 into such pages results in a page split, i.e. an additional page.
 So it's very easy to get 4Gb from 500Mb.

Well that certainly stinks. :(

 Vacuum was never able to shrink indices - it just removes dead index
 tuples and so allows space to be re-used ... if you insert the same
 keys.

This doesn't make sense to me.  Seriously, if the table is locked
during a normal vacuum (not VLAZY), why not have vacuum build a
new index by copying valid index entries into a new index instead
of just vacating slots that aren't used?

 To know whether VLAZY works properly or not I would need the vacuum debug
 messages.  Did you run vacuum with the verbose option, or do you have
 the postmaster's logs?  With LAZY, vacuum writes messages like
 
 Index _name_: deleted XXX unfound YYY
 
 YYY is supposed to be 0...

With what you explained (indices normally growing) I don't think
VLAZY is the problem here.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



[HACKERS] Index grows huge, possible leakage?

2001-02-01 Thread Alfred Perlstein

After several weeks our indices grow very large (in one case to
4-5 gigabytes).  After dropping and recreating the indices they
shrink back to something more reasonable (500 megs, same case).

We are currently using Vadim's vacuum patches for VLAZY and MMNB, 
against 7.0.3.  We are using a LAZY vacuum on these tables.

However a normal (non-lazy) vacuum doesn't shrink the index; the
only thing that helps reduce the size is dropping and recreating.

Is this a bug in 7.0.3?  A possible bug in Vadim's patches?  Or is
this somewhat expected behavior that we have to cope with?

As a side note, the space requirement is actually 'ok'; it's just
that performance gets terrible once the indices reach such huge
sizes.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] Sure enough, the lock file is gone

2001-01-26 Thread Alfred Perlstein

* Peter Eisentraut [EMAIL PROTECTED] [010126 12:11] wrote:
 The 'tmpwatch' program on Red Hat will remove the /tmp/.s.PGSQL.5432.lock
 file after the server has run 6 days.  This will be a problem.
 
 We could touch (open) the file once every time the ServerLoop() runs
 around.  It's not perfect but it should work in practice.

Why not have the RPM/configure scripts stick it in wherever Red Hat
says it's safe to?

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] Libpq async issues

2001-01-24 Thread Alfred Perlstein
res as well (too long query
  sends it into an infinite loop trying to queue data, most likely).
  
  A possible answer is to specify that a return of +N means "N bytes
  remain unqueued due to risk of blocking" (after having queued as much
  as you could).  This would put the onus on the caller to update his
  pointers/counts properly; propagating that into all the internal uses
  of pqPutBytes would be no fun.  (Of course, so far you haven't updated
  *any* of the internal callers to behave reasonably in case of a
  won't-block return; PQfn is just one example.)
  
  Another possible answer is to preserve pqPutBytes' old API, "queue or
  bust", by the expedient of enlarging the output buffer to hold whatever
  we can't send immediately.  This is probably more attractive, even
  though a long query might suck up a lot of space that won't get
  reclaimed as long as the connection lives.  If you don't do this then
  you are going to have to make a lot of ugly changes in the internal
  callers to deal with won't-block returns.  Actually, a bulk COPY IN
  would probably be the worst case --- the app could easily load data into
  the buffer far faster than it could be sent.  It might be best to extend
  PQputline to have a three-way return and add code there to limit the
  growth of the output buffer, while allowing all internal callers to
  assume that the buffer is expanded when they need it.
  
  pqFlush has the same kind of interface design problem: the same EOF code
  is returned for either a hard error or can't-flush-yet, but it would be
  disastrous to treat those cases alike.  You must provide a 3-way return
  code.
  
  Furthermore, the same sort of 3-way return code convention will have to
  propagate out through anything that calls pqFlush (with corresponding
  documentation updates).  pqPutBytes can be made to hide a pqFlush won't-
  block return by trying to enlarge the output buffer, but in most other
  places you won't have a choice except to punt it back to the caller.
  
  PQendcopy has the same interface design problem.  It used to be that
  (unless you passed a null pointer) PQendcopy would *guarantee* that
  the connection was no longer in COPY state on return --- by resetting
  it, if necessary.  So the return code was mainly informative; the
  application didn't have to do anything different if PQendcopy reported
  failure.  But now, a nonblocking application does need to pay attention
  to whether PQendcopy completed or not --- and you haven't provided a way
  for it to tell.  If 1 is returned, the connection might still be in
  COPY state, or it might not (PQendcopy might have reset it).  If the
  application doesn't distinguish these cases then it will fail.
  
  I also think that you want to take a hard look at the automatic "reset"
  behavior upon COPY failure, since a PQreset call will block the
  application until it finishes.  Really, what is needed to close down a
  COPY safely in nonblock mode is a pair of entry points along the line of
  "PQendcopyStart" and "PQendcopyPoll", with API conventions similar to
  PQresetStart/PQresetPoll.  This gives you the ability to do the reset
  (if one is necessary) without blocking the application.  PQendcopy
  itself will only be useful to blocking applications.
  
   I'm sorry if they don't work for some situations other than COPY IN,
   but it's functionality that I needed and I expect to be expanded on
   by myself and others that take interest in nonblocking operation.
  
  I don't think that the nonblock code is anywhere near production quality
  at this point.  It may work for you, if you don't stress it too hard and
  never have a communications failure; but I don't want to see us ship it
  as part of Postgres unless these issues get addressed.
  
  regards, tom lane
  
  
  
 
 
 -- 
   Bruce Momjian|  http://candle.pha.pa.us
   [EMAIL PROTECTED]   |  (610) 853-3000
   +  If your life is a hard drive,     |  830 Blythe Avenue
   +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] Libpq async issues

2001-01-24 Thread Alfred Perlstein

* Tom Lane [EMAIL PROTECTED] [010124 10:27] wrote:
 Alfred Perlstein [EMAIL PROTECTED] writes:
  * Bruce Momjian [EMAIL PROTECTED] [010124 07:58] wrote:
  I have added this email to TODO.detail and a mention in the TODO list.
 
  The bug mentioned here is long gone,
 
 Au contraire, the misdesign is still there.  The nonblock-mode code
 will *never* be reliable under stress until something is done about
 that, and that means fairly extensive code and API changes.

The "bug" is the one mentioned in the first paragraph of the email
where I broke _blocking_ connections for a short period.

I still need to fix async connections for myself (and of course
contribute it back), but I just haven't had the time.  If anyone
else wants it fixed earlier they can wait for me to do it, do it
themself, contract me to do it or hope someone else comes along
to fix it.

I'm thinking that I'll do what you said and have separate paths
for writing/reading to the socket, and APIs that give the user
the option of a boundary, basically:

 buffer this, but don't allow me to write until it's flushed

which would allow for larger than 8k COPY rows to go into the
backend.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] Patches with vacuum fixes available for 7.0.x

2001-01-23 Thread Alfred Perlstein

* Bruce Momjian [EMAIL PROTECTED] [010122 19:55] wrote:
 
 Vadim, did these patches ever make it into 7.1?

According to:
http://www.postgresql.org/cgi/cvsweb.cgi/pgsql/src/backend/parser/gram.y?rev=2.217&content-type=text/x-cvsweb-markup

nope. :(

 
  We recently had a very satisfactory contract completed by
  Vadim.
  
  Basically Vadim has been able to reduce the amount of time
  taken by a vacuum from 10-15 minutes down to under 10 seconds.
  
  We've been running with these patches under heavy load for
  about a week now without any problems except one:
don't 'lazy' (new option for vacuum) a table which has just
had an index created on it, or at least don't expect it to
take any less time than a normal vacuum would.
  
  There's three patchsets and they are available at:
  
  http://people.freebsd.org/~alfred/vacfix/
  
  complete diff:
  http://people.freebsd.org/~alfred/vacfix/v.diff
  
  only lazy vacuum option to speed up index vacuums:
  http://people.freebsd.org/~alfred/vacfix/vlazy.tgz
  
  only lazy vacuum option to only scan from start of modified
  data:
  http://people.freebsd.org/~alfred/vacfix/mnmb.tgz
  
  Although the patches are for 7.0.x I'm hoping that they
  can be forward ported (if Vadim hasn't done it already)
  to 7.1.
  
  enjoy!
  
  -- 
  -Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
  "I have the heart of a child; I keep it in a jar on my desk."
  
 
 
 -- 
   Bruce Momjian|  http://candle.pha.pa.us
   [EMAIL PROTECTED]   |  (610) 853-3000
   +  If your life is a hard drive, |  830 Blythe Avenue
   +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] Does Oracle store values in indices?

2001-01-23 Thread Alfred Perlstein

* Bruce Momjian [EMAIL PROTECTED] [010123 11:17] wrote:
 [ Charset KOI8-R unsupported, converting... ]
 The reason you have to visit the main table is that tuple validity
 status is only stored in the main table, not in each index.
 See prior discussions in the archives.
   
But how Oracle handles this?
  
   Oracle doesn't have non-overwriting storage manager but uses
   rollback segments to maintain MVCC. Rollback segments are used
   to restore valid version of entire index/table page.
  
  Are there any plans to have something like this? I mean overwriting storage 
  manager.
 
 We hope to have it some day, hopefully soon.

Vadim says he hopes to have it done by 7.2, so if things go
well it shouldn't be that far off...

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] Re: [GENERAL] postgres memory management

2001-01-22 Thread Alfred Perlstein

* Peter Mount [EMAIL PROTECTED] [010122 13:21] wrote:
 At 13:18 21/01/01 +0100, Alexander Jerusalem wrote:
 Hi all,
 
 I'm experiencing some strange behaviour with postgresql 7.0.3 on Red Hat 
 Linux 7. I'm sending lots of insert statements to the postgresql server 
 from another machine via JDBC. During that process postgresql continues to 
 take up more and more memory and seemingly never returns it to the system. 
 Oddly if I watch the postmaster and its subprocesses in ktop, I can't 
 see which process takes up this memory. ktop shows that the postgresql 
 related processes have a constant memory usage but the overall memory 
 usage always increases as long as I continue to send insert statements.
 
 When the database connection is closed, no memory is reclaimed, the 
 overall memory usage stays the same. And when I close down all postgresql 
 processes including postmaster, it's the same.
 I'm rather new to Linux and postgresql so I'm not sure if I should call 
 this a memory leak :-)
 Has anybody experienced a similar thing?
 
 I'm not sure myself. You can rule out JDBC (or Java) here as you say you 
 are connecting from another machine.
 
 When your JDBC app closes, does it call the connection's close() method? 
 Does any messages like "Unexpected EOF from client" appear on the server side?
 
 The only other thing that comes to mind is possibly something weird is 
 happening with IPC. After you close down postgres, does ipcclean free up 
 any memory?

I don't know if this is valid for Linux, but it is how FreeBSD
works: for the most part used memory is never freed, it is only
marked as reclaimable.  This is so the system can cache more data.
On a freshly booted FreeBSD box you'll have a lot of 'free' memory;
after the box has been running for a long time the 'free' memory
will probably never go higher than 10 MB, the rest is being used
as cache.

The main things you have to worry about are:
a) really running out of memory (are you using a lot of swap?)
b) not cleaning up IPC as Peter suggested.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] Transactions vs speed.

2001-01-13 Thread Alfred Perlstein

* mlw [EMAIL PROTECTED] [010113 17:19] wrote:
 I have a question about Postgres:
 
 Take this update:
   update table set field = 'X' ;
 
 
 This is a very expensive function when the table has millions of rows,
 it takes over an hour. If I dump the database, and process the data with
 perl, then reload the data, it takes minutes. Most of the time is used
 creating indexes.
 
 I am not asking for a feature, I am just musing. 

Well, you really haven't said whether you've tuned your database at all;
the way postgresql ships by default it doesn't use a very large shared
memory segment, and all the writing (at least in 7.0.x) is done synchronously.

There's a boatload of email out there that explains various ways to tune
the system.  Here's some of the flags that I use:

-B 32768   # uses over 300 MB of shared memory
-o "-F"    # tells the database not to call fsync on each update

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



[HACKERS] Re: Transactions vs speed.

2001-01-13 Thread Alfred Perlstein

* mlw [EMAIL PROTECTED] [010113 19:37] wrote:
 Alfred Perlstein wrote:
  
  * mlw [EMAIL PROTECTED] [010113 17:19] wrote:
   I have a question about Postgres:
  
   Take this update:
 update table set field = 'X' ;
  
  
   This is a very expensive function when the table has millions of rows,
   it takes over an hour. If I dump the database, and process the data with
   perl, then reload the data, it takes minutes. Most of the time is used
   creating indexes.
  
   I am not asking for a feature, I am just musing.
  
  Well you really haven't said if you've tuned your database at all, the
  way postgresql ships by default it doesn't use a very large shared memory
  segment, also all the writing (at least in 7.0.x) is done syncronously.
  
  There's a boatload of email out there that explains various ways to tune
  the system.  Here's some of the flags that I use:
  
  -B 32768   # uses over 300megs of shared memory
  -o "-F" # tells database not to call fsync on each update
 
 I have a good number of buffers (Not 32768, but a few), I have the "-F"
 option.

Explain a "good number of buffers" :)

Also, when was the last time you ran vacuum on this database?

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] Quite strange crash

2001-01-08 Thread Alfred Perlstein

* Mikheev, Vadim [EMAIL PROTECTED] [010108 23:08] wrote:
   Killing an individual backend with SIGTERM is bad luck.  
   The backend will assume that it's being killed by the postmaster,
   and will exit without a whole lot of concern for cleaning up shared 
   memory --- the
 
 SIGTERM -- die() -- elog(FATAL)
 
 Is it true that elog(FATAL) doesn't clean up shmem etc?
 This would be very bad...
 
   What code will be returned to postmaster in this case?
  
  Right at the moment, the backend will exit with status 0.  I think you
  are thinking the same thing I am: maybe a backend that 
  receives SIGTERM ought to exit with nonzero status.
  
  That would mean that killing an individual backend would instantly
  translate into an installation-wide restart.  I am not sure whether
  that's a good idea.  Perhaps this cure is worse than the disease.
 
 Well, it's not a good idea because SIGTERM is used for ABORT + EXIT
 (pg_ctl -m fast stop), but shouldn't ABORT clean up everything?

Er, shouldn't ABORT leave the system in the exact state that it's
in so that one can get a crashdump/traceback on a wedged process
without it trying to clean up after itself?

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] Patches with vacuum fixes available for 7.0.x

2001-01-02 Thread Alfred Perlstein

* Peter Schmidt [EMAIL PROTECTED] [010102 12:53] wrote:
 Will these patchsets be available to the public?
 I get:
 "You don't have permission to access /~alfred/vacfix/vlazy.tgz on this
 server"
 
 Thanks.
 Peter
 
 
 There's three patchsets and they are available at:
 
 http://people.freebsd.org/~alfred/vacfix/
 
 complete diff:
 http://people.freebsd.org/~alfred/vacfix/v.diff
 
 only lazy vacuum option to speed up index vacuums:
 http://people.freebsd.org/~alfred/vacfix/vlazy.tgz
 
 only lazy vacuum option to only scan from start of modified
 data:
 http://people.freebsd.org/~alfred/vacfix/mnmb.tgz

Oops!  The permissions should be fixed now, if anyone wants to
grab these feel free.

Peter, thanks for pointing it out.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] Assuming that TAS() will succeed the first time is verboten

2001-01-01 Thread Alfred Perlstein

* Bruce Momjian [EMAIL PROTECTED] [010101 23:59] wrote:
  Alfred Perlstein [EMAIL PROTECTED] writes:
   One trick that may help is calling sched_yield(2) on a lock miss,
   it's a POSIX call and quite new so you'd need a 'configure' test
   for it.
  
  The author of the current s_lock code seems to have thought that
  select() with a zero delay would do the equivalent of sched_yield().
  I'm not sure if that's true on very many kernels, if indeed any...
  
  I doubt we could buy much by depending on sched_yield(); if you want
  to assume POSIX facilities, ISTM you might as well go for user-space
  semaphores and forget the whole TAS mechanism.
 
 
 Another issue is that sched_yield brings in the pthreads library/hooks
 on some OS's, which we certainly want to avoid.

I know it's a major undertaking, but since the work is sort of done,
have you guys considered the Solaris threads port, and looking into
making a pthreads port of that?

I know it would probably get you considerable gains under Windows
at the expense of dropping some really, really legacy systems.

Or you could do what apache (is rumored) does and have it do either
threads or processes or both...

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] GNU readline and BSD license

2000-12-29 Thread Alfred Perlstein

* The Hermit Hacker [EMAIL PROTECTED] [001229 14:11] wrote:
 On Sat, 23 Dec 2000, Bruce Momjian wrote:
 
   FreeBSD has a freely available library called 'libedit' that could
   be shipped with postgresql, it's under the BSD license.
  
  Yes, that is our solution if we have a real problem here.
 
 Is there a reason *not* to move towards that for v7.2 so that the
 functions we are making optional with readline are automatic?  Since we
 could then ship the code, we could make it a standard vs optional
 "feature" ...
 
 My thought would be to put 'make history feature standard using libedit'
 onto the TODO list and take it from there ...

I doubt I'd have the time to do it, but if you guys want to use
libedit it'd probably be a good idea at least to reduce the amount
of potential GPL tainting in the source code.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] GNU readline and BSD license

2000-12-29 Thread Alfred Perlstein

* Tom Lane [EMAIL PROTECTED] [001229 15:43] wrote:
 Lamar Owen [EMAIL PROTECTED] writes:
  How different is the feature set?
 
 I was going to ask the same thing.  If it's an exact replacement then
 OK, but I do not want to put up with non-Emacs-compatible keybindings,
 to mention just one likely issue.
 
 The whole thing really strikes me as make-work anyway.  Linux is GPL'd;
 does anyone want to argue that we shouldn't run on Linux?  Since we
 are not including libreadline in our distribution, there is NO reason
 to worry about using it when it's available.  Wanting to find a
 replacement purely because of the license amounts to license bigotry,
 IMHO.

Rasmus Lerdorf warned one of you guys that simply linking to GNU
readline can contaminate code with the GPL.

Readline isn't LGPL, which permits linking without license issues;
it is GPL, which means that if you link to it, you must be GPL as
well.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] GNU readline and BSD license

2000-12-29 Thread Alfred Perlstein

* Peter Eisentraut [EMAIL PROTECTED] [001229 16:01] wrote:
 The Hermit Hacker writes:
 
  Is there a reason *not* to move towards that for v7.2 so that the
  functions we are making optional with readline are automatic?  Since we
  could then ship the code, we could make it a standard vs optional
  "feature" ...
 
  My thought would be to put 'make history feature standard using libedit'
  onto the TODO list and take it from there ...
 
 In my mind this is a pointless waste of developer time because there is no
 problem to solve here.  I'm sure we all have better things to do than
 porting libedit to a dozen systems and then explaining to users why the
 tarball is bloated and their carefully composed readline configuration
 doesn't work anymore.
 
 If there is something functionally wrong with Readline then let's talk
 about it, but let's not replace it with something because some PHP dude
 said that RMS said something.

From http://www.gnu.org/copyleft/gpl.html

  This General Public License does not permit incorporating your
  program into proprietary programs. If your program is a subroutine
  library, you may consider it more useful to permit linking
  proprietary applications with the library. If this is what you
  want to do, use the GNU Library General Public License instead
  of this License.

My understanding (from the recent discussion) is that Postgresql
has certain dependencies on libreadline and won't compile/work
without it; if true, this effectively forces anyone wishing to derive
a viable commercial product based on Postgresql to switch to the
GPL or port to libedit anyway.

If readline is completely optional then there's really no problem.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] GNU readline and BSD license

2000-12-29 Thread Alfred Perlstein

* Tom Lane [EMAIL PROTECTED] [001229 16:38] wrote:
 The Hermit Hacker [EMAIL PROTECTED] writes:
  Actually, IMHO, the pro to moving to libedit is that we could include it
  as part of the distribution and make history a *standard* feature
 
 How big is libedit?  If it's tiny, that might be a good argument ...
 but I don't want to see us bulking up our distro with something that
 people could and should get directly from its source.

~350k

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] GNU readline and BSD license

2000-12-29 Thread Alfred Perlstein

* The Hermit Hacker [EMAIL PROTECTED] [001229 17:06] wrote:
 On Fri, 29 Dec 2000, Tom Lane wrote:
 
  Alfred Perlstein [EMAIL PROTECTED] writes:
   My understanding (from the recent discussion) is that Postgresql
   has certain dependancies on libreadline and won't compile/work
   without it,
 
  Then you're working from a misconception.
 
 I think the misconception that he might be working on here is the point
 someone brought up that when configure runs, it is adding -lreadline to
 the backend compile, even though that I don't think there is any reason
 for doing such?

I thought psql required libreadline, I'm not sure who said it.

If nothing requires it then there's not much point in moving to
libedit, from a devel cost/benefit analysis.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] Assuming that TAS() will succeed the first time is verboten

2000-12-28 Thread Alfred Perlstein

* Tom Lane [EMAIL PROTECTED] [001228 14:25] wrote:
 [EMAIL PROTECTED] (Nathan Myers) writes:
  I wonder about the advisability of using spinlocks in user-level code 
  which might be swapped out any time.
 
 The reason we use spinlocks is that we expect the lock to succeed (not
 block) the majority of the time, and we want the code to fall through
 as quickly as possible in that case.  In particular we do *not* want to
 expend a kernel call when we are able to acquire the lock immediately.
 It's not a true "spin" lock because we don't sit in a tight loop when
 we do have to wait for the lock --- we use select() to delay for a small
 interval before trying again.  See src/backend/storage/buffer/s_lock.c.
 
 The design is reasonable, even if a little bit offbeat.

It sounds pretty bad: if you have a contested lock you'll trap into
the kernel each time you miss, crossing the protection boundary and
then waiting.  It's a tough call to make, because on UP systems
you lose big-time by spinning for your quantum, while on SMP
systems there's a chance that the lock is owned by a process on
another CPU and spinning might be beneficial.

One trick that may help is calling sched_yield(2) on a lock miss,
it's a POSIX call and quite new so you'd need a 'configure' test
for it.

http://www.freebsd.org/cgi/man.cgi?query=sched_yield&apropos=0&sektion=0&manpath=FreeBSD+4.2-RELEASE&format=html

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] GNU readline and BSD license

2000-12-23 Thread Alfred Perlstein

* Bruce Momjian [EMAIL PROTECTED] [001223 06:59] wrote:
 Rasmus Lerdorf, the big PHP developer, told me that the existence of GNU
 readline hooks in our source tree could cause RMS/GNU to force us to a
 GNU license.
 
 Obviously, we could remove readline hooks and ship a BSD line editing
 library, but does this make any sense to you?  It doesn't make sense to
 me, but he was quite certain.
 
 Our ODBC library is also GNU licensed, but I am told this is not a
 problem because it doesn't link into the backend.  However, neither does
 readline.  However, readline does link into psql.

FreeBSD has a freely available library called 'libedit' that could
be shipped with postgresql, it's under the BSD license.

If you have access to a FreeBSD box see the editline(3) manpage,
or go to: 
http://www.freebsd.org/cgi/man.cgi?query=editline&apropos=0&sektion=0&manpath=FreeBSD+4.2-RELEASE&format=html

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] Re: Too many open files (was Re: spinlock problems reported earlier)

2000-12-23 Thread Alfred Perlstein

* Tom Lane [EMAIL PROTECTED] [001223 14:16] wrote:
 Department of Things that Fell Through the Cracks:
 
 Back in August we had concluded that it is a bad idea to trust
 "sysconf(_SC_OPEN_MAX)" as an indicator of how many files each backend
 can safely open.  FreeBSD was reported to return 4136, and I have
 since noticed that LinuxPPC returns 1024.  Both of those are
 unreasonably large fractions of the actual kernel file table size.
 A few dozen backends opening hundreds of files apiece will fill the
 kernel file table on most Unix platforms.

getdtablesize(2) on BSD should tell you the per-process limit;
sysconf on FreeBSD shouldn't lie to you.

getdtablesize should take into account any resource limits in place.

later versions of FreeBSD have a sysctl 'kern.openfiles' which
can be checked to see if the system is approaching the systemwide
limit.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] Unable to check out REL7_1 via cvs

2000-12-22 Thread Alfred Perlstein

* Yusuf Goolamabbas [EMAIL PROTECTED] [001222 15:34] wrote:
 Hi, I am using the following command to check out the 7.1 branch of
 PostgreSQL.
 cvs -d :pserver:[EMAIL PROTECTED]:/home/projects/pgsql/cvsroot co -r REL7_1 pgsql
 
 This is the error I am getting.
 cvs [server aborted]: cannot write /home/projects/pgsql/cvsroot/CVSROOT/val-tags: Permission denied
 
 I can check out HEAD perfectly alright
 
 Anybody else seeing similar results ?

Try using "cvs -Rq ..." or just use CVSup; it's a lot quicker.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."


