Re: [HACKERS] PostgreSQL, NetBSD and NFS

2003-01-30 Thread Greg Copeland
That was going to be my question too.

I thought NFS didn't have some of the requisite file system behaviors
(locking, flushing, etc. IIRC) for PostgreSQL to function correctly or
reliably.

Please correct as needed.

Regards,

Greg


On Thu, 2003-01-30 at 13:02, mlw wrote:
 Forgive my stupidity, are you running PostgreSQL with the data on an NFS 
 share?
 
 
 D'Arcy J.M. Cain wrote:
 
 I have posted before about this but I am now posting to both NetBSD and 
 PostgreSQL since it seems to be some sort of interaction between the two.  I 
 have a NetAPP filer on which I am putting a PostgreSQL database.  I run 
 PostgreSQL on a NetBSD box.  I used rsync to get the database onto the filer 
 with no problem whatsoever but as soon as I try to open the database the NFS 
 mount hangs and I can't do any operations on that mounted drive without 
 hanging.  Other things continue to run but the minute I do a df or an ls on 
 that drive that terminal is lost.
 
 On the NetBSD side I get a server not responding error.  On the filer I see 
 no problems at all.  A reboot of the filer doesn't correct anything.
 
 Since NetBSD works just fine with this until I start PostgreSQL and 
 PostgreSQL, from all reports, works well with the NetApp filer, I assume that 
 there is something out of the ordinary about PostgreSQL's disk access that is 
 triggering some subtle bug in NetBSD.  Does the shared memory stuff use disk 
 at all?  Perhaps that's the difference between PostgreSQL and other 
 applications.
 
 The NetApp people are being very helpful and are willing to follow up any 
 leads people might have and may even suggest fixes if necessary.  I have 
 Bcc'd the engineer on this message and will send anything I get to them.
 
   
 
 
 
 
 ---(end of broadcast)---
 TIP 5: Have you checked our extensive FAQ?
 
 http://www.postgresql.org/users-lounge/docs/faq.html
-- 
Greg Copeland [EMAIL PROTECTED]
Copeland Computer Consulting


---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org



Re: [HACKERS] Win32 port powerfail testing

2003-02-01 Thread Greg Copeland
On Sat, 2003-02-01 at 00:34, Adam Haberlach wrote:
 On Sat, Feb 01, 2003 at 12:27:31AM -0600, Greg Copeland wrote:
  On Fri, 2003-01-31 at 14:36, Dave Page wrote:
   
   I intend to run the tests on a Dual PIII 1GHz box, with 1Gb of Non-ECC
   RAM and a 20Gb (iirc) IDE disk. I will run on Windows 2000 Server with
   an NTFS filesystem, and again on Slackware Linux 8 with either ext3 or
   reiserfs (which is preferred?).
   
  
  Please go with XFS or ext3.  There are a number of blessed and horror
  stories which still float around about reiserfs (recent and old; even
  though I've never lost data with it -- using it now even).
  
   Might be worth testing FAT32 on NT as well.  Even if we don't advocate
   its use, it may not hurt to at least get an understanding of what one
  might reasonably expect from it.  I'm betting there are people just
  waiting to run with FAT32 in the Win32 world.  ;)
 
   You'd better go with NTFS.  There are a number of blessed and horror
 stories which still float around about FAT32 (recent and old; even though
 I've never lost data with it -- using it now even).
 
   Might be worth testing reiserfs on Linux as well.  Even if we don't
 advocate its use, it may not hurt to at least get an understanding of
 what one might reasonably expect from it.  I'm betting there are people
 just waiting to run with reiserfs in the Linux world.  ;)
 
 Regards, and tongue firmly in cheek,


Touche!  :P

While I understand and even appreciate the humor value, I do believe the
picture is slightly different from your analysis.  If we make
something that runs on Win32 platforms, might it also run on Win98,
WinME, etc.?  Let's face facts: should it also run on these
platforms, it's probably only a matter of time before someone has it
running on FAT32 (even possible on NT, etc).  In other words, I'm fully
expecting the lowest common denominator of MySQL user to be looking at
PostgreSQL on Win32.  Which potentially means lots of FAT32 use.  And
yes, even in a production environment.  Ack!  Double-ack!

Also, Dave was asking for feedback between reiserfs and ext3.  I offered
XFS and ext3 as candidates.  I personally believe that ext3 and XFS are
going to be the more common journaled filesystems (in that order) among
Linux database users.  Besides, aside from any bugs in reiserfs, testing
results for ext3 or XFS should probably coincide with reasonable
expectations for reiserfs as well.

As I consider FAT32 to be much more fragile than ext2 (having had
seriously horrendous corruption on ext2 and repaired/recovered from it),
the results may prove interesting.  Which is to say, should testing
produce absolutely horrible results, proper disclaimers and warnings
should be made readily available to avoid its use.  Which is probably
not a bad idea to begin with.  ;)  Nonetheless, it's an unknown right
now in my mind.  Hopefully some testing may reveal what reasonable
expectations we should hold so that we can knowingly advise accordingly.


Regards,


-- 
Greg Copeland [EMAIL PROTECTED]
Copeland Computer Consulting


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] PGP signing releases

2003-02-02 Thread Greg Copeland
On Sun, 2003-02-02 at 18:39, Neil Conway wrote:
 Folks,
 
 I think we should PGP sign all the official packages that are provided
 for download from the various mirror sites. IMHO, this is important
 because:
 
 - ensuring that end users can trust PostgreSQL is an important part to
 getting the product used in mission-critical applications, as I'm sure
 you all know. Part of that is producing good software; another part is
 ensuring that users can trust that the software we put out hasn't been
 tampered with.
 
 - people embedding trojan horses in open source software is not unheard
 of. In fact, it's probably becoming more common: OpenSSH, sendmail,
 libpcap/tcpdump and bitchx have all been the victim of trojan horse
 attacks fairly recently.
 
 - PGP signing binaries is relatively easy, and doesn't need to be done
 frequently.
 
 Comments?
 
 I'd volunteer to do the work myself, except that it's pretty closely
 intertwined with the release process itself...
 
 Cheers,
 
 Neil


Actually, if you just had everyone sign the official key and submit it
back to the party that's signing, that would probably be good enough. 
Basically, as long as people can verify the package has been signed and
can reasonably verify that the signing key is safe and/or can be
verified, confidence should be high in the signed package.

I certainly have no problem with people signing my key nor with signing
others' as long as we can verify/authenticate each other's keys beforehand.


Regards,


-- 
Greg Copeland [EMAIL PROTECTED]
Copeland Computer Consulting


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] PGP signing releases

2003-02-03 Thread Greg Copeland
On Sun, 2003-02-02 at 20:23, Marc G. Fournier wrote:

 right, that is why we started to provide md5 checksums ...

md5 checksums only validate that the intended package (trojaned or
legit) has been properly received.  They offer nothing from a security
perspective unless the checksums have been signed with a key which can
be readily validated from multiple independent sources.
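
To make the distinction concrete, here is a minimal sketch (Python, purely
illustrative; the file name and digest are placeholders) of what an md5
check actually buys you -- it confirms the bytes you received match the
bytes the checksum was computed from, and says nothing about who produced
either:

    import hashlib

    def md5_matches(path, published_hex):
        # Integrity only: if an attacker replaced both the tarball and the
        # published checksum, this still returns True.
        digest = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                digest.update(chunk)
        return digest.hexdigest() == published_hex.strip().lower()

    # md5_matches("postgresql-x.y.z.tar.gz", "...")   # placeholder values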

Regards,

-- 
Greg Copeland [EMAIL PROTECTED]
Copeland Computer Consulting


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] PGP signing releases

2003-02-03 Thread Greg Copeland
On Mon, 2003-02-03 at 13:55, Kurt Roeckx wrote:
 On Mon, Feb 03, 2003 at 12:24:14PM -0600, Greg Copeland wrote:
  On Sun, 2003-02-02 at 20:23, Marc G. Fournier wrote:
  
   right, that is why we started to provide md5 checksums ...
  
  md5 checksums only validate that the intended package (trojaned or
  legit) has been properly received.  They offer nothing from a security
  perspective unless the checksums have been signed with a key which can
  be readily validated from multiple independent sources.
 
 If you can get the md5 sum of multiple independent sources,
 it's about the same thing.  It all depends on how much you trust
 those sources.
 
 I'm not saying md5 is as secure as pgp, not at all, but you can't
 trust those pgp keys to be the real one either.


No, that is not the same thing at all.  PKI specifically allows for web
of trust.  Nothing about md5 checksums allows for this.  As such,
chances are, if a set of md5 checksums have been forged, they will be
propagated and presented as being valid even though they are not.

I'll say this again.  Checksums alone offer zero security protection.
They were never intended to address that purpose.  As such, they do not
address it.  If you need security, use a security product.  A checksum's
ONLY purpose is to validate that a copy propagated intact.  It does not
address certification of authenticity in any shape or form.

As for trusting the validity of the keys contained within a PKI, that's
where the whole concept of web of trust comes into being.  You can
ignore it and not benefit or you can embrace it, as people are
advocating, and leverage it.

Validation of keys can be as simple as snail-mail, phone calls, and
fingerprint validation.  It's that simple.  It's why fingerprints exist
in the first place.
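
For what it's worth, the mechanical side of a fingerprint check is
trivial; a rough sketch wrapping the stock gpg command line (the keyserver
and key ID are placeholders):

    import subprocess

    KEY_ID = "0x12345678"          # placeholder key ID

    # Fetch the key from a public keyserver, then print its fingerprint so
    # it can be compared, character by character, against the one read over
    # the phone or received by snail-mail.
    subprocess.run(["gpg", "--keyserver", "pgp.mit.edu",
                    "--recv-keys", KEY_ID], check=True)
    subprocess.run(["gpg", "--fingerprint", KEY_ID], check=True)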


Regards,

-- 
Greg Copeland [EMAIL PROTECTED]
Copeland Computer Consulting


---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org



Re: [HACKERS] PGP signing releases

2003-02-03 Thread Greg Copeland
On Mon, 2003-02-03 at 22:35, Curt Sampson wrote:
 On Mon, 3 Feb 2003, Kurt Roeckx wrote:
 
  I'm not saying md5 is as secure as pgp, not at all, but you can't
  trust those pgp keys to be the real one either.
 
 Sure you can. Just verify that they've been signed by someone you trust.
 
 For example, next time I happen to run into Bruce Momjian, I hope he'll
 have his PGP key fingerprint with him. I can a) verify that he's the
 same guy who, under the name Bruce Momjian, was giving the seminar I
 went to last weekend, and b) check his passport ID to see that the U.S.
 government believes that someone who looks like him is indeed Bruce Momjian
 and a U.S. citizen. That, for me, is enough to trust that he is who he
 says he is when he gives me the fingerprint.
 
 I take that fingerprint back to my computer and verify that the key I
 downloaded from the MIT keyserver has the same fingerprint. Then I sign
 that key with my own signature, assigning it an appropriate level of trust.
 
 Next time I download a postgres release, I then grab a copy of the
 postgres release-signing public key, and verify that its private key was
 used to sign the postgres release, and that it is signed by Bruce's key.
 
 Now I have a direct chain of trust that I can evaluate:
 
 1. Do I believe that the person I met was indeed Bruce Momjian?
 
 2. Do I trust him to take care of his own key and be careful signing
 other keys?
 
 3. Do I trust his opinion that the postgres release-signing key that
 he signed is indeed valid?
 
 4. Do I trust the holder of the postgres release-signing key to have
 taken care of the key and have been careful about signing releases
 with it?
 
 Even if you extend this chain by a couple of people, that's trust in a
 lot fewer people than you're going to need if you want to trust an MD5
 signature.
 
 cjs

And that's the beginning of the web of trust. ;) Worth noting that
snail-mail and phone calls can easily play a role in this process as
well.  I think if the USPO can play a role in delivering master keys for
PIN pads used by banks across America and around the world, surely it's
good enough to help propagate key information for signing packages.


Regards,

-- 
Greg Copeland [EMAIL PROTECTED]
Copeland Computer Consulting


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] PGP signing releases

2003-02-03 Thread Greg Copeland
On Mon, 2003-02-03 at 22:35, Curt Sampson wrote:
 2. Do I trust him to take care of his own key and be careful signing
 other keys?
 
 3. Do I trust his opinion that the postgres release-signing key that
 he signed is indeed valid?
 
 4. Do I trust the holder of the postgres release-signing key to have
 taken care of the key and have been careful about signing releases
 with it?
 

Sorry to respond again, however, I did want to point out that signing a
key does not have to imply an absolute level of trust by the signer.
There are several trust levels.  For example, if we validated keys via
phone and mail, I would not place absolute trust in the key I'm signing.
However, if I had four people who mostly trusted the signed key and one
or two whom I absolutely trust who absolutely trusted it, then it's a
fairly safe bet I too can trust the key.  Again, this all comes back to
building a healthy web of trust.

Surely there are a couple of key developers who would be willing to
sign each other's keys and have previously met in person.  Surely this
would be the basis for phone validation.  Then, of course, there is the
ol' snail-mail route too.  Of course, nothing beats meeting in person
with valid ID and fingerprints in hand.  ;)


Regards,

-- 
Greg Copeland [EMAIL PROTECTED]
Copeland Computer Consulting


---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org



Re: [HACKERS] PGP signing releases

2003-02-04 Thread Greg Copeland
Comments intermixed below.

On Tue, 2003-02-04 at 12:04, Steve Crawford wrote:
 Having just started working with GPG I shouldn't be considered an expert but 
 it seems to me that each core developer should create a key and should 
 cross-sign each others' keys to form a web of trust to verify the 

This is a good idea regardless of which key approach is used.  Being
able to reliably trust a key is only as strong as the associated web of
trust.

 authenticity of those signatures. In any case, I think that if 
 security-related projects like GnuPG and OpenSSH use the individual method 
 then it wouldn't be a bad idea to follow their lead.
 

There are pros and cons associated with each approach.  Neither is
really better, IMO.  If three people are required to sign a package prior
to release, what happens when one of them is unavailable for signing
(vacation, hospital, etc.)?  This is one of the reasons why having a
single project key which the core developers sign may appear to be
easier.

 One hopes that situations like last week's ousting of one of the core 
 FreeBSD developers 
 (http://slashdot.org/article.pl?sid=03/02/03/239238mode=threadtid=122tid=156) 
 are rare but if such a situation were to arise, a shared project key would be 
 Very Bad (tm).
 

If a revocation key has been properly generated (as it should have been),
this is not a problem at all.  The revocation key is quickly shot over
to the known key servers and included with the newly generated project
key.  As people add and confirm the new project key, the old key is
automatically revoked.  Again, if this is properly handled, it is not
much of a problem at all.  PKI, by means of revocation keys,
specifically addresses this need.
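
A rough sketch of that procedure using the stock GnuPG commands (the key
ID, keyserver, and file name are placeholders; --gen-revoke prompts
interactively for a reason and the passphrase):

    import subprocess

    KEY_ID = "0x12345678"          # placeholder project key ID

    # At key-creation time: generate a revocation certificate, store offline.
    subprocess.run(["gpg", "--output", "revoke.asc",
                    "--gen-revoke", KEY_ID], check=True)

    # If the key is ever compromised or must be retired: import the
    # certificate and push the now-revoked key back out to the keyservers.
    subprocess.run(["gpg", "--import", "revoke.asc"], check=True)
    subprocess.run(["gpg", "--keyserver", "pgp.mit.edu",
                    "--send-keys", KEY_ID], check=True)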

 If I understand GPG correctly, one can create a detached signature of a 
 document. As such, any or all of the core developers could create and post 
 such a signature and a user could verify against as many signatures as 
 desired to feel secure that the file is good.
 
 Cheers,
 Steve
 

IIRC, PGP and GPG both support detached signatures.
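
Both produce a small signature file that can sit next to the tarball on
the FTP site; a minimal sketch of the signing and verification steps,
again just wrapping the gpg command line (the file name is a placeholder):

    import subprocess

    TARBALL = "postgresql-x.y.z.tar.gz"    # placeholder file name

    # Release side: create an ASCII-armored detached signature (TARBALL + ".asc").
    subprocess.run(["gpg", "--armor", "--detach-sign", TARBALL], check=True)

    # User side: verify the detached signature against the downloaded tarball.
    subprocess.run(["gpg", "--verify", TARBALL + ".asc", TARBALL], check=True)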
 
  Who will actually hold the key? Where will it be physically kept?

Good question but can usually be addressed.

 
  How many people will know the passphrase?

As few as possible.  Ideally only two, maybe three core developers.  One
could also avoid letting any single person hold the entire passphrase by
dividing it into parts among two or more people.  This is commonly done
in financial circles.  The exact details will be mostly driven by the key
approach that is picked.
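
One simple way to divide a passphrase so that no single holder can use it
alone is XOR secret splitting; this is not a PGP/GPG feature, just an
illustration of the idea (the passphrase is a placeholder):

    import os
    from functools import reduce

    def split_secret(secret, holders):
        # All shares XORed together reconstruct the secret; any subset
        # smaller than the whole set reveals nothing about it.
        shares = [os.urandom(len(secret)) for _ in range(holders - 1)]
        last = secret
        for s in shares:
            last = bytes(a ^ b for a, b in zip(last, s))
        return shares + [last]

    def join_secret(shares):
        return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), shares)

    parts = split_secret(b"placeholder passphrase", 3)
    assert join_secret(parts) == b"placeholder passphrase"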

 
  Who will be responsible for signing the files? Is there a backup person?
 

This is important to make sure that any backup people are properly
included in the web of trust from the very beginning.

  Will it be a signing-only key? What size? Should it expire?
 

Keys should always expire.  If you want to allow for one year, two years,
or even maybe three years, that's fine.  Nonetheless, expiration should
always be built in, especially on a project like this where people may
be more transient.

  How is verification of the files before signing accomplished?
 

The person creating the initial package release should also initially
sign it.  From there, the web of trust for the people signing it can
work as designed.  Once the initial package has been generated, it
should not leave his eyes until it has been signed.


Regards,


-- 
Greg Copeland [EMAIL PROTECTED]
Copeland Computer Consulting


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] PGP signing releases

2003-02-04 Thread Greg Copeland
On Tue, 2003-02-04 at 12:02, Rod Taylor wrote:
 On Tue, 2003-02-04 at 12:55, Kurt Roeckx wrote:
  On Tue, Feb 04, 2003 at 01:35:47PM +0900, Curt Sampson wrote:
   On Mon, 3 Feb 2003, Kurt Roeckx wrote:
   
I'm not saying md5 is as secure as pgp, not at all, but you can't
trust those pgp keys to be the real one either.
   
   Sure you can. Just verify that they've been signed by someone you trust.
  
  I know how it works, it's just very unlikely I'll ever meet
  someone so it gives me a good chain.
  
  Anyway, I think pgp is good thing to do, just don't assume that
  it's always better then just md5.
 
 Not necessarily better -- but it's always as good as md5.

Even improperly used, digital signatures should never be worse than
simple checksums.  Having said that, anyone who is trusting checksums
as a form of authenticity validation is begging for trouble.  Checksums
are not, in and of themselves, a security mechanism.  I can't stress this
enough.  There really isn't any comparison here.  Please stop comparing
apples and oranges.  No matter how hard you try, you cannot make orange
juice from apples.


Regards,

-- 
Greg Copeland [EMAIL PROTECTED]
Copeland Computer Consulting


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] PGP signing releases

2003-02-04 Thread Greg Copeland
On Tue, 2003-02-04 at 16:13, Kurt Roeckx wrote:
 On Tue, Feb 04, 2003 at 02:04:01PM -0600, Greg Copeland wrote:
  
  Even improperly used, digital signatures should never be worse than
  simple checksums.  Having said that, anyone that is trusting checksums
  as a form of authenticity validation is begging for trouble.
 
 Should I point out that a fingerprint is nothing more than a
 hash?
 

You seem to not understand the part where I said, in and of themselves.
Security is certainly an area of expertise where the devil is in the
details.  One minor detail can greatly affect the entire picture.
You're simply ignoring all the details and looking for obtuse
parallels.  Continue to do so all you like.  It still doesn't
effectively and reliably address security in the slightest.

  Checksums are not, in and of themselves, a security mechanism.
 
 So a fingerprint and all the hash/digest functions have no purpose
 at all?
 

This is just getting silly and bordering on insulting.  If you have
meaningful comments, please offer them up.  Until such time, I have no
further comments for you.  Obviously, a fingerprint is a derivative piece
of information which, in and of itself, does not validate anything.
Thus, the primary supporting concept is the web of trust, along with the
associated processes and built-in mechanisms that help ensure it all
makes sense and is maintained in proper context.  That is something a
simple MD5 checksum does not provide for.  Not in the least.

A checksum or hash only allows for comparisons between two copies to
establish whether they are the same or different.  It, alone, can never
reliably be a source of authentication and validation.  A checksum or
hash, alone, says nothing about who created it, where it came from, how
old it is, or who is available to readily and authoritatively assist in
validation of the checksummed (or hashed) entity or the person who
created it.

I do agree that a checksum (or hash) is better than nothing; however, a
serious security solution it is not.  Period.  Feel free to be lulled
into complacent comfort.  In the meantime, I'll choose a system which
actually has a chance at working.


Regards,

-- 
Greg Copeland [EMAIL PROTECTED]
Copeland Computer Consulting


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] PGP signing releases

2003-02-05 Thread Greg Copeland
On Wed, 2003-02-05 at 00:22, Curt Sampson wrote:
 On Wed, 4 Feb 2003, Greg Copeland wrote:
 
  If three people are required to sign a package prior to release,
  what happens when one of them is unavailable for signing (vacation,
  hospital, etc). This is one of the reasons why having a single project
  key which the core developers sign may appear to be easier.
 
 I don't see that it makes that much difference. So the release is signed
 only by, say, only three people instead of four. It's still signed.
 

Note that I said appear to be easier, not that it actually is easier
in any meaningful way.  Some of the more paranoid will look for
consistency from those that sign the package in question.  The fact that
different people sign may be the cause of additional footwork for some,
which probably isn't such a bad thing.  Nonetheless, it could be cause
for alarm for a few.

   One hopes that situations like last week's ousting of one of the
   core FreeBSD developers are rare but if such a situation were to
   arise, a shared project key would be Very Bad (tm).
 
  If a revocation key has been properly generated (as it should have been),
  this is not a problem at all.
 
 Actually, it is still a problem. Revocations are not reliable in PGP,
 and there's really no way to make them perfectly reliable in any system,
 because you've got no way to force the user to check that his cached
 data (i.e., the key he holds in his keyring) is still valid. This is why
 we generally expire signing keys and certificates and stuff like that on
 a regular basis.
 

When a package is released which has a new key signing it, the revocation
should normally be found fairly quickly.  This is especially true if
it's included in the package AND available via the normal PKI routes.
The revocation should accompany any packages which follow until the key
in question has expired.


 This one element alone makes me think that individual signing is a
 better thing. (With individual signing you'd have to compromise several
 keys before you have to start relying on revocation certificates.)
 
Who will actually hold the key? Where will it be physically kept?
 
  Good question but can usually be addressed.
 
 It can be addressed, but how well? This is another big issue that I
 don't see any plan for that I'm comfortable with..
 

The reason I was vague is that it depends on the key approach.
Obviously, if each person signs, each person must protect their own
key.  If there is a central project key, it's simply a matter of
determining which box is used for signing, etc.  While important, it's
certainly not difficult to address.


How many people will know the passphrase?
 
  As few as possible.  Ideally only two, maybe three core developers.
 
 Um...I'm not sure that this is a relevant question at all. The
 passphrase is not part of the key; it's just used to encrypt the key for
 storage. If you know the passphrase, you can make unlimited copies of
 the key, and these copies can be protected with any passphrases you like,
 or no passphrase, for that matter.
 

If you're concerned about this to that extent, clearly those people
should not be part of the web of trust, nor should they be receiving the
passphrase or a copy of the private key.  Remember, trust is a key (pun
intended) part of a reliable PKI.

  One could also only allow a single person to hold the passphrase and
  divide it into parts between two or more. This is commonly done in
  financial circles.
 
 Hm. Splitting the key into parts is a very interesting idea, but I'd
 be interested to know how you might implement it without requiring
 everybody to be physically present at signing.
 
 cjs

I was actually talking about splitting the passphrase; however,
splitting the key is certainly possible as well.  Having said that, if a
private key is shared, it should still be encrypted.  As such, a
passphrase should still be considered, as would splitting it.


Regards,

-- 
Greg Copeland [EMAIL PROTECTED]
Copeland Computer Consulting


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] PostgreSQL, NetBSD and NFS

2003-02-05 Thread Greg Copeland
On Wed, 2003-02-05 at 11:18, Tom Lane wrote:
 D'Arcy J.M. Cain [EMAIL PROTECTED] writes:
  On Wednesday 05 February 2003 11:49, Tom Lane wrote:
  I wonder if it is possible that, every so often,
  you are losing just the last few bytes of an NFS transfer?
 
  Yah, that's kind of what it looked like when I tried this before
  Christmas too although the actual errors differd.
 
 The observed behavior could vary wildly depending on what data happened to
 get read.
 
 Wild thought here: can you reduce the MTU on the LAN linking the NFS
 server to the NetBSD box?  If so, does it help?
 

Tom,

I'm curious as to why you think adjusting the MTU may have an effect on
this.  Lowering the MTU may actually increase fragmentation, lower
efficiency, and even exacerbate the situation.

Is this purely a diagnostic suggestion?


Regards,

-- 
Greg Copeland [EMAIL PROTECTED]
Copeland Computer Consulting


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] PGP signing releases

2003-02-06 Thread Greg Copeland
On Tue, 2003-02-04 at 18:27, Curt Sampson wrote:
 On Tue, 2003-02-04 at 16:13, Kurt Roeckx wrote:
  On Tue, Feb 04, 2003 at 02:04:01PM -0600, Greg Copeland wrote:
  
   Even improperly used, digital signatures should never be worse than
   simple checksums.  Having said that, anyone that is trusting checksums
   as a form of authenticity validation is begging for trouble.
 
  Should I point out that a fingerprint is nothing more than a
  hash?
 
 Since someone already mentioned MD5 checksums of tar files versus PGP
 key fingerprints, perhaps things will become a bit clearer here if I
 point out that the important point is not that these are both hashes of
 some data, but that the time and means of acquisition of that hash are
 entirely different between the two.


And that it creates a verifiable chain of entities with direct
associations to people and, hopefully, email addresses.  Meaning, it
opens the door for rapid authentication and validation of each entity
and associated person involved.  Again, something a simple MD5 hash does
not do or even allow for.  Perhaps even more importantly, it opens the
door for rapid detection of corruption in the system thanks to
revocation certificates/keys.  That, in turn, allows for rapid repair in
the event that the worst is realized.  Again, something a simple MD5 does
not assist with in the least.


Thanks Curt.


-- 
Greg Copeland [EMAIL PROTECTED]
Copeland Computer Consulting


---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org



Re: [HACKERS] PGP signing releases

2003-02-11 Thread Greg Copeland
Well said.   I'm glad someone else is willing to take a stab at
addressing these issues, since I've been down with the flu.  Thanks
Greg.

As both Gregs have pointed out, hashes and checksums alone should only
be used as an integrity check.  They are not a viable security mechanism.
A hash does not provide for authentication and, even more importantly,
verification of authenticity.  These concepts are key to creating a
secure environment.


Regards,

-- 
Greg Copeland [EMAIL PROTECTED]
Copeland Computer Consulting

On Mon, 2003-02-10 at 21:57, [EMAIL PROTECTED] wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1
 
 
  So you put the MD5 sum into the release announcement email.  That is
  downloaded by many people and also archived in many distributed places
  that we don't control, so it would be very hard to tamper with.  
  ISTM that this gives you the same result as a PGP signature but with 
  much less administrative overhead.
 
 Not the same results. For one thing, the mailing announcement may be 
 archived on google, but asking people to search google for an MD5 sum 
 as they download the tarball is hardly feasible. Second, it still does 
 not prevent someone from breaking into the server and replacing the 
 tarball with their own version, and their own MD5 checksum. Or maybe 
 just one of the mirrors. Users are not going to know to compare that 
 MD5 with versions on the web somewhere. Third, it does not allow a 
 positive history to be built up due to signing many releases over time. 
 With PGP, someone can be assured that the 9.1 tarball they just 
 downloaded was signed by the same key that signed the 7.3 tarball 
 they've been using for 2 years. Fourth, only with PGP can you trace 
 your key to the one that signed the tarball, an additional level of 
 security. MD5 provides an integrity check only. Any security it 
 affords (such as storing the MD5 sum elsewhere) is trivial and 
 should not be considered when using PGP is standard, easy to implement,
 and has none of MD5's weaknesses.
 
 - --
 Greg Sabino Mullane  [EMAIL PROTECTED]
 PGP Key: 0x14964AC8 200302102250
 -BEGIN PGP SIGNATURE-
 Comment: http://www.turnstep.com/pgp.html
 
 iD8DBQE+SA4AvJuQZxSWSsgRAhenAKDu0vlUBC5Eodyt2OxTG6el++BJZACguR2i
 GGLAzhtA7Tt9w4RUYXY4g2U=
 =3ryu
 -END PGP SIGNATURE-
 
 
 
 
 ---(end of broadcast)---
 TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly



---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] PostgreSQL Benchmarks

2003-02-11 Thread Greg Copeland
On Tue, 2003-02-11 at 08:26, Christopher Kings-Lynne wrote:
 Hrm.  I just saw that the PHP ADODB guy just published a bunch of database
 benchmarks.  It's fairly evident to me that benchmarking PostgreSQL on
 Win32 isn't really fair:
 
 http://php.weblogs.com/oracle_mysql_performance
 
 *sigh*

How much of the performance difference comes from the RDBMS, how much
from the middleware, and how much from the quality of the middleware's
implementation?

While I'm not surprised that the cygwin version of PostgreSQL is
slow, those results don't tell me anything about the quality of the
middleware interface between PHP and PostgreSQL.  Does anyone know if we
can rule out some of the performance loss by pinning it on a bad
middleware implementation for PostgreSQL?


Regards,

-- 
Greg Copeland [EMAIL PROTECTED]
Copeland Computer Consulting


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



Re: [HACKERS] PostgreSQL Benchmarks

2003-02-11 Thread Greg Copeland
On Tue, 2003-02-11 at 08:31, Mario Weilguni wrote:
 Hrm.  I just saw that the PHP ADODB guy just published a bunch of database
 benchmarks.  It's fairly evident to me that benchmarking PostgreSQL on
 Win32 isn't really fair:
 
 http://php.weblogs.com/oracle_mysql_performance
 
 And why is the highly advocated transaction capable MySQL 4 not tested?
 That's the problem, for every performance test they choose ISAM tables, and
 when transactions are mentioned it's said MySQL has transactions. But why
 no benchmarks?
 


Insert Statement

Not using bind variables (MySQL and Oracle): 
$DB->BeginTrans();



Using bind variables: 
$DB->BeginTrans();


PL/SQL Insert Benchmark
Appears to not initiate a transaction.  I'm assuming this is because
it's implicitly within a transaction?  Oddly enough, I am seeing
explicit commits here.

It appears that the benchmarks are attempting to use transactions,
however, I have no idea if MySQL's HEAP supports them.  For all I know,
transactions are being silently ignored.


Regards,

-- 
Greg Copeland [EMAIL PROTECTED]
Copeland Computer Consulting


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



Re: [HACKERS] PGP signing releases

2003-02-11 Thread Greg Copeland
On Wed, 2003-02-05 at 18:53, Curt Sampson wrote:
 On Thu, 5 Feb 2003, Greg Copeland wrote:
 
  Who will actually hold the key? Where will it be physically kept?
   
Good question but can usually be addressed.
  
   It can be addressed, but how well? This is another big issue that I
   don't see any plan for that I'm comfortable with..
 
  The reason I was vague is because it depends on the key route.
  Obviously, if each person signs, each person must protect their own
  key.  If there is a central project key, it's simply a matter of
  determining which box is used for signing, etc...while important, it's
  certainly not difficult to address.
 
 It seems to me extremely difficult to address. Unless you are physically
 monitoring someone, how do you prevent someone from copying the key off
 of that machine. At which point anybody with the passphrase can use it
 for anything.
 

This issue doesn't change regardless of the mechanism you pick.  Anyone
that is signing a key must take reasonable measures to ensure the
protection of their key.


  How many people will know the passphrase?
   
As few as possible.  Ideally only two, maybe three core developers.
  
   Um...I'm not sure that this is a relevant question at all. The
   passphrase is not part of the key; it's just used to encrypt the key for
   storage. If you know the passphrase, you can make unlimited copies of
   the key, and these copies can be protected with any passphrases you like,
   or no passphrase, for that matter.
 
  If you're concerned about this to that extent, clearly those people
  should not part of the web of trust nor should they be receiving the
  passphrase nor a copy of the private key.  Remember, trust is a key (pun
  intended) part of a reliable PKI.
 
 In that case, I would trust only one person with the key. Making copies of
 the key for others gives no additional protection (since it takes only one
 person out of the group to sign the release) while it increases the chance
 of key compromise (since there are now more copies of the key kicking around,
 and more people who know the passphrase).
 

Which brings us back to backups.  Should the one person that has the key
be unavailable or dead, who will sign the release?  Furthermore, making
*limited* copies of the key does provide for additional limited
protection in case it's lost for some reason.  This helps avoid having to
use the revocation key until it's absolutely required.  It also provides
for backup (of both key and people).

Basically, you are saying:
You trust a core developer
You trust they can protect their keys
You trust they can properly distribute their trust
You don't trust a core developer with a key

Hmmm...something smells in your web of trust...So, which is it?  Do you
trust the core developers to protect the interests of the project and
the associated key or not?  If not, why trust any digital signature from
them in the first place?

Can't stress this enough.  PKI is an absolute failure without trust. 
Period.


 Keys cannot be transfered from one person to another since, being digital
 data, there's no way to ascertain that the original holder does not still
 (on purpose or inadvertantly) have copies of the key. So in the case where
 we want to transfer trust from one person to another, we must also generate
 a new key and revoke the old one.
 

No one is talking about transferring keys.  In fact, I've previously
addressed this topic, from a different angle, a number of times.  We are
talking about shared trust, not transferred trust.  The transferring
of trust is done by signing keys, not by transferring keys.

 This is now exactly equivalant to having each developer sign postgres
 with a signing key (signed by his main key) for which the other
 developers (or appropriate authority) have a revocation certificate.
 
 And back to the passphrase issue, once again, can't you see that it's
 completely irrelevant? At some point, someone who knows the passphrase is
 going to have to be in a position to use that to decrypt the key. At that
 point he has the key, period. Changing the passphrase does no good, because
 you can't change the passphrase on the copy of the key he may have made.
 

So you trust the core developer to sign the package but you don't trust
him to have the key that's required to sign it?  You can't have it both
ways.

 A passphrase is like a lock on your barn door. After you've given
 someone the key and he's gone in and taken the cow, changing the lock
 gives you no protection at all.


I can assure you I fully understand the implications and meaning of
everything I've said.


-- 
Greg Copeland [EMAIL PROTECTED]
Copeland Computer Consulting


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] Changing the default configuration (was Re: [pgsql-advocacy]

2003-02-11 Thread Greg Copeland
On Tue, 2003-02-11 at 10:20, Tom Lane wrote:
 Merlin Moncure [EMAIL PROTECTED] writes:
  May I make a suggestion that maybe it is time to start thinking about
  tuning the default config file, IMHO its just a little bit too
  conservative,
 
 It's a lot too conservative.  I've been thinking for awhile that we
 should adjust the defaults.
 
 The original motivation for setting shared_buffers = 64 was so that
 Postgres would start out-of-the-box on machines where SHMMAX is 1 meg
 (64 buffers = 1/2 meg, leaving 1/2 meg for our other shared data
 structures).  At one time SHMMAX=1M was a pretty common stock kernel
 setting.  But our other data structures blew past the 1/2 meg mark
 some time ago; at default settings the shmem request is now close to
 1.5 meg.  So people with SHMMAX=1M have already got to twiddle their
 postgresql.conf settings, or preferably learn how to increase SHMMAX.
 That means there is *no* defensible reason anymore for defaulting to
 64 buffers. 
 
 We could retarget to try to stay under SHMMAX=4M, which I think is
 the next boundary that's significant in terms of real-world platforms
 (isn't that the default SHMMAX on some BSDen?).  That would allow us
 350 or so shared_buffers, which is better, but still not really a
 serious choice for production work.
 
 What I would really like to do is set the default shared_buffers to
 1000.  That would be 8 meg worth of shared buffer space.  Coupled with
 more-realistic settings for FSM size, we'd probably be talking a shared
 memory request approaching 16 meg.  This is not enough RAM to bother
 any modern machine from a performance standpoint, but there are probably
 quite a few platforms out there that would need an increase in their
 stock SHMMAX kernel setting before they'd take it.
 
 So what this comes down to is making it harder for people to get
 Postgres running for the first time, versus making it more likely that
 they'll see decent performance when they do get it running.
 
 It's worth noting that increasing SHMMAX is not nearly as painful as
 it was back when these decisions were taken.  Most people have moved
 to platforms where it doesn't even take a kernel rebuild, and we've
 acquired documentation that tells how to do it on all(?) our supported
 platforms.  So I think it might be okay to expect people to do it.
 
 The alternative approach is to leave the settings where they are, and
 to try to put more emphasis in the documentation on the fact that the
 factory-default settings produce a toy configuration that you *must*
 adjust upward for decent performance.  But we've not had a lot of
 success spreading that word, I think.  With SHMMMAX too small, you
 do at least get a pretty specific error message telling you so.
 
 Comments?
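
(To put rough numbers on the shared memory requests described above -- a
back-of-the-envelope sketch assuming PostgreSQL's default 8KB page size;
the overhead figures are the ones quoted in the message, not computed
here:)

    BLOCK_SIZE = 8 * 1024      # default PostgreSQL page size, 8KB

    def buffer_space_kb(shared_buffers):
        # Shared memory consumed by the buffer pool alone.
        return shared_buffers * BLOCK_SIZE // 1024

    for n in (64, 350, 1000):
        print(n, "buffers ->", buffer_space_kb(n), "KB of buffer space")

    # 64   ->  512 KB  (historic default: half of a 1MB SHMMAX)
    # 350  -> 2800 KB  (roughly what fits under SHMMAX = 4MB with overhead)
    # 1000 -> 8000 KB  (proposed default; total request nearer 16MB once FSM
    #                   and other shared structures are counted)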

I'd personally rather have people stumble trying to get PostgreSQL
running, up front, than allow the lowest common denominator to more
easily run PostgreSQL only to be disappointed with it and move on.

After it's all said and done, I would rather someone simply say, it's
beyond my skill set, and attempt to get help or walk away.  That seems
better than them being able to run it and say, it's a dog, spreading
word-of-mouth as such after they have left PostgreSQL behind.  Worse yet,
those that do walk away and claim it performs horribly are probably
doing more harm to the PostgreSQL community than expecting someone to be
able to install software ever can.

Nutshell:
Easy to install, but horribly slow.

or

Took a couple of minutes to configure and it rocks!



Seems fairly cut-n-dry to me.  ;)


Regards,

-- 
Greg Copeland [EMAIL PROTECTED]
Copeland Computer Consulting


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] Changing the default configuration (was Re:

2003-02-11 Thread Greg Copeland
On Tue, 2003-02-11 at 11:23, mlw wrote:
 Greg Copeland wrote:
 
   
 
 I'd personally rather have people stumble trying to get PostgreSQL
 running, up front, rather than allowing the lowest common denominator
 more easily run PostgreSQL only to be disappointed with it and move on.
 
 After it's all said and done, I would rather someone simply say, it's
 beyond my skill set, and attempt to get help or walk away.  That seems
 better than them being able to run it and say, it's a dog, spreading
 word-of-mouth as such after they left PostgreSQL behind.  Worse yet,
 those that do walk away and claim it performs horribly are probably
 doing more harm to the PostgreSQL community than expecting someone to be
 able to install software ever can.
 
 RANT
 
 And that my friends is why PostgreSQL is still relatively obscure.
 
 This attitude sucks. If you want a product to be used, you must put the 
 effort into making it usable.
 


Ah..okay


 It is a no-brainer to make the default configuration file suitable for 
 the majority of users. It is lunacy to create a default configuration 
 which provides poor performance for over 90% of the users, but which 
 allows the lowest common denominator to work.
 

I think you read something into my email which I did not imply.  I'm
certainly not advocating a default configuration file that assumes 512M
of shared memory or some other insane value.

Basically, you're arguing that they should keep doing exactly what they
are doing.  It's currently known to be causing problems and propagating
the misconception that PostgreSQL is unable to perform under any
circumstance.  I'm arguing: who cares if 5% of the potential user
base has to learn to properly install software?  Either they'll read and
learn, ask for assistance, or walk away.  All of which are better than
some Johnny-come-lately offering up a meaningless benchmark which others
are happy to eat with rather large spoons.


 A product must not perform poorly out of the box, period. A good product 
 manager would choose one of two possible configurations, (a) a high 
 speed fairly optimized system from the get-go, or (b) it does not run 
 unless you create the configuration file. Option (c) out of the box it 
 works like crap, is not an option.
 

That's the problem.  Option (c) is what we currently have.  I'm amazed
that you even have a problem with option (a), as that's what I'm
suggesting.  The problem is that, potentially for some minority of users,
it may not run out of the box.  As such, I'm much happier with that
situation than with 90% of the user base being stuck with a crappy
default configuration.

Oddly enough, your option (b) is even worse than what you are ranting at
me about.  Go figure.

 This is why open source gets such a bad reputation. Outright contempt 
 for the user who may not know the product as well as those developing 
 it. This attitude really sucks and it turns people off. We want people 
 to use PostgreSQL, to do that we must make PostgreSQL usable. Usability 
 IS important.
 /RANT


There is no contempt here.  Clearly you've read your own bias into this
thread.  If you go back and re-read my posting, I think it's VERY clear
that it's entirely about usability.


Regards,

-- 
Greg Copeland [EMAIL PROTECTED]
Copeland Computer Consulting


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] Changing the default configuration (was Re:

2003-02-11 Thread Greg Copeland
On Tue, 2003-02-11 at 12:55, Tom Lane wrote:
 scott.marlowe [EMAIL PROTECTED] writes:
  Is setting the max connections to something like 200 reasonable, or likely 
  to cause too many problems?
 
 That would likely run into number-of-semaphores limitations (SEMMNI,
 SEMMNS).  We do not seem to have as good documentation about changing
 that as we do about changing the SHMMAX setting, so I'm not sure I want
 to buy into the it's okay to expect people to fix this before they can
 start Postgres the first time argument here.
 
 Also, max-connections doesn't silently skew your testing: if you need
 to raise it, you *will* know it.
 

Besides, I'm not sure that it makes sense to let other products' needs
dictate the default configuration for this one.  It would be one thing
if the vast majority of people only used PostgreSQL with Apache.  I know
I'm using it in environments which in no way relate to the web.  I'm
thinking I'm not alone.


Regards,

-- 
Greg Copeland [EMAIL PROTECTED]
Copeland Computer Consulting


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] Windows SHMMAX (was: Default configuration)

2003-02-11 Thread Greg Copeland
On Tue, 2003-02-11 at 12:49, Merlin Moncure wrote:
 Does anyone know whether cygwin has a setting comparable to SHMMAX,
 and if so what is its default value?  How about the upcoming native
 Windows port --- any issues there?
 
 From a pure win32 point of view, a good approach would be to use the
 VirtualAlloc() memory allocation functions and set up a paged memory
 allocation system.  From a very top down point of view, this is the
 method of choice if portability is not an issue.  An abstraction to use
 this technique within pg context is probably complex and requires
 writing lots of win32 api code, which is obviously not desirable.
 
 Another way of looking at it is memory mapped files.  This probably most
 closely resembles unix shared memory and is the de facto standard way
 for interprocess memory block sharing.  Sadly, performance will suffer
 because you have to rely on the virtual memory system (think: writing to
 files) to do a lot of stupid stuff you don't necessarily want or need.
 The OS has to guarantee that the memory can be swapped out to file at
 any time and therefore mirrors the pagefile to the allocated memory
 blocks.
 
 With the C++/C memory malloc/free api, you are supposed to be able to
 get some of the benefits of virtual alloc (in particular, setting a
 process memory allocation limit), but personal experience did not bear
 this out.  However, this api sits directly over the virtual allocation
 system and is the most portable.  The application has to guard against
 fragmentation and things like that in this case.  In win32, server
 thrashing is public enemy #1 for database servers, mostly due to the
 virtual allocation system (which is quite fast when used right, btw).


IIRC, there is a mechanism which enables memory-mapped files to be backed
directly by the pagefile.  This is the preferred means of using
memory-mapped files unless you have a specific need which dictates
otherwise.  Meaning, it allows many supposed optimizations to be used by
the OS, as it is supposed to bypass some of the filesystem overhead.
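
For what it's worth, the pagefile-backed flavor is what you get when a
mapping is created with no underlying file; a minimal sketch using
Python's mmap wrapper over that same Win32 facility (Windows-only; the tag
name and size are placeholders):

    import mmap

    SIZE = 8 * 1024 * 1024      # 8MB, arbitrary for illustration

    # Passing -1 instead of a file descriptor creates an anonymous mapping
    # backed by the system pagefile; the tagname lets another process on
    # the same machine open the identical region by name.
    shared = mmap.mmap(-1, SIZE, tagname="pg_shared_region")
    shared[:5] = b"hello"
    shared.close()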


Regards,

-- 
Greg Copeland [EMAIL PROTECTED]
Copeland Computer Consulting


---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org



Re: [HACKERS] PGP signing release

2003-02-11 Thread Greg Copeland
On Tue, 2003-02-11 at 18:27, Curt Sampson wrote:
 On Wed, 11 Feb 2003, Greg Copeland wrote:
 
  On Wed, 2003-02-05 at 18:53, Curt Sampson wrote:
 
  [Re: everybody sharing a single key]
 
  This issue doesn't change regardless of the mechanism you pick.  Anyone
  that is signing a key must take reasonable measures to ensure the
  protection of their key.
 
 Right. Which is why you really want to use separate keys: you can determine
 who compromised a key if it is compromised, and you can revoke one without
 having to revoke all of them.
 
 Which pretty much inevitably leads you to just having the developers use
 their own personal keys to sign the release.
 
  Basically, you are saying:
  You trust a core developer
  You trust they can protect their keys
  You trust they can properly distribute their trust
  You don't trust a core developer with a key
 
 Not at all. I trust core developers with keys, but I see no reason to
 weaken the entire system by sharing keys when it's not necessary. Having
 each developer sign the release with his own personal key solves every
 problem you've brought up.
 
 cjs

You need to keep in mind that I've not been advocating, but rather
clarifying.  The point being, having a shared key between trusted core
developers is hardly an additional risk.  After all, either they can be
trusted or they can't.

At this point, I think we both understand where the other stands.
Either we agree or we agree to disagree.  The next step is for the
developers to adopt whichever path they prefer to enforce and to ensure
they have the tools and knowledge at hand to support it.

Anyone know if Tom and Bruce know each other well enough to sign each
other's keys outright, via phone, or via phone and snail-mail?  That
would put us off to an excellent start.


Regards,

-- 
Greg Copeland [EMAIL PROTECTED]
Copeland Computer Consulting


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] Changing the default configuration

2003-02-12 Thread Greg Copeland
On Wed, 2003-02-12 at 10:36, Robert Treat wrote:
 On Tue, 2003-02-11 at 21:00, Tatsuo Ishii wrote:
   
   while 200 may seem high, 32 definitely seems low.  So, what IS a good 
   compromise?  for this and ALL the other settings that should probably be a 
   bit higher.  I'm guessing sort_mem or 4 or 8 meg hits the knee for most 
   folks, and the max fsm settings tom has suggested make sense.
  
  32 is not too low if the kernel file descriptors is not
  increased. Beware that running out of the kernel file descriptors is a
  serious problem for the entire system, not only for PostgreSQL.
  
 
 Had this happen at a previous employer, and it definitely is bad. I
 believe we had to do a reboot to clear it up. And we saw the problem a
 couple of times since the sys admin wasn't able to deduce what had
 happened the first time we got it. IIRC the problem hit somewhere around
 150 connections, so we ran with 128 max. I think this is a safe number
 on most servers these days (running linux as least) though out of the
 box I might be more inclined to limit it to 64. If you do hit a file
 descriptor problem, *you are hosed*.
 

That does seem like a more reasonable upper limit.  I would rather see
people have to knowingly increase the limit rather than bump into system
upper limits and start scratching their heads trying to figure out what
the heck is going on.


-- 
Greg Copeland [EMAIL PROTECTED]
Copeland Computer Consulting


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] PGP signing release

2003-02-12 Thread Greg Copeland
On Tue, 2003-02-11 at 20:17, Bruce Momjian wrote:
 I hate to poo-poo this, but this web of trust sounds more like a web
 of confusion.  I liked the idea of mentioning the MD5 in the email
 announcement.  It doesn't require much extra work, and doesn't require a
 'web of %$* to be set up to check things.  Yea, it isn't as secure as
 going through the motions, but if someone breaks into that FTP server
 and changes the tarball and MD5 file, we have much bigger problems than
 someone modifying the tarballs;  our CVS is on that machine too.
 
 ---
 
 Greg Copeland wrote:
  On Tue, 2003-02-11 at 18:27, Curt Sampson wrote:
   On Wed, 11 Feb 2003, Greg Copeland wrote:
   
On Wed, 2003-02-05 at 18:53, Curt Sampson wrote:
   
[Re: everybody sharing a single key]
   
This issue doesn't change regardless of the mechanism you pick.  Anyone
that is signing a key must take reasonable measures to ensure the
protection of their key.
   
   Right. Which is why you really want to use separate keys: you can determine
   who compromised a key if it is compromised, and you can revoke one without
   having to revoke all of them.
   
   Which pretty much inevitably leads you to just having the developers use
   their own personal keys to sign the release.
   
Basically, you are saying:
You trust a core developer
You trust they can protect their keys
You trust they can properly distribute their trust
You don't trust a core developer with a key
   
   Not at all. I trust core developers with keys, but I see no reason to
   weaken the entire system by sharing keys when it's not necessary. Having
   each developer sign the release with his own personal key solves every
   problem you've brought up.
   
   cjs
  
  You need to keep in mind, I've not been advocating, rather, clarifying. 
  The point being, having a shared key between trusted core developers is
  hardly an additional risk.  After all, either they can be trusted or
  they can't.
  
  At this point, I think we both understand where the other stands. 
  Either we agree or agree to disagree.  The next step is for the
  developers to adopt which path they prefer to enforce and to ensure they
  have the tools and knowledge at hand to support it.
  
  Anyone know if Tom and Bruce know each other well enough to sign each
  other's keys outright, via phone, via phone and snail-mail?  That would
  put us off to an excellent start.
  
 

Bruce,

Since you just got back in town I'm not sure if you've been able to
follow the thread or not.  Just the same, I wanted to remind you that
using MD5 is not a security mechanism of any worth.  As such, this
thread was an effort to add a layer of authenticity.  Again, this is not
something that MD5 is going to provide for, now or in the future.

If it sounds confusing, it's only because you've never done it.
Honestly, once you take the 20 minutes to do it the first time, you'll
understand what's going on.  Beyond that, you won't have to sign
additional keys until you can validate them or as they expire.  After
that, it only takes minutes once you understand what's going on.

The time to actually sign packages is more or less the same as the time
to create your hashes.

Lastly, don't forget that your site is mirrored all over the place.  As
such, you're not the only place open to attack.  Just because you have
additional software running on this box is no reason to throw your hands
in the air and say, I don't care.  The simple fact is, it only takes one
compromised site to significantly affect PostgreSQL's reputation.  And
that site doesn't have to be yours.  If it's an official mirror, it
reflects (oh...a pun!) accordingly on the project.


Regards,

-- 
Greg Copeland [EMAIL PROTECTED]
Copeland Computer Consulting


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] Incremental backup

2003-02-14 Thread Greg Copeland
On Fri, 2003-02-14 at 06:52, Bruce Momjian wrote:
 OK, once we have PITR, will anyone want incremental backups?
 
 ---
 
 Martin Marques wrote:
  On Jue 13 Feb 2003 16:38, Bruce Momjian wrote:
   Patrick Macdonald wrote:
Bruce Momjian wrote:
 Someone at Red Hat is working on point-in-time recovery, also known as
 incremental backups.
   
PITR and incremental backup are different beasts.  PITR deals with a
backup + logs.  Incremental backup deals with a full backup + X
smaller/incremental backups.
   
So... it doesn't look like anyone is working on incremental backup at the
moment.
  
   But why would someone want incremental backups compared to PITR?  The
   backup would be mixture of INSERTS, UPDATES, and DELETES, right?  Seems
   pretty weird.  :-)
  
  Good backup systems, such as Informix (it's the one I used) doesn't do a query 
  backup, but a pages backup. What I mean is that it looks for pages in the 
  system that has changed from the las full backup and backs them up.
  
  That's how an incremental backup works. PITR is another thing, which is even 
  more important. :-)

I do imagine for some people it will register high on their list.
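
For illustration only, the page-level approach Martin describes might look
something like the rough sketch below (the file names, manifest format, and
page handling are all made up here; a real tool would also have to record
which page numbers were copied so a restore is possible):

import hashlib, json, os

PAGE_SIZE = 8192   # PostgreSQL's default block size

def page_hashes(path):
    # Hash every fixed-size page of a data file.
    hashes = []
    with open(path, "rb") as f:
        while True:
            page = f.read(PAGE_SIZE)
            if not page:
                break
            hashes.append(hashlib.sha1(page).hexdigest())
    return hashes

def incremental_backup(datafile, manifest, outfile):
    # Copy out only the pages whose hash differs from the manifest written
    # at the time of the last full backup.
    old = json.load(open(manifest)) if os.path.exists(manifest) else []
    new = page_hashes(datafile)
    changed = [i for i, h in enumerate(new) if i >= len(old) or old[i] != h]
    with open(datafile, "rb") as src, open(outfile, "wb") as dst:
        for i in changed:
            src.seek(i * PAGE_SIZE)
            dst.write(src.read(PAGE_SIZE))
    json.dump(new, open(manifest, "w"))
    return changed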

-- 
Greg Copeland [EMAIL PROTECTED]
Copeland Computer Consulting


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] Detecting corrupted pages earlier

2003-02-18 Thread Greg Copeland
On Mon, 2003-02-17 at 22:04, Tom Lane wrote:
 Curt Sampson [EMAIL PROTECTED] writes:
  On Mon, 17 Feb 2003, Tom Lane wrote:
  Postgres has a bad habit of becoming very confused if the page header of
  a page on disk has become corrupted.
 
  What typically causes this corruption?
 
 Well, I'd like to know that too.  I have seen some cases that were
 identified as hardware problems (disk wrote data to wrong sector, RAM
 dropped some bits, etc).  I'm not convinced that that's the whole story,
 but I have nothing to chew on that could lead to identifying a software
 bug.
 
  If it's any kind of a serious problem, maybe it would be worth keeping
  a CRC of the header at the end of the page somewhere.
 
 See past discussions about keeping CRCs of page contents.  Ultimately
 I think it's a significant expenditure of CPU for very marginal returns
 --- the layers underneath us are supposed to keep their own CRCs or
 other cross-checks, and a very substantial chunk of the problem seems
 to be bad RAM, against which occasional software CRC checks aren't 
 especially useful.

This is exactly why magic numbers or simple algorithmic bit patterns
are commonly used.  If the magic number or bit pattern doesn't match
what's expected for that page number, you know something is wrong.  Storage
cost tends to be slight and CPU overhead low.
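
As a rough illustration (not the actual PostgreSQL page layout), a header
stamp derived from a constant magic value and the page's own block number can
catch both garbage headers and pages written to the wrong location; the names
and format below are made up for the example:

import struct

MAGIC = 0xC0DE

def stamp_header(page_no, payload):
    # First 4 bytes of the page: magic value XORed with the page number.
    return struct.pack("<I", MAGIC ^ page_no) + payload

def header_looks_sane(page_no, page):
    # Cheap check on read: one unpack and one comparison per page.
    (stamp,) = struct.unpack("<I", page[:4])
    return stamp == (MAGIC ^ page_no)

page = stamp_header(7, b"...tuple data...")
assert header_looks_sane(7, page)        # header consistent with block 7
assert not header_looks_sane(8, page)    # same page read back as block 8: caught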

I agree with you that a CRC seems like overkill for little return.

Regards,

-- 
Greg Copeland [EMAIL PROTECTED]
Copeland Computer Consulting


---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org



Re: [HACKERS] Roadmap for a Win32 port

2002-06-18 Thread Greg Copeland

On Tue, 2002-06-18 at 09:07, Jan Wieck wrote:
 Dann Corbit wrote:
  
  The startup stuff for PostgreSQL is just a few files.  It does not seem
  insurmountable to change it.  But it is none of my business.  If it is a
  major hassle (for reasons which I am not aware) then I see no driving
  reason to change it.
 
 It has to be changed for Windows, it is a major hassle for reasons I
 wasn't aware of, and I am half way through ;-)
 

Well, if you're going to go through the trouble of rewriting postmaster
to be win32 specific, you might as well make it hook into the services
control panel.

If I recall, shared memory is owned by a process in Win32 (please
correct as needed...as I've slept since I last looked).  That means that
the postmaster process not only owns the shared memory but needs to make
sure that it persists as the rest of postgres is expecting.

Please provide more details as to the nature of your Win32 changes.  I'm
certainly curious.  If you've already covered them, just say so...I have
no problem going back to the archives! :)

Greg




signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] RAMDISK

2002-07-23 Thread Greg Copeland

Interesting results.  You didn't really offer much on how your system
was configured to use the ramdisk.  Did you simply store a
database on it?  Was the entire database able to fit into available
memory even without the RAMDISK?  Did you try storing only indices on
the RAMDISK?  There are lots of other questions that remain unanswered on the
topic.

Worth mentioning that it is very possible, and in fact fairly easy,
for the use of a RAMDISK to significantly hinder the performance of
a system running a database.

Greg


On Tue, 2002-07-23 at 09:36, Samuel J. Sutjiono wrote:
 I've finally got around to try RAMDISK with PostgreSQL.  The attached doc
 contains the test results that I'd like to share with PostgreSQL's users
 and developers groups.  
 
 Regards,
 Samuel Sutjiono
 _
  Expand your wireless world with Arkdom PLUS
  http://www.arkdom.com/
 
 
 

 
 ---(end of broadcast)---
 TIP 4: Don't 'kill -9' the postmaster




signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] CVS server problem!

2002-08-01 Thread Greg Copeland

Seems the CVS server is not working correctly.  I just deleted my CVS
tree and did a fresh checkout of the pgsql module.  Everything seemingly
went well.  After the check out completed, I did:

[gcope@mouse pgsql]$ ./configure --with-tcl --with-java --with-python
--with-perl 
checking build system type... i686-pc-linux-gnu
checking host system type... i686-pc-linux-gnu
checking which template to use... linux
checking whether to build with 64-bit integer date/time support... no
checking whether to build with recode support... no
checking whether NLS is wanted... no
checking for default port number... 5432
checking for default soft limit on number of connections... 32
checking for gcc... gcc
checking for C compiler default output... a.out
checking whether the C compiler works... yes
checking whether we are cross compiling... no
checking for suffix of executables... 
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
./configure: ./src/template/linux: No such file or directory

So, I did "./configure", which yields the same result.  So, thinking
maybe I just had a poorly timed checkout, I did an update.  Doing so
looks like this:
[gcope@mouse pgsql]$ cvs -z3 update -dP
? config.log
cvs server: Updating .
cvs server: Updating ChangeLogs
cvs server: Updating MIGRATION
cvs server: Updating config
.
.
.
src/backend/utils/mb/conversion_procs/utf8_and_iso8859_1
cvs server: Updating
src/backend/utils/mb/conversion_procs/utf8_and_johab
cvs server: Updating src/backend/utils/mb/conversion_procs/utf8_and_sjis
cvs server: cannot open directory
/projects/cvsroot/pgsql/src/backend/utils/mb/conversion_procs/utf8_and_sjis: No such 
file or directory
cvs server: skipping directory
src/backend/utils/mb/conversion_procs/utf8_and_sjis
cvs server: Updating src/backend/utils/mb/conversion_procs/utf8_and_tcvn
cvs server: cannot open directory
/projects/cvsroot/pgsql/src/backend/utils/mb/conversion_procs/utf8_and_tcvn: No such 
file or directory
cvs server: skipping directory
src/backend/utils/mb/conversion_procs/utf8_and_tcvn
cvs server: Updating src/backend/utils/mb/conversion_procs/utf8_and_uhc
cvs server: cannot open directory
/projects/cvsroot/pgsql/src/backend/utils/mb/conversion_procs/utf8_and_uhc: No such 
file or directory
cvs server: skipping directory
src/backend/utils/mb/conversion_procs/utf8_and_uhc
cvs server: Updating src/backend/utils/misc
cvs server: cannot open directory
/projects/cvsroot/pgsql/src/backend/utils/misc: No such file or
directory
cvs server: skipping directory src/backend/utils/misc
cvs server: Updating src/backend/utils/mmgr
cvs server: cannot open directory
/projects/cvsroot/pgsql/src/backend/utils/mmgr: No such file or
directory
cvs server: skipping directory src/backend/utils/mmgr
cvs server: Updating src/backend/utils/sort
cvs server: cannot open directory
/projects/cvsroot/pgsql/src/backend/utils/sort: No such file or
directory
cvs server: skipping directory src/backend/utils/sort
cvs server: Updating src/backend/utils/time
cvs server: cannot open directory
/projects/cvsroot/pgsql/src/backend/utils/time: No such file or
directory
cvs server: skipping directory src/backend/utils/time
cvs server: Updating src/bin
cvs server: cannot open directory /projects/cvsroot/pgsql/src/bin: No
such file or directory
cvs server: skipping directory src/bin


So, I'm fairly sure something is awry.

Greg


On Mon, 2002-07-29 at 20:51, Tatsuo Ishii wrote:
 Can anyone fix this?
 
 $ cvs up
 can't create temporary directory /tmp/cvs-serv40296
 No space left on device
 --
 Tatsuo Ishii
 
 ---(end of broadcast)---
 TIP 6: Have you searched our list archives?
 
 http://archives.postgresql.org




signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] cvs checkout pgsql

2002-08-01 Thread Greg Copeland

Looks like I replied to the wrong thread...here's a repeat...

Seems the CVS server is not working correctly.  I just deleted my CVS
tree and did a fresh checkout of the pgsql module.  Everything seemingly
went well.  After the check out completed, I did:

[gcope@mouse pgsql]$ ./configure --with-tcl --with-java --with-python
--with-perl 
checking build system type... i686-pc-linux-gnu
checking host system type... i686-pc-linux-gnu
checking which template to use... linux
checking whether to build with 64-bit integer date/time support... no
checking whether to build with recode support... no
checking whether NLS is wanted... no
checking for default port number... 5432
checking for default soft limit on number of connections... 32
checking for gcc... gcc
checking for C compiler default output... a.out
checking whether the C compiler works... yes
checking whether we are cross compiling... no
checking for suffix of executables... 
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
./configure: ./src/template/linux: No such file or directory

So, I did "./configure", which yields the same result.  So, thinking
maybe I just had a poorly timed checkout, I did an update.  Doing so
looks like this:
[gcope@mouse pgsql]$ cvs -z3 update -dP
? config.log
cvs server: Updating .
cvs server: Updating ChangeLogs
cvs server: Updating MIGRATION
cvs server: Updating config
.
.
.
src/backend/utils/mb/conversion_procs/utf8_and_iso8859_1
cvs server: Updating
src/backend/utils/mb/conversion_procs/utf8_and_johab
cvs server: Updating src/backend/utils/mb/conversion_procs/utf8_and_sjis
cvs server: cannot open directory
/projects/cvsroot/pgsql/src/backend/utils/mb/conversion_procs/utf8_and_sjis: No such 
file or directory
cvs server: skipping directory
src/backend/utils/mb/conversion_procs/utf8_and_sjis
cvs server: Updating src/backend/utils/mb/conversion_procs/utf8_and_tcvn
cvs server: cannot open directory
/projects/cvsroot/pgsql/src/backend/utils/mb/conversion_procs/utf8_and_tcvn: No such 
file or directory
cvs server: skipping directory
src/backend/utils/mb/conversion_procs/utf8_and_tcvn
cvs server: Updating src/backend/utils/mb/conversion_procs/utf8_and_uhc
cvs server: cannot open directory
/projects/cvsroot/pgsql/src/backend/utils/mb/conversion_procs/utf8_and_uhc: No such 
file or directory
cvs server: skipping directory
src/backend/utils/mb/conversion_procs/utf8_and_uhc
cvs server: Updating src/backend/utils/misc
cvs server: cannot open directory
/projects/cvsroot/pgsql/src/backend/utils/misc: No such file or
directory
cvs server: skipping directory src/backend/utils/misc
cvs server: Updating src/backend/utils/mmgr
cvs server: cannot open directory
/projects/cvsroot/pgsql/src/backend/utils/mmgr: No such file or
directory
cvs server: skipping directory src/backend/utils/mmgr
cvs server: Updating src/backend/utils/sort
cvs server: cannot open directory
/projects/cvsroot/pgsql/src/backend/utils/sort: No such file or
directory
cvs server: skipping directory src/backend/utils/sort
cvs server: Updating src/backend/utils/time
cvs server: cannot open directory
/projects/cvsroot/pgsql/src/backend/utils/time: No such file or
directory
cvs server: skipping directory src/backend/utils/time
cvs server: Updating src/bin
cvs server: cannot open directory /projects/cvsroot/pgsql/src/bin: No
such file or directory
cvs server: skipping directory src/bin


So, I'm fairly sure something is awry.

Greg

Unrelated quote:
 I could move docs into $CVSROOT/this/is/a/stupid/directory/structure/docs
 and except for the fact that you already have a copy checked out pointing
 to the old path, a fresh checkout would still place that in pgsql/docs,
 where you've grown used to it being ...
 

You do realize that by moving modules outside of the base project,
you're forcing more work onto the masses.  That is, if you plan to have
/doc, /pgsql-server, /contrib, etc., people will now have to create a new
.../pgsql directory locally, which means, locally, people will have
.../pgsql/pgsql-server, .../pgsql/contrib, etc. ... why force onto the
developer what CVS should already be doing?  I don't know about you
guys, but when I check out pgsql, I certainly expect everything to be
there.  If that's not what I want, then I should be more explicit in
what I pick for checkout.

Just some food for thought...this is a common peeve of mine when people
decide they need to restructure their repository...seems like this is
always done and almost always a poor choice that isn't realized until
it's all done and over with.

Greg





signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] CVS server problem!

2002-08-01 Thread Greg Copeland

Yes, it's compiling now...thanks.

Greg

On Thu, 2002-08-01 at 18:31, Marc G. Fournier wrote:
 
 Should be fixed now ... I found a rsync.core file, so it looks like the
 changes may have been more extensive then rsync could handle ... just ran
 it manually (or, rather, am running as I type this), so by the time you
 receive, a checkout should grab the right structures ...
 
 Let me know if it works now for you ...
 
 
 On 1 Aug 2002, Greg Copeland wrote:
 
  Seems the CVS server is not working correctly.  I just deleted my CVS
  tree and did a fresh checkout of the pgsql module.  Everything seemingly
  went well.  After the check out completed, I did:
 
  [gcope@mouse pgsql]$ ./configure --with-tcl --with-java --with-python
  --with-perl
  checking build system type... i686-pc-linux-gnu
  checking host system type... i686-pc-linux-gnu
  checking which template to use... linux
  checking whether to build with 64-bit integer date/time support... no
  checking whether to build with recode support... no
  checking whether NLS is wanted... no
  checking for default port number... 5432
  checking for default soft limit on number of connections... 32
  checking for gcc... gcc
  checking for C compiler default output... a.out
  checking whether the C compiler works... yes
  checking whether we are cross compiling... no
  checking for suffix of executables...
  checking for suffix of object files... o
  checking whether we are using the GNU C compiler... yes
  checking whether gcc accepts -g... yes
  ./configure: ./src/template/linux: No such file or directory
 
  So, I did, ./configure which yields the same result.  So, thinking
  maybe I just had a poorly timed checkout, I did an update.  Doing so,
  looks like this:
  [gcope@mouse pgsql]$ cvs -z3 update -dP
  ? config.log
  cvs server: Updating .
  cvs server: Updating ChangeLogs
  cvs server: Updating MIGRATION
  cvs server: Updating config
  .
  .
  .
  src/backend/utils/mb/conversion_procs/utf8_and_iso8859_1
  cvs server: Updating
  src/backend/utils/mb/conversion_procs/utf8_and_johab
  cvs server: Updating src/backend/utils/mb/conversion_procs/utf8_and_sjis
  cvs server: cannot open directory
  /projects/cvsroot/pgsql/src/backend/utils/mb/conversion_procs/utf8_and_sjis: No 
such file or directory
  cvs server: skipping directory
  src/backend/utils/mb/conversion_procs/utf8_and_sjis
  cvs server: Updating src/backend/utils/mb/conversion_procs/utf8_and_tcvn
  cvs server: cannot open directory
  /projects/cvsroot/pgsql/src/backend/utils/mb/conversion_procs/utf8_and_tcvn: No 
such file or directory
  cvs server: skipping directory
  src/backend/utils/mb/conversion_procs/utf8_and_tcvn
  cvs server: Updating src/backend/utils/mb/conversion_procs/utf8_and_uhc
  cvs server: cannot open directory
  /projects/cvsroot/pgsql/src/backend/utils/mb/conversion_procs/utf8_and_uhc: No 
such file or directory
  cvs server: skipping directory
  src/backend/utils/mb/conversion_procs/utf8_and_uhc
  cvs server: Updating src/backend/utils/misc
  cvs server: cannot open directory
  /projects/cvsroot/pgsql/src/backend/utils/misc: No such file or
  directory
  cvs server: skipping directory src/backend/utils/misc
  cvs server: Updating src/backend/utils/mmgr
  cvs server: cannot open directory
  /projects/cvsroot/pgsql/src/backend/utils/mmgr: No such file or
  directory
  cvs server: skipping directory src/backend/utils/mmgr
  cvs server: Updating src/backend/utils/sort
  cvs server: cannot open directory
  /projects/cvsroot/pgsql/src/backend/utils/sort: No such file or
  directory
  cvs server: skipping directory src/backend/utils/sort
  cvs server: Updating src/backend/utils/time
  cvs server: cannot open directory
  /projects/cvsroot/pgsql/src/backend/utils/time: No such file or
  directory
  cvs server: skipping directory src/backend/utils/time
  cvs server: Updating src/bin
  cvs server: cannot open directory /projects/cvsroot/pgsql/src/bin: No
  such file or directory
  cvs server: skipping directory src/bin
 
 
  So, I'm fairly sure something is awry.
 
  Greg
 
 
  On Mon, 2002-07-29 at 20:51, Tatsuo Ishii wrote:
   Can anyone fix this?
  
   $ cvs up
   can't create temporary directory /tmp/cvs-serv40296
   No space left on device
   --
   Tatsuo Ishii
  
   ---(end of broadcast)---
   TIP 6: Have you searched our list archives?
  
   http://archives.postgresql.org
 
 
 
 
 ---(end of broadcast)---
 TIP 4: Don't 'kill -9' the postmaster




signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] Why is MySQL more chosen over PostgreSQL?

2002-08-02 Thread Greg Copeland

On Thu, 2002-08-01 at 22:39, Curt Sampson wrote:
 On 1 Aug 2002, Greg Copeland wrote:
 
  For some reason,
  many of the developers are under the impression that even if code is
  never touched, it has a very high level of effort to keep it in the code
  base.  That is, of course, completely untrue.
 
 Where does this of course come from? I've been programming for quite a
 while now, and in my experience every line of code costs you something
 to maintain.

Please re-read my statement.  Your assertion and my statement are by no
means mutually exclusive.  "Of course" was correctly used and does correctly
apply; however, it doesn't appear it was correctly comprehended by you
as it applied in context.  I agree with your statement that "...every
line of code costs you something to maintain...", which in no way, shape,
or form contradicts my statement that "...it has a very high level of
effort..." -- of course not!  Fact is, if code which is never touched
requires a very high level of effort to maintain, chances are you screwed up
somewhere.

Hopefully we can agree that "...costs you something..." does not have to
mean "...very high level of effort..."

 As long as there's any interaction with other parts of
 the system, you have to test it regularly, even if you don't need to
 directly change it.

No one said otherwise.  Perhaps you were replying to someone else?!  :)

 
 That said, if you've been doing regular work on postgres code base and you
 say that it's cheap to maintain, I'll accept that.

Please re-read my statement.  In my mind, this was implicitly
understood from the statement I made.

Sheesh...sure hope I remembered to dot all my i's...

Greg




signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] Why is MySQL more chosen over PostgreSQL?

2002-08-01 Thread Greg Copeland

On Tue, 2002-07-30 at 14:54, Hannu Krosing wrote:
 On Tue, 2002-07-30 at 16:00, Curt Sampson wrote:
  On 30 Jul 2002, Hannu Krosing wrote:
  
   On Tue, 2002-07-30 at 14:51, Adrian 'Dagurashibanipal' von Bidder wrote:
  
Bruce Momjian:
 It causes too much complexity in other parts of the system.
   
That's one reason.
  
   Seems like somewhat valid reason. But still not enough to do a lot of
   work _and_ annoy a lot of existing users :)
  
  It's almost unquestionably more work to maintain than to drop. Dropping
  support for it is a one-time operation. Maintaining it is an ongoing
  expense.
 
 I would not rush to drop advanced features, as they may be hard to put
 back later. If they stay in, even in broken form, then there wont be
 nearly as much patches which make fixing them harder.

I seem to find this argument a lot on the list here.  For some reason,
many of the developers are under the impression that even if code is
never touched, it takes a very high level of effort to keep it in the code
base.  That is, of course, completely untrue.  Now then, I'm not saying
that something as central as the topic at hand has a zero maintenance
cost associated with it, especially if it's constantly being run into by
the developers, but I do see this argument used WAY too often here for it
to be applicable in every case.

From what I can tell, in many cases, when one developer on the list
doesn't want to maintain or sees little value in a feature, it suddenly
seems to have a high price associated with it.  We need to be sure we're
making the distinction between "I don't care to maintain this" and
"maintaining this code is prohibitively expensive given its feature
return"...because...  In other words, I find this argument used often
here with little to nothing offered in context which would quantify it. 
Worse yet, it generally goes unchallenged and unquestioned.

 
 I'm afraid that we have already dropped too much. 
 
 For example we dropped time travel, but recent versions of Oracle now
 have some form of it, usable mostly for recovering accidentally deleted
 (and committed rows), although it is much harder to implement it using
 logs than using MVCC.

I must admit, I never understood this myself but I'm sure I'm ignorant
of the details.

  That's a straw man argument.
 
 Actually it was meant to be 'one straw man against another straw man 
 argument' ;)

Was clear to me!  I thought you made the point rather well.

 
  What we (or I, anyway) are arguing is that
  the relational model does everything that table inheritance does, and at
  least as easily.
 
 The problem is that 'the relational model' does nothing by itself. It is
 always the developers/DBAs who have to do things. 
 
 And at least for some brain shapes it is much more convenient to inherit
 tables than to (re)factor stuff into several tables to simulate
 inheritance using the relational model. 

Agreed.  It's important to remember, there are some cases where the
conceptual implications can allow for more freedom in implementation. 
This is the point that was being made with the "pure C" comment.  Sure,
I can do pretty much anything in asm, but that approach doesn't suddenly
invalidate every other way/language/concept/idiom for trying to
accomplish a given task.

Simply put, much of the power you get from any tool is often the
flexibility of a given tool to address a problem domain in many
different ways rather than just one.  Just because it doesn't fit your
paradigm doesn't mean it doesn't fit nicely into someone else's.

 
 I still think that inheritance should be enhanced and made compatible
 with standards, not removed.

I completely agree with that!

 
  Extending the model adds complexity without adding the
  ability to do things you couldn't easily do before. (This, IMHO, makes
  table inheritance quite inelegant.)
 
 Then explain why SQL99 has included inheritance ?
 

Yes please.  I'm very interested in hearing a rebuttal to this one.

Greg




signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] Why is MySQL more chosen over PostgreSQL?

2002-08-02 Thread Greg Copeland

On Thu, 2002-08-01 at 23:30, Tom Lane wrote:
 Greg Copeland [EMAIL PROTECTED] writes:
  I seem to find this argument a lot on the list here.  For some reason,
  many of the developers are under the impression that even if code is
  never touched, it has a very high level of effort to keep it in the code
  base.  That is, of course, completely untrue.
 
 FWIW, I did not notice any of the core developers making that case.
 

I've seen it used a lot.  In many cases, it's asserted with nothing to
support it other than the fact that they are a core developer; however,
these assertions are often made against unspecified and undeveloped
code, which makes such an assertion invalid.

Greg




signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] WAL file location

2002-08-02 Thread Greg Copeland

On Wed, 2002-07-31 at 22:20, Bruce Momjian wrote:
 I am wondering why we even want to specify the WAL location anywhere
 except as a flag to initdb.  If you specify a location at initdb time,
 it creates the /xlog directory, then symlinks it into /data.
 

Does this have any negative implications for Win32 ports?

Greg




signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] WAL file location

2002-08-02 Thread Greg Copeland

On Fri, 2002-08-02 at 13:46, Thomas Lockhart wrote:
   I am wondering why we even want to specify the WAL location anywhere
   except as a flag to initdb.  If you specify a location at initdb time,
   it creates the /xlog directory, then symlinks it into /data.
  Does this have any negative implications for Win32 ports?
 
 Sure. the symlinks thing was just a suggestion. Everything else is
 portable for sure... Or is there some other area you are concerned
 about?

Well, as another poster pointed out, Cygwin does support soft links but
I was also under the impression that lots of Win32 related development
was underway.  I wasn't sure if those plans called for the use of Cygwin
or not.

I was just trying to highlight a possible cause for concern...as I
honestly don't know how it relates to the current Win32 efforts.

Greg




signature.asc
Description: This is a digitally signed message part


Re: [GENERAL] [HACKERS] Linux Largefile Support In Postgresql RPMS

2002-08-10 Thread Greg Copeland

On Sat, 2002-08-10 at 00:25, Mark Kirkwood wrote:
 Ralph Graulich wrote:
 
 Hi,
 
 just my two cents worth: I like having the files sized in a way I can
 handle them easily with any UNIX tool on nearly any system. No matter
 wether I want to cp, tar, dump, dd, cat or gzip the file: Just keep it at
 a maximum size below any limits, handy for handling.
 
 Good point... however I was thinking that being able to dump the entire 
 database without resporting to gzips and splits was handy...
 
 
 For example, Oracle suggests it somewhere in their documentation, to keep
 datafiles at a reasonable size, e.g. 1 GB. Seems right to me, never had
 any problems with it.
 
 Yep, fixed or controlled sizes for data files is great... I was thinking 
 about databases rather than data files (altho I may not have made that 
 clear in my mail)
 

I'm actually amazed that postgres isn't already using large file
support, especially for tools like dump.  I do recognize the need to
keep files manageable in size, but my sizing needs may differ
from yours.

Seems like it would be a good thing to enable and simply make it a
function for the DBA to handle.  After all, even if I'm trying to keep
my dumps at around 1GB, I probably would be okay with a dump of 1.1GB
too.  To me, that just seems more flexible.

Greg




signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] python patch

2002-08-11 Thread Greg Copeland

Not a problem.  I would rather they be correct.

Worth noting that the first patch is what attempts to fix the long-to-int
overflow issue.  The second patch attempts to resolve attisdropped
column use issues with the python scripts.  The third patch addresses
issues generated by the implicit-to-explicit use of cascade.

I assume your reservations are only with the second patch and not the
first and third patches?

Greg


On Sun, 2002-08-11 at 04:43, Christopher Kings-Lynne wrote:
 I wouldn't apply this _just_ yet Bruce as I'm not certain all the changes
 are necessary...  I intend to look into it but I haven't had the time yet
 (sorry Greg!)
 
 Chris
 
 
 On Sun, 11 Aug 2002, Bruce Momjian wrote:
 
 
  Your patch has been added to the PostgreSQL unapplied patches list at:
 
  http://candle.pha.pa.us/cgi-bin/pgpatches
 
  I will try to apply it within the next 48 hours.
 
  ---
 
 
  Greg Copeland wrote:
 
  Checking application/pgp-signature: FAILURE
  -- Start of PGP signed section.
   Well, that certainly appeared to be very straight forward.  pg.py and
   syscat.py scripts were both modified.  pg.py uses it to cache a list of
   pks (which it seemingly does for every db connection) and various
   attributes.  syscat uses it to walk the list of system tables and
   queries the various attributes from these tables.
  
   In both cases, it seemingly makes sense to apply what you've requested.
  
   Please find attached the quested patch below.
  
   Greg
  
  
   On Wed, 2002-08-07 at 22:16, Christopher Kings-Lynne wrote:
 I don't have a problem looking into it but I can't promise I can get it
 right.  My python skills are fairly good...my postgres internal skills
 are still sub-par IMO.

 From a cursory review, if attisdropped is true then the attribute/column
 should be ignored/skipped?! Seems pretty dang straight forward.
   
Basically, yep.  Just grep the source code for pg_attribute most likely...
   
I'm interested in knowing what it uses pg_attribute for as well...?
   
Chris
   
   
---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly
  
 
  [ text/x-patch is unsupported, treating like TEXT/PLAIN ]
 
   Index: pg.py
   ===
   RCS file: /projects/cvsroot/pgsql-server/src/interfaces/python/pg.py,v
   retrieving revision 1.9
   diff -u -r1.9 pg.py
   --- pg.py 2002/03/19 13:20:52 1.9
   +++ pg.py 2002/08/08 03:29:48
   @@ -69,7 +69,8 @@
 WHERE pg_class.oid = 
pg_attribute.attrelid AND
 pg_class.oid = 
pg_index.indrelid AND
 pg_index.indkey[0] = 
pg_attribute.attnum AND
   - pg_index.indisprimary = 
't').getresult():
   + pg_index.indisprimary = 't' AND
   + pg_attribute.attisdropped = 
'f').getresult():
 self.__pkeys__[rel] = att
  
 # wrap query for debugging
   @@ -111,7 +112,8 @@
 WHERE pg_class.relname = '%s' AND
 pg_attribute.attnum  0 AND
 pg_attribute.attrelid = pg_class.oid 
AND
   - pg_attribute.atttypid = pg_type.oid
   + pg_attribute.atttypid = pg_type.oid AND
   + pg_attribute.attisdropped = 'f'
  
 l = {}
 for attname, typname in self.db.query(query % cl).getresult():
   Index: tutorial/syscat.py
   ===
   RCS file: 
/projects/cvsroot/pgsql-server/src/interfaces/python/tutorial/syscat.py,v
   retrieving revision 1.7
   diff -u -r1.7 syscat.py
   --- tutorial/syscat.py2002/05/03 14:21:38 1.7
   +++ tutorial/syscat.py2002/08/08 03:29:48
   @@ -37,7 +37,7 @@
 FROM pg_class bc, pg_class ic, pg_index i, pg_attribute a
 WHERE i.indrelid = bc.oid AND i.indexrelid = bc.oid
 AND i.indkey[0] = a.attnum AND a.attrelid = bc.oid
   - AND i.indproc = '0'::oid
   + AND i.indproc = '0'::oid AND a.attisdropped = 'f'
 ORDER BY class_name, index_name, attname)
 return result
  
   @@ -48,6 +48,7 @@
 WHERE c.relkind = 'r' and c.relname !~ '^pg_'
 AND c.relname !~ '^Inv' and a.attnum  0
 AND a.attrelid

Re: [HACKERS] [SECURITY] DoS attack on backend possible (was: Re:

2002-08-12 Thread Greg Copeland

Well, if it's a buffer overrun, there is certainly potential for risks
well beyond that of simply crashing the backend.  It's certainly possible
that a simple bug in one cgi script or web site could allow someone to
execute code on the database host because of this bug.  Even assuming they
are running the backend as "postgres" or some other seemingly harmless
user, it's still possible that any and all databases which are hosted
and accessible by this user could be utterly destroyed or miscellaneously
corrupted.

Buffer overruns should be treated with the utmost urgency and
respect.  IMO, any known buffer overrun is worthy of an emergency fix
and corresponding advisory.

Greg Copeland


On Sun, 2002-08-11 at 12:09, Tom Lane wrote:
 Justin Clift [EMAIL PROTECTED] writes:
  Am I understanding this right:
   - A PostgreSQL 7.2.1 server can be crashed if it gets passed certain
  date values which would be accepted by standard front end parsing? 
 
 AFAIK it's a buffer overrun issue, so anything that looks like a
 reasonable date would *not* cause the problem.
 
   regards, tom lane
 
 ---(end of broadcast)---
 TIP 5: Have you checked our extensive FAQ?
 
 http://www.postgresql.org/users-lounge/docs/faq.html




signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] OOP real life example (was Re: Why is MySQL more

2002-08-12 Thread Greg Copeland

On Mon, 2002-08-12 at 00:29, Hannu Krosing wrote:
 On Mon, 2002-08-12 at 11:52, Curt Sampson wrote:
  On Sun, 11 Aug 2002, Don Baccus wrote:
[snip]
  But anyway, I have no particularly huge objection to syntatic sugar
  alone. I do have objections to it when it's not saving much typing. (It
  is in this case, but that could be fixed with better automatic support
  of view updates.)
  
  But my real objection is when it makes things more confusing, rather
  than less, which I think is definitely happening here.
 
 What makes things more confusing is poor understanding of a feature, not
 the feature itself. 

Agreed.  Just because a feature may not be well understood by the masses
doesn't mean the feature is worthless.

 
  I've never
  seen a rigourous explanation of our model of table inheritance,
  nor any model that was more obviously correct than another. And
  the parallel drawn with inheritance in OO languages is a false
  parallel that adds to the confusion.
 
 Are you saying that inheritance in SQL is something fundamentally
 different than inheritance in OO languages ?
 

Hmmm...there might be.  Curt raises an interesting point below.  Do keep
in mind that I believe he's specifically referring to table inheritance
and not the broad scope of language-wide inheritance.


  (For example, the distinction
  between types and instances of types is critical in OO theory. What are
  the TI equivalants of this?)
 
 If by TI you mean type instance then the equivalent of of an instance is
 a relation (i.e. one row in an (inherited) table).
 

Look a little deeper here.  In other OO implementations, I can define a
class (say class a) which has no instances (an abstract base class). 
Furthermore, I can take this class and use it as a building block
(assuming multiple inheritance is allowed in this world) by combining it
with other classes (z inherits from a, b, c; whereby classes a, b, c
still do not have an actual instance).  I can create an instance of my
newly inherited class (z).
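
To make that concrete in ordinary OO terms (plain Python here, nothing to do
with Postgres syntax; the class names are just placeholders for the a, b, c,
z above):

from abc import ABC, abstractmethod

class A(ABC):
    @abstractmethod
    def area(self): ...

class B(ABC):
    @abstractmethod
    def name(self): ...

class C(ABC):
    @abstractmethod
    def weight(self): ...

class Z(A, B, C):
    # The only concrete class: it supplies every abstract method.
    def area(self): return 42
    def name(self): return "z"
    def weight(self): return 3.5

z = Z()     # fine: Z is concrete
# A()       # raises TypeError: A is abstract and has no instances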

Seems to me that there is some distinction between types (classes) and
type instances (instances of a specific class) as it pertains to their
usability.

How exactly would you create an abstract base class for table type?

I'm still trying to wrap my brain around exactly what the implications
are here, but I *think* this is what Curt was trying to stress.  Curt,
feel free to correct me as needed.


  All this is borne out by the regular questions one sees about
  inheritance in the mailing lists. I'll admit a good part of it is
  due to the broken implementation of inheritance, but all of the
  problems I've ever seen are easily solved with very simple relational
  solutions.
 
 All _simple_ inheritance problems are easily solved by simple relational
 solutions. The general problem of much more typing and debugging, less
 clues for optimiser etc. are not solved by _simple_ relational
 solutions.

I agree with Hannu here.  Curt's comment seems like lip service.  Worth
noting too, even if it were not for the issues pointed out by Hannu
here, Curt's statement certainly does nothing to invalidate the concept
of table inheritance.  After all, most camps are happy when there are
multiple means to an end.  Just because it can be done via method-x
doesn't invalidate method-y.  The inverse is probably true too.  ;)

 
  Maybe the inheritance thing is causing people to turn off the relational
  parts of their brain or something.
  
 Of maybe people are diversifying, using inheritance for is-a
 relationships and relational model for has-a relationships.

That's an interesting point.

Greg




signature.asc
Description: This is a digitally signed message part


Re: [GENERAL] [HACKERS] Linux Largefile Support In Postgresql RPMS

2002-08-12 Thread Greg Copeland

On Mon, 2002-08-12 at 09:39, Andrew Sullivan wrote:
 On Sat, Aug 10, 2002 at 09:21:07AM -0500, Greg Copeland wrote:
 
  I'm actually amazed that postgres isn't already using large file
  support.  Especially for tools like dump. 
 
 Except it would only cause confusion if you ran such a program on a
 system that didn't itself have largefile support.  Better to make the
 admin turn all these things on on purpose, until everyone is running
 64 bit systems everywhere.

If by "turn...on" you mean recompile, that's a horrible idea IMO. 
Besides, you're expecting that an admin is going to know that he even
needs to recompile to obtain this feature, let alone that he'd be interested
in compiling his own installation.  Whereas, more than likely, he'll know
off hand (or can easily find out) whether his FS/system supports large files
(beyond 32-bit sizes).

Seems like systems which can natively support this feature should have
it enabled by default.  It's a different issue if an admin attempts to
create files larger than what his system and/or FS can support.

I guess what I'm trying to say here is, it's moving the problem from
being a postgres-specific issue (not compiled in -- having to recompile
and install and not knowing whether it's enabled) to a general body of
knowledge (does my system support such-n-such capabilities).

If a compile-time option is still much preferred by the core developers,
perhaps a log entry can be created which at least denotes the current
status of such a feature when a compile-time option is required.  Simply
having an entry like "LOG:  LARGE FILE SUPPORT ENABLED (64-bit file
sizes)" (or DISABLED), etc...things along those lines.  Of course, having a
--enable-large-files configure option would be nice too.
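
Purely as a hypothetical sketch of that kind of runtime/log check (not a
proposed patch, and the message wording is invented), one could probe whether
the filesystem under the data directory will actually accept a file past the
2 GB mark and log the outcome:

import os, tempfile

def large_files_supported(directory):
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        os.lseek(fd, 2**31, os.SEEK_SET)   # seek just past the 31-bit limit
        os.write(fd, b"x")                 # raises OSError if the FS refuses
        return True
    except OSError:
        return False
    finally:
        os.close(fd)
        os.unlink(path)

state = "ENABLED" if large_files_supported(".") else "DISABLED"
print("LOG:  LARGE FILE SUPPORT %s (64-bit file sizes)" % state)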

This would seemingly make sense in other contexts too.  Imagine a
back-end compiled with large file support and someone else using fe
tools which do not support it.  How are they going to know if their
fe/be supports this feature unless we let them know?

Greg




signature.asc
Description: This is a digitally signed message part


Re: [GENERAL] [HACKERS] Linux Largefile Support In Postgresql RPMS

2002-08-12 Thread Greg Copeland

On Mon, 2002-08-12 at 10:30, Andrew Sullivan wrote:
 On Mon, Aug 12, 2002 at 10:15:46AM -0500, Greg Copeland wrote:
 
  If by turn...on, you mean recompile, that's a horrible idea IMO. 
 
 Ah.  Well, that is what I meant.  Why is it horrible?  PostgreSQL
 doesn't take very long to compile.  


Many reasons.  A DBA is not always the same thing as a developer (which
means it's doubtful he's even going to know about needed options to pass
-- if any).  Using a self-compiled installation may not instill the same
sense of reliability (I know that sounds odd) as using a distribution's
package.  A DBA may not be an SA, which means he should probably not be
compiling and installing software on a system.  Furthermore, he may not
even have access to do so.

Means upgrading in the future may be problematic.  Someone compiled with
large file support.  He leaves.  New release comes out.  Someone else
upgrades and now finds things are broken.  Why?  If it supported it out
of the box, issue is avoided.

Lastly, and perhaps most obvious, SA and DBA bodies of knowledge are
fairly distinct.  You should not expect a DBA to function as an SA. 
Furthermore, SA and developer bodies of knowledge are also fairly
distinct.  You shouldn't expect an SA to know what compiler options he
needs to use to compile software on his system, especially for
something as obscure as large file support.


 
  I guess what I'm trying to say here is, it's moving the problem from
  being a postgres specific issue (not compiled in -- having to recompile
  and install and not knowing if it's (dis)enabled) to a general body of
  knowledge (does my system support such-n-such capabilities).
 
 The problem is not just a system-level one, but a filesystem-level
 one.  Enabling 64 bits by default might be dangerous, because a DBA
 might think oh, it supports largefiles by default and therefore not
 notice that the filesystem itself is not mounted with largefile
 support.  But I suspect that the developers would welcome autoconfig
 patches if someone offered them.


The distinction you make there is minor.  An SA should know and
understand the capabilities of the systems he maintains (this is true
even if the SA and DBA are one).  This includes filesystem
capabilities.  A DBA should only care about the system requirements and
trust that the SA can deliver those capabilities.  If an SA says "my
filesystems can support very large files" and installs postgres, the DBA
should expect that matching support in the database is already available. 
Imagine his surprise when he finds out that his postgres installation
can't handle it?!

As for the concern about danger: hmm...my understanding is that the
result is pretty much the same thing as exceeding the max file size.  That
is, if you attempt to read/write beyond what the filesystem can provide,
you're still going to get an error.  Is this really more dangerous than
simply reading/writing to a file which exceeds max system capabilities? 
Either way, this issue exists and having large file support, seemingly,
does not affect it one way or another.

I guess I'm trying to say, the risk of seeing filesystem corruption or
even database corruption should not be affected by the use of large file
support.  Please correct me if I'm wrong.


Greg




signature.asc
Description: This is a digitally signed message part


Re: [GENERAL] [HACKERS] Linux Largefile Support In Postgresql RPMS

2002-08-12 Thread Greg Copeland

On Mon, 2002-08-12 at 11:04, Andrew Sullivan wrote:
 On Mon, Aug 12, 2002 at 11:44:24AM -0400, Lamar Owen wrote:
  keep discussing the issues involved, and I'll see what comes of it.  I don't 
  have an direct experience with the largefile support, and am learning as I go 
  with this.
 
 I do have experience with both of these cases.  We're hosted in a
 managed-hosting environment, and one day one of the sysadmins there
 must've remounted a filesystem without largefile support.  Poof! I
 started getting all sorts of strange pg_dump problems.  It wasn't
 hard to track down, except that I was initially surprised by the
 errors, since I'd just _enabled_ large file support.

And, what if he had just remounted it read-only?  Mistakes will happen. 
That doesn't come across as being a strong argument to me.  Besides,
it's doubtful that a filesystem is going to be remounted while it's in
use.  Which means these issues are going to be secondary to actual
production use of the database.  That is, either the system is working
correctly or it's not.  If it's not, I guess it's not ready for production
use.

Furthermore, since fs mounting, if being done properly, is almost always
a matter of automation, this particular class of error should be few and
far between.

Wouldn't you rather answer people with "remount your file system"
rather than "recompile with such-n-such option enabled, reinstall.  Oh
yeah, since you're re-installing a modified version of your database,
probably a good paranoid option would be to back up and dump, just to be
safe."?  Personally, I'd rather say "remount".


 There are, in any case, _lots_ of problems with these large files. 
 You not only need to make sure that pg_dump and friends can support
 files bigger than 2G.  You need to make sure that you can move the
 files around (your file transfer commands), that you can compress the
 files (how is gzip compiled?  bzip2?), and even that your backup
 software takes the large file.  In a few years, when all
 installations are ready for this, it seems like it'd be a good idea
 to turn this on by default.  Right now, I think the risks are at
 least as great as those incurred by telling people they need to
 recompile.  
 

All of those are SA issues.  Shouldn't we leave that domain of issues
for an SA to deal with rather than try to force a single view down
someone's throat?  Which, btw, results in creating more work for those
that desire this feature.

Greg




signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] OOP real life example (was Re: Why is MySQL more

2002-08-12 Thread Greg Copeland

On Mon, 2002-08-12 at 10:39, Oliver Elphick wrote:
 On Mon, 2002-08-12 at 15:00, Greg Copeland wrote:
  How exactly would you create an abstract base class for table type?
 
 CREATE TABLE abstract_base (
cols ...,
CONSTRAINT No data allowed in table abstract_base! CHECK (1 = 0)
 )
 
 This assumes that the constraint is not inherited or can be removed in
 child tables.
 

Why would I assume that constraints would not be inherited?  Seems as a
general rule of thumb, you'd want the constraints to be inherited.  Am I
missing something?

Also, if I remove the constraint on the child table, doesn't that really
mean I'm removing the constraint on the parent table?  That would seem
to violate the whole reason for having constraints.  If a constraint is
placed in an ABC and we find that we later need to remove it for EVERY
derived class, doesn't that imply it shouldn't have been in there to begin
with?  After all, in this case, we're saying that each and every derived
class needs to overload or drop a specific constraint.  That strikes me
as being rather obtuse.

That, in itself, I find rather interesting.  Are there any papers or
books which offer an explanation of how constraints should be handled for
table inheritance?

Greg




signature.asc
Description: This is a digitally signed message part


Re: [GENERAL] [HACKERS] Linux Largefile Support In Postgresql RPMS

2002-08-12 Thread Greg Copeland

On Mon, 2002-08-12 at 11:40, Andrew Sullivan wrote:
 On Mon, Aug 12, 2002 at 11:07:51AM -0500, Greg Copeland wrote:
 
  Many reasons.  A DBA is not always the same thing as a developer (which
  means it's doubtful he's even going to know about needed options to pass
  -- if any). 
 
 This (and the upgrade argument) are simply documentation issues. 
 If you check the FAQ_Solaris, there's already a line in there which
 tells you how to do it.

And?  What's your point?  Does that somehow make it disappear?  Even if it
had been documented, it doesn't mean the documentation made it to the
right hands or was obviously located.  Just look at postgres'
documentation in general.  How often are people told to "read the
code"?  Give me a break.  Your argument is a very weak straw.

 
  Lastly, and perhaps the most obvious, SA and DBA bodies of knowledge are
  fairly distinct.  You should not expect a DBA to function as a SA. 
  Furthermore, SA and developer bodies of knowledge are also fairly
  distinct.  You shouldn't expect a SA to know what compiler options he
  needs to use to compile software on his system.  Especially for
  something as obscure as large file support.
 
 It seems to me that a DBA who is running a system which produces 2
 Gig dump files, and who can't compile Postgres, is in for a rocky
 ride.  Such a person needs at least a support contract, and in such a
 case the supporting organisation would be able to provide the needed
 binary.

LOL.  Managing data and compiling applications have nothing to do with
each other.  Try, try again.

You also don't seem to understand that this isn't as simple as
"recompile."  It's not!!!  Are we clear on this?!  It's as "simple" as
needing to KNOW that you have to recompile and then KNOWING you have to
use a series of obtuse options when compiling.

In other words, you seemingly know everything you don't know, which is
more than the rest of us.

 Anyway, as I said, this really seems like the sort of thing that
 mostly gets done when someone sends in a patch.  So if it scratches
 your itch . . .
 
  The distinction you make there is minor.  A SA, should know and
  understand the capabilities of the systems he maintains (this is true
  even if the SA and DBA are one).  This includes filesystem
  capabilities.  A DBA, should only care about the system requirements and
  trust that the SA can deliver those capabilities.  If a SA says, my
  filesystems can support very large files, installs postgres, the DBA
  should expect that match support in the database is already available. 
  Woe is his surprise when he finds out that his postgres installation
  can't handle it?!
 
 And it seems to me the distinction you're making is an invidious one. 
 I am sick to death of so-called experts who want to blather on about
 this or that tuning parameter of [insert big piece of software here]
 without knowing the slightest thing about the basic operating
 environment.  A DBA has responsibility to know a fair amount about

In other words, you can't have a subject matter expert unless he is an
expert on every subject?  Ya, right!

 the platform in production.  A DBA who doesn't is one day going to
 find out what deep water is.

Agreed...as it relates to the database.  DBAs shouldn't have to know
details about the filesystem...that's the job of an SA.  You seem to be
under the impression that SA = DBA or somehow a DBA is an SA with extra
knowledge.  While this is sometimes true, I can assure you this is not
always the case.

This is exactly why large companies often have DBAs in one department
and SAs in another.  Their knowledge domains tend to differ.

 
  result is pretty much the same thing as exceeding max file size.  That
  is, if you attempt to read/write beyond what the filesystem can provide,
  you're still going to get an error.  Is this really more dangerous than
  simply reading/writing to a file which exceeds max system capabilities? 
 
 Only if you were relying on it for backups, and suddenly your backups
 don't work.
 

Correction.  Suddenly your backups never worked.  Seems like that would
have been caught in testing prior to going into production.  Surely you're
not suggesting that people place a system into production without having
tested the full life cycle?  Backup testing is part of your life cycle,
right?

Greg




signature.asc
Description: This is a digitally signed message part


Re: [GENERAL] [HACKERS] Linux Largefile Support In Postgresql RPMS\

2002-08-12 Thread Greg Copeland

On Mon, 2002-08-12 at 11:48, Andrew Sullivan wrote:
 On Mon, Aug 12, 2002 at 11:17:31AM -0500, Greg Copeland wrote:
[snip]

   There are, in any case, _lots_ of problems with these large files. 
 
  All of those are SA issues.  
 
 So is compiling the software correctly, if the distinction has any
 meaning at all.  When some mis-installed bit of software breaks, the
 DBAs won't go running to the SAs.  They'll ask here.

In either case, they're going to ask.  You can give them a simple solution
or you can make them run around and pull their hair out.

You're also assuming that SA = developer.  I can assure you it does
not.  I've met many an SA whose development experience was make and
korn scripts.  Expecting that he should know to use GNU_SOURCE and
BITS=64 is a pretty far reach.  Furthermore, you're even expecting that
he knows that such a recompile fix even exists.  Where do you think
he's going to turn?  The lists.  That's right.  Since he's going to
contact the list or review a FAQ item anyway, doesn't it make sense to
give them the easy way out (both the initiator and the mailing list)?

IMO, powerful tools always seem to be capable enough to let you shoot
yourself in the foot.  Why make people pay special attention to this sole
feature, which doesn't really address that to begin with?

Would you at least agree that --enable-large-files, rather than
CFLAGS=xxx, is a good idea, as might well be banners and log entries
stating whether large file support has been compiled in?

Greg




signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] python patch

2002-08-12 Thread Greg Copeland

On Sun, 2002-08-11 at 21:15, Christopher Kings-Lynne wrote:
  Not a problem.  I would rather them be correct.
 
  Worth noting that the first patch is what attempts to fix the long -
  int overflow issue.  The second patch attempts to resolve attisdropped
  column use issues with the python scripts.  The third patch addresses
  issues generated by the implicate to explicate use of cascade.
 
  I assume your reservations are only with the second patch and not the
  first and third patches?
 
 Correct.  I'm pretty sure you don't need to exclude attisdropped from the
 primary key list because all it's doing is finding the column that a primary
 key is over and that should never be over a dropped column.  I can't
 remember what you said the second query did?


Hmmm.  Sounds okay but I'm just not sure that holds true (as I
previously stated, I'm ignorant on the topic).  Obviously I'll defer to
you on this.

Here's the queries and what they do:


From pg.py:
Used to locate primary keys -- or so the comment says.  It does create a
dictionary of keys and attribute values for each returned row, so I
assume it really is attempting to do something of the sort.

SELECT pg_class.relname, pg_attribute.attname 
FROM pg_class, pg_attribute, pg_index 
WHERE pg_class.oid = pg_attribute.attrelid AND 
pg_class.oid = pg_index.indrelid AND 
pg_index.indkey[0] = pg_attribute.attnum AND 
pg_index.indisprimary = 't' AND 
pg_attribute.attisdropped = 'f' ;

So, everyone is in agreement that any attribute which is indexed as a
primary key will never be able to have attisdropped = 't'?

According to the code:
SELECT pg_attribute.attname, pg_type.typname
FROM pg_class, pg_attribute, pg_type
WHERE pg_class.relname = '%s' AND
pg_attribute.attnum > 0 AND
pg_attribute.attrelid = pg_class.oid AND
pg_attribute.atttypid = pg_type.oid AND
pg_attribute.attisdropped = 'f' ;

is used to obtain all attributes (column names) and their types for a
given table ('%s').  It then attempts to build a column/type cache.  I'm
assuming that this really does need to be there.  Please correct
accordingly.
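
In other words (a rough paraphrase of what pg.py appears to be doing with
that second query, not the module's literal code; "db" stands in for the
pg.DB-style connection whose query(...).getresult() is used above):

def build_column_type_cache(db, table_name):
    # Fetch every non-dropped user column and its type name for one table.
    query = """SELECT pg_attribute.attname, pg_type.typname
               FROM pg_class, pg_attribute, pg_type
               WHERE pg_class.relname = '%s' AND
                     pg_attribute.attnum > 0 AND
                     pg_attribute.attrelid = pg_class.oid AND
                     pg_attribute.atttypid = pg_type.oid AND
                     pg_attribute.attisdropped = 'f'"""
    cache = {}
    for attname, typname in db.query(query % table_name).getresult():
        cache[attname] = typname    # e.g. {'id': 'int4', 'name': 'text'}
    return cache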


From syscat.py:
SELECT bc.relname AS class_name,
ic.relname AS index_name, a.attname
FROM pg_class bc, pg_class ic, pg_index i, pg_attribute a
WHERE i.indrelid = bc.oid AND i.indexrelid = bc.oid
AND i.indkey[0] = a.attnum AND a.attrelid = bc.oid
AND i.indproc = '0'::oid AND a.attisdropped = 'f'
ORDER BY class_name, index_name, attname ;

According to the nearby documentation, it's supposed to be fetching a
list of all simple indices.  If that's the case, is it safe to assume
that any indexed column will never have attisdropped = 't'?  If so, we
can remove that check from the file as well.  Worth pointing out, this
is from syscat.py, which is sample source and not used as an actual
interface.  So, worst case, it would appear to be redundant in nature
with no harm done.

This should conclude the patched items offered in the second patch.

What ya think?

Thanks,
Greg





signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] OOP real life example (was Re: Why is MySQL more

2002-08-12 Thread Greg Copeland

On Mon, 2002-08-12 at 20:34, Curt Sampson wrote:
 Ok, big bundled up reply here to various people.
 
  From: Greg Copeland [EMAIL PROTECTED]
 
   What makes things more confusing is poor understanding of a feature, not
   the feature itself.
 
  Agreed.  Just because a feature may not be well understood by the masses
  doesn't mean the feature is worthless.
 
 Yeah, but if it's not understood by fairly smart people familiar
 with both relational theory and OO programming? If the feature is
 confusing because it appears to be something it's not, that's a
 feature problem, not a problem with the people trying to understand
 it. Maybe all that's necessary to fix it is a terminology change,
 but even so

You're constantly confusing Postgres' implementation with a desired
implementation.  Below, I think, is the effort to figure out exactly
what a desired implementation really is.

If a feature is partially implemented, of course it's going to be
confusing to use.

Let's please stop beating this horse, Curt.  At this point, I think the
horse is floating upside down in a pond somewhere...yep...and the
buzzards are coming.

Please.  Beating people with a stick isn't suddenly going to make
everyone share your viewpoint.

   All _simple_ inheritance problems are easily solved by simple relational
   solutions. The general problem of much more typing and debugging, less
   clues for optimiser etc. are not solved by _simple_ relational
   solutions.
 
  I agree with Hannu here.  Curt's comment seems like lip service.
 
 Well, as I said: examples please. Quite frankly, between the lack
 of a clear model of table inheritance (Hannu seems to have one,
 but this needs to be written up in unambiguous form and put into
 the postgres manual) and the bugs in the postgres implementation
 of table inheritance, I've found the relational model much easier
 to use for solving problems.

If you're so keen on examples, please provide one that justifies such a
boastful statement.  Hannu has done a pretty fair job of beating ya back
every time.  Personally, in this case, I don't really need examples, as
it's pretty obviously a braggart statement full of bias.  So, on second
thought, perhaps we can let this one alone.

I do agree that it looks like Hannu is doing a fairly good job of
providing some constructive direction here.  Hannu, please keep up the
good work.  ;)

 
  From: Oliver Elphick [EMAIL PROTECTED]
 
  On Mon, 2002-08-12 at 15:00, Greg Copeland wrote:
   How exactly would you create an abstract base class for table type?
 
  CREATE TABLE abstract_base (
 cols ...,
 CONSTRAINT No data allowed in table abstract_base! CHECK (1 = 0)
  )
 
  This assumes that the constraint is not inherited or can be removed in
  child tables.
 
 Are we then assuming that tuples in the child tables do not appear
 in the base table? That's more or less what I'd assumed when I
 originally heard about table inheritance (after all, instantiating
 a child object does not automatically instantiate a separate copy
 of the parent object), but the SQL standard, postgres, and I believe other
 systems make the exact opposite assumption.

That's actually my exact assumption...that is, that tuples in the parent
did not exist in the child.  Is that not true?  Can you point me to any
references?

 
 If the child table tuples do appear in the parent, you've now got
 a situation analogous to the current postgres situation where a
 constraint on the parent table is an outright lie. (I'm thinking
 of the UNIQUE constraint which guarantees that all values in a
[snip]

I knew that there are *implementation* issues with postgres that cause
problems with constraints, etc...I didn't realize that was the reason.

 
  From: Greg Copeland [EMAIL PROTECTED]
 
  That, in it self, I find rather interesting.  Is there any papers or
  books which offers explanation of how constraints should handled for
  table inheritance?
 
 Here again, I'd love to hear about some references, too. I see a
 lot of people saying they like table inheritance; I don't see anyone
 (except maybe Hannu) who seems to have a clear idea of how it should
 work.

Well, you seem to be making references to ...SQL standard, postgres,
and I believe other systems...  I was counting on you or someone else
to point us to existing references.  I'm fairly sure we can manage to
wade through it to walk a sane and fruitful path...it would just be a
less bumpy road if we all spoke the same OO'ish dialect and shared a
common knowledge base that we can all agree on for starters.  So, you
got anything to share here???  ;)


Greg






Re: [GENERAL] [HACKERS] Linux Largefile Support In Postgresql RPMS

2002-08-12 Thread Greg Copeland

On Mon, 2002-08-12 at 18:41, Martijn van Oosterhout wrote:
 On Mon, Aug 12, 2002 at 11:30:36AM -0400, Andrew Sullivan wrote:
  The problem is not just a system-level one, but a filesystem-level
  one.  Enabling 64 bits by default might be dangerous, because a DBA
  might think oh, it supports largefiles by default and therefore not
  notice that the filesystem itself is not mounted with largefile
  support.  But I suspect that the developers would welcome autoconfig
  patches if someone offered them.
 
 Are there any filesystems in common use (not including windows ones) that
 don't support 32-bit filesizes?
 
 Linux (ext2) I know supports by default at least to 2TB (2^32 x 512bytes),
 probably much more. What about the BSDs? XFS? etc
 

Ext2 & 3 should be okay.  XFS (very sure) and JFS (reasonably sure)
should also be okay...IIRC.  NFS and SMB are probably problematic, but I
can't see anyone really wanting to do this.  Maybe some of the
clustering file systems (GFS, etc) might have problems???  I'm not sure
where reiserfs falls.  I *think* it's not a problem but something
tingles in the back of my brain that there may be problems lurking...

Just for the heck of it, I did some searching.  Found these for
starters:
http://www.suse.de/~aj/linux_lfs.html.
http://www.gelato.unsw.edu.au/~peterc/lfs.html

http://ftp.sas.com/standards/large.file/


So, in a nutshell, most modern (2.4.x+) x86 Linux systems should be
able to handle large files.
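
If anyone wants to check a particular box/filesystem, a throwaway probe
along these lines (hypothetical, obviously not part of PostgreSQL) will
tell you whether files larger than 2GB work -- on a setup without
largefile support the write fails (or the process gets SIGXFSZ, as
noted elsewhere in this thread):

import errno
import os

def supports_large_files(directory):
    """Try to write one byte just past the 2GB mark in a sparse temp file."""
    path = os.path.join(directory, "lfs_probe.tmp")
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        os.lseek(fd, 2 ** 31, os.SEEK_SET)   # seek just past the 2GB barrier
        os.write(fd, b"x")                   # sparse file; only one block used
        return True
    except OSError:
        return False                         # typically EFBIG on non-LFS setups
    finally:
        os.close(fd)
        os.unlink(path)

if __name__ == "__main__":
    print(supports_large_files("."))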

Enjoy,
Greg






Re: [HACKERS] [COMMITTERS] pgsql-server/src

2002-08-12 Thread Greg Copeland

On Mon, 2002-08-12 at 23:09, Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  OK, seeing as no one voted, and only Tom and I objected originally, we
  will keep the code as Thomas has applied it, namely that PGXLOG/-X is
  recognized by initdb, postmaster, postgres, and pg_ctl.
 
 We will?  It looks to me like Thomas lost the vote 2-to-1.
 
 Unless there are more votes, I'm going to *insist* that this code be
 changed.  It's dangerous and offers no offsetting benefit.  XLOG
 location should be settable at initdb, noplace later.
 


I think Tom is on to something here.  I meant to ask but never got
around to it.  Why would anyone need to move the XLOG after you've
inited the db?

Greg






Re: [HACKERS] [COMMITTERS] pgsql-server/src

2002-08-12 Thread Greg Copeland

On Tue, 2002-08-13 at 00:16, Marc G. Fournier wrote:
 
 Myself, if I'm going to move something around, it will be after the server
 has been running for a while and I've added in more drive space in order
 to correct a problem i didn't anticipate (namely, disk I/O) ... at that
 point, I really don't wan tto have to dump/re-initdb/load just to move the
 xlog directory to that new drive ...
 

Okay, fair enough.  But do we really need to have environment variables
for this?  Sounds like we need a stand-alone utility which does the
associated magic in the database which moves the xlog and associated
internal pointers.  Doing so would assume that all backends for the
database have been shut down, or simply would not go into effect until
restarted.  Is this feasible?

For something that would seemingly be infrequently used, creating
environment variables would seemingly be rather error prone.

Requiring soft links also doesn't strike me as a good portable idea
either...not to mention I've been bitten by them before too.

Sign,
Greg Copeland






Re: [HACKERS] OOP real life example (was Re: Why is MySQL more

2002-08-12 Thread Greg Copeland

On Tue, 2002-08-13 at 00:33, Curt Sampson wrote:
 On Mon, 12 Aug 2002, Don Baccus wrote:
 
  Give it up.  You're acting like a turkey.  If you aren't, skin yourself
  a new non-turkey skin.
 
 Since he appears not to be able to avoid abusive ad hominem attacks,
 I'm now sending mail with [EMAIL PROTECTED] in the From: header
 to /dev/null. If there's a technical point in one of his messages that
 relates to the discussion that I need to answer, someone should please
 mention it on the list or forward it to me.
 

Curt, I think his reply stems from his frustration with the content of
many emails that originate from you.  We all pretty well understand
postgres has a broken feature.  We all understand you see zero value in
it.  We're all actively attempting to determine ways to make it less
broken, if not altogether better.  The fact that it's broken, and that you
hardly go an email without reminding everyone of this fact, is certainly not
making friends.  In fact, one should hardly be surprised you're not
seeing more retorts as such.

For the sake of the project, I'd hope you could give the broken topic
a rest, move on, and allow a little time for the list to settle again. 
If such abuse continues, then IMHO, it would make sense to /dev/null
him.

Greg






Re: [HACKERS] OOP real life example (was Re: Why is MySQL more

2002-08-12 Thread Greg Copeland

On Tue, 2002-08-13 at 00:16, Curt Sampson wrote:
 I will revise my opinion the instant someone shows me something that I
 can't do relationally, or is easy to implement with inheritance, and
 difficult with relational methods. Now you know what you need to do, and
 if you have no example, we can drop the whole thing. But I am honestly
 interested to see just what it is that makes table inheritance so great.

I think herein is the first problem.  You seem to insist that the world
can only allow for one or the other and that the two approaches are
mutually exclusive.  I tend to think that there is room for both.  One
would also seem to allow that they can actually be complementary
(referring to Hannu's recent is-a & has-a inheritance comments).

Can we let go of x is better than y and just concentrate on how y can be
made better without regard for x?

After it's all said and done, who knows, everyone might agree that table
inheritance is just a plain, bad idea.

 
  I knew that there are *implementation* issues with postgres that causes
  problems with constraints, etc...I didn't realize that was the reason.
 
 Well, assuming we are mapping inheritance back into relational stuff
 behind the scenes (which it appears to me we are doing now), we can just
 map back to the relation method I demonstrated earlier of doing what
 someone wanted to do with table inheritance (child tables contain only
 foreign key and child-specific data; parent table contains primary key
 and all parent data) and that will fix the implementation problem.

This is what I imagined the preferred solution would be; however, I'm
also assuming it would be the more complex one to implement *properly*.  

 
 Or people have proposed other things, such as cross-table constraints,
 to try to do this.

Ya, I was kicking this idea around in my head tonight.  Didn't get far
on it.  So I should look for postings in the archive about this specific
implementation?

 
  Well, you seem to be making references to ...SQL standard, postgres,
  and I believe other systems  I was counting on you or someone else
  to point us to existing references.
 
 Well, counting on me is not good, since the whole reason I started this
 was because I found the issue confusing in part due to the lack of any
 obvious standards here that I could find. :-) But here's what I do have:
 
 Date, Darwen, _Foundation for Future Database Systems, The
 Third Manefesto (Second Edition)_. Appendex E.

Is this a book or a paper?  I have a paper that I've been reading
(ack...very, very dry) by these guys of the same name.

 
 Silberschatz, Korth, Sudarshan, _Database Systems Concepts
 (Fourth Edition)_. I think it's around chapter 9. (My copy is
 at home right now.)
 
 SQL Standard. I don't have it handy. Anyone? Anyone? Bueller?

So the SQL standard does address table inheritance?  Not that this means
I feel that they've done the right thing...but what did the
specification have to say on the subject?  Any online references?

 
 Postgres. Known broken implementation, but we can at least poke
 stuff into it and see what it does.
 
 In addition, OO programming gets mentioned ocassionally. I don't
 think that table inheritance is anything related (and I've spent

Yes.  I think I'm starting to buy into that too; however, I'm not sure
that it has to mean that no value is within.  In other words, I'm still
on the fence on a) whether table inheritance really makes much OO sense, and b)
even if it does or does not, whether there is value in any form of its
implementation (whatever the end result looks like).

 a lot of time in the last couple of years developing methods to
 make my OO programs and relational databases play nice with each
 other), but it might help to have some idea of what people to do
 connect the two, in case some people think that they are or should
 be connected. You can start by checking out this page for a few
 ways of creating objects from database information:
 
 http://www.martinfowler.com/isa/inheritanceMappers.html
 

Thanks.  Funny, I was reading that just the other day.  ;)

Greg






Re: [HACKERS] [COMMITTERS] pgsql-server/src

2002-08-13 Thread Greg Copeland

On Tue, 2002-08-13 at 08:15, Tom Lane wrote:
 Marc G. Fournier [EMAIL PROTECTED] writes:
  I think Tom is on to something here.  I meant to ask but never got
  around to it.  Why would anyone need to move the XLOG after you've
  inited the db?
 
  I just determined that disk I/O is terrible, so want to move the XLOG over
  to a different file system that is currently totally idle ...
 
 Sure, needing to manually move the xlog directory is a plausible thing,
 but *you can already do it*.  The current procedure is
 
 1. shut down postmaster
 2. cp -p -r xlog directory to new location
 3. rm -rf old xlog directory
 4. ln -s new xlog directory to $PGDATA/xlog
 5. start postmaster
 
 With the patch it's almost the same, but you can instead of (4) substitute

Why not simply create a script which does this?  Creation of movexlog
or some such beast which anally checked everything it did.  As options,
you could simply pass it the src and dest and let it take care of the
rest.
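
Something like the following is all I have in mind (purely hypothetical,
nothing like it exists today -- I'm also assuming the pg_xlog directory
name).  It just automates Tom's steps 2-4 and refuses to run if a
postmaster still appears to be up:

import os
import shutil
import sys

def move_xlog(pgdata, new_location):
    """Move $PGDATA/pg_xlog to new_location and leave a symlink behind."""
    old = os.path.join(pgdata, "pg_xlog")
    if os.path.exists(os.path.join(pgdata, "postmaster.pid")):
        sys.exit("postmaster still appears to be running -- shut it down first")
    if os.path.islink(old):
        sys.exit("%s is already a symlink; refusing to touch it" % old)
    # Run as the postgres user so ownership stays correct (copytree keeps
    # timestamps but not ownership).
    shutil.copytree(old, new_location)   # step 2: copy the xlog directory
    shutil.rmtree(old)                   # step 3: remove the old copy
    os.symlink(new_location, old)        # step 4: point pg_xlog at the new spot

if __name__ == "__main__":
    move_xlog(sys.argv[1], sys.argv[2])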

Greg






Re: [GENERAL] [HACKERS] Linux Largefile Support In Postgresql RPMS

2002-08-13 Thread Greg Copeland

On Tue, 2002-08-13 at 12:45, [EMAIL PROTECTED] wrote:
 On Tue, Aug 13, 2002 at 01:04:02PM -0400, Tom Lane wrote:
  On a system where building with large-file support is reasonably
  standard, I agree that PG should be built that way too.  Where it's
  not so standard, I agree with Andrew Sullivan's concerns ...
 
 What do you mean by standard? That only some filesystems are supported?
 In Linux the vfat filesystem doesn't support largefiles, so the behaviour
 is the same as if the application didn't specify O_LARGEFILE to open(2):
 As Helge Bahmann pointed out, kernel will refuse to write files larger than
 2GB. In current Linux, a signal (SIGXFSZ) is sent to the application
 that then dumps core.
 
 
 So, the use of O_LARGEFILE is nullified by the lack of support by the
 filesystem, but no problem is introduced by the application supporting
 largefiles, it already existed before.
 

Thank you.  That's a point that I previously pointed out...you just did
a much better job of it.  Specifically, I want to stress that enabling
large file support is not dangerous.

Greg






Re: [GENERAL] [HACKERS] Linux Largefile Support In Postgresql RPMS

2002-08-13 Thread Greg Copeland

On Tue, 2002-08-13 at 12:04, Tom Lane wrote:
 
 On a system where building with large-file support is reasonably
 standard, I agree that PG should be built that way too.  Where it's
 not so standard, I agree with Andrew Sullivan's concerns ...


Agreed.  This is what I originally asked for.

Greg






Re: [HACKERS] OOP real life example (was Re: Why is MySQL more

2002-08-14 Thread Greg Copeland

On Tue, 2002-08-13 at 23:42, Bruce Momjian wrote:
 Curt Sampson wrote:
  On Tue, 13 Aug 2002, Bruce Momjian wrote:
  
   Yea, you have to question what value the discussion has, really.  We
   have users of inheritance that like it.  If we can get a TODO item out
   of the disucssion, great, but there doesn't seem to be any direction of
   where the discussion is heading.
  
  Summary:
  
  1. The current implementation is broken.
  
  2. We have no proper description of how a fixed implementation
  should work.
  
  3. It's hard to fix the current implementation without such a
  description.
  
  4. Thus, we are in other messages here trying to work out the
  model and come up with such a description.
  
  5. The people working this out at the moment appear to be me,
  Greg Copeland and Hannu Krosing.
 
 OK, great summary.  Isn't the bottom-line issue the limitation of not
 being able to create an index that spans tables?  Is there any way to
 implement that?  We have sequences that can span tables.  Can that help
 us?
 

Actually, I'm not sure that is the bottom line.  One of the reasons I
ask so many questions is because I'm trying to understand what the as-is
case is.  For me, that is important before I can understand, not only
what the to-be picture should be, but what needs to be done to get
there.

Because of that, I tend to agree with Curt.  We need to fill in 1, 2,
and 3.  As for item number 4, I was hoping that other references would
at least help us understand a de facto implementation.

Long story short, for me, it's easy to superficially agree that we need
indexes that span tables but I still have no idea if that really
constitutes the bottom-line.

Regards,
Greg Copeland







Re: [HACKERS] Inheritance

2002-08-14 Thread Greg Copeland

On Wed, 2002-08-14 at 08:59, Tom Lane wrote:
 There are a veritable ton of other issues to be resolved --- like how do
 we (efficiently) find all the indexes relevant to a given child table
 --- but the physical storage doesn't seem too complicated.


Tom, seems we have yet another false start.  Thanks for offering your
comments on the topic at hand.  Since you seem to have a good grasp on
what the as-is case is, would you be willing to offer up some additional
details on what you feel the (veritable ton of) outstanding issues
are?

Seems everyone clearly wants a cure and is itching to get there, yet I
don't fully understand the disease.  I suspect that there are others in
the same boat.  I feel that this is important for us all to understand. 
I think we need to understand what our to-be picture is as well as
what points need to be addressed before we can say we've arrived.

Willing to help spell this out?

Regards,
Greg Copeland






Re: [HACKERS] Inheritance

2002-08-14 Thread Greg Copeland

On Tue, 2002-08-13 at 23:43, Curt Sampson wrote:
 Just my opinion of course, but I think it would be best to have a
 detailed description of how everything in inheritance is supposed to
 work, write a set of tests from that, and then fix the implementation to
 conform to the tests.
 
 And I think a detailed description comes most easily when you have
 a logical model to work from.

I completely agree.  This is why I want/wanted to pursue the theory and
existing implementations angle.

Seems like everyone trying to jump on index spanning is premature.

Doesn't Oracle have table inheritance?  Hmmm...I might have to go do
some reading to find out one way or another...  ;)

Sign,
Greg Copeland





Re: [HACKERS] Inheritance

2002-08-14 Thread Greg Copeland

On Wed, 2002-08-14 at 10:17, Ross J. Reedstrom wrote:
 On Wed, Aug 14, 2002 at 09:39:06AM -0500, Greg Copeland wrote:
  I completely agree.  This is why I want/wanted to pursue the theory and
  existing implementations angle.
 
 In theory, it sounds like a good idea. In practice ... ;-)
 

LOL.  :)

  Seems like everyone trying to jump on index spanning is premature.
 
 Seems like some people haven't looked at the history of the OO
 implementation in PostgreSQL.

[waving hand...]

 
 Bruce has archived some of the emails - check your local pgsql source tree,
 under $PGSQLHOME/doc/TODO.detail/inheritance
 
 There was also some theoretical OO discussion, back when the change for
 default SELECT behavior on an inhertiance tree was made. (You used to
 have to say: SELECT foo from parent* to get foo from the parent and all
 children) Take a look at the archives and see if there's anything in that
 discussion that interests you: providing summary posts of old discussions
 is often a good way to restart and move an unresolved topic along.

Thanks!  I briefly read something about that in the archives.  Excellent
pointers.  I'll check that out.  If I have time, I'll try to summarize
and post.


Greg Copeland






[HACKERS] SF moving to DB2...

2002-08-14 Thread Greg Copeland

I know this is a bit off topic.  I found this in my mailbox not long ago.
I'm sharing because I thought it might be of some interest.  While it's
obviously a PR move by IBM, it certainly was nice to have something of
scale like SF to tout in Postgres' favor as a success story.


Here's a snippet from what I got:
Today we have announced that we are moving SourceForge.net to DB2,
a powerful relational database by IBM.  We are doing this because
the site continues to grow at a rapid rate, with 700 new users and
70 new projects a day, and we need a database that can handle this
growth.  We feel that DB2 can do this for us, and IBM is giving us
the resources to make this transition successful.  You can read the
press release here:

http://www.vasoftware.com/news/press.php/2002/1070.html


Sign,
Greg Copeland






[HACKERS] Standard replication interface?

2002-08-14 Thread Greg Copeland

Reading about the pgmonitor thread and mention of gborg made me wonder
about replication and the ready ability to uniformly monitor it.  Just as
pg_stat* tables exist to allow for statistic gathering and monitoring in
a uniform fashion, it occurred to me that a predefined set of views
and/or tables for all replication implementations may be worthwhile. 
That way, no matter what replication method/tool is being used, as long
as it conforms to the defined replication interfaces, generic monitoring
tools can be used to keep an eye on things.

Think this has any merit?

Greg Copeland










Re: [HACKERS] python patch

2002-08-14 Thread Greg Copeland

Well, I tend to agree with that.  Overall, I can't say that I see bad
things coming out of accepting the patch as is.  It's not exactly
causing an extra join or otherwise a significant waste of resources. 
At worst, it appears to be ambiguous.  Since Christopher has not offered
any additional follow-up, can we assume that he agrees?  If not, please
let me know and I'll resubmit patch #2.

In the mean time, patches #1 and #3 should be good to go.  Bruce, feel
free to apply those whenever time allows.

Thanks,
Greg Copeland


On Mon, 2002-08-12 at 18:33, Rod Taylor wrote:
 All of that said, the cost of the check is so small it may save someones
 ass some day when they have a corrupted catalog and the below
 assumptions are no longer true.
 
 On Mon, 2002-08-12 at 18:40, Greg Copeland wrote:
  On Sun, 2002-08-11 at 21:15, Christopher Kings-Lynne wrote:
Not a problem.  I would rather them be correct.
   
Worth noting that the first patch is what attempts to fix the long -
int overflow issue.  The second patch attempts to resolve attisdropped
column use issues with the python scripts.  The third patch addresses
issues generated by the implicate to explicate use of cascade.
   
I assume your reservations are only with the second patch and not the
first and third patches?
   
   Correct.  I'm pretty sure you don't need to exclude attisdropped from the
   primary key list because all it's doing is finding the column that a primary
   key is over and that should never be over a dropped column.  I can't
   remember what you said the second query did?
  
  
  Hmmm.  Sounds okay but I'm just not sure that holds true (as I
  previously stated, I'm ignorant on the topic).  Obviously I'll defer to
  you on this.
  
  Here's the queries and what they do:
  
  
  From pg.py:
  Used to locate primary keys -- or so the comment says.  It does create a
  dictionary of keys and attribute values for each returned row so I
  assume it really is attempting to do something of the like.
  
  SELECT pg_class.relname, pg_attribute.attname 
  FROM pg_class, pg_attribute, pg_index 
  WHERE pg_class.oid = pg_attribute.attrelid AND 
  pg_class.oid = pg_index.indrelid AND 
  pg_index.indkey[0] = pg_attribute.attnum AND 
  pg_index.indisprimary = 't' AND 
  pg_attribute.attisdropped = 'f' ;
  
  So, everyone is in agreement that any attribute which is indexed as a
  primary key will never be able to have attisdtopped = 't'?
  
  According to the code:
  SELECT pg_attribute.attname, pg_type.typname
  FROM pg_class, pg_attribute, pg_type
  WHERE pg_class.relname = '%s' AND
  pg_attribute.attnum  0 AND
  pg_attribute.attrelid = pg_class.oid AND
  pg_attribute.atttypid = pg_type.oid AND
  pg_attribute.attisdropped = 'f' ;
  
  is used to obtain all attributes (column names) and their types for a
  given table ('%s').  It then attempts to build a column/type cache.  I'm
  assuming that this really does need to be there.  Please correct
  accordingly.
  
  
  From syscat.py:
  SELECT bc.relname AS class_name,
  ic.relname AS index_name, a.attname
  FROM pg_class bc, pg_class ic, pg_index i, pg_attribute a
  WHERE i.indrelid = bc.oid AND i.indexrelid = bc.oid
  AND i.indkey[0] = a.attnum AND a.attrelid = bc.oid
  AND i.indproc = '0'::oid AND a.attisdropped = 'f'
  ORDER BY class_name, index_name, attname ;
  
  According to the nearby documentation, it's supposed to be fetching a
  list of all simple indicies.  If that's the case, is it safe to assume
  that any indexed column will never have attisdropped = 't'?  If so, we
  can remove that check from the file as well.  Worth pointing out, this
  is from syscat.py, which is sample source and not used as actual
  interface.  So, worse case, it would appear to be redundant in nature
  with no harm done.
  
  This should conclude the patched items offered in the second patch.
  
  What ya think?
  
  Thanks,
  Greg
  
  
 
 
 
 ---(end of broadcast)---
 TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]






Re: [HACKERS] python patch

2002-08-14 Thread Greg Copeland

Thanks.

-Greg


On Wed, 2002-08-14 at 22:34, Bruce Momjian wrote:
 
 OK, I have applied all three of Greg's python patches.
 
 ---
 
 Christopher Kings-Lynne wrote:
  Yep - alright, just commit it I guess.
  
  Chris
  
   -Original Message-
   From: Greg Copeland [mailto:[EMAIL PROTECTED]]
   Sent: Thursday, 15 August 2002 11:09 AM
   To: Rod Taylor
   Cc: Christopher Kings-Lynne; Bruce Momjian; PostgresSQL Hackers Mailing
   List
   Subject: Re: [HACKERS] python patch
  
  
   Well, I tend to agree with that.  Overall, I can't say that I see bad
   things coming out of accepting the patch as is.  It's not exactly
   causing an extra join or other wise a significant waste of resources.
   At worst, it appears to be ambiguous.  Since Christopher has not offered
   any additional follow up, can we assume that he agrees?  In not, please
   let me know and I'll resubmit patch #2.
  
   In the mean time, patches #1 and #3 should be good to go.  Bruce, feel
   free to apply those whenever time allows.
  
   Thanks,
 Greg Copeland
  
  
   On Mon, 2002-08-12 at 18:33, Rod Taylor wrote:
All of that said, the cost of the check is so small it may save someones
ass some day when they have a corrupted catalog and the below
assumptions are no longer true.
   
On Mon, 2002-08-12 at 18:40, Greg Copeland wrote:
 On Sun, 2002-08-11 at 21:15, Christopher Kings-Lynne wrote:
   Not a problem.  I would rather them be correct.
  
   Worth noting that the first patch is what attempts to fix
   the long -
   int overflow issue.  The second patch attempts to resolve
   attisdropped
   column use issues with the python scripts.  The third
   patch addresses
   issues generated by the implicate to explicate use of cascade.
  
   I assume your reservations are only with the second patch
   and not the
   first and third patches?
 
  Correct.  I'm pretty sure you don't need to exclude
   attisdropped from the
  primary key list because all it's doing is finding the
   column that a primary
  key is over and that should never be over a dropped column.  I can't
  remember what you said the second query did?


 Hmmm.  Sounds okay but I'm just not sure that holds true (as I
 previously stated, I'm ignorant on the topic).  Obviously
   I'll defer to
 you on this.

 Here's the queries and what they do:


 From pg.py:
 Used to locate primary keys -- or so the comment says.  It
   does create a
 dictionary of keys and attribute values for each returned row so I
 assume it really is attempting to do something of the like.

 SELECT pg_class.relname, pg_attribute.attname
 FROM pg_class, pg_attribute, pg_index
 WHERE pg_class.oid = pg_attribute.attrelid AND
   pg_class.oid = pg_index.indrelid AND
   pg_index.indkey[0] = pg_attribute.attnum AND
   pg_index.indisprimary = 't' AND
   pg_attribute.attisdropped = 'f' ;

 So, everyone is in agreement that any attribute which is indexed as a
 primary key will never be able to have attisdtopped = 't'?

 According to the code:
 SELECT pg_attribute.attname, pg_type.typname
 FROM pg_class, pg_attribute, pg_type
 WHERE pg_class.relname = '%s' AND
   pg_attribute.attnum  0 AND
   pg_attribute.attrelid = pg_class.oid AND
   pg_attribute.atttypid = pg_type.oid AND
   pg_attribute.attisdropped = 'f' ;

 is used to obtain all attributes (column names) and their types for a
 given table ('%s').  It then attempts to build a column/type
   cache.  I'm
 assuming that this really does need to be there.  Please correct
 accordingly.


 From syscat.py:
 SELECT bc.relname AS class_name,
   ic.relname AS index_name, a.attname
 FROM pg_class bc, pg_class ic, pg_index i, pg_attribute a
 WHERE i.indrelid = bc.oid AND i.indexrelid = bc.oid
   AND i.indkey[0] = a.attnum AND a.attrelid = bc.oid
   AND i.indproc = '0'::oid AND a.attisdropped = 'f'
   ORDER BY class_name, index_name, attname ;

 According to the nearby documentation, it's supposed to be fetching a
 list of all simple indicies.  If that's the case, is it
   safe to assume
 that any indexed column will never have attisdropped = 't'?  If so, we
 can remove that check from the file as well.  Worth pointing out, this
 is from syscat.py, which is sample source and not used as actual
 interface.  So, worse case, it would appear to be redundant in nature
 with no harm done.

 This should conclude the patched items offered in the second patch.

 What ya think?

 Thanks,
   Greg


   
   
   
---(end of broadcast)---
TIP 1: subscribe and unsubscribe

Re: [HACKERS] Standard replication interface?

2002-08-15 Thread Greg Copeland

Well, that's a different issue.  ;)

I initially wanted to get feedback to see if anyone else thought the
concept might hold some merit.

I take it from your answer you think it might...but are scratching your
head wondering exactly what it entails...

Greg


On Wed, 2002-08-14 at 22:47, Tom Lane wrote:
 Greg Copeland [EMAIL PROTECTED] writes:
  ... it occurred to me that a predefined set of views
  and/or tables for all replication implementations may be worthwhile.
 
 Do we understand replication well enough to define such a set of views?
 I sure don't ...
 
   regards, tom lane






Re: [HACKERS] Standard replication interface?

2002-08-15 Thread Greg Copeland

On Thu, 2002-08-15 at 09:47, Andrew Sullivan wrote:
 On Wed, Aug 14, 2002 at 10:15:32PM -0500, Greg Copeland wrote:
  That way, no matter what replication method/tool is being used, as long
  as it conforms to the defined replication interfaces, generic monitoring
  tools can be used to keep an eye on things.
 
 That sounds like the cart is before the horse.  You need to know what
 sort of replication scheme you might ever have before you could
 know the statistics that you might want to know.

Hmmm.  I've never heard an inquiry gauging interest in a concept described as putting
the cart before the horse.  Considering this is pretty much how things
get developed in the real world, I'm not sure what you feel is so
special about replication.

First step is always identify the need.  I'm attempting to do so.  Not
sure what you'd consider the first step to be but I can assure you,
regardless of this concept seeing the light of day, it is the first
step.  The horse is correctly positioned in front of the cart.

I also stress that I'm talking about a statistical replication
interface.  It occurred to me that you might have been confused on this
matter.  That is, a set of tables and views will allow for the
replication process to be uniformly *monitored*.  I am not talking about
a set of interfaces which all manner of replication must perform its job
through (interfacing with databases for replication).

 
 There are different sorts of replication schemes under consideration. 

Yep.  Thus it would seemingly be ideal to have a specification which
different implementations would seek to implement.  Off the top of my
head and for starters, a table and/or view which can be queried and
returns the tables that are being replicated sounds good to me.  Same
thing for the list of databases, the servers involved, and their
associated role (master, slave, peer).
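
To make that concrete -- and purely as illustration, since the view
names below are invented -- a generic monitoring tool could then be as
dumb as:

def replication_summary(conn):
    """Summarize replication state via the (hypothetical) standard views."""
    cur = conn.cursor()
    cur.execute("SELECT relname FROM pg_replicated_tables")
    tables = [row[0] for row in cur.fetchall()]
    cur.execute("SELECT node_name, node_role FROM pg_replication_nodes")
    nodes = cur.fetchall()   # e.g. [('db1', 'master'), ('db2', 'slave')]
    return tables, nodes

The point being, the tool above wouldn't care which replication
implementation is underneath, only that the views exist.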

Without such a concept, there will be no standardized way to monitor
your replication.  As such, chances are one of two things will happen. 
One, a single replication method will be championed and fair tools will
develop to support it, while all others are treated as bastards.  Two, quality tools to
monitor replication will never materialize because each method for
monitoring is specific to the different types of implementations. 
Resources will constantly be spread amongst a variety of well-meaning
projects.


--Greg







Re: [HACKERS] Standard replication interface?

2002-08-15 Thread Greg Copeland

On Thu, 2002-08-15 at 09:53, Neil Conway wrote:
 That's exactly what I was going to say -- I'd prefer that any
 interested parties concentrate on producing a *really good*
 replication implementation, which might eventually be integrated into
 PostgreSQL itself.
 
 Producing a generic API for something that really doesn't need
 genericity sounds like a waste of time, IMHO.
 
 Cheers,
 
 Neil


Somehow I get the impression that I've been completely misunderstood. 
Somehow, people seem to have only read the subject and skipped the body
explaining the concept.

In what way would providing a generic interface to *monitor* be a waste
of time?  In what way would that prevent someone from producing a
*really good* replication implementation?  I utterly fail to see the
connection.

Regards,
Greg Copeland






Re: [HACKERS] Standard replication interface?

2002-08-15 Thread Greg Copeland

 As I said -- I don't really see the need for a bunch of replication
 implementations, and therefore I don't see the need for a generic API
 to make the whole mess (slightly) more manageable.

I see.  So the intention of the core developers is to have one and only
one replication solution?

Greg








Re: [HACKERS] Standard replication interface?

2002-08-15 Thread Greg Copeland

On Thu, 2002-08-15 at 13:18, Neil Conway wrote:
 That said, I _personally_ don't see the need for more than one or two
 replication implementations. You might need more than one if you
 wanted to do both lazy and eager replication, for example. But you
 certainly don't need 5 or 6 or however many implementations exist at
 the moment.

Fair enough.  Thank you for offering a complete explanation.

Your argument certainly made sense.  I wasn't aware of any single
serious effort underway which sought to finally put replication to bed,
let alone be integrated into the core code base.

Sign,

Greg Copeland





Re: [HACKERS] Bug/Change in behavior for 7.3 vs 7.2.1

2002-08-15 Thread Greg Copeland

Dang it...meant to mention that the other day when I was working on
those python patches.  I had to place tick marks (single quote) around
the number and it was converted correctly.

gcope=# insert into a values ( 99 ) ;
ERROR:  column a is of type 'bigint' but expression is of type 'double
precision'
You will need to rewrite or cast the expression
gcope=# insert into a values ( '99' ) ;
INSERT 25115 1
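
From a Python script the workaround looks like this (table and column
names are just for illustration) -- either quote the literal or cast it,
per the hint in the error message:

def insert_big_value(conn, value):
    cur = conn.cursor()
    # Quoted literal: resolved against the bigint column, so it works on 7.3.
    cur.execute("INSERT INTO test (cola) VALUES ('%d')" % value)
    # An explicit cast works as well.
    cur.execute("INSERT INTO test (cola) VALUES (%d::int8)" % value)
    conn.commit()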

Greg


On Thu, 2002-08-15 at 20:03, Barry Lind wrote:
 I was just testing my product running on a 7.3 snapshot from a few days 
 ago.  And I ran into the following change in behavior that I consider a 
 bug.  You can no long insert large values into a bigint column without a 
 cast.  Small values (in the int range work fine though).
 
 On 7.3 I get:
 
 files=# create table test (cola bigint);
 CREATE
 files=# insert into test values (99);
 ERROR:  column cola is of type 'bigint' but expression is of type 
 'double precision'
 You will need to rewrite or cast the expression
 
 On 7.2.1 this works correctly:
 
 files=# create table test (cola bigint);
 CREATE
 files=# insert into test values (99);
 INSERT 108683 1
 
 thanks,
 --Barry
 
 
 
 ---(end of broadcast)---
 TIP 2: you can get off all lists at once with the unregister command
 (send unregister YourEmailAddressHere to [EMAIL PROTECTED])






Re: [HACKERS] Standard replication interface?

2002-08-16 Thread Greg Copeland

On Thu, 2002-08-15 at 15:36, Tom Lane wrote:
 Well, I am, but I'm only speaking for myself here:
 

Fair enough.

 I think there is room for several replication solutions for Postgres
 (three or four, maybe).

If the ideal solution count is merely one, with a maybe on two, then I
tend to concur that any specification along these lines would *mostly*
be a waste.  On the other hand, if we can count three or more possible
replication solutions, IMHO, there seemingly would be merit in providing
some sort of de facto monitoring interface.

Seems the current difficulty is forecasting the future in this regard. 
Perhaps other core developers would care to chime in and share their
vision?

 CVS tree.  So assuming that the Postgres-R project gets to the point
 of usefulness, I'd vote in favor of integrating it.  On the other hand,

I guess I should ask.  Do the developers foresee immediate usability
from this project or are we looking at something that's a year+ away?  I
don't think I have a problem helping guide what could be an interim
solution if the interim window were large enough.  In theory, monitoring
tools developed between now and the closing of the window could largely
continue to function without change.  That, of course, assumes that even
the end-run solutions would implement the interface as well.

The return on such a concept is that it allows generic monitoring tools
to mature while providing value now and in the future.  The end result
should be a stronger, more powerful tool base which matures while other
technologies are still being developed.

Another question along this line is, once something rolls into a core
position, does that obsolete all other existing implementations or
merely become the de facto standard in a bag of solutions?  Tom seems to hint at
the latter.  If the answer is the former, then that seemingly argues not
to worry about this...unless the window for usefulness and/or inclusion
is rather large.

 As for the point at hand: I'm fairly dubious that a common monitoring
 API will be very useful, considering how different the possible

Well, all replication scenarios have a lot in common.  They should; 
after all, they are all doing the same thing.  Since the different
strategies for accomplishing replication are well understood, it seems
well within reason to assume that someone can wrap their brain around
this.

I can also imagine that the specification includes requirements as well
as optional facilities.  Certainly capability queries would further iron
out any gaps between differing solutions/implementations.

 replication approaches are.  If Greg can prove me wrong, fine.  But
 I don't want to see us artificially constraining replication solutions
 by insisting that they meet some prespecified API.

Hmmm.  I'm not sure how it would act as a constraining force.  To me,
this implies that any such specification would fail to evolve and could
not be revised based on feedback.  IMO, most specifications are regarded
as living documents.  While I can see that some specifications are set
in stone, I certainly am not so bold as to assert my crystal ball even
came with batteries.  ;)  That is, I assume some level of revision to an
initial specification would be required following real-world use.


Regards,

Greg Copeland 







Re: [HACKERS] Inheritance

2002-08-19 Thread Greg Copeland

On Mon, 2002-08-19 at 09:42, Curt Sampson wrote:
 On Mon, 19 Aug 2002, Zeugswetter Andreas SB SD wrote:
 
   So what you're saying is that constraints shouldn't be inherited?
 
  No. I even said that inheriting should be the default.
 
 Ah. So you think it should be possible not to inherit constraints.

I've been silent for a bit because I wanted to kick the concept around
in my head.  After some thought, I say that I support children
inheriting constraints.  In a more abstract sense, we are really setting
conditions for all entities of a given type (class) which must be met to
classify as a defined type.  Again, in an abstract sense, if I say all
candies (type/class, candy) must have sugar (constraint), and I go on
to create a subclass of candy which I desire not to have sugar, then
there is a fundamental problem.  Either I incorrectly identified my
problem domain and didn't properly create my entities which address my
domain needs, or what I'm trying to express really isn't a candy at all. 
In other words, it sounds like candy should have been a subclass of a more
abstract base entity.  Likewise, the newly desired class which doesn't
have sugar should also inherit from the newly created base class and not
be derived from candy at all.
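
A loose Python analogy (hypothetical classes, obviously not SQL) of the
restructuring I'm describing:

class Confection:                  # new abstract base: no "must have sugar" rule
    def __init__(self, name):
        self.name = name

class Candy(Confection):           # the sugar constraint lives here...
    def __init__(self, name, grams_sugar):
        assert grams_sugar > 0, "candy must have sugar"
        Confection.__init__(self, name)
        self.grams_sugar = grams_sugar

class SugarFreeTreat(Confection):  # ...so this derives from the base, not Candy
    pass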


 
  A local constraint should be made obvious from looking at the schema,
 
 Ok, this now I could live with. Though I'm not sure that its
 theoretically very defensible, or worth the effort. Other languages
 that offer constraints, such as Eiffel (and soon Java), do not allow
 constraints that are not inherited, as far as I know. Do you have some
 counterexamples.

I tend to agree.  Constraints should be inherited.  See above.

 
  Well, that is where I do not think this is flexible enough, and keep in mind
  that all triggers and rules would then also need such restrictions.
 
 Yes, all triggers, rules, and everything else would have to be inherited.

Agreed.

 
  Regardless whether your objection is *strong* or not :-)
  If you don't like the feature (to add a local constraint), don't use it.
  (Remember you are talking about removing an implemented feature)
 
 1. It's not exactly an implemented feature, it's an accident of an
 incomplete implementation of inheritance done in a certain way.
 
 2. Should we change the way we decide to implement inheritance,
 perhaps to make fixing the current problems much easier, it might
 be a lot of work to add this.
 

I'm still trying to figure out if subclasses should be allowed to have
localized constraints.  I tend to think yes even though it's certainly
possible to create seemingly illogical/incompatible/conflicting
constraints with parent classes.  Then again, my gut feeling is, that's
more of an architectural/design issue than a fundamental issue
with the concept.


--Greg Copeland







Re: [HACKERS] @(#) Mordred Labs advisory 0x0001: Buffer overflow in

2002-08-20 Thread Greg Copeland

On Tue, 2002-08-20 at 00:35, Dann Corbit wrote:
 
 Most computer virus problems are caused by buffer overrun.  Someone
 decided it wasn't very important.

This is true.  IMO, it is extremely arrogant to ignore a buffer overrun
and announce it can't be exploited.  There are many cases where this
assertion has been proved to be false.  The real risk of an overrun is not
what you think can be done with it (or the lack thereof); rather, it's how
someone else might decide to use it in some obscure manner which you
cannot currently fathom or foresee.

 Will you trust your multi-million dollar database to someone who says
 the above?  I think the priorities are upside down.  Any *known*
 buffer-overrun _must_ be repaired, and as quickly as possible.  And

I agree with that too.  When setting priorities, using a scale of 1-10,
10 being highest priority, anything that affects stability or
reliability should always be a 10.  As such, those issues should always be
repaired even above new whiz-bang features.

IMO, if you don't embrace that rule of thumb, a developer shouldn't be
working on a project where stability and reliability are key components
of the end product.

 potential overruns should be identified.  A grep for memcpy, strcpy,
 gets, etc. should hunt down most of them.  A known buffer overrun should
 fill the designer of a product with abject terror.  And I really mean

Agreed.  It is horribly irresponsible to thumb a nose at such things in
this day and age.
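
For what it's worth, a throwaway scan along the lines of the grep
mentioned above is trivial to knock together (hypothetical helper, not
part of any existing tooling):

import os
import re
import sys

RISKY_CALL = re.compile(r'\b(strcpy|strcat|sprintf|gets|memcpy)\s*\(')

def scan_tree(root):
    """Print every line in *.c/*.h files that calls one of the usual suspects."""
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.endswith((".c", ".h")):
                continue
            path = os.path.join(dirpath, name)
            for lineno, line in enumerate(open(path, errors="ignore"), 1):
                if RISKY_CALL.search(line):
                    print("%s:%d: %s" % (path, lineno, line.rstrip()))

if __name__ == "__main__":
    scan_tree(sys.argv[1] if len(sys.argv) > 1 else ".")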

 that, literally.  If you *know* of a buffer overrun, and simply decide
 not to fix it, that sounds very negligent to me.  For a public project
 like PostgreSQL, there is probably very little liability for the
 programmers, but I am thinking of the damage that can be inflicted upon
 potential clients using the database.
 

Not a question of it sounding negligent.  It is negligent.

If quality and stability are not the core developers' #1 goal, then
expecting people to trust the resulting product is laughable.  Please
tell me why anyone should use a database to maintain important data when
quality and stability are not important to the developers of such a
product.  It's an oxymoron.

Time and time again I've read what basically amounts to, "...if someone
can crash it, it's because someone is stupid enough to allow someone to
be able to do it in the first place..."  Maybe you're right.  After all,
if core developers continue to turn a blind eye to such issues, they are,
in fact, the facilitators of allowing people to do it to begin with. 
That is, they are the ones allowing people to do it in the first
place.  In short, every time I see that response, I immediately think,
"...now that's the pot calling the kettle black."

At some point in time, you have to stand and say, the buck stops here.

Now then, after that long rant and rave, since these are known issues, I
don't have a problem with the next release going out as planned.  Once
it does go out, I would certainly hope that the developers would
readjust their priorities and target a bug fix release to immediately
follow.

You know, I've seen many people trash Oracle's "unbreakable" mantra. 
I'm sure no one is confused by the fact that it is nothing but a
marketing ploy; however, perhaps there is a culture behind it.  Perhaps
this is their way of saying stability and reliability are very important
to them.  Perhaps their mantra is the rule of thumb outlined above.

Sign,

Greg Copeland








Re: [HACKERS] @(#) Mordred Labs advisory 0x0001: Buffer overflow in

2002-08-20 Thread Greg Copeland

On Tue, 2002-08-20 at 08:05, Jan Wieck wrote:
 If you say your users need direct DB access on the SQL interface level,
 I say trash your application because the data it produces isn't worth
 the magnetism on the media. It's not that we ugently need to fix such a
 bug that can only cause a DOS in case someone finds a way to hack
 through the application into the SQL interface. It's that the
 application has to be fixed not to allow that, because even if the
 server wouldn't be crashable, such a hack would end in a disaster. 
 

The problem is, the vast majority of these issues are actually caused by
internal resources (people) rather than external attacks.

The real fear is internal resources DoSing internal resources.  The
reaction on the list is usually to look outward for sources of possible
problems.  That in itself is a problem.  It's well documented that the vast
majority (what, 70+%) of these issues actually originate internally.

It is because of these issues that it is often no longer an issue of
application this or that, as the application may have been completely
bypassed.

And yes, you can argue all day long that if this happens, you should be
fearing destruction of your data.  While that's true, data can be
restored.  It also requires a different personality type.  Many people
would be willing to DoS a system, however, that doesn't have to mean
they are willing to destroy data.


Regards,

Greg Copeland







Re: [HACKERS] @(#) Mordred Labs advisory 0x0001: Buffer overflow in

2002-08-21 Thread Greg Copeland

On Tue, 2002-08-20 at 18:40, Mark Pritchard wrote:
 On Tue, 20 Aug 2002 23:48, Greg Copeland wrote:
  On Tue, 2002-08-20 at 00:35, Dann Corbit wrote:
   Most computer virus problems are caused by buffer overrun.  Someone
   decided it wasn't very important.
 
  This is true.  IMO, it is extremely arrogant to ignore a buffer overrun
  and announce it can't be exploited.  There is many cases where this
  assertion has been proved to be false.  The real risk of overrun is not
 
 I would certainly hope you are not misquoting me here. I have never stated we 
 should ignore the bug. I suggested that with proper network level protection, 
 the bug should not be exploitable.

Don't think I quoted you at all.  I don't see quotation marks and am
including Dan's remark.  Perhaps everyone isn't talking about you after
all.  Continue with the green pills.  ;)

 do disagree with is the panic mentality that seems so evident in this type 
 of post.

Hmmm.  Interesting.  I didn't see panic or anything alarmist in my post.

 
 
  IMO, if you don't embrace that rule of thumb, a developer shouldn't be
  working on a project where stability and reliability are key components
  of the end product.
 
 I wasn't aware that PostgreSQL as an open source collaborative project had any 
 specific requirements of upon it at all. While stability and reliability are 
 obvious goals, if you feel the project does not meet your needs you are more 
 than welcome to try one of the alternatives. MySQL for example :)

In other words, if someone is verbal about development goals or
objectives, they should go elsewhere.

As for specific requirements, I've said nothing which is not inherently
obvious.  Without a stable and reliable product, no one is going to use
it for serious work.  As such, it's certainly not a leap of faith to
assume that developers without such ideals in mind should be shunned or
properly guided.  Why?  Obviously (amazed I have to spell this out),
said developer efforts would be counter to the project's objectives
(stated or otherwise implicitly obvious).

You also seem to imply that open source collaborative project[s]
should not aim for high goals.  I think it's safe to say, we disagree.

 I'm not going to restate my previous response to this view, but I will ask, 
 who is affected by the horribly irresponsible approach? If it is you, then 
 why hasn't your due dilligence process audited the code? Perhaps you would 
 feel more comfortable with one of the closed source/closed development model 
 organisations? But what bugs lie within the depths of DBs such as SQL Server? 
 How will you audit those? Or will you just trust the sales guy?

Hmm.  Turn a blind eye and pretend it doesn't exist.  Not a good response. 
Since this thread came out of someone's diligence (coders or
otherwise), I fail to see the significance of your comments.  After all,
mucho peer review is supposed to be one of the perks of using OSS
software.  Considering I never asserted I'm performing a code audit,
this seems completely unrelated to the topic at hand.

 Where is an expectation at all? If you want to use PostgreSQL, then use it. If 
 it doesn't meet your needs, don't use it, or start fixing the issues 
 yourself. If you can't do it, buy Oracle, or DB2, or whatever else.

In other words, if someone is verbal about development goals, they should
go elsewhere.  Furthermore, you seem to assert that only the core
developers should be using this database.  Wow, I'm amazed.  Just
because someone does or does not contribute code doesn't suddenly
invalidate observations.  That, of course, doesn't mean the observations
have to be valid.

Simply stated: if it meets my needs, shut up and use it.  If it doesn't,
go elsewhere.  Bad attitude.

 I'm not sure how you make the jump from knowing that an issue exists and 
 allowing people to exploit it, to the inference that core developers are 
 turning a blind eye to it. Forgive me if I misquote you Tom, but I don't 
 think he has ever said we should not fix this bug, simply that the effort 
 is significant, and there are other factors to consider.

Because it was stated that these are known issues.  Because it was
stated these have been known issues for a very long time.  Because it
was stated that, more or less, no one is excited about doing the large amount of
effort which is seemingly required to put some of these issues to bed.

The expression "turning a blind eye" means that something is being
ignored.  It means it won't be focused on for now.  That does not have
to mean forever.  The statement is valid and supported by pretty much
every posting on this list on the topic at hand.  I stand by my
statement.

Stop with the misquoting comments already.  The quotes I used were
properly attributed.   Believe it or not, everyone is not talking about
you or trying to place words in your mouth.  Believe it or not, the
quotes are correct.  I have no idea what you're talking about.

 Perhaps we need a pgsql-hackers

Re: [HACKERS] @(#) Mordred Labs advisory 0x0001: Buffer overflow in

2002-08-21 Thread Greg Copeland

On Tue, 2002-08-20 at 19:59, [EMAIL PROTECTED] wrote:
 [EMAIL PROTECTED] (Greg Copeland) wrote
  At some point in time, you have to stand and say, the buck stops here.
  
 
 I agree here, but at the same time you cannot put 100% of the 
 responsibility on the developers.  If you are the dba/sysadmin/whatever/etc 
 then it is your responsibility.  It is up to you to know about potential 
 problems and have workarounds or whatever it is you need to do.  I think 
 that is one of the things that seperates a good admin from a not-so-
 good one.

I absolutely agree.  I was not trying to say that bugs don't happen and
that sometimes those bugs may cause crashes.  What I was trying to say
is what amounts to, "dr., when I move my arm like this, it hurts", and
the response is, "don't do that".  Humor aside, surely there has to be a
happy medium in between.  Perhaps, with a skewing toward fixing rather
than prescribing.  ;)

 
 Afterall, when your boss calls you into his office monday morning and asks 
 for a really good explanation for why the db was cracked, I dont think he 
 is going to accept the excuse this guy, you dont know him but his name is 

I understand and agree with ya.

 
 That being said, I do agree the developers should give things like this 
 more priority.  But, its open source...  so you either live with it or 
 write your own patch.
 

Well, the priority portion was what I was shooting for.  Perhaps it came
off as overzealous.  I'm not really sure.  I re-read it and I didn't
think so.  But, I'm not you and you're not me...so, it's hard to say how
exactly it was received.

As for the open source comment, that's fine and all...but...there are
companies which are paying for postgres' development too.  Some of the
developers are being paid to do this.  The "write your own patch" argument
has much more meaning on simple projects.  For a project as complex as
postgres, simply asking for a patch is almost meaningless.  Along those
lines, I have been reading (code and the list) and learning for some time
now, as time allows.  One day, I will contribute significant patches. 
However, until that day comes, I would hope that observational
commentary is not simply invalidated just because the commenter is not
yet one with the code.


Regards,

Greg Copeland





signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] Inheritance

2002-09-05 Thread Greg Copeland

On Thu, 2002-09-05 at 08:15, Hannu Krosing wrote:
 On Thu, 2002-09-05 at 11:34, Curt Sampson wrote:
  On 5 Sep 2002, Hannu Krosing wrote:

  If that
  trigger is acting as an integrety constraint on the base table,
  you might destroy the table's integrity.
 
 What I try to say is that you should have the same freedom with triggers
 that you have with select/insert/update/delete - you must be able to
 choose if the trigger is on the parent table ONLY or on parent and all
 children. 

Sounds like a mechanism to make the distinction between virtual (child
can override parent) and non-virtual (child is constrained by the
parent) constraints is needed.

After all, there are two basic needs for constraints.  One is relational
integrity and the other is business rule integrity.  That is, one seeks
to ensure that the database makes sense with respect to the data model
(a shoe is a product) while the other enforces business rules (products
are never free).  Seems like the DBA should be able to dictate which
domain his constraint falls into in some manner.

 
 And you should be able to override a trigger for child table even if it
 is defined on parent as applying to all children - I guess that
 overriding by trigger _name_ would be what most people expect.
 

That's the reason I used virtual and non-virtual above.  If we think
in C++ idioms, the child is stuck with it if it's deemed
non-virtual.  Generally speaking, if someone designed something with
that expectation in mind, there's probably a good reason for it.  In
this case, we could assume that such non-virtual constraints would be to
help ensure proper RI, something that otherwise, IMO, would be tossed
out with the bath water.

 What I'm after here is dynamic (and automatic) row level dispach of the
 right function based on row type - so that for rows in CITIZEN or
 CIVIL_SERVANT the function kills_not_others(CITIZEN) will be used but
 for rows in MILITAY the kills_not_others(MILITARY) is used.

I think we're touching on some form of RTTI information here.  That is,
triggers and even functions may need to be able to dynamically determine
the row type that is actively being worked on.

If we're on the same page, I think that seemingly makes a lot of sense.

What about the concept of columns being public or private?  That is,
certain columns may not be inherited by a child?  Any thought to such a
concept?  Perhaps different types of table inheritance can be considered
in our model...has-a, is-a, etc...


Regards,

Greg Copeland




signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] Inheritance

2002-09-06 Thread Greg Copeland

On Thu, 2002-09-05 at 15:51, Hannu Krosing wrote:
 On Fri, 2002-09-06 at 03:19, Greg Copeland wrote:
  
  What about the concept of columns being public or private?  That is,
  certain columns may not be inherited by a child?  Any thought to such a
  concept?  Perhaps different types of table inheritance can be considered
  in our model...has-a, is-a, etc...
 
 I can't fit this in my mental model of table inheritance for two reasons
 
 1) all parent table columns must be present in child

Okay, I must admit, I'm not really sure why.  If we look at it in a
physical versus logical manner, even if it's physically there, why must
it be logically exposed?  Can you help me understand why it would even
need to physically be there?  After all, if a child can't update it,
it doesn't need to see it.

 
 2) granting some right to parent should automatically allow selecting
 from children

Unless the parent deemed it inappropriate access (private)?

If a column were deemed private, that would have a couple of
stipulations on it.  That is, it would have to ensure that NOT NULL
was not one of its constraints, or, if it was, ensure that a default
value was also provided.

 
 both are required for select/insert/update/delete to work on table and
 its children (i.e. without ONLY)
 
 
 But maybe i just need to think more about it ;)
 

Well, I guess I'm lagging behind you on this matter.  Perhaps holding
my hand and explaining it a bit will allow you to work through it some
more and help bring me in line with what you're thinking.

Greg




signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] Inheritance

2002-09-06 Thread Greg Copeland

On Fri, 2002-09-06 at 07:53, Hannu Krosing wrote:
 On Fri, 2002-09-06 at 09:53, Curt Sampson wrote:
  This looks like a classic case of incorrect modelling to me. Does the
  good itself change when it becomes a campaign_good? No. The price
  changes, but that's obviously not an integral part of the good itself.
 
 Perhaps we mean different things by good. I meant a GOOD to be a THING 
 bought with the purpose of reselling. Price (actually prices: 
 selling_price and buying_price) is what makes it a GOOD and thus it is
 an integral part of it.

No matter how you look at the example, invalidating it does not address
the issue raised, as it still exists.  Either way, Hannu and I seem to
agree that some class of constraints needs to be able to be overridden.

 In case of being able to override constraints for child tables it can
 also be a significant performance boost - if you have 10 000 000 goods
 in a table you don't want to change a constraint on GOODS to allow
 campaign goods to be sold cheaper than bought as it would have to check
 all goods for validity according to new constraint - putting the
 constraint on just CAMPAIGN_GOODS will enable the DB engine to check
 just tuples in CAMPAIGN_GOODS.

I had not considered this before.  Does that still hold true if we go
with a "parent contains all columns" implementation?  Or are you simply
saying that it doesn't matter, since when the constraint is applied it
would only scan the rows that belong to the child?  Perhaps this doesn't
matter for this portion of the conversation.  But hey, I was curious. 
:)
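
As a concrete illustration of the performance point above, here is a
minimal SQL sketch (the table and column names are hypothetical, and this
is not the proposed override feature itself): a check added only to the
child table is validated against the child's rows alone, while relaxing a
constraint the child *inherits* from the parent is exactly the missing
piece being discussed in this thread.

-- Hypothetical tables.  The child-only constraint below is checked
-- against campaign_goods rows only, not the (much larger) goods table.
CREATE TABLE goods (
    name          text,
    buying_price  numeric,
    selling_price numeric
);

CREATE TABLE campaign_goods (
    campaign      text
) INHERITS (goods);

-- Validated against campaign_goods rows only when it is added.
ALTER TABLE campaign_goods
    ADD CONSTRAINT campaign_prices
    CHECK (selling_price >= buying_price * 0.5);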

 
   SQL standard constraints should be non-overridable. I still think that
   Constraint triggers should be overridable/dynamic.
  
  I still don't like it. Eiffel had good reasons for making the
  constraints non-overridable. Other OO languages don't have constraints,
  or they would probably do the same.

Well Curt, what you outlined above (clipped out) about it being a
different world...I think that also applies here.  IMO, we are treading
lightly on new and perhaps thin ground, so we need to be careful that we
apply common parallels and idioms only when we are certain that they
apply.  What I'm trying to say is, just because it's not allowed in
Eiffel doesn't have to mean the same applies here.

  
  That said, I could live with dynamic dispatch, if the default were
  to make it non-dynamic, and you had to add a special flag to make it
  dynamic. That way it would be obvious to the casual user or a DBA
  familiar with other databases but not postgres that something unusual is
  going on.
 
 That seems about the right compromise between constraining and developer
 freedom.
 

I agree.  That does appear to be pointing us in a conservatively sane
and safe direction.


Greg




signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] Inheritance

2002-09-06 Thread Greg Copeland

On Fri, 2002-09-06 at 08:57, [EMAIL PROTECTED] wrote:
  On Fri, 2002-09-06 at 07:37, Curt Sampson wrote:
   On 5 Sep 2002, Hannu Krosing wrote:
  
  To elaborate on Gregs example if you have table GOODS and under it a
  table CAMPAIGN_GOODS then you may place a general overridable constraint
  valid_prices on GOODS which checks that you dont sell cheaper than you
  bought, but you still want sell CAMPAIGN_GOODS under aquiring price, so
  you override the constraint for CAMPAIGN_GOODS.
 
 What that tells me is that the constraint, valid_prices, shouldn't have been 
 on GOODS in the first place.  If it is not a legitimate constraint for the 
 children, then it is not a legitimate constraint for the parent.
 

I don't agree with you on that point.  This concept is common to many
OO-implementations.  Unless you can come up with a powerful argument as
to why our to-be picture should never do this, I'm less than
convinced.

 In human inheritance, if you marry someone with funny coloured skin, you 
 don't get to choose that your children won't have funny coloured skin.  
 That's a pretty forcible constraint.  :-).
 

Fine, but that only works for YOUR specific example.  In that example,
the color constraint should be non-virtual, meaning the child should
not be able to change it.  On the other hand, if I replace human with
metal product, hopefully I won't be stuck with gun metal gray for
every derived product.  Hopefully, somewhere along the line, I'll be
able to override the parent's color constraint.

  Or maybe it is better to just make the check function should be
  dynamically dispatched, so the constraint will always hold, it just can
  mean different things for different types.
 
 Or maybe if someone is doing an Object Oriented design, and making extensive 
 use of inheritance, they'll need to apply constraints in a manner that allow 
 them to be properly inherited.

The problem with that assumption is that there is normally nothing wrong
with having seemingly mutually exclusive sets of *business rules* for a
parent and child.

Greg




signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] Inheritance

2002-09-06 Thread Greg Copeland

On Fri, 2002-09-06 at 11:05, [EMAIL PROTECTED] wrote:
 Oops! [EMAIL PROTECTED] (Greg Copeland) was seen spray-painting on a wall:
  That's a pretty forcible constraint.  :-).
 =20
 
 Is there something broken with your mailer?  It's reformatting quotes
 rather horribly...

Hmm...not that I know of.  Never had complaints before anyway.  Looks
like an issue with MIME contents...perhaps your mailer doesn't properly
parse some MIME and/or mine is hosing it somehow.  Not really sure.

 Reality, with the GOODS/CAMPAIGN_GOODS example, is that GOODS isn't
 the appropriate parent class for CAMPAIGN_GOODS.  Both should be
 inheriting the common characteristics from some common ancestor.  If
 that is done, then there's nothing to override.
 

You can complain about and redefine the model to suit your needs all day
long and get nowhere.  It doesn't change the need for it.  Fact is, it
would be nice to allow.  Fact is, OO implementations tend to allow
this.  I'm quite happy to let you go to every OO computer language camp
and inform them that they've done it all wrong.  ;)

Citing that a specific example is all wrong hardly invalidates the
concept.  Since we are pretty much at the conceptual stage, I welcome a
conceptual argument on why this is bad and should never be done. 
Please, be high level and generic.  After all, I too can give you a
hundred specific reasons why a cat is not a dog (i.e. a bad model)...but
it does nothing to facilitate the topic at hand.

  The problem with that assumption is that there is normally nothing
  wrong with having seemingly mutually exclusive sets of *business
  rules* for a parent and child.
 
 If the rules are totally different, it begs the question of why they
 _should_ be considered to be related in a parent/child relationship.

Because this is how the real world works.  Often there are exceptions to
the rules.  When these rules differ, I've not seen a valid high level
conceptual reason that should prevent it.

Example:

animal
quadruped (has 4-trunk limbs)
dog
injuredDog (has 0 or more trunk limbs)

Hopefully we can agree that a dog is still a dog even if it only has
three legs?  Hopefully you'll realize this was given to illustrate an
example and to prove a point.  Sometimes a model needs to allow for
exceptions to the rule.  You can argue that a three-legged dog is no
longer a quadruped but I prefer to believe that it is a quadruped which
just happens to be an exception to the rule.

 
 It may well be that they _aren't_ related as parent/child.  They may
 merely be cousins, sharing some common ancestors.

Yes, it's true.  Sometimes the wrong model is applied but that hardly
invalidates the concept or alleviates the need.

Regards,

Greg Copeland





signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] Schemas not available for pl/pgsql %TYPE....

2002-09-12 Thread Greg Copeland

Does anyone know if such an effort is also required for pl/python to
become schema aware?

Regards,

Greg Copeland


On Wed, 2002-09-11 at 19:24, Bruce Momjian wrote:
 
 Patch applied.  Thanks.
 
 ---
 
 
 Joe Conway wrote:
  Tom Lane wrote:
   Sean Chittenden [EMAIL PROTECTED] writes:
   
  ::sigh:: Is it me or does it look like all
  of pl/pgsql is schema un-aware (ie, all of the declarations).  -sc
   
   
   Yeah.  The group of routines parse_word, parse_dblword, etc that are
   called by the lexer certainly all need work.  There are some
   definitional issues to think about, too --- plpgsql presently relies on
   the number of names to give it some idea of what to look for, and those
   rules are probably all toast now.  Please come up with a sketch of what
   you think the behavior should be before you start hacking code.
  
  Attached is a diff -c format proposal to fix this. I've also attached a short 
  test script. Seems to work OK and passes all regression tests.
  
  Here's a breakdown of how I understand plpgsql's Special word rules -- I 
  think it illustrates the behavior reasonably well. New functions added by this 
  patch are plpgsql_parse_tripwordtype and plpgsql_parse_dblwordrowtype:
  
  
  Identifiers (represents)parsing function
  
  identifierplpgsql_parse_word
   tg_argv
   T_LABEL (label)
   T_VARIABLE  (variable)
   T_RECORD(record)
   T_ROW   (row)
  
  identifier.identifier  plpgsql_parse_dblword
   T_LABEL
   T_VARIABLE  (label.variable)
   T_RECORD(label.record)
   T_ROW   (label.row)
   T_RECORD
   T_VARIABLE  (record.variable)
   T_ROW
   T_VARIABLE  (row.variable)
  
  identifier.identifier.identifier  plpgsql_parse_tripword
   T_LABEL
   T_RECORD
   T_VARIABLE  (label.record.variable)
   T_ROW
   T_VARIABLE  (label.row.variable)
  
  identifier%TYPE   plpgsql_parse_wordtype
   T_VARIABLE
   T_DTYPE (variable%TYPE)
   T_DTYPE (typname%TYPE)
  
  identifier.identifier%TYPE plpgsql_parse_dblwordtype
   T_LABEL
   T_VARIABLE
   T_DTYPE (label.variable%TYPE)
   T_DTYPE (relname.attname%TYPE)
  
  new
  identifier.identifier.identifier%TYPE plpgsql_parse_tripwordtype
   T_DTYPE (nspname.relname.attname%TYPE)
  
  identifier%ROWTYPE plpgsql_parse_wordrowtype
   T_DTYPE (relname%ROWTYPE)
  
  new
  identifier.identifier%ROWTYPE   plpgsql_parse_dblwordrowtype
   T_DTYPE (nspname.relname%ROWTYPE)
  
  
  Parameters - parallels the above
  
  $#plpgsql_parse_word
  $#.identifier  plpgsql_parse_dblword
  $#.identifier.identifier  plpgsql_parse_tripword
  $#%TYPE   plpgsql_parse_wordtype
  $#.identifier%TYPE plpgsql_parse_dblwordtype
  $#.identifier.identifier%TYPE plpgsql_parse_tripwordtype
  $#%ROWTYPE plpgsql_parse_wordrowtype
  $#.identifier%ROWTYPE   plpgsql_parse_dblwordrowtype
  
  Comments?
  
  Thanks,
  
  Joe
 
  Index: src/pl/plpgsql/src/pl_comp.c
  ===
  RCS file: /opt/src/cvs/pgsql-server/src/pl/plpgsql/src/pl_comp.c,v
  retrieving revision 1.51
  diff -c -r1.51 pl_comp.c
  *** src/pl/plpgsql/src/pl_comp.c4 Sep 2002 20:31:47 -   1.51
  --- src/pl/plpgsql/src/pl_comp.c9 Sep 2002 04:22:24 -
  ***
  *** 1092,1097 
  --- 1092,1217
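
For readers following along, here is a minimal sketch of the declarations
the new parse paths are meant to handle (the schema, table, and column
names are hypothetical):

-- store.products is a hypothetical schema-qualified table with
-- hypothetical name and price columns.
CREATE FUNCTION get_price(text) RETURNS numeric AS '
DECLARE
    item_name  store.products.name%TYPE := $1;  -- nspname.relname.attname%TYPE
    item_row   store.products%ROWTYPE;          -- nspname.relname%ROWTYPE
BEGIN
    SELECT INTO item_row * FROM store.products WHERE name = item_name;
    RETURN item_row.price;
END;
' LANGUAGE 'plpgsql';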

Re: [GENERAL] [HACKERS] PGXLOG variable worthwhile?

2002-09-19 Thread Greg Copeland

I think Marc made a pretty good case about the use of command line
arguments, but I think I have to vote with Tom.  Many of the command line
arguments you seem to be using do sorta make sense to have for easy
reference or to help validate your runtime environment for each
instance.  The other side of that is, I completely agree with Tom in
that it's a very dangerous option.  It would be begging for people to
shoot themselves with it.  Besides, just as you can easily parse the
command line, you can also parse the config file to pull out that
information.  Plus, it really should be a very seldom-used option.  When
it is used, it's doubtful that you'll need the same level of dynamic
control that you get by using command line options.

As a rule of thumb, if an option is rarely used or is very dangerous if
improperly used, I do think it should be in a configuration file to
discourage ad hoc use.

Let's face it, specifying the XLOG location is hardly something people
need to be doing on the fly.

My vote is config file it and no command line option!

Greg


On Wed, 2002-09-18 at 23:50, Bruce Momjian wrote:
 Marc G. Fournier wrote:
  On Wed, 18 Sep 2002, Bruce Momjian wrote:
  
   Yea, but you aren't going to be needing to know the xlog directory that
   way, will you?
  
  Why not?  Who are you to tell me how my scripts work, or how they get
  their information?  I have a script that runs to tell me how much disk
  space each instance is using up, that parses the ps output for the -D
  argument ... having -X there would allow me to parse for that as well and,
  if it was in the ps output, add that appropriately into the calculations
  ...
  
  My point is, the functionality is there, and should be documented properly
  ... encourage ppl to use the GUC setting in postmaster.conf, but just
  because you can't grasp that some of us *like* to use command line args,
  don't remove such functionality ...
 
 You ask for a vote and see if you can get votes to add -X.  We had that
 vote once already.  We do make decisions on what people should use.  If
 not, we would be as hard to manage as Oracle.
 
 -- 
   Bruce Momjian|  http://candle.pha.pa.us
   [EMAIL PROTECTED]   |  (610) 359-1001
   +  If your life is a hard drive, |  13 Roberts Road
   +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073
 
 ---(end of broadcast)---
 TIP 4: Don't 'kill -9' the postmaster




signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] Having no luck with getting pgbench to run multiple

2002-09-20 Thread Greg Copeland

Well, you'll probably want to pass in a valid timeval structure if you
don't want it to block.

Basically, that snippet tells select() to watch the list of sockets,
looking for sockets that have data to be read, while waiting forever.
That means it will block until something appears on one of the sockets
you're monitoring.
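
For illustration, here is a minimal sketch of the same call with a finite
timeout (nsocks, maxsock, and input_mask are assumed to be the variables
from the pgbench snippet below; the two-second value is arbitrary):

/* Re-initialize the fd_set and the timeval before every call, since
 * select() may modify both of them. */
struct timeval timeout;

timeout.tv_sec = 2;     /* wait at most two seconds */
timeout.tv_usec = 0;

if ((nsocks = select(maxsock + 1, &input_mask, (fd_set *) NULL,
                     (fd_set *) NULL, &timeout)) < 0)
{
    /* error handling */
}
else if (nsocks == 0)
{
    /* timed out: no socket was ready for reading */
}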


Greg


On Thu, 2002-09-19 at 13:43, Justin Clift wrote:
 Hi everyone,
 
 Am trying my hand at a bit of C code again.  Specifically am trying to
 get Tatsuo's pgbench code to loop around more than once, but it keeps
 on hanging forever at this line:
 
 if ((nsocks = select(maxsock + 1, input_mask, (fd_set *) NULL,
   (fd_set *) NULL, (struct timeval *) NULL))  0)
 {
 
 etc
 
 Running this on a FreeBSD 4.6.2 system with PostgreSQL 7.2.2 and gcc
 2.95.3.  Looking around the Net seems to say that hangs like this are
 caused by the select blocking, but that's not helping me any with
 knowing what to do.
 
 Does anyone have an idea of what I can do, or maybe have a few minutes
 to look at my code and point out the problem?
 
 :-)
 
 Regards and best wishes,
 
 Justin Clift
 
 -- 
 My grandfather once told me that there are two kinds of people: those
 who work and those who take the credit. He told me to try to be in the
 first group; there was less competition there.
- Indira Gandhi
 
 ---(end of broadcast)---
 TIP 5: Have you checked our extensive FAQ?
 
 http://www.postgresql.org/users-lounge/docs/faq.html




signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] SCSI Error

2002-09-20 Thread Greg Copeland

Ensure you don't have termination issues.  Make sure your SCSI interface
is configured correctly for your SCSI environment, especially on matters
of termination.  Make sure you have enough power to your drive and if
possible, make sure your drives are hung off of distinct power segments
coming from your power supply.  Of course, make sure you have enough
power being supplied from your power supply to power all of your drives
and *equipment* (cards, CPU, fans, etc).  Last, be sure your cables are
in good working order and that they are as far away from the power
supply and power cables as possible.  SCSI is notorious for suffering
from RF interference, especially on long cables.  If all else fails, you
may try swapping cables to see if this helps.

If at all possible, immediately fsck (don't make it perform any
repairs) and back up!  The fsck will let you know whether you should
reasonably expect to have a meaningful backup.  Requesting fsck to
repair in this situation could actually result in more/worse
(unrecoverable) corruption.  If the fsck does report potential FS
damage, accept the fact that some of your data (and the resulting backup)
may be corrupt, perhaps even unrecoverably so.

I'm using a 7880 here and have never had notable issues save only for
bugs in the Linux SCSI drivers.


Greg


On Fri, 2002-09-20 at 07:08, Ricardo Fogliati wrote:
 Hiya Lists 
 
 Somebody could help me? I am with an error when the Postgresql makes Insert, 
 Delete or Update 
 
 kernel: SCSI disk error : host 2 channel 0 id 0 lun 0 return code = 7
 kernel:  I/O error: dev 08:08, sector 47938856
 kernel: SCSI disk error : host 2 channel 0 id 0 lun 0 return code = 7
 kernel:  I/O error: dev 08:08, sector 47938800
 kernel: SCSI disk error : host 2 channel 0 id 0 lun 0 return code = 7
 kernel:  I/O error: dev 08:08, sector 47938864
 kernel: SCSI disk error : host 2 channel 0 id 0 lun 0 return code = 7
 kernel:  I/O error: dev 08:08, sector 47938872
 kernel: SCSI disk error : host 2 channel 0 id 0 lun 0 return code = 7
 kernel:  I/O error: dev 08:08, sector 47938808
 
 Version:
 
 postgresql-7.2.1-5
 postgresql-devel-7.2.1-5
 postgresql-libs-7.2.1-5
 postgresql-server-7.2.1-5
 
 Server:
 
 processor   : 0
 vendor_id   : GenuineIntel
 cpu family  : 6
 model   : 11
 model name  : Intel(R) Pentium(R) III CPU family  1266MHz
 stepping: 1
 cpu MHz : 1258.309
 cache size  : 512 KB
 fdiv_bug: no
 hlt_bug : no
 f00f_bug: no
 coma_bug: no
 fpu : yes
 fpu_exception   : yes
 cpuid level : 2
 wp  : yes
 flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca 
 cmov pat pse36 mmx fxsr sse
 bogomips: 2510.02
 
 Kernel:
 
 Linux version 2.4.7-10custom (gcc version 2.96 2731 (Red Hat Linux 7.1 
 2.96-98)) #9 Mon
 Sep 16 17:50:13 BRT 2002
 
 Mem Info
 total:used:free:  shared: buffers:  cached:
 Mem:  789540864 782589952  6950912 81764352 34406400 583122944
 Swap: 802873344  2957312 799916032
 MemTotal:   771036 kB
 MemFree:  6788 kB
 MemShared:   79848 kB
 Buffers: 33600 kB
 Cached: 566568 kB
 SwapCached:   2888 kB
 Active:  14476 kB
 Inact_dirty:664852 kB
 Inact_clean:  3576 kB
 Inact_target:  136 kB
 HighTotal:   0 kB
 HighFree:0 kB
 LowTotal:   771036 kB
 LowFree:  6788 kB
 SwapTotal:  784056 kB
 SwapFree:   781168 kB
 NrSwapPages:195292 pages
 
 pci
 
  Bus  2, device   9, function  0:
 SCSI storage controller: Adaptec 7899P (rev 1).
   IRQ 10.
   Master Capable.  Latency=96.  Min Gnt=40.Max Lat=25.
   I/O at 0x2100 [0x21ff].
   Non-prefetchable 64 bit memory at 0xedfff000 [0xedff].
 
 Swap
 
 FilenameTypeSizeUsedPriority
 /dev/sda6   partition   784056  2888-1
 
 Tkanksfull
 -- 
 Ricardo Fogliati
 4 Linux/Vesper
 
 
 
 ---(end of broadcast)---
 TIP 6: Have you searched our list archives?
 
 http://archives.postgresql.org




signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] [GENERAL] Memory Errors...

2002-09-20 Thread Greg Copeland

I'll try to have a look-see by the end of the weekend.  Is there specific
code that can reproduce it, or is it ANY code that uses SPI?

Greg


On Fri, 2002-09-20 at 11:39, Peter Eisentraut wrote:
 Tom Lane writes:
 
  On looking a little more closely, it's clear that pltcl_SPI_exec()
  should be, and is not, calling SPI_freetuptable() once it's done with
  the tuple table returned by SPI_exec().  This needs to be done in all
  the non-elog code paths after SPI_exec has returned SPI_OK_SELECT.
 
 There's a note in the PL/Python documentation that it's leaking memory if
 SPI plans are used.  Maybe that's related and someone could take a look at
 it.
 
 -- 
 Peter Eisentraut   [EMAIL PROTECTED]
 
 
 ---(end of broadcast)---
 TIP 4: Don't 'kill -9' the postmaster




signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] [GENERAL] Memory Errors...

2002-09-23 Thread Greg Copeland

Well, it looks like it was already taken to the mat.

;)


Greg


On Thu, 2002-09-19 at 16:58, Joe Conway wrote:
 Nigel J. Andrews wrote:
  On Thu, 19 Sep 2002, Joe Conway wrote:
 I can give it a shot, but probably not until the weekend.
 
 I haven't really followed this thread closely, and don't know tcl very well, 
 so it would help if someone can send me a minimal tcl function which triggers 
 the problem.
  
  I can probably take a look at this tomorrow, already started by looking at the
  pltcl_SPI_exec routine. I think a quick glance at ...init_unknown() also shows
  a lack of tuptable freeing.
  
 
 OK -- let me know if you can't find the time and I'll jump back in to it.
 
 Joe
 
 
 
 
 ---(end of broadcast)---
 TIP 5: Have you checked our extensive FAQ?
 
 http://www.postgresql.org/users-lounge/docs/faq.html




signature.asc
Description: This is a digitally signed message part


[HACKERS] Client/Server compression?

2002-03-14 Thread Greg Copeland

Just curious, and honestly I haven't looked, but is there any form of
compression between clients and servers?  Has this been looked at?

Greg





signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] Client/Server compression?

2002-03-14 Thread Greg Copeland

Well, it occurred to me that if a large result set were to be identified
before transport between a client and server, a significant amount of
bandwidth may be saved by using a moderate level of compression. 
Especially with something like result sets, which I tend to believe may
lend themselves well toward compression.

Unlike FTP, which may be (and often is) transferring previously
compressed data, raw result sets being transferred between the server and
a remote client, IMHO, would tend to compress rather well, as I doubt
much of it would be true random data.

This may be of value for users with low bandwidth connectivity to their
servers or where bandwidth may already be at a premium.

The zlib exploit posting got me thinking about this.

Greg


On Thu, 2002-03-14 at 12:20, Bruce Momjian wrote:
 Greg Copeland wrote:
  Just curious, and honestly I haven't looked, but is there any form of
  compression between clients and servers?  Has this been looked at?
 
 This issues has never come up before.  It is sort of like compressing an
 FTP session.  No one really does that.  Is there value in trying it with
 PostgreSQL?
 
 
 -- 
   Bruce Momjian|  http://candle.pha.pa.us
   [EMAIL PROTECTED]   |  (610) 853-3000
   +  If your life is a hard drive, |  830 Blythe Avenue
   +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026




signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] Client/Server compression?

2002-03-14 Thread Greg Copeland

On Thu, 2002-03-14 at 13:35, Bruce Momjian wrote:
 Greg Copeland wrote:
 
 Checking application/pgp-signature: FAILURE
 -- Start of PGP signed section.
  Well, it occurred to me that if a large result set were to be identified
  before transport between a client and server, a significant amount of
  bandwidth may be saved by using a moderate level of compression. 
  Especially with something like result sets, which I tend to believe may
  lend it self well toward compression.
  
  Unlike FTP which may be transferring (and often is) previously
  compressed data, raw result sets being transfered between the server and
  a remote client, IMOHO, would tend to compress rather well as I doubt
  much of it would be true random data.
  
 
 I should have said compressing the HTTP protocol, not FTP.
 
  This may be of value for users with low bandwidth connectivity to their
  servers or where bandwidth may already be at a premium.
 
 But don't slow links do the compression themselves, like PPP over a
 modem?


Yes and no.  Modem compression doesn't understand the nature of the data
that is actually flowing through it.  As a result, a modem is going to
spend an equal amount of time trying to compress the PPP/IP/NETBEUI
protocols as it does trying to compress the data contained within those
protocol envelopes.  Furthermore, modems tend to have a very limited
amount of time to even attempt to compress, which, combined with the fact
that they have very limited buffer space, usually limits their ability to
provide effective compression.  Because of these issues, it's not uncommon
for a modem to actually yield a larger compressed block than the
input.

I'd also like to point out that there are other low speed
connections in use which do not make use of modems at all, as well as
modems which do not support compression (long haul modems, for
example).

As for your specific example of HTTP versus FTP, I would also like to
point out that it is becoming more and more common for gzip'd data to be
transported within the HTTP protocol whereby each end is explicitly
aware of the compression taking place on the link with knowledge of what
to do with it.

Also, believe it or not, one of the common uses of SSH is to provide
session compression.  It is not unheard of for people to disable the
encryption and simply use it as a compression tunnel, which also provides
for modest session obscurity.

Greg




signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] Client/Server compression?

2002-03-14 Thread Greg Copeland

On Thu, 2002-03-14 at 14:14, Neil Conway wrote:
 On Thu, 2002-03-14 at 14:35, Bruce Momjian wrote:
  Greg Copeland wrote:
  
  Checking application/pgp-signature: FAILURE
  -- Start of PGP signed section.
   Well, it occurred to me that if a large result set were to be identified
   before transport between a client and server, a significant amount of
   bandwidth may be saved by using a moderate level of compression. 
   Especially with something like result sets, which I tend to believe may
   lend it self well toward compression.
   
   Unlike FTP which may be transferring (and often is) previously
   compressed data, raw result sets being transfered between the server and
   a remote client, IMOHO, would tend to compress rather well as I doubt
   much of it would be true random data.
  
  I should have said compressing the HTTP protocol, not FTP.
 
 Except that lots of people compress HTTP traffic (or rather should, if
 they were smart). Bandwidth is much more expensive than CPU time, and
 most browsers have built-in support for gzip-encoded data. Take a look
 at mod_gzip or mod_deflate (2 Apache modules) for more info on this.
 
 IMHO, compressing data would be valuable iff there are lots of people
 with a low-bandwidth link between Postgres and their database clients.
 In my experience, that is rarely the case. For example, people using
 Postgres as a backend for a dynamically generated website usually have
 their database on the same server (for a low-end site), or on a separate
 server connected via 100mbit ethernet to a bunch of webservers. In this
 situation, compressing the data between the database and the webservers
 will just add more latency and increase the load on the database.
 
 Perhaps I'm incorrect though -- are there lots of people using Postgres
 with a slow link between the database server and the clients?
 


What about remote support of these databases where a VPN may not be
available?  In my past experience, this was very common, as many
companies do not want to expose their database to the outside world, even
via a VPN, while allowing only modem access.  Not to mention, road
warriors that may need to remotely support their databases may find
value here too.  Would they not?

...I think I'm pretty well coming to the conclusion that it may be of
some value...even if only for a limited number of users.


Greg




signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] Client/Server compression?

2002-03-14 Thread Greg Copeland

On Thu, 2002-03-14 at 14:29, Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  This may be of value for users with low bandwidth connectivity to their
  servers or where bandwidth may already be at a premium.
 
  But don't slow links do the compression themselves, like PPP over a
  modem?
 
 Even if the link doesn't compress, shoving the feature into PG itself
 isn't necessarily the answer.  I'd suggest running such a connection
 through an ssh tunnel, which would give you encryption as well as
 compression.
 
   regards, tom lane

Couldn't the same be said for SSL support?

I'd also like to point out that it's *possible* that this could also be
a speed boost under certain workloads where extra CPU is available, as
less data would have to be transferred through the OS, networking layers,
and device drivers.  Until zero copy transfers become common on all
platforms for all devices, I would think that it's certainly *possible*
that this *could* offer an improvement...well, perhaps a break-even at
any rate...

Such claims, again, are not unheard of for compressed file systems under
specific workloads, as less device I/O has to take place.

Greg





signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] Client/Server compression?

2002-03-15 Thread Greg Copeland

On Thu, 2002-03-14 at 14:03, Arguile wrote:

[snip]

 I'm sceptical of the benefit such compressions would provide in this setting
 though. We're dealing with sets that would have to be compressed every time
 (no caching) which might be a bit expensive on a database server. Having it
 as a default off option for psql migtht be nice, but I wonder if it's worth
 the time, effort, and cpu cycles.
 

I dunno.  That's a good question.  For now, I'm making what tends to be
a safe assumption (oops...that word), that most database servers will be
I/O bound rather than CPU bound.  *IF* that assumption holds true, it
sounds like it may make even more sense to implement this.  I do know
that in the past, I've seen 90+% compression ratios on many databases
and 50% - 90% compression ratios on result sets using tunneled
compression schemes (which were compressing things other than datasets,
which probably hurt overall compression ratios).  Depending on the
workload and the available resources on a database system, it's possible
that latency could actually be reduced, depending on where you measure
it.  That is, do you measure latency to the first packet back to the
remote or to the last packet back to the remote?  If you use the last
packet, compression may actually win.

My current thoughts are to allow for enabled/disabled compression and
variable compression settings (1-9) within a database configuration. 
Worst case, it may be fun to implement, and I'm thinking there may
actually be some surprises as an end result if it's done properly.
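
For what it's worth, here is a minimal standalone sketch (not PostgreSQL
code; the sample "result set" is made up) of the kind of per-buffer,
level-configurable compression being described, using zlib's compress2():

/* Compile with -lz.  Compresses one outgoing buffer at a configurable
 * zlib level (1-9) before it would be put on the wire. */
#include <stdio.h>
#include <string.h>
#include <zlib.h>

int
main(void)
{
    const char  src[] = "id|name|price\n1|widget|9.95\n2|widget|9.95\n";
    Bytef       dest[256];
    uLongf      destlen = sizeof(dest);
    int         level = 6;          /* would be configurable, 1-9 */

    if (compress2(dest, &destlen, (const Bytef *) src,
                  (uLong) strlen(src), level) != Z_OK)
    {
        fprintf(stderr, "compression failed\n");
        return 1;
    }
    printf("%lu bytes in, %lu bytes out\n",
           (unsigned long) strlen(src), (unsigned long) destlen);
    return 0;
}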

In looking at the communication code, it looks like only an 8k buffer is
used.  I'm currently looking to bump this up to 32k, as most OS's tend to
have a sweet throughput spot with buffer sizes between 32k and 64k. 
Others, depending on the devices in use, like even bigger buffers. 
Because this may be only a minor optimization, especially on
a heavily loaded server, we may want to consider making this a
configurable parameter.

Greg






signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] Client/Server compression?

2002-03-15 Thread Greg Copeland

On Thu, 2002-03-14 at 19:43, Bruce Momjian wrote:
 Kyle wrote:
  On the subject on client/server compression, does the server
  decompress toast data before sending it to the client?  Is so, why
  (other than requiring modifications to the protocol)?
  
  On the flip side, does/could the client toast insert/update data
  before sending it to the server?
 
 It has to decrypt it so the server functions can process it too.  Hard
 to avoid that.  Of course, in some cases, it doesn't need to be
 processed on the server, just passed, so it would have to be done
 conditionally.
 

Along those lines, it occurred to me that if the compressor somehow knew
the cardinality of the data rows involved with the result set being
returned, a compressor data dictionary (...think of it as a heads up on
patterns to be looking for) could be created using the unique
cardinality values, which, I'm thinking, could dramatically improve the
level of compression for data being transmitted.

Just some food for thought.  After all, these two seem to be somewhat
related as you wouldn't want the communication layer attempting to
recompress data which was natively compressed and needed to be
transparently transmitted.

Greg




signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] User Level Lock question

2002-03-15 Thread Greg Copeland

Are you trying to do a select for update?

Greg


On Fri, 2002-03-15 at 13:54, Lance Ellinghaus wrote:
 I know it does not sound like something that would need to be done, but here
 is why I am looking at doing this...
 
 I am trying to replace a low level ISAM database with PostgreSQL. The low
 level ISAM db allows locking a record during a read to allow Exclusive
 access to the record for that process. If someone tries to do a READ
 operation on that record, it is skipped. I have to duplicate this
 functionality. The application also allows locking multiple records and then
 unlocking individual records or unlocking all of them at once. This cannot
 be done easily with PostgreSQL unless I add a status field to the records
 and manage them. This can be done, but User Level Locks seem like a much
 better solution as they provide faster locking, no writes to the database,
 when the backend quits all locks are released automatically, and I could
 lock multiple records and then clear them as needed. They also exist outside
 of transactions!
 
 So my idea was to use User Level Locks on records and then include a test on
 the lock status in my SELECT statements to filter out any records that have
 a User Level Lock on it. I don't need to set it during the query, just test
 if there is a lock to remove them from the query. When I need to do a true
 lock during the SELECT, I can do it with the supplied routines.
 
 Does this make any more sense now or have I made it that much more
 confusing?
 
 Lance
 
 - Original Message -
 From: Tom Lane [EMAIL PROTECTED]
 To: Lance Ellinghaus [EMAIL PROTECTED]
 Cc: [EMAIL PROTECTED]
 Sent: Friday, March 15, 2002 9:11 AM
 Subject: Re: [HACKERS] User Level Lock question
 
 
  Lance Ellinghaus [EMAIL PROTECTED] writes:
   Is there an easy way to test the lock on a user level lock without
 actually
   issuing the lock?
 
  Why would you ever want to do such a thing?  If you test the lock but
  don't actually acquire it, someone else might acquire the lock half a
  microsecond after you look at it --- and then what does your test result
  mean?  It's certainly unsafe to take any action based on assuming that
  the lock is free.
 
  I suspect what you really want is a conditional acquire, which you can
  get (in recent versions) using the dontWait parameter to LockAcquire.
 
  regards, tom lane
 
 
 ---(end of broadcast)---
 TIP 5: Have you checked our extensive FAQ?
 
 http://www.postgresql.org/users-lounge/docs/faq.html




signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] User Level Lock question

2002-03-15 Thread Greg Copeland

On Fri, 2002-03-15 at 16:24, Neil Conway wrote:
 On Fri, 2002-03-15 at 14:54, Lance Ellinghaus wrote:
  I know it does not sound like something that would need to be done, but here
  is why I am looking at doing this...
  
  I am trying to replace a low level ISAM database with PostgreSQL. The low
  level ISAM db allows locking a record during a read to allow Exclusive
  access to the record for that process. If someone tries to do a READ
  operation on that record, it is skipped.
 
 If the locked record is skipped, how can the application be sure it is
 getting a consistent view of the data?
 
 Cheers,
 
 Neil
 


Ya, that's what I'm trying to figure out.

It sounds like either he's doing what equates to a select for update or
more or less needs a visibility attribute for the row in question. 
Either way, perhaps he should share more information on what the end
goal is so we can better address any changes in idiom that better
reflect a relational database.

Greg




signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] Client/Server compression?

2002-03-15 Thread Greg Copeland

On Fri, 2002-03-15 at 19:44, Kyle wrote:
[snip]

 Wouldn't Tom's suggestion of riding on top of ssh would give similar
 results?  Anyway, it'd probably be a good proof of concept of whether
 or not it's worth the effort.  And that brings up the question: how
 would you measure the benefit?  I'd assume you'd get a good cut in
 network traffic, but you'll take a hit in cpu time.  What's an
 acceptable tradeoff?

Good question.  I've been trying to think of meaningful testing methods;
however, I can still think of reasons all day long where it's not an
issue of a tradeoff.  Simply put, if you have a low bandwidth
connection, as long as there are extra cycles available on the server,
who really cares...except for the guy at the end of the slow connection.

As for SSH, well, that should be rather obvious.  It often is simply not
available.  While SSH is nice, I can think of many situations where this
is a win/win.  At least in business settings...which is where I'm assuming
the goal is to get Postgres.  Also, along those lines, if SSH is the
answer, then surely the SSL support should be removed too...as SSH
provides for encryption as well.  Simply put, removing SSL support makes
about as much sense as asserting that SSH is the final compression
solution.

Also, it keeps being stated that a tangible tradeoff between CPU and
bandwidth must be realized.  This is, of course, a false assumption. 
Simply put, if you need bandwidth, you need bandwidth.  The need is not
a function of CPU; rather, it's a function of lacking bandwidth.  Having
said that, I of course would still like to have something meaningful
which reveals the impact on CPU and bandwidth.

I'm talking about something that would be optional.  So, what's the cost
of having a little extra optional code in place?  The only issue, best I
can tell, is whether it can be implemented in a backward compatible manner.

 
 That's one reason I was thinking about the toast stuff.  If the
 backend could serve toast, you'd get an improvement in server to
 client network traffic without the server spending cpu time on
 compression since the data has previously compressed.
 
 Let me know if this is feasible (or slap me if this is how things
 already are): when the backend detoasts data, keep both copies in
 memory.  When it comes time to put data on the wire, instead of
 putting the whole enchilada down give the client the compressed toast
 instead.  And yeah, I guess this would require a protocol change to
 flag the compressed data.  But it seems like a way to leverage work
 already done.
 

I agree with that; however, I'm guessing that implementation would
require a significantly larger effort than what I'm suggesting...then
again, probably because I'm not aware of all the code yet.  Pretty much,
the basic implementation could be in place by the end of this weekend
with only a couple hours' worth of work...and even that is mostly because
I still don't know lots of the code.  The changes you are talking about
are going to require not only protocol changes but changes at several
layers within the engine.

Of course, something else to keep in mind is that using the TOAST
solution requires that TOAST already be in use.  What I'm suggesting
benefits (size-wise) all types of data being sent back to a client.

Greg




signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] User Level Lock question

2002-03-15 Thread Greg Copeland

On Fri, 2002-03-15 at 21:45, Lance Ellinghaus wrote:
 The application actually does not want nor need a consistent view of the
 data. It is expecting that records that are locked will not be viewed at
 all. The locks are normally held for VERY short periods of time. The fact
 that the application is expecting locked records not to be viewed is causing

You keep asserting that these viewed records qualify as being called
"locked".  It sounds like a record attribute to me.  Furthermore, it
sounds like that attribute reflects a record's visibility and not whether
it's locked.  Locks are generally used to limit accessibility rather
than visibility.  This, I think, is the primary source of the issue
you're having with your desired implementation.

 me problems because under PostgreSQL this is not easy to do. Even if I lock
 a record using SELECT ... FOR UPDATE, I can still do a SELECT and read it.
 I need to effectively do a SELECT ... FOR UPDATE and make the other
 reading clients skip that record completely.
 
 I can do this with a flag column, but this requires the disk access to do
 the UPDATE and if the client/backend quits/crashes with outstanding records
 marked, they are locked.

That's what transactions are for.  If you have a failure, the
transaction should be rolled back.  The visibility marker would be
restored to its original visible state.

 
 The User Level Locks look like a great way to do this as I can set a lock
 very quickly without disk access and if the client/backend quits/crashes,
 the locks are automatically removed.

But do you really need to lock it, hide it, or both?  If both, you may
want to consider doing an update inside of a transaction, or even a
select for update if it fits your needs.  Transactions are your friend. 
:)  I'm assuming you need to lock it because you need to update the row
at some point in time.  If you don't want to update it, then you really
need to hide it, not lock it.

 
 I can set the User Level Lock on a record using the supplied routines in the
 contrib directory when I do a SELECT, and can reset the lock by doing an
 UPDATE or SELECT as well.
 But without the ability to test for an existing lock (without ever setting
 it) I cannot skip the locked records.
 
 I would set up all the SELECTs in thunking layer (I cannot rewrite the
 application, only replace the ISAM library with a thunking library that
 converts the ISAM calls to PostgreSQL calls) to look like the following:
 
 SELECT col1, col2, col3
 FROM table
 WHERE
 col1 = 'whatever'
   AND
 col2 = 'whatever'
   AND
 user_lock_test(oid) = 0;
 
 user_lock_test() would return 0 if there is no current lock, and 1 if there
 is.


SELECT col1, col2, col3
FROM table
WHERE
col1 = 'whatever'
AND
col2 = 'whatever'
AND
visible = '1' ;


 
 Does this clear it up a little more or make it more complicated. The big
 problem is the way that the ISAM code acts compared to a REAL RDBMS. If this
 application was coded with a RDBMS in mind, things would be much easier.
 

I understand that...and that can be hard...but sometimes semantics and
idioms have to be adjusted to allow for an ISAM to RDBMS migration.


Greg




signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] Again, sorry, caching.

2002-03-16 Thread Greg Copeland

I previously replied to you vaguely describing a way you could implement
this by using a combination of client side caching and database tables
and triggers to allow you to determine if your cache is still valid. 
Someone came right behind me, Tom maybe??, and indicated that the
proper/ideal way to do this would be to use postgres' asynchronous
database notification mechanisms (listen/notify, I believe, were the
semantics) to alert your application that your cache has become
invalid.  Basically, a couple of triggers and the use of the
listen/notify model, and you should be all set.
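
As a rough sketch of the trigger half of that scheme (the table and
notification names are hypothetical, and the function syntax below follows
later PostgreSQL releases than the one under discussion):

-- Announce that anything cached from the products table is now stale.
CREATE FUNCTION products_changed() RETURNS trigger AS $$
BEGIN
    NOTIFY products_cache;
    RETURN NULL;    -- return value is ignored for AFTER triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER products_cache_invalidate
    AFTER INSERT OR UPDATE OR DELETE ON products
    FOR EACH STATEMENT EXECUTE PROCEDURE products_changed();

-- The client application issues LISTEN products_cache; once and simply
-- throws away its cached result sets whenever that notification arrives.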

Done properly, a client side cache which is asynchronously notified by
the database when its contents become invalid should be faster than
relying on MySQL's database caching scheme.  Basically, a strong client
side cache is going to prevent your database from even having to return
a cached result set, while a database side cache is going to always
return a result set.  Of course, one of the extra cool things you can do
is to cache a gzip'd copy of the data contents, which would further act
as an optimization preventing the client or web server (in case they are
different) from having to recompress every result set.

In the long run, again, if properly done, you should be able to beat
MySQL's implementation without too much extra effort.  Why?  Because a
client side cache can be much smarter in the way that it uses its
cached contents, much in the same way an application is able to better
cache its data than the file system is able to do.  This is why a
client side cache should be preferred over a database result set
cache.
Greg

References:
http://www.postgresql.org/idocs/index.php?sql-notify.html
http://www.postgresql.org/idocs/index.php?sql-listen.html


On Sat, 2002-03-16 at 08:01, mlw wrote:
 I traded a couple emails with a guy using one of my open source projects. To
 make a long story short, he is going to the new version of MySQL for his
 website because of the new caching feature. He is convinced that it will speed
 up his web site, and he is probably right.
 
 On a web site, a few specific queries get executed, unchanged, repeatedly.
 Think about an ecommerce site, most of the time it is just a handful of basic
 queries. These basic queries are usually against pretty large product tables. A
 caching mechanism would make these queries pretty light weight.
 
 The arguments against caching:
 
 It is an application issue
 This is completely wrong. Caching can not be done against a database without
 knowledge of the database, i.e. when the data changes.
 
 If it is mostly static data, why not just make it a static page?
 Because a static page is a maintenance nightmare. One uses a database in a web
 site to allow content to be changed and upgraded dynamically and with a minimum
 of work.
 
 It isn't very useful
 I disagree completely. A cache of most frequently used queries, or specific
 ones, could make for REALLY good performance in some very specific, but very
 common, applications. Any system that has a hierarchical drill down interface
 to a data set, ecommerce, libraries, document management systems, etc. will
 greatly benefit from a query cache.
 
 I was thinking that it could be implemented as a keyword or comment in a query.
 Such as:
 
 select * from table where column = 'foo' cacheable
 or
 select * from table where column = 'bar' /* cacheable */
 
 Either way, it would speed up a lot of common application types. It would even
 be very cool if you could just cache the results of sub queries, such as:
 
 select * from (select * from table where col1 = 'foo' cacheable) as subset
 where subset.col2 = 'bar' ;
 
 Which would mean that the subquery gets cached, but the greater select need not
 be. The cache could be like a global temp table. Perhaps the user could even
 name the cache entry:
 
 select * from table where column = 'foo' cache on foo
 
 Where one could also do:
 
 select * from cache_foo
 
 Using a keyword is probably a better idea, it can be picked up by the parser
 and instruct PostgreSQL to use the cache, otherwise there will be no additional
 overhead.
 
 Having caching within PostgreSQL will be good for data integrity. Application
 caches can't tell when an update/delete/insert happens, they often have to use
 a time-out mechanism.
 
 OK, let me have it, tell me how terrible an idea this is. tell me how wrong I
 am.
 
 ---(end of broadcast)---
 TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly




signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] Again, sorry, caching.

2002-03-16 Thread Greg Copeland

On Sat, 2002-03-16 at 08:01, mlw wrote:
[snip]

 If it is mostly static data, why not just make it a static page?
 Because a static page is a maintenance nightmare. One uses a database in a web
 site to allow content to be changed and upgraded dynamically and with a minimum
 of work.
 


Oh ya, I forgot to reply to that part.  I think you are forgetting
that you can use a database to generate a static page.  That is, only
regenerate the static page when the data within the database changes. 
Again, this is another example of efficient application caching.  If you
have an application which listens for your cache invalidation event, you
can then recreate your static page.  Again, database result set caching
is not required.  And again, this should be significantly faster than
MySQL's result set caching.  It is also worth noting that you could then
gzip your static page (keeping both static pages -- compressed and
uncompressed), resulting in yet another optimization for most web servers
and browsers.

Greg




signature.asc
Description: This is a digitally signed message part

