Re: [HACKERS] Re: Hand written parsers

2001-04-15 Thread Bruce Guenter

On Fri, Apr 13, 2001 at 03:12:57AM -0400, Tom Lane wrote:
> I have some interest in proposals to switch to a better parser-generator
> tool than yacc ...

There are tools to produce efficient top-down parsers as well.  Those
doing such parsers "by hand" may be interested in looking at PCCTS
(http://www.polhode.com/pccts.html) or ANTLR (http://www.antlr.org/).
-- 
Bruce Guenter <[EMAIL PROTECTED]>   http://em.ca/~bruceg/

 PGP signature





Re: [HACKERS] Sure enough, the lock file is gone

2001-01-26 Thread Bruce Guenter

On Fri, Jan 26, 2001 at 05:06:24PM -0500, Jan Wieck wrote:
> So  the  crazy-temp-vacuum-cleaner on linux doesn't touch the
> sockets?

The tmpwatch program that comes with many Linux distributions will only
unlink regular files and empty directories by default.
-- 
Bruce Guenter <[EMAIL PROTECTED]>   http://em.ca/~bruceg/

 PGP signature


Re: [HACKERS] Re: CRC

2000-12-11 Thread Bruce Guenter

On Mon, Dec 11, 2000 at 10:09:01AM -0800, Mikheev, Vadim wrote:
> > One thing we should look at before going with a 64-bit method is the
> > extra storage space for the larger checksum.  We can clearly afford
> > an extra 32 bits for a checksum on an 8K disk page, but if Vadim is
> > envisioning checksumming each individual XLOG record then the extra
> > space is more annoying.
> We need in checksum for each record. But there is no problem with
> 64bit CRC: log record header is 8byte aligned, so CRC addition
> will add 8bytes to header anyway. Is there any CRC64 code?

All you need is a good 64-bit polynomial.  Unfortunately, I've been
unable to find one that has been analyzed to any significant extent.
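
The mechanics are no different from CRC-32, just with a 64-bit table.
Here's a minimal, untested sketch of a table-driven CRC-64; the
polynomial below is the ECMA-182 one (used for DLT tapes), picked purely
as an example -- I can't vouch for how well it's been analyzed either:

typedef unsigned long long uint64;

static uint64 crc64_table[256];

static void crc64_init(void)
{
  int i, k;
  for (i = 0; i < 256; i++) {
    uint64 crc = (uint64)i << 56;
    for (k = 0; k < 8; k++)       /* shift out one bit at a time */
      crc = (crc << 1) ^ ((crc >> 63) ? 0x42F0E1EBA9EA3693ULL : 0);
    crc64_table[i] = crc;
  }
}

static uint64 crc64_update(uint64 crc, const unsigned char *p, unsigned len)
{
  while (len--)
    crc = (crc << 8) ^ crc64_table[(unsigned char)(crc >> 56) ^ *p++];
  return crc;
}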
-- 
Bruce Guenter <[EMAIL PROTECTED]>   http://em.ca/~bruceg/

 PGP signature


Re: [HACKERS] Re: CRC

2000-12-11 Thread Bruce Guenter

On Sun, Dec 10, 2000 at 02:36:38PM -0500, Tom Lane wrote:
> > MD4 would be a better choice than MD5, despite that a theoretical attack
> > on MD4 has been described (albeit never executed).  We don't even care 
> > about real attacks, never mind theoretical ones.  What matters is that 
> > MD4 is entirely good enough, and faster to compute than MD5.
> 
> > I find these results very encouraging.  BSD-licensed MD4 code is readily
> > available, e.g. from any of the BSDs themselves.
> 
> MD4 would be worth looking at, especially if it has less
> startup/shutdown overhead than MD5.  I think a 64-bit true CRC might
> also be worth looking at, just for completeness.  But I don't know
> where to find code for one.

The startup/shutdown for MD4 is identical to that of MD5; however, the
inner loop is much smaller (a total of 48 operations instead of 64, with
fewer constants).  The inner MD4 loop runs at about 1.5 times the speed
of MD5's.
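
For the curious, the per-step auxiliary functions from RFC 1320 and RFC
1321 show where the difference comes from (these definitions are from
the RFCs, not from the benchmarked code):

/* MD4: three rounds of 16 steps, using F, G and H */
#define F(x,y,z) (((x) & (y)) | (~(x) & (z)))
#define G(x,y,z) (((x) & (y)) | ((x) & (z)) | ((y) & (z)))
#define H(x,y,z) ((x) ^ (y) ^ (z))

/* MD5 keeps F and H, swaps in a cheaper G, and adds a fourth round I,
   plus a distinct additive constant on every one of its 64 steps: */
#define G5(x,y,z) (((x) & (z)) | ((y) & ~(z)))
#define I5(x,y,z) ((y) ^ ((x) | ~(z)))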
-- 
Bruce Guenter <[EMAIL PROTECTED]>   http://em.ca/~bruceg/

 PGP signature


Re: [HACKERS] Re: CRC

2000-12-10 Thread Bruce Guenter

On Sun, Dec 10, 2000 at 02:53:43PM -0500, Tom Lane wrote:
> > On my Celeron, the timing for those six opcodes is almost a whopping 13
> > cycles per byte.  Obviously there's some major performance hit to do the
> > memory instructions, because there's no more than 4 cycles' worth of
> > dependent instructions in that snippet.
> Yes.  It looks like we're looking at pipeline stalls for the memory
> reads.

In particular, for the single-byte memory read.  By loading in 32-bit
words at a time, the cost drops to about 7 cycles per byte.  I
imagine on a 64-bit CPU, loading 64-bit words at a time would drop the
cost even further.
  word1 = *(unsigned long*)z;             /* prime the first word */
  while (c > 4)
    {
      z += 4;
      c -= 4;
      ick = IUPDC32 (word1, ick); word1 >>= 8;
      ick = IUPDC32 (word1, ick); word1 >>= 8;
      ick = IUPDC32 (word1, ick); word1 >>= 8;
      ick = IUPDC32 (word1, ick);
      word1 = *(unsigned long*)z;         /* load the next word */
    }
I tried loading two words at a time, starting to load the second word
well before it's used, but that didn't actually reduce the time taken.

> As Nathan remarks nearby, this is just minutiae, but I'm interested
> anyway...

Yup.
-- 
Bruce Guenter <[EMAIL PROTECTED]>   http://em.ca/~bruceg/

 PGP signature


Re: [HACKERS] Re: CRC

2000-12-10 Thread Bruce Guenter

On Sun, Dec 10, 2000 at 12:24:59PM -0800, Alfred Perlstein wrote:
> I would try unrolling the loop some (if possible) and retesting.

The inner loop was already unrolled, but was only processing single
bytes at a time.  Loading in 32-bit words at once reduced the cost to
only 7 cycles per byte (from 13).
-- 
Bruce Guenter <[EMAIL PROTECTED]>   http://em.ca/~bruceg/

 PGP signature


Re: [HACKERS] Re: CRC

2000-12-09 Thread Bruce Guenter

On Sat, Dec 09, 2000 at 06:46:23PM -0500, Tom Lane wrote:
> Bruce Guenter <[EMAIL PROTECTED]> writes:
> > The difference is likely because PA-RISC (like most other RISC
> > architectures) lacks a "roll" opcode that is very prevalent in the MD5
> > algorithm.
> 
> A good theory, but unfortunately not a correct theory.  PA-RISC can do a
> circular shift in one cycle using the "shift right double" instruction,
> with the same register specified as both high and low halves of the
> 64-bit operand.  And gcc does know about that.

Interesting.  I was under the impression that virtually no RISC CPU had
a rotate instruction.  Do any others?
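
(As an aside: in C the rotate is normally written with the shift-or
idiom below, which gcc recognizes and compiles to a single rotate
instruction on CPUs that have one -- a sketch, valid for 0 < n < 32:)

unsigned rol32(unsigned x, int n)
{
  return (x << n) | (x >> (32 - n));   /* becomes "roll" on x86 */
}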

> After some groveling through assembly code, it seems that the CRC-32
> implementation is about as tight as it could get: two loads, two XORs,
> and two EXTRU's per byte (one used to implement the right shift, the
> other to implement masking with 0xFF).

Same with the x86 core:
movb %dl,%al              # low byte of the CRC register
xorb (%ecx),%al           # XOR in the next input byte
andl $255,%eax            # mask to form the table index
shrl $8,%edx              # shift the CRC right by 8 bits
incl %ecx                 # advance the input pointer
xorl (%esi,%eax,4),%edx   # XOR in the table entry

> And the wall clock timing does
> indeed come out to just about six cycles per byte.

On my Celeron, the timing for those six opcodes is almost a whopping 13
cycles per byte.  Obviously there's some major performance hit to do the
memory instructions, because there's no more than 4 cycles' worth of
dependent instructions in that snippet.

BTW, for reference, P3 timings are almost identical to those of the
Celeron, so the problem isn't coming from anything outside the built-in
caches common to the two chips.

> The MD5 code also
> looks pretty tight.  Each basic OP requires either two or three logical
> operations (and/or/xor/not) depending on which round you're looking at,
> plus four additions and a circular shift.  PA-RISC needs two cycles to
> load an arbitrary 32-bit constant, but other than that I see no wasted
> cycles here:
> 
>   ldil L'-1444681467,%r20
>   xor %r3,%r14,%r19
>   ldo R'-1444681467(%r20),%r20
>   and %r1,%r19,%r19
>   addl %r15,%r20,%r20
>   xor %r14,%r19,%r19
>   addl %r19,%r26,%r19
>   addl %r20,%r19,%r15
>   shd %r15,%r15,27,%r15
>   addl %r15,%r3,%r15

Here's the x86 assembly code for what appears to be the same basic OP:
movl %edx,%eax                    # compute F(x,y,z) as
xorl %esi,%eax                    #   z ^ (x & (y ^ z)),
andl %edi,%eax                    #   which equals
xorl %esi,%eax                    #   (x & y) | (~x & z)
movl -84(%ebp),%ecx               # fetch the message word
leal -1444681467(%ecx,%eax),%eax  # constant + word + F in one lea
addl %eax,%ebx
roll $5,%ebx                      # the one-cycle rotate
addl %edx,%ebx
This is a couple of instructions shorter, mainly because no loads are
needed to fetch the constant value.  It takes almost exactly 9 cycles
per byte.

> There are 64 of these basic OPs needed in each round, and each round
> processes 64 input bytes, so basically you can figure one OP per byte.
> Ignoring loop overhead and so forth, it's nine or ten cycles per byte
> for MD5 versus six for CRC.

On Celeron/P3, CRC scores almost 13 cycles per byte, MD4 is about 6
cycles per byte, and MD5 is about 9 cycles per byte.  On Pentium MMX,
CRC is 7.25, MD4 is 7.5 and MD5 is 10.25.  So, the newer CPUs actually
do worse on CRC than the older ones do.  Weirder and weirder.

> I'm at a loss to see how a Pentium would arrive at a better result for
> MD5 than for CRC.  For one thing, it's going to be at a disadvantage
> because it hasn't got enough registers.

I agree.  It would appear that the table lookup is causing a major
bubble in the pipelines on the newer Celeron/P2/P3 CPUs.
-- 
Bruce Guenter <[EMAIL PROTECTED]>   http://em.ca/~bruceg/

 PGP signature


Re: [HACKERS] CRC, hash & Co.

2000-12-09 Thread Bruce Guenter

On Sat, Dec 09, 2000 at 01:58:29PM +1100, Horst Herb wrote:
> There have been some misconceptions in previous mails.
> 
> 1.) A CRC is _not_ stronger than a hash. CRC is a subset of the hash domain,
> defined as "a fast error-check hash based on mod 2 polynomial operations"
> which has typically no crypto strength (and does not need it either for most
> purposes).

Not true, unless your definition of strength is different from mine.
The point under question is whether different data can produce the same
hash as correct data.  A CRC will always be different if the difference
is a burst error of N-1 bits or less (N being the size of the CRC), and
for all other errors there is only a 1 in 2^N chance of a collision.
Cryptographic hashes can only claim the 2^N factor, with no guarantees.

> 2.) Theoretically, an optimal MD5 implementation can't be faster than an
> optimal CRC-32 implementation. Check it yourself: download openssl
> (www.openssl.org) or Peter Gutmans cryptlib where all sorts of hashes and
> CRC-32 are implemented in a very reasonable way. Write a tiny routine
> generating random strings, popping them through the hash function. You will
> see, CRC-32 is typically several times faster.

You check it yourself.  I'll send you a copy of my benchmarking code
under separate cover.  All the core hashes except for CRC were taken
from OpenSSL.  As per my last message, CRC on Celeron/P2/P3 sucks, and
in the worst case would only be 1.5 times faster than MD5.  MD4 would be
near par with CRC.

> 3.) There are many domains where you need to protect yout database not only
> against random accidental glitches, but also against malicious attacks. In
> these cases, CRC-32 (and other CRCs without any cryptographic strength) are
> no help.

If you have malicious attackers who can deliberately modify live data in
a database, you have problems beyond what any kind of hash can protect
against.

> 4.) Without CRC/hash facility, we will have no means of checking our data
> integrity at all. At least in my domain (medical) most developers are
> craving for database backends where we don't have to re-implement the
> integrity checking stuff again and again. If postgres could provide this, it
> would be a strong argument in favour of postgres.

I agree that it would be useful to CRC data blocks to protect against
bad data errors.  If your data is really that sensitive, though, you
may be looking for ECC, not CRC or hash facilities.

> 5.) As opposed to a previous posting (Bruce ?), MD5 has been shown to be
> "crackable" (deliberate collision feasible with available technology)

No, it hasn't, unless you can provide us a reference to a paper showing
that.  I've seen references that there are internal collisions in the
MD5 reduction function, but still no way to produce collisions on the
final digest.
-- 
Bruce Guenter <[EMAIL PROTECTED]>   http://em.ca/~bruceg/

 PGP signature


Re: [HACKERS] Re: CRC

2000-12-09 Thread Bruce Guenter

On Fri, Dec 08, 2000 at 10:17:00PM -0500, Tom Lane wrote:
> A couple further observations while playing with this benchmark ---
> 
> 1. This MD5 implementation is not too robust.  On my machine it dumps
> core if given a non-word-aligned data buffer.  We could probably work
> around that, but it bespeaks a little *too* much hand optimization...

The operations in the MD5 core are based on word-sized chunks.
Obviously, the implementation only does word-sized loads and stores for
that data, so you got a bus error.
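
In other words, the failure mode is essentially this (a sketch; the
function name is mine):

unsigned long load_word(const char *p)
{
  /* Fine on x86; faults (SIGBUS) on strict-alignment CPUs such as
     PA-RISC or SPARC when p is not word-aligned. */
  return *(const unsigned long *)p;
}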

> 2. It's a bad idea to ignore the startup/termination costs of the
> algorithms.

Yes.  I had included the startup costs in my benchmark, but not the
termination costs, which are large for MD5 as you point out.

> Of course startup/termination is trivial for CRC, but
> it's not so trivial for MD5.  I changed the code so that the md5
> update() routine also calls md5_finish_ctx(), so that each inner
> loop represents a complete MD5 calculation for a message of the
> size of the main routine's fread buffer.  I then experimented with
> different buffer sizes.  At a buffer size of 1K:

On my Celeron, at 1K blocks MD5 is still significantly faster than CRC,
but is slightly slower at 100-byte blocks.  For comparison, I added
RIPEMD-160, but it's far slower than any of them (taking twice as long
as CRC).
-- 
Bruce Guenter <[EMAIL PROTECTED]>   http://em.ca/~bruceg/

 PGP signature


Re: [HACKERS] Re: CRC

2000-12-08 Thread Bruce Guenter

On Fri, Dec 08, 2000 at 09:28:38PM -0500, Tom Lane wrote:
> Bruce Guenter <[EMAIL PROTECTED]> writes:
> >> I agree, don't send it to the whole list.  But I'd like a copy.
> > Here you go.
> As near as I could tell, the test as you have it (one CRC computation per
> fread) is purely I/O bound.

Nope.  Both programs got 99-100% CPU time with the original version.

> I changed the main loop to this:
> [...hash each block repeatedly...]

Good idea.  Might have been even better to just read the block once and
hash it even more times.

>   On an
> otherwise idle HP 9000 C180 machine, I get the following numbers on a
> 1MB input file:
> 
> time benchcrc
> real   35.3
> user   35.0
> sys     0.0
> 
> time benchmd5
> real   37.6
> user   37.3
> sys     0.0
> 
> This is a lot closer than I'd have expected, but it sure ain't
> "MD5 40% faster" as you reported.  I wonder why the difference
> in results between your platform and mine?

The difference is likely because PA-RISC (like most other RISC
architectures) lacks a "roll" opcode that is very prevalent in the MD5
algorithm.  Intel CPUs have it.  With a new version modified to repeat
the inner loop 100,000 times, I got the following:

time benchcrc
-- 
Bruce Guenter <[EMAIL PROTECTED]>   http://em.ca/~bruceg/

 PGP signature


Re: [HACKERS] Re: CRC

2000-12-08 Thread Bruce Guenter

On Fri, Dec 08, 2000 at 04:30:58PM -0500, Tom Lane wrote:
> Bruce Guenter <[EMAIL PROTECTED]> writes:
> >> Are you really saying MD5 was faster than CRC-32?
> > Yes.  I expect it's because the operations used in MD5 are easily
> > parallelized, and operate on blocks of 64-bytes at a time, while the CRC
> > is mostly non-parallelizable, uses a table lookup, and operates on
> > single bytes.
> What MD5 implementation did you use?

I used the GPL'ed implementation written by Ulrich Drepper in 1995.  The
code from OpenSSL looks identical in terms of the operations performed.

> The one I have handy (the original
> RSA reference version) sure looks like it's more computation per byte
> than a CRC.

The algorithm itself does use more computation per byte.  However, the
algorithm works on blocks of 64 bytes at a time.  As well, the
operations should be easily pipelined.  On the other hand, the CRC code
is largely serial, and highly dependent on a table lookup operation.
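
To illustrate the serial dependency, here's the usual byte-at-a-time
inner loop (a sketch; crc_table is the usual precomputed 256-entry
table):

unsigned long crc32_update(unsigned long crc, const unsigned char *p,
                           unsigned len)
{
  while (len--)   /* every iteration needs the previous crc value
                     before it can even start its table lookup */
    crc = (crc >> 8) ^ crc_table[(crc ^ *p++) & 0xff];
  return crc;
}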
-- 
Bruce Guenter <[EMAIL PROTECTED]>   http://em.ca/~bruceg/

 PGP signature


Re: CRC was: Re: [HACKERS] beta testing version

2000-12-08 Thread Bruce Guenter

On Fri, Dec 08, 2000 at 03:38:09PM -0500, Tom Lane wrote:
> Bruce Guenter <[EMAIL PROTECTED]> writes:
> > MD5 is a cryptographic hash, which means (AFAIK) that ideally it is
> > impossible to produce a collision using any other method than brute
> > force attempts.
> True but irrelevant.  What we need to worry about is the probability
> that a random error will be detected,

Which I indicated immediately after the sentence you quoted.  The
probability that a random error will be detected is the same as the
probability of a collision in the hash given two different inputs.  The
brute force note means that the probability of a collision is as good as
random.

> MD5 is designed for a purpose that really doesn't have much to do with
> error detection, when you think about it.  It says "you will have a hard
> time computing a different string that produces the same hash as some
> prespecified string".  This is not the same as promising
> better-than-random odds against a damaged copy of some string having the
> same hash as the original.

It does provide as-good-as-random odds against a damaged copy of some
string having the same hash as the original -- nobody has been able to
exhibit any collisions in MD5 (see http://cr.yp.to/papers/hash127.ps,
page 18 for notes on this).

> CRC, on the other hand, is specifically
> designed for error detection, and for localized errors (such as a
> corrupted byte or two) it does a provably better-than-random job.
> For nonlocalized errors you don't get a guarantee, but you do get
> same-as-random odds of detection (ie, 1 in 2^N for an N-bit CRC).

For the log, the CRC's primary function (as far as I understand it)
would be to guard against an inconsistent transaction being treated as
consistent data.  Such inconsistent transactions would be partially
written, resulting in errors much larger than a small burst.

For guarding the actual record data, I agree with you 100% -- what we're
likely to see is a few localized bytes with flipped bits due to hardware
failure of one kind or another.  However, if the data is really
critical, an ECC may be more appropriate, but that would make the data
significantly larger (a 9/8 size increase for the algorithms I've seen).

> I really doubt that MD5 can beat a CRC with the same number of output
> bits for the purpose of error detection;

Agreed.  However, MD5 provides four times as many bits as the standard
32-bit CRC.

(I think I initially suggested you could take an arbitrary 32 bits out
of MD5 to provide a check code "as good as CRC-32".  I now take that
back.  Due to the burst error nature of CRCs, nothing else could be as
good as it, unless the alternate algorithm also made some guarantees,
which MD5 definitely doesn't.)

> (Wild-pointer
> stomps on disk buffers are an example of the sort of thing that may
> look like a burst error.)

Actually, wild-pointer incidents involving disk buffers at the kernel
level, from my experience, are characterized by content from one file
appearing in another, which is distinctly different from a burst error,
and more like what would be seen if a log record were partially written.
-- 
Bruce Guenter <[EMAIL PROTECTED]>   http://em.ca/~bruceg/

 PGP signature


Re: [HACKERS] Re: CRC

2000-12-08 Thread Bruce Guenter

On Fri, Dec 08, 2000 at 04:21:21PM -0500, Tom Lane wrote:
> [EMAIL PROTECTED] (Nathan Myers) writes:
> > Thinking about it, I suspect that any CRC implementation that can't outrun 
> > MD5 by a wide margin is seriously sub-optimal.
> I was finding that hard to believe, too, at least for CRC-32 (CRC-64
> would take more code, so I'm not so sure about it).

Would you like to see the simple benchmarking setup I used?  The amount
of code involved (once all the hashes are factored in) is fairly large,
so I'm somewhat hesitant to just send it to the mailing list.

> Is that 64-bit code you pointed us to before actually a CRC, or
> something else?  It doesn't call itself a CRC, and I was having a hard
> time extracting anything definite (like the polynomial) from all the
> bit-pushing underbrush :-(

It isn't a CRC.  It's a fingerprint.  As you've mentioned, it doesn't
have the guarantees against burst errors that a CRC would have, but it
does offer as-good-as-random collision avoidance against any random data
corruption.  At least, that's what the author claims.  My math isn't
nearly good enough to verify such claims.
-- 
Bruce Guenter <[EMAIL PROTECTED]>   http://em.ca/~bruceg/

 PGP signature


Re: [HACKERS] Re: CRC

2000-12-08 Thread Bruce Guenter

On Fri, Dec 08, 2000 at 11:10:19AM -0800, Nathan Myers wrote:
> This is very interesting.  MD4 is faster than MD5.  (MD5, described as 
> "MD4 with suspenders on", does some extra stuff to protect against more-
> obscure attacks, of no interest to us.)  Which 64-bit CRC code did you 
> use, Mark Mitchell's?

Yes.

> Are you really saying MD5 was faster than CRC-32?

Yes.  I expect it's because the operations used in MD5 are easily
parallelized, and operate on blocks of 64 bytes at a time, while the CRC
is mostly non-parallelizable, uses a table lookup, and operates on
single bytes.
-- 
Bruce Guenter <[EMAIL PROTECTED]>   http://em.ca/~bruceg/

 PGP signature


Re: CRC was: Re: [HACKERS] beta testing version

2000-12-08 Thread Bruce Guenter

On Fri, Dec 08, 2000 at 10:36:39AM -0800, Ian Lance Taylor wrote:
>Incidentally, I benchmarked the previously mentioned 64-bit fingerprint,
>the standard 32-bit CRC, MD5 and SHA, and the fastest algorithm on my
>Celeron and on a PIII was MD5.  The 64-bit fingerprint was only a hair
>slower, the CRC was (quite surprisingly) about 40% slower, and the
>implementation of SHA that I had available was a real dog.  Taking an
>arbitrary 32 bits of a MD5 would likely be less collision prone than
>using a 32-bit CRC, and it appears faster as well.
> 
> I just want to confirm that you used something like the fast 32-bit
> CRC algorithm, appended.  The one posted earlier was accurate but
> slow.

Yes.  I just rebuilt the framework using this exact code, and it
performed identically to the previous CRC code (which didn't have an
unrolled inner loop).  These were compiled with -O6 with egcs 1.1.2.
-- 
Bruce Guenter <[EMAIL PROTECTED]>   http://em.ca/~bruceg/

 PGP signature


Re: CRC was: Re: [HACKERS] beta testing version

2000-12-08 Thread Bruce Guenter

On Fri, Dec 08, 2000 at 01:58:12PM -0500, Tom Lane wrote:
> Bruce Guenter <[EMAIL PROTECTED]> writes:
> > ... Taking an
> > arbitrary 32 bits of a MD5 would likely be less collision prone than
> > using a 32-bit CRC, and it appears faster as well.
> 
> ... but that would be an algorithm that you know NOTHING about the
> properties of.  What is your basis for asserting it's better than CRC?

MD5 is a cryptographic hash, which means (AFAIK) that ideally it is
impossible to produce a collision using any other method than brute
force attempts.  In other words, any stream of input to the hash that is
longer than the hash length (16 bytes for MD5) is equally likely to
produce any given hash code.

> CRC is pretty well studied and its error-detection behavior is known
> (and good).  MD5 has been studied less thoroughly AFAIK, and in any
> case what's known about its behavior is that the *entire* MD5 output
> provides a good signature for a datastream.  If you pick some ad-hoc
> method like taking a randomly chosen subset of MD5's output bits,
> you really don't know anything at all about what the error-detection
> properties of the method are.

Actually, in my reading regarding the properties of MD5, I read an
article that stated that if a smaller number of bits was desired, one
could either (and here's where my memory fails me) just select the
middle N bits from the resulting hash, or fold the hash using XOR until
the desired number of bits was reached.  I'll see if I can find a
reference...

RFC2289 (http://www.ietf.org/rfc/rfc2289.txt) includes an algorithm for
folding MD5 digests down to 64 bits by XORing the top half with the
bottom half.  See appendix A.
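
In code, the folding amounts to this (a sketch; the function name is
mine, and the input is assumed to be the raw 16-byte MD5 output):

void md5_fold64(const unsigned char digest[16], unsigned char out[8])
{
  int i;
  for (i = 0; i < 8; i++)
    out[i] = digest[i] ^ digest[i + 8];  /* top half XOR bottom half */
}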
-- 
Bruce Guenter <[EMAIL PROTECTED]>   http://em.ca/~bruceg/

 PGP signature


Re: CRC was: Re: [HACKERS] beta testing version

2000-12-08 Thread Bruce Guenter

On Thu, Dec 07, 2000 at 04:01:23PM -0800, Nathan Myers wrote:
> 1. Computing a CRC-64 takes only about twice as long as a CRC-32, for
>2^32 times the confidence.  That's pretty cheap confidence.

Incidentally, I benchmarked the previously mentioned 64-bit fingerprint,
the standard 32-bit CRC, MD5 and SHA, and the fastest algorithm on my
Celeron and on a PIII was MD5.  The 64-bit fingerprint was only a hair
slower, the CRC was (quite surprisingly) about 40% slower, and the
implementation of SHA that I had available was a real dog.  Taking an
arbitrary 32 bits of a MD5 would likely be less collision prone than
using a 32-bit CRC, and it appears faster as well.
-- 
Bruce Guenter <[EMAIL PROTECTED]>   http://em.ca/~bruceg/

 PGP signature


Re: [HACKERS] CRCs (was: beta testing version)

2000-12-07 Thread Bruce Guenter

On Thu, Dec 07, 2000 at 12:22:12PM -0800, Mikheev, Vadim wrote:
> > > That's why an end marker must follow all valid records.  
> > That requires an extra out-of-sequence write. 
> Yes, and also increase probability to corrupt already committed
> to log data.

Are you referring to the case where the drive loses power in mid-write?
That is solved by either arranging for the markers to always be placed
at the start of a block, or by plugging in a UPS.
-- 
Bruce Guenter <[EMAIL PROTECTED]>   http://em.ca/~bruceg/

 PGP signature


Re: [HACKERS] CRCs (was: beta testing version)

2000-12-07 Thread Bruce Guenter

On Thu, Dec 07, 2000 at 12:25:41PM -0800, Nathan Myers wrote:
> That requires an extra out-of-sequence write. 

Ayup!

> Generally, there are no guarantees, only reasonable expectations.

I would differ, but that's irrelevant.

> A 64-bit CRC would give sufficient confidence...

This is part of what I was getting at, in a roundabout way.  If you use
a CRC, hash, or any other kind of non-trivial check code, you have a
certain level of confidence in the data, but not a guarantee.  If you
decide, based on your expert opinions, that a 32 or 64 bit CRC or hash
gives you an adequate level of confidence in the event of a crash, then
I'll be satisfied, but don't call it a guarantee.

Them's small nits we're picking at, though.
-- 
Bruce Guenter <[EMAIL PROTECTED]>   http://em.ca/~bruceg/

 PGP signature


Re: [HACKERS] CRCs (was: beta testing version)

2000-12-06 Thread Bruce Guenter

On Wed, Dec 06, 2000 at 11:08:00AM -0800, Nathan Myers wrote:
> On Wed, Dec 06, 2000 at 11:49:10AM -0600, Bruce Guenter wrote:
> > On Wed, Dec 06, 2000 at 11:15:26AM -0500, Tom Lane wrote:
> > > How exactly *do* we determine where the end of the valid log data is,
> > > anyway?
> > 
> > I don't know how pgsql does it, but the only safe way I know of is to
> > include an "end" marker after each record.  When writing to the log,
> > append the records after the last end marker, ending with another end
> > marker, and fdatasync the log.  Then overwrite the previous end marker
> > to indicate it's not the end of the log any more and fdatasync again.
> >
> > To ensure that it is written atomically, the end marker must not cross a
> > hardware sector boundary (typically 512 bytes).  This can be trivially
> > guaranteed by making the marker a single byte.
> 
> An "end" marker is not sufficient, unless all writes are done in
> one-sector units with an fsync between, and the drive buffering 
> is turned off.

That's why an end marker must follow all valid records.  When you write
records, you don't touch the marker, and add an end marker to the end of
the records you've written.  After writing and syncing the records, you
rewrite the end marker to indicate that the data following it is valid,
and sync again.  There is no state in that sequence in which partially-
written data could be confused as real data, assuming either your drives
aren't doing write-back caching or you have a UPS, and fsync doesn't
return until the drives return success.

> For larger writes the OS will re-order the writes.  
> Most drives will re-order them too, even if the OS doesn't.

I'm well aware of that.

> > Any other way I've seen discussed (here and elsewhere) either
> > - Assume that a CRC is a guarantee.  
> 
> We are already assuming a CRC is a guarantee.  
>
> The drive computes a CRC for each sector, and if the CRC is OK the 
> drive is happy.  CRC errors within the drive are quite frequent, and 
> the drive re-reads when a bad CRC comes up.

The kind of data failures that a CRC is guaranteed to catch (N-bit
errors) are almost precisely those that a mis-read on a hardware sector
would cause.

> >   ... A CRC would be a good addition to
> >   help ensure the data wasn't broken by flakey drive firmware, but
> >   doesn't guarantee consistency.
> No, a CRC would be a good addition to compensate for sector write
> reordering, which is done both by the OS and by the drive, even for 
> "atomic" writes.

But it doesn't guarantee consistency, even in that case.  There is a
possibility (however small) that the random data that was located in the
sectors before the write will match the CRC.
-- 
Bruce Guenter <[EMAIL PROTECTED]>   http://em.ca/~bruceg/

 PGP signature


Re: AW: [HACKERS] beta testing version

2000-12-06 Thread Bruce Guenter

On Wed, Dec 06, 2000 at 11:13:33PM +, Daniele Orlandi wrote:
> Bruce Guenter wrote:
> > - Assume that a CRC is a guarantee.  A CRC would be a good addition to
> >   help ensure the data wasn't broken by flakey drive firmware, but
> >   doesn't guarantee consistency.
> Even a CRC per transaction (it could be a nice END record) ?

CRCs are designed to catch N-bit errors (i.e., N bits in a row with
their values flipped).  N is (IIRC) the number of bits in the CRC minus
one.  So, a 32-bit CRC can catch all 31-bit errors.  That's the only
guarantee a CRC gives.  Everything else has a 1 in 2^32 chance of
producing the same CRC as the original data.  That's pretty good odds,
but not a guarantee.
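
Here's a small, hedged demo of that guarantee: flip a short burst in a
buffer and the 32-bit CRCs always differ (bit-at-a-time CRC-32 for
brevity; a table-driven version behaves identically):

#include <stdio.h>
#include <string.h>

static unsigned crc32(const unsigned char *p, int n)
{
  unsigned crc = 0xffffffff;
  int i, k;
  for (i = 0; i < n; i++) {
    crc ^= p[i];
    for (k = 0; k < 8; k++)                /* one bit at a time */
      crc = (crc >> 1) ^ (0xedb88320 & -(crc & 1));
  }
  return ~crc;
}

int main(void)
{
  unsigned char a[64], b[64];
  memset(a, 0x55, sizeof a);
  memcpy(b, a, sizeof b);
  b[10] ^= 0x3f;   /* a 6-bit burst: guaranteed to be detected */
  printf("%08x %08x\n", crc32(a, 64), crc32(b, 64));
  return 0;
}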
-- 
Bruce Guenter <[EMAIL PROTECTED]>   http://em.ca/~bruceg/

 PGP signature


Re: AW: [HACKERS] beta testing version

2000-12-06 Thread Bruce Guenter

On Wed, Dec 06, 2000 at 11:15:26AM -0500, Tom Lane wrote:
> Zeugswetter Andreas SB <[EMAIL PROTECTED]> writes:
> > Yes, but there would need to be a way to verify the last page or
> > record from txlog when running on crap hardware.
> How exactly *do* we determine where the end of the valid log data is,
> anyway?

I don't know how pgsql does it, but the only safe way I know of is to
include an "end" marker after each record.  When writing to the log,
append the records after the last end marker, ending with another end
marker, and fdatasync the log.  Then overwrite the previous end marker
to indicate it's not the end of the log any more and fdatasync again.

To ensure that it is written atomically, the end marker must not cross a
hardware sector boundary (typically 512 bytes).  This can be trivially
guaranteed by making the marker a single byte.
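
A hedged sketch of that sequence in terms of POSIX calls (all names and
offsets here are illustrative, not from the PostgreSQL source):

#include <unistd.h>

void log_append(int log_fd, const void *buf, size_t len,
                off_t old_marker_off)
{
  char not_end = 1;                /* any value that isn't the end marker */
  write(log_fd, buf, len);         /* new records + new end marker,
                                      appended after the old marker */
  fdatasync(log_fd);               /* records are on the platter */
  pwrite(log_fd, &not_end, 1, old_marker_off);  /* single-byte overwrite;
                                      cannot span a sector boundary */
  fdatasync(log_fd);               /* marker flip is durable */
}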

Any other way I've seen discussed (here and elsewhere) either
- Requires atomic multi-sector writes, which are possible only if all
  the sectors are sequential on disk, the kernel issues one large write
  for all of them, and you don't powerfail in the middle of the write.
- Assume that a CRC is a guarantee.  A CRC would be a good addition to
  help ensure the data wasn't broken by flakey drive firmware, but
  doesn't guarantee consistency.

-- 
Bruce Guenter <[EMAIL PROTECTED]>   http://em.ca/~bruceg/

 PGP signature


Re: [HACKERS] Using Threads?

2000-12-05 Thread Bruce Guenter

On Tue, Dec 05, 2000 at 02:52:48PM -0500, Tom Lane wrote:
> There aren't going to be all that many data pages needing the COW
> treatment, because the postmaster uses very little data space of its
> own.  I think this would become an issue if we tried to have the
> postmaster pre-cache catalog information for backends, however (see
> my post elsewhere in this thread).

Would that pre-cached data not be placed in a SHM segment?  Such
segments don't do COW, so this would be a non-issue.
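
For reference, a minimal sketch of how such a segment is created (SysV
IPC; the size and permissions are illustrative):

#include <sys/ipc.h>
#include <sys/shm.h>

void *make_shared(int size)
{
  /* Pages in this segment are shared outright between the creator and
     every process forked afterwards -- never marked copy-on-write. */
  int id = shmget(IPC_PRIVATE, size, IPC_CREAT | 0600);
  return (id == -1) ? 0 : shmat(id, 0, 0);
}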
-- 
Bruce Guenter <[EMAIL PROTECTED]>   http://em.ca/~bruceg/

 PGP signature


Re: [HACKERS] Using Threads?

2000-12-05 Thread Bruce Guenter

On Mon, Dec 04, 2000 at 08:43:24PM -0800, Tom Samplonius wrote:
>   Some OSes (Linux is the main one) implement threads as pseudo processes.
> Linux threads are processes with a shared address space and file
> descriptor table.
> 
>   Context switch cost for threads can be lower if you are switching to a
> thread in the same process.  That of course assumes that all context
> switches will occur within the same process, or the Linux
> everything-is-a-process model isn't used.

Actually, context switch cost between threads is low on Linux as well,
since the CPU's VM mappings don't get invalidated.  This means that its
page tables won't get reloaded, which is one of the large costs involved
in context switches.  Context switches between processes take (with no
significant VM) about 900 cycles (1.8us) on a 450MHz Celery.  I would
expect thread switch time to be slightly lower than that, and context
switches between processes with large VMs would be much larger just due
to the cost of reloading the page tables.
-- 
Bruce Guenter <[EMAIL PROTECTED]>   http://em.ca/~bruceg/

 PGP signature


Re: [HACKERS] Using Threads?

2000-12-05 Thread Bruce Guenter

On Tue, Dec 05, 2000 at 10:07:37AM +0100, Zeugswetter Andreas SB wrote:
> > And using the following program for timing thread creation 
> > and cleanup:
> > 
> > #include <pthread.h>
> > 
> > void* threadfn(void* arg) { pthread_exit(0); }
> 
> I think you would mainly need to test how the system behaves, if 
> the threads and processes actually do some work in parallel, like:
> 
> threadfn() {int i; for (i=0; i<1000;) {i++}; pthread_exit(0); }

The purpose of the benchmark was to time how long it took to create and
destroy a process or thread, nothing more.  It was not creating
processes in parallel for precisely that reason.  The point in dispute
was whether threads take much less time to create than processes.

> In a good thread implementation 1 parallel processes tend to get way less
> cpu than 1 parallel threads, making threads optimal for the very many
> clients case (like > 3000).

Why do you believe this?  In the "classical" thread implementation, each
process would get the same amount of CPU, no matter how many threads were
running in it.  That would mean that many parallel processes would get
more CPU in total than many threads in one process.
-- 
Bruce Guenter <[EMAIL PROTECTED]>   http://em.ca/~bruceg/

 PGP signature


Re: [HACKERS] Using Threads?

2000-12-04 Thread Bruce Guenter

On Mon, Dec 04, 2000 at 02:30:31PM -0800, Dan Lyke wrote:
> Adam Haberlach writes:
> > Typically (on a well-written OS, at least), the spawning of a thread
> > is much cheaper then the creation of a new process (via fork()).
> This would be well worth testing on some representative sample
> systems.

Using the following program for timing process creation and cleanup:

#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main() {
  int i;
  int pid;
  for (i=0; i<100000; ++i) {   /* 100,000 iterations, per the timings below */
    pid=fork();
    if(pid==-1) exit(1);
    if(!pid) _exit(0);
    waitpid(pid,0,0);
  }
  exit(0);
}

And using the following program for timing thread creation and cleanup:

#include <stdlib.h>
#include <pthread.h>

void* threadfn(void* arg) { pthread_exit(0); }

int main() {
  int i;
  pthread_t thread;
  for (i=0; i<100000; ++i) {
    if (pthread_create(&thread, 0, threadfn, 0)) exit(1);
    if (pthread_join(thread, 0)) exit(1);
  }
  exit(0);
}

On a relatively unloaded 500MHz PIII running Linux 2.2, the fork test
program took a minimum of 16.71 seconds to run (167us per
fork/exit/wait), and the thread test program took a minimum of 12.10
seconds to run (121us per pthread_create/exit/join).  I use the minimums
because those would be the runs where the tasks were least interfered
with by other tasks.  This amounts to a roughly 25% speed improvement
for threads over processes, for the null-process case.

If I add the following lines before the for loop:
  char* m;
  m=malloc(1024*1024);
  memset(m,0,1024*1024);
The cost for doing the fork balloons to 240us, whereas the cost for
doing the thread is constant.  So, the cost of marking the pages as COW
is quite significant (using those numbers, 73us/MB).

So, forking a process with lots of data is expensive.  However, most of
the PostgreSQL data is in a SysV IPC shared memory segment, which
shouldn't affect the fork numbers.
-- 
Bruce Guenter <[EMAIL PROTECTED]>   http://em.ca/~bruceg/

 PGP signature


Re: [HACKERS] Using Threads?

2000-12-04 Thread Bruce Guenter

On Mon, Dec 04, 2000 at 03:17:00PM -0800, Adam Haberlach wrote:
>   Typically (on a well-written OS, at least), the spawning of a thread
> is much cheaper then the creation of a new process (via fork()).

Unless I'm mistaken, the back-end is only forked when starting a new
connection, in which case the latency of doing the initial TCP three-way
handshake and start-up queries is much larger than any process creation
cost.  On Linux 2.2.16 on a 500MHz PIII, I can do the fork/exit/wait
sequence in about 164us.  On the same server, I can make/break a
PostgreSQL connection in about 19,000us (with 0% CPU idle, about 30% CPU
system).  Even if we can manage to get a thread for free, and assume
that the fork from postmaster takes more than 164us, it won't make a big
difference once the other latencies are worked out.

> Also, since everything in a group of threads (I'll call 'em a team)

Actually, you call them a process.  That is the textbook definition.

> shares the
> same address space, there can be some memory overhead savings.

Only slightly.  All of the executable and libraries should already be
shared, as will all non-modified data.  If the data is modified by the
threads, you'll need separate copies for each thread anyway, so the net
difference is small.

I'm not denying there would be a difference.  Compared to separate
processes, threads are more efficient.  Doing a context switch between
threads means there are no PTE invalidations, which makes switches
quicker than between processes.  Creation would be a bit faster due to
just linking the VM into a new thread rather than marking it all as COW.
The memory savings would come from reduced fragmentation of the modified
data (if you have 1 byte modified on each of 100 pages, the thread would
grow by a few K, compared to 400K for processes).  I'm simply arguing
that the differences don't appear to be significant compared to the
other costs involved.
-- 
Bruce Guenter <[EMAIL PROTECTED]>   http://em.ca/~bruceg/

 PGP signature


Re: [HACKERS] Using Threads?

2000-12-04 Thread Bruce Guenter

On Mon, Nov 27, 2000 at 11:42:24PM -0600, Junfeng Zhang wrote:
> I am new to postgreSQL. When I read the documents, I find out the Postmaster
> daemon actual spawns a new backend server process to serve a new client
> request. Why not use threads instead? Is that just for a historical reason,
> or some performance/implementation concern?

Once all the questions regarding "why not" have been answered, it would
be good to also ask "why use threads?"  Do they simplify the code?  Do
they offer significant performance or efficiency gains?  What do they
give, other than being buzzword compliant?
-- 
Bruce Guenter <[EMAIL PROTECTED]>   http://em.ca/~bruceg/

 PGP signature


Re: [HACKERS] 8192 BLCKSZ ?

2000-11-28 Thread Bruce Guenter

On Tue, Nov 28, 2000 at 12:38:37AM -0500, Tom Lane wrote:
> Not sure about the wild-and-wooly world of Linux filesystems...
> anybody know what the allocation unit is on the popular Linux FSes?

It rather depends on the filesystem.  Current ext2 (the most common)
systems default to 1K on small partitions and 4K otherwise.  IIRC,
reiserfs uses 4K blocks in a tree structure that includes tail merging
which makes the question of block size tricky.  Linux 2.3.x passes all
file I/O through its page cache, which deals in 4K pages on most 32-bit
architectures.
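
To check a particular filesystem, here's a quick hedged sketch using
Linux's statfs(2) (the path argument is illustrative):

#include <stdio.h>
#include <sys/vfs.h>

int main(int argc, char **argv)
{
  struct statfs s;
  if (statfs(argc > 1 ? argv[1] : ".", &s) != 0) {
    perror("statfs");
    return 1;
  }
  printf("block size: %ld\n", (long)s.f_bsize);
  return 0;
}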
-- 
Bruce Guenter <[EMAIL PROTECTED]>   http://em.ca/~bruceg/

 PGP signature


Re: [HACKERS] [Fwd: [CORE SDI ADVISORY] MySQL weak authentication]

2000-10-25 Thread Bruce Guenter

On Tue, Oct 24, 2000 at 10:25:14AM -0400, Lamar Owen wrote:
> I am forwarding this not to belittle MySQL, but to hopefully help in the
> development of our own encryption protocol for secure password
> authentication over the network.
> 
> The point being is that if we offer the protocol to do it, we had better
> ensure its security, or someone WILL find the hole.  Hopefully it will
> be people who want to help security and not exploit it.

IMO, anything short of a full SSL wrapped connection is fairly
pointless.  What does it matter if the password is encrypted if
sensitive query data flows in the clear?
-- 
Bruce Guenter <[EMAIL PROTECTED]>   http://em.ca/~bruceg/

 PGP signature