Re: [HACKERS] Re: Hand written parsers
On Fri, Apr 13, 2001 at 03:12:57AM -0400, Tom Lane wrote:
> I have some interest in proposals to switch to a better parser-generator
> tool than yacc ...

There are tools that produce efficient top-down parsers as well. Those writing such parsers by hand may be interested in looking at PCCTS (http://www.polhode.com/pccts.html) or ANTLR (http://www.antlr.org/).

-- Bruce Guenter <[EMAIL PROTECTED]> http://em.ca/~bruceg/
Re: [HACKERS] Sure enough, the lock file is gone
On Fri, Jan 26, 2001 at 05:06:24PM -0500, Jan Wieck wrote:
> So the crazy-temp-vacuum-cleaner on linux doesn't touch the
> sockets?

The tmpwatch program that comes with many Linux distributions will, by default, only unlink regular files and empty directories.
Re: [HACKERS] Re: CRC
On Mon, Dec 11, 2000 at 10:09:01AM -0800, Mikheev, Vadim wrote:
> > One thing we should look at before going with a 64-bit method is the
> > extra storage space for the larger checksum. We can clearly afford
> > an extra 32 bits for a checksum on an 8K disk page, but if Vadim is
> > envisioning checksumming each individual XLOG record then the extra
> > space is more annoying.
> We need in checksum for each record. But there is no problem with
> 64bit CRC: log record header is 8byte aligned, so CRC addition
> will add 8bytes to header anyway. Is there any CRC64 code?

All you need is a good 64-bit polynomial. Unfortunately, I've been unable to find one that has been analyzed in any depth.
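For what it's worth, ECMA-182 does publish a 64-bit CRC polynomial (0x42F0E1EBA9EA3693, used by DLT tape drives), though I can't speak to how well studied it is. A bit-at-a-time sketch, far too slow for the log but enough to experiment with; the function name and the zero initial value are my choices, not from any reference implementation:

```c
/* Bitwise CRC-64 using the polynomial published in ECMA-182
 * (0x42F0E1EBA9EA3693).  A sketch only: bit-at-a-time, MSB first,
 * zero initial value; a real implementation would be table-driven. */
#include <stdint.h>
#include <stddef.h>

#define CRC64_POLY 0x42F0E1EBA9EA3693ULL

uint64_t crc64(const unsigned char *data, size_t len)
{
    uint64_t crc = 0;

    for (size_t i = 0; i < len; i++) {
        crc ^= (uint64_t)data[i] << 56;   /* feed the next byte in at the top */
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & (1ULL << 63)) ? (crc << 1) ^ CRC64_POLY : crc << 1;
    }
    return crc;
}
```

Shifting the data byte into the top of the register and clocking out eight bits per byte is the textbook long-division form of a CRC; the table-driven variants discussed elsewhere in this thread are just this loop with the inner eight steps precomputed.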
Re: [HACKERS] Re: CRC
On Sun, Dec 10, 2000 at 02:36:38PM -0500, Tom Lane wrote:
> > MD4 would be a better choice than MD5, despite that a theoretical attack
> > on MD4 has been described (albeit never executed). We don't even care
> > about real attacks, never mind theoretical ones. What matters is that
> > MD4 is entirely good enough, and faster to compute than MD5.
>
> > I find these results very encouraging. BSD-licensed MD4 code is readily
> > available, e.g. from any of the BSDs themselves.
>
> MD4 would be worth looking at, especially if it has less
> startup/shutdown overhead than MD5. I think a 64-bit true CRC might
> also be worth looking at, just for completeness. But I don't know
> where to find code for one.

The startup/shutdown for MD4 is identical to that of MD5; the inner loop, however, is much smaller (a total of 48 operations instead of 64, with fewer constants). The MD4 inner loop runs at about 1.5 times the speed of MD5's.
Re: [HACKERS] Re: CRC
On Sun, Dec 10, 2000 at 02:53:43PM -0500, Tom Lane wrote:
> > On my Celeron, the timing for those six opcodes is a whopping 13
> > cycles per byte. Obviously there's some major performance hit from the
> > memory instructions, because there's no more than 4 cycles worth of
> > dependent instructions in that snippet.
> Yes. It looks like we're looking at pipeline stalls for the memory
> reads. In particular, for the single-byte memory read.

By loading 32-bit words at a time, the cost drops to about 7 cycles per byte. I imagine that on a 64-bit CPU, loading 64-bit words at a time would drop the cost even further.

    word1 = *(unsigned long *)z;
    while (c > 4) {
        z += 4;
        c -= 4;
        ick = IUPDC32(word1, ick); word1 >>= 8;
        ick = IUPDC32(word1, ick); word1 >>= 8;
        ick = IUPDC32(word1, ick); word1 >>= 8;
        ick = IUPDC32(word1, ick);
        word1 = *(unsigned long *)z;
    }

I tried loading two words at a time, starting the load of the second word well before it's used, but that didn't actually reduce the time taken.

> As Nathan remarks nearby, this is just minutiae, but I'm interested
> anyway...

Yup.
Re: [HACKERS] Re: CRC
On Sun, Dec 10, 2000 at 12:24:59PM -0800, Alfred Perlstein wrote:
> I would try unrolling the loop some (if possible) and retesting.

The inner loop was already unrolled, but it was processing only a single byte at a time. Loading 32-bit words at once reduced the cost to only 7 cycles per byte (from 13).
Re: [HACKERS] Re: CRC
On Sat, Dec 09, 2000 at 06:46:23PM -0500, Tom Lane wrote:
> Bruce Guenter <[EMAIL PROTECTED]> writes:
> > The difference is likely because PA-RISC (like most other RISC
> > architectures) lacks a "roll" opcode that is very prevalent in the MD5
> > algorithm.
>
> A good theory, but unfortunately not a correct theory. PA-RISC can do a
> circular shift in one cycle using the "shift right double" instruction,
> with the same register specified as both high and low halves of the
> 64-bit operand. And gcc does know about that.

Interesting. I was under the impression that virtually no RISC CPU had a rotate instruction. Do any others?

> After some groveling through assembly code, it seems that the CRC-32
> implementation is about as tight as it could get: two loads, two XORs,
> and two EXTRU's per byte (one used to implement the right shift, the
> other to implement masking with 0xFF).

Same with the x86 core:

    movb %dl,%al
    xorb (%ecx),%al
    andl $255,%eax
    shrl $8,%edx
    incl %ecx
    xorl (%esi,%eax,4),%edx

> And the wall clock timing does
> indeed come out to just about six cycles per byte.

On my Celeron, the timing for those six opcodes is a whopping 13 cycles per byte. Obviously there's some major performance hit from the memory instructions, because there are no more than 4 cycles worth of dependent instructions in that snippet. BTW, for reference, P3 timings are almost identical to those of the Celeron, so the problem isn't in anything outside the built-in caches common to the two chips.

> The MD5 code also
> looks pretty tight. Each basic OP requires either two or three logical
> operations (and/or/xor/not) depending on which round you're looking at,
> plus four additions and a circular shift. PA-RISC needs two cycles to
> load an arbitrary 32-bit constant, but other than that I see no wasted
> cycles here:
>
>     ldil L'-1444681467,%r20
>     xor %r3,%r14,%r19
>     ldo R'-1444681467(%r20),%r20
>     and %r1,%r19,%r19
>     addl %r15,%r20,%r20
>     xor %r14,%r19,%r19
>     addl %r19,%r26,%r19
>     addl %r20,%r19,%r15
>     shd %r15,%r15,27,%r15
>     addl %r15,%r3,%r15

Here's the x86 assembly code for what appears to be the same basic OP:

    movl %edx,%eax
    xorl %esi,%eax
    andl %edi,%eax
    xorl %esi,%eax
    movl -84(%ebp),%ecx
    leal -1444681467(%ecx,%eax),%eax
    addl %eax,%ebx
    roll $5,%ebx
    addl %edx,%ebx

This is a couple of instructions fewer, mainly from not needing loads to build the constant value. It takes almost exactly 9 cycles per byte.

> There are 64 of these basic OPs needed in each round, and each round
> processes 64 input bytes, so basically you can figure one OP per byte.
> Ignoring loop overhead and so forth, it's nine or ten cycles per byte
> for MD5 versus six for CRC.

On the Celeron/P3, CRC scores almost 13 cycles per byte, MD4 about 6, and MD5 about 9. On a Pentium MMX, CRC is 7.25, MD4 is 7.5, and MD5 is 10.25. So the newer CPUs actually do worse on CRC than the older ones do. Weirder and weirder.

> I'm at a loss to see how a Pentium would arrive at a better result for
> MD5 than for CRC. For one thing, it's going to be at a disadvantage
> because it hasn't got enough registers.

I agree. It would appear that the table lookup causes a major bubble in the pipelines of the newer Celeron/P2/P3 CPUs.
Re: [HACKERS] CRC, hash & Co.
On Sat, Dec 09, 2000 at 01:58:29PM +1100, Horst Herb wrote:
> There have been some misconceptions in previous mails.
>
> 1.) A CRC is _not_ stronger than a hash. CRC is a subset of the hash domain,
> defined as "a fast error-check hash based on mod 2 polynomial operations"
> which has typically no crypto strength (and does not need it either for most
> purposes).

Not true, unless your definition of strength is different from mine. The point in question is whether different data can produce the same hash as the correct data. A CRC will always differ if the difference is a burst error of N bits or less (N being the width of the CRC), and for all other errors there is only a 1 in 2^N chance of a collision. Cryptographic hashes can claim only the 1 in 2^N factor, with no guarantees.

> 2.) Theoretically, an optimal MD5 implementation can't be faster than an
> optimal CRC-32 implementation. Check it yourself: download openssl
> (www.openssl.org) or Peter Gutmans cryptlib where all sorts of hashes and
> CRC-32 are implemented in a very reasonable way. Write a tiny routine
> generating random strings, popping them through the hash function. You will
> see, CRC-32 is typically several times faster.

You check it yourself. I'll send you a copy of my benchmarking code under separate cover. All the core hashes except for CRC were taken from OpenSSL. As per my last message, CRC on the Celeron/P2/P3 sucks, and in the worst case would be only 1.5 times faster than MD5. MD4 would be near par with CRC.

> 3.) There are many domains where you need to protect your database not only
> against random accidental glitches, but also against malicious attacks. In
> these cases, CRC-32 (and other CRCs without any cryptographic strength) are
> no help.

If you have malicious attackers who can deliberately modify live data in a database, you have problems beyond what any kind of hash can protect against.

> 4.) Without CRC/hash facility, we will have no means of checking our data
> integrity at all. At least in my domain (medical) most developers are
> craving for database backends where we don't have to re-implement the
> integrity checking stuff again and again. If postgres could provide this, it
> would be a strong argument in favour of postgres.

I agree that it would be useful to CRC data blocks to protect against bad-data errors. If your data is really that sensitive, though, you may be looking for ECC, not CRC or hash facilities.

> 5.) As opposed to a previous posting (Bruce ?), MD5 has been shown to be
> "crackable" (deliberate collision feasible with available technology)

No, it hasn't, unless you can provide us a reference to a paper showing that. I've seen references to collisions in MD5's internal compression function, but still no way to produce collisions in the final digest.
Re: [HACKERS] Re: CRC
On Fri, Dec 08, 2000 at 10:17:00PM -0500, Tom Lane wrote:
> A couple further observations while playing with this benchmark ---
>
> 1. This MD5 implementation is not too robust. On my machine it dumps
> core if given a non-word-aligned data buffer. We could probably work
> around that, but it bespeaks a little *too* much hand optimization...

The operations in the MD5 core are based on word-sized chunks. Evidently the implementation does only word-sized loads and stores for that data, and you got a bus error.

> 2. It's a bad idea to ignore the startup/termination costs of the
> algorithms.

Yes. I had included the startup costs in my benchmark, but not the termination costs, which, as you point out, are large for MD5.

> Of course startup/termination is trivial for CRC, but
> it's not so trivial for MD5. I changed the code so that the md5
> update() routine also calls md5_finish_ctx(), so that each inner
> loop represents a complete MD5 calculation for a message of the
> size of the main routine's fread buffer. I then experimented with
> different buffer sizes. At a buffer size of 1K:

On my Celeron, at 1K blocks MD5 is still significantly faster than CRC, but it is slightly slower at 100-byte blocks. For comparison, I added RIPEMD-160, but it's far slower than any of them (twice as long as CRC).
Re: [HACKERS] Re: CRC
On Fri, Dec 08, 2000 at 09:28:38PM -0500, Tom Lane wrote:
> Bruce Guenter <[EMAIL PROTECTED]> writes:
> >> I agree, don't send it to the whole list. But I'd like a copy.
> > Here you go.
> As near as I could tell, the test as you have it (one CRC computation per
> fread) is purely I/O bound.

Nope. The tests got 99-100% CPU time with the original version.

> I changed the main loop to this:
> [...hash each block repeatedly...]

Good idea. It might have been even better to read the block just once and hash it still more times.

> On an
> otherwise idle HP 9000 C180 machine, I get the following numbers on a
> 1MB input file:
>
> time benchcrc    real 35.3  user 35.0  sys 0.0
> time benchmd5    real 37.6  user 37.3  sys 0.0
>
> This is a lot closer than I'd have expected, but it sure ain't
> "MD5 40% faster" as you reported. I wonder why the difference
> in results between your platform and mine?

The difference is likely because PA-RISC (like most other RISC architectures) lacks a "roll" opcode that is very prevalent in the MD5 algorithm. Intel CPUs have it. With a new version modified to repeat the inner loop 100,000 times, I got the following:

    time benchcrc
Re: [HACKERS] Re: CRC
On Fri, Dec 08, 2000 at 04:30:58PM -0500, Tom Lane wrote:
> Bruce Guenter <[EMAIL PROTECTED]> writes:
> >> Are you really saying MD5 was faster than CRC-32?
> > Yes. I expect it's because the operations used in MD5 are easily
> > parallelized, and operate on blocks of 64 bytes at a time, while the CRC
> > is mostly non-parallelizable, uses a table lookup, and operates on
> > single bytes.
> What MD5 implementation did you use?

I used the GPL'ed implementation written by Ulrich Drepper in 1995. The code from OpenSSL looks identical in terms of the operations performed.

> The one I have handy (the original
> RSA reference version) sure looks like it's more computation per byte
> than a CRC.

The algorithm itself does use more computation per byte. However, it works on blocks of 64 bytes at a time, and its operations are easily pipelined. The CRC code, on the other hand, is largely serial and highly dependent on a table lookup.
Re: CRC was: Re: [HACKERS] beta testing version
On Fri, Dec 08, 2000 at 03:38:09PM -0500, Tom Lane wrote:
> Bruce Guenter <[EMAIL PROTECTED]> writes:
> > MD5 is a cryptographic hash, which means (AFAIK) that ideally it is
> > impossible to produce a collision using any other method than brute
> > force attempts.
> True but irrelevant. What we need to worry about is the probability
> that a random error will be detected,

Which I indicated immediately after the sentence you quoted. The probability that a random error will go undetected is the same as the probability of a collision in the hash given two different inputs. The brute-force note means that the probability of a collision is as good as random.

> MD5 is designed for a purpose that really doesn't have much to do with
> error detection, when you think about it. It says "you will have a hard
> time computing a different string that produces the same hash as some
> prespecified string". This is not the same as promising
> better-than-random odds against a damaged copy of some string having the
> same hash as the original.

It does provide as-good-as-random odds against a damaged copy of some string having the same hash as the original -- nobody has been able to exhibit any collisions in MD5 (see http://cr.yp.to/papers/hash127.ps, page 18 for notes on this).

> CRC, on the other hand, is specifically
> designed for error detection, and for localized errors (such as a
> corrupted byte or two) it does a provably better-than-random job.
> For nonlocalized errors you don't get a guarantee, but you do get
> same-as-random odds of detection (ie, 1 in 2^N for an N-bit CRC).

For the log, the CRC's primary function (as far as I understand it) would be to guard against an inconsistent transaction being treated as consistent data. Such inconsistent transactions would be partially written, resulting in errors much larger than a small burst. For guarding the actual record data, I agree with you 100% -- what we're likely to see is a few localized bytes with flipped bits due to hardware failure of one kind or another. However, if the data is really critical, an ECC may be more appropriate, though that would make the data significantly larger (a 9/8 expansion for the algorithms I've seen).

> I really doubt that MD5 can beat a CRC with the same number of output
> bits for the purpose of error detection;

Agreed. However, MD5 provides four times as many bits as the standard 32-bit CRC. (I think I initially suggested you could take an arbitrary 32 bits out of MD5 to provide a check code "as good as CRC-32". I now take that back. Because of the burst-error guarantees of CRCs, nothing else could be as good, unless the alternate algorithm also made such guarantees, which MD5 definitely doesn't.)

> (Wild-pointer
> stomps on disk buffers are an example of the sort of thing that may
> look like a burst error.)

Actually, wild-pointer incidents involving disk buffers at the kernel level are, in my experience, characterized by content from one file appearing in another -- distinctly different from a burst error, and more like what would be seen if a log record were partially written.
Re: [HACKERS] Re: CRC
On Fri, Dec 08, 2000 at 04:21:21PM -0500, Tom Lane wrote:
> [EMAIL PROTECTED] (Nathan Myers) writes:
> > Thinking about it, I suspect that any CRC implementation that can't outrun
> > MD5 by a wide margin is seriously sub-optimal.
> I was finding that hard to believe, too, at least for CRC-32 (CRC-64
> would take more code, so I'm not so sure about it).

Would you like to see the simple benchmarking setup I used? The amount of code involved (once all the hashes are factored in) is fairly large, so I'm somewhat hesitant to just send it to the mailing list.

> Is that 64-bit code you pointed us to before actually a CRC, or
> something else? It doesn't call itself a CRC, and I was having a hard
> time extracting anything definite (like the polynomial) from all the
> bit-pushing underbrush :-(

It isn't a CRC; it's a fingerprint. As you've mentioned, it doesn't have a CRC's guarantees against burst errors, but it does have as-good-as-random collision avoidance over any random data corruption. At least, that's what the author claims. My math isn't nearly good enough to verify such claims.
Re: [HACKERS] Re: CRC
On Fri, Dec 08, 2000 at 11:10:19AM -0800, Nathan Myers wrote:
> This is very interesting. MD4 is faster than MD5. (MD5, described as
> "MD4 with suspenders on", does some extra stuff to protect against more-
> obscure attacks, of no interest to us.) Which 64-bit CRC code did you
> use, Mark Mitchell's?

Yes.

> Are you really saying MD5 was faster than CRC-32?

Yes. I expect it's because the operations used in MD5 are easily parallelized and operate on blocks of 64 bytes at a time, while the CRC is mostly non-parallelizable, uses a table lookup, and operates on single bytes.
Re: CRC was: Re: [HACKERS] beta testing version
On Fri, Dec 08, 2000 at 10:36:39AM -0800, Ian Lance Taylor wrote:
> > Incidentally, I benchmarked the previously mentioned 64-bit fingerprint,
> > the standard 32-bit CRC, MD5 and SHA, and the fastest algorithm on my
> > Celeron and on a PIII was MD5. The 64-bit fingerprint was only a hair
> > slower, the CRC was (quite surprisingly) about 40% slower, and the
> > implementation of SHA that I had available was a real dog. Taking an
> > arbitrary 32 bits of a MD5 would likely be less collision prone than
> > using a 32-bit CRC, and it appears faster as well.
>
> I just want to confirm that you used something like the fast 32-bit
> CRC algorithm, appended. The one posted earlier was accurate but
> slow.

Yes. I just rebuilt the framework using this exact code, and it performed identically to the previous CRC code (which didn't have an unrolled inner loop). These were compiled with -O6 by egcs 1.1.2.
Re: CRC was: Re: [HACKERS] beta testing version
On Fri, Dec 08, 2000 at 01:58:12PM -0500, Tom Lane wrote:
> Bruce Guenter <[EMAIL PROTECTED]> writes:
> > ... Taking an
> > arbitrary 32 bits of a MD5 would likely be less collision prone than
> > using a 32-bit CRC, and it appears faster as well.
>
> ... but that would be an algorithm that you know NOTHING about the
> properties of. What is your basis for asserting it's better than CRC?

MD5 is a cryptographic hash, which means (AFAIK) that ideally it is impossible to produce a collision by any method other than brute-force attempts. In other words, every stream of input to the hash that is longer than the hash length (16 bytes for MD5) is equally likely to produce a given hash code.

> CRC is pretty well studied and its error-detection behavior is known
> (and good). MD5 has been studied less thoroughly AFAIK, and in any
> case what's known about its behavior is that the *entire* MD5 output
> provides a good signature for a datastream. If you pick some ad-hoc
> method like taking a randomly chosen subset of MD5's output bits,
> you really don't know anything at all about what the error-detection
> properties of the method are.

Actually, in my reading regarding the properties of MD5, I came across an article stating that if a smaller number of bits is desired, one could either (and here's where my memory fails me) just select the middle N bits of the resulting hash, or fold the hash using XOR until the desired number of bits is reached. I'll see if I can find a reference... RFC 2289 (http://www.ietf.org/rfc/rfc2289.txt) includes an algorithm for folding MD5 digests down to 64 bits by XORing the top half with the bottom half. See appendix A.
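The RFC 2289 fold amounts to a few lines of C. A sketch, assuming the 16-byte digest has already been computed; `fold_digest` is a name made up for this example, and the XOR result's byte order follows the host, which doesn't matter for a check value compared on the same machine:

```c
/* Fold a 128-bit digest down to 64 bits by XORing the top half with the
 * bottom half, in the style of RFC 2289 appendix A. */
#include <stdint.h>
#include <string.h>

uint64_t fold_digest(const unsigned char digest[16])
{
    uint64_t hi, lo;

    memcpy(&hi, digest, 8);       /* first 8 bytes of the digest */
    memcpy(&lo, digest + 8, 8);   /* last 8 bytes */
    return hi ^ lo;               /* 64-bit folded check value */
}
```

Note that folding preserves the as-good-as-random collision odds of the full digest (any single output bit still depends on the whole input), but of course gives none of a CRC's burst guarantees.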
Re: CRC was: Re: [HACKERS] beta testing version
On Thu, Dec 07, 2000 at 04:01:23PM -0800, Nathan Myers wrote:
> 1. Computing a CRC-64 takes only about twice as long as a CRC-32, for
>    2^32 times the confidence. That's pretty cheap confidence.

Incidentally, I benchmarked the previously mentioned 64-bit fingerprint, the standard 32-bit CRC, MD5, and SHA, and the fastest algorithm on both my Celeron and a PIII was MD5. The 64-bit fingerprint was only a hair slower, the CRC was (quite surprisingly) about 40% slower, and the implementation of SHA that I had available was a real dog. Taking an arbitrary 32 bits of an MD5 would likely be less collision prone than using a 32-bit CRC, and it appears faster as well.
Re: [HACKERS] CRCs (was: beta testing version)
On Thu, Dec 07, 2000 at 12:22:12PM -0800, Mikheev, Vadim wrote:
> > > That's why an end marker must follow all valid records.
> > That requires an extra out-of-sequence write.
> Yes, and also increase probability to corrupt already committed
> to log data.

Are you referring to the case where the drive loses power in mid-write? That is solved either by arranging for the markers always to be placed at the start of a block, or by plugging in a UPS.
Re: [HACKERS] CRCs (was: beta testing version)
On Thu, Dec 07, 2000 at 12:25:41PM -0800, Nathan Myers wrote:
> That requires an extra out-of-sequence write.

Ayup!

> Generally, there are no guarantees, only reasonable expectations.

I would differ, but that's irrelevant.

> A 64-bit CRC would give sufficient confidence...

This is part of what I was getting at, in a roundabout way. If you use a CRC, hash, or any other kind of non-trivial check code, you have a certain level of confidence in the data, but not a guarantee. If you decide, based on your expert opinions, that a 32- or 64-bit CRC or hash gives you an adequate level of confidence in the event of a crash, then I'll be satisfied; just don't call it a guarantee. Them's small nits we're picking at, though.
Re: [HACKERS] CRCs (was: beta testing version)
On Wed, Dec 06, 2000 at 11:08:00AM -0800, Nathan Myers wrote:
> On Wed, Dec 06, 2000 at 11:49:10AM -0600, Bruce Guenter wrote:
> > On Wed, Dec 06, 2000 at 11:15:26AM -0500, Tom Lane wrote:
> > > How exactly *do* we determine where the end of the valid log data is,
> > > anyway?
> >
> > I don't know how pgsql does it, but the only safe way I know of is to
> > include an "end" marker after each record. When writing to the log,
> > append the records after the last end marker, ending with another end
> > marker, and fdatasync the log. Then overwrite the previous end marker
> > to indicate it's not the end of the log any more and fdatasync again.
> >
> > To ensure that it is written atomically, the end marker must not cross a
> > hardware sector boundary (typically 512 bytes). This can be trivially
> > guaranteed by making the marker a single byte.
>
> An "end" marker is not sufficient, unless all writes are done in
> one-sector units with an fsync between, and the drive buffering
> is turned off.

That's why an end marker must follow all valid records. When you write records, you don't touch the existing marker; you add a new end marker after the records you've written. After writing and syncing the records, you rewrite the old end marker to indicate that the data following it is valid, and sync again. There is no state in that sequence in which partially-written data could be mistaken for real data, assuming that either your drives aren't doing write-back caching or you have a UPS, and that fsync doesn't return until the drives report success.

> For larger writes the OS will re-order the writes.
> Most drives will re-order them too, even if the OS doesn't.

I'm well aware of that.

> > Any other way I've seen discussed (here and elsewhere) either
> > - Assume that a CRC is a guarantee.
>
> We are already assuming a CRC is a guarantee.
>
> The drive computes a CRC for each sector, and if the CRC is OK the
> drive is happy. CRC errors within the drive are quite frequent, and
> the drive re-reads when a bad CRC comes up.

The kinds of data failure that a CRC is guaranteed to catch (burst errors of up to N bits) are almost precisely those that a mis-read of a hardware sector would cause.

> > ... A CRC would be a good addition to
> > help ensure the data wasn't broken by flakey drive firmware, but
> > doesn't guarantee consistency.
> No, a CRC would be a good addition to compensate for sector write
> reordering, which is done both by the OS and by the drive, even for
> "atomic" writes.

But it doesn't guarantee consistency even in that case. There is a possibility (however small) that the random data located in the sectors before the write will match the CRC.
Re: AW: [HACKERS] beta testing version
On Wed, Dec 06, 2000 at 11:13:33PM +, Daniele Orlandi wrote:
> Bruce Guenter wrote:
> > - Assume that a CRC is a guarantee. A CRC would be a good addition to
> >   help ensure the data wasn't broken by flakey drive firmware, but
> >   doesn't guarantee consistency.
> Even a CRC per transaction (it could be a nice END record) ?

CRCs are designed to catch burst errors: runs of bits with their values flipped. An N-bit CRC is guaranteed to catch all error bursts of N bits or less, so a 32-bit CRC catches every burst up to 32 bits long. That's the only guarantee a CRC gives. Any other corruption has roughly a 1 in 2^32 chance of producing the same CRC as the original data. That's pretty good odds, but not a guarantee.
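The burst guarantee is easy to demonstrate (the 1-in-2^32 behavior for everything else can only be probed statistically). A sketch, assuming the standard IEEE 802.3 polynomial; the bit-at-a-time form is used for brevity, and `burst_detected` is a helper invented for this example:

```c
/* CRC-32 (IEEE 802.3 polynomial, bitwise form) plus a check that
 * flipping a contiguous burst of <= 32 bits always changes the CRC. */
#include <stdint.h>
#include <stddef.h>

static uint32_t crc32(const unsigned char *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    while (len--) {
        crc ^= *buf++;
        for (int k = 0; k < 8; k++)
            crc = (crc & 1) ? 0xEDB88320u ^ (crc >> 1) : crc >> 1;
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Flip a 32-bit-wide burst starting at byte `pos` and compare CRCs.
 * The burst guarantee says this must always return 1. */
static int burst_detected(unsigned char *buf, size_t len, size_t pos)
{
    uint32_t before = crc32(buf, len);
    for (int i = 0; i < 4; i++)
        buf[pos + i] ^= 0xFF;            /* 32 flipped bits in a row */
    return crc32(buf, len) != before;
}
```

A 33-bit burst, by contrast, would escape detection with probability about 2^-31, and arbitrary scattered corruption with probability about 2^-32; only bursts within the CRC's width are covered by the guarantee.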
Re: AW: [HACKERS] beta testing version
On Wed, Dec 06, 2000 at 11:15:26AM -0500, Tom Lane wrote:
> Zeugswetter Andreas SB <[EMAIL PROTECTED]> writes:
> > Yes, but there would need to be a way to verify the last page or
> > record from txlog when running on crap hardware.
> How exactly *do* we determine where the end of the valid log data is,
> anyway?

I don't know how pgsql does it, but the only safe way I know of is to include an "end" marker after each record. When writing to the log, append the records after the last end marker, ending with another end marker, and fdatasync the log. Then overwrite the previous end marker to indicate that it's not the end of the log any more, and fdatasync again.

To ensure that the end marker is written atomically, it must not cross a hardware sector boundary (typically 512 bytes). This can be trivially guaranteed by making the marker a single byte.

Any other way I've seen discussed (here and elsewhere) either:
- Requires atomic multi-sector writes, which are possible only if all the sectors are sequential on disk, the kernel issues one large write for all of them, and you don't lose power in the middle of the write.
- Assumes that a CRC is a guarantee. A CRC would be a good addition to help ensure the data wasn't broken by flaky drive firmware, but it doesn't guarantee consistency.
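As a sketch only: the two-sync sequence described above, written with POSIX calls. The marker values, record layout, and helper name are inventions for illustration; a real implementation would also have to keep each marker from crossing a 512-byte sector boundary, which this sketch does not enforce:

```c
/* Sketch of the end-marker log protocol: append record + new marker,
 * sync, then demote the old marker, sync again.  At no point can a
 * partially-written record appear valid, assuming a one-byte write
 * within a sector is atomic and fdatasync really reaches the platter. */
#define _XOPEN_SOURCE 600
#include <sys/types.h>
#include <unistd.h>

#define END_MARKER 0xFF   /* "log ends here" */
#define NOT_END    0x00   /* demoted marker: valid data follows */

/* end_pos is the offset of the current end marker.  Returns 0 on
 * success; on success the new end marker is at end_pos + 1 + len. */
static int log_append(int fd, off_t end_pos, const char *rec, size_t len)
{
    unsigned char m = END_MARKER;

    /* 1. Write the record and a new end marker past the old marker,
     *    leaving the old marker untouched, and force it to disk. */
    if (pwrite(fd, rec, len, end_pos + 1) != (ssize_t)len) return -1;
    if (pwrite(fd, &m, 1, end_pos + 1 + len) != 1) return -1;
    if (fdatasync(fd) < 0) return -1;

    /* 2. Only now demote the old end marker, atomically making the
     *    record part of the valid log, and force that to disk too. */
    m = NOT_END;
    if (pwrite(fd, &m, 1, end_pos) != 1) return -1;
    return fdatasync(fd);
}
```

Recovery then just walks records from the start of the log and stops at the first byte equal to END_MARKER; anything past it is garbage from an interrupted append.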
Re: [HACKERS] Using Threads?
On Tue, Dec 05, 2000 at 02:52:48PM -0500, Tom Lane wrote:
> There aren't going to be all that many data pages needing the COW
> treatment, because the postmaster uses very little data space of its
> own. I think this would become an issue if we tried to have the
> postmaster pre-cache catalog information for backends, however (see
> my post elsewhere in this thread).

Wouldn't that pre-cached data be placed in a SHM segment? Such segments aren't subject to COW, so this would be a non-issue.
Re: [HACKERS] Using Threads?
On Mon, Dec 04, 2000 at 08:43:24PM -0800, Tom Samplonius wrote:
> Some OSes (Linux is the main one) implement threads as pseudo processes.
> Linux threads are processes with a shared address space and file
> descriptor table.
>
> Context switch cost for threads can be lower if you are switching to a
> thread in the same process. That of course assumes that all context
> switches will occur within the same process, or the Linux
> everything-is-a-process model isn't used.

Actually, context-switch cost between threads is low on Linux as well, since the CPU's VM mappings don't get invalidated. This means the page tables won't be reloaded, which is one of the large costs in a context switch. A context switch between processes with no significant VM takes about 900 cycles (1.8us) on a 450MHz Celery. I would expect thread-switch time to be slightly lower than that, and context switches between processes with large VMs to be much more expensive, just from the cost of reloading the page tables.
Re: [HACKERS] Using Threads?
On Tue, Dec 05, 2000 at 10:07:37AM +0100, Zeugswetter Andreas SB wrote:
> > And using the following program for timing thread creation
> > and cleanup:
> >
> > #include <pthread.h>
> >
> > threadfn() { pthread_exit(0); }
>
> I think you would mainly need to test how the system behaves, if
> the threads and processes actually do some work in parallel, like:
>
> threadfn() {int i; for (i=0; i<1000;) {i++}; pthread_exit(0); }

The purpose of the benchmark was to time how long it takes to create and destroy a process or thread, nothing more. It was not creating processes in parallel, for precisely that reason. The point in dispute was that threads take much less time to create than processes.

> In a good thread implementation many parallel processes tend to get way
> less cpu than the same number of parallel threads, making threads optimal
> for the very many clients case (like > 3000).

Why do you believe this? In the "classical" thread implementation, each process gets the same amount of CPU no matter how many threads are running in it. That would mean that many parallel processes would get more CPU in total than the same number of threads in one process.
Re: [HACKERS] Using Threads?
On Mon, Dec 04, 2000 at 02:30:31PM -0800, Dan Lyke wrote:
> Adam Haberlach writes:
> > Typically (on a well-written OS, at least), the spawning of a thread
> > is much cheaper than the creation of a new process (via fork()).
> This would be well worth testing on some representative sample
> systems.

Using the following program for timing process creation and cleanup:

#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
  int i;
  int pid;
  for (i=0; i<100000; ++i) {
    pid=fork();
    if (pid==-1) exit(1);
    if (!pid) _exit(0);
    waitpid(pid,0,0);
  }
  exit(0);
}

And using the following program for timing thread creation and cleanup:

#include <stdlib.h>
#include <pthread.h>

void* threadfn(void* arg) { pthread_exit(0); }

int main(void)
{
  int i;
  pthread_t thread;
  for (i=0; i<100000; ++i) {
    if (pthread_create(&thread, 0, threadfn, 0)) exit(1);
    if (pthread_join(thread, 0)) exit(1);
  }
  exit(0);
}

On a relatively unloaded 500MHz PIII running Linux 2.2, the fork test program took a minimum of 16.71 seconds to run (167us per fork/exit/wait), and the thread test program took a minimum of 12.10 seconds to run (121us per pthread_create/exit/join). I use the minimums because those would be the runs where the tasks were least interfered with by other tasks. This amounts to a roughly 25% speed improvement for threads over processes, for the null-process case.

If I add the following lines before the for loop:

  char* m;
  m=malloc(1024*1024);
  memset(m,0,1024*1024);

the cost of doing the fork balloons to 240us, whereas the cost of doing the thread is constant. So, the cost of marking the pages as COW is quite significant (using those numbers, 73us/MB), and forking a process with lots of data is expensive. However, most of the PostgreSQL data is in a SysV IPC shared memory segment, which shouldn't affect the fork numbers.

-- Bruce Guenter <[EMAIL PROTECTED]> http://em.ca/~bruceg/ PGP signature
Re: [HACKERS] Using Threads?
On Mon, Dec 04, 2000 at 03:17:00PM -0800, Adam Haberlach wrote:
> Typically (on a well-written OS, at least), the spawning of a thread
> is much cheaper than the creation of a new process (via fork()).

Unless I'm mistaken, the back-end is only forked when starting a new connection, in which case the latency of the initial TCP three-way handshake and start-up queries is much larger than any process creation cost. On Linux 2.2.16 on a 500MHz PIII, I can do the fork/exit/wait sequence in about 164us. On the same server, I can make/break a PostgreSQL connection in about 19,000us (with 0% CPU idle, about 30% CPU system). Even if we can manage to get a thread for free, and assume that the fork from postmaster takes more than 164us, it won't make a big difference once the other latencies are worked out.

> Also, since everything in a group of threads (I'll call 'em a team)

Actually, you call them a process. That is the textbook definition.

> shares the same address space, there can be some memory overhead savings.

Only slightly. All of the executable and libraries should already be shared, as will all non-modified data. If the data is modified by the threads, you'll need separate copies for each thread anyway, so the net difference is small.

I'm not denying there would be a difference. Compared to separate processes, threads are more efficient. Doing a context switch between threads means there are no PTE invalidations, which makes them quicker than switches between processes. Creation would be a bit faster due to just linking the VM into a new thread rather than marking it all as COW. The memory savings would come from reduced fragmentation of the modified data (if you have 1 byte modified on each of 100 pages, the thread would grow by a few K, compared to 400K for processes). I'm simply arguing that the differences don't appear to be significant compared to the other costs involved.

-- Bruce Guenter <[EMAIL PROTECTED]> http://em.ca/~bruceg/ PGP signature
Re: [HACKERS] Using Threads?
On Mon, Nov 27, 2000 at 11:42:24PM -0600, Junfeng Zhang wrote:
> I am new to postgreSQL. When I read the documents, I find out the Postmaster
> daemon actually spawns a new backend server process to serve a new client
> request. Why not use threads instead? Is that just for a historical reason,
> or some performance/implementation concern?

Once all the questions regarding "why not" have been answered, it would be good to also ask "why use threads?" Do they simplify the code? Do they offer significant performance or efficiency gains? What do they give, other than being buzzword compliant?

-- Bruce Guenter <[EMAIL PROTECTED]> http://em.ca/~bruceg/ PGP signature
Re: [HACKERS] 8192 BLCKSZ ?
On Tue, Nov 28, 2000 at 12:38:37AM -0500, Tom Lane wrote:
> Not sure about the wild-and-wooly world of Linux filesystems...
> anybody know what the allocation unit is on the popular Linux FSes?

It rather depends on the filesystem. Current ext2 (the most common) systems default to 1K blocks on small partitions and 4K otherwise. IIRC, reiserfs uses 4K blocks in a tree structure that includes tail merging, which makes the question of block size tricky. Linux 2.3.x passes all file I/O through its page cache, which deals in 4K pages on most 32-bit architectures.

-- Bruce Guenter <[EMAIL PROTECTED]> http://em.ca/~bruceg/ PGP signature
Re: [HACKERS] [Fwd: [CORE SDI ADVISORY] MySQL weak authentication]
On Tue, Oct 24, 2000 at 10:25:14AM -0400, Lamar Owen wrote:
> I am forwarding this not to belittle MySQL, but to hopefully help in the
> development of our own encryption protocol for secure password
> authentication over the network.
>
> The point is that if we offer the protocol to do it, we had better
> ensure its security, or someone WILL find the hole. Hopefully it will
> be people who want to help security and not exploit it.

IMO, anything short of a full SSL-wrapped connection is fairly pointless. What does it matter if the password is encrypted if sensitive query data flows in the clear?

-- Bruce Guenter <[EMAIL PROTECTED]> http://em.ca/~bruceg/ PGP signature