bob wrote:
> On Wed, 13 Aug 2008, paul wrote:
>
>> Shy extremely noisy hardware and/or literal hard failure, most
>> errors will most likely always be expressed as 1 bit out of some
>> very large N number of bits.
>
> This claim ignores the fact that most computers today are still based
> on synchronously clocked parallel bus hardware. A common failure mode
> is clock skew, which causes many bits to be wrong at once. This can
> even happen within the CPU.

- In my experience, clock skew/drift problems first manifest themselves
as single-bit errors, even on parallel interfaces. Although all paths
are logically parallel, the actual physical performance of the
individual transistors and traces composing each data path is ever so
slightly different. And although physical CAD layout tools attempt to
balance clock trees, the actual arrival time of the clock at the latch
elements of the physical data-path implementation is also slightly
different from bit to bit (often by as much as a few picoseconds).

Therefore, as a circuit approaches its maximum frequency threshold
(which depends on temperature, age, etc.), a very small number of
single-bit errors will begin to appear, because setup/hold time
requirements are violated first on the bit with the least physical
clock skew tolerance. As the clock frequency and/or temperature (etc.)
increases, more and more bit paths begin to fail, until the whole path
fails.

Since every system has some bits within its parallel paths that are
more sensitive to one type of corruption or another, I tend to believe
that single-bit failures will show up statistically earlier, and in
greater number, than multi-bit failures, even while the hardware still
seems operable.

> As serial interfaces continue to be added to computers, the number of
> single bit errors (vs multi-bit errors) would tend to increase except
> for the fact that these serial interfaces are designed to detect and
> discard erroneous packets.
>
> I do agree that the logic between the self-validating interfaces can
> be faulty.
>
> Bob

This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
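
As a rough illustration of the timing argument in the reply above, here is a minimal sketch of a toy model, not a description of any real hardware. It assumes an 8-bit parallel path in which each bit has a slightly different effective delay, and it treats a bit as corrupted whenever that delay exceeds the clock period. The base delay, the per-bit offsets, and the function names (`bit_delays`, `failing_bits`, `sweep`) are all hypothetical and chosen only for illustration.

```python
# Toy model of a parallel bus nearing its maximum clock frequency.
# All numbers are hypothetical, picked only for illustration.

BASE_DELAY_PS = 1000.0  # assumed nominal data-path delay per bit

# Assumed per-bit deviation from nominal (path delay plus clock skew), in ps.
BIT_OFFSETS_PS = [0.0, 1.5, -2.0, 3.0, 0.5, -1.0, 2.0, -0.5]


def bit_delays(base_ps=BASE_DELAY_PS, offsets_ps=BIT_OFFSETS_PS):
    """Return the effective delay of each bit path in picoseconds."""
    return [base_ps + off for off in offsets_ps]


def failing_bits(delays_ps, period_ps):
    """Return indices of bits whose delay exceeds the clock period.

    A bit is treated as corrupted when its data arrives later than the
    clock edge, a crude stand-in for a setup-time violation.
    """
    return [i for i, d in enumerate(delays_ps) if d > period_ps]


def sweep(delays_ps, start_ps, stop_ps, step_ps):
    """Shorten the clock period step by step; record failing bits at each."""
    results = []
    steps = int(round((start_ps - stop_ps) / step_ps))
    for k in range(steps + 1):
        period = start_ps - k * step_ps
        results.append((period, failing_bits(delays_ps, period)))
    return results


if __name__ == "__main__":
    for period, bad in sweep(bit_delays(), 1004.0, 997.0, 0.5):
        print(f"period {period:7.1f} ps: {len(bad)} failing bit(s) {bad}")
```

With these made-up offsets, the bit with the largest delay (index 3) is the first to fail, and it fails alone. The other bits join only as the period keeps shrinking, until all eight fail together. A real path would also see jitter, temperature and voltage effects, so failures would be probabilistic rather than a hard threshold.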