Re: Fwd: [sqlite] presentation about ordering and atomicity of filesystems

2014-09-13 Thread Joel C. Ewing
On 09/12/2014 01:59 PM, Paul Gilmartin wrote:
 On Fri, 12 Sep 2014 09:16:54 -0700, Anne  Lynn Wheeler wrote:
 re:
 http://www.garlic.com/~lynn/2014k.html#7 [sqlite] presentation about 
 ordering and atomicity of filesystems

 part of the issue was that incomplete write ... with propagated zeros
 ... would also (then) rewrite the error correcting codes for the record
 (with propagated zeros) ... so there wouldn't even be an error
 indication that the write was performed incorrectly (installation
 wouldn't even know to perform restore because of write error).

 It's almost as if they concealed the error on purpose.  Well, not quite;
 it depends on where in the data path the ECC was generated -- it
 should have been done farther upstream.

 later fba disks ... especially in conjunction with raid ... had
 requirement that single block write would complete correctly once
 started. ...

 With what probability, and subject to what assumptions?  If the
 data lead to the write head fails mechanically at a critical time, a
 bad block will be written.  Negligibly improbable?  Yes.  Physically
 impossible?  No.  Detectable by ECC?  Probably.

 -- gil


If the hardware knows it has incomplete information to write an entire
block because of some abnormal hardware condition, then something should
be done to guarantee that any later attempt to read that block will
produce an error indication.  If that is not the case, this would appear
to be a violation of one of the major tenets of mainframe design:  that
any data errors resulting from hardware issues should be at least
detectable, if not correctable.  Writing a valid block with trailing
zeros in such a case sounds like a bad design decision.

-- 
Joel C. Ewing,Bentonville, AR   jcew...@acm.org 

--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN


Re: Fwd: [sqlite] presentation about ordering and atomicity of filesystems

2014-09-13 Thread Anne Lynn Wheeler
jcew...@acm.org (Joel C. Ewing) writes:
 If the hardware knows it has incomplete information to write an entire
 block because of some abnormal hardware condition, then something should
 be done to guarantee that any later attempt to read that block will
 produce an error indication.  If that is not the case, this would appear
 to be a violation of one of the major tenets of mainframe design:  that
 any data errors resulting from hardware issues should be at least
 detectable, if not correctable.  Writing a valid block with trailing
 zeros in such a case sounds like a bad design decision.

re:
http://www.garlic.com/~lynn/2014k.html#7 Fwd: [sqlite] presentation about 
ordering and atomicity of filesystems
http://www.garlic.com/~lynn/2014k.html#8 Fwd: [sqlite] presentation about 
ordering and atomicity of filesystems

*and* generating a valid error correcting code for the propagated zeros

at one point i was asked to audit some of the early raid5 vendors
... and there were some cases where i had to give presentations on what
no-single-point-of-failure means (having found single points of
failure).

nearly a decade earlier, i was involved in working with NSF on
interconnecting NSF supercomputer centers (which later evolved into the NSFNET
backbone, precursor to the modern internet) ... some old email
http://www.garlic.com/~lynn/lhwemail.html#nsfnet

in part because we had an internal (HSDT) project with T1 (1.5mbit/sec) and
faster links ... some past posts
http://www.garlic.com/~lynn/subnetwork.html#hsdt

one of the people working on the effort had been a graduate student of
Reed at jpl/caltech and did a lot of the original work on reed-solomon
(error correcting code). Also got to work with cyclotomics up in
berkeley (one of the founders was berlekamp) ... cyclotomics did a lot of
the reed-solomon stuff that shows up in the cdrom standard ... during
this period, they were bought by kodak. a couple recent posts
http://www.garlic.com/~lynn/2014g.html#75 non-IBM: SONY new tape storage - 185 
Terabytes on a tape
http://www.garlic.com/~lynn/2014j.html#68 No Internet. No Microsoft Windows. No 
iPods. This Is What Tech Was Like In 1984

reed-solomon
http://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction

as previously mentioned ... one of the justifications for the industry
moving from fba-512 to fba-4096 was reducing space taken up by error
correcting code:
http://en.wikipedia.org/wiki/Advanced_Format

past posts mentioning fba, ckd, multi-track search, etc
http://www.garlic.com/~lynn/submain.html#dasd

-- 
virtualization experience starting Jan1968, online at home since Mar1970



Fwd: [sqlite] presentation about ordering and atomicity of filesystems

2014-09-12 Thread John McKown
This is not about the z, per se, but is interesting. I don't think
that any of the IBM systems have this type of filesystem. Hum,
perhaps the i?


-- Forwarded message --
From: Kees Nuyt k.n...@zonnet.nl
Date: Thu, Sep 11, 2014 at 4:49 PM
Subject: [sqlite] presentation about ordering and atomicity of filesystems
To: sqlite-us...@sqlite.org



Hi all,

Today I bumped into a presentation about ordering and atomicity
of filesystems that might interest you.

https://www.youtube.com/watch?v=YvchhB1-Aws

The Application/Storage Interface: After All These Years, We're
Still Doing It Wrong
Remzi Arpaci-Dusseau, University of Wisconsin—Madison

Talk at usenix 2014 Published on Sep 4, 2014 by USENIX
Association Videos

Somewhat related to the article drh recently wrote about using
sqlite as an application data store.

--
Regards,

Kees Nuyt
___
sqlite-users mailing list
sqlite-us...@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


-- 
There is nothing more pleasant than traveling and meeting new people!
Genghis Khan

Maranatha! 
John McKown



Re: Fwd: [sqlite] presentation about ordering and atomicity of filesystems

2014-09-12 Thread Shane Ginnane
On Fri, 12 Sep 2014 06:28:47 -0500, John McKown wrote:

This is not about the z, per se, but is interesting. I don't think
that any of the IBM systems have this type of filesystem. Hum,
perhaps the i?

John, you gotta stop posting this stuff just before midnight on a Friday night 
!!!.
I got partway into it, but it'd be almost breakfast by the time it finished - 
maybe later I'll get back to it    :0)

Shane ...



Re: Fwd: [sqlite] presentation about ordering and atomicity of filesystems

2014-09-12 Thread John McKown
On Fri, Sep 12, 2014 at 9:08 AM, Shane Ginnane ibm-m...@tpg.com.au wrote:
 On Fri, 12 Sep 2014 06:28:47 -0500, John McKown wrote:

This is not about the z, per se, but is interesting. I don't think
that any of the IBM systems have this type of filesystem. Hum,
perhaps the i?

 John, you gotta stop posting this stuff just before midnight on a Friday 
 night !!!.
 I got partway into it, but it'd be almost breakfast by the time it finished - 
 maybe later I'll get back to it    :0)

 Shane ...

It was only after midnight because you Australians don't set your
clocks correctly. I posted that in the middle of the day. At least
according to GMT time. Which is the only true time. Right? [grin/]

-- 
There is nothing more pleasant than traveling and meeting new people!
Genghis Khan

Maranatha! 
John McKown



Re: Fwd: [sqlite] presentation about ordering and atomicity of filesystems

2014-09-12 Thread Anne Lynn Wheeler
john.archie.mck...@gmail.com (John McKown) writes:
 This is not about the z, per se, but is interesting. I don't think
 that any of the IBM systems have this type of filesystem. Hum,
 perhaps the i?

original CMS filesystem from mid-60s ... was somewhat brought over from
CTSS ... would simulate fixed-block on CKD dasd (somewhat the inverse of the
current situation, where no CKD DASD has been manufactured for
decades and CKD is simulated on industry standard fixed-block). The default was
to not replace/update an existing record in place ... but write to a newly
allocated location ... then periodically update the allocation map and file
directory (aka VTOC) ... also to new locations, and then rewrite the MFD record
(in-place, a single write that would flip between the old set of records
and the new set of records).
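The shadow-write scheme described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the CMS implementation: the `Disk` class, the `commit`/`read_all` helpers and the counter standing in for the allocation map are all hypothetical, and each commit rewrites the whole file set for brevity. What it shows is that nothing is updated in place except one root record, so a failure before that write leaves the old state fully readable.

```python
from dataclasses import dataclass, field

ROOT = 0  # block 0 holds the root (MFD) pointer; the only block rewritten in place


@dataclass
class Disk:
    """Toy block store: block number -> contents (hypothetical, for illustration)."""
    blocks: dict = field(default_factory=dict)
    next_free: int = ROOT + 1  # naive counter standing in for the allocation map

    def allocate(self) -> int:
        n = self.next_free
        self.next_free += 1
        return n


def commit(disk: Disk, files: dict, crash_before_flip: bool = False) -> None:
    """Write new file contents and a new directory to fresh blocks, then flip
    the root pointer with one in-place write."""
    directory = {}
    for name, data in files.items():
        blk = disk.allocate()          # new location; the old copy is untouched
        disk.blocks[blk] = data
        directory[name] = blk
    dir_blk = disk.allocate()
    disk.blocks[dir_blk] = directory   # new directory, also in a new location
    if crash_before_flip:
        return                         # simulated failure: root still names old state
    disk.blocks[ROOT] = dir_blk        # single in-place write switches old -> new


def read_all(disk: Disk) -> dict:
    """Follow the root pointer to the current directory and read every file."""
    if ROOT not in disk.blocks:
        return {}
    directory = disk.blocks[disk.blocks[ROOT]]
    return {name: disk.blocks[blk] for name, blk in directory.items()}


disk = Disk()
commit(disk, {"PROFILE EXEC": b"v1"})
commit(disk, {"PROFILE EXEC": b"v2"}, crash_before_flip=True)
print(read_all(disk))  # {'PROFILE EXEC': b'v1'}
```

The weak spot is that single in-place root write: if it can be silently completed with zeros, the flip itself damages the filesystem, which is exactly the CKD failure mode that follows.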

however, ibm CKD dasd had a peculiar power failure mode ... that might
occur in the middle of a write operation ... there would be sufficient
power to complete a write in progress ... but not sufficient power to
continue transmitting the data from processor memory over the channel
... so the controller completed the write operation with all zeros (and
no indication of a read/write failure). As far as i know, none of the
other mainframe systems made any software provisions to handle this
particular failure mode of ibm ckd dasd.

As a result, in the mid-70s, the CMS extended file system had a fix
... which changed to a pair of MFD records and would alternately write
to the two records. On initial startup ... it would check both
records to see whether each had been written correctly (no zeros
propagated at the end of the record) and choose the most recent valid
record.
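One way to realize the paired-record check, sketched in Python under assumptions: the thread does not describe the CMS record layout, so the fixed 64-byte record, the sequence number stored at both the start and the end of the record, and all helper names are hypothetical. Because the controller fills the unsent tail with zeros, a record whose trailing copy of the sequence number is zero or does not match its header copy is treated as torn.

```python
import struct

RECORD_SIZE = 64                 # hypothetical fixed record length
HEADER = struct.Struct(">QH")    # sequence number, payload length
TRAILER = struct.Struct(">Q")    # second copy of the sequence number


def encode(seq: int, payload: bytes) -> bytes:
    """Build a record whose sequence number appears at both ends."""
    if seq < 1:
        raise ValueError("seq starts at 1 so an all-zero record never looks valid")
    body = HEADER.pack(seq, len(payload)) + payload
    pad = RECORD_SIZE - len(body) - TRAILER.size
    if pad < 0:
        raise ValueError("payload too large for record")
    return body + bytes(pad) + TRAILER.pack(seq)


def decode(rec: bytes):
    """Return (seq, payload), or None if zeros were propagated over the tail."""
    seq, length = HEADER.unpack_from(rec, 0)
    (tail,) = TRAILER.unpack_from(rec, len(rec) - TRAILER.size)
    if seq == 0 or seq != tail:
        return None
    return seq, rec[HEADER.size:HEADER.size + length]


def zero_fill(rec: bytes, delivered: int) -> bytes:
    """Simulate the failure: bytes after `delivered` are written as zeros."""
    return rec[:delivered] + bytes(len(rec) - delivered)


def write_mfd(slots: list, seq: int, payload: bytes, delivered=None) -> None:
    """Alternate between two slots so the newest good record is never the target."""
    rec = encode(seq, payload)
    slots[seq % 2] = rec if delivered is None else zero_fill(rec, delivered)


def choose_mfd(slots: list):
    """At startup, keep the most recent record that was written completely."""
    valid = [d for d in map(decode, slots) if d is not None]
    return max(valid) if valid else None


slots = [bytes(RECORD_SIZE), bytes(RECORD_SIZE)]
write_mfd(slots, 1, b"dir@v1")
write_mfd(slots, 2, b"dir@v2")
write_mfd(slots, 3, b"dir@v3", delivered=20)   # power fails mid-record
print(choose_mfd(slots))                        # (2, b'dir@v2')
```

Alternating slots means an interrupted write only ever destroys the older of the two copies, so at worst startup falls back one generation.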

UNIX filesystem has been notorious for writing records in arbitrary
order ... especially the filesystem control information (metadata), and
after a failure without a clean shutdown (all records cleanly
written to disk) ... the next startup would have to
reread all records looking for inconsistencies ... which might take
many tens of minutes.

Circa 1990, aixv3 for rs/6000 enhanced the unix filesystem with logging
of changes to the file directory information (metadata) ... a side-effect
was that aix could almost immediately recover/restart ... by rerunning logged
information (it doesn't do anything for consistency of file data ...
but does fix the unix filesystem integrity problem). AIX JFS filesystem
http://en.wikipedia.org/wiki/JFS_%28file_system%29
http://www.linuxjournal.com/article/6268
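A minimal sketch of the idea behind metadata logging, in Python: after a crash, startup replays committed log entries on top of the last checkpoint instead of rescanning the whole disk. The `Journal` class, its entry tuples and `replay` are hypothetical and say nothing about the actual JFS log format; as described above, only metadata is logged, so file data consistency is not covered.

```python
from dataclasses import dataclass, field


@dataclass
class Journal:
    """Toy metadata log; the entry format is hypothetical, not the JFS on-disk log."""
    entries: list = field(default_factory=list)

    def record(self, txn: int, key: str, value) -> None:
        self.entries.append(("set", txn, key, value))

    def commit(self, txn: int) -> None:
        self.entries.append(("commit", txn))


def replay(checkpoint: dict, journal: Journal) -> dict:
    """Rebuild metadata from the last checkpoint plus committed log entries,
    rather than rereading every record on disk to find inconsistencies."""
    committed = {e[1] for e in journal.entries if e[0] == "commit"}
    meta = dict(checkpoint)
    for entry in journal.entries:
        if entry[0] == "set" and entry[1] in committed:
            _, _, key, value = entry
            meta[key] = value
    return meta


j = Journal()
j.record(1, "inode 7 size", 4096)
j.commit(1)
j.record(2, "inode 8 size", 8192)   # crash before this transaction commits
print(replay({"inode 7 size": 0, "inode 8 size": 0}, j))
# {'inode 7 size': 4096, 'inode 8 size': 0}
```

Restart cost is proportional to the length of the log since the last checkpoint, not to the size of the filesystem, which is where the fast restart comes from.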

the original implementation relied on special hardware in 801/risc where
the unix filesystem control information (metadata) was placed in a memory
area that was specially identified to catch all changes. then all
changes to the filesystem were captured and journaled ... w/o having to
change all the unix code to explicitly call the journaling/logging
facility. The original claim was that the hardware implementation was
also faster than putting in explicit logging/journaling calls.  However,
when the ibm palo alto group was porting JFS to generic hardware (w/o the
801/risc features), they had to put in explicit logging/journaling calls
for changes. When they back ported that implementation to rs/6000, it
turned out the explicit calls ran faster than the original
implementation.

as an aside, we relied on JFS for faster restart when we did ibm's
ha/cmp (high availability, cluster multiprocessor) ... some past posts
http://www.garlic.com/~lynn/subtopic.html#hacmp

past posts mentioning 801/risc
http://www.garlic.com/~lynn/subtopic.html#801

recent references to Jim Gray credited with formalizing transaction
semantics and ACID properties
http://www.garlic.com/~lynn/2014f.html#69 Is end of mainframe near ?
http://www.garlic.com/~lynn/2014g.html#2 Is end of mainframe near ?
http://www.garlic.com/~lynn/2014g.html#14 Is end of mainframe near ?
http://www.garlic.com/~lynn/2014g.html#15 Is it time for a revolution to 
replace TLS?
http://www.garlic.com/~lynn/2014g.html#38 Fifty Years of BASIC, the Programming 
Language That Made Computers Personal
http://www.garlic.com/~lynn/2014k.html#2 Flat (VSAM or other) files still in 
use?


-- 
virtualization experience starting Jan1968, online at home since Mar1970



Re: [sqlite] presentation about ordering and atomicity of filesystems

2014-09-12 Thread Farley, Peter x23353
Isn't z/OS Unix HFS/ZFS that type of file system, on top of a VSAM linear 
dataset?

I haven't the time now to listen to the whole 90 minutes of video, but the 
first 13 minutes were enlightening.

Peter

-Original Message-
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf 
Of John McKown
Sent: Friday, September 12, 2014 7:29 AM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Fwd: [sqlite] presentation about ordering and atomicity of filesystems

This is not about the z, per se, but is interesting. I don't think
that any of the IBM systems have this type of filesystem. Hum,
perhaps the i?

--






Re: Fwd: [sqlite] presentation about ordering and atomicity of filesystems

2014-09-12 Thread Tony Harminc
On 12 September 2014 10:38, Anne  Lynn Wheeler l...@garlic.com wrote:
 however, ibm CKD dasd had a peculiar power failure mode ... that might
 occur in the middle of a write operation ... there would be sufficient
 power to complete a write in progress ... but not sufficient power to
 continue transmitting the data from processor memory over the channel
 ... so the controller completed the write operation with all zeros (and
 no indication of a read/write failure). As far as i know, none of the
 other mainframe systems made any software provisions to handle this
 particular failure mode of ibm ckd dasd.

That's a not unreasonable implementation of an architected behaviour
on the BT/OEMI channel to CU interface, independent of power failure.
If an I/O reset is received by the control unit while a write is in
progress, it completes the write with zeros. What would be a more
reasonable behaviour on a disk with little or no buffering? So in
theory it's possible to corrupt data on disk just by hitting System
Reset (or Load) during a disk write. If you look at the probability
it's pretty unlikely, but I worked at one place that had a strict rule
about hitting stop and waiting a few seconds before doing the reset or
load.

Tony H.



Re: Fwd: [sqlite] presentation about ordering and atomicity of filesystems

2014-09-12 Thread Anne Lynn Wheeler
t...@harminc.net (Tony Harminc) writes:
 That's a not unreasonable implementation of an architected behaviour
 on the BT/OEMI channel to CU interface, independent of power failure.
 If an I/O reset is received by the control unit while a write is in
 progress, it completes the write with zeros. What would be a more
 reasonable behaviour on a disk with little or no buffering? So in
 theory it's possible to corrupt data on disk just by hitting System
 Reset (or Load) during a disk write. If you look at the probability
 it's pretty unlikely, but I worked at one place that had a strict rule
 about hitting stop and waiting a few seconds before doing the reset or
 load.

re:
http://www.garlic.com/~lynn/2014k.html#7 [sqlite] presentation about ordering 
and atomicity of filesystems

part of the issue was that incomplete write ... with propagated zeros
... would also (then) rewrite the error correcting codes for the record
(with propagated zeros) ... so there wouldn't even be an error
indication that the write was performed incorrectly (installation
wouldn't even know to perform restore because of write error).

later fba disks ... especially in conjunction with raid ... had
requirement that single block write would complete correctly once
started. before raid and with fba-512 blocks and 4k-byte logical blocks
... the hardware guarantee only applied to the physical 512-byte block
... which could result in an inconsistent 4k-byte logical record (8
physical 512-byte blocks) with no error condition. As a result, there had
to be special software provisions by filesystems with 4k-byte logical
records mapped to fba-512.
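The fba-512 problem can be simulated: a 4K logical record spans eight 512-byte sectors, each written atomically with its own ECC, so a power failure partway through leaves a mix of new and old sectors that all read back without error. The sketch assumes one possible software provision, a SHA-256 digest stored in the last 32 bytes of each logical record (hypothetical, not any particular filesystem's format). Since the digest lives in the last sector, a torn write pairs the old digest with mixed content and the mismatch is detectable.

```python
import hashlib

SECTOR = 512
LOGICAL = 4096
DIGEST = 32                             # SHA-256 digest length
SECTORS_PER_BLOCK = LOGICAL // SECTOR   # 8 physical sectors per logical record


def seal(data: bytes) -> bytes:
    """Hypothetical software provision: the last 32 bytes of each 4K logical
    record hold a SHA-256 digest of the rest of the record."""
    if len(data) != LOGICAL - DIGEST:
        raise ValueError("data must fill the logical record")
    return data + hashlib.sha256(data).digest()


def is_intact(block: bytes) -> bool:
    return hashlib.sha256(block[:-DIGEST]).digest() == block[-DIGEST:]


def torn_write(old: bytes, new: bytes, sectors_done: int) -> bytes:
    """Each 512-byte sector is written atomically with its own valid ECC, so
    the drive reports no error, but power fails after `sectors_done` sectors."""
    cut = sectors_done * SECTOR
    return new[:cut] + old[cut:]


old = seal(b"A" * (LOGICAL - DIGEST))
new = seal(b"B" * (LOGICAL - DIGEST))
on_disk = torn_write(old, new, sectors_done=3)
print(is_intact(old), is_intact(new), is_intact(on_disk))  # True True False
```

With fba-4096 the logical record is a single physical block, so under the drive's complete-once-started guarantee only `sectors_done` values of 0 or 8 can occur, and both leave an intact record.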

this particular issue has been eliminated with the recent move from
fba-512 to fba-4096 ... so 4k-byte logical block filesystems now match
the physical block size. part of the move from fba-512 to fba-4096 is
that rather than eight error correcting codes per 4k-bytes ... there is
only a single error correcting code ... increasing the effective data
space on disk
http://en.wikipedia.org/wiki/Advanced_Format

past posts mentioning fba, ckd, multi-track search, etc
http://www.garlic.com/~lynn/submain.html#dasd

-- 
virtualization experience starting Jan1968, online at home since Mar1970



Re: Fwd: [sqlite] presentation about ordering and atomicity of filesystems

2014-09-12 Thread Paul Gilmartin
On Fri, 12 Sep 2014 09:16:54 -0700, Anne  Lynn Wheeler wrote:

re:
http://www.garlic.com/~lynn/2014k.html#7 [sqlite] presentation about ordering 
and atomicity of filesystems

part of the issue was that incomplete write ... with propagated zeros
... would also (then) rewrite the error correcting codes for the record
(with propagated zeros) ... so there wouldn't even be an error
indication that the write was performed incorrectly (installation
wouldn't even know to perform restore because of write error).
 
It's almost as if they concealed the error on purpose.  Well, not quite;
it depends on where in the data path the ECC was generated -- it
should have been done farther upstream.
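The point about where in the data path the check code is generated can be made concrete. In this Python sketch (assumptions: CRC-32 stands in for the device ECC, a SHA-256 digest kept by the host stands in for an end-to-end check, and all function names are hypothetical), the control unit computes its code over the block after zero-filling it, so the device-level check passes, while a digest computed upstream in processor memory, before the transfer, exposes the damage.

```python
import hashlib
import zlib

BLOCK = 4096


def controller_write(channel_data: bytes, delivered: int):
    """Model of the failure discussed in this thread: the control unit zero-fills
    whatever the channel did not deliver, then generates the device check code
    over the zero-filled block (CRC-32 stands in for the real ECC)."""
    stored = channel_data[:delivered] + bytes(len(channel_data) - delivered)
    return stored, zlib.crc32(stored)


def device_check_ok(stored: bytes, ecc: int) -> bool:
    """What the drive can verify on a later read."""
    return zlib.crc32(stored) == ecc


def end_to_end_ok(stored: bytes, host_digest: bytes) -> bool:
    """What software can verify if it computed a digest before the transfer."""
    return hashlib.sha256(stored).digest() == host_digest


data = bytes(range(256)) * (BLOCK // 256)
host_digest = hashlib.sha256(data).digest()   # generated upstream, in processor memory
stored, ecc = controller_write(data, delivered=1000)
print(device_check_ok(stored, ecc), end_to_end_ok(stored, host_digest))  # True False
```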

later fba disks ... especially in conjunction with raid ... had
requirement that single block write would complete correctly once
started. ...

With what probability, and subject to what assumptions?  If the
data lead to the write head fails mechanically at a critical time, a
bad block will be written.  Negligibly improbable?  Yes.  Physically
impossible?  No.  Detectable by ECC?  Probably.

-- gil
