Re: z/OS file system and some Friday thoughts
-----Original Message-----
From: IBM Mainframe Discussion List [mailto:ibm-m...@bama.ua.edu] On Behalf Of Joel C. Ewing
Sent: 21 May 2010 15:22
To: IBM-MAIN@bama.ua.edu
Subject: Re: z/OS file system and some Friday thoughts

> On 05/21/2010 08:03 AM, Thomas Berg wrote:
>> As I have followed some threads lately, and many times earlier, about the file system in general and PDS(E)s in particular, I got a (maybe OT) idea. (Beware! :) )
>>
>> The goal is to be able to use any string in general as a data set name/file name, and in particular a *nix-type name/structure, natively in z/OS.
>>
>> As we have 44 bytes available in the catalog(s), I think we can do something like this:
>>
>> - Use 16 bytes for a hash (MD5/SHA-2) of the file name.
>> - Use 16 bytes for a hash of the dir path.
>> - Alternatively, use a 16-byte hash of the whole string of path and file name. But I think that separate hashes would give some performance benefits when handling directories.
>> - We must probably use an (initial?) byte with e.g. nulls to avoid collisions with the old data set names.
>> - As an additional option we could use maybe 4 bytes for file version/generation handling.
>>
>> E.g. we have a file Just a test etc. which has the hash x'F296A5AE68F284954EBF47EC5EEFD72E', in the dir /First level dir/Second level dir/ which has the hash x'847FD35FD88274EC0EDA528E7CD7A65A'.
>>
>> So the 44 (?) bytes would maybe look like:
>> x'00F296A5AE68F284954EBF47EC5EEFD72E847FD35FD88274EC0EDA528E7CD7A65A00'.
>>
>> ENQs would then also have the hash as keys, etc.
>>
>> Am I an idiot or just a typical programmer? :)
>>
>> Regards,
>> Thomas Berg
>> _
>> Thomas Berg Specialist A M SWEDBANK
>
> ...
> The obvious problem with this, of course, is that hash functions by their very nature (mapping a larger domain to a smaller range) cannot be one-to-one. They work reliably for such things as symbol table lookup only because you also have the full string available, both as the search argument and in the table, in order to resolve collisions when two strings hash to the same value. A hash value unaccompanied by the full string does not represent a unique string and is thus ambiguous.
>
> Most programmers/users would not find it acceptable to request a read/update/enqueue on file a and have it give the same results as a reference to some unknown and unrelated file x, just because they happened to hash to the same value. If the hash function mapped to an equal or larger range, then it would be possible to have uniqueness; but then the hash value would require at least as many bits to represent it as the original string of symbols, and nothing would be gained by the substitution.

I don't see the collision risk as a problem. My experience is that there is *very* seldom a collision. And that is understandable if you look at how the file name space is used: it is not filled with random strings of bits; rather, there is a very strong adherence to alphabetic and numeric characters together with some special ones. Only an extremely small part of the binary space is used. (Which is the point of this solution.)

Nevertheless, we must of course handle a collision when it happens, e.g. with a duplicate flag. The rest is a SMOP.

Regards,
Thomas Berg
_
Thomas Berg Specialist A M SWEDBANK

--
For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@bama.ua.edu with the message: GET IBM-MAIN INFO
Search the archives at http://bama.ua.edu/archives/ibm-main.html
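
A rough way to put a number on "very seldom" is the birthday-bound approximation for accidental collisions. The sketch below is only an estimate under stated assumptions: the function name `collision_probability` is made up for illustration, the hash is assumed to behave like a uniformly random 128-bit function, and deliberately constructed collisions (which are practical for MD5) are not covered.

```python
import math


def collision_probability(n_names: int, hash_bits: int) -> float:
    """Approximate chance that at least two of n_names share a hash value.

    Birthday-bound approximation: p ~ 1 - exp(-n(n-1) / 2^(bits+1)),
    assuming hash values are uniformly distributed (an assumption).
    """
    exponent = -n_names * (n_names - 1) / 2.0 ** (hash_bits + 1)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(exponent)


for n in (10**6, 10**9, 10**12):
    print(f"{n:>15,d} names, 128-bit hash: p ~ {collision_probability(n, 128):.3e}")
```

Even for a billion names the estimate stays on the order of 1e-21, which supports the "very seldom" claim for accidental collisions. It does not answer the objection that uniqueness is not guaranteed, so some collision handling (such as the duplicate flag) would still be needed.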
z/OS file system and some Friday thoughts
As I have followed some threads lately, and many times earlier, about the file system in general and PDS(E)s in particular, I got a (maybe OT) idea. (Beware! :) )

The goal is to be able to use any string in general as a data set name/file name, and in particular a *nix-type name/structure, natively in z/OS.

As we have 44 bytes available in the catalog(s), I think we can do something like this:

- Use 16 bytes for a hash (MD5/SHA-2) of the file name.
- Use 16 bytes for a hash of the dir path.
- Alternatively, use a 16-byte hash of the whole string of path and file name. But I think that separate hashes would give some performance benefits when handling directories.
- We must probably use an (initial?) byte with e.g. nulls to avoid collisions with the old data set names.
- As an additional option we could use maybe 4 bytes for file version/generation handling.

E.g. we have a file Just a test etc. which has the hash x'F296A5AE68F284954EBF47EC5EEFD72E', in the dir /First level dir/Second level dir/ which has the hash x'847FD35FD88274EC0EDA528E7CD7A65A'.

So the 44 (?) bytes would maybe look like:
x'00F296A5AE68F284954EBF47EC5EEFD72E847FD35FD88274EC0EDA528E7CD7A65A00'.

ENQs would then also have the hash as keys, etc.

Am I an idiot or just a typical programmer? :)

Regards,
Thomas Berg
_
Thomas Berg Specialist A M SWEDBANK
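
The layout described above could be sketched roughly as follows. This is an illustration, not a z/OS interface: the function name `catalog_key`, the UTF-8 encoding of the names, the big-endian 4-byte version field, and the trailing null padding up to 44 bytes are all assumptions made for the example. MD5 is used because it produces exactly 16 bytes; a SHA-2 digest would have to be truncated to fit.

```python
import hashlib

KEY_LEN = 44  # data set name length available in the catalog, per the post


def catalog_key(dir_path: str, file_name: str, version: int = 0) -> bytes:
    """Pack a hypothetical 44-byte catalog key.

    Assumed layout: 1 null byte, 16-byte hash of the file name,
    16-byte hash of the directory path, 4-byte version, null padding.
    """
    name_hash = hashlib.md5(file_name.encode("utf-8")).digest()  # 16 bytes
    dir_hash = hashlib.md5(dir_path.encode("utf-8")).digest()    # 16 bytes
    key = b"\x00" + name_hash + dir_hash + version.to_bytes(4, "big")
    # 1 + 16 + 16 + 4 = 37 bytes used; pad the remaining 7 with nulls.
    return key.ljust(KEY_LEN, b"\x00")


key = catalog_key("/First level dir/Second level dir/", "Just a test etc.")
print(len(key), key.hex().upper())
```

Because the directory hash occupies a fixed slice of the key, all files in one directory share those 16 bytes, which is presumably the performance benefit of hashing the path separately.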
Re: z/OS file system and some Friday thoughts
Thomas Berg wrote:

[ ... lots of interesting things about hashes ... ]

Curiosity question: Where are you going to store those hashes?

> Am I an idiot or just a typical programmer? :)

You are NOT licensed to be an idiot. ;-D

Actually what you suggested could be very useful. Perhaps a SHARE submission?

Groete / Greetings
Elardus Engelbrecht
Re: z/OS file system and some Friday thoughts
On 05/21/2010 08:03 AM, Thomas Berg wrote:
> As I have followed some threads lately, and many times earlier, about the file system in general and PDS(E)s in particular, I got a (maybe OT) idea. (Beware! :) )
>
> The goal is to be able to use any string in general as a data set name/file name, and in particular a *nix-type name/structure, natively in z/OS.
>
> As we have 44 bytes available in the catalog(s), I think we can do something like this:
>
> - Use 16 bytes for a hash (MD5/SHA-2) of the file name.
> - Use 16 bytes for a hash of the dir path.
> - Alternatively, use a 16-byte hash of the whole string of path and file name. But I think that separate hashes would give some performance benefits when handling directories.
> - We must probably use an (initial?) byte with e.g. nulls to avoid collisions with the old data set names.
> - As an additional option we could use maybe 4 bytes for file version/generation handling.
>
> E.g. we have a file Just a test etc. which has the hash x'F296A5AE68F284954EBF47EC5EEFD72E', in the dir /First level dir/Second level dir/ which has the hash x'847FD35FD88274EC0EDA528E7CD7A65A'.
>
> So the 44 (?) bytes would maybe look like:
> x'00F296A5AE68F284954EBF47EC5EEFD72E847FD35FD88274EC0EDA528E7CD7A65A00'.
>
> ENQs would then also have the hash as keys, etc.
>
> Am I an idiot or just a typical programmer? :)
>
> Regards,
> Thomas Berg
> _
> Thomas Berg Specialist A M SWEDBANK

...
The obvious problem with this, of course, is that hash functions by their very nature (mapping a larger domain to a smaller range) cannot be one-to-one. They work reliably for such things as symbol table lookup only because you also have the full string available, both as the search argument and in the table, in order to resolve collisions when two strings hash to the same value. A hash value unaccompanied by the full string does not represent a unique string and is thus ambiguous.

Most programmers/users would not find it acceptable to request a read/update/enqueue on file a and have it give the same results as a reference to some unknown and unrelated file x, just because they happened to hash to the same value. If the hash function mapped to an equal or larger range, then it would be possible to have uniqueness; but then the hash value would require at least as many bits to represent it as the original string of symbols, and nothing would be gained by the substitution.

--
Joel C. Ewing, Fort Smith, AR
jremoveccapsew...@acm.org
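
The pigeonhole argument can be made concrete with a deliberately tiny hash. The sketch below is illustrative only: `tiny_hash` and `first_collision` are hypothetical helpers, and the hash is MD5 truncated to a single byte, so that 257 distinct names are guaranteed to contain a collision among only 256 possible values. A 16-byte hash has the same property in principle, just with a vastly larger range.

```python
import hashlib


def tiny_hash(name: str, n_bytes: int = 1) -> bytes:
    """MD5 truncated to n_bytes, to force a small range (illustration only)."""
    return hashlib.md5(name.encode("utf-8")).digest()[:n_bytes]


def first_collision(names, n_bytes: int = 1):
    """Return the first pair of distinct names with equal hashes, or None."""
    seen = {}
    for name in names:
        h = tiny_hash(name, n_bytes)
        if h in seen:
            return seen[h], name
        seen[h] = name
    return None


# 257 distinct names, 256 possible hash values: a collision must exist.
names = [f"FILE{i:04d}" for i in range(257)]
pair = first_collision(names)
print(pair)
```

Given only the one-byte hash, there is no way to tell which of the two names was meant, which is the ambiguity described above. Storing the full name alongside the hash removes the ambiguity, but then the name no longer fits in the hash-sized field.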
Re: z/OS file system and some Friday thoughts
-----Original Message-----
From: IBM Mainframe Discussion List [mailto:ibm-m...@bama.ua.edu] On Behalf Of Thomas Berg
Sent: Friday, May 21, 2010 8:03 AM
To: IBM-MAIN@bama.ua.edu
Subject: z/OS file system and some Friday thoughts

SNIPPAGE

> Am I an idiot or just a typical programmer? :)

door flung open
And there is a difference?
shields at maximum

Regards,
Steve Thompson
Re: z/OS file system and some Friday thoughts
Automatic QED.

On 5/21/10, Thompson, Steve steve_thomp...@stercomm.com wrote:
> -----Original Message-----
> From: IBM Mainframe Discussion List [mailto:ibm-m...@bama.ua.edu] On Behalf Of Thomas Berg
> Sent: Friday, May 21, 2010 8:03 AM
> To: IBM-MAIN@bama.ua.edu
> Subject: z/OS file system and some Friday thoughts
>
> SNIPPAGE
>
>> Am I an idiot or just a typical programmer? :)
>
> door flung open
> And there is a difference?
> shields at maximum
>
> Regards,
> Steve Thompson

--
It is no measure of health to be well adjusted to a profoundly sick society. -Krishnamurti
I am as you, in you, for you. One as you in all, as all, forever. My call is your call.
Re: z/OS file system and some Friday thoughts
---snip---
Am I an idiot or just a typical programmer? :)
door flung open
And there is a difference?
shields at maximum
---unsnip---

Also set phasers at stun. :-)

Rick