Re: z/OS file system and some Friday thoughts

2010-05-24 Thread Thomas Berg
 -Original Message-
 From: IBM Mainframe Discussion List [mailto:ibm-m...@bama.ua.edu] On
 Behalf Of Joel C. Ewing
 Sent: May 21, 2010 15:22
 To: IBM-MAIN@bama.ua.edu
 Subject: Re: z/OS file system and some Friday thoughts
 
 On 05/21/2010 08:03 AM, Thomas Berg wrote:
  As I have followed some threads lately, and also many
  times earlier, about the file system in general and PDS(E)s
  in particular, I got a (maybe OT) idea.  (Beware! :) )
 
  The goal is to be able to use any string in general as
  a data set name/file name, and in particular *nix-type
  names/structures.  And to do that natively in z/OS.
 
  As we have 44 bytes available in the catalog(s), I think we can
  do something like this:
 
  - Use 16 bytes for a hash (MD5/SHA-2) of the file name.
  - Use 16 bytes for a hash of the dir path.
  - Alternatively, use a 16-byte hash of the whole string of
  path and file name.  But I think that separate hashes would
  give some performance benefits when handling directories.
  - We probably must use an (initial?) byte with e.g. nulls to avoid
  collisions with the old data set names.
  - As an additional option we could use maybe 4 bytes for file
  version/generation handling.
 
  E.g. we have a file Just a test etc. which has the hash
  x'F296A5AE68F284954EBF47EC5EEFD72E', in the dir
  /First level dir/Second level dir/ which has the hash
  x'847FD35FD88274EC0EDA528E7CD7A65A'.
 
  So the 44 (?) bytes would maybe look like:
 
  x'00F296A5AE68F284954EBF47EC5EEFD72E847FD35FD88274EC0EDA528E7CD7A65A00'.
 
  ENQs would then also use the hash as keys etc.
 
 
  Am I an idiot or just a typical programmer ?  :)
  Regards,
  Thomas Berg
  _
  Thomas Berg   Specialist   A M   SWEDBANK
 ...
 
 The obvious problem with this of course is that hash functions by their
 very nature (mapping a larger domain to a smaller range) cannot be
 one-to-one.  They work reliably for such things as symbol table lookup
 only because you also have the full string available, both as the search
 argument and in the table in order to resolve collisions when two
 strings hash to the same value.  A hash value unaccompanied by the full
 string does not represent a unique string and is thus ambiguous.
 
 Most programmers/users would not find it acceptable to request a
 read/update/enqueue on file a and have it give the same results as a
 reference to some unknown and unrelated file x, just because they
 happened to hash to the same value.
 
 If the hash function mapped to an equal or larger range, then it would
 be possible to have uniqueness; but then the hash value would require
 more bits to represent it than the original string of symbols and
 nothing would be gained by the substitution.

I don't see the collision risk as a problem.  In my experience, collisions
are *very* rare.
That is understandable if you look at how the file name space is used:
names are not random strings of bits; they adhere very strongly to
alphabetic and numeric characters together with some special ones.
Only an extremely small part of the binary space is used.  (Which
is the point of this solution.)
Nevertheless, we must of course handle a collision when it happens, e.g. by
using a duplicate flag for it.  The rest is a SMOP.
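
To put a rough number on the collision risk, here is a back-of-the-envelope sketch in Python. It assumes the hash output is spread uniformly over 128 bits (an assumption, not something established in this thread) and uses the usual birthday approximation, which only holds while the result is small. The function name collision_probability is made up for this illustration.

```python
def collision_probability(n_names: int, hash_bits: int = 128) -> float:
    """Birthday approximation: p ~= n(n-1) / 2^(bits+1).

    Assumes a uniformly distributed hash; only meaningful while p << 1.
    """
    return n_names * (n_names - 1) / 2 ** (hash_bits + 1)


# Chance that any two names collide, for a few catalog sizes.
for n in (10**6, 10**9, 10**12):
    print(f"{n:>17,} names: p ~= {collision_probability(n):.3e}")
```

Under that assumption the risk is tiny but never zero, so some form of duplicate handling is still needed.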


 
Regards, 
Thomas Berg 
_ 
Thomas Berg   Specialist   A M   SWEDBANK 

--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@bama.ua.edu with the message: GET IBM-MAIN INFO
Search the archives at http://bama.ua.edu/archives/ibm-main.html


z/OS file system and some Friday thoughts

2010-05-21 Thread Thomas Berg
As I have followed some threads lately, and also many
times earlier, about the file system in general and PDS(E)s
in particular, I got a (maybe OT) idea.  (Beware! :) )

The goal is to be able to use any string in general as
a data set name/file name, and in particular *nix-type
names/structures.  And to do that natively in z/OS.

As we have 44 bytes available in the catalog(s), I think we can
do something like this:

- Use 16 bytes for a hash (MD5/SHA-2) of the file name.
- Use 16 bytes for a hash of the dir path.
- Alternatively, use a 16-byte hash of the whole string of
path and file name.  But I think that separate hashes would
give some performance benefits when handling directories.
- We probably must use an (initial?) byte with e.g. nulls to avoid
collisions with the old data set names.
- As an additional option we could use maybe 4 bytes for file
version/generation handling.

E.g. we have a file Just a test etc. which has the hash
x'F296A5AE68F284954EBF47EC5EEFD72E', in the dir
/First level dir/Second level dir/ which has the hash
x'847FD35FD88274EC0EDA528E7CD7A65A'.

So the 44 (?) bytes would maybe look like:
x'00F296A5AE68F284954EBF47EC5EEFD72E847FD35FD88274EC0EDA528E7CD7A65A00'.

ENQs would then also use the hash as keys etc.
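
A minimal sketch of the layout above, in Python. It assumes MD5 (whose digest is exactly 16 bytes) for both hashes, the file name hash first and the dir hash second, and a null byte at each end as in the example string. The function name catalog_key and the UTF-8 encoding are assumptions for illustration only; a real z/OS implementation would more likely hash EBCDIC bytes. It does not try to reproduce the example hash values above, since the algorithm and encoding behind them are not stated.

```python
import hashlib


def catalog_key(dir_path: str, file_name: str) -> bytes:
    """Build a key: null byte + file name hash + dir path hash + null byte."""
    file_hash = hashlib.md5(file_name.encode("utf-8")).digest()  # 16 bytes
    dir_hash = hashlib.md5(dir_path.encode("utf-8")).digest()    # 16 bytes
    return b"\x00" + file_hash + dir_hash + b"\x00"


key = catalog_key("/First level dir/Second level dir/", "Just a test etc.")
print(len(key))            # 34
print(key.hex().upper())
```

With this layout the key is 1 + 16 + 16 + 1 = 34 bytes, which leaves 10 of the 44 bytes free, enough for the optional 4-byte version/generation field.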


Am I an idiot or just a typical programmer ?  :)



Regards,
Thomas Berg
_
Thomas Berg   Specialist   A M   SWEDBANK




Re: z/OS file system and some Friday thoughts

2010-05-21 Thread Elardus Engelbrecht
Thomas Berg wrote:

[ ... lots of interesting things about hashes ... ] 

Curiosity question: Where are you going to store those hashes?

Am I an idiot or just a typical programmer ?  :)

You are NOT licensed to be an idiot. ;-D

Actually what you suggested could be very useful. Perhaps a SHARE 
submission?

Groete / Greetings
Elardus Engelbrecht



Re: z/OS file system and some Friday thoughts

2010-05-21 Thread Joel C. Ewing
On 05/21/2010 08:03 AM, Thomas Berg wrote:
 As I have followed some threads lately, and also many
 times earlier, about the file system in general and PDS(E)s
 in particular, I got a (maybe OT) idea.  (Beware! :) )
 
 The goal is to be able to use any string in general as
 a data set name/file name, and in particular *nix-type
 names/structures.  And to do that natively in z/OS.
 
 As we have 44 bytes available in the catalog(s), I think we can
 do something like this:
 
 - Use 16 bytes for a hash (MD5/SHA-2) of the file name.
 - Use 16 bytes for a hash of the dir path.
 - Alternatively, use a 16-byte hash of the whole string of
 path and file name.  But I think that separate hashes would
 give some performance benefits when handling directories.
 - We probably must use an (initial?) byte with e.g. nulls to avoid
 collisions with the old data set names.
 - As an additional option we could use maybe 4 bytes for file
 version/generation handling.
 
 E.g. we have a file Just a test etc. which has the hash
 x'F296A5AE68F284954EBF47EC5EEFD72E', in the dir
 /First level dir/Second level dir/ which has the hash
 x'847FD35FD88274EC0EDA528E7CD7A65A'.
 
 So the 44 (?) bytes would maybe look like:
 x'00F296A5AE68F284954EBF47EC5EEFD72E847FD35FD88274EC0EDA528E7CD7A65A00'.
 
 ENQs would then also use the hash as keys etc.
 
 
 Am I an idiot or just a typical programmer ?  :)
 Regards,
 Thomas Berg
 _
 Thomas Berg   Specialist   A M   SWEDBANK
...

The obvious problem with this of course is that hash functions by their
very nature (mapping a larger domain to a smaller range) cannot be
one-to-one.  They work reliably for such things as symbol table lookup
only because you also have the full string available, both as the search
argument and in the table in order to resolve collisions when two
strings hash to the same value.  A hash value unaccompanied by the full
string does not represent a unique string and is thus ambiguous.

Most programmers/users would not find it acceptable to request a
read/update/enqueue on file a and have it give the same results as a
reference to some unknown and unrelated file x, just because they
happened to hash to the same value.

If the hash function mapped to an equal or larger range, then it would
be possible to have uniqueness; but then the hash value would require
more bits to represent it than the original string of symbols and
nothing would be gained by the substitution.
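
The point about needing the full string can be illustrated with a small Python sketch. It uses a deliberately tiny 8-bit hash (the first byte of MD5, chosen only to force collisions quickly) and made-up names such as FILE0; none of this is a proposed implementation. With 257 distinct names and only 256 possible hash values, at least two names must share a value, and only the stored full name tells them apart.

```python
import hashlib
from collections import defaultdict


def tiny_hash(name: str) -> int:
    """Deliberately small 8-bit hash (first byte of MD5) to force collisions."""
    return hashlib.md5(name.encode("utf-8")).digest()[0]


# 257 distinct names into 256 possible hash values: by the pigeonhole
# principle at least two names must share a hash value.
names = [f"FILE{i}" for i in range(257)]
buckets = defaultdict(list)
for name in names:
    buckets[tiny_hash(name)].append(name)

collisions = {h: ns for h, ns in buckets.items() if len(ns) > 1}
print(len(collisions) > 0)  # True


def lookup(name: str) -> bool:
    """Resolve collisions by comparing against the full names in the bucket."""
    return name in buckets[tiny_hash(name)]
```

A lookup by hash value alone would return every name in a colliding bucket; the comparison against the full name in lookup is what removes the ambiguity.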

-- 
Joel C. Ewing, Fort Smith, AR    jremoveccapsew...@acm.org



Re: z/OS file system and some Friday thoughts

2010-05-21 Thread Thompson, Steve
-Original Message-
From: IBM Mainframe Discussion List [mailto:ibm-m...@bama.ua.edu] On
Behalf Of Thomas Berg
Sent: Friday, May 21, 2010 8:03 AM
To: IBM-MAIN@bama.ua.edu
Subject: z/OS file system and some Friday thoughts

SNIPPAGE


Am I an idiot or just a typical programmer ?  :)

door flung open

And there is a difference?

shields at maximum

Regards,
Steve Thompson



Re: z/OS file system and some Friday thoughts

2010-05-21 Thread Roberto Halais
Automatic QED.

On 5/21/10, Thompson, Steve steve_thomp...@stercomm.com wrote:

 -Original Message-
 From: IBM Mainframe Discussion List [mailto:ibm-m...@bama.ua.edu] On
 Behalf Of Thomas Berg
 Sent: Friday, May 21, 2010 8:03 AM
 To: IBM-MAIN@bama.ua.edu
 Subject: z/OS file system and some Friday thoughts

 SNIPPAGE


 Am I an idiot or just a typical programmer ?  :)

 door flung open

 And there is a difference?

 shields at maximum

 Regards,
 Steve Thompson





-- 
It is no measure of health to be well adjusted to a profoundly sick
society. -Krishnamurti

I am as you, in you, for you. One as you in all, as all, forever. My call
is your call.



Re: z/OS file system and some Friday thoughts

2010-05-21 Thread Rick Fochtman

---snip---


Am I an idiot or just a typical programmer ?  :)

door flung open

And there is a difference?

shields at maximum
 


-unsnip-
Also set phasers at stun. :-)

Rick
