Re: [git-users] SHA-1 checksum

2016-08-13 Thread Sharan Basappa

>
> Hello All,
>

I just wanted to say a big thank you. Some of the examples here have really
helped me get the hang of some fundamentals.


-- 
You received this message because you are subscribed to the Google Groups "Git 
for human beings" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to git-users+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [git-users] SHA-1 checksum

2016-08-10 Thread Dale R. Worley
Sharan Basappa  writes:
> The other question: when it is time for Git to pick up the file
> associated with 100644 blob 0215040f90f133f999bac86eede7565c6d09b93d,
> does it start computing the checksums of all the objects?

The point is that it doesn't have to *search* for the contents of the
file, because those contents are stored in

.git/objects/02/15040f90f133f999bac86eede7565c6d09b93d

The hash of an object tells Git where the object is stored.

This is why a *cryptographic* hash must be used: it makes it practically
impossible for two different objects to have the same hash, which would
require that they both be stored in the same file.

There is the complication that a file's contents are stored compressed,
so you can't directly read the file, which is why you need to use a Git
command to get the proper file contents.
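A minimal Python sketch of both points above: the hash alone gives the file's location, and the file has to be decompressed before it can be read. It assumes Git's loose-object layout (a `<type> <size>` header plus a NUL byte in front of the contents, SHA-1 over all of that, zlib-compressed, stored under `objects/<first two hex digits>/<remaining 38>`). The helper names and the throwaway directory are illustrative only, and objects that live only in pack files are not handled.

```python
import hashlib
import os
import tempfile
import zlib


def object_path(git_dir: str, sha: str) -> str:
    """Map a 40-hex-digit object name to its loose-object file: the first
    two hex digits name a subdirectory, the remaining 38 name the file."""
    return os.path.join(git_dir, "objects", sha[:2], sha[2:])


def write_loose_object(git_dir: str, obj_type: str, data: bytes) -> str:
    """Store data as a loose object: '<type> <size>', NUL, then the contents,
    named by the SHA-1 of all that and zlib-compressed on disk."""
    raw = f"{obj_type} {len(data)}".encode() + b"\0" + data
    sha = hashlib.sha1(raw).hexdigest()
    path = object_path(git_dir, sha)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(zlib.compress(raw))
    return sha


def read_loose_object(git_dir: str, sha: str) -> tuple[str, bytes]:
    """Go straight to the file named by the hash (no searching), decompress
    it and split off the header. Pack files are not handled here."""
    with open(object_path(git_dir, sha), "rb") as f:
        raw = zlib.decompress(f.read())
    header, _, data = raw.partition(b"\0")
    obj_type, size = header.decode().split(" ")
    if int(size) != len(data):
        raise ValueError("corrupt object: size mismatch")
    return obj_type, data


# Usage: round-trip one object through a throwaway directory standing in for .git
demo_dir = tempfile.mkdtemp()
demo_sha = write_loose_object(demo_dir, "blob", b"hello world\n")
print(demo_sha)                               # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
print(read_loose_object(demo_dir, demo_sha))  # ('blob', b'hello world\n')
```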

There is also the complication that "pack files" can be made that
contain many objects.  Each pack file has a corresponding index listing
all the hashes of the objects in the pack file.  Clearly, the indexes
are arranged in some way that allows Git to quickly find what objects
are in which pack file, but I do not know the details.
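A simplified Python sketch of such a lookup, assuming the commonly documented layout of a Git pack index: object names kept in sorted order, plus a 256-entry fan-out table recording how many names have a first byte less than or equal to each value. `ToyPackIndex` and its made-up offsets are hypothetical; a real `.idx` file also has a header, CRCs and offsets into the `.pack` file.

```python
import bisect
import hashlib
from typing import Dict, Optional


class ToyPackIndex:
    """Sorted object names plus a fan-out table; each name maps to a
    made-up offset standing in for its position in a pack file."""

    def __init__(self, entries: Dict[bytes, int]):
        self.names = sorted(entries)                     # 20-byte binary SHA-1s
        self.offsets = [entries[n] for n in self.names]
        # fanout[b] = how many names have a first byte <= b
        self.fanout = [0] * 256
        for name in self.names:
            self.fanout[name[0]] += 1
        for b in range(1, 256):
            self.fanout[b] += self.fanout[b - 1]

    def find(self, hex_sha: str) -> Optional[int]:
        name = bytes.fromhex(hex_sha)
        # The fan-out table narrows the search to names sharing the first byte...
        lo = self.fanout[name[0] - 1] if name[0] > 0 else 0
        hi = self.fanout[name[0]]
        # ...and a binary search finishes the job: no scanning, no re-hashing.
        i = bisect.bisect_left(self.names, name, lo, hi)
        if i < hi and self.names[i] == name:
            return self.offsets[i]
        return None


# Usage: index 1000 fake objects, then look one up by its hex name.
shas = [hashlib.sha1(f"object {i}".encode()).digest() for i in range(1000)]
idx = ToyPackIndex({sha: 100 * i for i, sha in enumerate(shas)})
print(idx.find(shas[42].hex()))  # 4200
```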

Dale



Re: [git-users] SHA-1 checksum

2016-08-09 Thread Konstantin Khomoutov
On Mon, 8 Aug 2016 20:21:47 -0700 (PDT)
Sharan Basappa  wrote:

> > Well, there are three main types of objects in Git repos: blobs,
> > trees and commits (annotated tags are a fourth, less common kind).
> > Files are stored as blobs.  Blobs have no "file names" attached to
> > them; in fact, they keep no associated metadata at all.  Since
> > humans routinely manipulate data kept in files using hierarchical
> > file systems, Git mirrors this approach by using tree objects.  A
> > tree object serves the same purpose a directory does on a file
> > system: it maps human-defined names of files to their contents.  So
> > a tree object contains a set of entries -- each representing a
> > single file or a subdirectory.  Each entry has three "fields": a
> > (simplified) file mode, the hash value of the entry's contents (its
> > address, that is) and the human-friendly name -- taken from the
> > source filesystem.  Subdirectory entries refer to other tree objects
> > and file entries refer to blobs.
[...]
> So, all three object types are referenced by their SHA hash
> values and looked up using these values.
> This includes blobs, trees & commit objects.

Yes, this is correct.

Git never uses the names of files and directories as found in the work
tree to look up the bits of data it stores.  Path lookups *do* happen --
say, when you run something like

  git log -- path/to/some/file

but they happen like this:
1) ... Fetch the next commit object;
2) Fetch the root tree object it references,
   parse it to find an entry named "path", get its SHA-1 name.
3) Fetch the tree object identified in step (2),
   parse it to find an entry named "to", ...
...and so on, so in the end the actual data is always looked up in the
object store using its SHA-1 name.
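The steps above can be sketched in Python against a toy in-memory object store. The store, the `put`/`lookup_path` helpers and the use of `repr()` to name objects are hypothetical simplifications (not Git's real object encoding); the point is only that every step fetches an object by its hash and scans that one tree for the next name.

```python
import hashlib
from typing import Dict, List, Optional, Tuple, Union

# Toy object store: hex name -> object. A tree is a list of
# (mode, type, sha, name) entries; a blob is bytes.
Entry = Tuple[str, str, str, str]
Store = Dict[str, Union[bytes, List[Entry]]]


def put(store: Store, obj: Union[bytes, List[Entry]]) -> str:
    # Stand-in naming scheme: SHA-1 of repr(), not Git's object format.
    sha = hashlib.sha1(repr(obj).encode()).hexdigest()
    store[sha] = obj
    return sha


def lookup_path(store: Store, root_tree: str, path: str) -> Optional[str]:
    """Resolve 'a/b/c' one component at a time, starting from a commit's
    root tree: every step is a lookup by hash, never by file name."""
    current = root_tree
    for component in path.split("/"):
        obj = store[current]
        if not isinstance(obj, list):
            return None              # tried to descend into a blob
        for _mode, _kind, sha, name in obj:
            if name == component:
                current = sha
                break
        else:
            return None              # no entry with that name
    return current


# Usage: build path/to/some/file bottom-up, then resolve it from the root.
store: Store = {}
file_sha = put(store, b"some file contents\n")
some = put(store, [("100644", "blob", file_sha, "file")])
to = put(store, [("040000", "tree", some, "some")])
path_tree = put(store, [("040000", "tree", to, "to")])
root = put(store, [("040000", "tree", path_tree, "path")])
print(store[lookup_path(store, root, "path/to/some/file")])  # b'some file contents\n'
```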



Re: [git-users] SHA-1 checksum

2016-08-08 Thread Sharan Basappa

>
> Well, there are three main types of objects in Git repos: blobs,
> trees and commits (annotated tags are a fourth, less common kind).
> Files are stored as blobs.  Blobs have no "file names" attached to
> them; in fact, they keep no associated metadata at all.  Since humans
> routinely manipulate data kept in files using hierarchical file
> systems, Git mirrors this approach by using tree objects.  A tree
> object serves the same purpose a directory does on a file system: it
> maps human-defined names of files to their contents.  So a tree
> object contains a set of entries -- each representing a single file
> or a subdirectory.  Each entry has three "fields": a (simplified) file
> mode, the hash value of the entry's contents (its address, that is)
> and the human-friendly name -- taken from the source filesystem.
> Subdirectory entries refer to other tree objects and file entries
> refer to blobs.
>

Dear Konstantin,

Thanks a lot. So, all three object types are referenced by their SHA hash
values and looked up using these values.
This includes blobs, trees & commit objects.



Re: [git-users] SHA-1 checksum

2016-08-08 Thread Konstantin Khomoutov
On Mon, 8 Aug 2016 09:00:06 -0700 (PDT)
Sharan Basappa  wrote:

[...]
> > The contents of file "-NOTES" would be in
> > .git/objects/02/15040f90f133f999bac86eede7565c6d09b93d if the object
> > were stored loose.  In this case, though, that object is in one of
> > the "pack" files, so git-cat-file has to read through the indexes of
> > the pack files to find it.
> >
> > The critical idea is that files are stored by their *contents*,
> > not their *names*.  Any particular blob of content has an eternally
> > unique name (its hash), which will be the same in any repository
> > containing a blob with the same bytes.  "tree" objects are used to
> > catalog the names of files and their contents.
[...]
> To clarify,
> 
> 100644 blob 0215040f90f133f999bac86eede7565c6d09b93d	-NOTES
> 
> Instead of storing a reference to the actual file, Git stores a
> reference to its content (in the form of the checksum
> 0215040f90f133f999bac86eede7565c6d09b93d)?
> Is -NOTES a reference stored by Git?  I am wondering where Git gets
> the file name from if it does not store it somewhere originally.

Well, there are three main types of objects in Git repos: blobs, trees
and commits (annotated tags are a fourth, less common kind).  Files are
stored as blobs.  Blobs have no "file names" attached to them; in fact,
they keep no associated metadata at all.  Since humans routinely
manipulate data kept in files using hierarchical file systems, Git
mirrors this approach by using tree objects.  A tree object serves the
same purpose a directory does on a file system: it maps human-defined
names of files to their contents.  So a tree object contains a set of
entries -- each representing a single file or a subdirectory.  Each
entry has three "fields": a (simplified) file mode, the hash value of
the entry's contents (its address, that is) and the human-friendly name
-- taken from the source filesystem.  Subdirectory entries refer to
other tree objects and file entries refer to blobs.
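A minimal Python sketch of how such entries become a tree object with its own hash. It assumes the commonly documented encoding: each entry is `<mode> <name>`, a NUL byte, then the 20 raw bytes of the hash; entries are sorted by name, with subdirectories compared as if their name ended in `/`; the object name is the SHA-1 of `tree <size>`, a NUL byte, then that body. Raw trees write the directory mode as `40000`, while `git cat-file -p` displays it as `040000`. The helper names are illustrative.

```python
import hashlib
from typing import List, Tuple


def hash_object(obj_type: str, data: bytes) -> str:
    """Object name: SHA-1 over '<type> <size>', a NUL byte, then the data."""
    return hashlib.sha1(f"{obj_type} {len(data)}".encode() + b"\0" + data).hexdigest()


def serialize_tree(entries: List[Tuple[str, str, str]]) -> bytes:
    """Raw tree body from (mode, name, hex_sha) entries: '<mode> <name>',
    NUL, then the 20 raw hash bytes, sorted by name with subdirectories
    (mode 40000) compared as if their name ended in '/'."""
    def sort_key(entry: Tuple[str, str, str]) -> bytes:
        mode, name, _ = entry
        return name.encode() + (b"/" if mode == "40000" else b"")

    return b"".join(f"{mode} {name}".encode() + b"\0" + bytes.fromhex(sha)
                    for mode, name, sha in sorted(entries, key=sort_key))


readme = hash_object("blob", b"hello world\n")
root = hash_object("tree", serialize_tree([("100644", "README", readme)]))
print(hash_object("tree", serialize_tree([])))  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
```

The empty tree's name is a handy sanity check, since it is the same in every Git repository.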

Each commit object refers to exactly one tree object representing the
root of the project.

Conceptually, a commit is created by starting from the project's root
directory and going all the way down -- into subdirectories,
considering all the tracked files on each level and creating
appropriate tree and blob entries for everything found.
Of course, the real implementation is much more complex, so that it
runs as fast as possible.
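A minimal Python sketch of that conceptual bottom-up process, using the same tree encoding and sort order that Git documents. The nested-dict model of a directory, `snapshot()` and the in-memory `store` are hypothetical; real Git builds trees from its index rather than by re-walking the work tree, and it also handles executable bits, symlinks and so on.

```python
import hashlib
from typing import Dict, Union

# Hypothetical model: a directory is a dict mapping names to bytes (a file)
# or to another dict (a subdirectory).
Directory = Dict[str, Union[bytes, "Directory"]]


def hash_object(obj_type: str, data: bytes) -> str:
    return hashlib.sha1(f"{obj_type} {len(data)}".encode() + b"\0" + data).hexdigest()


def snapshot(directory: Directory, store: Dict[str, bytes]) -> str:
    """Hash files first, then the trees that list them, bottom-up; keep each
    object once in `store` and return the name of the root tree."""
    entries = []
    for name, value in directory.items():
        if isinstance(value, dict):
            mode, sha = "40000", snapshot(value, store)       # subdirectory -> tree
        else:
            mode, sha = "100644", hash_object("blob", value)  # file -> blob
            store.setdefault(sha, value)
        entries.append((mode, name, sha))
    # Git's order: by name, subdirectories compared as if they ended in '/'
    entries.sort(key=lambda e: e[1].encode() + (b"/" if e[0] == "40000" else b""))
    body = b"".join(f"{m} {n}".encode() + b"\0" + bytes.fromhex(s) for m, n, s in entries)
    tree_sha = hash_object("tree", body)
    store.setdefault(tree_sha, body)
    return tree_sha


store: Dict[str, bytes] = {}
v1 = snapshot({"README": b"hello world\n",
               "src": {"main.c": b"int main(void) { return 0; }\n"}}, store)
v2 = snapshot({"README": b"hello world\n",
               "src": {"main.c": b"int main(void) { return 1; }\n"}}, store)
print(v1 != v2)    # True: changing one file changes the root tree's name
print(len(store))  # 7: the unchanged README blob is stored only once
```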

I think you should read the famous (and old) "Git from the bottom up"
document [1].  It takes an unusual approach to explaining Git by
actually dealing with its data model -- rather than with the commands
used to manipulate the repository.

1. https://jwiegley.github.io/git-from-the-bottom-up/



Re: [git-users] SHA-1 checksum

2016-08-08 Thread Sharan Basappa


> 2) At its very bottom, Git implements a so-called
> "content-addressable filesystem".  Its chief principle is that every
> unique piece of data is stored exactly once, and these pieces are
> identified by their contents.  Since using the contents "as is" would
> be unwieldy, each piece is addressed using -- again -- the
> cryptographic hash calculated over its contents.  This is what lets
> Git implement its paradigm where each commit refers to a complete
> state of all the project's files: even though in a typical big project
> something like 99.9% of the content of each commit is the same as in
> its parent commit, each unique chunk of information -- a file, or a
> tree referring to a set of files -- is stored in the repository
> exactly once.


Content-addressable filesystem. Nicely put. So it's a sort of
content-addressable memory (CAM) where the contents are unique.

Thanks a lot, 



Re: [git-users] SHA-1 checksum

2016-08-08 Thread Sharan Basappa


> Consider one of my Git repositories.  The file .git/HEAD contains
>
> ref: refs/heads/hobgoblin
>
> That points to the file .git/refs/heads/hobgoblin, which contains the
> hash of the commit which is the tip of the "hobgoblin" branch:
>
> 92f8f718eb9b19f921f20283e55c56e8dc66ed10
>
> That points to the file
> .git/objects/92/f8f718eb9b19f921f20283e55c56e8dc66ed10.  That file's
> contents aren't plain ASCII (they're compressed), so you have to use
> "git cat-file -p 92f8f718eb9b19f921f20283e55c56e8dc66ed10" to read
> them:
>
> tree d5d1ad293f8fdd4a4a4e0e9a73c5c3c851126c22
> parent 39c83b086e141bb00d32737a4e2aae675d795f44
> author Dale R. Worley  1470669963 -0400
> committer Dale R. Worley  1470669963 -0400
>
> ...
>
> So the hash of the tree object is
> d5d1ad293f8fdd4a4a4e0e9a73c5c3c851126c22 and the hash of the one parent
> commit is 39c83b086e141bb00d32737a4e2aae675d795f44.  The tree object is
> in .git/objects/d5/d1ad293f8fdd4a4a4e0e9a73c5c3c851126c22, but again,
> you have to use git-cat-file to read it:
>
> 100644 blob 0215040f90f133f999bac86eede7565c6d09b93d	-NOTES
> 100644 blob ef62bfd5a8e81c8ca13372b2436bccf1c0698185	-NOTES.MYOB
> 100644 blob 65dda34dadf753dbfc791b5811f3cd437a666cac	-NOTES.XA.recovery
> 100644 blob 88182ec16035fd4d77c0c1312ce1510f2f8da4b2	-NOTES.XB.recovery
> 100644 blob 73415b6e2ebcd6a384874c0ab40ec70a5112db18	-NOTES.freeze
> 100644 blob 3a4fb8ec6e7c0219c4d7ab002eaaa84abae2c72d	-NOTES.gleaning
> 040000 tree c21923c2647ecec7d627a49e51b4e8b5d19344b4	.a68g
> 100644 blob f9a4c46f50234a11f9ad283973ed2f11a4758f2f	.aspell.en.prepl
> 100644 blob 182c2739a5cc69a322a41723d4423ed1d8a6266e	.aspell.en.pws
> ...
>
> The contents of file "-NOTES" would be in
> .git/objects/02/15040f90f133f999bac86eede7565c6d09b93d if the object
> were stored loose.  In this case, though, that object is in one of the
> "pack" files, so git-cat-file has to read through the indexes of the
> pack files to find it.
>
> The critical idea is that files are stored by their *contents*, not
> their *names*.  Any particular blob of content has an eternally unique
> name (its hash), which will be the same in any repository containing a
> blob with the same bytes.  "tree" objects are used to catalog the names
> of files and their contents.
>

Dear Philip, Dale,

Thanks. I think this example helps me a lot.

To clarify,

100644 blob 0215040f90f133f999bac86eede7565c6d09b93d	-NOTES

Instead of storing a reference to the actual file, Git stores a reference
to its content (in the form of the checksum
0215040f90f133f999bac86eede7565c6d09b93d)?
Is -NOTES a reference stored by Git?  I am wondering where Git gets the
file name from if it does not store it somewhere originally.

The other question: when it is time for Git to pick up the file
associated with 100644 blob 0215040f90f133f999bac86eede7565c6d09b93d,
does it start computing the checksums of all the objects?

Similarly, for tree object d5d1ad293f8fdd4a4a4e0e9a73c5c3c851126c22, does
one again have to calculate the checksums of all tree objects in order to
get the following contents?

100644 blob 0215040f90f133f999bac86eede7565c6d09b93d	-NOTES
100644 blob ef62bfd5a8e81c8ca13372b2436bccf1c0698185	-NOTES.MYOB
100644 blob 65dda34dadf753dbfc791b5811f3cd437a666cac	-NOTES.XA.recovery
100644 blob 88182ec16035fd4d77c0c1312ce1510f2f8da4b2	-NOTES.XB.recovery
100644 blob 73415b6e2ebcd6a384874c0ab40ec70a5112db18	-NOTES.freeze
100644 blob 3a4fb8ec6e7c0219c4d7ab002eaaa84abae2c72d	-NOTES.gleaning
040000 tree c21923c2647ecec7d627a49e51b4e8b5d19344b4	.a68g
100644 blob f9a4c46f50234a11f9ad283973ed2f11a4758f2f	.aspell.en.prepl
100644 blob 182c2739a5cc69a322a41723d4423ed1d8a6266e	.aspell.en.pws

Thanks a lot



Re: [git-users] SHA-1 checksum

2016-08-08 Thread Konstantin Khomoutov
On Sun, 7 Aug 2016 09:26:30 -0700 (PDT)
Sharan Basappa  wrote:

> I would like to know why Git calculates the checksum of a file.
> Typically, a checksum is used for the purpose of integrity.

Well, Git does this for two reasons:

1) It's what makes "D" in the "DVCS" ("Distributed Version Control
System") possible.  When two Git instances exchange histories from
their repositories over the wire, they need to have a way to figure out
what parts of them they share.  Now suppose that the user of the first
repository created a file containing the string "Hello world" and named
that file "foo.txt".  The user of the second repository created a file
with identical contents but named it "bar.txt" and placed it in a
directory named "stuff".  If we look at file names only, these files
are clearly different.  But they have identical contents, and that is
what DVCSes exchange with each other.

Enter cryptographic hashes.  They have two major properties:
* Identical sets of data "compress" to identical hash values.
* No two different sets of data compress to identical hash values
  (well, in fact it's theoretically possible for real-world hash
  functions to fail to keep this invariant, which is called
  "a collision", but such an event is highly improbable in real-world
  applications).

So cryptographic hashes can neatly serve as short "handles" to chunks
of data of arbitrary size: with my toy example of the string "Hello
world" it's not quite obvious, but a cryptographic hash is perfectly
able to uniquely identify the contents of a multi-megabyte file as well.
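The `foo.txt` / `stuff/bar.txt` example can be checked with a few lines of Python. This assumes Git's blob naming rule (SHA-1 over `blob <size>`, a NUL byte, then the raw content); the two "repositories" are just dictionaries for illustration.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Name Git gives a file's contents: SHA-1 of 'blob <size>', NUL, content.
    The file's name and location play no part in it."""
    return hashlib.sha1(b"blob %d\0" % len(content) + content).hexdigest()


# Repository 1 has "foo.txt"; repository 2 has "stuff/bar.txt".
repo1 = {"foo.txt": b"Hello world"}
repo2 = {"stuff/bar.txt": b"Hello world"}

h1 = git_blob_hash(repo1["foo.txt"])
h2 = git_blob_hash(repo2["stuff/bar.txt"])
print(h1 == h2)                              # True: same bytes, same hash
print(git_blob_hash(b"Hello world!") == h1)  # False: one extra byte changes it
```

Because the hash ignores file names, two repositories can tell they already share this content just by comparing hashes.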

2) At its very bottom, Git implements a so-called
"content-addressable filesystem".  Its chief principle is that every
unique piece of data is stored exactly once, and these pieces are
identified by their contents.  Since using the contents "as is" would be
unwieldy, each piece is addressed using -- again -- the cryptographic
hash calculated over its contents.  This is what lets Git implement its
paradigm where each commit refers to a complete state of all the
project's files: even though in a typical big project something like
99.9% of the content of each commit is the same as in its parent commit,
each unique chunk of information -- a file, or a tree referring to a set
of files -- is stored in the repository exactly once.
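A toy Python sketch of the "store each unique piece exactly once" principle. `ContentStore` is hypothetical and keeps everything in a dictionary keyed by a Git-style blob hash; the two "commits" are plain name-to-hash mappings standing in for tree objects.

```python
import hashlib


class ContentStore:
    """Toy content-addressable store: data is keyed by its own hash, so
    storing the same bytes twice adds nothing new."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()
        self.objects.setdefault(key, data)  # already present -> nothing stored
        return key

    def get(self, key: str) -> bytes:
        return self.objects[key]


store = ContentStore()
# Two "commits" of a three-file project; only b.txt changes in between.
commit1 = {name: store.put(body) for name, body in
           {"a.txt": b"alpha\n", "b.txt": b"beta\n", "c.txt": b"gamma\n"}.items()}
commit2 = {name: store.put(body) for name, body in
           {"a.txt": b"alpha\n", "b.txt": b"beta, edited\n", "c.txt": b"gamma\n"}.items()}
print(len(store.objects))  # 4 unique blobs for 6 file versions
```

Each "commit" still describes a complete snapshot, yet the unchanged files cost nothing extra.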



Re: [git-users] SHA-1 checksum

2016-08-08 Thread Dale R. Worley
Sharan Basappa  writes:
> So, if Git stores files using just their checksums, then
>
> a) how does it look up (or retrieve) a specific file in the database?
> For example, if it wants to find a file in the database, does it take
> the checksum and start computing the checksum of every file in its
> database and compare?
> This looks pretty costly & rather unnecessary to me.
>
> b) how does it keep track of the file names that are required when it
> gives us a working copy?

Consider one of my Git repositories.  The file .git/HEAD contains

ref: refs/heads/hobgoblin

That points to the file .git/refs/heads/hobgoblin, which contains the
hash of the commit which is the tip of the "hobgoblin" branch:

92f8f718eb9b19f921f20283e55c56e8dc66ed10

That points to the file
.git/objects/92/f8f718eb9b19f921f20283e55c56e8dc66ed10.  That file's
contents aren't plain ASCII (they're compressed), so you have to use
"git cat-file -p 92f8f718eb9b19f921f20283e55c56e8dc66ed10" to read them:

tree d5d1ad293f8fdd4a4a4e0e9a73c5c3c851126c22
parent 39c83b086e141bb00d32737a4e2aae675d795f44
author Dale R. Worley  1470669963 -0400
committer Dale R. Worley  1470669963 -0400

...

So the hash of the tree object is
d5d1ad293f8fdd4a4a4e0e9a73c5c3c851126c22 and the hash of the one parent
commit is 39c83b086e141bb00d32737a4e2aae675d795f44.  The tree object is
in .git/objects/d5/d1ad293f8fdd4a4a4e0e9a73c5c3c851126c22, but again,
you have to use git-cat-file to read it:

100644 blob 0215040f90f133f999bac86eede7565c6d09b93d	-NOTES
100644 blob ef62bfd5a8e81c8ca13372b2436bccf1c0698185	-NOTES.MYOB
100644 blob 65dda34dadf753dbfc791b5811f3cd437a666cac	-NOTES.XA.recovery
100644 blob 88182ec16035fd4d77c0c1312ce1510f2f8da4b2	-NOTES.XB.recovery
100644 blob 73415b6e2ebcd6a384874c0ab40ec70a5112db18	-NOTES.freeze
100644 blob 3a4fb8ec6e7c0219c4d7ab002eaaa84abae2c72d	-NOTES.gleaning
040000 tree c21923c2647ecec7d627a49e51b4e8b5d19344b4	.a68g
100644 blob f9a4c46f50234a11f9ad283973ed2f11a4758f2f	.aspell.en.prepl
100644 blob 182c2739a5cc69a322a41723d4423ed1d8a6266e	.aspell.en.pws
...
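Since the file names live only in tree objects, it may help to see how a raw tree decodes into a listing like the one above. This Python sketch assumes the raw entry format `<mode> <name>`, a NUL byte, then a 20-byte binary hash, and mimics how `git cat-file -p` pads the directory mode `40000` to `040000`; `parse_tree` is a hypothetical helper, and the sample body is encoded by hand from two of the entries shown.

```python
from typing import List, Tuple


def parse_tree(body: bytes) -> List[Tuple[str, str, str, str]]:
    """Decode a raw tree object body into (mode, type, hex_sha, name) rows,
    roughly what 'git cat-file -p <tree>' prints."""
    rows = []
    i = 0
    while i < len(body):
        space = body.index(b" ", i)         # mode ends at the first space
        nul = body.index(b"\0", space)      # name ends at the NUL byte
        mode = body[i:space].decode()
        name = body[space + 1:nul].decode()
        sha = body[nul + 1:nul + 21].hex()  # then 20 raw bytes of SHA-1
        kind = {"40000": "tree", "160000": "commit"}.get(mode, "blob")
        rows.append((mode.zfill(6), kind, sha, name))  # display 40000 as 040000
        i = nul + 21
    return rows


# Two entries from the listing above, encoded by hand in raw tree format.
body = (b"100644 -NOTES\0" + bytes.fromhex("0215040f90f133f999bac86eede7565c6d09b93d")
        + b"40000 .a68g\0" + bytes.fromhex("c21923c2647ecec7d627a49e51b4e8b5d19344b4"))
for mode, kind, sha, name in parse_tree(body):
    print(f"{mode} {kind} {sha}\t{name}")
```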

The contents of file "-NOTES" would be in
.git/objects/02/15040f90f133f999bac86eede7565c6d09b93d if the object
were stored loose.  In this case, though, that object is in one of the
"pack" files, so git-cat-file has to read through the indexes of the
pack files to find it.

The critical idea is that files are stored by their *contents*, not
their *names*.  Any particular blob of content has an eternally unique
name (its hash), which will be the same in any repository containing a
blob with the same bytes.  "tree" objects are used to catalog the names
of files and their contents.
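The walk from `.git/HEAD` to the tip commit described above can be sketched in Python. It is a minimal sketch assuming loose ref files only: refs packed into `.git/packed-refs` are not read, and `resolve_head` plus the throwaway directory are illustrative.

```python
import os
import tempfile


def resolve_head(git_dir: str) -> str:
    """Follow HEAD to a commit hash: a symbolic ref ('ref: refs/heads/...')
    points at a ref file; a detached HEAD holds the hash directly."""
    with open(os.path.join(git_dir, "HEAD")) as f:
        head = f.read().strip()
    while head.startswith("ref: "):
        ref_path = os.path.join(git_dir, *head[len("ref: "):].split("/"))
        with open(ref_path) as f:
            head = f.read().strip()
    return head


# Usage: recreate the two files from the example in a throwaway directory.
demo_git = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_git, "refs", "heads"))
with open(os.path.join(demo_git, "HEAD"), "w") as f:
    f.write("ref: refs/heads/hobgoblin\n")
with open(os.path.join(demo_git, "refs", "heads", "hobgoblin"), "w") as f:
    f.write("92f8f718eb9b19f921f20283e55c56e8dc66ed10\n")
print(resolve_head(demo_git))  # 92f8f718eb9b19f921f20283e55c56e8dc66ed10
```

From there, the commit's hash leads directly to `.git/objects/92/...` (or to a pack index) with no searching.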

Dale



Re: [git-users] SHA-1 checksum

2016-08-08 Thread Philip Oakley


- Original Message -
From: Sharan Basappa
> Philip Oakley wrote:
>> You have it in one.
>>
>> Yes, that is the reason that git computes the sha1 of the file's
>> contents - it provides integrity, veracity and non-repudiation (the
>> last one is still true, though cryptanalysis is getting better, so sha1
>> is no longer recommended, and Git is looking at how to progress to
>> newer crypto-hashes).
>> Once Git has the sha1s of the files in a directory, it does the same
>> again for the 'file' that lists the file names, mode bits and their
>> contents' sha1s, and ever onwards up the trees to the commit, which
>> lists the sha1s of its parents.
>>
>> So if you have the sha1 of the tip of a branch, such as master, and
>> you have a repo that holds that sha1, then you have full cryptographic
>> assurance that your copy (with all its history) is identical to the
>> originator's - your own Dali, Rembrandt, Gauguin, hanging in your
>> hall... and it isn't even a replica, it's the real thing!
>
> Dear Philip, Michael,
>
> Thanks. It's true that checksums like SHA give a virtually unique
> signature for any file. But where things start getting confusing (to
> me) is when I read "In fact, Git stores everything in its database not
> by file name but by the hash value of its contents."

Correct, in the .git/objects folder you will see those new objects stored
as ab/cdef01234 etc.

> This is from the Pro Git book.
>
> So, if Git stores files using just their checksums, then
>
> a) how does it look up (or retrieve) a specific file in the database?
> For example, if it wants to find a file in the database, does it take
> the checksum and start computing the checksum of every file in its
> database and compare?

You will see in my reply that there is a 'next level' file which has the
lists of names to associate with the sha1 hashes it needs. These are the
ones called 'tree' objects.

> This looks pretty costly & rather unnecessary to me.

You are looking at this from the wrong side. It's about speed of
reconstruction when you are getting a specific revision back from the
store. Don't forget that Git normally works on the revision of the
complete project, not just some little file.

> b) how does it keep track of the file names that are required when it
> gives us a working copy?

Starting at the commit sha1, it looks up that sha1's file, which lists the
top-level tree sha1. Expand that as the top-level directory names, with
sha1s for each next-level directory or file. It's almost identical to how
a file system works! I think Linus, who wrote Git, once wrote a little OS,
nothing big ;-)

Once you have all that nicely fixed in your head, you can then look (if
you are interested in the next layer of digging) at pack files, which are
Git's way of compressing all those sha1 files that have lots of
repetition, because not much changes from one rev to the next (or at
least not much should, because the changes within a commit should be
small! - it's part of what makes Git work).

> Thanks again ...

No problems




Re: [git-users] SHA-1 checksum

2016-08-08 Thread Sharan Basappa

>
> You have it in one.
>
> Yes, that is the reason that git computes the sha1 of the file's
> contents - it provides integrity, veracity and non-repudiation (the last
> one is still true, though cryptanalysis is getting better, so sha1 is no
> longer recommended, and Git is looking at how to progress to newer
> crypto-hashes).
> Once Git has the sha1s of the files in a directory, it does the same
> again for the 'file' that lists the file names, mode bits and their
> contents' sha1s, and ever onwards up the trees to the commit, which
> lists the sha1s of its parents.
>
> So if you have the sha1 of the tip of a branch, such as master, and you
> have a repo that holds that sha1, then you have full cryptographic
> assurance that your copy (with all its history) is identical to the
> originator's - your own Dali, Rembrandt, Gauguin, hanging in your
> hall... and it isn't even a replica, it's the real thing!
>

Dear Philip, Michael,

Thanks. It's true that checksums like SHA give a virtually unique
signature for any file. But where things start getting confusing (to me)
is when I read "In fact, Git stores everything in its database not by
file name but by the hash value of its contents."

This is from the Pro Git book.

So, if Git stores files using just their checksums, then

a) how does it look up (or retrieve) a specific file in the database?
For example, if it wants to find a file in the database, does it take the
checksum and start computing the checksum of every file in its database
and compare?
This looks pretty costly & rather unnecessary to me.

b) how does it keep track of the file names that are required when it
gives us a working copy?

Thanks again ...



Re: [git-users] SHA-1 checksum

2016-08-07 Thread Michael

On 2016-08-07, at 9:26 AM, Sharan Basappa  wrote:

> Hi,
> 
> I would like to know why Git calculates the checksum of a file.
> Typically, a checksum is used for the purpose of integrity.
> 
> An example would really help.

An example? OK. Back when something else was using a simple CRC, someone tried
to replace a file with another, bypassing the normal history system. The CRC
was not good enough to detect it; so, something was needed that was good enough
to detect/stop this.

But more importantly: the hash is the filename of the file. It is critical that
the hash be good enough that you won't get duplicate filenames. CRC doesn't do
that. SHA-1 does.
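To see why a CRC is too short to serve as a file name, here is a Python birthday search for two different inputs with the same CRC-32. With only 2^32 possible values, a collision is expected after roughly 80,000 random inputs. The inputs are pseudo-random (derived with SHA-256) because that estimate assumes random-looking data; `find_crc32_collision` is a hypothetical helper, not part of any tool.

```python
import hashlib
import zlib
from typing import Optional, Tuple


def find_crc32_collision(limit: int = 2_000_000) -> Optional[Tuple[bytes, bytes]]:
    """Birthday search: keep generating inputs until two different ones
    share a CRC-32 value, or give up after `limit` tries."""
    seen = {}
    for i in range(limit):
        data = hashlib.sha256(str(i).encode()).digest()[:16]  # pseudo-random input
        crc = zlib.crc32(data)
        if crc in seen:
            return seen[crc], data
        seen[crc] = data
    return None


pair = find_crc32_collision()
if pair is not None:
    a, b = pair
    print(a.hex(), "and", b.hex(), "share CRC-32", hex(zlib.crc32(a)))
    # SHA-1 still tells them apart; a 160-bit hash makes this search hopeless.
    print(hashlib.sha1(a).digest() != hashlib.sha1(b).digest())  # True
```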

The checksum has to be good enough to make a unique filename in normal use.
It does not have to be good enough to guarantee non-alteration, but that's a 
really good secondary; it does have to be good enough to detect accidental 
damage (such as memory/disk/network/driver/etc corruption).

Now, a secondary benefit of the whole "layer upon layer" approach: the hash of
the last commit is only valid if every file and commit to date is accurate. If
you know the hash of your last commit (20 bytes, i.e. 40 hex digits), and you
can validate all the hashes in the past, then you know that nothing has altered
any file outside of the git mechanism.
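A much-simplified Python sketch of detecting accidental damage by re-hashing: every stored object is hashed again and compared with its name. The dictionary store and `find_corrupt` are hypothetical; `git fsck` performs a far more thorough check than this.

```python
import hashlib
from typing import Dict, List, Tuple


def object_name(obj_type: str, data: bytes) -> str:
    return hashlib.sha1(f"{obj_type} {len(data)}".encode() + b"\0" + data).hexdigest()


def find_corrupt(store: Dict[str, Tuple[str, bytes]]) -> List[str]:
    """Return the names of objects whose contents no longer hash to their name."""
    return [name for name, (obj_type, data) in store.items()
            if object_name(obj_type, data) != name]


store: Dict[str, Tuple[str, bytes]] = {}
for content in (b"hello world\n", b"second file\n"):
    store[object_name("blob", content)] = ("blob", content)

# Simulate on-disk corruption: two bits flip in one object ('e' -> 'f').
victim = object_name("blob", b"second file\n")
store[victim] = ("blob", b"second filf\n")
print(find_corrupt(store) == [victim])  # True
```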


---
Entertaining minecraft videos
http://YouTube.com/keybounce



Re: [git-users] SHA-1 checksum

2016-08-07 Thread Philip Oakley
Sharan,

You have it in one. 

Yes, that is the reason that git computes the sha1 of the file's contents - it
provides integrity, veracity and non-repudiation (the last one is still true,
though cryptanalysis is getting better, so sha1 is no longer recommended, and
Git is looking at how to progress to newer crypto-hashes).
Once Git has the sha1s of the files in a directory, it does the same again for
the 'file' that lists the file names, mode bits and their contents' sha1s, and
ever onwards up the trees to the commit, which lists the sha1s of its parents.

So if you have the sha1 of the tip of a branch, such as master, and you have a
repo that holds that sha1, then you have full cryptographic assurance that your
copy (with all its history) is identical to the originator's - your own Dali,
Rembrandt, Gauguin, hanging in your hall... and it isn't even a replica, it's
the real thing!
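The "ever onwards up the trees to the commit" chain can be illustrated with a toy Python sketch: each "commit" names the hash of one file's contents and the hash of its parent, so quietly editing an old version changes the tip hash. The `content`/`parent` body layout and `tip_of` are hypothetical; real commits name a root tree plus author, committer and message.

```python
import hashlib
from typing import List


def object_name(obj_type: str, data: bytes) -> str:
    return hashlib.sha1(f"{obj_type} {len(data)}".encode() + b"\0" + data).hexdigest()


def tip_of(history: List[bytes]) -> str:
    """Chain toy commits: each one records the hash of that version's file
    contents and the hash of its parent commit (a hypothetical layout)."""
    parent = ""
    for content in history:
        body = f"content {object_name('blob', content)}\n".encode()
        if parent:
            body += f"parent {parent}\n".encode()
        parent = object_name("commit", body)
    return parent


original = [b"v1\n", b"v2\n", b"v3\n"]
tampered = [b"v1, quietly edited\n", b"v2\n", b"v3\n"]
print(tip_of(original) == tip_of(tampered))  # False: the edit shows up at the tip
```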

Philip

It's turtles all the way down.

  - Original Message - 
  From: Sharan Basappa 
  To: Git for human beings 
  Sent: Sunday, August 07, 2016 5:26 PM
  Subject: [git-users] SHA-1 checksum


  Hi,


  I would like to know why Git calculates the checksum of a file.
  Typically, a checksum is used for the purpose of integrity.


  An example would really help.


  Regards,
