Re: [git-users] How GIT stores data
Sharan Basappa writes:
> Is there a way to retrieve the previous version of the file (that is, F.1)?

It looks like "git fsck --unreachable" would output the hash of such a file. Then you can use "git cat-file" to get the contents of each object. You'll have to inspect the contents manually; as far as I know there's no record kept of where the file contents used to sit in the directory tree.

Dale

--
You received this message because you are subscribed to the Google Groups "Git for human beings" group.
To unsubscribe from this group and stop receiving emails from it, send an email to git-users+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
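As a rough illustration of why the F.1 content lingers after the commit, here is a toy Python model of an object store and a staging area. It is only a sketch under stated assumptions: the names `objects`, `index`, `add` and `commit` are made up for this example, and a plain SHA-1 of the content stands in for Git's real object ID.

```python
import hashlib

objects = {}  # toy object store: object id -> file content
index = {}    # toy staging area: path -> object id


def add(path, content):
    """Stage a file: store its content as a blob and record it in the index."""
    oid = hashlib.sha1(content).hexdigest()  # simplified ID, not Git's exact one
    objects[oid] = content
    index[path] = oid
    return oid


def commit():
    """Snapshot the index; the commit references only what is staged right now."""
    return dict(index)


f1 = add("F", b"first draft\n")    # F.1: staged, never committed
f2 = add("F", b"second draft\n")   # F.2: staged over F.1
snapshot = commit()

# Anything in the store that the snapshot does not reference is unreachable.
reachable = set(snapshot.values())
unreachable = [oid for oid in objects if oid not in reachable]

# The F.1 blob is still stored, but nothing points at it any more.
for oid in unreachable:
    print(oid, objects[oid])
```

In this model the F.1 content can still be read back by its ID, which is roughly the situation that "git fsck --unreachable" followed by "git cat-file" deals with in a real repository.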
Re: [git-users] How GIT stores data
On 2016-05-29, at 7:58 AM, Sharan Basappa wrote:

> Folks,
>
> Having started using Git, one more question ...
>
> I do some work on a file. Assume the file version is F.1. I think it is fairly done and I stage it (git add) but don't commit.
> Now later, I realize that I need to make some more changes to the file. I make changes (F.2), add and then commit.
>
> I assume that the commit object will only point to the latest file (F.2). Is this correct?
> Is there a way to retrieve the previous version of the file (that is, F.1)?
>
> Essentially, I am trying to get some intermediate version of a file even though I never committed it.
> If I can't get F.1, what happens to it as far as the Git repository is concerned?

The commit object will reference F.2.

A blob for F.1 is in Git's storage, but there is now nothing referencing it. Eventually it will be garbage collected and deleted. Until then, it is in there, somewhere, but no name or path points to it, so you cannot look it up the usual way.

---
Entertaining minecraft videos http://YouTube.com/keybounce
Re: [git-users] How GIT stores data
Folks,

Having started using Git, one more question ...

I do some work on a file. Assume the file version is F.1. I think it is fairly done and I stage it (git add) but don't commit.
Now later, I realize that I need to make some more changes to the file. I make changes (F.2), add and then commit.

I assume that the commit object will only point to the latest file (F.2). Is this correct?
Is there a way to retrieve the previous version of the file (that is, F.1)?

Essentially, I am trying to get some intermediate version of a file even though I never committed it.
If I can't get F.1, what happens to it as far as the Git repository is concerned?
Re: [git-users] How GIT stores data
- Original Message -
From: Sharan Basappa
To: Git for human beings
Cc: philipoak...@iee.org
Sent: Sunday, May 22, 2016 5:31 PM
Subject: Re: [git-users] How GIT stores data

> Dear Philip, Others,
>
> Thanks a lot. I have some follow-up questions.
>
> I am using a simple scenario to get additional clarity.
>
> 1) I have 4 files in my branch (a,b,c,d)
> 2) I modify a
> 3) I add a
> 4) I modify b,c
> 5) I add b,c
> 6) I commit
> 7) I modify d
> 8) I commit d
> 9) I push to remote
>
> Does step 6) above result in a single commit object (single SHA reference) or two?
>
> From a developer engineer's perspective, what does a commit signify? Does it mean some key development is complete?
> I ask this question to understand when a developer would do some development but would make multiple commits.
>
> Similarly, if during the push step it is found that the remote is ahead of local (and most likely requires merging), then does it mean anything with respect to the already computed SHA?
> I assume that the already computed SHA has no meaning.
>
> Thanks,

At step 6, the commit, you will get a single commit object (with a single sha1) created. Beneath that commit object sit a number of subordinate objects of different types, each with its own sha1, e.g. the trees that complete the overall snapshot, but as others have said these are small details.

Philip
Re: [git-users] How GIT stores data
Sharan Basappa writes:
> I am pretty much new to Git though I am using it for a couple of projects (without much understanding as such).
>
> In Git documents, it is mentioned that Git stores data as a stream of snapshots. Compared to other VCS tools, the only difference I am able to tell is that Git stores the entire file for each version while other VCS tools might store only differences.
>
> Can someone help me understand this?

Actually, you don't *need* to understand how it's done. You do need to understand that Git commands are organized around the idea that commits are a total copy of your project.

OTOH, you need to be careful. Some commands, particularly ones involving merging and "rebase", actually *do* think of commits not as snapshots but as the difference between the commit and its parent. That is how you can "reorder" two commits -- changing the commit order from A - B - C to A - D - E actually constructs new commits D and E, so that the difference between A and D is the same as the difference between B and C, and the difference between D and E is the same as the difference between A and B.

But even in this situation, what is *stored* is a sequence of commits (done with sophisticated compression) -- the merge or rebase command calculates the differences based on the contents of the old commits, then constructs a new set of commits that have the proper differences between them, and then stores the new commits.

Dale
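The reordering described above can be sketched in a few lines of Python. This is a simplified model, not what "git rebase" actually runs: snapshots are plain dictionaries from path to content, the helper names `diff` and `apply` are invented for this example, and differences are taken per whole file rather than per line, so conflicts never arise here.

```python
def diff(old, new):
    """Return the changes that turn snapshot `old` into `new` (None = deleted)."""
    changes = {}
    for path in old.keys() | new.keys():
        if old.get(path) != new.get(path):
            changes[path] = new.get(path)
    return changes


def apply(snapshot, changes):
    """Return a new snapshot with `changes` applied; the input is left untouched."""
    result = dict(snapshot)
    for path, content in changes.items():
        if content is None:
            result.pop(path, None)
        else:
            result[path] = content
    return result


# Original history A - B - C; every commit is a full snapshot (path -> content).
A = {"a.txt": "1", "b.txt": "1"}
B = {"a.txt": "2", "b.txt": "1"}   # A -> B edits a.txt
C = {"a.txt": "2", "b.txt": "2"}   # B -> C edits b.txt

# Reordered history A - D - E, built from differences but stored as snapshots.
D = apply(A, diff(B, C))   # A -> D carries the old B -> C change
E = apply(D, diff(A, B))   # D -> E carries the old A -> B change

print(D)
print(E)
```

Because the two changes touch different files, the final snapshot E ends up identical to C; only the intermediate snapshot differs.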
Re: [git-users] How GIT stores data
On 2016-05-22, at 9:31 AM, Sharan Basappa wrote:

> Dear Philip, Others,
>
> Thanks a lot. I have some follow-up questions.
>
> I am using a simple scenario to get additional clarity.
>
> 1) I have 4 files in my branch (a,b,c,d)
> 2) I modify a
> 3) I add a
> 4) I modify b,c
> 5) I add b,c
> 6) I commit
> 7) I modify d
> 8) I commit d
> 9) I push to remote
>
> Does step 6) above result in a single commit object (single SHA reference) or two?

One commit object. That commit object contains (eventually) a full directory tree reference to every file in the project as of that commit. Three files have been changed, so three SHA references are different. Because of that, each tree object that describes a directory containing a changed file will differ, and so will every tree above it, up to the root.

> From a developer engineer's perspective, what does a commit signify? Does it mean some key development is complete?
> I ask this question to understand when a developer would do some development but would make multiple commits.

Depends on your workflow.

Some people say "commit early, commit often".
Some people say "commit when it compiles".
I personally do "branch, then commit early, commit often, merge back when it passes tests". (Any given commit on the subbranch will probably compile, but almost certainly has half-implemented features that just won't work.)
Some people take this a step further, and say "Do your work on a branch, then commit a single squashed commit of the whole branch".

As far as I can tell, not only is there no "one true workflow", but figuring out which workflow is right for the situation you are in is exceedingly non-trivial, and I have not yet seen it well addressed in any of the guides for learning Git.

> Similarly, if during the push step it is found that the remote is ahead of local (and most likely requires merging), then does it mean anything with respect to the already computed SHA?
> I assume that the already computed SHA has no meaning.

I am not the person to ask here. My pushes only go to a repository I work on.

I would *love* to see a good writeup on how to manage, and contribute to, a repository that will accept updates from many people doing pull requests, letting you update as the master does, without losing your own "features in progress".

Yes, this is supposed to be Git's area, but it is hard to understand from the docs how to manage it, especially if it later turns out that the "upstream master" is going off in an incompatible direction, and you want to take over managing a fork that does something else and accepts pull requests from others.

It doesn't help that, as far as I can tell, "Pull Requests" are actually things that Github and Bitbucket do and are not part of core Git.

---
Entertaining minecraft videos http://YouTube.com/keybounce
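To see how a change to a file ripples up through the directory objects, here is a small Python sketch of hashing a directory tree. The serialization is a toy one, not Git's real tree format, the function names are invented for this example, and the layout (b and c in a hypothetical "src" directory, d in "docs") is an assumption, since the thread does not say where the four files live.

```python
import hashlib


def hash_blob(content):
    """Toy ID for file content."""
    return hashlib.sha1(b"blob:" + content).hexdigest()


def hash_tree(tree):
    """Toy ID for a directory given as {name: bytes or dict}; dicts are subdirectories."""
    lines = []
    for name in sorted(tree):
        entry = tree[name]
        if isinstance(entry, dict):
            lines.append("tree %s %s" % (hash_tree(entry), name))
        else:
            lines.append("blob %s %s" % (hash_blob(entry), name))
    # A directory's ID depends on the names and IDs of everything inside it.
    return hashlib.sha1("\n".join(lines).encode()).hexdigest()


before = {"a": b"a1", "src": {"b": b"b1", "c": b"c1"}, "docs": {"d": b"d1"}}
after = {"a": b"a2", "src": {"b": b"b2", "c": b"c2"}, "docs": {"d": b"d1"}}

print("root changed:", hash_tree(before) != hash_tree(after))
print("src changed: ", hash_tree(before["src"]) != hash_tree(after["src"]))
print("docs changed:", hash_tree(before["docs"]) != hash_tree(after["docs"]))
```

Only the directories on the path from a changed file to the root get new IDs; the untouched "docs" directory keeps its old one and would not need to be stored again.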
Re: [git-users] How GIT stores data
Dear Philip, Others,

Thanks a lot. I have some follow-up questions.

I am using a simple scenario to get additional clarity.

1) I have 4 files in my branch (a,b,c,d)
2) I modify a
3) I add a
4) I modify b,c
5) I add b,c
6) I commit
7) I modify d
8) I commit d
9) I push to remote

Does step 6) above result in a single commit object (single SHA reference) or two?

From a developer engineer's perspective, what does a commit signify? Does it mean some key development is complete?
I ask this question to understand when a developer would do some development but would make multiple commits.

Similarly, if during the push step it is found that the remote is ahead of local (and most likely requires merging), then does it mean anything with respect to the already computed SHA?
I assume that the already computed SHA has no meaning.

Thanks,
Re: [git-users] How GIT stores data
- Original Message -
From: Sharan Basappa
To: Git for human beings
Cc: philipoak...@iee.org
Sent: Saturday, May 21, 2016 5:02 PM
Subject: Re: [git-users] How GIT stores data

>> Do note that there is no file date information stored in the tree/blob data. The only place dates are recorded is at the point of the commit. Likewise the only file permission stored is the *nix executable bit.
>
> Dear Philip,
>
> So, the snapshot is taken when the commit is done?
> Also, what does the SHA that is returned when a commit is complete indicate? Is it the overall signature for the snapshot or a commit identifier?

Almost, but not quite. Each file snapshot is taken at the time it was added to the index. So most were not taken at the time of the commit, but were taken earlier (perhaps at an earlier commit if the file was not modified ;-)

Each sha1 is a hash over the content of that object. Because the tree, commit and tag objects each contain a list of the subordinate sha1s that they cover, you get complete cryptographic security over the corresponding history.

The commit object will list the previous commit's sha1, the sha1 of the top level tree, and the date it was committed, which fully wraps up the history and status up to that point.

Minor aside: the commit object actually has two dates, one for the originating author, and one for the authorising committer (the manager). For a local project they can be the same person; for big projects (Git, Linux, etc.) they will be different.

--
Philip
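For readers who want to check the "hash over the content" claim by hand, the following Python sketch computes the ID Git gives a blob in a SHA-1 repository: the SHA-1 of a short header ("blob", the size in bytes, a NUL byte) followed by the file content. The function name `git_blob_id` is made up for this example.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Object ID of a blob with this content, for SHA-1 based repositories."""
    header = b"blob %d\0" % len(content)  # object type, size in bytes, NUL
    return hashlib.sha1(header + content).hexdigest()


# The ID depends only on the bytes, not on file name, date or permissions.
print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (the empty blob)
print(git_blob_id(b"same bytes\n") == git_blob_id(b"same bytes\n"))
```

The result should match what "git hash-object" prints for a file with the same bytes, which is one way to cross-check the sketch.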
Re: [git-users] How GIT stores data
> Do note that there is no file date information stored in the tree/blob data. The only place dates are recorded is at the point of the commit. Likewise the only file permission stored is the *nix executable bit.

Dear Philip,

So, the snapshot is taken when the commit is done?
Also, what does the SHA that is returned when a commit is complete indicate? Is it the overall signature for the snapshot or a commit identifier?

Thanks
Re: [git-users] How GIT stores data
- Original Message -
From: Sharan Basappa
To: Git for human beings
Sent: Saturday, May 21, 2016 4:53 AM
Subject: Re: [git-users] How GIT stores data

>> Sure. Think of Git as a three-layered tool.
>>
>> The top layer is a polished interface, called "Porcelain", that is designed to easily manage snapshots and compares and merges of filesystem trees.
>>
>> The bottom layer, on the other hand, is a filesystem. Files in this filesystem are read-only. The names of files are fixed based on their content. So identical files have the same name, and are stored once in the file system.
>>
>> Built on top of these fixed files that do not change are directory objects, which map human-understandable filenames to internal names. And, since a directory is itself a filesystem object, if everything in a directory is identical, then the directory entry is identical, and only stored once.
>>
>> Based on this, it's pretty easy to see that if two commits are completely identical, then the only thing that differs is the commit object itself, which will have a time stamp and user comment.
>>
>> (The middle layer, by the way, consists of low-level tools designed to work with the files in this filesystem.)
>
> Dear Michael & Philip,
>
> Thanks. I think I am getting the hang of it.
>
> So, when an existing file is modified then I assume that Git computes its signature and then checks if such a file already exists.
> Is this correct? I ask this because my change can be such that it is the same as one that was previously committed (sort of reverting a file).
>
> The other thing I understand is that Git always stores every unique instance of a file as it is and not its differences with a reference file.
>
> One more question I have is on the file system. As such when I clone a repository, I get the full repository and files locally.
> So, when I clone a repository, I have the full repository and one set of project files (depending on the branch I have checked out) locally.
>
> Thanks,

Git cheats regarding the initial detection of file modification - it just uses the file system's modified time (mtime), file size, and a couple of other easy to determine values that are reliable indicators. Even if it gets it wrong, because Git is snapshot based, it would have taken a snapshot anyway! Plus, because the snapshot is identical, it wouldn't have actually added anything to the repository (it already had the copy ;-). These file content snapshots are called 'blobs'.

Note that at each stage it is the content, not the metadata, that is stored, so renaming a file doesn't change its content and nothing new is stored at that level (it's the same blob). However, at the 'directory tree' level (what 'ls' or 'dir' list) the content (of the tree's description) has changed, and that is stored there (these are called 'trees').

So you can have as many copies of a LICENCE or COPYING file as you like and all that extra content takes no extra file space (it's one single blob), with only a small amount of space for the tree, and if that doesn't change from commit to commit, then there is no need for another copy...

Do note that there is no file date information stored in the tree/blob data. The only place dates are recorded is at the point of the commit. Likewise the only file permission stored is the *nix executable bit.

In answer to the clone question: yes, you get a full copy. You can check out the file tree at any point in the project's history (using the various revision specification methods - more fun), though more frequently it is the tree at the tip of one of the branches.

There is also a whole load of stuff to discover about 'remote tracking branches' (which are local branches which track a remote system), and realising that they are actually local, and just part of your local history tree, and that it's only a naming convention.

--
Philip
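The point about duplicate and renamed files can be tried out with a toy content-addressed store in Python. This is a sketch for illustration only: `blobs`, `store_blob` and `make_tree` are hypothetical names, the ID is a plain SHA-1 of the content rather than Git's real object ID, and the licence text and the new file name are placeholders.

```python
import hashlib

blobs = {}  # blob id -> content; each distinct content is stored only once


def store_blob(content):
    """Store content under an ID derived from the content itself."""
    oid = hashlib.sha1(content).hexdigest()  # simplified ID for the sketch
    blobs.setdefault(oid, content)           # no-op if we already hold it
    return oid


def make_tree(files):
    """Build a directory listing: file name -> blob id."""
    return {name: store_blob(content) for name, content in files.items()}


licence = b"placeholder licence text\n"

# Two files with identical content share one blob.
tree1 = make_tree({"LICENCE": licence, "COPYING": licence})

# Renaming LICENCE changes the directory listing, but adds no new blob.
tree2 = make_tree({"LICENCE.txt": licence, "COPYING": licence})

print(len(blobs))      # number of stored blobs
print(tree1 == tree2)  # whether the two listings are the same
```

Here the store ends up holding a single blob, while the two directory listings differ, which mirrors the blob versus tree distinction described above.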
Re: [git-users] How GIT stores data
Thank you, Michael.

> Correct.
>
> Now, just a few months ago, I had these same questions. I hope I have learned this well enough that I can teach it accurately.
Re: [git-users] How GIT stores data
On 2016-05-20, at 8:53 PM, Sharan Basappa wrote:

>> Sure. Think of Git as a three-layered tool.
>>
>> The top layer is a polished interface, called "Porcelain", that is designed to easily manage snapshots and compares and merges of filesystem trees.
>>
>> The bottom layer, on the other hand, is a filesystem. Files in this filesystem are read-only. The names of files are fixed based on their content. So identical files have the same name, and are stored once in the file system.
>>
>> Built on top of these fixed files that do not change are directory objects, which map human-understandable filenames to internal names. And, since a directory is itself a filesystem object, if everything in a directory is identical, then the directory entry is identical, and only stored once.
>>
>> Based on this, it's pretty easy to see that if two commits are completely identical, then the only thing that differs is the commit object itself, which will have a time stamp and user comment.
>>
>> (The middle layer, by the way, consists of low-level tools designed to work with the files in this filesystem.)
>
> Dear Michael & Philip,
>
> Thanks. I think I am getting the hang of it.
>
> So, when an existing file is modified then I assume that Git computes its signature and then checks if such a file already exists.
> Is this correct? I ask this because my change can be such that it is the same as one that was previously committed (sort of reverting a file).

Yep. Once Git knows the signature (a "hash"), it also knows a filename that identifies a file with that hash. If it sees the filename in use, it knows it has a duplicate.

Reverting changes is common enough that there are commands to do it. They are among the most complicated ones, given all the warnings in the manual :-)

> The other thing I understand is that Git always stores every unique instance of a file as it is and not its differences with a reference file.

Correct.

> One more question I have is on the file system. As such when I clone a repository, I get the full repository and files locally.
> So, when I clone a repository, I have the full repository and one set of project files (depending on the branch I have checked out) locally.

Correct.

Now, just a few months ago, I had these same questions. I hope I have learned this well enough that I can teach it accurately.

---
Entertaining minecraft videos http://YouTube.com/keybounce
Re: [git-users] How GIT stores data
> Sure. Think of Git as a three-layered tool.
>
> The top layer is a polished interface, called "Porcelain", that is designed to easily manage snapshots and compares and merges of filesystem trees.
>
> The bottom layer, on the other hand, is a filesystem. Files in this filesystem are read-only. The names of files are fixed based on their content. So identical files have the same name, and are stored once in the file system.
>
> Built on top of these fixed files that do not change are directory objects, which map human-understandable filenames to internal names. And, since a directory is itself a filesystem object, if everything in a directory is identical, then the directory entry is identical, and only stored once.
>
> Based on this, it's pretty easy to see that if two commits are completely identical, then the only thing that differs is the commit object itself, which will have a time stamp and user comment.
>
> (The middle layer, by the way, consists of low-level tools designed to work with the files in this filesystem.)

Dear Michael & Philip,

Thanks. I think I am getting the hang of it.

So, when an existing file is modified then I assume that Git computes its signature and then checks if such a file already exists.
Is this correct? I ask this because my change can be such that it is the same as one that was previously committed (sort of reverting a file).

The other thing I understand is that Git always stores every unique instance of a file as it is and not its differences with a reference file.

One more question I have is on the file system. As such when I clone a repository, I get the full repository and files locally.
So, when I clone a repository, I have the full repository and one set of project files (depending on the branch I have checked out) locally.

Thanks,
Re: [git-users] How GIT stores data
From: "Michael" <keybou...@gmail.com>
To: <git-users@googlegroups.com>
Sent: Friday, May 20, 2016 7:28 PM
Subject: Re: [git-users] How GIT stores data

> On 2016-05-20, at 11:10 AM, Sharan Basappa <sharan.basa...@gmail.com> wrote:
>
>> Folks,
>>
>> I am pretty much new to Git though I am using it for a couple of projects (without much understanding as such).
>>
>> In Git documents, it is mentioned that Git stores data as a stream of snapshots. Compared to other VCS tools, the only difference I am able to tell is that Git stores the entire file for each version while other VCS tools might store only differences.
>>
>> Can someone help me understand this?
>
> Sure. Think of Git as a three-layered tool.
>
> The top layer is a polished interface, called "Porcelain", that is designed to easily manage snapshots and compares and merges of filesystem trees.
>
> The bottom layer, on the other hand, is a filesystem. Files in this filesystem are read-only. The names of files are fixed based on their content. So identical files have the same name, and are stored once in the file system.
>
> Built on top of these fixed files that do not change are directory objects, which map human-understandable filenames to internal names. And, since a directory is itself a filesystem object, if everything in a directory is identical, then the directory entry is identical, and only stored once.
>
> Based on this, it's pretty easy to see that if two commits are completely identical, then the only thing that differs is the commit object itself, which will have a time stamp and user comment.
>
> (The middle layer, by the way, consists of low-level tools designed to work with the files in this filesystem.)

Sharan,

In addition to Michael's description, Git does have a method for compressing its repository, which it uses where possible, called pack files. So rather than recording changes (as noted), Git will record complete snapshots, and then compress the full history of all revisions in one go (see some of Linus's laws).

The compressed repository (with all its history) can be smaller than the checked out work tree, so it is efficient to hold the whole snapshot.

There is also a whole load of sha1 hash keys that pervade and validate the history, which is good, as you always know that if your hash key has the same value as their hash key then they are seeing exactly the same history and content as you, no matter how far away and unknown they are to you. (And if the keys differ, all bets are off!)

Philip
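A rough feel for why compressing many full snapshots together works so well can be had with Python's zlib module. This is only an analogy under stated assumptions: real pack files combine delta encoding between similar objects with zlib, while this sketch simply compresses twenty made-up versions of one file, first one by one and then as a single stream.

```python
import zlib

# Twenty snapshots of one file; each version appends a line to the previous one.
lines = ["line %d of the file\n" % i for i in range(40)]
versions = ["".join(lines[: 20 + i]).encode() for i in range(20)]

raw_total = sum(len(v) for v in versions)                 # stored as-is
separate = sum(len(zlib.compress(v)) for v in versions)   # each one compressed alone
together = len(zlib.compress(b"".join(versions)))         # compressed in one go

print("raw bytes:             ", raw_total)
print("compressed separately: ", separate)
print("compressed together:   ", together)
```

Compressing the versions together lets the compressor reuse the text the snapshots share, so full snapshots of slowly changing files cost far less than their raw size suggests.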
Re: [git-users] How GIT stores data
On 2016-05-20, at 11:10 AM, Sharan Basappa wrote:

> Folks,
>
> I am pretty much new to Git though I am using it for a couple of projects (without much understanding as such).
>
> In Git documents, it is mentioned that Git stores data as a stream of snapshots. Compared to other VCS tools, the only difference I am able to tell is that Git stores the entire file for each version while other VCS tools might store only differences.
>
> Can someone help me understand this?

Sure. Think of Git as a three-layered tool.

The top layer is a polished interface, called "Porcelain", that is designed to easily manage snapshots and compares and merges of filesystem trees.

The bottom layer, on the other hand, is a filesystem. Files in this filesystem are read-only. The names of files are fixed based on their content. So identical files have the same name, and are stored once in the file system.

Built on top of these fixed files that do not change are directory objects, which map human-understandable filenames to internal names. And, since a directory is itself a filesystem object, if everything in a directory is identical, then the directory entry is identical, and only stored once.

Based on this, it's pretty easy to see that if two commits are completely identical, then the only thing that differs is the commit object itself, which will have a time stamp and user comment.

(The middle layer, by the way, consists of low-level tools designed to work with the files in this filesystem.)

---
Entertaining minecraft videos http://YouTube.com/keybounce
[git-users] How GIT stores data
Folks,

I am pretty much new to Git though I am using it for a couple of projects (without much understanding as such).

In Git documents, it is mentioned that Git stores data as a stream of snapshots. Compared to other VCS tools, the only difference I am able to tell is that Git stores the entire file for each version while other VCS tools might store only differences.

Can someone help me understand this?