Re: [git-users] How GIT stores data

2016-06-01 Thread Dale R. Worley
Sharan Basappa  writes:
> Is there a way to retrieve the previous version of the file (that is, F.1)?

It looks like "git fsck --unreachable" would output the hash of such a
file.  Then you can use "git cat-file" to get the contents of each
object.  You'll have to inspect the contents manually; as far as I know
there's no record kept of where the file contents used to sit in the
directory tree.
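
A minimal sketch of that procedure in Python, assuming git is on the PATH. The
helper names git() and unreachable_blobs() are made up for illustration; only
"git fsck --unreachable" and "git cat-file" come from the suggestion above.

```python
import subprocess


def git(repo, *args):
    """Run one git command inside `repo` and return its standard output."""
    done = subprocess.run(
        ["git", "-C", repo, *args],
        check=True, capture_output=True, text=True,
    )
    return done.stdout


def unreachable_blobs(repo):
    """Map each unreachable blob hash to the contents stored under it."""
    found = {}
    for line in git(repo, "fsck", "--unreachable").splitlines():
        fields = line.split()
        # fsck reports such objects as: unreachable <type> <hash>
        if len(fields) == 3 and fields[:2] == ["unreachable", "blob"]:
            found[fields[2]] = git(repo, "cat-file", "-p", fields[2])
    return found
```

Calling unreachable_blobs(".") from the top of a work tree returns every
candidate, and you still have to read through them to find the one that was
F.1. The sketch decodes contents as text, so binary files would need a
bytes-based variant.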

Dale

-- 
You received this message because you are subscribed to the Google Groups "Git 
for human beings" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to git-users+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [git-users] How GIT stores data

2016-05-29 Thread Michael

On 2016-05-29, at 7:58 AM, Sharan Basappa  wrote:

> Folks,
> 
> Having started using GIT, one more question ...
> 
> I do some work on a file. Assume the file version is F.1. I think it is 
> fairly done and I stage it (git add) but don't commit.
> Now later, I realize that I need to make some more changes to the file. I 
> make changes (F.2), add and then commit.
> 
> I assume that the commit object will only point to the latest file (F.2). Is 
> this correct?
> Is there a way to retrieve the previous version of the file (that is, F.1)?
> 
> Essentially, I am trying to get some intermediate version of a file even 
> though I never committed it.
> If I can't get F.1, what happens to it as far as the git repository is 
> concerned?

The commit object will reference F.2.

A blob for F.1 is in git's storage, but there is now nothing referencing it.
Eventually it will be garbage collected and deleted.
Until then, it is in there, somewhere, but you have no idea where nor any way 
to recover it.

---
Entertaining minecraft videos
http://YouTube.com/keybounce



Re: [git-users] How GIT stores data

2016-05-29 Thread Sharan Basappa
Folks,

Having started using GIT, one more question ...

I do some work on a file. Assume the file version is F.1. I think it is 
fairly done and I stage it (git add) but don't commit.
Now later, I realize that I need to make some more changes to the file. I 
make changes (F.2), add and then commit.

I assume that the commit object will only point to the latest file (F.2). 
Is this correct?
Is there a way to retrieve the previous version of the file (that is, F.1)?

Essentially, I am trying to get some intermediate version of a file even 
though I never committed it.
If I can't get F.1, what happens to it as far as the git repository is 
concerned?



Re: [git-users] How GIT stores data

2016-05-25 Thread Philip Oakley
- Original Message - 
  From: Sharan Basappa 
  To: Git for human beings 
  Cc: philipoak...@iee.org 
  Sent: Sunday, May 22, 2016 5:31 PM
  Subject: Re: [git-users] How GIT stores data


  Dear Philip, Others,


  Thanks a lot. I have some follow-up questions.


  I am using a simple scenario to get additional clarity.


  1) I have 4 files in my branch (a,b,c,d)
  2) I modify a
  3) I add a
  4) I modify b,c
  5) I add b,c
  6) I commit
  7) I modify d
  8) I commit d
  9) I push to remote


  Does step 6) above result in single commit object (single SHA reference) or 
two?


  From a developer engineer's perspective, what does commit signify? Does it 
mean some key development is complete?
  I ask this question to understand when a developer would do some development 
but would make multiple commits.


  Similarly, if during push step, if it is found that remote is ahead of local 
(and most likely requires merging) then does it mean anything wrt to the 
already computed SHA?
  I assume that already computed SHA has no meaning.


  Thanks,

At step 6, the commit, you will get a single "commit object sha1" created.

Hidden beneath that commit object are a number of subordinate object sha1s of 
different types, e.g. those of the trees that complete the overall 
snapshot, but as others have said these are small details.

Philip



Re: [git-users] How GIT stores data

2016-05-23 Thread Dale R. Worley
Sharan Basappa  writes:
> I am pretty much new to Git though I am using it for a couple of projects 
> (without much understanding as such).
>
> In Git documents, it is mentioned that Git stores data as a stream of
> snapshots. Compared to other VCS tools, the only difference I am able
> to tell is that Git stores the entire file for each versions while
> other VCS tools might store only differences.
>
> Can someone help me understand this?

Actually, you don't *need* to understand how it's done.  You do need to
understand that Git commands are organized around the idea that commits
are a total copy of your project.

OTOH, you need to be careful.  Some commands, particularly ones
involving merging and "rebase" actually *do* think of commits not as
snapshots but as the difference between the commit and its parent.  That
is how you can "reorder" two commits -- changing the commit order from A
- B - C is actually constructing new commits D and E so that in the new
commit order A - D - E, the difference between A and D is the same as
the difference between B and C, and the difference between D and E is
the difference between A and B.

But even in this situation, what is *stored* is a sequence of commits
(done with sophisticated compression) -- the merge or rebase command
calculates the differences based on the contents of the old commits, and
then constructs a new set of commits that have the proper differences
between them, and then stores the new commits.
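
To make the A - B - C to A - D - E reordering concrete, here is a toy sketch
in Python. It is only a model under stated assumptions: a snapshot is a plain
dict of path to contents, and diff() and apply() are made-up helpers working
on whole files. Real merges and rebases work on line-level changes and can hit
conflicts, which this sketch ignores.

```python
def diff(old, new):
    """Describe how to turn snapshot `old` into `new`.

    A snapshot is a dict of path -> contents; None marks a deleted path.
    """
    changes = {path: text for path, text in new.items() if old.get(path) != text}
    changes.update({path: None for path in old if path not in new})
    return changes


def apply(snapshot, changes):
    """Return a new snapshot with `changes` applied; the input is untouched."""
    result = dict(snapshot)
    for path, text in changes.items():
        if text is None:
            result.pop(path, None)
        else:
            result[path] = text
    return result


# Original history A - B - C, each commit stored as a complete snapshot.
A = {"README": "hello\n"}
B = {"README": "hello\n", "b.txt": "from B\n"}
C = {"README": "hello\n", "b.txt": "from B\n", "c.txt": "from C\n"}

# Reordered history A - D - E, built from the computed differences.
D = apply(A, diff(B, C))   # the A-to-D change equals the B-to-C change
E = apply(D, diff(A, B))   # the D-to-E change equals the A-to-B change
```

D and E are brand new snapshots; A, B and C are never altered, which matches
the point that only commits are stored and differences are computed on demand.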

Dale



Re: [git-users] How GIT stores data

2016-05-22 Thread Michael

On 2016-05-22, at 9:31 AM, Sharan Basappa  wrote:

> Dear Philip, Others,
> 
> Thanks a lot. I have some follow-up questions.
> 
> I am using a simple scenario to get additional clarity.
> 
> 1) I have 4 files in my branch (a,b,c,d)
> 2) I modify a
> 3) I add a
> 4) I modify b,c
> 5) I add b,c
> 6) I commit
> 7) I modify d
> 8) I commit d
> 9) I push to remote
> 
> Does step 6) above result in single commit object (single SHA reference) or 
> two?

One commit object. That commit object references (through its top-level tree) 
a full directory tree listing every file in the project as of that commit.

Three files have been changed, so three blob SHA references are different. 
Because of that, each tree object describing a directory that contains a 
changed file will differ, and so will every tree above it, up to the 
top-level tree.

> From a developer engineer's perspective, what does commit signify? Does it 
> mean some key development is complete?
> I ask this question to understand when a developer would do some development 
> but would make multiple commits.

Depends on your workflow.

Some people say "commit early, commit often".
Some people say "commit when it compiles"

I personally do "branch, then commit early, commit often, merge back when it 
passes tests".

(Any given commit on the subbranch will probably compile, but almost certainly 
has half-implemented features that just won't work.)

Some people take this a step further, and say "Do your work on a branch, then 
commit a single squashed commit of the whole branch".

As far as I can tell, not only is there no "one true workflow", trying to 
figure out what workflow is right for the situation you are in is exceedingly 
non-trivial, and I have not seen it well-addressed in any of the guides for 
learning GIT that I have seen yet.

> Similarly, if during push step, if it is found that remote is ahead of local 
> (and most likely requires merging) then does it mean anything wrt to the 
> already computed SHA?
> I assume that already computed SHA has no meaning.

I am not the person to ask here. My pushes only go to a repository I work on.

I would *love* to see a good writeup on how to manage, and contribute to, a 
repository that will accept updates from many people doing pull requests, 
letting you update as the master does, without losing your own "features in 
progress". Yes, this is supposed to be git's area, but it is hard to work out 
from the docs how to manage it, especially if it later turns out that the 
"upstream master" is going off in an incompatible direction, and you want to 
take over managing a fork that is doing something else, and accept pull 
requests from others.

It doesn't help that as far as I can tell, "Pull Requests" are actually things 
that Github and Bitbucket do that are not in the core base git.

---
Entertaining minecraft videos
http://YouTube.com/keybounce



Re: [git-users] How GIT stores data

2016-05-22 Thread Sharan Basappa
Dear Philip, Others,

Thanks a lot. I have some follow-up questions.

I am using a simple scenario to get additional clarity.

1) I have 4 files in my branch (a,b,c,d)
2) I modify a
3) I add a
4) I modify b,c
5) I add b,c
6) I commit
7) I modify d
8) I commit d
9) I push to remote

Does step 6) above result in single commit object (single SHA reference) or 
two?

From a developer engineer's perspective, what does commit signify? Does it 
mean some key development is complete?
I ask this question to understand when a developer would do some 
development but would make multiple commits.

Similarly, if during push step, if it is found that remote is ahead of 
local (and most likely requires merging) then does it mean anything wrt to 
the already computed SHA?
I assume that already computed SHA has no meaning.

Thanks,



Re: [git-users] How GIT stores data

2016-05-21 Thread Philip Oakley
  - Original Message - 
  From: Sharan Basappa 
  To: Git for human beings 
  Cc: philipoak...@iee.org 
  Sent: Saturday, May 21, 2016 5:02 PM
  Subject: Re: [git-users] How GIT stores data




Do note that there is no file date information stored in the tree/blob 
data. The only place dates are recorded is at the point of the commit. Likewise 
the only file permission stored is the *nix executable bit.


   Dear Philip,


  So, the snapshot is taken when commit is done?
  Also, what does the SHA that is returned when a commit is complete indicate?
  is it the overall signature for the snapshot or commit identifier?

Almost, but not quite.

Each file snapshot is taken at the time it was added to the index. So most were 
not taken at the time of the commit, but were taken earlier (perhaps at an 
earlier commit if the file was not modified ;-)

Each sha1 is a hash over the content of that object. 

Because the tree, commit and tag objects each contain a list of subordinate 
sha1's that they cover, then you get a complete crypto security over the 
appropriate history. The commit object will list the previous commit's sha1, 
the sha1 of the top level tree, and the date it was committed, which fully 
wraps up the history and status up to that point.


Minor aside, the commit object actually has two dates, one for the originating 
author, and one for the authorising committer (the manager). For a local 
project they can be the same person, for big projects (Git, Linux, etc.) they 
will be different.
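
As a rough illustration of how one hash can "wrap up" the history, the sketch
below rebuilds a single-parent commit object in Python and hashes it. The
function name commit_id() and every name, address, and timestamp passed to it
are hypothetical; merge commits (several parents) and signed commits are left
out.

```python
import hashlib


def commit_id(tree, parent, author, committer, message):
    """Hash a commit: SHA-1 over a short header plus the commit body."""
    lines = [f"tree {tree}"]
    if parent is not None:          # the first commit in a history has no parent
        lines.append(f"parent {parent}")
    lines.append(f"author {author}")        # person and date of the original work
    lines.append(f"committer {committer}")  # person and date of the commit itself
    body = ("\n".join(lines) + "\n\n" + message).encode()
    header = b"commit %d\x00" % len(body)
    return hashlib.sha1(header + body).hexdigest()


# Hypothetical values, for illustration only.
TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
AUTHOR = "A U Thor <author@example.com> 1463842920 +0000"
COMMITTER = "C O Mitter <committer@example.com> 1463846520 +0000"

first = commit_id(TREE, None, AUTHOR, COMMITTER, "Initial commit\n")
second = commit_id(TREE, first, AUTHOR, COMMITTER, "Second commit\n")
```

Because the parent's sha1 is part of the hashed text, changing anything in an
earlier commit changes the id of every commit that follows it.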
--

Philip



Re: [git-users] How GIT stores data

2016-05-21 Thread Sharan Basappa

>
>
> Do note that there is no file date information stored in the tree/blob 
> data. The only place dates are recorded is at the point of the commit. 
> Likewise the only file permission stored is the *nix executable bit
>

 Dear Philip,

So, the snapshot is taken when commit is done?
Also, what does the SHA that is returned when a commit is complete indicate?
is it the overall signature for the snapshot or commit identifier?

Thanks



Re: [git-users] How GIT stores data

2016-05-21 Thread Philip Oakley
- Original Message - 
  From: Sharan Basappa 
  To: Git for human beings 
  Sent: Saturday, May 21, 2016 4:53 AM
  Subject: Re: [git-users] How GIT stores data


  Sure. Think of Git as a three layered tool. 

  The top layer is a polished interface, called "Porcelain", that is 
designed to easily manage snapshots and compares and merges of filesystem 
trees. 

  The bottom layer, on the other hand, is a filesystem. Files in this 
filesystem are read-only. The names of files are fixed based on their content. 
So identical files have the same name, and are stored once in the file system. 

  Building up from fixed files that do not change, are directory objects, 
that map human understandable filenames to internal names. And, since this is 
itself a filesystem object, if everything in a directory is identical, then the 
directory entry is identical, and only stored once. 

  Based on this, it's pretty easy to see that if two commits are completely 
identical, then the only thing that differs is the commit object itself, which 
will have a time stamp and user comment. 

  (The middle layer by the way, are low-level tools designed to work with 
the files in this filesystem.) 




  Dear Michael & Philip,



  Thanks. I think I am getting a hang of it.


  So, when an existing file is modified then I assume that Git computes its 
signature and then checks if such a file already exists.
  Is this correct? I ask this because my change can be such that it is same as 
one that was previously committed (sort of reverting back a file).


  The other thing I understand is that Git always stores every unique instance 
of a file as it is and not its differences with a reference file.


  One more question I have is on the file system. As such when I clone a 
repository, I get full repository and files locally.
  So, when I clone a repository, I have full repository and one set of project 
files (depending on the branch I have checked out) locally)


  Thanks,

  -- 
Git cheats regarding the initial detection of file modification - it just uses 
the file system's modified time (mtime), file size, and a couple of other easy 
to determine values that are reliable indicators. 
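
A minimal sketch of that kind of cheap check, in Python. It assumes only mtime
and size are compared, and the helper names are invented for this example; the
"couple of other values" mentioned above are not modelled.

```python
import os


def stat_signature(path):
    """Collect cheap metadata that usually changes whenever a file is edited."""
    info = os.stat(path)
    return (info.st_mtime_ns, info.st_size)


def probably_modified(path, cached_signature):
    """Compare with the signature recorded earlier, without reading the file."""
    return stat_signature(path) != cached_signature
```

The point of the shortcut is that the file's contents are never read unless
the metadata suggests something changed.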

Even if it gets it wrong, because Git is snapshot based, it would have taken a 
snapshot anyway! Plus because the snapshot is identical, it wouldn't have 
actually added anything to the repository (it already had the copy;-). These 
file content snapshots are called 'blobs'.

Note that at each stage it is the content, not the metadata that is stored, so 
renaming a file doesn't change its content and nothing new is stored at that 
level (it's the same blob). However at the 'directory tree' level (what 'ls' 
or 'dir' lists) that content (the tree's description) has changed, and 
it's stored there (these are called 'trees'). So you can have as many copies of 
a LICENCE or COPYING file as you like and all that extra content takes no extra 
file space (it's one single blob), with only a small amount of space for the 
tree, and if that doesn't change from commit to commit, then no need for another 
copy...
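
The effect can be mimicked with a toy in-memory store, sketched here in
Python. ObjectStore is a made-up class, the licence text is placeholder
content, and a "tree" is flattened to a plain dict of name to blob id, which
is much simpler than what git really writes.

```python
import hashlib


class ObjectStore:
    """Toy content-addressed store: each distinct content is kept exactly once."""

    def __init__(self):
        self.blobs = {}

    def put(self, content):
        """Store `content` (bytes) and return the id derived from it."""
        header = b"blob %d\x00" % len(content)
        blob_id = hashlib.sha1(header + content).hexdigest()
        self.blobs.setdefault(blob_id, content)  # a repeat adds nothing new
        return blob_id


store = ObjectStore()
licence = b"Placeholder licence text\n"

# Two copies of the same file under different names share one blob.
tree_before = {"LICENCE": store.put(licence), "docs/LICENCE": store.put(licence)}
# Renaming a file changes the name recorded in the tree, not the blob.
tree_after = {"COPYING": store.put(licence), "docs/LICENCE": store.put(licence)}
```

After all four put() calls the store still holds a single blob; only the two
tree descriptions differ.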

Do note that there is no file date information stored in the tree/blob data. 
The only place dates are recorded is at the point of the commit. Likewise the 
only file permission stored is the *nix executable bit.


In answer to the clone question. Yes you get a full copy. You can checkout the 
file tree at any point in the project's history (using the various revision 
specification methods - more fun), though more frequently it is the tree at the 
tip of one of the branches.

There is also a whole load of stuff to discover about 'remote tracking 
branches' (which are local branches which track a remote system), and realising 
that they are actually local, and just part of your local history tree, and 
that it's only a naming convention.

--
Philip



Re: [git-users] How GIT stores data

2016-05-21 Thread Sharan Basappa

Thank you, Michael.
 

> Correct. 
>
> Now, just a few months ago, I had these same questions. I hope I have 
> learned this well enough that I can teach it accurately. 
>



Re: [git-users] How GIT stores data

2016-05-20 Thread Michael

On 2016-05-20, at 8:53 PM, Sharan Basappa  wrote:

> Sure. Think of Git as a three layered tool. 
> 
> The top layer is a polished interface, called "Porcelain", that is designed 
> to easily manage snapshots and compares and merges of filesystem trees. 
> 
> The bottom layer, on the other hand, is a filesystem. Files in this 
> filesystem are read-only. The names of files are fixed based on their 
> content. So identical files have the same name, and are stored once in the 
> file system. 
> 
> Building up from fixed files that do not change, are directory objects, that 
> map human understandable filenames to internal names. And, since this is 
> itself a filesystem object, if everything in a directory is identical, then 
> the directory entry is identical, and only stored once. 
> 
> Based on this, it's pretty easy to see that if two commits are completely 
> identical, then the only thing that differs is the commit object itself, 
> which will have a time stamp and user comment. 
> 
> (The middle layer by the way, are low-level tools designed to work with the 
> files in this filesystem.) 
> 
>  
> Dear Michael & Philip,
> 
> Thanks. I think I am getting a hang of it.
> 
> So, when an existing file is modified then I assume that Git computes its 
> signature and then checks if such a file already exists.
> Is this correct? I ask this because my change can be such that it is same as 
> one that was previously committed (sort of reverting back a file).

Yep. Once git knows the signature (a "hash"), it also knows a filename that 
identifies a file with that hash. If it sees the filename in use, it knows it 
has a duplicate.

Reverting changes is common enough that there are commands to do it. They are 
among the most complicated ones given all the warnings in the manual :-)

> The other thing I understand is that Git always stores every unique instance 
> of a file as it is and not its differences with a reference file.

Correct

> One more question I have is on the file system. As such when I clone a 
> repository, I get full repository and files locally.
> So, when I clone a repository, I have full repository and one set of project 
> files (depending on the branch I have checked out) locally)

Correct.

Now, just a few months ago, I had these same questions. I hope I have learned 
this well enough that I can teach it accurately.

---
Entertaining minecraft videos
http://YouTube.com/keybounce



Re: [git-users] How GIT stores data

2016-05-20 Thread Sharan Basappa

>
> Sure. Think of Git as a three layered tool. 
>>
>> The top layer is a polished interface, called "Porcelain", that is 
>> designed to easily manage snapshots and compares and merges of filesystem 
>> trees. 
>>
>> The bottom layer, on the other hand, is a filesystem. Files in this 
>> filesystem are read-only. The names of files are fixed based on their 
>> content. So identical files have the same name, and are stored once in the 
>> file system. 
>>
>> Building up from fixed files that do not change, are directory objects, 
>> that map human understandable filenames to internal names. And, since this 
>> is itself a filesystem object, if everything in a directory is identical, 
>> then the directory entry is identical, and only stored once. 
>>
>> Based on this, it's pretty easy to see that if two commits are completely 
>> identical, then the only thing that differs is the commit object itself, 
>> which will have a time stamp and user comment. 
>>
>> (The middle layer by the way, are low-level tools designed to work with 
>> the files in this filesystem.) 
>>
>
>  
Dear Michael & Philip,

Thanks. I think I am getting a hang of it.

So, when an existing file is modified then I assume that Git computes its 
signature and then checks if such a file already exists.
Is this correct? I ask this because my change can be such that it is same 
as one that was previously committed (sort of reverting back a file).

The other thing I understand is that Git always stores every unique 
instance of a file as it is and not its differences with a reference file.

One more question I have is on the file system. As such when I clone a 
repository, I get full repository and files locally.
So, when I clone a repository, I have full repository and one set of 
project files (depending on the branch I have checked out) locally)

Thanks,



Re: [git-users] How GIT stores data

2016-05-20 Thread Philip Oakley

From: "Michael" <keybou...@gmail.com>
To: <git-users@googlegroups.com>
Sent: Friday, May 20, 2016 7:28 PM
Subject: Re: [git-users] How GIT stores data



On 2016-05-20, at 11:10 AM, Sharan Basappa <sharan.basa...@gmail.com> wrote:


Folks,

I am pretty much new to Git though I am using it for a couple of projects 
(without much understanding as such).


In Git documents, it is mentioned that Git stores data as a stream of 
snapshots. Compared to other VCS tools, the only difference I am able to 
tell is that Git stores the entire file for each versions while other VCS 
tools might store only differences.



Can someone help me understand this?


Sure. Think of Git as a three layered tool.

The top layer is a polished interface, called "Porcelain", that is designed 
to easily manage snapshots and compares and merges of filesystem trees.


The bottom layer, on the other hand, is a filesystem. Files in this 
filesystem are read-only. The names of files are fixed based on their 
content. So identical files have the same name, and are stored once in the 
file system.


Building up from fixed files that do not change, are directory objects, that 
map human understandable filenames to internal names. And, since this is 
itself a filesystem object, if everything in a directory is identical, then 
the directory entry is identical, and only stored once.


Based on this, it's pretty easy to see that if two commits are completely 
identical, then the only thing that differs is the commit object itself, 
which will have a time stamp and user comment.


(The middle layer by the way, are low-level tools designed to work with the 
files in this filesystem.)


--
Sharan,
In addition to Michael's description, Git does have a method for compression 
of its repository, which it uses where possible, called Pack files.


So rather than recording changes (as noted), Git will record complete 
snapshots, and then compress the full history of all revisions in one go 
(see some of Linus's laws).


The compressed repository (with all its history) can be smaller than the 
checked out work tree, so it is efficient to hold the whole snapshot. There 
is also a whole load of sha1 hash keys that pervade and validate the 
history, which is good as you always know that if your hash key has the same 
value as their hash key then they are seeing exactly the same history and 
content as you, no matter how far away and unknown they are to you. (and if 
the keys differ, all bets are off!)
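
A small Python experiment hints at why compressing many snapshots together
pays off. This is not the pack file format (which also stores deltas between
similar objects); it only runs zlib over twenty hypothetical snapshots of one
growing file, first one at a time and then as a single stream.

```python
import zlib

# Twenty made-up snapshots of one file; each adds a line to the previous one.
base = [f"line {i}: some unchanging text\n" for i in range(40)]
snapshots = [
    "".join(base + [f"added in revision {r}\n" for r in range(n)]).encode()
    for n in range(20)
]

raw_size = sum(len(s) for s in snapshots)
one_by_one = sum(len(zlib.compress(s)) for s in snapshots)
all_at_once = len(zlib.compress(b"".join(snapshots)))

print(raw_size, one_by_one, all_at_once)
```

Compressed as one stream, the later snapshots are encoded almost entirely as
references back to earlier ones, so the total comes out well below the sum of
the individually compressed snapshots.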


Philip 




Re: [git-users] How GIT stores data

2016-05-20 Thread Michael

On 2016-05-20, at 11:10 AM, Sharan Basappa  wrote:

> Folks,
> 
> I am pretty much new to Git though I am using it for a couple of projects 
> (without much understanding as such).
> 
> In Git documents, it is mentioned that Git stores data as a stream of 
> snapshots. Compared to other VCS tools, the only difference I am able to tell 
> is that Git stores the entire file for each version while other VCS tools 
> might store only differences.
> 
> Can someone help me understand this?

Sure. Think of Git as a three layered tool.

The top layer is a polished interface, called "Porcelain", that is designed to 
easily manage snapshots and compares and merges of filesystem trees.

The bottom layer, on the other hand, is a filesystem. Files in this filesystem 
are read-only. The names of files are fixed based on their content. So 
identical files have the same name, and are stored once in the file system.
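
For the curious, the "name based on content" can be reproduced in a few lines
of Python. This is a sketch assuming the SHA-1 object naming discussed in this
thread; blob_id() and loose_object_path() are names invented for the example.

```python
import hashlib


def blob_id(content):
    """Return the name git gives a file with these bytes: SHA-1 of header + content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(object_id):
    """Where a not-yet-packed object lives inside the .git directory."""
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


print(blob_id(b"hello world\n"))
print(loose_object_path(blob_id(b"")))
```

Two files with the same bytes always get the same id, whatever they are called
in the work tree, which is why identical files are stored only once.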

Building up from fixed files that do not change, are directory objects, that 
map human understandable filenames to internal names. And, since this is itself 
a filesystem object, if everything in a directory is identical, then the 
directory entry is identical, and only stored once.
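
A directory object can be hashed the same way. The Python sketch below is a
simplified model: object_id() and tree_id() are made-up names, only regular
files (mode 100644) and subdirectories (mode 40000) are handled, and entries
are sorted by plain name, which glosses over how git orders directory names.

```python
import hashlib


def object_id(kind, body):
    """SHA-1 over '<kind> <size>' plus a NUL byte, followed by the object body."""
    return hashlib.sha1(b"%s %d\x00" % (kind, len(body)) + body).hexdigest()


def tree_id(entries):
    """Hash a directory given as {name: (mode, object id)}."""
    body = b""
    for name in sorted(entries):
        mode, oid = entries[name]
        # Each entry is: mode, space, file name, NUL, then the raw 20-byte id.
        body += mode + b" " + name.encode() + b"\x00" + bytes.fromhex(oid)
    return object_id(b"tree", body)


readme = object_id(b"blob", b"hello world\n")
docs_v1 = tree_id({"README": (b"100644", readme)})
docs_v2 = tree_id({"README": (b"100644", readme)})   # built from identical contents
root = tree_id({"docs": (b"40000", docs_v1), "README": (b"100644", readme)})
```

docs_v1 and docs_v2 come out identical, so an unchanged directory needs no new
object, while any change to a file inside it would give the directory, and
every directory above it, a new id.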

Based on this, it's pretty easy to see that if two commits are completely 
identical, then the only thing that differs is the commit object itself, which 
will have a time stamp and user comment.

(The middle layer by the way, are low-level tools designed to work with the 
files in this filesystem.)


---
Entertaining minecraft videos
http://YouTube.com/keybounce



[git-users] How GIT stores data

2016-05-20 Thread Sharan Basappa
Folks,

I am pretty much new to Git though I am using it for a couple of projects 
(without much understanding as such).

In Git documents, it is mentioned that Git stores data as a stream of 
snapshots. Compared to other VCS tools, the only difference I am able to 
tell is that Git stores the entire file for each version while other VCS 
tools might store only differences.

Can someone help me understand this?
