Re: [git-users] How GIT stores data

2016-05-20 Thread Michael

On 2016-05-20, at 8:53 PM, Sharan Basappa  wrote:

> Sure. Think of Git as a three layered tool. 
> 
> The top layer is a polished interface, called "Porcelain", that is designed 
> to easily manage snapshots and compares and merges of filesystem trees. 
> 
> The bottom layer, on the other hand, is a filesystem. Files in this 
> filesystem are read-only. The names of files are fixed based on their 
> content. So identical files have the same name, and are stored once in the 
> file system. 
> 
> Building up from fixed files that do not change, are directory objects, that 
> map human understandable filenames to internal names. And, since this is 
> itself a filesystem object, if everything in a directory is identical, then 
> the directory entry is identical, and only stored once. 
> 
> Based on this, it's pretty easy to see that if two commits are completely 
> identical, then the only thing that differs is the commit object itself, 
> which will have a time stamp and user comment. 
> 
> (The middle layer by the way, are low-level tools designed to work with the 
> files in this filesystem.) 
> 
>  
> Dear Michael & Philip,
> 
> Thanks. I think I am getting the hang of it.
> 
> So, when an existing file is modified then I assume that Git computes its 
> signature and then checks if such a file already exists.
> Is this correct? I ask this because my change can be such that it is same as 
> one that was previously committed (sort of reverting back a file).

Yep. Once git knows the signature (a "hash"), it also knows a filename that 
identifies a file with that hash. If it sees the filename in use, it knows it 
has a duplicate.
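
A minimal sketch of that check, assuming a SHA-1 repository: the id of a file
("blob") is the SHA-1 of a short header plus the content, and a loose object
lives under a path derived from that id. The file contents and the "stored"
dictionary below are made up for illustration; Git itself does the equivalent
against its object directory.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object id Git assigns to a blob with this content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(object_id: str) -> str:
    """Path of a loose (unpacked) object, relative to the .git directory."""
    return "objects/" + object_id[:2] + "/" + object_id[2:]


# Hypothetical file history: version 3 reverts the file to version 1.
v1 = b"hello world\n"
v2 = b"hello, world!\n"
v3 = b"hello world\n"

stored = {}  # stands in for the object directory: id -> content
for content in (v1, v2, v3):
    # setdefault leaves an existing entry alone, so a duplicate costs nothing
    stored.setdefault(git_blob_id(content), content)

print(loose_object_path(git_blob_id(v1)))
print(len(stored))  # v1 and v3 share one id, so only two blobs are kept
```

The same id can be checked against a real repository with
"git hash-object <file>".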

Reverting changes is common enough that there are commands to do it. They are 
among the most complicated ones given all the warnings in the manual :-)

> The other thing I understand is that Git always stores every unique instance 
> of a file as it is and not its differences with a reference file.

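Correct, at least conceptually. On disk, pack files may later store some 
objects as deltas against others, but that is a storage optimization and not 
part of the model.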

> One more question I have is on the file system. As such when I clone a 
> repository, I get full repository and files locally.
> So, when I clone a repository, I have full repository and one set of project 
> files (depending on the branch I have checked out) locally.

Correct.

Now, just a few months ago, I had these same questions. I hope I have learned 
this well enough that I can teach it accurately.

---
Entertaining minecraft videos
http://YouTube.com/keybounce

-- 
You received this message because you are subscribed to the Google Groups "Git 
for human beings" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to git-users+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [git-users] How GIT stores data

2016-05-20 Thread Sharan Basappa

>
> Sure. Think of Git as a three layered tool. 
>>
>> The top layer is a polished interface, called "Porcelain", that is 
>> designed to easily manage snapshots and compares and merges of filesystem 
>> trees. 
>>
>> The bottom layer, on the other hand, is a filesystem. Files in this 
>> filesystem are read-only. The names of files are fixed based on their 
>> content. So identical files have the same name, and are stored once in the 
>> file system. 
>>
>> Building up from fixed files that do not change, are directory objects, 
>> that map human understandable filenames to internal names. And, since this 
>> is itself a filesystem object, if everything in a directory is identical, 
>> then the directory entry is identical, and only stored once. 
>>
>> Based on this, it's pretty easy to see that if two commits are completely 
>> identical, then the only thing that differs is the commit object itself, 
>> which will have a time stamp and user comment. 
>>
>> (The middle layer by the way, are low-level tools designed to work with 
>> the files in this filesystem.) 
>>
>
>  
Dear Michael & Philip,

Thanks. I think I am getting the hang of it.

So, when an existing file is modified then I assume that Git computes its 
signature and then checks if such a file already exists.
Is this correct? I ask this because my change can be such that it is same 
as one that was previously committed (sort of reverting back a file).

The other thing I understand is that Git always stores every unique 
instance of a file as it is and not its differences with a reference file.

One more question I have is on the file system. As such when I clone a 
repository, I get full repository and files locally.
So, when I clone a repository, I have full repository and one set of 
project files (depending on the branch I have checked out) locally.

Thanks,



Re: [git-users] Problem:fatal: early EOF fatal: index-pack failed

2016-05-20 Thread Dale R. Worley
Jeremy Yang  writes:
> When I run "git clone git://url -b branch" from 40 threads at the same 
> time, several of the clones often fail.
>
> However, max-connections is set to zero, which means no limit.
>
>
> *Git-daemon CMD:*
>
> /usr/bin/git daemon --verbose --syslog --reuseaddr 
> --base-path=/home/user/repositories --max-connections=0
>
>
> *Client Log:*
>
> remote: fatal: unable to create thread: Resource temporarily unavailable

The max-connections setting may be unlimited, but that doesn't mean that
the server daemon can create an enormous number of threads.  The last
quoted line is the client reporting that the server reported that it
could not create a new thread.  There are many reasons why that might
happen.  One possibility is that the server system ran out of RAM and
swap space.  Another possibility is that the user running the server
process has a low quota of processes that it is allowed to create.
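
A minimal sketch for checking the second possibility, assuming a Unix-like
server with Python available. It only reads the limits of the process that
runs it, so it is meaningful only when started as the same user and in the
same environment as the daemon; the choice of which limits to print is mine,
not something git-daemon documents.

```python
import resource


def thread_related_limits() -> dict:
    """Return (soft, hard) limits that commonly affect thread creation."""
    names = ["RLIMIT_NPROC", "RLIMIT_AS", "RLIMIT_STACK"]
    limits = {}
    for name in names:
        rlimit = getattr(resource, name, None)  # not defined on every platform
        if rlimit is not None:
            limits[name] = resource.getrlimit(rlimit)
    return limits


def show(value: int) -> str:
    """Render a limit value the way `ulimit` does."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


for name, (soft, hard) in thread_related_limits().items():
    print(f"{name}: soft={show(soft)} hard={show(hard)}")
```

On Linux, RLIMIT_NPROC counts threads as well as processes for the user, which
is why a low value there can surface as "unable to create thread".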

Dale



Re: [git-users] Git life cycle

2016-05-20 Thread Dale R. Worley
Sharan Basappa  writes:
> Git mentions that state of the file as untracked, unmodified, modified and 
> staged.
>
> As I understand untracked files are not yet in the repository.
> unmodified and modified is understood but what action results in a file 
> being in staged state?
> is it git add or git commit?

I had trouble with these terms myself.  The problem is that the
situation is a bit complex, and few Git references stop to carefully and
exactly define the terms.  I eventually wrote myself a guide to Git, and
within it is this definition of those terms:

Index

  A commit is not created as a copy of the working copy directory tree,
  but rather of a shadow directory tree called the *index* (or
  sometimes, the *cache*).  This makes it easier to control exactly
  which changes in the working copy are included in a commit.  Note that
  despite its name, the index does not contain *references* to files in
  the working copy directory tree, but rather *fixed copies* of those
  files, and the index's contents can differ from the working copy.

  Generally, the index contains either the contents of the base commit,
  or those contents plus some or all of the modifications that are in
  the working copy.  The names of the files that differ between the
  working copy, the index, and the base commit are shown by the "git
  status" command.  The contents of the files are compared by the "git
  diff" command.  New or changed files in the working copy can be copied
  to the index with "git add".  The output of "git status" shows example
  commands to add or remove changes from the current state of the index.
  Files can be removed from the index with "git rm", which also deletes
  them from the working copy; "git rm --cached" removes them from the
  index only.

File classification

  Files in the working copy are classified into three groups.  Which
  group a file is in determines the effect many Git commands have on it.
  
  *tracked*
    A file is tracked if it is in the base commit of the repository or
    if it is in the index.  Git commands that are intended to apply to
    "all" files generally apply to all tracked files.
  *ignored*
    If a file is ignored, it is not operated on by Git commands that
    do not explicitly specify it.  There are several mechanisms for
    specifying which files should be ignored, of which the ".gitignore"
    file is the most common.  Any directory can contain a ".gitignore"
    file, which (in the simplest case) contains a (newline-separated)
    list of filename globs.  Any file whose name matches a glob in the
    ".gitignore" file of its directory, or the ".gitignore" file of any
    ancestor directory (up to the top directory of the repository) is
    ignored.  Similarly, a ".gitignore" file can contain a glob preceded
    by "/".  That glob only applies to files in the directory
    containing the ".gitignore" file.  (See the "gitignore" manual page
    for the full details.)  The ".gitignore" files should be
    version-controlled files in the repository.
  *untracked*
    All files that are neither tracked nor ignored are considered
    untracked.  Git assumes that you may want to track them at some
    future time.
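
The three groups above can be sketched as a small function. This is a
simplification under stated assumptions: a single top-level ".gitignore"
whose globs are matched against the file's base name only, with the base
commit and the index represented as plain sets of paths. The paths and globs
are made up; the real matching rules are in the "gitignore" manual page.

```python
from fnmatch import fnmatchcase
from posixpath import basename


def classify(path, base_commit, index, ignore_globs):
    """Classify a working-copy path as tracked, ignored, or untracked."""
    # Tracked wins: a file already in the base commit or the index stays
    # tracked even if an ignore glob happens to match its name.
    if path in base_commit or path in index:
        return "tracked"
    if any(fnmatchcase(basename(path), glob) for glob in ignore_globs):
        return "ignored"
    return "untracked"


# Hypothetical repository state.
base_commit = {"src/main.c", "README"}
index = {"src/main.c", "README", "src/util.c"}  # util.c was just added
ignore_globs = ["*.o", "*.log"]

for path in ["src/main.c", "src/util.c", "src/main.o", "notes.txt"]:
    print(path, "->", classify(path, base_commit, index, ignore_globs))
```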

Dale



Re: [git-users] How GIT stores data

2016-05-20 Thread Philip Oakley

From: "Michael" 
To: 
Sent: Friday, May 20, 2016 7:28 PM
Subject: Re: [git-users] How GIT stores data



> On 2016-05-20, at 11:10 AM, Sharan Basappa  wrote:
>
>> Folks,
>>
>> I am pretty much new to Git though I am using it for a couple of projects 
>> (without much understanding as such).
>>
>> In Git documents, it is mentioned that Git stores data as a stream of 
>> snapshots. Compared to other VCS tools, the only difference I am able to 
>> tell is that Git stores the entire file for each versions while other VCS 
>> tools might store only differences.
>>
>> Can someone help me understand this?
>
> Sure. Think of Git as a three layered tool.
>
> The top layer is a polished interface, called "Porcelain", that is designed 
> to easily manage snapshots and compares and merges of filesystem trees.
>
> The bottom layer, on the other hand, is a filesystem. Files in this 
> filesystem are read-only. The names of files are fixed based on their 
> content. So identical files have the same name, and are stored once in the 
> file system.
>
> Building up from fixed files that do not change, are directory objects, that 
> map human understandable filenames to internal names. And, since this is 
> itself a filesystem object, if everything in a directory is identical, then 
> the directory entry is identical, and only stored once.
>
> Based on this, it's pretty easy to see that if two commits are completely 
> identical, then the only thing that differs is the commit object itself, 
> which will have a time stamp and user comment.
>
> (The middle layer by the way, are low-level tools designed to work with the 
> files in this filesystem.)


--
Sharan,
In addition to Michael's description, Git does have a method for compression 
of its repository, which it uses where possible, called Pack files.


So rather than recording changes (as noted), Git will record complete 
snapshots, and then compress the full history of all revisions in one go 
(see some of Linus's laws).


The compressed repository (with all its history) can be smaller than the 
checked out work tree, so it is efficient to hold the whole snapshot. There 
is also a whole load of sha1 hash keys that pervade and validate the 
history, which is good as you always know that if your hash key has the same 
value as their hash key then they are seeing exactly the same history and 
content as you, no matter how far away and unknown they are to you. (And if 
the keys differ, all bets are off!)
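
A minimal sketch of why one hash key vouches for the whole history, assuming
SHA-1 object ids: each commit object names its parent's id, so changing
anything in an old commit changes the id of every commit after it. The author
identity, timestamp, and messages are made up, and every commit here points at
the empty tree to keep the example short.

```python
import hashlib


def object_id(kind: str, body: bytes) -> str:
    """SHA-1 of '<kind> <size>' + NUL + body, which is how Git names objects."""
    header = kind.encode("ascii") + b" " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


EMPTY_TREE = object_id("tree", b"")


def commit_id(tree, parent, message, when=1463770000):
    """Id of a commit object; `when` is a hypothetical fixed timestamp."""
    lines = ["tree " + tree]
    if parent is not None:
        lines.append("parent " + parent)
    ident = "A U Thor <author@example.com> %d +0000" % when
    lines.append("author " + ident)
    lines.append("committer " + ident)
    body = "\n".join(lines) + "\n\n" + message + "\n"
    return object_id("commit", body.encode("utf-8"))


def history_tip(messages):
    """Build a linear history and return the id of its newest commit."""
    parent = None
    for message in messages:
        parent = commit_id(EMPTY_TREE, parent, message)
    return parent


mine = history_tip(["first", "second", "third"])
theirs = history_tip(["first", "second", "third"])
tampered = history_tip(["FIRST", "second", "third"])
print(mine == theirs)    # same history, same key
print(mine == tampered)  # a change in the oldest commit alters the newest id
```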


Philip 




Re: [git-users] Git life cycle

2016-05-20 Thread Philip Oakley
- Original Message - 
  From: Sharan Basappa 
  To: Git for human beings 
  Sent: Friday, May 20, 2016 7:16 PM
  Subject: [git-users] Git life cycle


  Folks,


  Git mentions that state of the file as untracked, unmodified, modified and 
staged.


  As I understand untracked files are not yet in the repository.
  unmodified and modified is understood but what action results in a file being 
in staged state?
  is it git add or git commit?


  Thanks,

Git does take a little while to grasp, which is because of the shift from a 
centralised store to a distributed storage and control method.

Git holds a local state file called the Index, also called the staging area, 
that helps mediate your movement of updated files from your local file system 
(the work tree) into the repository.

An untracked file, at a particular path in your work tree, is one that you 
haven't given to Git or its Index to track. (You can also explicitly tell Git 
which files to ignore.)

If you decide that you want to store the contents of a file in the repository, 
first you add it to the Index, that is, it is staged [i.e. the file is ready 
for formal despatch to the repository]. 

If a file had already been stored in the repository previously, then it would 
be considered unmodified. If you had changed the file from its previously 
stored content, then it would be 'modified', but note that you haven't yet 
added it to the index. If you add it, then it becomes staged, rather than 
modified. If you continue changing it, it becomes modified again, while 
the staged copy remains in the index.

Once you have all the right content in the Index you can commit that package of 
changed files. At that point your staged files will revert to being labelled 
unmodified, though if you had a modified file in the work tree it would still 
be modified. This can be confusing at first but is quite powerful when needed. 
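
The life cycle described above can be sketched with three dictionaries. This
is a toy model, not how Git stores anything: "head" stands for the last
commit, "index" for the staging area, and "worktree" for your files, each
mapping a path to its content. The function names and file contents are made
up for illustration.

```python
def git_add(path, index, worktree):
    """Copy the work-tree content of `path` into the index (stage it)."""
    index[path] = worktree[path]


def git_commit(head, index):
    """Record the current index as the new last commit."""
    head.clear()
    head.update(index)


def status(path, head, index, worktree):
    """Return the set of states `path` is in, in the thread's terms."""
    if path not in index and path not in head:
        return {"untracked"}
    states = set()
    if index.get(path) != head.get(path):
        states.add("staged")    # the index differs from the last commit
    if worktree.get(path) != index.get(path):
        states.add("modified")  # the work tree differs from the index
    return states or {"unmodified"}


head = {"a.txt": "v1"}
index = dict(head)
worktree = {"a.txt": "v1", "new.txt": "draft"}

print(sorted(status("new.txt", head, index, worktree)))
worktree["a.txt"] = "v2"           # edit the file
print(sorted(status("a.txt", head, index, worktree)))
git_add("a.txt", index, worktree)  # stage it
worktree["a.txt"] = "v3"           # keep editing after staging
print(sorted(status("a.txt", head, index, worktree)))
git_commit(head, index)            # commit what was staged (v2)
print(sorted(status("a.txt", head, index, worktree)))
```

The third status shows the case that confuses people at first: the same file
is both staged (v2 in the index) and modified (v3 in the work tree).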

It is worth looking for the various git books and explanations of the git 
internals. Which one works for you will depend on your interests (e.g. 
programmer vs computer scientist ;-)

Philip



Re: [git-users] Git life cycle

2016-05-20 Thread Konstantin Khomoutov
On Fri, 20 May 2016 11:16:27 -0700 (PDT)
Sharan Basappa  wrote:

> Git mentions that state of the file as untracked, unmodified,
> modified and staged.
> 
> As I understand untracked files are not yet in the repository.

This is correct.

> unmodified and modified is understood but what action results in a
> file being in staged state?
> is it git add or git commit?

`git add` -- which adds the file to the staging area.

The staging area is the place in the repository which contains the
state from which the new commit will be recorded when you run
`git commit`.

Please note that this question is so basic that I highly advise you to
read an introductory book on Git before asking such questions and
needlessly irritating folks with trivial ones.  You know, this
list is intended to help non-hardcore Git users solve problems
they have with the tool, not to tutor them in the basics.

[1] should get you started, and [2] is a free go-to book on Git these
days.

1. http://git-scm.com/documentation
2. http://git-scm.com/book



Re: [git-users] How GIT stores data

2016-05-20 Thread Konstantin Khomoutov
On Fri, 20 May 2016 11:10:19 -0700 (PDT)
Sharan Basappa  wrote:

[...]
> In Git documents, it is mentioned that Git stores data as a stream of 
> snapshots. Compared to other VCS tools, the only difference I am able
> to tell is
> that Git stores the entire file for each versions while other VCS
> tools might store only differences.
[...]

Yes, each commit is a snapshot of the whole project.

But you have to understand that's *a concept* because behind the scenes
Git re-uses objects which are the same between multiple such snapshots,
compresses everything compressible and further crams data from older
history into the so-called pack files.  Hence even if the concept Git
uses to manage history sounds inefficient, to my knowledge
Git is currently the most disk-space-efficient VC tool in existence
(tested by various folks on projects of insane size such as the Mozilla
Firefox codebase).
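
A minimal sketch of the object re-use and compression, assuming SHA-1 ids.
The ObjectStore class is a toy of my own, not Git's code: it keeps each
distinct file content once, zlib-compressed the way loose objects are, and a
"snapshot" is just a mapping from path to object id. Pack files and their
delta compression are not modelled.

```python
import hashlib
import zlib


class ObjectStore:
    """Toy content-addressed store keyed by Git-style blob ids."""

    def __init__(self):
        self.objects = {}  # object id -> zlib-compressed bytes

    def put_blob(self, content: bytes) -> str:
        raw = b"blob " + str(len(content)).encode("ascii") + b"\0" + content
        oid = hashlib.sha1(raw).hexdigest()
        # Identical content yields the same id, so it is stored only once.
        self.objects.setdefault(oid, zlib.compress(raw))
        return oid

    def get_blob(self, oid: str) -> bytes:
        raw = zlib.decompress(self.objects[oid])
        _header, _sep, content = raw.partition(b"\0")
        return content

    def snapshot(self, files: dict) -> dict:
        """Store every file of a project state; return path -> object id."""
        return {path: self.put_blob(data) for path, data in files.items()}


# Two hypothetical project states that differ in one file only.
commit1 = {"README": b"docs\n", "Makefile": b"all:\n", "main.c": b"int x;\n"}
commit2 = dict(commit1, **{"main.c": b"int x = 1;\n"})

store = ObjectStore()
snap1 = store.snapshot(commit1)
snap2 = store.snapshot(commit2)
print(len(store.objects))  # 4 objects for 6 files across the two snapshots
```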

Please google the "Git from the bottom up" document and read it if you
want to know more nitty-gritty details about the Git implementation.



Re: [git-users] Git life cycle

2016-05-20 Thread Michael

On 2016-05-20, at 11:16 AM, Sharan Basappa  wrote:

> Folks,
> 
> Git mentions that state of the file as untracked, unmodified, modified and 
> staged.
> 
> As I understand untracked files are not yet in the repository.
> unmodified and modified is understood but what action results in a file being 
> in staged state?
> is it git add or git commit?

As I understand it:

"Staged" essentially means that the file has been added to the Git filesystem, 
but is not yet associated with any commit. It is also called "cached".

Because it is in the file system, the tools that do things such as showing you 
differences, merging, etc. can work with it. And, if it appears in a commit, 
all Git needs to do is record the mapping from the user-visible filename to 
the internal name.

Internally, (and this format has changed, but I believe this is correct for the 
current version of Git), the "index file" stores either one internal name for 
each user-visible filename in the current checkout (the "ready to commit" 
version) or, while a merge conflict is unresolved, up to three (the common 
ancestor and the two sides being merged).

A file becomes staged by the "git add" command.

---
Entertaining minecraft videos
http://YouTube.com/keybounce



Re: [git-users] How GIT stores data

2016-05-20 Thread Michael

On 2016-05-20, at 11:10 AM, Sharan Basappa  wrote:

> Folks,
> 
> I am pretty much new to Git though I am using it for a couple of projects 
> (without much understanding as such).
> 
>In Git documents, it is mentioned that Git stores data as a stream of 
>snapshots. Compared to other VCS tools, the only difference I am able to tell 
>is that Git stores the entire file for each versions while other VCS tools 
>might store only differences.

>Can someone help me understand this?

Sure. Think of Git as a three layered tool.

The top layer is a polished interface, called "Porcelain", that is designed to 
easily manage snapshots and compares and merges of filesystem trees.

The bottom layer, on the other hand, is a filesystem. Files in this filesystem 
are read-only. The names of files are fixed based on their content. So 
identical files have the same name, and are stored once in the file system.

Building up from fixed files that do not change, are directory objects, that 
map human understandable filenames to internal names. And, since this is itself 
a filesystem object, if everything in a directory is identical, then the 
directory entry is identical, and only stored once.

Based on this, it's pretty easy to see that if two commits are completely 
identical, then the only thing that differs is the commit object itself, which 
will have a time stamp and user comment.

(The middle layer by the way, are low-level tools designed to work with the 
files in this filesystem.)
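
A minimal sketch of the bottom layer described above, assuming SHA-1 ids and
a flat directory of regular files. It shows that two identical directories
get the same directory ("tree") id, while two commits of that same directory
still differ because of the time stamp and comment. The file names, author
identity, and timestamps are made up for illustration.

```python
import hashlib


def object_id(kind: bytes, body: bytes) -> str:
    """SHA-1 of '<kind> <size>' + NUL + body, which is how Git names objects."""
    header = kind + b" " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_id(files: dict) -> str:
    """Id of a flat directory: maps each filename to the id of its content."""
    body = b""
    for name in sorted(files, key=lambda n: n.encode("utf-8")):
        blob = object_id(b"blob", files[name])
        # Entry layout: mode, space, name, NUL, then the raw 20-byte id.
        body += b"100644 " + name.encode("utf-8") + b"\0" + bytes.fromhex(blob)
    return object_id(b"tree", body)


def commit_id(tree: str, timestamp: int, message: str) -> str:
    """Id of a parentless commit object pointing at `tree`."""
    ident = "A U Thor <author@example.com> %d +0000" % timestamp
    body = "tree %s\nauthor %s\ncommitter %s\n\n%s\n" % (tree, ident, ident, message)
    return object_id(b"commit", body.encode("utf-8"))


project = {"hello.txt": b"hello world\n", "notes.txt": b"todo\n"}
same_project = dict(project)

t1 = tree_id(project)
t2 = tree_id(same_project)
print(t1 == t2)  # identical directories share one tree object

c1 = commit_id(t1, 1463770000, "first snapshot")
c2 = commit_id(t2, 1463770060, "same content, a minute later")
print(c1 == c2)  # the commits differ only in time stamp and comment
```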


---
Entertaining minecraft videos
http://YouTube.com/keybounce



[git-users] Git life cycle

2016-05-20 Thread Sharan Basappa
Folks,

Git mentions that state of the file as untracked, unmodified, modified and 
staged.

As I understand untracked files are not yet in the repository.
unmodified and modified is understood but what action results in a file 
being in staged state?
is it git add or git commit?

Thanks, 



[git-users] How GIT stores data

2016-05-20 Thread Sharan Basappa
Folks,

I am pretty much new to Git though I am using it for a couple of projects 
(without much understanding as such).

In Git documents, it is mentioned that Git stores data as a stream of 
snapshots. Compared to other VCS tools, the only difference I am able to 
tell is
that Git stores the entire file for each version while other VCS tools 
might store only differences.

Can someone help me understand this?



[git-users] Problem:fatal: early EOF fatal: index-pack failed

2016-05-20 Thread Jeremy Yang
*git-daemon cannot handle more than about 40 simultaneous connections:*

When I run “git clone git://url -b branch” from 40 threads at the same 
time, several of the clones often fail.

However, max-connections is set to zero, which means no limit.


*Git-daemon CMD:*

/usr/bin/git daemon --verbose --syslog --reuseaddr 
--base-path=/home/user/repositories --max-connections=0


*Client Log:*

remote: fatal: unable to create thread: Resource temporarily unavailable

remote: aborting due to possible repository corruption on the remote side.

fatal: early EOF

fatal: index-pack failed

 

*Server Log:*

error: git upload-pack: git-pack-objects died with error.

fatal: git upload-pack: aborting due to possible repository corruption on 
the remote side.

 

*Git version : *

git version 1.9.0

 

*System Config:*

CPU: 32
OS: Red Hat Linux
Memory: 128G

 

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 1032156
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 27
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) unlimited
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
