Re: [PATCH v2 00/14] Officially start moving to the term 'staging area'

2013-10-24 Thread Felipe Contreras
Karsten Blees wrote:
> (2) Index
> 
> An index, as in a library, maps almost perfectly to what the git index is
> _and_ what we do with it.

Not really. An index in the context of a library, and in any other context, is
a tool that indicates where something is, in order to find it quickly.

That is not how the Git index is used, nor what it is.

> (3b) Staging area (other meanings)
> 
> I don't see how a stage (as in a theater) is in any way related to the git
> index.
> 
> Data staging (as in loading a data warehouse or web server) fits to some
> extent, as it's also about copying information, not moving physical things.

A stage in theater, and in any other context, is a special place, a standing
place. I don't see what is so different from the git staging area.

> > Even 'native' speakers don't have a single consistent term for the
> > concept. Terms are stolen from many varied industries and activities
> > that have to prepare and package items (Ships, Trains, Theaters)
> > (see http://en.wikipedia.org/wiki/Shipping_list, for a shortish list, which 
> > doesn't mention an Index)
> 
> All true, but we don't need to steal terms from unrelated fields if
> information science provides us with the terms we need.

But it doesn't.

-- 
Felipe Contreras
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 00/14] Officially start moving to the term 'staging area'

2013-10-24 Thread Andreas Krey
On Thu, 24 Oct 2013 02:57:15 +, Karsten Blees wrote:
...
> The latter. I don't know about 'broader', but I'll try to summarize _my_ 
> world view:
> 
> (1) Audience matters
> 
> For actual users, we need an accurate model that supports a variety of use 
> cases without falling apart. IMO, a working model is more important than 
> simplicity. Finally, it's more important to agree on the actual model than on 
> a vague term that can mean many things (theater stage vs. loading dock...).

Terms almost invariably mean multiple things in different contexts,
and assume new meaning in new fields.

> For potential users / decision makers, we need to describe git's features in 
> unmistakable terms that don't need extra explanation. In this sense, the 
> index / cache / staging area is not a feature in itself but facilitates a 
> variety of broader features:
> - fine grained commit control (via index (add -i), but also commit -p, commit 
> --amend, cherry-pick, rebase etc.)

The audience will have a hard time understanding what these features
actually do (and how they interact) if we hide the underlying model from
them - they then need to build that model themselves.

And no decision-maker will make the effort to understand either the
operations you mention or the concept of the staging area, unless they
are also users.

...
> An index, as in a library, maps almost perfectly to what the git index is 
> _and_ what we do with it.

No, it doesn't. The git index actually contains the content of the added
files, not just an identity reference. (Unless, maybe, you consider file
sha1s as a reference and not actual content.) The point is that the
index doesn't just contain a mapping from file names to some objects,
but de facto a tree - that will form the next commit.
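
As a rough illustration of that point, the shell sketch below lists what the
index holds after a single `git add`. The temporary repository and the file
name README are placeholders chosen for this example, not anything from the
thread; `git ls-files --stage`, `git cat-file -p` and `git write-tree` are
standard Git commands.

```shell
# Sketch: inspect the index after staging one file (throwaway repository).
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
echo Hello > README
git add README

# Each index entry lists: mode, blob object name, stage number, path.
git ls-files --stage

# The staged content is already stored as a blob object, before any commit.
blob=$(git ls-files --stage README | awk '{print $2}')
staged=$(git cat-file -p "$blob")
echo "staged content: $staged"

# The whole index can be written out as a tree object without committing.
tree=$(git write-tree)
tree_type=$(git cat-file -t "$tree")
git ls-tree "$tree"
```

The tree printed at the end is what a commit made at this moment would point
to, which is one way to read "de facto a tree".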

...
> (3a) Staging area (logistics)
> 
> A staging area, as in (military) logistics / transportation, is about moving 
> physical goods around. You move an item from your stock to the staging area, 
> then onto the truck and finally deliver it to the customer.
> 
> The defining characteristic of a physical good is its physical existence. 
> Each item is uniquely identifiable by a serial number.

Please show me the serial numbers on bullets.

> Problem #1: If an item in the staging area is broken, you fix it directly in 
> the staging area, because that's where it _is_.

That may be true in the physical world, or it may not: you can just as well
replace the item instead of repairing it in place.

The real problem: You can find some reason why any possible existing
name for this concept isn't correct.

...
> I don't see how a stage (as in a theater) is in any way related to the git 
> index.

It's because the name 'stage (noun)' goes with the verb 'stage'. You
stage a play, or you stage content to be committed. From that, you
may almost call the index just 'stage'.

Andreas

-- 
"Totally trivial. Famous last words."
From: Linus Torvalds 
Date: Fri, 22 Jan 2010 07:29:21 -0800


Re: [PATCH v2 00/14] Officially start moving to the term 'staging area'

2013-10-23 Thread Karsten Blees
Am 19.10.2013 16:08, schrieb Philip Oakley:
> From: "Karsten Blees" 
>> Am 15.10.2013 00:29, schrieb Felipe Contreras:
>>> tl;dr: everyone except Junio C Hamano and Drew Northup agrees; we
>>> should move
>>> away from the name "the index".
>>>
>>> It has been discussed many times in the past that 'index' is not an
>>> appropriate description for what the high-level user does with it,
>>> and
>>> it has been agreed that 'staging area' is the best term.
>>>
>>
>> I haven't followed the previous discussion, but if a final conclusion
>> towards 'staging area' has already been reached, it should probably be
>> revised.
> 
> Do you mean that how that conclusion was reached should be summarised,
> or that you don't think it's an appropriate summary of the broader
> weltanschauung?
> 

The latter. I don't know about 'broader', but I'll try to summarize _my_ world 
view:

(1) Audience matters

For actual users, we need an accurate model that supports a variety of use 
cases without falling apart. IMO, a working model is more important than 
simplicity. Finally, it's more important to agree on the actual model than on a 
vague term that can mean many things (theater stage vs. loading dock...).

For potential users / decision makers, we need to describe git's features in 
unmistakable terms that don't need extra explanation. In this sense, the index 
/ cache / staging area is not a feature in itself but facilitates a variety of 
broader features:
- fine grained commit control (via index (add -i), but also commit -p, commit 
--amend, cherry-pick, rebase etc.)
- performance
- merging


(2) Index

An index, as in a library, maps almost perfectly to what the git index is _and_ 
what we do with it. No, I don't mean .so/.dll/.lib files, I'm talking about the 
real thing with shelves of books and a big box with index cards (aka the index).

The defining characteristic of a book (or publication in general) is its 
content, not its physical representation (paper). There are typically many 
indistinguishable copies of the same book. An author can continue working on 
the manuscript without affecting the copy at the library at all.

When a new or updated publication is submitted to the library, it is first 
added to the index and placed on a cart at the reception desk. Some time later, 
the librarian commits the content of the cart to the shelves. A user of the 
library will typically consult the index to look up information or to check if 
his personal copy of a publication is up to date. The index can be thrown away 
and rebuilt from the content of the shelves. A big library may have a central 
repository and several local branches (aka field offices) that can be 
synchronized by comparing their indexes card by card.

Granted, a library is typically not versioned, and it's unlikely that any one 
user will have checked out a full copy of the library's content. But otherwise, 
it's pretty similar to git...


(3a) Staging area (logistics)

A staging area, as in (military) logistics / transportation, is about moving 
physical goods around. You move an item from your stock to the staging area, 
then onto the truck and finally deliver it to the customer.

The defining characteristic of a physical good is its physical existence. Each 
item is uniquely identifiable by a serial number. There may be many of the same 
kind, but there are no exact copies.

Problem #1: If an item in the staging area is broken, you fix it directly in 
the staging area, because that's where it _is_. Thus you also don't need to 
stage the item again. That's how conventional SCMs work: they track the 
identity (serial number, file name) of things.

Problem #2: The transportation model only supports additions. You cannot add an 
item to your staging area that, upon delivery, will magically remove itself 
from the possession of the customer. Let alone that you'd have to steal it 
first to be able to physically place it into your staging area.

This can be fixed by slightly modifying our mental model: instead of real 
things, let's think about "staging changes" (or deltas, or patches). Again, 
that's what conventional SCMs do and what git exactly does _not_ do.

Problem #3: In logistics, the state / inventory of the customer is irrelevant. 
If a customer orders an item he already has, it's his problem. There's no need 
for core commands like status, diff or reset, and there's no way to explain 
what they do with a staging area model. What if a customer buys at another shop 
without telling us, effectively changing his inventory (git reset --soft)? This 
shouldn't affect our staging area at all, right? But with git it does...ooops.
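
The `git reset --soft` behaviour mentioned in Problem #3 can be reproduced
with a small sketch. The repository, file name, and identity below are
placeholders invented for the example. The index entries stay byte-for-byte
the same, yet the set of "staged changes" that Git reports is different
afterwards, because it is computed relative to HEAD.

```shell
# Sketch: a soft reset leaves the index alone but changes what counts as staged.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name Example                    # placeholder identity
git config user.email example@example.invalid
echo one > file; git add file; git commit -q -m first
echo two > file; git add file; git commit -q -m second

before=$(git diff --cached --name-only)   # nothing staged relative to HEAD
index_before=$(git ls-files --stage)

git reset -q --soft HEAD~1                # move HEAD only; index and files untouched

after=$(git diff --cached --name-only)    # "file" now shows up as staged
index_after=$(git ls-files --stage)

echo "staged before: '$before'  staged after: '$after'"
```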

(3b) Staging area (other meanings)

I don't see how a stage (as in a theater) is in any way related to the git 
index.

Data staging (as in loading a data warehouse or web server) fits to some extent, 
as it's also about copying information, not moving physical things.

[...]
>>
>> 1.) Recording individual files to commit in advance (instead

Re: [PATCH v2 00/14] Officially start moving to the term 'staging area'

2013-10-19 Thread Philip Oakley

From: "Karsten Blees" 
> Am 15.10.2013 00:29, schrieb Felipe Contreras:
>> tl;dr: everyone except Junio C Hamano and Drew Northup agrees; we
>> should move away from the name "the index".
>>
>> It has been discussed many times in the past that 'index' is not an
>> appropriate description for what the high-level user does with it,
>> and it has been agreed that 'staging area' is the best term.
>
> I haven't followed the previous discussion, but if a final conclusion
> towards 'staging area' has already been reached, it should probably be
> revised.

Do you mean that how that conclusion was reached should be summarised,
or that you don't think it's an appropriate summary of the broader
weltanschauung?

>> The term 'staging area' is more intuitive for newcomers which are
>> more familiar with English than with Git, and it seems to be a
>> straightforward mental notion for people with different mother
>> tongues.
>>
>> In fact it is so intuitive that it's used already in a lot of online
>> documentation, and the people that do teach Git professionally use
>> this term, because it's easier for many kinds of audiences to grasp.
>
> Such online documentation often portrays the 'staging area' as some
> supposedly 'unique' git feature, which I find _very_ confusing. In
> fact, every major SCM has a staging area. E.g. you first need to
> "svn/hg/bzr/p4 add/remove/rename/move" a file, which is somehow
> recorded in the working copy. These recorded changes will then be
> picked up by a subsequent "svn/hg/bzr/p4 commit/submit".
>
> Additionally, all those systems support selectively committing
> individual files (by specifying the files on the commit command line
> or selecting them in a GUI).
>
> So git's 'unique staging area' boils down to this:
>
> 1.) Recording individual files to commit in advance (instead of
> specifying them at commit time). Which isn't that hard to grasp.


For many, that separation of preparation(s), from the final action, is
brand new and difficult to appreciate - it's special to computer systems
(where copying is 100% reliable, essentially instantaneous, and in Git's
case, 100% verifiable via crypto checksums).



> 2.) Recording individual diff hunks or even lines to commit. Which is
> an advanced feature that requires even more advanced commands to be
> useful (i.e. stash save --keep-index; make; test; commit; stash pop).
> It is also entirely irrelevant to binary files (i.e. for non-technical
> users that use git to track documents and stuff).
>
>> index: an 'index' is a guide of pointers to something else; a book
>> index has a list of entries so the reader can locate information
>> easily without having to go through the whole book. Git porcelain is
>> not using the staging area to find out entries quicker; it's not an
>> index.
>
> The 'staging area' is a sorted list of most recently checked out
> files, and its primary purpose is to quickly detect changes in the
> working copy (i.e. it's an index).



There is a big (human) problem here. We (humans) are able to believe
contradictory things ("He ain't heavy, he's my brother" to quote a
song). The Index (file) isn't a staging area, but we are happy to flip
flop between the two ideas depending on context - others can feel
confused.

In one sense the "Index" is an implementation detail of the concept of a
packing area where a shipment (commit) is prepared, which is most
commonly called the staging area in populist discussions (which I believe
is the summary I mentioned above).



> Of the 28 builtin main porcelain commands, 18 read the index
> (read_index), and 11 of those also check the state of the working copy
> (ie_match_stat). Incidentally, the latter include _all_ commands that
> update the so-called 'staging area'.
>
> Subversion until recently kept the checked out files scattered in
> **/.svn/text-base directories, and many operations were terribly slow
> due to this. Subversion 1.7 introduced a new working copy format,
> based on a database in the root .svn directory (i.e. an index),
> leading to tremendous performance improvements.
>
> The git index is a major feature that facilitates the incredible
> performance we're so much addicted to...why be shy about it and call
> it something else?
>
>> stage: a 'stage' is a special area designated for convenience in
>> order for some activity to take place; an orator would prepare a
>> stage in order for her speech to be successful, otherwise many people
>> might not be able to hear, or see her. Git porcelain is using the
>> staging area precisely as a special area to be separated from the
>> working directory for convenience.
>
> I'm not a native speaker, but in my limited understanding, 'staging'
> in computer jargon is the process of preparing data for a production
> system (i.e. until the 'stage' or 'state' of the data is ready for
> production). It has nothing to do with the 'stage' in a theater. I've
> never heard the term 'staging' being used for source code or software
> (that would be 'integration'). I've also never heard 'staging' for
> moving data back from a production system to some work- or development
> area.


Even 'native' sp

Re: [PATCH v2 00/14] Officially start moving to the term 'staging area'

2013-10-18 Thread Felipe Contreras
Karsten Blees wrote:
> Am 15.10.2013 00:29, schrieb Felipe Contreras:
> > tl;dr: everyone except Junio C Hamano and Drew Northup agrees; we should 
> > move
> > away from the name "the index".
> > 
> > It has been discussed many times in the past that 'index' is not an
> > appropriate description for what the high-level user does with it, and
> > it has been agreed that 'staging area' is the best term.
> 
> I haven't followed the previous discussion, but if a final conclusion towards
> 'staging area' has already been reached, it should probably be revised.
> 
> > The term 'staging area' is more intuitive for newcomers which are more
> > familiar with English than with Git, and it seems to be a
> > straightforward mental notion for people with different mother tongues.
> > 
> > In fact it is so intuitive that it's used already in a lot of online
> > documentation, and the people that do teach Git professionally use this
> > term, because it's easier for many kinds of audiences to grasp.
> 
> Such online documentation often portrays the 'staging area' as some
> supposedly 'unique' git feature, which I find _very_ confusing. In fact,
> every major SCM has a staging area. E.g. you first need to "svn/hg/bzr/p4
> add/remove/rename/move" a file, which is somehow recorded in the working
> copy. These recorded changes will then be picked up by a subsequent
> "svn/hg/bzr/p4 commit/submit".

That is not a staging area.

  % hg init test
  % cd test
  % echo Hello > README
  % hg add README
  % echo Bye > README
  % hg commit -m Init
  % hg log -p -r -1
  changeset:   0:ba28df72474c
  tag:         tip
  user:        Felipe Contreras 
  date:        Fri Oct 18 19:43:42 2013 -0500
  summary:     Init

  diff -r 000000000000 -r ba28df72474c README
  --- /dev/null Thu Jan 01 00:00:00 1970 +0000
  +++ b/README  Fri Oct 18 19:43:42 2013 -0500
  @@ -0,0 +1,1 @@
  +Bye


What exactly got staged?

To me the best way to think about the staging area is like a commit draft. No
other VCS has anything like that. And what is the point about this argument?
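
For comparison, the same sequence can be replayed with Git. This is only a
sketch: the temporary directory and the commit identity are placeholders for
the example. Here the commit records the content as it was at `git add` time,
while the later edit stays in the working tree.

```shell
# Sketch: the Mercurial session above, repeated with Git in a throwaway repository.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
echo Hello > README
git add README                 # snapshot "Hello" into the staging area
echo Bye > README              # later edit, not staged
git -c user.name=Example -c user.email=example@example.invalid \
    commit -q -m Init          # placeholder identity for the example

committed=$(git show HEAD:README)   # content recorded in the commit
worktree=$(cat README)              # content still only in the working tree
echo "committed: $committed"
echo "worktree:  $worktree"
```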

> > index: an 'index' is a guide of pointers to something else; a book
> > index has a list of entries so the reader can locate information
> > easily without having to go through the whole book. Git porcelain is
> > not using the staging area to find out entries quicker; it's not an
> > index.
> 
> The 'staging area' is a sorted list of most recently checked out files, and
> its primary purpose is to quickly detect changes in the working copy (i.e.
> it's an index).

That is not its primary purpose.

> Of the 28 builtin main porcelain commands, 18 read the index (read_index),
> and 11 of those also check the state of the working copy (ie_match_stat).
> Incidentally, the latter include _all_ commands that update the so-called
> 'staging area'.
> 
> Subversion until recently kept the checked out files scattered in
> **/.svn/text-base directories, and many operations were terribly slow due to
> this. Subversion 1.7 introduced a new working copy format, based on a
> database in the root .svn directory (i.e. an index), leading to tremendous
> performance improvements.
> 
> The git index is a major feature that facilitates the incredible performance
> we're so much addicted to...why be shy about it and call it something else?

Tell me which subversion command adds and removes information from their
working copy metadata, which is not used as a staging area, commit draft, or
even an index.

Moreover, we are not discussing Git's index file; that low-level concept
will stay the same. We are talking about the high-level concept.

> > stage: a 'stage' is a special area designated for convenience in order
> > for some activity to take place; an orator would prepare a stage in
> > order for her speech to be successful, otherwise many people might not
> > be able to hear, or see her. Git porcelain is using the staging area
> > precisely as a special area to be separated from the working directory
> > for convenience.
> 
> I'm not a native speaker, but in my limited understanding, 'staging' in
> computer jargon is the process of preparing data for a production system
> (i.e. until the 'stage' or 'state' of the data is ready for production). It
> has nothing to do with the 'stage' in a theater.

It is the same. A stage in the theater is also used for preparing a production.

> I've never heard the term 'staging' being used for source code or software
> (that would be 'integration'). I've also never heard 'staging' for moving
> data back from a production system to some work- or development area.

Then why are people using it in external documentation? Why is ProGit already 
using it?

But more importantly: do you have a better name?

> In any sense, 'staging' is a unidirectional process (even in a theater).

It is not. Props and utilities are added and removed from the stage.

> If I think of the index as a staging area, it covers just a single use case:
> preparing new commits.

That is it

Re: [PATCH v2 00/14] Officially start moving to the term 'staging area'

2013-10-18 Thread Karsten Blees
Am 15.10.2013 00:29, schrieb Felipe Contreras:
> tl;dr: everyone except Junio C Hamano and Drew Northup agrees; we should move
> away from the name "the index".
> 
> It has been discussed many times in the past that 'index' is not an
> appropriate description for what the high-level user does with it, and
> it has been agreed that 'staging area' is the best term.
> 

I haven't followed the previous discussion, but if a final conclusion towards 
'staging area' has already been reached, it should probably be revised.

> The term 'staging area' is more intuitive for newcomers which are more
> familiar with English than with Git, and it seems to be a
> straightforward mental notion for people with different mother tongues.
> 
> In fact it is so intuitive that it's used already in a lot of online
> documentation, and the people that do teach Git professionally use this
> term, because it's easier for many kinds of audiences to grasp.
> 

Such online documentation often portrays the 'staging area' as some supposedly 
'unique' git feature, which I find _very_ confusing. In fact, every major SCM 
has a staging area. E.g. you first need to "svn/hg/bzr/p4 
add/remove/rename/move" a file, which is somehow recorded in the working copy. 
These recorded changes will then be picked up by a subsequent "svn/hg/bzr/p4 
commit/submit".

Additionally, all those systems support selectively committing individual files 
(by specifying the files on the commit command line or selecting them in a GUI).

So git's 'unique staging area' boils down to this:

1.) Recording individual files to commit in advance (instead of specifying them 
at commit time). Which isn't that hard to grasp.

2.) Recording individual diff hunks or even lines to commit. Which is an 
advanced feature that requires even more advanced commands to be useful (i.e. 
stash save --keep-index; make; test; commit; stash pop). It is also entirely 
irrelevant to binary files (i.e. for non-technical users that use git to track 
documents and stuff).
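
The command sequence in point 2 might look roughly like the sketch below. It
assumes `git stash push --keep-index`, the newer spelling of
`git stash save --keep-index`, and it stages a whole file instead of single
hunks to avoid interactive input. File names, contents, and the identity are
placeholders, and the build/test step is only a comment.

```shell
# Sketch: test exactly what is staged, commit it, then restore the other edits.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name Example                    # placeholder identity
git config user.email example@example.invalid
echo a1 > a; echo b1 > b
git add a b
git commit -q -m base

echo a2 > a; echo b2 > b            # two independent edits
git add a                           # stage only the edit to a
git stash push -q --keep-index      # set aside everything that is not staged
tested_b=$(cat b)                   # b is back at its committed content
# ... build and run the tests here, against exactly what is staged ...
git commit -q -m "update a"
git stash pop -q                    # bring the unstaged edit to b back
final_b=$(cat b)
echo "while testing: b=$tested_b  after pop: b=$final_b"
```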

> index: an 'index' is a guide of pointers to something else; a book
> index has a list of entries so the reader can locate information
> easily without having to go through the whole book. Git porcelain is
> not using the staging area to find out entries quicker; it's not an
> index.
> 

The 'staging area' is a sorted list of most recently checked out files, and its 
primary purpose is to quickly detect changes in the working copy (i.e. it's an 
index).

Of the 28 builtin main porcelain commands, 18 read the index (read_index), and 
11 of those also check the state of the working copy (ie_match_stat). 
Incidentally, the latter include _all_ commands that update the so-called 
'staging area'.

Subversion until recently kept the checked out files scattered in 
**/.svn/text-base directories, and many operations were terribly slow due to 
this. Subversion 1.7 introduced a new working copy format, based on a database 
in the root .svn directory (i.e. an index), leading to tremendous performance 
improvements.

The git index is a major feature that facilitates the incredible performance 
we're so much addicted to...why be shy about it and call it something else?

> stage: a 'stage' is a special area designated for convenience in order
> for some activity to take place; an orator would prepare a stage in
> order for her speech to be successful, otherwise many people might not
> be able to hear, or see her. Git porcelain is using the staging area
> precisely as a special area to be separated from the working directory
> for convenience.
> 

I'm not a native speaker, but in my limited understanding, 'staging' in 
computer jargon is the process of preparing data for a production system (i.e. 
until the 'stage' or 'state' of the data is ready for production). It has 
nothing to do with the 'stage' in a theater. I've never heard the term 
'staging' being used for source code or software (that would be 
'integration'). I've also never heard 'staging' for moving data back from a 
production system to some work- or development area.

In any sense, 'staging' is a unidirectional process (even in a theater). If I 
think of the index as a staging area, it covers just a single use case: 
preparing new commits. With this world view, it is completely counter-intuitive 
that e.g. changing branches overwrites my staging area.

IMO, it is ok to use 'like a staging area' when we talk about using the index 
to prepare new commits. However, it's not ok to use 'staging area' as a general 
synonym for the index.

Just my 2 cents
Karsten


Re: [PATCH v2 00/14] Officially start moving to the term 'staging area'

2013-10-18 Thread Max Horn

On 17.10.2013, at 21:50, Junio C Hamano  wrote:

> Felipe Contreras  writes:

[...]

> 
> However, since you asked, I would share a couple of comments
> regarding the index, cache and staging area.
> 
> (1) "Staging area".
> 
> The phrase "staging area" is not an everyday phrase or common CS
> lingo, and that unfortunately makes it a suboptimal choice of words
> especially to those of us, to whom a large portion of their exposure
> to the English language is through the command words we use when we
> talk to our computers.

Interestingly, as a non-native speaker, I draw the exact reverse conclusion 
from this: While I had no idea what a "staging area" or "to stage" was (I did 
know the "stage" in a theater, though), I found this to not be a major problem: 
Using a dictionary and reading up on what it means in git made it clear 
quickly enough.

To the contrary, the fact that the term was not yet overloaded with conflicting 
other meanings made it easier for me to attach the semantics git associates 
with it.

In contrast, "index" was exceedingly bad and confusing to me... I already had 
various notions of what an "index" is (e.g. the index of a book -- the same 
word actually exists in German; or more generally an index in computer science, 
as a kind of lookup table, etc.), and to this day, have a hard time 
reconciling this with the way git uses it. For me, it is yet another, seemingly 
completely unrelated, meaning of the word "index" I had to memorize. Hey, just 
take a look at Wiki page  for the many 
dozens of meanings associated to this word. Ugh. And worst of it, I am actually 
not quite sure on which of the meanings listed there "the index" as used by git 
is based... I.e. I don't even see a helpful analogy that would make it easier 
to understand the choice of name. 

In summary: For me as a non-native speaker, "index" feels like about the worst 
possible choice (well, you could have called it the "file" or "thing", that 
might have been worse ;-). While staging area turned out to be surprisingly 
good, *precisely* because I was unfamiliar with it. 

So, while "staging area" might not be perfect, it seems good enough to me. If 
this matter had indeed been discussed here for years, and no better suggestion 
has come up, then perhaps it is time to end the search for the (possibly 
non-existent) perfect solution, and instead do the pragmatic thing?


> The index can also be thought of "like the buffer cache, where new
> contents of files are kept before they are committed to disk
> platter."  At least, non-native speaker with CS background would
> understand that, much better than "the index" (no, I am not saying
> that we should call it "the cache"; I am just saying "the index" is
> not a good word, but we may need to find a better word than "the
> staging area").

Huh? As a non-native speaker with CS background, this actually leaves me more 
confused than I was before. I think about "the staging area", and I don't see 
how this is anything like an "index" (in any of the meanings I see on 
). I can vaguely recognize a faint 
similarity to a "cache", and yet more relation to a "buffer", but in the end, 
none of these strike me as particularly illustrative.

For that matter, I never really understood why I had to do "git diff 
--cached", I simply learned it by rote. 

On the other hand, I feel that after understanding what the staging area is, 
then writing "git diff --staged" is very logical and simple to memorize.
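
For readers unsure whether the two spellings differ: `--staged` is a synonym
for `--cached` in `git diff`. A small sketch in a throwaway repository
(directory and file name are placeholders) shows both producing the same
output.

```shell
# Sketch: "git diff --cached" and "git diff --staged" report the same thing.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
echo Hello > README
git add README

cached=$(git diff --cached)    # older spelling
staged=$(git diff --staged)    # synonym, named after the staging area

if [ "$cached" = "$staged" ]; then
    echo "identical output"
fi
```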




Cheers,
Max




Re: [PATCH v2 00/14] Officially start moving to the term 'staging area'

2013-10-18 Thread Felipe Contreras
On Fri, Oct 18, 2013 at 4:46 AM, Matthieu Moy
 wrote:
> I'm lacking time to read and answer in detail, sorry.
>
> Junio C Hamano  writes:
>
>> "It must be done" is different from "any change is good, as long as
>> it introduces more instances of word 'stage'".
>
> I agree. Something must be done, at least to remove the cache Vs index
> confusion. I'm not sure exactly what's best, and we should agree where
> to go before going there.

I thought we already agreed "staging area" is the best term. Some
people don't, but that's expected.

>> The phrase "staging area" is not an everyday phrase or common CS
>> lingo, and that unfortunately makes it a suboptimal choice of words
>> especially to those of us, to whom a large portion of their exposure
>> to the English language is through the command words we use when we
>> talk to our computers.
>
> I do not think being understandable immediately by non-native is so
> important actually. To me as a French speaker, "commit" makes no sense as an
> English word to describe what "git commit" does, but it's OK as I never
> really translate it. Even fr.po translates "a commit" by "un commit".

Indeed. Let's hope this red herring is not brought again.

> That said, having something that immediately makes sense to a non-native
> is obviously a good point.

Most non-native speakers, as most native speakers, already agreed the
term "staging area" is best.

> Another proposal which I liked BTW was to use the word "precommit".
> Short, and easily understood as the place where the next commit is
> prepared.

And that proposal has been argued against already[1][2].

To summarize:

1) It's not even an English word
2) Unlike "staging area", it's not widely used in external documentation already
3) There's no sensible verb: "to precommit"?

Moreover, in my mind a true precommit would have author, committer,
date; all the things you expect in a commit, except that it's not
permanent. A natural command that would derive from this concept is
'git commit --prepare', which would create an actual precommit.

But we are not looking to introduce yet another concept, we are
looking for a name of a concept we already have, and the majority of
users have already given it a name; the staging area.

[1] http://article.gmane.org/gmane.comp.version-control.git/197215
[2] http://article.gmane.org/gmane.comp.version-control.git/168201
-- 
Felipe Contreras


Re: [PATCH v2 00/14] Officially start moving to the term 'staging area'

2013-10-18 Thread John Szakmeister
On Fri, Oct 18, 2013 at 5:46 AM, Matthieu Moy
 wrote:
> I'm lacking time to read and answer in detail, sorry.
>
> Junio C Hamano  writes:
>
>> "It must be done" is different from "any change is good, as long as
>> it introduces more instances of word 'stage'".
>
> I agree. Something must be done, at least to remove the cache Vs index
> confusion. I'm not sure exactly what's best, and we should agree where
> to go before going there. The previous attempts to introduce more
> "stage" in Git's command-line (e.g. the "git stage" alias) introduced
> more confusion than anything else.

I definitely agree about removing the cache vs. index confusion.  I'm
curious about the confusions surrounding this "git stage" alias.  Was
it simply an implementation issue, or was it an issue surrounding the
name?

FWIW, I've trained my employees to think of it as a staging area as
well.  At least in English, it seems to be the best understood analogy
to the index's purpose.

>> The phrase "staging area" is not an everyday phrase or common CS
>> lingo, and that unfortunately makes it a suboptimal choice of words
>> especially to those of us, to whom a large portion of their exposure
>> to the English language is through the command words we use when we
>> talk to our computers.
>
> I do not think being immediately understandable to non-native speakers
> is so important, actually. To me as a French speaker, "commit" makes no
> sense as an English word to describe what "git commit" does, but it's OK
> as I never really translate it. Even fr.po translates "a commit" as
> "un commit".
>
> That said, having something that immediately makes sense to a non-native
> speaker is obviously a good point.
>
> Another proposal which I liked BTW was to use the word "precommit".
> Short, and easily understood as the place where the next commit is
> prepared.

I'm not sure what concept "precommit" evokes, but it's certainly not
where the next commit is prepared.  Two thoughts come to mind: the
precommit hook, and "what is a pre-commit?"  How would you talk about
preparing for a commit?  Do you "precommit a file?"  "Add the file to
the precommit?"  I'm just curious.

Thanks!

-John


Re: [PATCH v2 00/14] Officially start moving to the term 'staging area'

2013-10-18 Thread Matthieu Moy
I'm lacking time to read and answer in detail, sorry.

Junio C Hamano writes:

> "It must be done" is different from "any change is good, as long as
> it introduces more instances of word 'stage'".

I agree. Something must be done, at least to remove the cache Vs index
confusion. I'm not sure exactly what's best, and we should agree where
to go before going there. The previous attempts to introduce more
"stage" in Git's command-line (e.g. the "git stage" alias) introduced
more confusion than anything else.

> The phrase "staging area" is not an everyday phrase or common CS
> lingo, and that unfortunately makes it a suboptimal choice of words
> especially to those of us, to whom a large portion of their exposure
> to the English language is through the command words we use when we
> talk to our computers.

I do not think being immediately understandable to non-native speakers
is so important, actually. To me as a French speaker, "commit" makes no
sense as an English word to describe what "git commit" does, but it's OK
as I never really translate it. Even fr.po translates "a commit" as
"un commit".

That said, having something that immediately makes sense to a non-native
speaker is obviously a good point.

Another proposal which I liked BTW was to use the word "precommit".
Short, and easily understood as the place where the next commit is
prepared.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/


Re: [PATCH v2 00/14] Officially start moving to the term 'staging area'

2013-10-17 Thread Felipe Contreras
Junio C Hamano wrote:
> Felipe Contreras writes:
> 
> > Junio, can you make an exception and reply to this thread? The change
> > to move away from the term "the index" has been suggested many times
> > over many years; it is an extremely important change for users,
> > and all the Git developers agree it must be done.
> 
> "It must be done" is different from "any change is good, as long as
> it introduces more instances of word 'stage'". As we established, we
> do not seem to be able to have a sensible design discussion with you
> without wasting time for nothing, so I won't directly comment on that
> patch series, at least for now.
> 
> However, since you asked, I would share a couple of comments
> regarding the index, cache and staging area.
> 
> (1) "Staging area".
> 
> The phrase "staging area" is not an everyday phrase or common CS
> lingo, and that unfortunately makes it a suboptimal choice of words
> especially to those of us, to whom a large portion of their exposure
> to the English language is through the command words we use when we
> talk to our computers.

That's because Git is the only command tool that has such a concept.

> I personally do not mind explaining the index is "like a staging
> area, where an army piles up ammunition to a temporary dump before
> starting a major campaign." to native speakers, though ;-).

If you agree with explaining to users that "the index is like a staging area",
then why not just call it the staging area?

Moreover, a staging area is not just a temporary dump; it is used in
preparation for something specific, and you might need to swap certain
weapons for better ones suited to the mission. That's why the word
"staging" is used.

> The index can also be thought of "like the buffer cache, where new
> contents of files are kept before they are committed to disk
> platter."

A buffer and a cache are two very different things used for two very different
purposes, and the term cache doesn't apply to "the index".

> At least, a non-native speaker with a CS background would understand that
> much better than "the index" (no, I am not saying that we should call it "the
> cache"; I am just saying "the index" is not a good word, but we may need to
> find a better word than "the staging area").

All right, so that's progress; you do accept "the index" is not a good term.
Now, if you don't think "staging area" is a good term, do you have any that is
better?

This has been discussed for several years, and nobody has come up with a better
term. In fact, the vast majority of people prefer the term "staging area", and
it is already used in online documentation, including the ProGit book. It seems
that inside and outside the Git project, the term has already been chosen.

Do you honestly think somebody is just suddenly going to come up with a better
term? How long do we have to wait before we decide X is the best term we could
come up with? One year? Two years? Ten years? Or do you just want to wait until
we have the "perfect" term, which might be never.

> The noun phrase "staging area" and the verb "to stage" are good
> (especially when we do not worry too much about us foreigners), but
> we need to make sure "stage" is not mistaken as a noun when used in
> a context that talks about the index as a whole, to avoid confusion
> with the use of the word "stage" long established since
> 
> http://thread.gmane.org/gmane.comp.version-control.git/231/focus=286
> 
> to call "ours" stage#2, etc.

Let's assume that "staging area" is not the best option, even though after
years of discussion nobody has come up with a better one. How would a different
term solve the problem you state above? If we use the term "commit draft", we
still could have people saying "draft # 2". So is there any term that would
avoid this problem, and is it really important to worry about such a marginal
problem?

Regardless of the term used, we can make sure it's not used in that context, so
I don't understand how that argument goes against "staging area". We can make
sure "staging area" is not used to denote the different "index" files.

Is this the *only* argument you have against the term "staging area"?

> (2) "cached" vs "index".
> 
> I think this is the more important of the two issues (the
> other one being "why do we force people to say 'index'?").  Some
> commands take "--cached", some others take "--index", some take
> both.  What these two mean is documented in the gitcli manual page, but
> that is far from sufficient.  The users can get confused by this UI
> mistake in different ways.
> 
>  * We do need to have "git apply" that mimics "patch" (i.e. works
>    only on working tree files, or even outside a Git repository)
>    without any option, "git apply --mode1" that only applies the
>    change to the index, and "git apply --mode2" that applies the
>    change to both the index and the working tree. No renaming of
>    "the index" changes this need to have three different
>    mode

Re: [PATCH v2 00/14] Officially start moving to the term 'staging area'

2013-10-17 Thread Junio C Hamano
Felipe Contreras writes:

> Junio, can you make an exception and reply to this thread? The change
> to move away from the term "the index" has been suggested many times
> over many years; it is an extremely important change for users,
> and all the Git developers agree it must be done.

"It must be done" is different from "any change is good, as long as
it introduces more instances of word 'stage'". As we established, we
do not seem to be able to have a sensible design discussion with you
without wasting time for nothing, so I won't directly comment on that
patch series, at least for now.

However, since you asked, I would share a couple of comments
regarding the index, cache and staging area.

(1) "Staging area".

The phrase "staging area" is not an everyday phrase or common CS
lingo, and that unfortunately makes it a suboptimal choice of words
especially to those of us, to whom a large portion of their exposure
to the English language is through the command words we use when we
talk to our computers.

I personally do not mind explaining the index is "like a staging
area, where an army piles up ammunition to a temporary dump before
starting a major campaign." to native speakers, though ;-).

The index can also be thought of "like the buffer cache, where new
contents of files are kept before they are committed to disk
platter."  At least, a non-native speaker with a CS background would
understand that much better than "the index" (no, I am not saying
that we should call it "the cache"; I am just saying "the index" is
not a good word, but we may need to find a better word than "the
staging area").

The noun phrase "staging area" and the verb "to stage" are good
(especially when we do not worry too much about us foreigners), but
we need to make sure "stage" is not mistaken as a noun when used in
a context that talks about the index as a whole, to avoid confusion
with the use of the word "stage" long established since

http://thread.gmane.org/gmane.comp.version-control.git/231/focus=286

to call "ours" stage#2, etc.


(2) "cached" vs "index".

I think this is the more important of the two issues (the
other one being "why do we force people to say 'index'?").  Some
commands take "--cached", some others take "--index", some take
both.  What these two mean is documented in the gitcli manual page, but
that is far from sufficient.  The users can get confused by this UI
mistake in different ways.

 * We do need to have "git apply" that mimics "patch" (i.e. works
   only on working tree files, or even outside a Git repository)
   without any option, "git apply --mode1" that only applies the
   change to the index, and "git apply --mode2" that applies the
   change to both the index and the working tree. No renaming of
   "the index" changes this need to have three different modes
   of operation.

   It was a major UI mistake to call one of the modes "--cached" and
   another "--index", because there is no way, other than rote
   learning, for people to choose the one of the two that is right
   for the situation they face.

   If "--cached" were called "--index-only", it might have been a
   lot more learnable (and then "--index" could be renamed to
   "--index-and-working-tree" at the same time to reduce the
   confusion further).  Alternatively, with the synonym "--staged"
   for "--cached" already in place for "git diff", we could
   introduce "--staged-and-working-tree" as a synonym for "--index"
   to achieve the same effect (of course we need to find a way to
   shorten "-and-working-tree" part in a sensible way).

 * "git grep" barfs when given "--index", even though it does accept
   "--cached" and searches the patterns in contents that are in the
   index. This is technically correct, as the command does not
   search both in the index and in the working tree, but again,
   there is no way other than rote learning for users to tell that
   "--cached" is the correct one to use, even after they know that
   they want to search in the index (I already called it a major UI
   mistake, didn't I?).

   A new synonym "--staged" for "--cached" may be able to alleviate
   the confusion somewhat, given especially that "git diff" already
   knows "--staged" as a synonym for "--cached".  I think a better
   end result will come if we taught "git grep --index" to actually
   search the patterns both in the index and in the working tree at
   the same time.  There is no logical reason from the end user's
   point of view that "git grep --index" (aka "git grep
   --staged-and-working-tree") needs to fail; if we make "--mode2"
   mean "work on both the index and the working tree" for any Git
   command where that makes sense, things will be more
   consistent, and it certainly makes sense to ask "git grep" to
   work on both the index and the working tree.  We do allow "git
   grep -e pattern tree-ish-1 tree-ish-2" to search in multiple data
   sources already, so it can be seen as a logica

Re: [PATCH v2 00/14] Officially start moving to the term 'staging area'

2013-10-14 Thread Felipe Contreras
On Mon, Oct 14, 2013 at 5:29 PM, Felipe Contreras wrote:
> tl;dr: everyone except Junio C Hamano and Drew Northup agrees; we should move
> away from the name "the index".

Junio, can you make an exception and reply to this thread? The change
to move away from the term "the index" has been suggested many times
over many years; it is an extremely important change for users,
and all the Git developers agree it must be done.

Virtually everyone has agreed already that the term "staging area" is
the best option and this patch series is a good first step. Other than
the --work patches, this series could easily be merged to the 'pu'
branch. Yet not only is this series not there, but you haven't said
what needs to be done to get there.

It has been more than a month since I demonstrated to you that virtually
nobody has any problems with moving away from the term "the index"[1][2],
and yet you haven't even responded.

I'm not even asking about this series, all I want to know is if any
change that tries to move away from the term "the index" towards
"staging area" would ever be considered for inclusion. Yes or no.

All I want is a simple answer to a simple question. Is that too much to ask?

[1] http://article.gmane.org/gmane.comp.version-control.git/233469
[2] http://article.gmane.org/gmane.comp.version-control.git/233468

-- 
Felipe Contreras


[PATCH v2 00/14] Officially start moving to the term 'staging area'

2013-10-14 Thread Felipe Contreras
tl;dr: everyone except Junio C Hamano and Drew Northup agrees; we should move
away from the name "the index".

It has been discussed many times in the past that 'index' is not an
appropriate description for what the high-level user does with it, and
it has been agreed that 'staging area' is the best term.

The term 'staging area' is more intuitive for newcomers who are more
familiar with English than with Git, and it seems to be a
straightforward mental notion for people with different mother tongues.

In fact it is so intuitive that it's already used in a lot of online
documentation, and the people who teach Git professionally use this
term, because it's easier for many kinds of audiences to grasp.

The meanings of the words 'cache' and 'index' don't correctly represent
the mental model of the high-level user:

cache: a 'cache' is a place for easier access; a squirrel caches nuts
so it doesn't have to go looking for them in the future when it might
be much more difficult. Git porcelain is not using the staging area
for easier future access; it's not a cache.

index: an 'index' is a guide of pointers to something else; a book
index has a list of entries so the reader can locate information
easily without having to go through the whole book. Git porcelain is
not using the staging area to find entries more quickly; it's not an
index.

stage: a 'stage' is a special area designated for convenience in order
for some activity to take place; an orator would prepare a stage in
order for her speech to be successful, otherwise many people might not
be able to hear or see her. Git porcelain is using the staging area
precisely as a special area to be separated from the working directory
for convenience.

The term 'stage' is a good noun in itself, as is 'staging area'; it
has a good verb, 'to stage', and a nice past participle, 'staged'.

The first step in moving Git towards this term is to add --stage
options for every command that uses --index or --cached. However, there's
a problem with the 'git apply' command, because it treats --index and
--cached differently. Different solutions were proposed, including a
special --stage-only option; however, I think the best solution is a
--[no-]work option to specify whether the working directory should be
touched or not, so --index becomes --staged, and --cached becomes --staged
--no-work.
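
To make the proposed mapping for 'git apply' concrete, here is a minimal
Python sketch. It is an illustration only and not part of the patch series:
the names LEGACY_TO_PROPOSED and translate_apply_flags are made up for this
example, and nothing in it calls Git. Only the mapping itself (--index to
--staged, --cached to --staged --no-work) is taken from the paragraph above.

```python
# Hypothetical helper: rewrite legacy 'git apply' options into the
# spellings proposed in this cover letter. It only manipulates strings.
LEGACY_TO_PROPOSED = {
    "--index": ["--staged"],                # staging area and working directory
    "--cached": ["--staged", "--no-work"],  # staging area only
}


def translate_apply_flags(args):
    """Return a new argument list with legacy options replaced."""
    translated = []
    for arg in args:
        # Unknown arguments (patch file names, other options) pass through.
        translated.extend(LEGACY_TO_PROPOSED.get(arg, [arg]))
    return translated


if __name__ == "__main__":
    print(translate_apply_flags(["--cached", "fix.patch"]))
```

Plain 'git apply' with neither option is left untouched, which matches the
case where only the working directory is modified.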

In addition, the 'git stage' command can be extended so the staging area
can be brought closer to the user, like other important Git concepts
such as 'git branch', 'git tag', and 'git remote'. For example, the command
'git stage edit' (which allows the user to directly edit the diff from
HEAD to the staging area) can have a home, where previously there was no
place for it. It would then become natural to do 'git stage diff', and then
'git stage edit' (to edit the previous diff).

After adding the new --stage options and making sure no functionality is
lost, they can become the recommended ones in the documentation;
eventually, the old ones get deprecated, and later obsoleted.

Also, the documentation would need to be updated to replace many
instances of 'the index', with 'the staging area' in porcelain commands.

Moreover, the --stage and --work options also make sense for 'git
reset', and after these options are added, the complicated table to
explain the different behaviors between --soft, --mixed, and --hard
becomes so simple it's not needed any more:

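  working stage HEAD target             working stage HEAD
  --------------------------------------------------------
   A       B     C    D     --no-stage   A       B     D
                            --stage      A       D     D
                            --work       D       D     D

  working stage HEAD target             working stage HEAD
  --------------------------------------------------------
   A       B     C    C     --no-stage   A       B     C
                            --stage      A       C     C
                            --work       C       C     C

  working stage HEAD target             working stage HEAD
  --------------------------------------------------------
   B       B     C    D     --no-stage   B       B     D
                            --stage      B       D     D
                            --work       D       D     D

  working stage HEAD target             working stage HEAD
  --------------------------------------------------------
   B       B     C    C     --no-stage   B       B     C
                            --stage      B       C     C
                            --work       C       C     C

  working stage HEAD target             working stage HEAD
  --------------------------------------------------------
   B       C     C    D     --no-stage   B       C     D
                            --stage      B       D     D
                            --work       D       D     D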
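
Every row in the tables above follows the same rule: --no-stage moves only
HEAD, --stage moves HEAD and the staging area, and --work moves all three.
A minimal Python model of that rule, offered as a sketch only: State and
reset are hypothetical names for this illustration, and the model covers
just the cases shown in these tables, not the full behavior of 'git reset'.

```python
from typing import NamedTuple


class State(NamedTuple):
    """Content labels (as in the tables) for the three places Git tracks."""
    working: str
    stage: str
    head: str


def reset(state, target, mode):
    """Model the proposed reset modes; returns the resulting State."""
    if mode == "--no-stage":  # move HEAD only
        return State(state.working, state.stage, target)
    if mode == "--stage":     # move HEAD and the staging area
        return State(state.working, target, target)
    if mode == "--work":      # move HEAD, staging area and working directory
        return State(target, target, target)
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    start = State(working="A", stage="B", head="C")
    for mode in ("--no-stage", "--stage", "--work"):
        print(mode, reset(start, "D", mode))
```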

  working stage HEAD target working