Re: [PATCH v2 1/1] Introduce git add --renormalize .

2017-11-12 Thread Torsten Bögershausen
On Fri, Nov 10, 2017 at 09:22:10AM +0900, Junio C Hamano wrote:

> 
> Taking these altogether, perhaps
> 
> Apply the "clean" process freshly to all tracked files to
> forcibly add them again to the index.  This is useful after
> changing `core.autocrlf` configuration or the `text` attribute
> in order to correct files added with wrong CRLF/LF line endings.
> This option implies `-u`.
> 
> Thanks.

OK with the text - I can send a new version the next days.


Re: [PATCH v2 1/1] Introduce git add --renormalize .

2017-11-09 Thread Junio C Hamano
Torsten Bögershausen  writes:

> Here is a somwhat shorter description:
>
> Apply the "clean" process freshly to all tracked files.
> This is useful after changing `core.autocrlf` or the `text`
> attributes in the `.gitattributes` file because
> Git may not consider these files as changed.

I think it is OK to omit .git/config for brevity (I am assuming that
your justification is because you thought it was obvious it is a
configuration variable); but then it is equally obvious (if not
more) that `text` attribute comes from .gitattributes (notice we do
not mention core.autocrlf is a configuration variable in the above,
but we do say `text` is an attribute) so it can also be omitted for
brevity.

> Correct the files that had been commited with CRLF,
> they will from now on have LF instead.

Reading this as a single sentence immediately after the above
paragraph leaves me feel confused.  First of all, this would not
happen unless the user corrects core.autocrlf/text like described
above.  In fact, updating these settings is done as in order to do
that correction.  So I'd say it should not be split.

> Re-run what the `clean` filter does.

This again looks out of place just like the previous sentence.  In
fact, provided if "the clean process" is understood by the end user,
this is redundant.

> This option implies `-u`.

Taking these altogether, perhaps

Apply the "clean" process freshly to all tracked files to
forcibly add them again to the index.  This is useful after
changing `core.autocrlf` configuration or the `text` attribute
in order to correct files added with wrong CRLF/LF line endings.
This option implies `-u`.

Thanks.


Re: [PATCH v2 1/1] Introduce git add --renormalize .

2017-11-09 Thread Torsten Bögershausen
[]
> 
> If we had such a term in Documentation/glossary-contents.txt, we
> could even say
> 
>   Add contents of all paths to the index by freshly applying
>   the "clean" process, even to the ones Git may think are
>   unmodified in the working tree since they were added the
>   last time (based on the file timestamps etc.).  This is
>   often useful after updating settings like `core.autocrlf` in
>   the `.git/config` file and the `text` attributes in the
>   `.gitattributes` file to correct the index entries that
>   records lines with CRLF to use LF instead, or changing what
>   the `clean` filter does.  This option implies `-u`.
> 
> The point is to express that the CRLF/LF is a consequence (even
> though it may be the most prominent one from end-users' point of
> view) of a larger processing.

Here is a somwhat shorter description:

Apply the "clean" process freshly to all tracked files.
This is useful after changing `core.autocrlf` or the `text`
attributes in the `.gitattributes` file because
Git may not consider these files as changed.
Correct the files that had been commited with CRLF,
they will from now on have LF instead.
Re-run what the `clean` filter does.
This option implies `-u`.


> 
> > [snip the TC. Adding line endings is good)
> 
> What is TC in this context?

Sorry for confusion: TC means test case.


Re: [PATCH v2 1/1] Introduce git add --renormalize .

2017-11-07 Thread Junio C Hamano
Torsten Bögershausen  writes:

>> > +--renormalize::
>> > +  Normalizes the line endings from CRLF to LF of tracked files.
>> > +  This applies to files which are either "text" or "text=auto"
>> > +  in .gitattributes (or core.autocrlf is true or input)
>> > +  --renormalize implies -u
>> > +
>> 
>> OK.
>
> I think the fact, that clean filters are re-run, and re-evaluated
> in case they are changed, should be made more clear here.
> I don't know how to explain it better that CRLF conversion and/or filters are
> re-applied, this is an attempt:
>
>
> --renormalize::
>   Normalizes the line endings from CRLF to LF of tracked files,
>   if the .gitattributes or core.autocrlf say so.
>   Additionally the clean and ident filters, if any, are re-run.
>   --renormalize implies -u

That is certainly better.  Do we have an end-user facing phrase to
collectively call everything the "convert_to_git()" processing does?
When I talk casually about it, I'd call it the "clean" process (as
opposed to the "smudge" process) as a term that includes all the
things that Git does to massage contents in the working tree to
in-repository representation.

If we had such a term in Documentation/glossary-contents.txt, we
could even say

Add contents of all paths to the index by freshly applying
the "clean" process, even to the ones Git may think are
unmodified in the working tree since they were added the
last time (based on the file timestamps etc.).  This is
often useful after updating settings like `core.autocrlf` in
the `.git/config` file and the `text` attributes in the
`.gitattributes` file to correct the index entries that
records lines with CRLF to use LF instead, or changing what
the `clean` filter does.  This option implies `-u`.

The point is to express that the CRLF/LF is a consequence (even
though it may be the most prominent one from end-users' point of
view) of a larger processing.

> [snip the TC. Adding line endings is good)

What is TC in this context?


Re: [PATCH v2 1/1] Introduce git add --renormalize .

2017-11-07 Thread Torsten Bögershausen
[snip]
> 
> > @@ -175,6 +175,12 @@ for "git add --no-all ...", i.e. ignored 
> > removed files.
> > warning (e.g., if you are manually performing operations on
> > submodules).
> >  
> > +--renormalize::
> > +   Normalizes the line endings from CRLF to LF of tracked files.
> > +   This applies to files which are either "text" or "text=auto"
> > +   in .gitattributes (or core.autocrlf is true or input)
> > +   --renormalize implies -u
> > +
> 
> OK.

I think the fact, that clean filters are re-run, and re-evaluated
in case they are changed, should be made more clear here.
I don't know how to explain it better that CRLF conversion and/or filters are
re-applied, this is an attempt:


--renormalize::
Normalizes the line endings from CRLF to LF of tracked files,
if the .gitattributes or core.autocrlf say so.
Additionally the clean and ident filters, if any, are re-run.
--renormalize implies -u





> 
> > +static int renormalize_tracked_files(const struct pathspec *pathspec, int 
> > flags)
> > +{
> > +   int i, retval = 0;
> > +
> > +   for (i = 0; i < active_nr; i++) {
> > +   struct cache_entry *ce = active_cache[i];
> > +
> > +   if (ce_stage(ce))
> > +   continue; /* do not touch unmerged paths */
> > +   if (!S_ISREG(ce->ce_mode) && !S_ISLNK(ce->ce_mode))
> > +   continue; /* do not touch non blobs */
> > +   if (pathspec && !ce_path_match(ce, pathspec, NULL))
> > +   continue;
> > +   retval |= add_file_to_cache(ce->name, flags | HASH_RENORMALIZE);
> 
> We are removing the entry and then adding the same entry under the
> same name, and iteration over the active_cache[] from 0 through
> active_nr should be safe, I guess.
> 
> > ...
> > -   add_new_files = !take_worktree_changes && !refresh_only;
> > +   add_new_files = !take_worktree_changes && !refresh_only && 
> > !add_renormalize;
> 
> If renormalize is given, we will *not* take new files, good.
> 
> > @@ -500,7 +521,10 @@ int cmd_add(int argc, const char **argv, const char 
> > *prefix)
> >  
> > plug_bulk_checkin();
> >  
> > -   exit_status |= add_files_to_cache(prefix, , flags);
> > +   if (add_renormalize)
> > +   exit_status |= renormalize_tracked_files(, flags);
> > +   else
> > +   exit_status |= add_files_to_cache(prefix, , flags);
> >  
> > if (add_new_files)
> > exit_status |= add_files(, flags);
> 
> OK.
> 
> > ...
> > int add_option = (ADD_CACHE_OK_TO_ADD|ADD_CACHE_OK_TO_REPLACE|
> >   (intent_only ? ADD_CACHE_NEW_ONLY : 0));
> > +   int newflags = HASH_WRITE_OBJECT;
> > +
> > +   if (flags & HASH_RENORMALIZE)
> > +   newflags |= HASH_RENORMALIZE;
> > ...
> > @@ -678,19 +682,23 @@ int add_to_index(struct index_state *istate, const 
> > char *path, struct stat *st,
> > if (ignore_case) {
> > adjust_dirname_case(istate, ce->name);
> > }
> > +   if (!(flags & HASH_RENORMALIZE)) {
> > +   alias = index_file_exists(istate, ce->name,
> > + ce_namelen(ce), ignore_case);
> > +   if (alias &&
> > +   !ce_stage(alias) &&
> > +   !ie_match_stat(istate, alias, st, ce_option)) {
> > +   /* Nothing changed, really */
> > +   if (!S_ISGITLINK(alias->ce_mode))
> > +   ce_mark_uptodate(alias);
> > +   alias->ce_flags |= CE_ADDED;
> 
> OK, so RENORMALIZE option forces the code to skip the "does the path
> exist already?  maybe we can do without adding it?" check.
> 
> > if (!intent_only) {
> > -   if (index_path(>oid, path, st, HASH_WRITE_OBJECT)) {
> > +   if (index_path(>oid, path, st, newflags)) {
> 
> And then we do hash it.  We later do add_index_entry() on this thing
> and we have OK_TO_REPLACE bit in the add_option, so we are good to go.
> 
> > diff --git a/sha1_file.c b/sha1_file.c
> > index 10c3a0083d..15abb184c2 100644
> > --- a/sha1_file.c
> > +++ b/sha1_file.c
> > @@ -74,6 +74,18 @@ static struct cached_object *find_cached_object(const 
> > unsigned char *sha1)
> > return NULL;
> >  }
> >  
> > +
> > +static enum safe_crlf get_safe_crlf(unsigned flags)
> > +{
> > +   if (flags & HASH_RENORMALIZE)
> > +   return SAFE_CRLF_RENORMALIZE;
> > +   else if (flags & HASH_WRITE_OBJECT)
> > +   return safe_crlf;
> > +   else
> > +   return SAFE_CRLF_FALSE;
> > +}
> > +
> > +
> >  int mkdir_in_gitdir(const char *path)
> >  {
> > if (mkdir(path, 0777)) {
> > @@ -1680,7 +1692,7 @@ static int index_mem(unsigned char *sha1, void *buf, 
> > size_t size,
> > if ((type == OBJ_BLOB) && path) {
> > struct strbuf nbuf = STRBUF_INIT;
> > if (convert_to_git(_index, path, buf, size, ,
> > -  write_object ? safe_crlf : SAFE_CRLF_FALSE)) 
> > {
> > +  get_safe_crlf(flags))) {
> > 

Re: [PATCH v2 1/1] Introduce git add --renormalize .

2017-11-06 Thread Junio C Hamano
tbo...@web.de writes:

> From: Torsten Bögershausen 
>
> Make it safer to normalize the line endings in a repository:
> Files that had been commited with CRLF will be commited with LF.
> ...
> Helped-By: Junio C Hamano 
> Signed-off-by: Torsten Bögershausen 
> ---

Nicely explained.

> @@ -175,6 +175,12 @@ for "git add --no-all ...", i.e. ignored 
> removed files.
>   warning (e.g., if you are manually performing operations on
>   submodules).
>  
> +--renormalize::
> + Normalizes the line endings from CRLF to LF of tracked files.
> + This applies to files which are either "text" or "text=auto"
> + in .gitattributes (or core.autocrlf is true or input)
> + --renormalize implies -u
> +

OK.

> +static int renormalize_tracked_files(const struct pathspec *pathspec, int 
> flags)
> +{
> + int i, retval = 0;
> +
> + for (i = 0; i < active_nr; i++) {
> + struct cache_entry *ce = active_cache[i];
> +
> + if (ce_stage(ce))
> + continue; /* do not touch unmerged paths */
> + if (!S_ISREG(ce->ce_mode) && !S_ISLNK(ce->ce_mode))
> + continue; /* do not touch non blobs */
> + if (pathspec && !ce_path_match(ce, pathspec, NULL))
> + continue;
> + retval |= add_file_to_cache(ce->name, flags | HASH_RENORMALIZE);

We are removing the entry and then adding the same entry under the
same name, and iteration over the active_cache[] from 0 through
active_nr should be safe, I guess.

> ...
> - add_new_files = !take_worktree_changes && !refresh_only;
> + add_new_files = !take_worktree_changes && !refresh_only && 
> !add_renormalize;

If renormalize is given, we will *not* take new files, good.

> @@ -500,7 +521,10 @@ int cmd_add(int argc, const char **argv, const char 
> *prefix)
>  
>   plug_bulk_checkin();
>  
> - exit_status |= add_files_to_cache(prefix, , flags);
> + if (add_renormalize)
> + exit_status |= renormalize_tracked_files(, flags);
> + else
> + exit_status |= add_files_to_cache(prefix, , flags);
>  
>   if (add_new_files)
>   exit_status |= add_files(, flags);

OK.

> ...
>   int add_option = (ADD_CACHE_OK_TO_ADD|ADD_CACHE_OK_TO_REPLACE|
> (intent_only ? ADD_CACHE_NEW_ONLY : 0));
> + int newflags = HASH_WRITE_OBJECT;
> +
> + if (flags & HASH_RENORMALIZE)
> + newflags |= HASH_RENORMALIZE;
> ...
> @@ -678,19 +682,23 @@ int add_to_index(struct index_state *istate, const char 
> *path, struct stat *st,
>   if (ignore_case) {
>   adjust_dirname_case(istate, ce->name);
>   }
> + if (!(flags & HASH_RENORMALIZE)) {
> + alias = index_file_exists(istate, ce->name,
> +   ce_namelen(ce), ignore_case);
> + if (alias &&
> + !ce_stage(alias) &&
> + !ie_match_stat(istate, alias, st, ce_option)) {
> + /* Nothing changed, really */
> + if (!S_ISGITLINK(alias->ce_mode))
> + ce_mark_uptodate(alias);
> + alias->ce_flags |= CE_ADDED;

OK, so RENORMALIZE option forces the code to skip the "does the path
exist already?  maybe we can do without adding it?" check.

>   if (!intent_only) {
> - if (index_path(>oid, path, st, HASH_WRITE_OBJECT)) {
> + if (index_path(>oid, path, st, newflags)) {

And then we do hash it.  We later do add_index_entry() on this thing
and we have OK_TO_REPLACE bit in the add_option, so we are good to go.

> diff --git a/sha1_file.c b/sha1_file.c
> index 10c3a0083d..15abb184c2 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -74,6 +74,18 @@ static struct cached_object *find_cached_object(const 
> unsigned char *sha1)
>   return NULL;
>  }
>  
> +
> +static enum safe_crlf get_safe_crlf(unsigned flags)
> +{
> + if (flags & HASH_RENORMALIZE)
> + return SAFE_CRLF_RENORMALIZE;
> + else if (flags & HASH_WRITE_OBJECT)
> + return safe_crlf;
> + else
> + return SAFE_CRLF_FALSE;
> +}
> +
> +
>  int mkdir_in_gitdir(const char *path)
>  {
>   if (mkdir(path, 0777)) {
> @@ -1680,7 +1692,7 @@ static int index_mem(unsigned char *sha1, void *buf, 
> size_t size,
>   if ((type == OBJ_BLOB) && path) {
>   struct strbuf nbuf = STRBUF_INIT;
>   if (convert_to_git(_index, path, buf, size, ,
> -write_object ? safe_crlf : SAFE_CRLF_FALSE)) 
> {
> +get_safe_crlf(flags))) {
>   buf = strbuf_detach(, );
>   re_allocated = 1;
>   }
> @@ -1714,7 +1726,7 @@ static int index_stream_convert_blob(unsigned char 
> *sha1, int fd,
>   assert(would_convert_to_git_filter_fd(path));
>  
>