Re: [PATCH v4 08/10] dir: simplify untracked cache "ident" field

2015-12-30 Thread Christian Couder
On Wed, Dec 30, 2015 at 12:47 PM, Torsten Bögershausen  wrote:
> On 2015-12-29 08.09, Christian Couder wrote:
>> It is not a good idea to compare kernel versions and disable
>> the untracked cache if it changes as people may upgrade and
>> still want the untracked cache to work. So let's just
>> compare work tree locations to decide if we should disable
>> it.
> OK with that.
>>
>> Also it's not useful to store many locations in the ident
>> field and compare to any of them. It can even be dangerous
>> if GIT_WORK_TREE is used with different values. So let's
>> just store one location, the location of the current work
>> tree.
> I don't think I fully agree in all details.

Suppose I use "git update-index --untracked-cache" first with
GIT_WORK_TREE=/foo and then with GIT_WORK_TREE=/bar. It will store
both /foo and /bar as valid locations for the worktree in the "ident"
field.

Then I work a bit with GIT_WORK_TREE=/foo and then a bit with
GIT_WORK_TREE=/bar. Now does the untracked cache info in the index
reflect the state of /foo or the state of /bar?

With a config variable, the problem is worse, because, as we
automatically add the untracked cache when git status is used, we are
even less likely to pay attention to GIT_WORK_TREE when the cache is
added.

> The list is there to distinguish between different clients when sharing
> a Git workspace over a network to different clients:
>
> A file system is exported from Linux to Linux via NFS,
> including a Git workspace)
> The same Workspace is exported via SAMBA to Windows and, in an extreme case,
> AFS to Mac OS.
>
> Other combinations could be
> Linux - SAMBA - Linux
> Linux - AFP - Linux
>
> Mac OS - NFS - Linux
> Mac OS - AFP - Mac OS
> Mac OS - SMB - Linux, Mac OS, Windows
> The list is incomplete (BSD and other Unix is missing),
> there are other combinations of network protocols,
> there are NAS boxes which mostly Linux under the hood, but
> may have vendor specific tweaks.
>
> Now we want to know, which of the combinations work.

In my opinion at this point it is a bit insane to want untracked cache
to work when it is accessed by different clients over the network. It
is much safer to just disable it (with a config option) in this case.
But I am ok to try to improve things for your use case to work if it
doesn't complicate things too much.

> The different clients need to test separately.
> E.g. for a given server:
>
> NFS - Linux
> SMB - Linux
> SMB Windows.
>
> At this point I would agree that the old code from dir.c:
>
> static const char *get_ident_string(void)
> {
> static struct strbuf sb = STRBUF_INIT;
> struct utsname uts;
>
> if (sb.len)
> return sb.buf;
> if (uname() < 0)
> die_errno(_("failed to get kernel name and information"));
> strbuf_addf(, "Location %s, system %s %s %s", get_git_work_tree(),
> uts.sysname, uts.release, uts.version);
> return sb.buf;
> }
> --
> is unneccessary picky, and the sysname should be enough:
>
> strbuf_addf(, "Location %s, system %s", get_git_work_tree(),
> uts.sysname);
>
> The old code did not find out, which network protocol that we used,
> (you need to call "mount", and grep for the directory, and try to get
> the FS information, which may be ext4, btrfs, smbfs, nfs)
> This is unnecessary complicated, so we endet up in using the path.

I am ok with the change to compare only the location and the system
name. I wonder a bit though if this change is completely backward
compatible.

> If I was a system administrator, (I sometimes am), I would set up a
> bunch of Linux boxes in a similar way, so that the repo is under
> /nfs/server1/projects/myproject
> (and the same path is used for all Linux boxes)
>
> The Windows machines may use something like
> N:/projects/myproject
> or
> //server1/projects/myproject
>
> and Mac OS may use
> /Volumes/projects/myproject
> (If mounted from the finder)
> or
> /nfs/projects/myproject
> (When autofs is used)
>
> I may have missed the discussion somewhat, could you explain why having
> different uname/location combinations are not usable at your site ?

I tried to explain above why using a config option to set the
untracked cache implies that you have to be careful at what happens
when people use GIT_WORK_TREE.

But it should be usable if indeed we put only the system name as you
suggest. So I think I will do that.

> How much burden is actually put on your system, and is there a way to keep a
> list of different clients ?

About a list of different client, instead of just one, I don't think
it is sane given the GIT_WORK_TREE problem.

If you really have many clients that should use the untracked cache
with different paths, then what you might want is a set of config
options like this:

[untrackedCache "windows7"]
enable = true
system = Windows
systemVersion = 7
path = C:/my/work/tree
[untrackedCache 

Re: [PATCH v4 08/10] dir: simplify untracked cache "ident" field

2015-12-30 Thread Christian Couder
On Tue, Dec 29, 2015 at 11:32 PM, Junio C Hamano  wrote:
> Christian Couder  writes:
>
>> -static int ident_in_untracked(const struct untracked_cache *uc)
>> +static int ident_current_location_in_untracked(const struct untracked_cache 
>> *uc)
>>  {
>> - const char *end = uc->ident.buf + uc->ident.len;
>> - const char *p   = uc->ident.buf;
>> + struct strbuf cur_loc = STRBUF_INIT;
>> + int res = 0;
>>
>> - for (p = uc->ident.buf; p < end; p += strlen(p) + 1)
>> - if (!strcmp(p, get_ident_string()))
>> - return 1;
>> - return 0;
>> + /*
>> +  * Previous git versions may have saved many strings in the
>> +  * "ident" field, but it is insane to manage many locations,
>> +  * so just take care of the first one.
>> +  */
>> +
>> + strbuf_addf(_loc, "Location %s, system ", get_git_work_tree());
>> +
>> + if (starts_with(uc->ident.buf, cur_loc.buf))
>> + res = 1;
>> +
>> + strbuf_release(_loc);
>> +
>> + return res;
>>  }
>
> The ", system " part looks funny.  I think I know what you are
> trying to do, though.
>
> If the path to the working tree has ", system " as a substring,
> I wonder if funny things may happen with this starts_with()?

Yeah, I think in the next iteration I will do what Torsten suggests.
It looks like the only sane solution, other than just not using the
"ident" field anymore and relying only on config variables like the
example I give in my reply to Torsten.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 08/10] dir: simplify untracked cache "ident" field

2015-12-30 Thread Torsten Bögershausen
On 2015-12-29 08.09, Christian Couder wrote:
> It is not a good idea to compare kernel versions and disable
> the untracked cache if it changes as people may upgrade and
> still want the untracked cache to work. So let's just
> compare work tree locations to decide if we should disable
> it.
OK with that.
> 
> Also it's not useful to store many locations in the ident
> field and compare to any of them. It can even be dangerous
> if GIT_WORK_TREE is used with different values. So let's
> just store one location, the location of the current work
> tree.
I don't think I fully agree in all details.
The list is there to distinguish between different clients when sharing
a Git workspace over a network to different clients:

A file system is exported from Linux to Linux via NFS,
including a Git workspace)
The same Workspace is exported via SAMBA to Windows and, in an extreme case,
AFS to Mac OS.

Other combinations could be
Linux - SAMBA - Linux
Linux - AFP - Linux

Mac OS - NFS - Linux
Mac OS - AFP - Mac OS
Mac OS - SMB - Linux, Mac OS, Windows
The list is incomplete (BSD and other Unix is missing),
there are other combinations of network protocols,
there are NAS boxes which mostly Linux under the hood, but
may have vendor specific tweaks.

Now we want to know, which of the combinations work.
The different clients need to test separately.
E.g. for a given server:

NFS - Linux
SMB - Linux
SMB Windows.

At this point I would agree that the old code from dir.c:

static const char *get_ident_string(void)
{
static struct strbuf sb = STRBUF_INIT;
struct utsname uts;

if (sb.len)
return sb.buf;
if (uname() < 0)
die_errno(_("failed to get kernel name and information"));
strbuf_addf(, "Location %s, system %s %s %s", get_git_work_tree(),
uts.sysname, uts.release, uts.version);
return sb.buf;
}
--
is unneccessary picky, and the sysname should be enough:

strbuf_addf(, "Location %s, system %s", get_git_work_tree(),
uts.sysname);


The old code did not find out, which network protocol that we used,
(you need to call "mount", and grep for the directory, and try to get
the FS information, which may be ext4, btrfs, smbfs, nfs)
This is unnecessary complicated, so we endet up in using the path.

If I was a system administrator, (I sometimes am), I would set up a
bunch of Linux boxes in a similar way, so that the repo is under
/nfs/server1/projects/myproject
(and the same path is used for all Linux boxes)

The Windows machines may use something like
N:/projects/myproject
or
//server1/projects/myproject

and Mac OS may use
/Volumes/projects/myproject
(If mounted from the finder)
or
/nfs/projects/myproject
(When autofs is used)


I may have missed the discussion somewhat, could you explain why having
different uname/location combinations are not usable at your site ?

How much burden is actually put on your system, and is there a way to keep a
list of different clients ?


What I understand is that a kernel update of a Linux machine shouldn't matter,
but if a machine runs Windows or Linux should matter.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 08/10] dir: simplify untracked cache "ident" field

2015-12-29 Thread Junio C Hamano
Christian Couder  writes:

> -static int ident_in_untracked(const struct untracked_cache *uc)
> +static int ident_current_location_in_untracked(const struct untracked_cache 
> *uc)
>  {
> - const char *end = uc->ident.buf + uc->ident.len;
> - const char *p   = uc->ident.buf;
> + struct strbuf cur_loc = STRBUF_INIT;
> + int res = 0;
>  
> - for (p = uc->ident.buf; p < end; p += strlen(p) + 1)
> - if (!strcmp(p, get_ident_string()))
> - return 1;
> - return 0;
> + /*
> +  * Previous git versions may have saved many strings in the
> +  * "ident" field, but it is insane to manage many locations,
> +  * so just take care of the first one.
> +  */
> +
> + strbuf_addf(_loc, "Location %s, system ", get_git_work_tree());
> +
> + if (starts_with(uc->ident.buf, cur_loc.buf))
> + res = 1;
> +
> + strbuf_release(_loc);
> +
> + return res;
>  }

The ", system " part looks funny.  I think I know what you are
trying to do, though.

If the path to the working tree has ", system " as a substring,
I wonder if funny things may happen with this starts_with()?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 08/10] dir: simplify untracked cache "ident" field

2015-12-28 Thread Christian Couder
It is not a good idea to compare kernel versions and disable
the untracked cache if it changes as people may upgrade and
still want the untracked cache to work. So let's just
compare work tree locations to decide if we should disable
it.

Also it's not useful to store many locations in the ident
field and compare to any of them. It can even be dangerous
if GIT_WORK_TREE is used with different values. So let's
just store one location, the location of the current work
tree.

If this location changed and we still want an untracked
cache, let's delete the cache and recreate it.

Signed-off-by: Christian Couder 
---
 dir.c | 49 ++---
 1 file changed, 34 insertions(+), 15 deletions(-)

diff --git a/dir.c b/dir.c
index dba1ad0..1a8b5a2 100644
--- a/dir.c
+++ b/dir.c
@@ -1918,23 +1918,36 @@ static const char *get_ident_string(void)
return sb.buf;
 }
 
-static int ident_in_untracked(const struct untracked_cache *uc)
+static int ident_current_location_in_untracked(const struct untracked_cache 
*uc)
 {
-   const char *end = uc->ident.buf + uc->ident.len;
-   const char *p   = uc->ident.buf;
+   struct strbuf cur_loc = STRBUF_INIT;
+   int res = 0;
 
-   for (p = uc->ident.buf; p < end; p += strlen(p) + 1)
-   if (!strcmp(p, get_ident_string()))
-   return 1;
-   return 0;
+   /*
+* Previous git versions may have saved many strings in the
+* "ident" field, but it is insane to manage many locations,
+* so just take care of the first one.
+*/
+
+   strbuf_addf(_loc, "Location %s, system ", get_git_work_tree());
+
+   if (starts_with(uc->ident.buf, cur_loc.buf))
+   res = 1;
+
+   strbuf_release(_loc);
+
+   return res;
 }
 
-void add_untracked_ident(struct untracked_cache *uc)
+static void set_untracked_ident(struct untracked_cache *uc)
 {
-   if (ident_in_untracked(uc))
-   return;
+   strbuf_reset(>ident);
strbuf_addstr(>ident, get_ident_string());
-   /* this strbuf contains a list of strings, save NUL too */
+
+   /*
+* This strbuf used to contain a list of NUL separated
+* strings, so save NUL too for backward compatibility.
+*/
strbuf_addch(>ident, 0);
 }
 
@@ -1945,15 +1958,21 @@ static void new_untracked_cache(void)
uc->exclude_per_dir = ".gitignore";
/* should be the same flags used by git-status */
uc->dir_flags = DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES;
+   set_untracked_ident(uc);
the_index.untracked = uc;
+   the_index.cache_changed |= UNTRACKED_CHANGED;
 }
 
 void add_untracked_cache(void)
 {
-   if (!the_index.untracked)
+   if (!the_index.untracked) {
new_untracked_cache();
-   add_untracked_ident(the_index.untracked);
-   the_index.cache_changed |= UNTRACKED_CHANGED;
+   } else {
+   if (!ident_current_location_in_untracked(the_index.untracked)) {
+   free_untracked_cache(the_index.untracked);
+   new_untracked_cache();
+   }
+   }
 }
 
 void remove_untracked_cache(void)
@@ -2021,7 +2040,7 @@ static struct untracked_cache_dir 
*validate_untracked_cache(struct dir_struct *d
if (dir->exclude_list_group[EXC_CMDL].nr)
return NULL;
 
-   if (!ident_in_untracked(dir->untracked)) {
+   if (!ident_current_location_in_untracked(dir->untracked)) {
warning(_("Untracked cache is disabled on this system."));
return NULL;
}
-- 
2.7.0.rc2.10.g544ad6b

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html