Re: bug: HEAD vs. head on case-insensitive filesystems

2017-07-11 Thread Kenneth Hsu
> I was going to point you to the recent thread in
> 
>   http://public-inbox.org/git/87ziclb2pa@gmail.com/
> 
> but I see you already participated there. So if your mail here is
> "here's a summary of how HEAD/head don't quite work", then OK, that
> might be handy. But I think the ultimate resolution is not "let's make
> them work", but "let's consistently enforce case-sensitivity in ref
> names, regardless of the underlying filesystem".

Thanks, that makes sense.  Perhaps it's best to just view my original
message as, "here's a new example of where 'head' doesn't work".


Re: Dropping a merge from history -- rebase or filter-branch or ...?

2017-07-11 Thread Andrew Ardill
Hi Martin,

From the sound of it you really just want to revert the merge of the
pull requests. A really good description of options for this is at
https://git-scm.com/blog/2010/03/02/undoing-merges.html

There is also a section there about bringing the changes back in at a
future date, depending on how you do the revert.

Does that page describe what you're trying to do?

Regards,

Andrew Ardill


On 8 July 2017 at 07:07, Martin Langhoff  wrote:
> Hi git-folk!
>
> long time no see! I'm trying to do one of those "actually, please
> don't" things that turn out to be needed in the field.
>
> I need to open our next "for release" development branch from our
> master, but without a couple of disruptive feature branches, which
> have been merged into master already. We develop in github, so I'll
> call them Pull Requests (PRs) as gh does.
>
> So I'd like to run a filter-branch or git-rebase --interactive
> --preserve-merges that drops some PRs. Problem is, they don't work!
>
> filter-branch --commit-filter is fantastic, and gives me all the
> control I want... except that it will "skip the commit", but still use
> the trees in the later commits, so the code changes brought in by
> those commits I wanted to avoid will be there. I think the docs/help
> that discuss  "skip commit" should have a big warning there!
>
> rebase --interactive --preserve-merges  --keep-empty made a complete
> hash of things. Nonsense conflicts all over on the merge commits; I
> think it re-ran the merge without picking up the conflict resolutions
> we had applied.
>
> The changes we want to avoid are fairly localized -- a specific module
> got refactored in 3 stages. The rest of the history should replay
> cleanly. I don't want to delete the module.
>
> My fallback is a manually constructed revert. While still an option, I
think it's better to have a clean start without sizable feature-branch
> reverts.
>
> cheers,
>
>
>
> m
> --
>  martin.langh...@gmail.com
>  - ask interesting questions  ~  http://linkedin.com/in/martinlanghoff
>  - don't be distracted~  http://github.com/martin-langhoff
>by shiny stuff
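The approach in the article Andrew links boils down to `git revert -m 1 <merge>`. A minimal runnable sketch of backing a merged feature branch out of the first-parent line (throwaway repository; branch and file names are illustrative; assumes git is on PATH):

```shell
# Demo: back out an already-merged feature branch with revert -m 1.
# All names here are illustrative.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q repo
cd repo
git config user.email demo@example.com
git config user.name demo
echo base > base.txt
git add base.txt
git commit -qm 'base'
main=$(git symbolic-ref --short HEAD)
git checkout -qb pr-1
echo disruptive > module.c
git add module.c
git commit -qm 'PR 1: disruptive refactor'
git checkout -q "$main"
git merge -q --no-ff -m 'Merge pr-1' pr-1
# -m 1 keeps the first-parent side; the PR's changes are backed out,
# but the merge itself stays in history, so the changes can be brought
# back later (by reverting the revert), as the article describes.
git revert --no-edit -m 1 HEAD
test ! -e module.c && echo 'PR changes backed out'
```

Because the merge commit remains in history, a later plain re-merge of `pr-1` would be a no-op; reverting the revert is the documented way to reintroduce the changes.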


Re: [PATCH] strbuf: use designated initializers in STRBUF_INIT

2017-07-11 Thread brian m. carlson
On Tue, Jul 11, 2017 at 07:24:07AM +0200, Johannes Sixt wrote:
> Am 11.07.2017 um 02:05 schrieb brian m. carlson:
> > I have tried compiling Git with a C++ compiler, so that I could test
> > whether that was a viable alternative for MSVC in case its C++ mode
> > supported features its C mode did not.  Let's just say that the
> > compilation aborted very quickly and I gave up after a few minutes.
> 
> It's 3 cleanup patches and one hacky patch with this size:
> 
>  80 files changed, 899 insertions(+), 807 deletions(-)
> 
> to compile with C++. It passed the test suite last time I tried. Getting rid
> of the remaining 1000+ -fpermissive warnings is a different matter, though.

Yeah, that's the size I was thinking.  My goal was "easily achievable in
half an hour," which it didn't seem like at the time.

> I can revive the patches if there is interest.

I'd be interested in at least a pointer to them if you have one.  I
think it might allow us to take advantage of desirable features that are
in the intersection of C99 and C++.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204




Re: [PATCH 3/3] grep: recurse in-process using 'struct repository'

2017-07-11 Thread Jonathan Nieder
Hi,

Brandon Williams wrote:

> Convert grep to use 'struct repository' which enables recursing into
> submodules to be handled in-process.

\o/

This will be even nicer with the changes described at
https://public-inbox.org/git/20170706202739.6056-1-sbel...@google.com/.
Until then, I fear it will cause a regression --- see (*) below.

[...]
>  Documentation/git-grep.txt |   7 -
>  builtin/grep.c | 390 +
>  cache.h|   1 -
>  git.c  |   2 +-
>  grep.c |  13 --
>  grep.h |   1 -
>  setup.c|  12 +-
>  7 files changed, 81 insertions(+), 345 deletions(-)

Yay, tests still pass.

[..]
> --- a/Documentation/git-grep.txt
> +++ b/Documentation/git-grep.txt
> @@ -95,13 +95,6 @@ OPTIONS
>option the prefix of all submodule output will be the name of
>   the parent project's  object.
>  
> ---parent-basename ::
> - For internal use only.  In order to produce uniform output with the
> - --recurse-submodules option, this option can be used to provide the
> - basename of a parent's  object to a submodule so the submodule
> - can prefix its output with the parent's name rather than the SHA1 of
> - the submodule.

Being able to get rid of this is a very nice change.

[...]
> +++ b/builtin/grep.c
[...]
> @@ -366,14 +349,10 @@ static int grep_file(struct grep_opt *opt, const char *filename)
>  {
>   struct strbuf buf = STRBUF_INIT;
>  
> - if (super_prefix)
> - strbuf_addstr(&buf, super_prefix);
> - strbuf_addstr(&buf, filename);
> -
>   if (opt->relative && opt->prefix_length) {
> - char *name = strbuf_detach(&buf, NULL);
> - quote_path_relative(name, opt->prefix, &buf);
> - free(name);
> + quote_path_relative(filename, opt->prefix, &buf);
> + } else {
> + strbuf_addstr(&buf, filename);
>   }

style micronit: can avoid these braces since both branches are
single-line.

[...]
> @@ -421,284 +400,80 @@ static void run_pager(struct grep_opt *opt, const char *prefix)
>   exit(status);
>  }
>  
> -static void compile_submodule_options(const struct grep_opt *opt,
> -   const char **argv,
> -   int cached, int untracked,
> -   int opt_exclude, int use_index,
> -   int pattern_type_arg)
> -{
[...]
> - /*
> -  * Limit number of threads for child process to use.
> -  * This is to prevent potential fork-bomb behavior of git-grep as each
> -  * submodule process has its own thread pool.
> -  */
> - argv_array_pushf(&submodule_options, "--threads=%d",
> -  (num_threads + 1) / 2);

Being able to get rid of this is another very nice change.

[...]
> + /* add objects to alternates */
> + add_to_alternates_memory(submodule.objectdir);

(*) This sets up a single in-memory object store with all the
processed submodules.  Processed objects are never freed.
This means that if I run a command like

git grep --recurse-submodules -e neverfound HEAD

in a project with many submodules then memory consumption scales in
the same way as if the project were all one repository.  By contrast,
without this patch, git is able to take advantage of the implicit
free() when each child exits to limit its memory usage.

Worse, this increases the number of pack files git has to pay
attention to: it becomes the sum of the numbers of pack files in all
the repositories processed so far.  A single object lookup can take
O(number of packs * log(number of objects in each pack)) time.  That
means performance is likely to suffer as the number of submodules
increases (n^2 performance) even on systems with a lot of memory.

Once the object store is part of the repository struct and freeable,
those problems go away and this patch becomes a no-brainer.

What should happen until then?  Should this go in "next" so we can get
experience with it but with care not to let it graduate to "master"?

Aside from those two concerns, this patch looks very good from a quick
skim, though I haven't reviewed it closely line-by-line.  Once we know
how to go forward, I'm happy to look at it again.

Thanks,
Jonathan


Re: [PATCH 2/3] setup: have the_repository use the_index

2017-07-11 Thread Junio C Hamano
Brandon Williams  writes:

> Have the index state which is stored in 'the_repository' be a pointer to
> the in-core instance 'the_index'.  This makes it easier to begin
> transitioning more parts of the code base to operate on a 'struct
> repository'.
>
> Signed-off-by: Brandon Williams 
> ---
>  setup.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/setup.c b/setup.c
> index 860507e1f..b370bf3c1 100644
> --- a/setup.c
> +++ b/setup.c
> @@ -1123,6 +1123,7 @@ const char *setup_git_directory_gently(int *nongit_ok)
>   setup_git_env();
>   }
>   }
> + the_repository->index = &the_index;
>  
>   strbuf_release();
>   strbuf_release();

I would have expected this to be going in the different direction,
i.e. there is an embedded instance of index_state in a repository
object, and the_repository.index is defined to be the old the_index,
i.e.

#define the_index (the_repository.index)

When a Git command recurses into submodules in-core, using
the_repository to represent the top-level superproject and
another repository object that represents a submodule, don't we want
the repository object for the submodule to also have its own default
index, without having to allocate one and point at it with the index
field?

I dunno.  Being able to leave the index field NULL lets you say
"this is a bare repository and there is no place for the index file
for it", but even if we never write out the in-core index to an
index file on disk, being able to populate the in-core index that is
default for the repository object from a tree-ish and iterating over
it (e.g.  running in-core merge of trees) is a useful thing to do,
so I do not consider "index field can be set to NULL to signify a
bare repository" a very strong plus.



Re: [PATCH 2/3] setup: have the_repository use the_index

2017-07-11 Thread Stefan Beller
On Tue, Jul 11, 2017 at 5:00 PM, Jonathan Nieder  wrote:

>  /* The main repository */
> -static struct repository the_repo;
> +static struct repository the_repo = { .index = &the_index };

https://public-inbox.org/git/20170710070342.txmlwwq6gvjkw...@sigill.intra.peff.net/
specifically said we'd not use all the features today,
but rather keep the test balloon up in the air long enough?
(So this is just a critique of the syntax; I agree with the content.)


Re: [PATCH 3/3] grep: recurse in-process using 'struct repository'

2017-07-11 Thread Stefan Beller
On Tue, Jul 11, 2017 at 3:04 PM, Brandon Williams  wrote:

> +   if (repo_submodule_init(&submodule, superproject, path))
> +   return 0;

What happens if we go through the "return 0" path -- do we rather
want to print an error?

> +   /* add objects to alternates */
> +   add_to_alternates_memory(submodule.objectdir);

Not trying to make my object series more important than it is... but
we really don't want to spread this add_to_alternates_memory hack. :/

I agree with Jacob that a patch with such a diffstat is a joy to review. :)

Thanks,
Stefan


Re: [PATCH 2/3] setup: have the_repository use the_index

2017-07-11 Thread Jonathan Nieder
Hi,

Brandon Williams wrote:

> Have the index state which is stored in 'the_repository' be a pointer to
> the in-core instance 'the_index'.  This makes it easier to begin
> transitioning more parts of the code base to operate on a 'struct
> repository'.
>
> Signed-off-by: Brandon Williams 
> ---
>  setup.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/setup.c b/setup.c
> index 860507e1f..b370bf3c1 100644
> --- a/setup.c
> +++ b/setup.c
> @@ -1123,6 +1123,7 @@ const char *setup_git_directory_gently(int *nongit_ok)
>   setup_git_env();
>   }
>   }
> + the_repository->index = &the_index;

I wonder if this can be done sooner.  For example, does the following
work?  This way, 'the_repository->index == _index' would be an
invariant that always holds, even in the early setup stage before
setup_git_directory_gently has run completely.

Thanks,
Jonathan

diff --git i/repository.c w/repository.c
index edca907404..bdc1f93282 100644
--- i/repository.c
+++ w/repository.c
@@ -4,7 +4,7 @@
 #include "submodule-config.h"
 
 /* The main repository */
-static struct repository the_repo;
+static struct repository the_repo = { .index = &the_index };
 struct repository *the_repository = &the_repo;
 
 static char *git_path_from_env(const char *envvar, const char *git_dir,


Re: [PATCH 1/3] repo_read_index: don't discard the index

2017-07-11 Thread Stefan Beller
On Tue, Jul 11, 2017 at 3:04 PM, Brandon Williams  wrote:
> Have 'repo_read_index()' behave more like the other read_index family of
> functions and don't discard the index if it has already been populated.

instead rely on the quick return of read_index_from which has

/* istate->initialized covers both .git/index and .git/sharedindex.xxx */
if (istate->initialized)
return istate->cache_nr;

such that we do not have memory leaks or other issues. Currently
we do not have a lot of callers, so we can change the contract
of the 'repo_read_index' function easily. However, going through all
the callers and then looking at the implementation may hint at a
desire to have repo_read_index documented in repository.h.
(There is a hint in struct repository that its index field can be initialized
using repo_read_index, but what does repo_read_index actually do?)


Re: [PATCH 1/3] repo_read_index: don't discard the index

2017-07-11 Thread Jonathan Nieder
Brandon Williams wrote:

> Have 'repo_read_index()' behave more like the other read_index family of
> functions and don't discard the index if it has already been populated.
>
> Signed-off-by: Brandon Williams 
> ---
>  repository.c | 2 --
>  1 file changed, 2 deletions(-)

How did you discover this?  E.g. was it from code inspection or does
this make the function more convenient to use for some kinds of callers?

Reviewed-by: Jonathan Nieder 


[PATCH] RFC: Introduce '.gitorderfile'

2017-07-11 Thread Stefan Beller
Conceptually the file order as set with command line -O or via the config
'diff.orderFile' is interesting to both the author (when I run a quick git
 diff locally) as well as reviewer (a patch floating on the mailing list),
so it is not just the author who should be responsible for getting their
config in order, but a project would benefit when they could give a good
default for such an order.

While the change in this RFC patch to diff.c may look uncontroversial,
(Oh look! it's just another knob we can turn!), the change to the
newly introduced '.gitorderfile' may be more controversial. Here is my
rationale for proposing it:

  I want to force myself to think about the design before pointing out
  memory leaks and coding style, so the least I would wish for is:
*.h
*.c
  but as we have more to look at, I would want to have the most abstract
  thing to come first. And most abstract from the actual code is the
  user interaction, the documentation.  I heard the claim that the git
  project deliberately names the directory 'Documentation/' with a capital
  D such that we had this property by default already. With a patch like
  this we could rename Documentation/ to docs and still enjoy reading the
  docs first.
  Given this alibi, I would claim that t/ is misnamed though! I personally
  would prefer to review tests just after the documentation instead of
  after the code, as the tests are more abstract and encode promises to the
  user, unlike the code itself, which is the ground truth at the end of the day.

Signed-off-by: Stefan Beller 
---

I wrote:
> offtopic: As a general thing for our patches, can we configure
> (or even convince Git in general), that headers ought to be sent *before*
> its accompanying source? I think that would help reviewers like me, who
> tend to start reading linearly and then giving random thoughts, because the
> header prepares the reviewer for the source code with expectations. Also
> by having it the other way around, the review first focuses on design
> (Is this function signature sane; the docs said it would do X while not
> doing Y, is that sane?) instead of code.

and hence I came up with this patch, as I think we would want to expose
such a good feature ('diff.orderFile') even for those who are not looking
for it themselves.

Thanks,
Stefan


 .gitorderfile |  6 ++
 diff.c| 11 +++
 2 files changed, 17 insertions(+)
 create mode 100644 .gitorderfile

diff --git a/.gitorderfile b/.gitorderfile
new file mode 100644
index 00..5131ede927
--- /dev/null
+++ b/.gitorderfile
@@ -0,0 +1,6 @@
+Documentation/*
+t/*
+*.sh
+*.h
+*.c
+Makefile
diff --git a/diff.c b/diff.c
index 00b4c86698..8d537db06a 100644
--- a/diff.c
+++ b/diff.c
@@ -3398,6 +3398,17 @@ void diff_setup(struct diff_options *options)
if (diff_indent_heuristic)
DIFF_XDL_SET(options, INDENT_HEURISTIC);
 
+   if (!diff_order_file_cfg) {
+   struct stat st;
+   int c = lstat(".gitorderfile", &st);
+   if (c == 0 && S_ISREG(st.st_mode))
+   diff_order_file_cfg = ".gitorderfile";
+   else if (c < 0 && errno == ENOENT)
+   ; /* File does not exist. no preset. */
+   else
+   die_errno("stat '.gitorderfile'");
+   }
+
options->orderfile = diff_order_file_cfg;
 
if (diff_no_prefix) {
-- 
2.13.2.695.g117ddefdb4
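As a usage sketch of the existing `-O`/`diff.orderFile` machinery this patch builds on: an order file reorders which files appear first in diff output (throwaway repository; file names are illustrative; assumes git is on PATH):

```shell
# Demo: an order file makes headers show up before sources in diff output.
# All names here are illustrative.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q repo
cd repo
git config user.email demo@example.com
git config user.name demo
printf 'int x;\n' > a.c
printf 'extern int x;\n' > a.h
git add a.c a.h
git commit -qm 'base'
echo '/* c change */' >> a.c
echo '/* h change */' >> a.h
# One glob per line; earlier lines sort earlier in the diff.
printf '*.h\n*.c\n' > orderfile
git diff --name-only               # default order: a.c before a.h
git diff -O orderfile --name-only  # with the order file: a.h first
```

The proposed '.gitorderfile' would simply make such a file the repository-wide default, so reviewers reading `git diff` or `git format-patch` output see headers (or docs, or tests) before the implementation.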



[PATCH] git-p4: parse marshal output "p4 -G" in p4 changes

2017-07-11 Thread Miguel Torroja
The option -G of p4 (python marshal output) gives more context about the
data being output. That's useful when using the command "change -o" as
we can distinguish between warning/error line and real change description.

Some p4 triggers in the server side generate some warnings when
executed. Unfortunately those messages are mixed with the output of
"p4 change -o". Those extra warning lines are reported as {'code':'info'}
in python marshal output (-G). The real change output is reported as
{'code':'stat'}

The function p4CmdList accepts a new argument: skip_info. When set to
True it ignores any 'code':'info' entry (skip_info=True by default).

A new test has been created in t9807-git-p4-submit.sh adding a p4 trigger
that outputs extra lines with "p4 change -o" and "p4 changes".

Signed-off-by: Miguel Torroja 
---
 git-p4.py| 92 
 t/t9807-git-p4-submit.sh | 30 
 2 files changed, 92 insertions(+), 30 deletions(-)

diff --git a/git-p4.py b/git-p4.py
index 8d151da91..1facf32db 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -509,7 +509,7 @@ def isModeExec(mode):
 def isModeExecChanged(src_mode, dst_mode):
 return isModeExec(src_mode) != isModeExec(dst_mode)
 
-def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None):
+def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=True):
 
 if isinstance(cmd,basestring):
 cmd = "-G " + cmd
@@ -545,6 +545,9 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None):
 try:
 while True:
 entry = marshal.load(p4.stdout)
+if skip_info:
+if 'code' in entry and entry['code'] == 'info':
+continue
 if cb is not None:
 cb(entry)
 else:
@@ -879,8 +882,12 @@ def p4ChangesForPaths(depotPaths, changeRange, requestedBlockSize):
 cmd += ["%s...@%s" % (p, revisionRange)]
 
 # Insert changes in chronological order
-for line in reversed(p4_read_pipe_lines(cmd)):
-changes.add(int(line.split(" ")[1]))
+for entry in reversed(p4CmdList(cmd)):
+if entry.has_key('p4ExitCode'):
+die('Error retrieving changes descriptions ({})'.format(entry['p4ExitCode']))
+if not entry.has_key('change'):
+continue
+changes.add(int(entry['change']))
 
 if not block_size:
 break
@@ -1494,7 +1501,7 @@ class P4Submit(Command, P4UserMap):
 c['User'] = newUser
 input = marshal.dumps(c)
 
-result = p4CmdList("change -f -i", stdin=input)
+result = p4CmdList("change -f -i", stdin=input,skip_info=False)
 for r in result:
 if r.has_key('code'):
 if r['code'] == 'error':
@@ -1526,37 +1533,62 @@ class P4Submit(Command, P4UserMap):
 
 [upstream, settings] = findUpstreamBranchPoint()
 
-template = ""
+template = """\
+# A Perforce Change Specification.
+#
+#  Change:  The change number. 'new' on a new changelist.
+#  Date:The date this specification was last modified.
+#  Client:  The client on which the changelist was created.  Read-only.
+#  User:The user who created the changelist.
+#  Status:  Either 'pending' or 'submitted'. Read-only.
+#  Type:Either 'public' or 'restricted'. Default is 'public'.
+#  Description: Comments about the changelist.  Required.
+#  Jobs:What opened jobs are to be closed by this changelist.
+#   You may delete jobs from this list.  (New changelists only.)
+#  Files:   What opened files from the default changelist are to be added
+#   to this changelist.  You may delete files from this list.
+#   (New changelists only.)
+"""
+files_list = []
 inFilesSection = False
+change_entry = None
 args = ['change', '-o']
 if changelist:
 args.append(str(changelist))
-
-for line in p4_read_pipe_lines(args):
-if line.endswith("\r\n"):
-line = line[:-2] + "\n"
-if inFilesSection:
-if line.startswith("\t"):
-# path starts and ends with a tab
-path = line[1:]
-lastTab = path.rfind("\t")
-if lastTab != -1:
-path = path[:lastTab]
-if settings.has_key('depot-paths'):
-if not [p for p in settings['depot-paths']
-if p4PathStartsWith(path, p)]:
-continue
-else:
-if not p4PathStartsWith(path, self.depotPath):
-continue
+for entry in p4CmdList(args):
+if not entry.has_key('code'):
+continue
+if entry['code'] 

Re: "groups of files" in Git?

2017-07-11 Thread Igor Djordjevic
For starters, let me say that I consider myself a mere advanced 
beginner Git user, and I haven`t used Perforce ever before (did some 
reading now), but still, for what it`s worth, here are my thoughts, 
please bear with me :)

Do feel free to correct me if I miss something.

On 11/07/2017 19:39, Lars Schneider wrote:
> 
>> On 11 Jul 2017, at 17:45, Nikolay Shustov  wrote:
>>
>> Hi,
>> I have been recently struggling with migrating my development workflow
>> from Perforce to Git, all because of the following thing:
>>
>> I have to work on several features in the same code tree parallel, in
>> the same Perforce workspace. The major reason why I cannot work on one
>> feature then on another is just because I have to make sure that the
>> changes in the related areas of the product play together well.
>>
>> With Perforce, I can have multiple changelists opened, that group the
>> changed files as needed.
>>
>> With Git I cannot seem to finding the possibility to figure out how to
>> achieve the same result. And the problem is that putting change sets
>> on different Git branches (or workdirs, or whatever Git offers that
>> makes the changes to be NOT in the same source tree) is not a viable
>> option from me as I would have to re-build code as I re-integrate the
>> changes between the branches (or whatever changes separation Git
>> feature is used).
>> Build takes time and resources and considering that I have to do it on
>> multiple platforms (I do cross-platform development) it really
>> eliminates the option of not having multiple changes in the same code
>> tree.
>>
>> Am I ignorant about some Git feature/way of using Git that would help?
>> Is it worth considering adding to Git a feature like "group of files"
>> that would offer some virtutal grouping of the locally changed files
>> in the checked-out branch?
> 
> Interesting question that came up at my workplace, too.
> 
> Here is what I suggested:
> 1. Keep working on a single branch and make commits for all features
> 2. If you make a commit, prefix the commit message with the feature name
> 3. After you are done with a feature create a new feature branch based on
>your combined feature branch. Use `git rebase -i` [1] to remove all
>commits that are not relevant for the feature. Alternatively you could
>cherry pick the relevant commits [2] if this is faster.
> 
> I wonder what others think about this solution. Maybe there is a better
> solution that I overlooked?
> 
> - Lars
> 
> [1] 
> https://robots.thoughtbot.com/git-interactive-rebase-squash-amend-rewriting-history
> [2] 
> http://think-like-a-git.net/sections/rebase-from-the-ground-up/cherry-picking-explained.html

This "single branch, related commits" approach is exactly what came 
to my mind as well.

But, isn`t Perforce "changelist" an "atomic" group of changes - like 
"commit" in Git, "changeset" in Team Foundation Version Control, 
etc...?

If so, it would mean that this "multiple pending changelists" flow 
would/should be translated to "multiple pending commits" in Git, 
where in the end _a single_ Perforce "changelist" is _a single_ Git 
"commit".

Might be this is where the confusion is coming from, trying to fit 
natural Git "multiple commits per feature" (but "single feature per 
branch") concept into a "single changelist per feature" Perforce 
concept, described/required here?

I don`t think there is a firm concept of such "multiple pending 
commits" in Git, but the point is that the author has multiple 
changelists/commits and actively updates them (the _same ones_) 
until they`re ready to be merged (submitted, in Perforce)... which 
you can do in Git, and might be quite easily :)

So... In Git, you can create a separate commit for each changelist 
you have in Perforce (all these commits being on the same branch, as 
desired). Then, when you would have wanted to "update" the pending 
Perforce changelist (not sure what the corresponding command is in 
Perforce), you would just `git commit` your current state with 
additional "--squash" or "--fixup" parameters (depending on if you 
would like to add more description to existing/original commit 
message, or not), and the original commit SHA1.

In the end, when everything is tested together and you would like to 
commit features separately (like submitting changelists in Perforce), 
you would just need to `git rebase -i --autosquash` your branch, 
where Git would squash all your "update" commits (fixup/squash ones, 
that is) to the original/initial ones you made as per your 
changelists/features. No need for manual rearranging, cherry-picking, 
or whatever.

An example flow, with two "changelists" for two features (I`ll be 
using capital letters A, B, C... instead of commit SHA1, for 
simplicity):

... do some "Feature 1" work...
$ git commit -m "Feature 1"
... do some "Feature 2" work...
$ git commit -m "Feature 2"
... do some "Feature 1" work...
   
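The fixup/autosquash flow described above can be sketched end-to-end as follows (throwaway repository; commit subjects are illustrative; `GIT_SEQUENCE_EDITOR=:` makes the interactive rebase accept the generated todo list so it runs non-interactively):

```shell
# Demo: per-feature "pending commits" updated with --fixup, then folded
# together with rebase --autosquash. All names here are illustrative.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q repo
cd repo
git config user.email demo@example.com
git config user.name demo
echo base > base.txt
git add base.txt
git commit -qm 'base'
echo one > f1.txt
git add f1.txt
git commit -qm 'Feature 1'
echo two > f2.txt
git add f2.txt
git commit -qm 'Feature 2'
echo more >> f1.txt               # more "Feature 1" work
git add f1.txt
git commit -q --fixup HEAD~1      # records "fixup! Feature 1"
# Fold each fixup commit into its target, non-interactively:
GIT_SEQUENCE_EDITOR=: git rebase -i -q --autosquash HEAD~3
git log --format=%s               # Feature 2, Feature 1, base
```

After the rebase, the branch again has exactly one commit per feature, which can then be submitted (or cherry-picked to separate branches) individually, much like submitting one Perforce changelist per feature.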

Re: [PATCH 3/3] grep: recurse in-process using 'struct repository'

2017-07-11 Thread Jacob Keller
On Tue, Jul 11, 2017 at 3:04 PM, Brandon Williams  wrote:
> Convert grep to use 'struct repository' which enables recursing into
> submodules to be handled in-process.
>
> Signed-off-by: Brandon Williams 
> ---
>  Documentation/git-grep.txt |   7 -
>  builtin/grep.c | 390 +
>  cache.h|   1 -
>  git.c  |   2 +-
>  grep.c |  13 --
>  grep.h |   1 -
>  setup.c|  12 +-
>  7 files changed, 81 insertions(+), 345 deletions(-)
>

No real indepth comments here, but it's nice to see how much code
reduction this has enabled!

Thanks,
Jake

> diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
> index 5033483db..720c7850e 100644
> --- a/Documentation/git-grep.txt
> +++ b/Documentation/git-grep.txt
> @@ -95,13 +95,6 @@ OPTIONS
>  option the prefix of all submodule output will be the name of
> the parent project's  object.
>
> ---parent-basename ::
> -   For internal use only.  In order to produce uniform output with the
> -   --recurse-submodules option, this option can be used to provide the
> -   basename of a parent's  object to a submodule so the submodule
> -   can prefix its output with the parent's name rather than the SHA1 of
> -   the submodule.
> -
>  -a::
>  --text::
> Process binary files as if they were text.
> diff --git a/builtin/grep.c b/builtin/grep.c
> index fa351c49f..0b0a8459e 100644
> --- a/builtin/grep.c
> +++ b/builtin/grep.c
> @@ -28,13 +28,7 @@ static char const * const grep_usage[] = {
> NULL
>  };
>
> -static const char *super_prefix;
>  static int recurse_submodules;
> -static struct argv_array submodule_options = ARGV_ARRAY_INIT;
> -static const char *parent_basename;
> -
> -static int grep_submodule_launch(struct grep_opt *opt,
> -const struct grep_source *gs);
>
>  #define GREP_NUM_THREADS_DEFAULT 8
>  static int num_threads;
> @@ -186,10 +180,7 @@ static void *run(void *arg)
> break;
>
> opt->output_priv = w;
> -   if (w->source.type == GREP_SOURCE_SUBMODULE)
> -   hit |= grep_submodule_launch(opt, &w->source);
> -   else
> -   hit |= grep_source(opt, >source);
> +   hit |= grep_source(opt, &w->source);
> grep_source_clear_data(&w->source);
> work_done(w);
> }
> @@ -327,21 +318,13 @@ static int grep_oid(struct grep_opt *opt, const struct object_id *oid,
>  {
> struct strbuf pathbuf = STRBUF_INIT;
>
> -   if (super_prefix) {
> -   strbuf_add(&pathbuf, filename, tree_name_len);
> -   strbuf_addstr(&pathbuf, super_prefix);
> -   strbuf_addstr(&pathbuf, filename + tree_name_len);
> +   if (opt->relative && opt->prefix_length) {
> +   quote_path_relative(filename + tree_name_len, opt->prefix, &pathbuf);
> +   strbuf_insert(&pathbuf, 0, filename, tree_name_len);
> } else {
> strbuf_addstr(&pathbuf, filename);
> }
>
> -   if (opt->relative && opt->prefix_length) {
> -   char *name = strbuf_detach(&pathbuf, NULL);
> -   quote_path_relative(name + tree_name_len, opt->prefix, &pathbuf);
> -   strbuf_insert(&pathbuf, 0, name, tree_name_len);
> -   free(name);
> -   }
> -
>  #ifndef NO_PTHREADS
> if (num_threads) {
> add_work(opt, GREP_SOURCE_OID, pathbuf.buf, path, oid);
> @@ -366,14 +349,10 @@ static int grep_file(struct grep_opt *opt, const char *filename)
>  {
> struct strbuf buf = STRBUF_INIT;
>
> -   if (super_prefix)
> -   strbuf_addstr(&buf, super_prefix);
> -   strbuf_addstr(&buf, filename);
> -
> if (opt->relative && opt->prefix_length) {
> -   char *name = strbuf_detach(&buf, NULL);
> -   quote_path_relative(name, opt->prefix, &buf);
> -   free(name);
> +   quote_path_relative(filename, opt->prefix, &buf);
> +   } else {
> +   strbuf_addstr(&buf, filename);
> }
>
>  #ifndef NO_PTHREADS
> @@ -421,284 +400,80 @@ static void run_pager(struct grep_opt *opt, const char *prefix)
> exit(status);
>  }
>
> -static void compile_submodule_options(const struct grep_opt *opt,
> - const char **argv,
> - int cached, int untracked,
> - int opt_exclude, int use_index,
> - int pattern_type_arg)
> -{
> -   struct grep_pat *pattern;
> -
> -   if (recurse_submodules)
> -   argv_array_push(&submodule_options, "--recurse-submodules");
> -
> -   if (cached)
> -   argv_array_push(&submodule_options, "--cached");
> -   if (!use_index)
> -   argv_array_push(&submodule_options, 

Re: [RFC PATCH 3/3] sha1_file: add promised blob hook support

2017-07-11 Thread Stefan Beller
On Tue, Jul 11, 2017 at 12:48 PM, Jonathan Tan  wrote:
> Teach sha1_file to invoke a hook whenever a blob is requested and
> unavailable but is promised. The hook is a shell command that can be
> configured through "git config"; this hook takes in a list of hashes and
> writes (if successful) the corresponding objects to the repo's local
> storage.
>
> This is meant as a temporary measure to ensure that all Git commands
> work in such a situation. Future patches will update some commands to
> either tolerate promised blobs (without invoking the hook) or be more
> efficient in invoking the promised blob hook.
>
> In order to determine the code changes in sha1_file.c necessary, I
> investigated the following:
>  (1) functions in sha1_file that take in a hash, without the user
>  regarding how the object is stored (loose or packed)
>  (2) functions in sha1_file that operate on packed objects (because I
>  need to check callers that know about the loose/packed distinction
>  and operate on both differently, and ensure that they can handle
>  the concept of objects that are neither loose nor packed)
>
> (1) is handled by the modification to sha1_object_info_extended().
>
> For (2), I looked at for_each_packed_object and at the packed-related
> functions that take in a hash. For for_each_packed_object, the callers
> either already work or are fixed in this patch:
>  - reachable - only to find recent objects
>  - builtin/fsck - already knows about promised blobs
>  - builtin/cat-file - fixed in this commit
>
> Callers of the other functions do not need to be changed:
>  - parse_pack_index
>- http - indirectly from http_get_info_packs
>  - find_pack_entry_one
>- this searches a single pack that is provided as an argument; the
>  caller already knows (through other means) that the sought object
>  is in a specific pack
>  - find_sha1_pack
>- fast-import - appears to be an optimization to not store a
>  file if it is already in a pack
>- http-walker - to search through a struct alt_base
>- http-push - to search through remote packs
>  - has_sha1_pack
>- builtin/fsck - already knows about promised blobs
>- builtin/count-objects - informational purposes only (check if loose
>  object is also packed)
>- builtin/prune-packed - check if object to be pruned is packed (if
>  not, don't prune it)
>- revision - used to exclude packed objects if requested by user
>- diff - just for optimization
>
> An alternative design that I considered but rejected:
>
>  - Adding a hook whenever a packed blob is requested, not on any blob.
>That is, whenever we attempt to search the packfiles for a blob, if
>it is missing (from the packfiles and from the loose object storage),
>to invoke the hook (which must then store it as a packfile), open the
>packfile the hook generated, and report that the blob is found in
>that new packfile. This reduces the amount of analysis needed (in
>that we only need to look at how packed blobs are handled), but
>requires that the hook generate packfiles (or for sha1_file to pack
>whatever loose objects are generated), creating one packfile for each
>missing blob and potentially very many packfiles that must be
>linearly searched. This may be tolerable now for repos that only have
>a few missing blobs (for example, repos that only want to exclude
>large blobs), and might be tolerable in the future if we have
>batching support for the most commonly used commands, but is not
>tolerable now for repos that exclude a large amount of blobs.
>
> Signed-off-by: Jonathan Tan 

>
> +core.promisedBlobCommand::
> +   If set, whenever a blob in the local repo is attempted to be read, but
> +   is both missing and a promised blob, invoke this shell command to
> +   generate or obtain that blob before reporting an error. This shell
> +   command should take one or more hashes, each terminated by a newline,
> +   as standard input, and (if successful) should write the corresponding
> +   objects to the local repo (packed or loose).

Any git command invoked here behaves differently as
GIT_IGNORE_PROMISED_BLOBS is set.
GIT_IGNORE_PROMISED_BLOBS needs documentation.
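The hook contract described in the quoted documentation is small enough to sketch. Below is a hypothetical hook skeleton in Python — the transport (fetch_blob) is a placeholder, and the use of "git hash-object -w --stdin" to store the result loose is one valid way to satisfy "packed or loose"; none of the names here are prescribed by the patch:

```python
"""Sketch of a core.promisedBlobCommand hook (hypothetical wiring).

Protocol from the quoted documentation: the hook reads one or more
hashes, each terminated by a newline, on standard input, and must
write the corresponding objects into the local repo.  How the blob
bytes are obtained is entirely up to the hook; fetch_blob below is
a placeholder for that transport.
"""
import subprocess


def read_requested_hashes(stream):
    """Parse the hook's stdin: one hash per line, newline-terminated."""
    return [line.strip() for line in stream if line.strip()]


def store_blob(data):
    """Write a blob into the local repo; `git hash-object -w --stdin`
    stores it loose and prints the resulting object name."""
    out = subprocess.run(["git", "hash-object", "-w", "--stdin"],
                         input=data, capture_output=True, check=True)
    return out.stdout.strip().decode()


def fetch_blob(sha1):
    """Placeholder: obtain blob bytes from wherever the promised
    blobs actually live (HTTP, shared filesystem, ...)."""
    raise NotImplementedError
```

A real hook would loop `for sha1 in read_requested_hashes(sys.stdin): store_blob(fetch_blob(sha1))` and exit non-zero on failure.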


> +int request_promised_blobs(const struct oid_array *blobs)

> +   if (i == blobs->nr)
> +   /* Nothing to fetch */

s/to fetch/to get/ ? Might be too nitpicky.

So this function takes an array of blobs and obtains any that
are promised.

> +
> +   argv[0] = promised_blob_command;
> +   cp.argv = argv;
> +   cp.env = env;

I had the impression that we'd favor .args and .env_array
nowadays, such that we'd not need to construct the NULL
terminated array ourselves. (e.g. whenever we'd change argv,
we'd need to make sure it contains enough NULLs at the end)

argv_array_push(&cp.args, promised_blob_command);
argv_array_push(&cp.env_array, ...);

Re: [PATCH] git-p4: parse marshal output "p4 -G" in p4 changes

2017-07-11 Thread Miguel Torroja
Hi Luke,


My bad, as I didn't check that case.
It was p4CmdList, as you said. The default value of the new argument
skip_info (set to True) ignores any info messages, while the script is
waiting for a valid message.
If I set it to False, then it does return an info entry and the submit
change is accepted.

I'm sending another patch update.
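The skip_info behaviour being discussed is easy to model outside git-p4. A small sketch (not the git-p4 code itself; names are illustrative) of the filtering, and of the change-number collection the patch performs on "p4 -G changes" output:

```python
def skip_info_entries(entries, skip_info=True):
    """Mimic git-p4's p4CmdList filtering: with skip_info=True,
    dictionaries marked {'code': 'info'} (p4 trigger chatter) are
    dropped; everything else, e.g. {'code': 'stat'} change
    descriptions, passes through unchanged."""
    for entry in entries:
        if skip_info and entry.get('code') == 'info':
            continue
        yield entry


def change_numbers(entries):
    """Collect change numbers as the patched p4ChangesForPaths does,
    ignoring entries that carry no 'change' key."""
    return [int(e['change']) for e in skip_info_entries(entries)
            if 'change' in e]
```

With a trigger emitting warnings, the 'info' entries disappear and only the real change records are counted.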

On Tue, Jul 11, 2017 at 10:35 AM, Luke Diamand  wrote:
> On 3 July 2017 at 23:57, Miguel Torroja  wrote:
>> The option -G of p4 (python marshal output) gives more context about the
>> data being output. That's useful when using the command "change -o" as
>> we can distinguish between warning/error line and real change description.
>>
>> Some p4 triggers in the server side generate some warnings when
>> executed. Unfortunately those messages are mixed with the output of
>> "p4 change -o". Those extra warning lines are reported as {'code':'info'}
>> in python marshal output (-G). The real change output is reported as
>> {'code':'stat'}
>>
>> the function p4CmdList accepts a new argument: skip_info. When set to
>> True it ignores any 'code':'info' entry (skip_info=True by default).
>>
>> A new test has been created to t9807-git-p4-submit.sh adding a p4 trigger
>> that outputs extra lines with "p4 change -o" and "p4 changes"
>
> The latest version of mt/p4-parse-G-output (09521c7a0) seems to break
> t9813-git-p4-preserve-users.sh.
>
> I don't quite know why, but I wonder if it's the change to p4CmdList() ?
>
> Luke
>
>>
>> Signed-off-by: Miguel Torroja 
>> ---
>>  git-p4.py| 90 
>> 
>>  t/t9807-git-p4-submit.sh | 30 
>>  2 files changed, 91 insertions(+), 29 deletions(-)
>>
>> diff --git a/git-p4.py b/git-p4.py
>> index 8d151da91..a262e3253 100755
>> --- a/git-p4.py
>> +++ b/git-p4.py
>> @@ -509,7 +509,7 @@ def isModeExec(mode):
>>  def isModeExecChanged(src_mode, dst_mode):
>>  return isModeExec(src_mode) != isModeExec(dst_mode)
>>
>> -def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None):
>> +def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=True):
>>
>>  if isinstance(cmd,basestring):
>>  cmd = "-G " + cmd
>> @@ -545,6 +545,9 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', 
>> cb=None):
>>  try:
>>  while True:
>>  entry = marshal.load(p4.stdout)
>> +if skip_info:
>> +if 'code' in entry and entry['code'] == 'info':
>> +continue
>>  if cb is not None:
>>  cb(entry)
>>  else:
>> @@ -879,8 +882,12 @@ def p4ChangesForPaths(depotPaths, changeRange, 
>> requestedBlockSize):
>>  cmd += ["%s...@%s" % (p, revisionRange)]
>>
>>  # Insert changes in chronological order
>> -for line in reversed(p4_read_pipe_lines(cmd)):
>> -changes.add(int(line.split(" ")[1]))
>> +for entry in reversed(p4CmdList(cmd)):
>> +if entry.has_key('p4ExitCode'):
>> +die('Error retrieving changes descriptions 
>> ({})'.format(entry['p4ExitCode']))
>> +if not entry.has_key('change'):
>> +continue
>> +changes.add(int(entry['change']))
>>
>>  if not block_size:
>>  break
>> @@ -1526,37 +1533,62 @@ class P4Submit(Command, P4UserMap):
>>
>>  [upstream, settings] = findUpstreamBranchPoint()
>>
>> -template = ""
>> +template = """\
>> +# A Perforce Change Specification.
>> +#
>> +#  Change:  The change number. 'new' on a new changelist.
>> +#  Date:The date this specification was last modified.
>> +#  Client:  The client on which the changelist was created.  Read-only.
>> +#  User:The user who created the changelist.
>> +#  Status:  Either 'pending' or 'submitted'. Read-only.
>> +#  Type:Either 'public' or 'restricted'. Default is 'public'.
>> +#  Description: Comments about the changelist.  Required.
>> +#  Jobs:What opened jobs are to be closed by this changelist.
>> +#   You may delete jobs from this list.  (New changelists only.)
>> +#  Files:   What opened files from the default changelist are to be 
>> added
>> +#   to this changelist.  You may delete files from this list.
>> +#   (New changelists only.)
>> +"""
>> +files_list = []
>>  inFilesSection = False
>> +change_entry = None
>>  args = ['change', '-o']
>>  if changelist:
>>  args.append(str(changelist))
>> -
>> -for line in p4_read_pipe_lines(args):
>> -if line.endswith("\r\n"):
>> -line = line[:-2] + "\n"
>> -if inFilesSection:
>> -if line.startswith("\t"):
>> -# path starts and ends with a tab
>> -path = line[1:]
>> -lastTab = path.rfind("\t")
>> - 

Re: "groups of files" in Git?

2017-07-11 Thread astian
Nikolay Shustov wrote:
> With Perforce, I can have multiple changelists opened, that group the
> changed files as needed.
>
> With Git I cannot seem to finding the possibility to figure out how to
> achieve the same result. And the problem is that putting change sets
> on different Git branches (or workdirs, or whatever Git offers that
> makes the changes to be NOT in the same source tree) is not a viable
> option from me as I would have to re-build code as I re-integrate the
> changes between the branches (or whatever changes separation Git
> feature is used).
> Build takes time and resources and considering that I have to do it on
> multiple platforms (I do cross-platform development) it really
> denominates the option of not having multiple changes in the same code
> tree.
>
> Am I ignorant about some Git feature/way of using Git that would help?
> Is it worth considering adding to Git a feature like "group of files"
> that would offer some virtutal grouping of the locally changed files
> in the checked-out branch?

I never used Perforce and I'm not even sure I understand your problem,
but I thought I'd mention something that nobody else seems to have yet
(unless I missed it):

First, one thing that seems obvious to me from your description is that
these "parallel features" you work on are interdependent, so I would
rather consider the whole thing as a single feature.  It therefore makes
sense to me to work in a single "topic branch".

This doesn't preclude one from separating the changes in logically
sensible pieces.  Indeed this is par for the course in Git and people do
it all the time by dividing the bulk of changes into a carefully chosen
series of commits.

I think the most common way of doing this is to simply work on the whole
thing and, once you're happy with it, use "git rebase --interactive" to
"prettify" your history.

But, and here comes the part I think nobody mentioned yet, if your
feature work is considerably large or spans a considerably long time it
may be undesirable to postpone all that work until the very end (perhaps
by then you already forgot important information, or perhaps too many
changes have accumulated so reviewing them all becomes significantly
less efficient).  In that case, one solution is to use a "patch
management system" which will let you do that work incrementally (going
back and forth as needed).

If you know mercurial, this is "hg mq".  I don't think Git has any such
system built-in, but I know there are at least these external tools that
integrate with Git:
https://git.wiki.kernel.org/index.php/Interfaces,_frontends,_and_tools#Patch-management_Interface_layers

Feel free to ignore this if I totally misunderstood your use case.

Cheers.




Re: [RFC PATCH 2/3] sha1-array: support appending unsigned char hash

2017-07-11 Thread Stefan Beller
On Tue, Jul 11, 2017 at 12:48 PM, Jonathan Tan  wrote:
> In a subsequent patch, sha1_file will need to append object names in the
> form of "unsigned char *" to oid arrays. Teach sha1-array support for
> that.
>
> Signed-off-by: Jonathan Tan 

This breaks the oid/sha1 barrier?

I would have expected the caller to do a

  oid_array_append_oid(&array, sha1_to_oid(sha1));

with sha1_to_oid working off some static memory, such that the
call to oid_array_append_oid (which we have as oid_array_append)
is just as cheap?

Could you say more in the commit message on why we conflate
sha1s and oids here?

Thanks,
Stefan


[PATCH 1/3] repo_read_index: don't discard the index

2017-07-11 Thread Brandon Williams
Have 'repo_read_index()' behave more like the other read_index family of
functions and don't discard the index if it has already been populated.

Signed-off-by: Brandon Williams 
---
 repository.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/repository.c b/repository.c
index edca90740..8e60af1d5 100644
--- a/repository.c
+++ b/repository.c
@@ -235,8 +235,6 @@ int repo_read_index(struct repository *repo)
 {
if (!repo->index)
repo->index = xcalloc(1, sizeof(*repo->index));
-   else
-   discard_index(repo->index);
 
return read_index_from(repo->index, repo->index_file);
 }
-- 
2.13.2.932.g7449e964c-goog



[PATCH 3/3] grep: recurse in-process using 'struct repository'

2017-07-11 Thread Brandon Williams
Convert grep to use 'struct repository' which enables recursing into
submodules to be handled in-process.

Signed-off-by: Brandon Williams 
---
 Documentation/git-grep.txt |   7 -
 builtin/grep.c | 390 +
 cache.h|   1 -
 git.c  |   2 +-
 grep.c |  13 --
 grep.h |   1 -
 setup.c|  12 +-
 7 files changed, 81 insertions(+), 345 deletions(-)

diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 5033483db..720c7850e 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -95,13 +95,6 @@ OPTIONS
 option the prefix of all submodule output will be the name of
the parent project's  object.
 
---parent-basename ::
-   For internal use only.  In order to produce uniform output with the
-   --recurse-submodules option, this option can be used to provide the
-   basename of a parent's  object to a submodule so the submodule
-   can prefix its output with the parent's name rather than the SHA1 of
-   the submodule.
-
 -a::
 --text::
Process binary files as if they were text.
diff --git a/builtin/grep.c b/builtin/grep.c
index fa351c49f..0b0a8459e 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -28,13 +28,7 @@ static char const * const grep_usage[] = {
NULL
 };
 
-static const char *super_prefix;
 static int recurse_submodules;
-static struct argv_array submodule_options = ARGV_ARRAY_INIT;
-static const char *parent_basename;
-
-static int grep_submodule_launch(struct grep_opt *opt,
-const struct grep_source *gs);
 
 #define GREP_NUM_THREADS_DEFAULT 8
 static int num_threads;
@@ -186,10 +180,7 @@ static void *run(void *arg)
break;
 
opt->output_priv = w;
-   if (w->source.type == GREP_SOURCE_SUBMODULE)
-   hit |= grep_submodule_launch(opt, &w->source);
-   else
-   hit |= grep_source(opt, &w->source);
+   hit |= grep_source(opt, &w->source);
grep_source_clear_data(&w->source);
work_done(w);
}
@@ -327,21 +318,13 @@ static int grep_oid(struct grep_opt *opt, const struct 
object_id *oid,
 {
struct strbuf pathbuf = STRBUF_INIT;
 
-   if (super_prefix) {
-   strbuf_add(&pathbuf, filename, tree_name_len);
-   strbuf_addstr(&pathbuf, super_prefix);
-   strbuf_addstr(&pathbuf, filename + tree_name_len);
+   if (opt->relative && opt->prefix_length) {
+   quote_path_relative(filename + tree_name_len, opt->prefix, &pathbuf);
+   strbuf_insert(&pathbuf, 0, filename, tree_name_len);
} else {
strbuf_addstr(&pathbuf, filename);
}
 
-   if (opt->relative && opt->prefix_length) {
-   char *name = strbuf_detach(&pathbuf, NULL);
-   quote_path_relative(name + tree_name_len, opt->prefix, &pathbuf);
-   strbuf_insert(&pathbuf, 0, name, tree_name_len);
-   free(name);
-   }
-
 #ifndef NO_PTHREADS
if (num_threads) {
add_work(opt, GREP_SOURCE_OID, pathbuf.buf, path, oid);
@@ -366,14 +349,10 @@ static int grep_file(struct grep_opt *opt, const char 
*filename)
 {
struct strbuf buf = STRBUF_INIT;
 
-   if (super_prefix)
-   strbuf_addstr(&buf, super_prefix);
-   strbuf_addstr(&buf, filename);
-
if (opt->relative && opt->prefix_length) {
-   char *name = strbuf_detach(&buf, NULL);
-   quote_path_relative(name, opt->prefix, &buf);
-   free(name);
+   quote_path_relative(filename, opt->prefix, &buf);
+   } else {
+   strbuf_addstr(&buf, filename);
}
 
 #ifndef NO_PTHREADS
@@ -421,284 +400,80 @@ static void run_pager(struct grep_opt *opt, const char 
*prefix)
exit(status);
 }
 
-static void compile_submodule_options(const struct grep_opt *opt,
- const char **argv,
- int cached, int untracked,
- int opt_exclude, int use_index,
- int pattern_type_arg)
-{
-   struct grep_pat *pattern;
-
-   if (recurse_submodules)
-   argv_array_push(&submodule_options, "--recurse-submodules");
-
-   if (cached)
-   argv_array_push(&submodule_options, "--cached");
-   if (!use_index)
-   argv_array_push(&submodule_options, "--no-index");
-   if (untracked)
-   argv_array_push(&submodule_options, "--untracked");
-   if (opt_exclude > 0)
-   argv_array_push(&submodule_options, "--exclude-standard");
-
-   if (opt->invert)
-   argv_array_push(&submodule_options, "-v");
-   if (opt->ignore_case)
-   argv_array_push(&submodule_options, "-i");
-   if (opt->word_regexp)
-   argv_array_push(&submodule_options, "-w");
-   switch 

[PATCH 2/3] setup: have the_repository use the_index

2017-07-11 Thread Brandon Williams
Have the index state which is stored in 'the_repository' be a pointer to
the in-core index 'the_index'.  This makes it easier to begin
transitioning more parts of the code base to operate on a 'struct
repository'.

Signed-off-by: Brandon Williams 
---
 setup.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/setup.c b/setup.c
index 860507e1f..b370bf3c1 100644
--- a/setup.c
+++ b/setup.c
@@ -1123,6 +1123,7 @@ const char *setup_git_directory_gently(int *nongit_ok)
setup_git_env();
}
}
+   the_repository->index = _index;
 
strbuf_release();
strbuf_release();
-- 
2.13.2.932.g7449e964c-goog



[PATCH 0/3] Convert grep to recurse in-process

2017-07-11 Thread Brandon Williams
This series utilizes the new 'struct repository' in order to convert grep to be
able to recurse into submodules in-process, much like how ls-files was converted
to recurse in-process.  The result is a much smaller code footprint due to not
needing to compile an argv array of options to be used when launching a process
for operating on a submodule.

Brandon Williams (3):
  repo_read_index: don't discard the index
  setup: have the_repository use the_index
  grep: recurse in-process using 'struct repository'

 Documentation/git-grep.txt |   7 -
 builtin/grep.c | 390 +
 cache.h|   1 -
 git.c  |   2 +-
 grep.c |  13 --
 grep.h |   1 -
 repository.c   |   2 -
 setup.c|  13 +-
 8 files changed, 82 insertions(+), 347 deletions(-)

-- 
2.13.2.932.g7449e964c-goog



Re: [RFC PATCH 1/3] promised-blob, fsck: introduce promised blobs

2017-07-11 Thread Stefan Beller
On Tue, Jul 11, 2017 at 12:48 PM, Jonathan Tan  wrote:
> Currently, Git does not support repos with very large numbers of blobs
> or repos that wish to minimize manipulation of certain blobs (for
> example, because they are very large) very well, even if the user
> operates mostly on part of the repo, because Git is designed on the
> assumption that every blob referenced by a tree object is available
> somewhere in the repo storage.
>
> As a first step to reducing this problem, introduce the concept of
> promised blobs. Each Git repo can contain a list of promised blobs and
> their sizes at $GIT_DIR/objects/promisedblob. This patch contains
> functions to query them; functions for creating and modifying that file
> will be introduced in later patches.
>
> A repository that is missing a blob but has that blob promised is not
> considered to be in error, so also teach fsck this.

Here I wondered what this file looks like, in a later patch you
add documentation:

  +objects/promisedblob::
  +   This file records the sha1 object names and sizes of promised
  +   blobs.
  +

but that leaves more fundamental questions:
* Is that a list of sha1s, separated by LF? (CRLF? How would Windows
  users interact with it, are they supposed to ever modify this file?)
* Similar to .git/packed-refs, would we want to have a first line
  that has some sort of comment?
* In the first question I assumed a linear list, will that be a sorted list,
  or do we want to have some fancy data structure here?
  (critbit tree, bloom filter)
* is the contents in ASCII or binary? (What are the separators?)
* In the future I presume we'd want to quickly answer "Is X in the
  promised blobs list?" so would it be possible (in the future) to
  improve the searching e.g. binary search?
* ... I'll read on to see my questions answered, but I'd guess
  others would prefer to see it in the docs. :)


> + * A mmap-ed byte array of size (missing_blob_nr * ENTRY_SIZE). Each
> + * ENTRY_SIZE-sized entry consists of the SHA-1 of the promised blob and its
> + * 64-bit size in network byte order. The entries are sorted in ascending 
> SHA-1
> + * order.

This seems to be the answer to the docs. :)
So binary representation, the size variable takes a fixed amount, such
that the n-th element in the file is at n * ENTRY_SIZE.
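The layout inferred above — fixed-size records of a 20-byte SHA-1 plus a big-endian 64-bit size, sorted by hash so lookups can binary-search — can be modelled compactly. This is a Python model of the file format, not Git's C implementation, and the function names are illustrative:

```python
import struct
from bisect import bisect_left

# One entry: 20-byte raw SHA-1 + 64-bit size, network (big-endian) byte order.
ENTRY = struct.Struct(">20sQ")


def pack_promised_blobs(blobs):
    """blobs: mapping of raw-sha1 bytes -> size.  Entries are sorted
    by hash so a reader can binary-search (cf. sha1_entry_pos)."""
    return b"".join(ENTRY.pack(h, s) for h, s in sorted(blobs.items()))


def promised_blob_size(data, sha1):
    """Binary-search the packed file; return the size, or None if the
    blob is not promised."""
    assert len(data) % ENTRY.size == 0  # same sanity check as the patch
    hashes = [data[i:i + 20] for i in range(0, len(data), ENTRY.size)]
    i = bisect_left(hashes, sha1)
    if i < len(hashes) and hashes[i] == sha1:
        return ENTRY.unpack_from(data, i * ENTRY.size)[1]
    return None
```

The fixed ENTRY.size (28 bytes) is what makes "the n-th element is at n * ENTRY_SIZE" hold.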


> +   if (st.st_size % ENTRY_SIZE) {
> +   die("Size of %s is not a multiple of %d", filename, 
> ENTRY_SIZE);
> +   }

So it looks as if the file format is an array of ENTRY_SIZE.

Similar to other files, would we want to prefix the file with
a 4 letter magic number and a version number?

> +   prepare_promised_blobs();
> +   result = sha1_entry_pos(promised_blobs, ENTRY_SIZE, 0, 0,
> +   promised_blob_nr, promised_blob_nr, 
> oid->hash);

IIRC, this is a binary search.

> +int for_each_promised_blob(each_promised_blob_fn cb, void *data)
> +{
> +   struct object_id oid;
> +   int i, r;
> +
> +   prepare_promised_blobs();
> +   for (i = 0; i < promised_blob_nr; i++) {
> +   memcpy(oid.hash, &promised_blobs[i * ENTRY_SIZE],
> +  GIT_SHA1_RAWSZ);
> +   r = cb(, data);
> +   if (r)
> +   return r;
> +   }
> +   return 0;
> +}
> diff --git a/promised-blob.h b/promised-blob.h
> new file mode 100644
> index 0..a303ea1ff
> --- /dev/null
> +++ b/promised-blob.h
> @@ -0,0 +1,14 @@
> +#ifndef PROMISED_BLOB_H
> +#define PROMISED_BLOB_H
> +
> +/*
> + * Returns 1 if oid is the name of a promised blob. If size is not NULL, also
> + * returns its size.
> + */
> +extern int is_promised_blob(const struct object_id *oid,
> +   unsigned long *size);
> +
> +typedef int each_promised_blob_fn(const struct object_id *oid, void *data);
> +int for_each_promised_blob(each_promised_blob_fn, void *);
> +
> +#endif
> diff --git a/t/t3907-promised-blob.sh b/t/t3907-promised-blob.sh
> new file mode 100755
> index 0..827072004
> --- /dev/null
> +++ b/t/t3907-promised-blob.sh
> @@ -0,0 +1,29 @@
> +#!/bin/sh
> +
> +test_description='promised blobs'
> +
> +. ./test-lib.sh
> +
> +test_expect_success 'fsck fails on missing blobs' '
> +   rm -rf repo &&

We should not need to delete the repo here, this is the first test?

> +   rm repo/.git/objects/$(echo $HASH | cut -c1-2)/$(echo $HASH | cut 
> -c3-40) &&

Later down the road, do we want to have a
(plumbing) command to move an object from
standard blob to promised blob (as of now I'd think
it would perform this rm call as well as an insert into
the promised blobs file) ?
(Well, we'd also have to think about how to get objects
out of a pack)

With such a command you can easily write your own custom
filter to free up blobs again.


> +   test_must_fail git -C repo fsck

test_i18n_grep missing out ?

maybe, too? (Maybe that is already tested elsewhere,
so no need for it)


> +'
> +
> 



Re: [RFC/WIP PATCH] object store classification

2017-07-11 Thread Junio C Hamano
Stefan Beller  writes:

>> At the implementation level, it should have a
>> hashtable (lazily populated) for all the objects in a single
>> $GIT_OBJECT_DIRECTORY, grafts/replace info, and a set of pointers to
>> other object-store instances that are its alternate object stores.
>
> So one repository has one or more object stores?

One repository foo/.git/ has one foo/.git/objects/ directory, so it
has its own single object store.  That object store may refer to
another object store by having foo/.git/objects/info/alternates.

Similarly, foo/.git/objects/info/grafts and foo/.git/refs/replace/
would belong to the single object store that repository foo/.git/ has.

> I would expect that most of the time the question from above
> "give me info on the object I can refer to with this object name"
> is asked with the additional information: "and I know it is in this
> repository", so we rather want to have
>
>   lookup_object(struct *repo, char *name);
>
> instead of
>
>   lookup_object(struct *object_store, char *name);

Absolutely.  That is why repository has its own single object_store,
which may refer to other object_stores.
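The shape described here — a repository owning exactly one object store, which may chain to alternates — can be sketched like this (all names are illustrative, not Git's actual structs):

```python
class ObjectStore:
    """One $GIT_OBJECT_DIRECTORY: a lazily populated object map plus
    pointers to alternate stores (objects/info/alternates)."""

    def __init__(self, objects=None, alternates=()):
        self.objects = dict(objects or {})
        self.alternates = list(alternates)

    def lookup(self, name):
        """Look up locally first, then fall through the alternates."""
        if name in self.objects:
            return self.objects[name]
        for alt in self.alternates:
            obj = alt.lookup(name)
            if obj is not None:
                return obj
        return None


class Repository:
    """A repo has exactly one object store of its own."""

    def __init__(self, object_store):
        self.object_store = object_store


def lookup_object(repo, name):
    # The caller-facing form from the thread: ask the repo, which
    # delegates to its single object store (and its alternates).
    return repo.object_store.lookup(name)
```

So lookup_object(repo, name) and lookup via the store are the same operation, entered one level apart.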


Re: [PATCH] commit & merge: modularize the empty message validator

2017-07-11 Thread Junio C Hamano
Kaartic Sivaraam  writes:

> In the context of "git merge" the meaning of an "empty message"
> is one that contains no line of text. This is not in line with
> "git commit" where an "empty message" is one that contains only
> whitespaces and/or signed-off-by lines. This could cause surprises
> to users who are accustomed to the meaning of an "empty message"
> of "git commit".
>
> Prevent such surprises by ensuring the meaning of an empty 'merge
> message' to be in line with that of an empty 'commit message'. This
> is done by separating the empty message validator from 'commit' and
> making it stand-alone.
>
> Signed-off-by: Kaartic Sivaraam 
> ---
>  I have made an attempt to solve the issue by separating the concerned
>  function as I found no reason against it.
>
>  I've tried to name them with what felt appropriate and concise to me.
>  Let me know if it's alright.

I probably would have avoided a pair of new files just to house a
single function.  I anticipate that the last helper function in
commit.c at the top-level would become relevant to this topic, and
because of that, I would have added this function at the end of the
file if I were doing this patch.

> @@ -772,7 +773,7 @@ static void prepare_to_commit(struct commit_list 
> *remoteheads)
>   }
>   read_merge_msg(&msg);
>   strbuf_stripspace(&msg, 0 < option_edit);
> - if (!msg.len)
> + if (!msg.len || message_is_empty(&msg, 0))

I do not see much point in checking !msg.len here.  The function
immediately returns by hitting the termination condition of the
outermost loop and this is not a performance-critical codepath.

I think the "validation" done with the rest_is_empty() is somewhat
bogus.  Why should we reject a commit without a message and a
trailer block with only signed-off-by lines, while accepting a
commit without a message and a trailer block as long as the trailer
block has something equally meaningless by itself, like
"Helped-by:"?  I think we should inspect the proposed commit log
message taken from the editor, find its tail ignoring the trailing
comment using ignore_non_trailer, and further separate the result
into (, , ) using the same
logic used by the interpret-trailers tool, and then complain when
 turns out to be empty, to be truly useful and consistent.

And for that eventual future, merging the logic used in commit and
merge might be a good first step.
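The check proposed above — drop comment lines, peel a trailer block off the tail, then complain only if no log message remains — can be sketched as follows. This is a deliberate simplification (the real ignore_non_trailer / interpret-trailers logic handles folded trailers and more), and all names are illustrative:

```python
import re

# A trailer line looks like "Signed-off-by: ...", "Helped-by: ...", etc.
TRAILER_RE = re.compile(r"^[A-Za-z-]+:\s")


def message_is_empty(text, comment_char="#"):
    """True if, after dropping comment lines and peeling a trailing
    trailer block (plus blank lines) off the end, no actual log
    message remains."""
    lines = [l for l in text.splitlines()
             if not l.startswith(comment_char)]
    while lines and (not lines[-1].strip() or TRAILER_RE.match(lines[-1])):
        lines.pop()
    return not any(l.strip() for l in lines)
```

Note this treats a lone "Helped-by:" exactly like a lone "Signed-off-by:" — which is the consistency argument being made.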

Having said all that, I am not sure "Prevent such surprises" is a
problem that is realistic to begin with.  When a user sees the
editor buffer in "git merge", it is pre-populated with at least a
single line of message "Merge branch 'foo'", possibly followed by
the summary of the side branch being merged, so unless the user
deliberately removes everything and then adds a sign-off line
(because we do not usually add one), there is no room for "such
surprises" in the first place.  It does not _hurt_ to diagnose such
a crazy case, but it feels a bit lower priority.

So from the point of "let's improve what merge does", this change
looks to me a borderline "Meh"; but to improve the "why sign-off is
so special and behave differently from helped-by when deciding if
there is any log?" situation, having a separate helper function that
is shared across multiple codepaths that accept edited result may be
a good idea.



Re: "groups of files" in Git?

2017-07-11 Thread Lars Schneider
> On Tue, Jul 11, 2017 at 1:39 PM, Lars Schneider
>  wrote:
>> 
>>> On 11 Jul 2017, at 17:45, Nikolay Shustov  wrote:
>>> 
>>> Hi,
>>> I have been recently struggling with migrating my development workflow
>>> from Perforce to Git, all because of the following thing:
>>> 
>>> I have to work on several features in the same code tree in parallel,
>>> in the same Perforce workspace. The major reason why I cannot work on
>>> one feature and then on another is that I have to make sure that the
>>> changes in the related areas of the product play together well.
>>> 
>>> With Perforce, I can have multiple changelists opened, that group the
>>> changed files as needed.
>>> 
>>> With Git, I cannot seem to find a way to achieve the same result.
>>> The problem is that putting change sets
>>> on different Git branches (or workdirs, or whatever Git offers that
>>> makes the changes to be NOT in the same source tree) is not a viable
>>> option for me, as I would have to rebuild the code as I re-integrate
>>> the changes between the branches (or whatever change-separation Git
>>> feature is used).
>>> Builds take time and resources, and considering that I have to build on
>>> multiple platforms (I do cross-platform development), this really
>>> rules out the option of not having multiple changes in the same code
>>> tree.
>>> 
>>> Am I ignorant about some Git feature/way of using Git that would help?
>>> Is it worth considering adding to Git a feature like "group of files"
>>> that would offer some virtual grouping of the locally changed files
>>> in the checked-out branch?
>> 
>> Interesting question that came up at my workplace, too.
>> 
>> Here is what I suggested:
>> 1. Keep working on a single branch and make commits for all features
>> 2. If you make a commit, prefix the commit message with the feature name
>> 3. After you are done with a feature, create a new feature branch based on
>>   your combined feature branch. Use `git rebase -i` [1] to remove all
>>   commits that are not relevant for the feature. Alternatively, you could
>>   cherry-pick the relevant commits [2] if this is faster.
>> 
>> I wonder what others think about this solution. Maybe there is a better
>> solution that I overlooked?
>> 
>> - Lars
>> 
>> [1] 
>> https://robots.thoughtbot.com/git-interactive-rebase-squash-amend-rewriting-history
>> [2] 
>> http://think-like-a-git.net/sections/rebase-from-the-ground-up/cherry-picking-explained.html
>> 

> On 11 Jul 2017, at 19:54, Nikolay Shustov  wrote:
> 
> Thank you for the idea; however, I am having trouble with basically
> maintaining the uncommitted groups of files: I would prefer a clear
> distinction that "those files belong to feature A" and "these files
> belong to feature B" before I commit anything. Committing separately
> every change for feature A and for feature B would probably be a good
> option, but when I have many changes, cherry-picking the proper
> commits to create a single changeset for the integration would become
> a nightmare.

I see. Why go with something as complicated as gitattributes, then?

How about this:
Let's say you start working on featureX, which affects file1 and file2,
and featureY, which affects file8 and file9:

1. Create aliases to add the files:
   $ git config --local alias.featx 'add file1 file2'
   $ git config --local alias.featy 'add file8 file9'

2. Work on the features. Whenever you have something ready for featureX
   run this:
   $ git featx
   $ git commit

   Whenever you have something ready for featureY run this:
   $ git featy
   $ git commit

Wouldn't that work?

- Lars
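Lars's two-alias scheme above can be sketched as a hypothetical
end-to-end shell session (the repository, file names, and feature
split are all invented for illustration):

```shell
# Hypothetical demo of per-feature staging aliases; all file and
# feature names are invented for illustration.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo" && git config user.email demo@example.com
for f in file1 file2 file8 file9; do echo base >"$f"; done
git add . && git commit -qm initial

# One alias per feature, each staging only that feature's files.
git config --local alias.featx 'add file1 file2'
git config --local alias.featy 'add file8 file9'

echo x >>file1 && echo x >>file2   # progress on featureX
echo y >>file8                     # progress on featureY

git featx                          # stages only featureX's files
git commit -qm 'featureX: step 1'
git status --porcelain             # featureY's edit to file8 is still pending
```

The drawback, as the rest of the thread notes, is that the alias must
be edited every time a feature grows a new file.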




Re: "groups of files" in Git?

2017-07-11 Thread Fredrik Gustafsson
Hi,
I will take a somewhat less diplomatic path here. Instead of trying to
tell you how you can make Git fit your needs, I would say that you
shouldn't. I have two arguments:

1.
It's always painful to use a tool in a way it was not intended to
work. If you're doing something different from everyone else using
that tool, you're probably doing something wrong!

I doubt that your case is so special, so my suggestion is to either use
Git the way most people use it, with one branch for each feature, or
not use Git at all, since Perforce seems to be a better fit for your
style of working.

2.
Git is a snapshot-based SCM system. That means that each commit
uniquely identifies a version of the code. With your scheme (as with
any workflow in which you do not commit all changed files), the
committed state is never tested. You have no way of actually knowing
whether your two changes are dependent or not. Of course you can
guess, but it's still a guess; in your current Perforce workflow you
cannot know whether your changesets depend on each other, since you
never test them individually.

--

Please let me know if you feel that I've missed something.

I can see four solutions:

1.
Now, I would suggest that you have each feature in a commit and simply
run your tests every few commits, so you don't have to run them for
each commit.

2.
Improve your build and test times. I'm sure there are things to
improve here.

3.
Continue to use Perforce. If I recall correctly, Perforce even has a
Git integration today.

4.
Use integration branches in Git and run the tests on such a branch.
This can be easy to do if you write some scripts for it.
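Suggestion (4) could look something like this hypothetical session:
features live on their own branches, and a throwaway integration
branch is rebuilt to test the combination (the branch names and the
final test step are assumptions):

```shell
# Hypothetical sketch of suggestion (4): rebuild a throwaway
# integration branch from the feature branches and test the result.
set -e
repo=$(mktemp -d); cd "$repo"
git init -q
git config user.name Demo && git config user.email demo@example.com
echo core >app.c && git add . && git commit -qm initial
base=$(git symbolic-ref --short HEAD)      # master or main

git checkout -qb feature-a
echo a >a.c && git add a.c && git commit -qm 'feature A'
git checkout -q "$base" && git checkout -qb feature-b
echo b >b.c && git add b.c && git commit -qm 'feature B'

# Recreate the integration branch from scratch and merge every feature.
git checkout -q -B integration "$base"
for branch in feature-a feature-b; do
    git merge -q --no-edit "$branch"
done
ls          # both features present; run your build/tests here
```

Since `checkout -B` recreates the branch each time, dropping a feature
from the next test run is just a matter of removing it from the loop.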

Good luck!

-- 
Fredrik Gustafsson

phone: +46 733-608274
e-mail: iv...@iveqy.com
website: http://www.iveqy.com


Re: [RFC/WIP PATCH] object store classification

2017-07-11 Thread Stefan Beller
On Tue, Jul 11, 2017 at 11:46 AM, Junio C Hamano  wrote:
> Brandon Williams  writes:
>
>>>   the_repository -> the_object_store
>>>   but the object store is a complex beast having different hash tables
>>>   for the different alternates.
>>
>> After looking at the patch and some of the comments here I think that
>> this is probably the best approach with a few tweaks (which may be
>> completely unfounded because I'm not familiar with all the object store
>> code).
>>
>> In an OO world I would envision a single object (let's say 'struct
>> object_store') which is responsible for managing a repository's objects
>> no matter where the individual objects came from (main object store or
>> an alternate for that repository).  And if I understand correctly the
>> single hash table that exists now caches objects like this.
>
> I would say that conceptually an object-store object has an
> interface to answer "give me info on the object I can refer to with
> this object name".

ok.

> At the implementation level, it should have a
> hashtable (lazily populated) for all the objects in a single
> $GIT_OBJECT_DIRECTORY, grafts/replace info, and a set of pointers to
> other object-store instances that are its alternate object stores.

So one repository has one or more object stores?

I would expect that most of the time the question from above
"give me info on the object I can refer to with this object name"
is asked with the additional information "and I know it is in this
repository", so we would rather have

  lookup_object(struct repository *repo, const char *name);

instead of

  lookup_object(struct object_store *store, const char *name);

because the caller most likely would not care if the object
comes from the alternate or from the main object store?
(There may be special cases in e.g. repacking/gc where
we want to instruct the repacker to only repack the main
object store, ignoring any alternates or such, but any other
command would not care, AFAICT)

So we could have the convenience function

lookup_object(struct repository *repo, const char *name)
{
	foreach_objectstore(repo, lookup_object_in_single_store, &object)
		if (object)
			return object;
	return NULL;
}

but such code is so closely related to object storage that I would
prefer to see it in the object-store code (object.{h,c}) instead of
the repository code. Also, my initial plan was to have all objects
(including those from any alternate) in a single hashmap per
repository, as that seems most efficient, assuming all alternates fit
into memory.

This whole object-store object orientation was only started to avoid
the memory pressure from lots of submodule objects polluting the main
object store. If we give each alternate its own hashmap, our
performance would degrade with the number of alternates.
(Assuming the hypothetical worst case of one alternate per object, the
lookup time would be linear, given the above function. I wonder if
there are users that use alternates so heavily that this performance
characteristic would be measurable in the code to be written.)

> You'd need to have a mechanism to avoid creating cycles in these
> pointers, of course.

This is another complication with many hashmaps (= object stores).
In the future, is it likely that we will want to drop an alternate
store from within a running process? If so, we'd want a hashmap per
alternate; otherwise I only see disadvantages to having more than one
hashmap (i.e. more than one object store) per repository. That said, I
think we can make a wrapper struct object_store that would encapsulate
all the needed variables.

+   struct object_store {
+   struct object **obj_hash;
+   int nr_objs, obj_hash_size;
+   } objects;

But instead of defining it at the repository, we'd rather define it
in object.h.

Stefan


[RFC PATCH 3/3] sha1_file: add promised blob hook support

2017-07-11 Thread Jonathan Tan
Teach sha1_file to invoke a hook whenever a blob is requested and
unavailable but is promised. The hook is a shell command that can be
configured through "git config"; this hook takes in a list of hashes and
writes (if successful) the corresponding objects to the repo's local
storage.

This is meant as a temporary measure to ensure that all Git commands
work in such a situation. Future patches will update some commands to
either tolerate promised blobs (without invoking the hook) or be more
efficient in invoking the promised blob hook.
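The hook contract described above (newline-terminated hashes on stdin;
on success, the corresponding objects appear in the local repo) could
be met by something as small as this hypothetical sketch. It copies
blobs from a local backing repository as a stand-in for a real
network fetch; all paths and names here are invented:

```shell
# Hypothetical core.promisedBlobCommand hook, demonstrated end to end.
# The hook reads newline-terminated blob hashes on stdin and writes
# the corresponding objects into the current repo, copying them from a
# local backing repository (a stand-in for fetching from a server).
set -e
work=$(mktemp -d)
git init -q "$work/backing" && git init -q "$work/local"
hash=$(echo 'big file contents' | git -C "$work/backing" hash-object -w --stdin)

cat >"$work/hook.sh" <<'EOF'
#!/bin/sh
# stdin: one blob hash per line; writes each blob into the current repo.
while read -r h; do
    git -C "$BACKING" cat-file blob "$h" |
        git hash-object -w --stdin >/dev/null || exit 1
done
EOF
chmod +x "$work/hook.sh"

# Simulate Git invoking the hook for one missing blob.
cd "$work/local"
echo "$hash" | BACKING="$work/backing" "$work/hook.sh"
git cat-file -e "$hash" && echo "blob is now available locally"
```

Writing the blob with `hash-object -w` reproduces the same object name,
which is exactly the postcondition the patch's hook protocol requires.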

In order to determine the necessary code changes in sha1_file.c, I
investigated the following:
 (1) functions in sha1_file that take in a hash, without the user
 regarding how the object is stored (loose or packed)
 (2) functions in sha1_file that operate on packed objects (because I
 need to check callers that know about the loose/packed distinction
 and operate on both differently, and ensure that they can handle
 the concept of objects that are neither loose nor packed)

(1) is handled by the modification to sha1_object_info_extended().

For (2), I looked at for_each_packed_object and at the packed-related
functions that take in a hash. For for_each_packed_object, the callers
either already work or are fixed in this patch:
 - reachable - only to find recent objects
 - builtin/fsck - already knows about promised blobs
 - builtin/cat-file - fixed in this commit

Callers of the other functions do not need to be changed:
 - parse_pack_index
   - http - indirectly from http_get_info_packs
 - find_pack_entry_one
   - this searches a single pack that is provided as an argument; the
 caller already knows (through other means) that the sought object
 is in a specific pack
 - find_sha1_pack
   - fast-import - appears to be an optimization to not store a
 file if it is already in a pack
   - http-walker - to search through a struct alt_base
   - http-push - to search through remote packs
 - has_sha1_pack
   - builtin/fsck - already knows about promised blobs
   - builtin/count-objects - informational purposes only (check if loose
 object is also packed)
   - builtin/prune-packed - check if object to be pruned is packed (if
 not, don't prune it)
   - revision - used to exclude packed objects if requested by user
   - diff - just for optimization

An alternative design that I considered but rejected:

 - Adding a hook whenever a packed blob is requested, not on any blob.
   That is, whenever we attempt to search the packfiles for a blob, if
   it is missing (from the packfiles and from the loose object storage),
   to invoke the hook (which must then store it as a packfile), open the
   packfile the hook generated, and report that the blob is found in
   that new packfile. This reduces the amount of analysis needed (in
   that we only need to look at how packed blobs are handled), but
   requires that the hook generate packfiles (or for sha1_file to pack
   whatever loose objects are generated), creating one packfile for each
   missing blob and potentially very many packfiles that must be
   linearly searched. This may be tolerable now for repos that only have
   a few missing blobs (for example, repos that only want to exclude
   large blobs), and might be tolerable in the future if we have
   batching support for the most commonly used commands, but is not
   tolerable now for repos that exclude a large amount of blobs.

Signed-off-by: Jonathan Tan 
---
 Documentation/config.txt   |  8 
 Documentation/gitrepository-layout.txt |  8 
 builtin/cat-file.c |  9 
 promised-blob.c| 75 ++
 promised-blob.h| 13 ++
 sha1_file.c| 44 +---
 t/t3907-promised-blob.sh   | 36 
 7 files changed, 179 insertions(+), 14 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index d5c9c4cab..c293ac921 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -393,6 +393,14 @@ The default is false, except linkgit:git-clone[1] or 
linkgit:git-init[1]
 will probe and set core.ignoreCase true if appropriate when the repository
 is created.
 
+core.promisedBlobCommand::
+   If set, whenever a blob in the local repo is attempted to be read, but
+   is both missing and a promised blob, invoke this shell command to
+   generate or obtain that blob before reporting an error. This shell
+   command should take one or more hashes, each terminated by a newline,
+   as standard input, and (if successful) should write the corresponding
+   objects to the local repo (packed or loose).
+
 core.precomposeUnicode::
This option is only used by Mac OS implementation of Git.
When core.precomposeUnicode=true, Git reverts the unicode decomposition
diff --git 

[RFC PATCH 2/3] sha1-array: support appending unsigned char hash

2017-07-11 Thread Jonathan Tan
In a subsequent patch, sha1_file will need to append object names in the
form of "unsigned char *" to oid arrays. Teach sha1-array support for
that.

Signed-off-by: Jonathan Tan 
---
 sha1-array.c | 7 +++
 sha1-array.h | 1 +
 2 files changed, 8 insertions(+)

diff --git a/sha1-array.c b/sha1-array.c
index 838b3bf84..6e0e35391 100644
--- a/sha1-array.c
+++ b/sha1-array.c
@@ -9,6 +9,13 @@ void oid_array_append(struct oid_array *array, const struct 
object_id *oid)
array->sorted = 0;
 }
 
+void oid_array_append_sha1(struct oid_array *array, const unsigned char *sha1)
+{
+   ALLOC_GROW(array->oid, array->nr + 1, array->alloc);
+   hashcpy(array->oid[array->nr++].hash, sha1);
+   array->sorted = 0;
+}
+
 static int void_hashcmp(const void *a, const void *b)
 {
return oidcmp(a, b);
diff --git a/sha1-array.h b/sha1-array.h
index 04b075633..3479959e4 100644
--- a/sha1-array.h
+++ b/sha1-array.h
@@ -11,6 +11,7 @@ struct oid_array {
 #define OID_ARRAY_INIT { NULL, 0, 0, 0 }
 
 void oid_array_append(struct oid_array *array, const struct object_id *oid);
+void oid_array_append_sha1(struct oid_array *array, const unsigned char *sha1);
 int oid_array_lookup(struct oid_array *array, const struct object_id *oid);
 void oid_array_clear(struct oid_array *array);
 
-- 
2.13.2.932.g7449e964c-goog



[RFC PATCH 1/3] promised-blob, fsck: introduce promised blobs

2017-07-11 Thread Jonathan Tan
Currently, Git does not handle well repos with very large numbers of
blobs, or repos that wish to minimize manipulation of certain blobs
(for example, because they are very large), even if the user operates
mostly on part of the repo, because Git is designed on the assumption
that every blob referenced by a tree object is available somewhere in
the repo storage.

As a first step to reducing this problem, introduce the concept of
promised blobs. Each Git repo can contain a list of promised blobs and
their sizes at $GIT_DIR/objects/promisedblob. This patch contains
functions to query them; functions for creating and modifying that file
will be introduced in later patches.

A repository that is missing a blob but has that blob promised is not
considered to be in error, so also teach fsck this.

Signed-off-by: Jonathan Tan 
---
 Makefile |  1 +
 builtin/fsck.c   | 13 +++
 promised-blob.c  | 95 
 promised-blob.h  | 14 +++
 t/t3907-promised-blob.sh | 29 +++
 t/test-lib-functions.sh  |  6 +++
 6 files changed, 158 insertions(+)
 create mode 100644 promised-blob.c
 create mode 100644 promised-blob.h
 create mode 100755 t/t3907-promised-blob.sh

diff --git a/Makefile b/Makefile
index 9c9c42f8f..e96163269 100644
--- a/Makefile
+++ b/Makefile
@@ -828,6 +828,7 @@ LIB_OBJS += preload-index.o
 LIB_OBJS += pretty.o
 LIB_OBJS += prio-queue.o
 LIB_OBJS += progress.o
+LIB_OBJS += promised-blob.o
 LIB_OBJS += prompt.o
 LIB_OBJS += quote.o
 LIB_OBJS += reachable.o
diff --git a/builtin/fsck.c b/builtin/fsck.c
index 99dea7adf..7454be7f1 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -15,6 +15,7 @@
 #include "progress.h"
 #include "streaming.h"
 #include "decorate.h"
+#include "promised-blob.h"
 
 #define REACHABLE 0x0001
 #define SEEN  0x0002
@@ -223,6 +224,9 @@ static void check_reachable_object(struct object *obj)
if (!(obj->flags & HAS_OBJ)) {
if (has_sha1_pack(obj->oid.hash))
return; /* it is in pack - forget about it */
+   if (obj->type == OBJ_BLOB &&
+   is_promised_blob(&obj->oid, NULL))
+   return;
printf("missing %s %s\n", printable_type(obj),
describe_object(obj));
errors_found |= ERROR_REACHABLE;
@@ -642,6 +646,13 @@ static int mark_packed_for_connectivity(const struct 
object_id *oid,
return 0;
 }
 
+static int mark_promised_blob_for_connectivity(const struct object_id *oid,
+  void *data)
+{
+   mark_object_for_connectivity(oid);
+   return 0;
+}
+
 static char const * const fsck_usage[] = {
	N_("git fsck [<options>] [<object>...]"),
NULL
@@ -701,6 +712,8 @@ int cmd_fsck(int argc, const char **argv, const char 
*prefix)
if (connectivity_only) {
for_each_loose_object(mark_loose_for_connectivity, NULL, 0);
for_each_packed_object(mark_packed_for_connectivity, NULL, 0);
+   for_each_promised_blob(mark_promised_blob_for_connectivity,
+  NULL);
} else {
fsck_object_dir(get_object_directory());
 
diff --git a/promised-blob.c b/promised-blob.c
new file mode 100644
index 0..493808ed2
--- /dev/null
+++ b/promised-blob.c
@@ -0,0 +1,95 @@
+#include "cache.h"
+#include "promised-blob.h"
+#include "sha1-lookup.h"
+#include "strbuf.h"
+
+#define ENTRY_SIZE (GIT_SHA1_RAWSZ + 8)
+/*
+ * A mmap-ed byte array of size (missing_blob_nr * ENTRY_SIZE). Each
+ * ENTRY_SIZE-sized entry consists of the SHA-1 of the promised blob and its
+ * 64-bit size in network byte order. The entries are sorted in ascending SHA-1
+ * order.
+ */
+static char *promised_blobs;
+static int64_t promised_blob_nr = -1;
+
+static void prepare_promised_blobs(void)
+{
+   char *filename;
+   int fd;
+   struct stat st;
+
+   if (promised_blob_nr >= 0)
+   return;
+
+   if (getenv("GIT_IGNORE_PROMISED_BLOBS")) {
+   promised_blob_nr = 0;
+   return;
+   }
+   
+   filename = xstrfmt("%s/promisedblob", get_object_directory());
+   fd = git_open(filename);
+   if (fd < 0) {
+   if (errno == ENOENT) {
+   promised_blob_nr = 0;
+   goto cleanup;
+   }
+   perror("prepare_promised_blobs");
+   die("Could not open %s", filename);
+   }
+   if (fstat(fd, &st)) {
+   perror("prepare_promised_blobs");
+   die("Could not stat %s", filename);
+   }
+   if (st.st_size == 0) {
+   promised_blob_nr = 0;
+   goto cleanup;
+   }
+   if (st.st_size % ENTRY_SIZE) {
+   die("Size of %s is not a multiple of %d", filename, ENTRY_SIZE);
+   }
+
+   promised_blobs = 

[RFC PATCH 0/3] Partial clone: promised blobs (formerly "missing blobs")

2017-07-11 Thread Jonathan Tan
These patches are part of a set of patches implementing partial clone,
as you can see here:

https://github.com/jonathantanmy/git/tree/partialclone

In that branch, clone with batch checkout works, as you can see in the
README. The code and tests are generally done, but some patches are
still missing documentation and commit messages.

These 3 patches implement the foundational concept - formerly known as
"missing blobs" in the "missing blob manifest", I decided to call them
"promised blobs". The repo knows their object names and sizes, but
does not have the blobs themselves; it can, however, be configured to
know how to fetch them.

An older version of these patches was sent as a single demonstration
patch in versions 1 to 3 of [1]. In there, Junio suggested that I have
only one file containing missing blob information. I have made the
suggested change in this version.

One thing remaining is to add a repository extension [2] so that older
versions of Git fail immediately instead of trying to read missing
blobs, but I thought I'd send these first in order to get some initial
feedback.

[1] https://public-inbox.org/git/cover.1497035376.git.jonathanta...@google.com/
[2] Documentation/technical/repository-version.txt

Jonathan Tan (3):
  promised-blob, fsck: introduce promised blobs
  sha1-array: support appending unsigned char hash
  sha1_file: add promised blob hook support

 Documentation/config.txt   |   8 ++
 Documentation/gitrepository-layout.txt |   8 ++
 Makefile   |   1 +
 builtin/cat-file.c |   9 ++
 builtin/fsck.c |  13 +++
 promised-blob.c| 170 +
 promised-blob.h|  27 ++
 sha1-array.c   |   7 ++
 sha1-array.h   |   1 +
 sha1_file.c|  44 ++---
 t/t3907-promised-blob.sh   |  65 +
 t/test-lib-functions.sh|   6 ++
 12 files changed, 345 insertions(+), 14 deletions(-)
 create mode 100644 promised-blob.c
 create mode 100644 promised-blob.h
 create mode 100755 t/t3907-promised-blob.sh

-- 
2.13.2.932.g7449e964c-goog



Re: git config --help not functional on bad config

2017-07-11 Thread Jeff King
On Tue, Jul 11, 2017 at 12:13:59PM -0700, Stefan Beller wrote:

> > There are other die calls hidden deeper. For instance:
> >
> >   $ git -c core.ignorecase=foo help config
> >   fatal: bad numeric config value 'foo' for 'core.ignorecase': invalid unit
> >
> > Those ones are hard to fix without changing the call signature of
> > git_config_bool().
> 
> Good point. While looking at it, parse_help_format can also die,
> so building a safe git help config is hard:
> 
> git config --global help.format foo
> # everything is broken, how do I fix it?
> git help config # breaks, too, for the same reason as you outlined :/

Yeah, I think that should be switched to return an error, and probably
all errors from git_help_config() should issue a warning and still
return 0.

-Peff


Re: git config --help not functional on bad config

2017-07-11 Thread Stefan Beller
On Tue, Jul 11, 2017 at 12:08 PM, Jeff King  wrote:
> On Tue, Jul 11, 2017 at 12:05:01PM -0700, Stefan Beller wrote:
>
>> > diff --git a/builtin/help.c b/builtin/help.c
>> > index 334a8494a..c42dfc9e9 100644
>> > --- a/builtin/help.c
>> > +++ b/builtin/help.c
>> > @@ -273,7 +273,7 @@ static int git_help_config(const char *var, const char 
>> > *value, void *cb)
>> > if (starts_with(var, "man."))
>> > return add_man_viewer_info(var, value);
>> >
>> > -   return git_default_config(var, value, cb);
>> > +   return 0;
>>
>> Instead of ignoring any default config, we could do
>>
>> git_default_config(var, value, cb);
>> /* ignore broken config, we may be the help call for config */
>> return 0;
>>
>> I looked through git_default_config which seems to only die
>> with useful error messages for compression related settings,
>> but these we may want to convert to errors instead of dies,
>> when going this way.
>
> There are other die calls hidden deeper. For instance:
>
>   $ git -c core.ignorecase=foo help config
>   fatal: bad numeric config value 'foo' for 'core.ignorecase': invalid unit
>
> Those ones are hard to fix without changing the call signature of
> git_config_bool().

Good point. While looking at it, parse_help_format can also die,
so building a safe git help config is hard:

git config --global help.format foo
# everything is broken, how do I fix it?
git help config # breaks, too, for the same reason as you outlined :/
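Until that happens, note that `git config` itself does not validate
the value it writes or unsets, so a hypothetical escape hatch is to
unset the offending key directly instead of consulting the (broken)
help. The session below isolates itself in a throwaway HOME:

```shell
# Hypothetical demonstration of the breakage and an escape hatch.
# A throwaway HOME isolates the global config from the real one.
HOME=$(mktemp -d); export HOME
git config --global help.format foo       # writing the bad value succeeds
git help config >/dev/null 2>&1 || echo "git help config is broken"
git config --global --unset help.format   # ...and so does removing it
git config --global help.format 2>/dev/null || echo "help.format is gone"
```

This works because `git config --unset` edits the file mechanically;
only commands that interpret help.format (like `git help`) die on it.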


Re: git config --help not functional on bad config

2017-07-11 Thread Jeff King
On Tue, Jul 11, 2017 at 12:05:01PM -0700, Stefan Beller wrote:

> > diff --git a/builtin/help.c b/builtin/help.c
> > index 334a8494a..c42dfc9e9 100644
> > --- a/builtin/help.c
> > +++ b/builtin/help.c
> > @@ -273,7 +273,7 @@ static int git_help_config(const char *var, const char 
> > *value, void *cb)
> > if (starts_with(var, "man."))
> > return add_man_viewer_info(var, value);
> >
> > -   return git_default_config(var, value, cb);
> > +   return 0;
> 
> Instead of ignoring any default config, we could do
> 
> git_default_config(var, value, cb);
> /* ignore broken config, we may be the help call for config */
> return 0;
> 
> I looked through git_default_config which seems to only die
> with useful error messages for compression related settings,
> but these we may want to convert to errors instead of dies,
> when going this way.

There are other die calls hidden deeper. For instance:

  $ git -c core.ignorecase=foo help config
  fatal: bad numeric config value 'foo' for 'core.ignorecase': invalid unit

Those ones are hard to fix without changing the call signature of
git_config_bool().

-Peff


Re: git config --help not functional on bad config

2017-07-11 Thread Stefan Beller
On Tue, Jul 11, 2017 at 10:53 AM, Jeff King  wrote:
> On Tue, Jul 11, 2017 at 03:49:21PM +0100, Peter Krefting wrote:
>
>> That's fine. However, when trying to look for help, it is not that useful:
>>
>>   $ git config --help
>>   error: malformed value for branch.autosetuprebase
>>   fatal: bad config variable 'branch.autosetuprebase' in file '.git/config' 
>> at line 24
>>
>> Perhaps it should allow "--help" to go through even if the configuration is
>> bad?
>
> Yes, I agree the current behavior is poor. What's happening under the
> hood is that "--help" for any command runs "git help config", which in
> turn looks at the config to pick up things like help.format.
>
> But it also loads git_default_config(), which I suspect isn't actually
> useful. It goes all the way back to 70087cdbd (git-help: add
> "help.format" config variable., 2007-12-15), and it looks like it was
> probably added just to match other config callbacks.
>
> So I think we could probably just do this:
>
> diff --git a/builtin/help.c b/builtin/help.c
> index 334a8494a..c42dfc9e9 100644
> --- a/builtin/help.c
> +++ b/builtin/help.c
> @@ -273,7 +273,7 @@ static int git_help_config(const char *var, const char 
> *value, void *cb)
> if (starts_with(var, "man."))
> return add_man_viewer_info(var, value);
>
> -   return git_default_config(var, value, cb);
> +   return 0;

Instead of ignoring any default config, we could do

git_default_config(var, value, cb);
/* ignore broken config, we may be the help call for config */
return 0;

I looked through git_default_config, which seems to die with useful
error messages only for compression-related settings, but we may want
to convert these to errors instead of dies if we go this way.

Thanks,
Stefan


Re: [RFC/WIP PATCH] object store classification

2017-07-11 Thread Junio C Hamano
Brandon Williams  writes:

>>   the_repository -> the_object_store
>>   but the object store is a complex beast having different hash tables
>>   for the different alternates.
>
> After looking at the patch and some of the comments here I think that
> this is probably the best approach with a few tweaks (which may be
> completely unfounded because I'm not familiar with all the object store
> code).
>
> In an OO world I would envision a single object (let's say 'struct
> object_store') which is responsible for managing a repository's objects
> no matter where the individual objects came from (main object store or
> an alternate for that repository).  And if I understand correctly the
> single hash table that exists now caches objects like this.

I would say that conceptually an object-store object has an
interface to answer "give me info on the object I can refer to with
this object name".  At the implementation level, it should have a
hashtable (lazily populated) for all the objects in a single
$GIT_OBJECT_DIRECTORY, grafts/replace info, and a set of pointers to
other object-store instances that are its alternate object stores.
You'd need to have a mechanism to avoid creating cycles in these
pointers, of course.




Re: "groups of files" in Git?

2017-07-11 Thread Nikolay Shustov
Thank you for #3.
As for 1+2, the documentation says:

"Each line in gitattributes file is of form:

pattern attr1 attr2 ...

...
When the pattern matches the path in question, the attributes listed
on the line are given to the path."

My understanding is that to give a set of files in separate
directories the same attribute, I would have to create, for each
file, a separate gitattributes line with the exact path/filename and
the needed attribute. Is that what you are suggesting, or did I
misunderstand the idea?


On Tue, Jul 11, 2017 at 2:19 PM, Stefan Beller  wrote:
> On Tue, Jul 11, 2017 at 11:10 AM, Nikolay Shustov
>  wrote:
>> Thank you for the explanation. The example about backend and frontend
>> is relevant even though I normally have to deal with more layers at
>> the same time.
>>
>> However, in my case I have the thing that you have already tried to
>> address, partially: the changes always align with file boundaries BUT
>> not with directory boundaries. Imagine you have the stack of backend,
>> data transport and frontend layers. The feature has to touch all three
>> layers thus resulting in the changes in the apparently different
>> directories. Thus, making the distinction by the pathspec (if I
>> understood it right from reading the documentation) would not help.
>>
>> The attributes could be a solution, if I could:
>> 1. create attribute designated to the feature
>> 2. "mark" uncommitted files in different directory with that attribute
>
> 1+2 should be answered by the gitattributes man page
> https://git-scm.com/docs/gitattributes
>
>
>> 3. filter the list of unchanged files with such attribute
>
> This sounds like one of
>   "git status :(attr:backend) ."
>   "git status :(exclude,attr:backend) ."
>
>> 4. create commit for the files only with the certain attribute
>>
>> You've kindly demonstrated that #4 is doable; however I could not
>> clearly get for the Git documentation if #1 - #3 are achievable...
>> Could you point me to the right place in the documentation, please?
>>
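On the pattern question above: gitattributes patterns use the same
gitignore-style globs, so a single line can already cover many files;
per-file lines are only needed when a group does not follow any path
pattern. A hypothetical sketch (attribute and path names invented):

```shell
# Hypothetical .gitattributes sketch: one glob line marks a whole
# directory tree, and an explicit path adds a stray file to the group.
set -e
d=$(mktemp -d); cd "$d"
git init -q
mkdir -p backend transport frontend
printf 'backend/** feature=X\ntransport/proto.c feature=X\n' >.gitattributes
echo a >backend/db.c
echo b >transport/proto.c
echo c >frontend/ui.js
git check-attr feature backend/db.c transport/proto.c frontend/ui.js
# -> backend/db.c: feature: X
#    transport/proto.c: feature: X
#    frontend/ui.js: feature: unspecified
```

`git check-attr` is a quick way to verify which files a pattern line
actually captures before relying on it.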


Re: [PATCH 1/7] api-builtin.txt: document SUPPORT_SUPER_PREFIX

2017-07-11 Thread Brandon Williams
On 07/11, Martin Ågren wrote:
> On 11 July 2017 at 00:50, Brandon Williams  wrote:
> > On 07/10, Martin Ågren wrote:
> >> Commit 74866d75 ("git: make super-prefix option", 2016-10-07) introduced
> >> SUPPORT_SUPER_PREFIX as a builtin flag without documenting it in
> >> api-builtin.txt. The next patch will add another flag, so document
> >> SUPPORT_SUPER_PREFIX, thereby bringing the documentation up to date with
> >> the available flags.
> >>
> >> Signed-off-by: Martin Ågren 
> >> ---
> >>  Documentation/technical/api-builtin.txt | 4 
> >>  1 file changed, 4 insertions(+)
> >>
> >> diff --git a/Documentation/technical/api-builtin.txt 
> >> b/Documentation/technical/api-builtin.txt
> >> index 22a39b929..60f442822 100644
> >> --- a/Documentation/technical/api-builtin.txt
> >> +++ b/Documentation/technical/api-builtin.txt
> >> @@ -42,6 +42,10 @@ where options is the bitwise-or of:
> >>   on bare repositories.
> >>   This only makes sense when `RUN_SETUP` is also set.
> >>
> >> +`SUPPORT_SUPER_PREFIX`::
> >> +
> >> + The builtin supports --super-prefix.
> >> +
> >>  . Add `builtin/foo.o` to `BUILTIN_OBJS` in `Makefile`.
> >>
> >>  Additionally, if `foo` is a new command, there are 3 more things to do:
> >
> > I added SUPER_PREFIX as an implementation detail when trying to recurse
> > submodules using multiple processes.  Now that we have a 'struct
> > repository' my plan is to remove SUPER_PREFIX in its entirety.  Since
> > this won't happen overnight it may still take a bit till its removed so
> > maybe it makes sense to better document it until that happens?
> 
> I could add something about how this is "temporary" although I have no
> idea about the timeframe. ;-)

It's probably better to ensure that things are documented (even if
they are temporary), but you don't need to mark it as such.  And as
far as the timeframe goes, that's my burden to bear ;)

> 
> > Either way I think that this sort of documentation better lives in the
> > code as it is easier to keep up to date.
> 
> Agreed and I believe that's the long-term goal. I could replace this
> preparatory patch with another one where I move api-builtin.txt into
> builtin.h or git.c (and document SUPPORT_SUPER_PREFIX if that's
> wanted). That's probably better, since right now, patch 2/7 adds
> basically the same documentation for the new flag in two places.

I know that sort of migration may be out of the scope of your series
(since it's just documentation), though you're more than welcome to do
the work as it would be much appreciated by me and a few other people :)

-- 
Brandon Williams


Re: "groups of files" in Git?

2017-07-11 Thread Stefan Beller
On Tue, Jul 11, 2017 at 11:10 AM, Nikolay Shustov
 wrote:
> Thank you for the explanation. The example about backend and frontend
> is relevant even though I normally have to deal with more layers at
> the same time.
>
> However, in my case I have the thing that you have already tried to
> address, partially: the changes always align with file boundaries BUT
> not with directory boundaries. Imagine you have the stack of backend,
> data transport and frontend layers. The feature has to touch all three
> layers thus resulting in the changes in the apparently different
> directories. Thus, making the distinction by the pathspec (if I
> understood it right from reading the documentation) would not help.
>
> The attributes could be a solution, if I could:
> 1. create attribute designated to the feature
> 2. "mark" uncommitted files in different directory with that attribute

1+2 should be answered by the gitattributes man page
https://git-scm.com/docs/gitattributes


> 3. filter the list of changed files with such an attribute

This sounds like one of
  "git status :(attr:backend) ."
  "git status :(exclude,attr:backend) ."

> 4. create commit for the files only with the certain attribute
>
> You've kindly demonstrated that #4 is doable; however, I could not
> clearly tell from the Git documentation whether #1 - #3 are achievable...
> Could you point me to the right place in the documentation, please?
>
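Stefan's two commands above can be exercised end to end. A minimal, self-contained sketch (the `filegroup` attribute name and the paths are invented; the `:(attr:...)` pathspec magic requires a reasonably recent Git, and `git ls-files` is used here because attribute pathspecs are most reliably supported there):

```shell
# 1+2: designate a custom attribute in .gitattributes and mark paths with it.
# 3:   filter the changed files by that attribute via an attr pathspec.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q repo
cd repo
mkdir -p src/net src/ui
# step 1+2: .gitattributes assigns filegroup=transport to src/net/*
printf 'src/net/* filegroup=transport\n' > .gitattributes
echo tcp > src/net/tcp.c
echo ui  > src/ui/main.c
git add .
# step 3: list only the paths carrying filegroup=transport
files=$(git ls-files -- ':(attr:filegroup=transport)')
echo "$files"
```

Step 4 would then be `git commit -- ':(attr:filegroup=transport)'` on the same pathspec.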


Re: "groups of files" in Git?

2017-07-11 Thread Nikolay Shustov
Thank you for the explanation. The example about backend and frontend
is relevant even though I normally have to deal with more layers at
the same time.

However, in my case I have the thing that you have already tried to
address, partially: the changes always align with file boundaries BUT
not with directory boundaries. Imagine you have the stack of backend,
data transport and frontend layers. The feature has to touch all three
layers thus resulting in the changes in the apparently different
directories. Thus, making the distinction by the pathspec (if I
understood it right from reading the documentation) would not help.

The attributes could be a solution, if I could:
1. create attribute designated to the feature
2. "mark" uncommitted files in different directory with that attribute
3. filter the list of changed files with such an attribute
4. create commit for the files only with the certain attribute

You've kindly demonstrated that #4 is doable; however, I could not
clearly tell from the Git documentation whether #1 - #3 are achievable...
Could you point me to the right place in the documentation, please?


On Tue, Jul 11, 2017 at 1:27 PM, Junio C Hamano  wrote:
> Nikolay Shustov  writes:
>
>> I have to work on several features in the same code tree parallel, in
>> the same Perforce workspace. The major reason why I cannot work on one
>> feature then on another is just because I have to make sure that the
>> changes in the related areas of the product play together well.
>>
>> With Perforce, I can have multiple changelists opened, that group the
>> changed files as needed.
>>
>> With Git I cannot seem to find a way to
>> achieve the same result. And the problem is that putting change sets
>> on different Git branches (or workdirs, or whatever Git offers that
>> makes the changes NOT be in the same source tree) is not a viable
>> option for me ...
>
> Naturally.  If these separate changes need to work together, it is
> way too inconvenient if these changes do not appear in a single
> unified working tree to be built and tested.
>
>> Is it worth considering adding to Git a feature like "group of files"
> > that would offer some virtual grouping of the locally changed files
>> in the checked-out branch?
>
> Let's step back and let me make sure if I understand you correctly.
> You want to work on a system with two distinct areas (say, the
> frontend and the backend), that have to work together, but you want
> to make two commits, one for each area.  You make changes for both
> areas in your working tree, build and test them to make sure the
> whole thing works well together, and at the end, you make two
> commits.
>
> In your real project, you may be doing more than two areas and more
> than two commits, but is the above a good degenerate case that
> shows the basic idea?  If not, then please disregard all of the
> following.
>
> You can make partial commits in Git.  In the simplest case, you may
> have two separate files backend.py and frontend.py, you make edits
> to both files and then make two commits:
>
> $ git commit backend.py
> $ git commit frontend.py
>
> Changes to some files may contain both changes for the backend and
> for the frontend that does not allow you to separate commits at file
> boundary, and Git even lets you handle such a case.  If you have the
> third file in addition to the above two, called common.py, you could
> instead
>
> $ git add backend.py
> $ git add -p common.py
>
> to prepare the index to contain only the changes for the backend
> part ("add -p" lets you interactively pick and choose the hunks
> relevant to the backend part), and conclude the commit for the
> backend part with
>
> $ git commit ;# no paths arguments
>
> and then when all the remaining changes are for the frontend part,
> you can follow it with
>
> $ git commit -a
>
> to make another commit for the frontend part.
>
> A short answer to your question, provided I understood you
> correctly, is "no, there is no way to say 'backend.py, backend-2.py,
> ...' are the backend things and call it a filegroup", accompanied by
> "a filegroup would only be effective when changes align with file
> boundary".
>
> And if your response is "but most of the time changes align with
> file boundary", a counter-response is "and most of the time changes
> align with directory boundary (in well structured project, at
> least), so you can do 'git commit backend/' for all the backend part
> without having to name all the paths anyway".
>
> There is an experimental "attributes-limited pathspec" feature in
> recent versions of Git, which lets you assign arbitrary sets of
> labels to each paths, and using that you may be able to do
>
> $ git commit ':(attr:filegroup=backend).'
>
> and I suspect that would be the closest thing you would want (read
> about 'pathspec' in 'git help glossary')
>


Re: [PATCH 4/4] hook: add a simple first example

2017-07-11 Thread Kaartic Sivaraam
On Tue, 2017-07-11 at 09:03 -0700, Junio C Hamano wrote:
> But would it even be useful to enable it?
Just for the record, I thought it was considered useful (at least to
someone) due to its presence in the sample script.


Re: [PATCH 4/4] hook: add a simple first example

2017-07-11 Thread Kaartic Sivaraam
On Tue, 2017-07-11 at 09:03 -0700, Junio C Hamano wrote:
> But would it even be useful to enable it?  The commented out "git
> status" output already lists the modified paths, so I do not think
> anything is gained by adding 'diff --cached --name-status' there.
The script *doesn't* add the output of "diff --cached --name-status"
near the status output. It adds it above the first comment and of
course in an uncommented form.

So, should we remove it (diff --cached --name-status) altogether?


Re: [RFC/WIP PATCH] object store classification

2017-07-11 Thread Brandon Williams
On 07/10, Stefan Beller wrote:
> On Fri, Jul 7, 2017 at 9:50 AM, Junio C Hamano  wrote:
> > Ben Peart  writes:
> >
> >> For more API/state design purity, I wonder if there should be an
> >> object_store structure that is passed to each of the object store APIs
> >> instead of passing the repository object. The repository object could
> >> then contain an instance of the object_store structure.
> >>
> >> That said, I haven't take a close look at all the code in object.c to
> >> see if all the data needed can be cleanly abstracted into an
> >> object_store structure.
> >
> > My gut feeling was it is just the large hashtable that keeps track of
> > objects we have seen, but the object replacement/grafts and other
> > things may also want to become per-repository.
> 
> This is similar to the_index which is referenced by the_repository.
> But as we do not have anything like the_object_store already, we are
> free to design it, as the required work that needs to be put in is the
> same.
> 
> With the object replacements/grafts coming up as well as alternates,
> we definitely want that to be per repository, the question is if we rather
> want
> 
>   the_repository -> many object_stores (one each for alternates, grafts,
>   and the usual place at $GIT_DIR/objects),
>   where each object_store is a hashmap, maybe with an additional
>   describing string or path.
> 
> or
> 
>   the_repository -> the_object_store
>   but the object store is a complex beast having different hash tables
>   for the different alternates.

After looking at the patch and some of the comments here I think that
this is probably the best approach with a few tweaks (which may be
completely unfounded because I'm not familiar with all the object store
code).

In an OO world I would envision a single object (let's say 'struct
object_store') which is responsible for managing a repository's objects
no matter where the individual objects came from (main object store or
an alternate for that repository).  And if I understand correctly the
single hash table that exists now caches objects like this.

I also think that such a 'struct object_store' should probably be an
opaque type to a majority of the code base.  This means that it probably
shouldn't have its definition in 'repository.h'.

As far as the API goes, I think it should be similar to the new repo_config
(old one too, though it was implicit) API in that the code base doesn't need
to know about 'struct configset': it just passes a pointer to the
repository, and the 'struct configset' which is stored in the repository is
operated on under the hood.  This way the code base would
just query for an object using the repository as a handle like:

  get_object(repo, OID);

  and not:

  get_object(repo->object_store, OID);

Of course under the hood it would be preferable to have the functions
operate on the object_store struct explicitly.

> 
> or
> 
>   the_repository -> the_object_store_hash_map
>   which is this patch that would try to put any object related to this
>   repository into the same hashmap and the hashmap is not special
>   for each of the different object locations.
> 
> 
> >
> >> One concern I have is that the global state refactoring effort will
> >> just result in all the global state getting moved into a single
> >> (global) repository object thus limiting its usefulness.

I think we do need to think about this, but it shouldn't be too much of
a concern right now.  The first step is to make enough of the object
store object-oriented that you can have two object stores
corresponding to two different repositories working in parallel.

> >
> > I actually am not worried about it that much, and I say this with
> > the background of having done the same "grouping a set of global
> > state variables into a single structure and turning them into a
> > single default instance" for the_index.  Whether you like it or not,
> > the majority of operations do work on the default instance---that
> > was why the operations could live with just "a set of global state
> > variables" in the first place, and the convenience compatibility
> > macros that allow you to operate on the fields of the default
> > instance as if they were separate variables have been a huge
> > typesaver that also reduces the cognitive burden.  I'd expect that
> > the same will hold for the new "repository" and the "object_store"
> > abstractions.
> 
> Sounds reasonable to expect.
> 
> Thanks,
> Stefan

-- 
Brandon Williams


Re: "groups of files" in Git?

2017-07-11 Thread Nikolay Shustov
Thank you for the idea; however, I am having trouble with basically
maintaining the uncommitted groups of files: I would prefer a clear
distinction that "those files belong to feature A" and "these files
belong to feature B" before I commit anything. Committing separately
every change for feature A and for feature B would probably be a good
option, except that if I have many changes, cherry-picking the proper
commits to create a single changeset for the integration would become
a nightmare.

On Tue, Jul 11, 2017 at 1:39 PM, Lars Schneider
 wrote:
>
>> On 11 Jul 2017, at 17:45, Nikolay Shustov  wrote:
>>
>> Hi,
>> I have been recently struggling with migrating my development workflow
>> from Perforce to Git, all because of the following thing:
>>
>> I have to work on several features in the same code tree parallel, in
>> the same Perforce workspace. The major reason why I cannot work on one
>> feature then on another is just because I have to make sure that the
>> changes in the related areas of the product play together well.
>>
>> With Perforce, I can have multiple changelists opened, that group the
>> changed files as needed.
>>
>> With Git I cannot seem to find a way to
>> achieve the same result. And the problem is that putting change sets
>> on different Git branches (or workdirs, or whatever Git offers that
>> makes the changes NOT be in the same source tree) is not a viable
>> option for me as I would have to re-build code as I re-integrate the
>> changes between the branches (or whatever change-separation Git
>> feature is used).
>> Builds take time and resources, and considering that I have to do them
>> on multiple platforms (I do cross-platform development) it really
>> rules out the option of not having multiple changes in the same code
>> tree.
>>
>> Am I ignorant of some Git feature/way of using Git that would help?
>> Is it worth considering adding to Git a feature like "group of files"
>> that would offer some virtual grouping of the locally changed files
>> in the checked-out branch?
>
> Interesting question that came up at my workplace, too.
>
> Here is what I suggested:
> 1. Keep working on a single branch and make commits for all features
> 2. If you make a commit, prefix the commit message with the feature name
> 3. After you are done with a feature create a new feature branch based on
>your combined feature branch. Use `git rebase -i` [1] to remove all
>commits that are not relevant for the feature. Alternatively you could
>cherry pick the relevant commits [2] if this is faster.
>
> I wonder what others think about this solution. Maybe there is a better
> solution that I overlooked?
>
> - Lars
>
> [1] 
> https://robots.thoughtbot.com/git-interactive-rebase-squash-amend-rewriting-history
> [2] 
> http://think-like-a-git.net/sections/rebase-from-the-ground-up/cherry-picking-explained.html
>


Re: git config --help not functional on bad config

2017-07-11 Thread Jeff King
On Tue, Jul 11, 2017 at 03:49:21PM +0100, Peter Krefting wrote:

> That's fine. However, when trying to look for help, it is not that useful:
> 
>   $ git config --help
>   error: malformed value for branch.autosetuprebase
>   fatal: bad config variable 'branch.autosetuprebase' in file '.git/config' 
> at line 24
> 
> Perhaps it should allow "--help" to go through even if the configuration is
> bad?

Yes, I agree the current behavior is poor. What's happening under the
hood is that "--help" for any command runs "git help config", which in
turn looks at the config to pick up things like help.format.

But it also loads git_default_config(), which I suspect isn't actually
useful. It goes all the way back to 70087cdbd (git-help: add
"help.format" config variable., 2007-12-15), and it looks like it was
probably added just to match other config callbacks.

So I think we could probably just do this:

diff --git a/builtin/help.c b/builtin/help.c
index 334a8494a..c42dfc9e9 100644
--- a/builtin/help.c
+++ b/builtin/help.c
@@ -273,7 +273,7 @@ static int git_help_config(const char *var, const char *value, void *cb)
if (starts_with(var, "man."))
return add_man_viewer_info(var, value);
 
-   return git_default_config(var, value, cb);
+   return 0;
 }
 
 static struct cmdnames main_cmds, other_cmds;

which makes your case work.

-Peff


Re: "groups of files" in Git?

2017-07-11 Thread Nikolay Shustov
> The way of Git is that a commit (snapshot) by definition describes a
> set of files (the set of all files in the project). So if you need two
> features there at the same time, you probably want them in the same
> commit.

Thank you, but if I wanted these two features to be in the same
commit, I would have no reason to see them as two distinct groups.
I mean, groups of uncommitted files.

The general problem of not having multiple features in the same code
tree is the cost of doing multiple builds and integration testing
runs.
Now I imagine there could be a workaround of having two features
developed on different branches and then merging them into a third
branch for building/testing; however, this introduces the overhead of
maintaining at least two code trees: one for "dirty changes" where I do
code changes that are not even guaranteed to be buildable, and another
"build/test" code tree. Plus merging the changes from one to the other.
A bit too much, IMHO.

On Tue, Jul 11, 2017 at 1:18 PM, Stefan Beller  wrote:
> On Tue, Jul 11, 2017 at 8:45 AM, Nikolay Shustov
>  wrote:
>> Hi,
>> I have been recently struggling with migrating my development workflow
>> from Perforce to Git, all because of the following thing:
>>
>> I have to work on several features in the same code tree parallel, in
>> the same Perforce workspace. The major reason why I cannot work on one
>> feature then on another is just because I have to make sure that the
>> changes in the related areas of the product play together well.
>
> So in that case the features are not independent, but related to each other?
> In that case you want to have these things in the same working tree as
> well as in the same branch.
>
> Take a look at git.git itself, for example:
>
> git clone git://github.com/git/git
> git log --oneline --graph
>
> You will see a lot of "Merge X into master/maint" commits, but then
> you may want to dive into each feature by:
>
> git log --oneline e83e71c5e1
>
> for example and then you'll see lots of commits (that were developed
> in the same branch), but that are closely related. However they are
> different enough to be in different commits. (different features, as
> I understand)
>
>> With Perforce, I can have multiple changelists opened, that group the
>> changed files as needed.
>>
>> With Git I cannot seem to find a way to
>> achieve the same result. And the problem is that putting change sets
>> on different Git branches (or workdirs, or whatever Git offers that
>> makes the changes NOT be in the same source tree) is not a viable
>> option for me as I would have to re-build code as I re-integrate the
>> changes between the branches (or whatever change-separation Git
>> feature is used).
>
> you would merge the branches and then run the tests/integration. Yes that
> seems cumbersome.
>
>> Builds take time and resources, and considering that I have to do them
>> on multiple platforms (I do cross-platform development) it really
>> rules out the option of not having multiple changes in the same code
>> tree.
>>
>> Am I ignorant of some Git feature/way of using Git that would help?
>> Is it worth considering adding to Git a feature like "group of files"
>> that would offer some virtual grouping of the locally changed files
>> in the checked-out branch?
>
> The way of Git is that a commit (snapshot) by definition describes a
> set of files (the set of all files in the project). So if you need two
> features there at the same time, you probably want them in the same
> commit.
>
> If they are different enough such that you could have them independently,
> but really want to test them together, your testing may need to become
> more elaborate (test a merge of all feature branches), I would think.
>
>>
>> Thanks in advance,
>> - Nikolay


Re: "groups of files" in Git?

2017-07-11 Thread Lars Schneider

> On 11 Jul 2017, at 17:45, Nikolay Shustov  wrote:
> 
> Hi,
> I have been recently struggling with migrating my development workflow
> from Perforce to Git, all because of the following thing:
> 
> I have to work on several features in the same code tree parallel, in
> the same Perforce workspace. The major reason why I cannot work on one
> feature then on another is just because I have to make sure that the
> changes in the related areas of the product play together well.
> 
> With Perforce, I can have multiple changelists opened, that group the
> changed files as needed.
> 
> With Git I cannot seem to find a way to
> achieve the same result. And the problem is that putting change sets
> on different Git branches (or workdirs, or whatever Git offers that
> makes the changes NOT be in the same source tree) is not a viable
> option for me as I would have to re-build code as I re-integrate the
> changes between the branches (or whatever change-separation Git
> feature is used).
> Builds take time and resources, and considering that I have to do them
> on multiple platforms (I do cross-platform development) it really
> rules out the option of not having multiple changes in the same code
> tree.
> 
> Am I ignorant of some Git feature/way of using Git that would help?
> Is it worth considering adding to Git a feature like "group of files"
> that would offer some virtual grouping of the locally changed files
> in the checked-out branch?

Interesting question that came up at my workplace, too.

Here is what I suggested:
1. Keep working on a single branch and make commits for all features
2. If you make a commit, prefix the commit message with the feature name
3. After you are done with a feature create a new feature branch based on
   your combined feature branch. Use `git rebase -i` [1] to remove all
   commits that are not relevant for the feature. Alternatively you could
   cherry pick the relevant commits [2] if this is faster.

I wonder what others think about this solution. Maybe there is a better
solution that I overlooked?

- Lars

[1] 
https://robots.thoughtbot.com/git-interactive-rebase-squash-amend-rewriting-history
[2] 
http://think-like-a-git.net/sections/rebase-from-the-ground-up/cherry-picking-explained.html



Re: "groups of files" in Git?

2017-07-11 Thread Junio C Hamano
Nikolay Shustov  writes:

> I have to work on several features in the same code tree parallel, in
> the same Perforce workspace. The major reason why I cannot work on one
> feature then on another is just because I have to make sure that the
> changes in the related areas of the product play together well.
>
> With Perforce, I can have multiple changelists opened, that group the
> changed files as needed.
>
> With Git I cannot seem to find a way to
> achieve the same result. And the problem is that putting change sets
> on different Git branches (or workdirs, or whatever Git offers that
> makes the changes NOT be in the same source tree) is not a viable
> option for me ...

Naturally.  If these separate changes need to work together, it is
way too inconvenient if these changes do not appear in a single
unified working tree to be built and tested.

> Is it worth considering adding to Git a feature like "group of files"
> that would offer some virtual grouping of the locally changed files
> in the checked-out branch?

Let's step back and let me make sure if I understand you correctly.
You want to work on a system with two distinct areas (say, the
frontend and the backend), that have to work together, but you want
to make two commits, one for each area.  You make changes for both
areas in your working tree, build and test them to make sure the
whole thing works well together, and at the end, you make two
commits.  

In your real project, you may be doing more than two areas and more
than two commits, but is the above a good degenerate case that
shows the basic idea?  If not, then please disregard all of the
following.

You can make partial commits in Git.  In the simplest case, you may
have two separate files backend.py and frontend.py, you make edits
to both files and then make two commits:

$ git commit backend.py
$ git commit frontend.py

Changes to some files may contain both changes for the backend and
for the frontend that does not allow you to separate commits at file
boundary, and Git even lets you handle such a case.  If you have the
third file in addition to the above two, called common.py, you could
instead

$ git add backend.py
$ git add -p common.py

to prepare the index to contain only the changes for the backend
part ("add -p" lets you interactively pick and choose the hunks
relevant to the backend part), and conclude the commit for the
backend part with

$ git commit ;# no paths arguments

and then when all the remaining changes are for the frontend part,
you can follow it with

$ git commit -a

to make another commit for the frontend part.

A short answer to your question, provided I understood you
correctly, is "no, there is no way to say 'backend.py, backend-2.py,
...' are the backend things and call it a filegroup", accompanied by
"a filegroup would only be effective when changes align with file
boundary".

And if your response is "but most of the time changes align with
file boundary", a counter-response is "and most of the time changes
align with directory boundary (in well structured project, at
least), so you can do 'git commit backend/' for all the backend part
without having to name all the paths anyway".

There is an experimental "attributes-limited pathspec" feature in
recent versions of Git, which lets you assign arbitrary sets of
labels to each paths, and using that you may be able to do

$ git commit ':(attr:filegroup=backend).'

and I suspect that would be the closest thing you would want (read
about 'pathspec' in 'git help glossary')
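Junio's flow above, condensed into a runnable sketch (file names are the hypothetical ones from the example; since `git add -p` is interactive, this sketch shows only the split at file boundary):

```shell
# Two logical changes made in one working tree, committed separately
# by pathspec: first the backend path, then everything remaining.
set -e
tmp=$(mktemp -d); cd "$tmp"
git init -q
git config user.email you@example.com
git config user.name "You"
echo backend  > backend.py
echo frontend > frontend.py
git add . && git commit -qm 'initial'
# edit both areas, build and test the combined working tree here
echo backend-change  >> backend.py
echo frontend-change >> frontend.py
# first commit takes only the backend path...
git commit -qm 'backend work' backend.py
# ...then everything that remains is the frontend part
git commit -qam 'frontend work'
last=$(git show --name-only --format= HEAD | tr -d '\n')
echo "$last"
```

For hunks that mix both areas in one file, `git add -p common.py` followed by a bare `git commit` replaces the pathspec-limited commit, exactly as described above.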



RE: "groups of files" in Git?

2017-07-11 Thread Randall S. Becker
-Original Message-
On July 11, 2017 11:45 AM Nikolay Shustov wrote:
>I have been recently struggling with migrating my development workflow from 
>Perforce to Git, all because of the following thing:
>I have to work on several features in the same code tree parallel, in the same 
>Perforce workspace. The major reason why I cannot
>work on one feature then on another is just because I have to make sure that 
>the changes in the related areas of the product play together well.
>With Perforce, I can have multiple changelists opened, that group the changed 
>files as needed.

>With Git I cannot seem to find a way to achieve
>the same result. And the problem is
>that putting change sets on different Git branches (or workdirs, or whatever 
>Git offers that makes the changes
>NOT be in the same source tree) is not a viable option for me as I would have 
>to re-build code as I re-integrate
>the changes between the branches (or whatever change-separation Git feature 
>is used).
>Builds take time and resources, and considering that I have to do them on 
>multiple platforms (I do cross-platform development) it really rules out the 
>option of not having multiple changes in the same code tree.

Change sets are core git functionality. When you commit, you commit a group of 
changes across multiple files, not a single file at a time like most legacy SCM 
systems. Individual features are typically managed using topic branches 
that can be switched (using checkout) rapidly, which in your case will cause a 
localized rebuild based on what files were swapped.

If you need something slightly different than topic branches, do multiple 
clones off a base integration branch. This will give you multiple working 
source trees. Switch each clone to its own branch and work with them locally. 
If you need to move changes from one branch to another, commit, push on one 
branch, and pull/merge on the other branch.

You could also use stash to accomplish similar things, but I wouldn't.

Cheers,
Randall
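The multiple-clones layout Randall describes might be set up like this (all paths and branch names are invented):

```shell
# One base "integration" repository, plus one local clone per feature,
# each clone switched to its own topic branch.
set -e
tmp=$(mktemp -d); cd "$tmp"
git init -q integration
( cd integration &&
  git config user.email you@example.com &&
  git config user.name "You" &&
  echo v1 > file.txt && git add file.txt && git commit -qm 'base' )
# one independent working source tree per feature
git clone -q integration featureA
git clone -q integration featureB
( cd featureA && git checkout -qb featureA )
( cd featureB && git checkout -qb featureB )
ls
```

Moving work between the trees is then an ordinary commit in one clone followed by a fetch/merge in the other, as described above.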



Re: "groups of files" in Git?

2017-07-11 Thread Stefan Beller
On Tue, Jul 11, 2017 at 8:45 AM, Nikolay Shustov
 wrote:
> Hi,
> I have been recently struggling with migrating my development workflow
> from Perforce to Git, all because of the following thing:
>
> I have to work on several features in the same code tree parallel, in
> the same Perforce workspace. The major reason why I cannot work on one
> feature then on another is just because I have to make sure that the
> changes in the related areas of the product play together well.

So in that case the features are not independent, but related to each other?
In that case you want to have these things in the same working tree as
well as in the same branch.

Take a look at git.git itself, for example:

git clone git://github.com/git/git
git log --oneline --graph

You will see a lot of "Merge X into master/maint" commits, but then
you may want to dive into each feature by:

git log --oneline e83e71c5e1

for example and then you'll see lots of commits (that were developed
in the same branch), but that are closely related. However they are
different enough to be in different commits. (different features, as
I understand)

> With Perforce, I can have multiple changelists opened, that group the
> changed files as needed.
>
> With Git I cannot seem to find a way to
> achieve the same result. And the problem is that putting change sets
> on different Git branches (or workdirs, or whatever Git offers that
> makes the changes NOT be in the same source tree) is not a viable
> option for me as I would have to re-build code as I re-integrate the
> changes between the branches (or whatever change-separation Git
> feature is used).

you would merge the branches and then run the tests/integration. Yes that
seems cumbersome.

> Builds take time and resources, and considering that I have to do them
> on multiple platforms (I do cross-platform development) it really
> rules out the option of not having multiple changes in the same code
> tree.
>
> Am I ignorant of some Git feature/way of using Git that would help?
> Is it worth considering adding to Git a feature like "group of files"
> that would offer some virtual grouping of the locally changed files
> in the checked-out branch?

The way of Git is that a commit (snapshot) by definition describes a
set of files (the set of all files in the project). So if you need two
features there at the same time, you probably want them in the same commit.

If they are different enough such that you could have them independently,
but really want to test them together, your testing may need to become
more elaborate (test a merge of all feature branches), I would think.

>
> Thanks in advance,
> - Nikolay
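The "test a merge of all feature branches" idea can be sketched as a throwaway integration branch (branch names are invented); an octopus merge combines all features in one step, and the test suite would then run on the result:

```shell
# Two independent feature branches, combined on a disposable branch
# purely for build/test purposes.
set -e
tmp=$(mktemp -d); cd "$tmp"
git init -q
git config user.email you@example.com
git config user.name "You"
git checkout -qb main
echo base > base.txt && git add base.txt && git commit -qm 'base'
git branch featureA
git branch featureB
git checkout -q featureA && echo a > a.txt && git add a.txt && git commit -qm 'A'
git checkout -q featureB && echo b > b.txt && git add b.txt && git commit -qm 'B'
# throwaway branch with all features combined (octopus merge)
git checkout -q -b test-merge main
git merge -q --no-edit featureA featureB   # run the test suite here
ls
```

Once the features ship, the `test-merge` branch is simply deleted; the real integration happens on `main` through the normal review/merge process.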


Re: [PATCH 0/7] tag: more fine-grained pager-configuration

2017-07-11 Thread Junio C Hamano
Jeff King  writes:

>> I see you CC'ed Peff who's passionate in this area, so these patches
>> are in good hands already ;-) I briefly skimmed your patches myself,
>> and did not spot anything glaringly wrong.
>
> Heh, I don't think of "paging tag output" as one of my passions, but you
> may be right. :)

Perhaps not "tag output", but paging is the area where you are the
person who has spent the most time.

> I left a few comments on the code and direction, but I agree that
> overall it looks pretty good. And I'm very impressed with the attention
> to detail for a first-time contributor.

Yes.


Re: Weirdness with git change detection

2017-07-11 Thread Junio C Hamano
Junio C Hamano  writes:

> Peter Eckersley  writes:
>
>> I have a local git repo that's in some weird state where changes
>> appear to be detected by "git diff" and prevent operations like "git
>> checkout" from switching branches, but those changes are not removed
>> by a "git reset --hard" or "git stash".
>>
>> Here's an example of the behaviour, with "git reset --hard" failing to
>> clear a diff in the index:
>>
>> https://paste.debian.net/975811/
>>
>> Happy to collect additional debugging information if it's useful.
>
> Do you have any funny clean-smudge pair that do not round-trip?

Ah, nevermind.  Peff's analysis looks correct.  Thanks for a report
that provoked a good discussion.


Re: [PATCH 4/4] hook: add a simple first example

2017-07-11 Thread Junio C Hamano
Kaartic Sivaraam  writes:

> I was trying to ask, "Is there any way to change the second example
> (diff --name-status) to make it work with "commit --amend" so that it
> could be uncommented by default?"

But would it even be useful to enable it?  The commented-out "git
status" output already lists the modified paths, so I do not think
anything is gained by adding 'diff --cached --name-status' there.


Re: Weirdness with git change detection

2017-07-11 Thread Junio C Hamano
Peter Eckersley  writes:

> I have a local git repo that's in some weird state where changes
> appear to be detected by "git diff" and prevent operations like "git
> checkout" from switching branches, but those changes are not removed
> by a "git reset --hard" or "git stash".
>
> Here's an example of the behaviour, with "git reset --hard" failing to
> clear a diff in the index:
>
> https://paste.debian.net/975811/
>
> Happy to collect additional debugging information if it's useful.

Do you have any funny clean-smudge pair that do not round-trip?


Re: What's cooking in git.git (Jul 2017, #03; Mon, 10)

2017-07-11 Thread Junio C Hamano
Lars Schneider  writes:

> Thanks for the explanation! I looked at the Git release calendar [1] and
> realized that 2.14-rc0 is scheduled for this Thursday. My assumption was
> that either on this date 2.14 will be cut from the tip of master or master
> would not change significantly after the rc0 date until the 2.14 release.

Your assumption is still correct.  Undercooked topics may be
reclassified to "Will cook in 'next'" at any time.

Thanks.

> [1] http://tinyurl.com/gitCal


Re: What's cooking in git.git (Jul 2017, #03; Mon, 10)

2017-07-11 Thread Junio C Hamano
Jeff King  writes:

> On Tue, Jul 11, 2017 at 10:28:08AM +0200, Lars Schneider wrote:
>
>> > On 11 Jul 2017, at 00:48, Junio C Hamano  wrote:
>> > 
>> > * ls/filter-process-delayed (2017-06-30) 7 commits
>> >  (merged to 'next' on 2017-07-05 at a35e644082)
>> > + convert: add "status=delayed" to filter process protocol
>> > + convert: refactor capabilities negotiation
>> > + convert: move multiple file filter error handling to separate function
>> > + convert: put the flags field before the flag itself for consistent style
>> > + t0021: write "OUT " only on success
>> > + t0021: make debug log file name configurable
>> > + t0021: keep filter log files on comparison
>> > 
>> > The filter-process interface learned to allow a process with long
>> > latency give a "delayed" response.
>> > 
>> > Will merge to 'master'.
>> 
>> Hi Junio,
>> 
>> can you already tell if this topic has still a chance to be part of 2.14?
>
> I'm not Junio, but we should be able to reason it out. :)

I am ;-).

> It's marked as "will merge to master", which typically means it will
> happen during the next integration cycle (i.e., within a few days when
> the next "What's cooking" is written).

Just like being in 'next' is not a promise for a topic to be in the
upcoming release (or any future one for that matter), "Will merge to
X" merely means "With the current shape of the series and with the
best of our current knowledge, this is thought to be mergeable to X
at some point in the future".  When a more urgent topic that
conflicts heavily with it appears, when a serious bug is found in
the topic, etc., "our current best knowledge" may be invalidated.

Anticipating such an event happening and invalidating our assumptions,
I try to keep a topic in 'next' for at least a week, if it is a
non-trivial topic that changes more than a few dozen lines (not
counting tests and docs).  For things that touch deeper guts of the
system, I'd prefer to cook it longer.  For more trivial things, it
may only be a day or two.  So "at some point in the future" varies
depending on the weight of a topic.

> Since 2.14 will be cut from the tip of master in a few weeks, once
> it's merged it will definitely be in 2.14 (barring a revert, of
> course).  This holds true during release freeze, too. Anything
> that makes it to master is part of the release.

This is correct.

> The stopping point there is that things stop getting marked as
> "will merge to master".

This is correct, if you allow the possibility that a thing that used
to be marked as "Will merge to 'master'" gets demoted to "Will cook
in 'next'" (and you need to anticipate that possibility---I try to
demote topics that got into 'next' just before -rc0 to "Will
cook" when tagging -rc0, unless they are fixes that are not too
involved).

I probably should have marked this as "Will cook in 'next'" in the
first place.  The practice has been to blindly promote "Will merge
to 'next'" to "Will merge to 'master'" by default when a topic gets
merged to 'next', and then inside of the first week try to find a
chance to re-read the topic to decide to demote it to "Will cook".


"groups of files" in Git?

2017-07-11 Thread Nikolay Shustov
Hi,
I have been recently struggling with migrating my development workflow
from Perforce to Git, all because of the following thing:

I have to work on several features in the same code tree in parallel,
in the same Perforce workspace. The major reason why I cannot work on
one feature and then on another is that I have to make sure that the
changes in the related areas of the product play together well.

With Perforce, I can have multiple changelists opened, that group the
changed files as needed.

With Git I cannot seem to figure out how to achieve the same result.
And the problem is that putting change sets on different Git branches
(or workdirs, or whatever Git offers that makes the changes NOT be in
the same source tree) is not a viable option for me, as I would have
to re-build code as I re-integrate the changes between the branches
(or whatever change-separation Git feature is used).
Build takes time and resources, and considering that I have to do it on
multiple platforms (I do cross-platform development) it really
eliminates the option of not having multiple changes in the same code
tree.

Am I ignorant about some Git feature/way of using Git that would help?
Is it worth considering adding to Git a feature like "group of files"
that would offer some virtual grouping of the locally changed files
in the checked-out branch?

Thanks in advance,
- Nikolay


Re: [PATCH] strbuf: use designated initializers in STRBUF_INIT

2017-07-11 Thread Junio C Hamano
Ben Peart  writes:

>> If this patch can survive a few releases without complaint,
>> then we can feel more confident that designated initializers
>> are widely supported by our user base.  It also is an
>> indication that other C99 features may be supported, but not
>> a guarantee (e.g., gcc had designated initializers before
>> C99 existed).
>
> Correct.  MSVC also supports designated initializers but does not
> fully support C99.

Thanks for a datapoint.

Just so that people do not misunderstand, it is not our goal to
declare that now you need a fully C99 compiler to build Git.

When deciding what shell scripting tools with what options to use in
our scripted Porcelains and tests, we use POSIX.1 as a rough guideline.
We say "let's not use this, as it is not even in POSIX" but we never
say "use of it is OK because it is in POSIX", and we sometimes even
say "even though it is in POSIX, various implementations of it are
buggy and it does not work properly in practice" about certain things [*1*].

C89 has been the base of such a guideline for our C programs, and
people must not view this patch as an attempt to raise the base
to C99.  It is rather an attempt to see how far we can safely raise
the base by checking some selected useful new features [*2*] that we
have had occasions to wish that they were available, and designated
initializer for structure fields is one of them.

We may find out that, after starting with "C89, plus this and that
feature that are known to be commonly supported", the guideline may
become more succinct to say "C99, minus this and that feature that
are not usable on some platforms", if it turns out that the majority of
the systems that are not fully C99 have all of the things we care
about.  We do not know yet, and we are only at the beginning of the
journey to find it out.


[Footnote]

*1* Like `backtick` command substitutions, which are very hard to
nest properly and impossible to mix portably with quoting.

*2* New, relative to our current practice, that is.


git config --help not functional on bad config

2017-07-11 Thread Peter Krefting

So, I set the wrong value for a configuration option, and git tells me:

  $ git config branch.autoSetupRebase false
  $ git log
  error: malformed value for branch.autosetuprebase
  fatal: bad config variable 'branch.autosetuprebase' in file '.git/config' at line 24

That's fine. However, when trying to look for help, it is not that 
useful:


  $ git config --help
  error: malformed value for branch.autosetuprebase
  fatal: bad config variable 'branch.autosetuprebase' in file '.git/config' at line 24

Perhaps it should allow "--help" to go through even if the 
configuration is bad?


After a "git config --unset branch.autosetuprebase" everything works 
again, but I had to look that up by manually calling "man git-config" 
(afterwards I realized I could go outside the repo to be unaffected 
by the config, but that probably wouldn't have helped if I had put 
this in my global config).


--
\\// Peter - http://www.softwolves.pp.se/


Re: [PATCH/RFC] commit & merge: modularize the empty message validator

2017-07-11 Thread Kaartic Sivaraam
Sorry, forgot to add the RFC suffix to [PATCH]. Please consider its
presence.

-- 
Kaartic


[PATCH 4/4] hook: add a simple first example

2017-07-11 Thread Kaartic Sivaraam
Add a simple example that replaces an outdated example
that was removed. This ensures that there's at least
a simple example that illustrates what could be done
using the hook just by enabling it.

Also, update the documentation.

Signed-off-by: Kaartic Sivaraam 
---
 Documentation/githooks.txt | 3 +++
 templates/hooks--prepare-commit-msg.sample | 9 ++---
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/Documentation/githooks.txt b/Documentation/githooks.txt
index fdc01aa25..59f38efba 100644
--- a/Documentation/githooks.txt
+++ b/Documentation/githooks.txt
@@ -121,6 +121,9 @@ it is not suppressed by the `--no-verify` option.  A non-zero exit
 means a failure of the hook and aborts the commit.  It should not
 be used as replacement for pre-commit hook.
 
+The sample `prepare-commit-msg` hook that comes with Git removes the
+help message found in the commented portion of the commit template.
+
 commit-msg
 ~~
 
diff --git a/templates/hooks--prepare-commit-msg.sample b/templates/hooks--prepare-commit-msg.sample
index 87d770592..dc707e46e 100755
--- a/templates/hooks--prepare-commit-msg.sample
+++ b/templates/hooks--prepare-commit-msg.sample
@@ -9,20 +9,23 @@
 #
 # To enable this hook, rename this file to "prepare-commit-msg".
 
-# This hook includes two examples.
+# This hook includes three examples. The first one removes the
+# "# Please enter the commit message..." help message.
 #
-# The first includes the output of "git diff --name-status -r"
+# The second includes the output of "git diff --name-status -r"
 # into the message, just before the "git status" output.  It is
 # commented because it doesn't cope with --amend or with squashed
 # commits.
 #
-# The second example adds a Signed-off-by line to the message, that can
+# The third example adds a Signed-off-by line to the message, that can
 # still be edited.  This is rarely a good idea.
 
 COMMIT_MSG_FILE=$1
 COMMIT_SOURCE=$2
 SHA1=$3
 
+@PERL_PATH@ -i.bak -ne 'print unless(m/^. Please enter the commit message/..m/^#$/)' "$COMMIT_MSG_FILE"
+
 # case "$COMMIT_SOURCE,$SHA1" in
 #  ,|template,)
 #@PERL_PATH@ -i.bak -pe '
-- 
2.13.2.957.g457671ade



[PATCH] commit & merge: modularize the empty message validator

2017-07-11 Thread Kaartic Sivaraam
In the context of "git merge" the meaning of an "empty message"
is one that contains no line of text. This is not in line with
"git commit" where an "empty message" is one that contains only
whitespaces and/or signed-off-by lines. This could cause surprises
to users who are accustomed to the meaning of an "empty message"
of "git commit".

Prevent such surprises by ensuring the meaning of an empty 'merge
message' to be in line with that of an empty 'commit message'. This
is done by separating the empty message validator from 'commit' and
making it stand-alone.

Signed-off-by: Kaartic Sivaraam 
---
 I have made an attempt to solve the issue by separating the concerned
 function as I found no reason against it.

 I've tried to name them with what felt appropriate and concise to me.
 Let me know if it's alright.
 
 Makefile|  1 +
 builtin/commit.c| 39 +--
 builtin/merge.c |  3 ++-
 message-validator.c | 34 ++
 message-validator.h |  6 ++
 5 files changed, 48 insertions(+), 35 deletions(-)
 create mode 100644 message-validator.c
 create mode 100644 message-validator.h

diff --git a/Makefile b/Makefile
index ffa6da71b..c1c26e434 100644
--- a/Makefile
+++ b/Makefile
@@ -783,6 +783,7 @@ LIB_OBJS += merge.o
 LIB_OBJS += merge-blobs.o
 LIB_OBJS += merge-recursive.o
 LIB_OBJS += mergesort.o
+LIB_OBJS += message-validator.o
 LIB_OBJS += mru.o
 LIB_OBJS += name-hash.o
 LIB_OBJS += notes.o
diff --git a/builtin/commit.c b/builtin/commit.c
index 8d1cac062..4c3112bb4 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -33,6 +33,7 @@
 #include "notes-utils.h"
 #include "mailmap.h"
 #include "sigchain.h"
+#include "message-validator.h"
 
 static const char * const builtin_commit_usage[] = {
N_("git commit [<options>] [--] <pathspec>..."),
@@ -979,41 +980,11 @@ static int prepare_to_commit(const char *index_file, const char *prefix,
return 1;
 }
 
-static int rest_is_empty(struct strbuf *sb, int start)
-{
-   int i, eol;
-   const char *nl;
-
-   /* Check if the rest is just whitespace and Signed-off-by's. */
-   for (i = start; i < sb->len; i++) {
-   nl = memchr(sb->buf + i, '\n', sb->len - i);
-   if (nl)
-   eol = nl - sb->buf;
-   else
-   eol = sb->len;
-
-   if (strlen(sign_off_header) <= eol - i &&
-   starts_with(sb->buf + i, sign_off_header)) {
-   i = eol;
-   continue;
-   }
-   while (i < eol)
-   if (!isspace(sb->buf[i++]))
-   return 0;
-   }
-
-   return 1;
-}
-
-/*
- * Find out if the message in the strbuf contains only whitespace and
- * Signed-off-by lines.
- */
-static int message_is_empty(struct strbuf *sb)
+static int is_empty(struct strbuf *sb)
 {
if (cleanup_mode == CLEANUP_NONE && sb->len)
return 0;
-   return rest_is_empty(sb, 0);
+   return message_is_empty(sb, 0);
 }
 
 /*
@@ -1035,7 +1006,7 @@ static int template_untouched(struct strbuf *sb)
if (!skip_prefix(sb->buf, tmpl.buf, &start))
start = sb->buf;
strbuf_release(&tmpl);
-   return rest_is_empty(sb, start - sb->buf);
+   return message_is_empty(sb, start - sb->buf);
 }
 
 static const char *find_author_by_nickname(const char *name)
@@ -1744,7 +1715,7 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
fprintf(stderr, _("Aborting commit; you did not edit the message.\n"));
exit(1);
}
-   if (message_is_empty(&sb) && !allow_empty_message) {
+   if (is_empty(&sb) && !allow_empty_message) {
rollback_index_files();
fprintf(stderr, _("Aborting commit due to empty commit message.\n"));
exit(1);
diff --git a/builtin/merge.c b/builtin/merge.c
index 703827f00..625cfb848 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -31,6 +31,7 @@
 #include "gpg-interface.h"
 #include "sequencer.h"
 #include "string-list.h"
+#include "message-validator.h"
 
 #define DEFAULT_TWOHEAD (1<<0)
 #define DEFAULT_OCTOPUS (1<<1)
@@ -772,7 +773,7 @@ static void prepare_to_commit(struct commit_list *remoteheads)
}
read_merge_msg(&msg);
strbuf_stripspace(&msg, 0 < option_edit);
-   if (!msg.len)
+   if (!msg.len || message_is_empty(&msg, 0))
abort_commit(remoteheads, _("Empty commit message."));
strbuf_release(&merge_msg);
strbuf_addbuf(&merge_msg, &msg);
diff --git a/message-validator.c b/message-validator.c
new file mode 100644
index 0..32feb4e26
--- /dev/null
+++ b/message-validator.c
@@ -0,0 +1,34 @@
+#include "git-compat-util.h"
+#include "sequencer.h"
+#include "strbuf.h"
+#include "message-validator.h"
+
+/*
+ * Find out if the message in the strbuf contains only whitespace and
+ * Signed-off-by 

[PATCH 3/4] hook: add sign-off using "interpret-trailers"

2017-07-11 Thread Kaartic Sivaraam
The sample hook to prepare the commit message before
a commit allows users to opt-in to add the sign-off
to the commit message. The sign-off is added at a place
that isn't consistent with the "-s" option of "git commit".
Further, it could go out of view in certain cases.

Add the sign-off in a way similar to "-s" option of
"git commit" using git's interpret-trailers command.

It works well in all cases except when the user invokes
"git commit" without any arguments. In that case manually
add a new line after the first line to ensure it's consistent
with the output of "-s" option.

Signed-off-by: Kaartic Sivaraam 
---
 templates/hooks--prepare-commit-msg.sample | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/templates/hooks--prepare-commit-msg.sample b/templates/hooks--prepare-commit-msg.sample
index eb5912163..87d770592 100755
--- a/templates/hooks--prepare-commit-msg.sample
+++ b/templates/hooks--prepare-commit-msg.sample
@@ -32,4 +32,8 @@ SHA1=$3
 # esac
 
 # SOB=$(git var GIT_AUTHOR_IDENT | sed -n 's/^\(.*<.*>\).*$/Signed-off-by: \1/p')
-# grep -qs "^$SOB" "$COMMIT_MSG_FILE" || echo "$SOB" >> "$COMMIT_MSG_FILE"
+# git interpret-trailers --in-place --trailer "$SOB" "$COMMIT_MSG_FILE"
+# if test -z "$COMMIT_SOURCE"
+# then
+#   @PERL_PATH@ -i.bak -pe 'print "\n" if !$first_line++' "$COMMIT_MSG_FILE"
+# fi
-- 
2.13.2.957.g457671ade



[PATCH 4/4] hook: add a simple first example

2017-07-11 Thread Kaartic Sivaraam
Add a simple example that replaces an outdated example
that was removed. This ensures that there's at least
a simple example that illustrates what could be done
using the hook just by enabling it.

Also, update the documentation.

Signed-off-by: Kaartic Sivaraam 
---
 Documentation/githooks.txt | 3 +++
 templates/hooks--prepare-commit-msg.sample | 5 -
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/Documentation/githooks.txt b/Documentation/githooks.txt
index fdc01aa25..59f38efba 100644
--- a/Documentation/githooks.txt
+++ b/Documentation/githooks.txt
@@ -121,6 +121,9 @@ it is not suppressed by the `--no-verify` option.  A non-zero exit
 means a failure of the hook and aborts the commit.  It should not
 be used as replacement for pre-commit hook.
 
+The sample `prepare-commit-msg` hook that comes with Git removes the
+help message found in the commented portion of the commit template.
+
 commit-msg
 ~~
 
diff --git a/templates/hooks--prepare-commit-msg.sample b/templates/hooks--prepare-commit-msg.sample
index 87d770592..f0228c1cb 100755
--- a/templates/hooks--prepare-commit-msg.sample
+++ b/templates/hooks--prepare-commit-msg.sample
@@ -9,7 +9,8 @@
 #
 # To enable this hook, rename this file to "prepare-commit-msg".
 
-# This hook includes two examples.
+# This hook includes three examples. The first one removes the
+# "# Please enter the commit message..." help message.
 #
 # The first includes the output of "git diff --name-status -r"
 # into the message, just before the "git status" output.  It is
@@ -23,6 +24,8 @@ COMMIT_MSG_FILE=$1
 COMMIT_SOURCE=$2
 SHA1=$3
 
+@PERL_PATH@ -i.bak -ne 'print unless(m/^. Please enter the commit message/..m/^#$/)' "$COMMIT_MSG_FILE"
+
 # case "$COMMIT_SOURCE,$SHA1" in
 #  ,|template,)
 #@PERL_PATH@ -i.bak -pe '
-- 
2.13.2.957.g457671ade



[PATCH 2/4] hook: name the positional variables

2017-07-11 Thread Kaartic Sivaraam
It's always nice to have named variables instead of
positional variables as they communicate their purpose
well.

Appropriately name the positional variables of the hook
to make it easier to see what's going on.

Signed-off-by: Kaartic Sivaraam 
---
 templates/hooks--prepare-commit-msg.sample | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/templates/hooks--prepare-commit-msg.sample b/templates/hooks--prepare-commit-msg.sample
index 279ddc1a7..eb5912163 100755
--- a/templates/hooks--prepare-commit-msg.sample
+++ b/templates/hooks--prepare-commit-msg.sample
@@ -19,14 +19,17 @@
 # The second example adds a Signed-off-by line to the message, that can
 # still be edited.  This is rarely a good idea.
 
+COMMIT_MSG_FILE=$1
+COMMIT_SOURCE=$2
+SHA1=$3
 
-# case "$2,$3" in
+# case "$COMMIT_SOURCE,$SHA1" in
 #  ,|template,)
 #@PERL_PATH@ -i.bak -pe '
 #   print "\n" . `git diff --cached --name-status -r`
-#   if /^#/ && $first++ == 0' "$1" ;;
+#   if /^#/ && $first++ == 0' "$COMMIT_MSG_FILE" ;;
 #  *) ;;
 # esac
 
 # SOB=$(git var GIT_AUTHOR_IDENT | sed -n 's/^\(.*<.*>\).*$/Signed-off-by: \1/p')
-# grep -qs "^$SOB" "$1" || echo "$SOB" >> "$1"
+# grep -qs "^$SOB" "$COMMIT_MSG_FILE" || echo "$SOB" >> "$COMMIT_MSG_FILE"
-- 
2.13.2.957.g457671ade



[PATCH 1/4] hook: cleanup script

2017-07-11 Thread Kaartic Sivaraam
Prepare the 'prepare-commit-msg' sample script for
upcoming changes. Preparation includes removal of
an example that has outlived its purpose: the one
that comments out the "Conflicts:" part of a merge
commit message. It isn't relevant anymore, as that
has been done by default since 261f315b ("merge & sequencer:
turn "Conflicts:" hint into a comment", 2014-08-28).

Further update the relevant comments from the sample script
and update the documentation.

Signed-off-by: Kaartic Sivaraam 
---
 Documentation/githooks.txt |  3 ---
 templates/hooks--prepare-commit-msg.sample | 24 ++--
 2 files changed, 10 insertions(+), 17 deletions(-)

diff --git a/Documentation/githooks.txt b/Documentation/githooks.txt
index 706091a56..fdc01aa25 100644
--- a/Documentation/githooks.txt
+++ b/Documentation/githooks.txt
@@ -121,9 +121,6 @@ it is not suppressed by the `--no-verify` option.  A non-zero exit
 means a failure of the hook and aborts the commit.  It should not
 be used as replacement for pre-commit hook.
 
-The sample `prepare-commit-msg` hook that comes with Git comments
-out the `Conflicts:` part of a merge's commit message.
-
 commit-msg
 ~~
 
diff --git a/templates/hooks--prepare-commit-msg.sample b/templates/hooks--prepare-commit-msg.sample
index 86b8f227e..279ddc1a7 100755
--- a/templates/hooks--prepare-commit-msg.sample
+++ b/templates/hooks--prepare-commit-msg.sample
@@ -9,28 +9,24 @@
 #
 # To enable this hook, rename this file to "prepare-commit-msg".
 
-# This hook includes three examples.  The first comments out the
-# "Conflicts:" part of a merge commit.
+# This hook includes two examples.
 #
-# The second includes the output of "git diff --name-status -r"
+# The first includes the output of "git diff --name-status -r"
 # into the message, just before the "git status" output.  It is
 # commented because it doesn't cope with --amend or with squashed
 # commits.
 #
-# The third example adds a Signed-off-by line to the message, that can
+# The second example adds a Signed-off-by line to the message, that can
 # still be edited.  This is rarely a good idea.
 
-case "$2,$3" in
-  merge,)
-@PERL_PATH@ -i.bak -ne 's/^/# /, s/^# #/#/ if /^Conflicts/ .. /#/; print' "$1" ;;
 
-# ,|template,)
-#   @PERL_PATH@ -i.bak -pe '
-#  print "\n" . `git diff --cached --name-status -r`
-#   if /^#/ && $first++ == 0' "$1" ;;
-
-  *) ;;
-esac
+# case "$2,$3" in
+#  ,|template,)
+#@PERL_PATH@ -i.bak -pe '
+#   print "\n" . `git diff --cached --name-status -r`
+#   if /^#/ && $first++ == 0' "$1" ;;
+#  *) ;;
+# esac
 
 # SOB=$(git var GIT_AUTHOR_IDENT | sed -n 's/^\(.*<.*>\).*$/Signed-off-by: \1/p')
 # grep -qs "^$SOB" "$1" || echo "$SOB" >> "$1"
-- 
2.13.2.957.g457671ade



Re: [PATCH 0/7] tag: more fine-grained pager-configuration

2017-07-11 Thread Jeff King
On Tue, Jul 11, 2017 at 03:50:39PM +0200, Martin Ågren wrote:

> > Would it make sense to just have git-tag respect pager.tag in listing
> > mode, and otherwise ignore it completely?
> 
> Yes. I doubt anyone would notice, and no-one should mind the
> change (famous last words).
> 
> It does mean there's a precedent for individual commands to get to
> choose when to honor pager.foo. If more such exceptions are added, at
> some point, some command will learn to ignore pager.foo in some
> particular situation and someone will consider it a regression.

Yes, perhaps. One could even argue that "git tag foo" is OK to page and
somebody would consider it a regression not to respect pager.tag. It
seems useless to me, but at least it's not totally broken like "git tag
-a foo" is.

But I find it pretty unlikely, as it doesn't produce copious output. I'd
worry more about eventually finding a command with two copious-output
modes, and somebody wants to distinguish between them. If we can't come
up with a plausible example now, I'm inclined to deal with it if and
when it ever comes up. Right now I just don't want to paint ourselves
into a corner, and I think we can always add more config later (e.g.,
tag.pageFooCommand=true or something; that's not great, but I'm mostly
betting that it will never come up).

> Well, I respect your hunch about .enable and .command and I certainly
> don't want to take things in a direction that makes that approach less
> clean. You have convinced me that I will instead try to teach git tag
> to be more clever about when to use the pager, at least to see what
> that looks like.

Thanks. I was the original proponent of "pager.tag.list", and only after
reading your series today did I think "gee, this seems unnecessarily
complex". So it's entirely possible I'm missing a corner case that you
may uncover through discussion or experimenting.

> Let's call such a "git tag" the "future git tag". Just to convince
> myself I've thought through the implications -- how would
> pager.tag.enable=true affect that future git tag? Would it be fair to
> say that enable=false means "definitely don't start the pager (unless
> --paginate)" and that enable=true means "feel free to use it (unless
> --no-paginate)"? The future git tag would default to using
> enable=true. Would --paginate also be "feel free to use it", or rather
> "the pager must be used"?

I think the rules would be:

  1. If --paginate is given, do that. Likewise for --no-pager.

  2. Otherwise, respect pager.tag.enable if it is set.

  3. Otherwise, use the built-in default.

  4. Any time the pager is run, respect pager.tag.command if it is set.

And then there are two optional bits:

  2a. If pager.tag is set to a boolean, behave as if pager.tag.enable is
  set to the same boolean. If it's set to a string, behave as if
  pager.tag.enable is set to "true", and pager.tag.command is set to
  the string. That should give us backwards compatibility.

  2b. If pager.tag.command is set but pager.tag.enable isn't, possibly we
  should assume that pager.tag.enable is set to "true". That makes
  the common case of "page and use this command" a single config
  block instead of two. It matters less for "tag" where paging would
  be the default.
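Under the `[pager "<cmd>"]` scheme discussed in this thread, a user's configuration might look like the following. This is a sketch of proposed keys only; none of this existed as an actual feature at the time of writing:

```ini
# hypothetical future configuration for paging "git tag" output
[pager "tag"]
	enable = true
	command = less -FRX
```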

So I think that's what you asked above, but to be clear on the final
question: --paginate should always be "must be used". And that meshes
nicely with implementing it via the git.c wrapper, where it takes
precedence way before we ever hit the setup_auto_pager() call.

-Peff


Re: [PATCH 2/7] git.c: let builtins opt for handling `pager.foo` themselves

2017-07-11 Thread Jeff King
On Tue, Jul 11, 2017 at 03:46:08PM +0200, Martin Ågren wrote:

> > Can this ever trigger in execv_dashed_external()? We should only get
> > there if get_builtin() returned NULL in the first place. Otherwise, we'd
> > run and exited via handle_builtin().
> 
> I can trigger it with this:
> 
> $ git -c pager.tag="echo paging" -c pager.tag.list=no -c alias.t=tag t -l
> 
> where the alias is what triggers it and the two pager-configurations
> demonstrate the effect.

Interesting. As you noted below, I think the dashed external we exec
should be choosing whether to run the pager. I suspect what's happening
is that execv_dashed_external() says "a-ha, we're running 'tag', and I
know how to check its pager config". But IMHO that is wrong in the first
place (but it just never really made a difference until your series).

That's just a guess, though. I didn't dig.

-Peff


Re: [PATCH 0/7] tag: more fine-grained pager-configuration

2017-07-11 Thread Martin Ågren
On 11 July 2017 at 12:19, Jeff King  wrote:
> On Mon, Jul 10, 2017 at 11:55:13PM +0200, Martin Ågren wrote:
>
>> Using, e.g., `git -c pager.tag tag -a new-tag` results in errors such
>> as "Vim: Warning: Output is not to a terminal" and a garbled terminal.
>> A user who makes use of `git tag -a` and `git tag -l` will probably
>> choose not to configure `pager.tag` or to set it to "no", so that `git
>> tag -a` will actually work, at the cost of not getting the pager with
>> `git tag -l`.
>
> Right, I think we are all agreed that "pager.tag" as it is now is
> essentially worthless.
>
> If I understand your series correctly, though, it adds pager.tag.list
> but leaves "pager.tag" behaving more or less the same. It's good that we
> now have a way for a user to do the thing they actually want, but it
> feels like we're leaving pager.tag behind as a booby-trap.
>
> Should we disable it entirely (or only respect it in list mode)?
>
> At which point, I wonder if we actually need pager.tag.list at all.
> Slicing up the namespace further would be valuable if there were a
> command which had two pager-worthy modes, and somebody might want to
> enable the pager for one but not the other. But it seems like most
> commands in this boat (e.g., tag, branch, stash) really have two modes:
> listing things or creating things.
>
> Would it make sense to just have git-tag respect pager.tag in listing
> mode, and otherwise ignore it completely?

Yes. I doubt anyone would notice, and no-one should mind the
change (famous last words).

It does mean there's a precedent for individual commands to get to
choose when to honor pager.foo. If more such exceptions are added, at
some point, some command will learn to ignore pager.foo in some
particular situation and someone will consider it a regression.

> One nice side effect is that it keeps the multi-level pager.X.Y
> namespace clear. We've talked about transitioning to allow:
>
>   [pager "foo"]
>   enable = true
>   command = some-custom-pager
>
> to configure aspects of the pager separately for git-foo. This has two
> benefits:
>
>   1. Syntactically, it allows configuration for commands whose names
>  aren't valid config keys.
>
>   2. It would allow setting a command with "enable=false", so that "git
>  foo" did not paginate, but "git -p foo" paginated with a custom
>  command.
>
> Those are admittedly minor features. And assuming we don't go crazy with
> the multi-level names, we could have "pager.tag.list" live alongside
> "pager.tag.command". So it's not really an objection, so much as
> wondering out loud whether we can keep this as simple as possible.

Well, I respect your hunch about .enable and .command and I certainly
don't want to take things in a direction that makes that approach less
clean. You have convinced me that I will instead try to teach git tag
to be more clever about when to use the pager, at least to see what
that looks like.

Let's call such a "git tag" the "future git tag". Just to convince
myself I've thought through the implications -- how would
pager.tag.enable=true affect that future git tag? Would it be fair to
say that enable=false means "definitely don't start the pager (unless
--paginate)" and that enable=true means "feel free to use it (unless
--no-paginate)"? The future git tag would default to using
enable=true. Would --paginate also be "feel free to use it", or rather
"the pager must be used"?

At some point, I thought about "true"/"false"/"maybe", where "maybe"
would be what the future git tag implements. Of course, there's a fair
chance not everyone will agree what exactly should be paged with
"maybe". So it's back to adding various knobs. ;)

Anyway, this is more my thinking out loud. I'll drop pager.tag.list in
v2 and will instead make pager.tag more clever. That should force me
to think through this some more.

>> This is an attempt to implement something like that. I decided to let
>> `pager.tag.list` fall back to `pager.tag` before falling back to "on".
>> The default for `pager.tag` is still "off". I can see how that might
>> seem confusing. However, my argument is that it would be awkward for
>> `git tag -l` to ignore `pager.tag` -- we are after all running a
>> subcommand of `git tag`. Also, this avoids a regression for someone
>> who has set `pager.tag` and uses `git tag -l`.
>
> Yeah, I agree that turning "pager.tag" into a complete noop is probably
> a bad idea. But if we made it a silent noop for the non-list cases, that
> would be OK (and the hypothetical user who set it to make `git tag -l`
> work would see a strict improvement; they'd still get their paging but
> not the weird breakage with non-list modes). And I think that applies
> whether we have a pager.tag.list in addition or not.

Good thinking.

Thanks a lot for your comments. I appreciate it.

Martin


Re: [PATCH 6/7] tag: make git tag -l consider new config `pager.tag.list`

2017-07-11 Thread Martin Ågren
On 11 July 2017 at 12:41, Jeff King  wrote:
> On Mon, Jul 10, 2017 at 11:55:19PM +0200, Martin Ågren wrote:
>
>> diff --git a/builtin/tag.c b/builtin/tag.c
>> index e0f129872..e96ef7d70 100644
>> --- a/builtin/tag.c
>> +++ b/builtin/tag.c
>> @@ -446,6 +446,8 @@ int cmd_tag(int argc, const char **argv, const char *prefix)
>>
>>   argc = parse_options(argc, argv, prefix, options, git_tag_usage, 0);
>>
>> + if (cmdmode == 'l')
>> + setup_auto_pager("tag.list", 0);
>>   setup_auto_pager("tag", 0);
>
> Ideally this would kick in whenever we are in list mode, even if the
> user didn't say "-l". So:
>
>   $ git tag
>
> should probably respect the list config. Likewise, certain options like
> "--contains" trigger list mode. I think the pager setup could just be
> bumped a few lines down, past the "if (!cmdmode)" block that sets up
> those defaults.

Oops, that's embarrassing. Thanks. That means the paging starts
slightly later, but what happens in those few lines looks fairly
trivial, so there shouldn't be any real difference in behavior. I'll
add one or two tests.

Martin


Re: [PATCH 3/7] git.c: provide setup_auto_pager()

2017-07-11 Thread Martin Ågren
On 11 July 2017 at 12:37, Jeff King  wrote:
> On Mon, Jul 10, 2017 at 11:55:16PM +0200, Martin Ågren wrote:
>
>> +void setup_auto_pager(const char *cmd, int def)
>> +{
>> + if (use_pager != -1)
>> + return;
>
> I think you probably also want to return early here if pager_in_use()
> is true. If a script runs "git tag -l", you wouldn't want to start a new
> pager when the outer script has already been paged (either via config,
> or via "git -p").

Good point. Thanks.

Martin


Re: [PATCH 2/7] git.c: let builtins opt for handling `pager.foo` themselves

2017-07-11 Thread Martin Ågren
On 11 July 2017 at 12:24, Jeff King  wrote:
> On Mon, Jul 10, 2017 at 11:55:15PM +0200, Martin Ågren wrote:
>
>> To allow individual builtins to make more informed decisions about when
>> to respect `pager.foo`, introduce a flag IGNORE_PAGER_CONFIG. If the flag
>> is set, do not check `pager.foo`. This applies to two code-paths -- one
>> in run_builtin() and one in execv_dashed_external().
>
> Can this ever trigger in execv_dashed_external()? We should only get
> there if get_builtin() returned NULL in the first place. Otherwise, we'd
> run and exited via handle_builtin().

I can trigger it with this:

$ git -c pager.tag="echo paging" -c pager.tag.list=no -c alias.t=tag t -l

where the alias is what triggers it and the two pager-configurations
demonstrate the effect.

> So I think this hunk:
>
>> @@ -543,11 +550,14 @@ static void execv_dashed_external(const char **argv)
>>  {
>>   struct child_process cmd = CHILD_PROCESS_INIT;
>>   int status;
>> + struct cmd_struct *builtin;
>>
>>   if (get_super_prefix())
>>   die("%s doesn't support --super-prefix", argv[0]);
>>
>> - if (use_pager == -1)
>> + builtin = get_builtin(argv[0]);
>> + if (use_pager == -1 &&
>> + !(builtin && builtin->option & IGNORE_PAGER_CONFIG))
>>   use_pager = check_pager_config(argv[0]);
>>   commit_pager_choice();
>
> ...can just go away.

If I remove this, the call I gave above will page although it
shouldn't, and it doesn't if I keep this hunk. There's this in
run_argv: "If we tried alias and futzed with our environment, it no
longer is safe to invoke builtins directly in general.  We have to
spawn them as dashed externals." There's also a NEEDSWORK.

Although, thinking about it, I'm not sure why, when I remove this hunk,
the child process doesn't set up the paging correctly. Maybe it's something
related to my using "-c", or something about the launching of child
processes. Those are both areas where I lack knowledge. Will look into
it.

Martin


Re: [PATCH 4/4] hook: add a simple first example

2017-07-11 Thread Kaartic Sivaraam
On Mon, 2017-07-10 at 13:02 -0700, Junio C Hamano wrote:
> Kaartic Sivaraam  writes:
> 
> >  I made an attempt to make the second example work with amending 
> >  with the aim of making it suitable for usage out of the box. It
> >  seems that it's not easy to make it work as the status of a file
> >  cannot be determined correctly when the index while amending
> >  introduces changes to a file that has a change in the commit being
> >  amended.
> > 
> >  Is there any way in which the second example could be made to work
> > with
> >  amending without much effort? I'm asking this assuming something
> > might
> >  have happened, since the script was added, that could ease the
> > task.
> 
> Sorry, but I do not understand what you are asking here.
> 
I was trying to ask, "Is there any way to change the second example
(diff --name-status) to make it work with "commit --amend" so that it
could be uncommented by default?"

If there were a way, then patch 4/4 could be dropped, as the
name-status example would be enough to make the script live (I think).

> After going back and checking 1/4, I realize that I misread the patch.
> You did keep the commented out 'diff --name-status' thing, so it still
> has three---it just lost one half of the original "first" example. So
> please disregard my earlier "do we still have three, not two?"
> 
Actually, I did think of promoting the second example to the first to
make the sub-patches independent of each other. I held back as I
thought it would be overkill. Anyways, I'll just overkill it!


Re: [PATCH 3/4] hook: add signature using "interpret-trailers"

2017-07-11 Thread Kaartic Sivaraam
Note: Re-attempting to send mail due to rejection
On Mon, 2017-07-10 at 16:13 +0100, Ramsay Jones wrote:
> 
> Again, s/signature/sign-off/g, or similar (including subject line).
> 
Thanks! Forgot that in the course of splitting the patches and
modifying them.

-- 
Kaartic


Re: [PATCH 0/7] tag: more fine-grained pager-configuration

2017-07-11 Thread Jeff King
On Mon, Jul 10, 2017 at 03:42:14PM -0700, Junio C Hamano wrote:

> > A review would be much appreciated. Comments on the way I
> > structured the series would be just as welcome as ones on the
> > final result.
> 
> I see you CC'ed Peff who's passionate in this area, so these patches
> are in good hands already ;-) I briefly skimmed your patches myself,
> and did not spot anything glaringly wrong.

Heh, I don't think of "paging tag output" as one of my passions, but you
may be right. :)

I left a few comments on the code and direction, but I agree that
overall it looks pretty good. And I'm very impressed with the attention
to detail for a first-time contributor.

-Peff


Re: [PATCH 7/7] tag: make git tag -l use pager by default

2017-07-11 Thread Jeff King
On Mon, Jul 10, 2017 at 11:55:20PM +0200, Martin Ågren wrote:

> The previous patch introduced `pager.tag.list`. Its default was to use
> the value of `pager.tag` or, if that was also not set, to fall back to
> "off".
> 
> Change that fallback value to "on". Let the default value for
> `pager.tag` remain at "off". Update documentation and tests.

Thanks for splitting this out. The default setting is definitely
orthogonal to allowing the config.

I've been running with this default for several years now (using the
patch I showed in the earlier thread you linked). It _does_ occasionally
annoy me, but I think overall it's an improvement. So it seems like a
good thing to try, and we can see how people respond as they try it out.

-Peff


Re: [PATCH 6/7] tag: make git tag -l consider new config `pager.tag.list`

2017-07-11 Thread Jeff King
On Mon, Jul 10, 2017 at 11:55:19PM +0200, Martin Ågren wrote:

> diff --git a/builtin/tag.c b/builtin/tag.c
> index e0f129872..e96ef7d70 100644
> --- a/builtin/tag.c
> +++ b/builtin/tag.c
> @@ -446,6 +446,8 @@ int cmd_tag(int argc, const char **argv, const char *prefix)
>  
>   argc = parse_options(argc, argv, prefix, options, git_tag_usage, 0);
>  
> + if (cmdmode == 'l')
> + setup_auto_pager("tag.list", 0);
>   setup_auto_pager("tag", 0);

Ideally this would kick in whenever we are in list mode, even if the
user didn't say "-l". So:

  $ git tag

should probably respect the list config. Likewise, certain options like
"--contains" trigger list mode. I think the pager setup could just be
bumped a few lines down, past the "if (!cmdmode)" block that sets up
those defaults.

-Peff
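
The mode selection Peff describes can be pictured with a small model
(plain Python, purely illustrative -- the function name and parameters
are assumptions for this sketch, not git's actual option parser):

```python
# Hypothetical model of git-tag's list-mode detection (not the real C
# parser): listing is triggered by an explicit -l, by filter options
# such as --contains/--points-at, or by a bare "git tag" with no
# arguments and no other mode.

def is_list_mode(cmdmode, argc, has_contains=False, has_points_at=False):
    """Return True when the invocation should honor the list-pager config."""
    if cmdmode == 'l':
        return True                      # explicit "git tag -l"
    if cmdmode == 0:                     # no explicit mode given
        if argc == 0:
            return True                  # bare "git tag" defaults to listing
        if has_contains or has_points_at:
            return True                  # filter options imply list mode
    return False
```

This is why the pager setup needs to run after the "if (!cmdmode)"
block: only then are the implicit-listing defaults resolved.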


Re: [PATCH 5/7] tag: handle `pager.tag`-configuration within the builtin

2017-07-11 Thread Jeff King
On Mon, Jul 10, 2017 at 11:55:18PM +0200, Martin Ågren wrote:

> Use the mechanisms introduced in two earlier patches to ignore
> `pager.tag` in git.c and let the `git tag` builtin handle it on its own.
> 
> This is in preparation for the next patch, where we will want to handle
> slightly different configuration variables depending on which options
> are used with `git tag`. For this reason, place the call to
> setup_auto_pager() after the options have been parsed.
> 
> No functional change is intended. That said, there is a window between
> where the pager is started before and after this patch, and if an error
> occurs within this window, as of this patch the error message might not
> be paged where it would have been paged before. Since
> operation-parsing has to happen inside this window, a difference can be
> seen with, e.g., `git -c pager.tag="echo pager is used" tag
> --unknown-option`. This change in paging-behavior should be acceptable
> since it only affects erroneous usages.

Thanks for carefully thinking through the details. I agree that it's an
acceptable change of behavior.

-Peff


Re: [PATCH 3/7] git.c: provide setup_auto_pager()

2017-07-11 Thread Jeff King
On Mon, Jul 10, 2017 at 11:55:16PM +0200, Martin Ågren wrote:

> +void setup_auto_pager(const char *cmd, int def)
> +{
> + if (use_pager != -1)
> + return;

I think you probably also want to return early here if pager_in_use()
is true. If a script runs "git tag -l", you wouldn't want to start a new
pager when the outer script has already been paged (either via config,
or via "git -p").

> + use_pager = check_pager_config(cmd);
> +
> + if (use_pager == -1) {
> + struct strbuf buf = STRBUF_INIT;
> + size_t len;
> +
> + strbuf_addstr(&buf, cmd);
> + len = buf.len;
> + while (use_pager == -1 && len--) {
> + if (buf.buf[len] == '.') {
> + strbuf_setlen(, len);
> + use_pager = check_pager_config(buf.buf);
> + }
> + }
> + strbuf_release(&buf);
> + }

This looks good. I wondered if we could fold this all into a single
loop, rather than having the extra check_pager_config() beforehand
(which confused me for a half-second at first). But I tried and I don't
think the result ended up any more readable.

-Peff
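
For illustration, the dotted-key fallback in the quoted hunk can be
modeled in a few lines of Python (a sketch of the lookup order only;
`resolve_pager` and the config-dict interface are inventions for this
example, not the real check_pager_config()):

```python
# Model of the fallback: look up "pager.<cmd>" for the full dotted name,
# then strip components from the right and retry, e.g. "pager.tag.list"
# falls back to "pager.tag" before hitting the default.

def resolve_pager(cmd, config, default):
    key = "pager." + cmd
    if key in config:
        return config[key]
    # walk backwards, truncating at each '.' inside cmd
    parts = cmd.split(".")
    for i in range(len(parts) - 1, 0, -1):
        key = "pager." + ".".join(parts[:i])
        if key in config:
            return config[key]
    return default
```

So `resolve_pager("tag.list", {"pager.tag": "off"}, "on")` yields "off":
the more specific key is absent, and the parent key wins over the default.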


Re: [PATCH 2/7] git.c: let builtins opt for handling `pager.foo` themselves

2017-07-11 Thread Jeff King
On Mon, Jul 10, 2017 at 11:55:15PM +0200, Martin Ågren wrote:

> To allow individual builtins to make more informed decisions about when
> to respect `pager.foo`, introduce a flag IGNORE_PAGER_CONFIG. If the flag
> is set, do not check `pager.foo`. This applies to two code-paths -- one
> in run_builtin() and one in execv_dashed_external().

Can this ever trigger in execv_dashed_external()? We should only get
there if get_builtin() returned NULL in the first place. Otherwise, we'd
run and exited via handle_builtin().

So I think this hunk:

> @@ -543,11 +550,14 @@ static void execv_dashed_external(const char **argv)
>  {
>   struct child_process cmd = CHILD_PROCESS_INIT;
>   int status;
> + struct cmd_struct *builtin;
>  
>   if (get_super_prefix())
>   die("%s doesn't support --super-prefix", argv[0]);
>  
> - if (use_pager == -1)
> + builtin = get_builtin(argv[0]);
> + if (use_pager == -1 &&
> + !(builtin && builtin->option & IGNORE_PAGER_CONFIG))
>   use_pager = check_pager_config(argv[0]);
>   commit_pager_choice();

...can just go away. And that highlights the issue with externals; we
don't have any internal database that says "these ones handle their own
pager". We don't even have a list of all the possibilities, since users
can drop whatever they like into their $PATH.

So we'd have to make a (backwards-incompatible) decision that pager.*
doesn't work for external commands, and they must manually trigger the
pager themselves. I'm undecided on whether that's reasonable or not
(certainly it's what git-stash would want, but it may be hurting other
commands).

Anyway, I think that's out of scope for your series, which is just
trying to make the builtins work better.

-Peff


Re: [PATCH 0/7] tag: more fine-grained pager-configuration

2017-07-11 Thread Jeff King
On Mon, Jul 10, 2017 at 11:55:13PM +0200, Martin Ågren wrote:

> Using, e.g., `git -c pager.tag tag -a new-tag` results in errors such
> as "Vim: Warning: Output is not to a terminal" and a garbled terminal.
> A user who makes use of `git tag -a` and `git tag -l` will probably
> choose not to configure `pager.tag` or to set it to "no", so that `git
> tag -a` will actually work, at the cost of not getting the pager with
> `git tag -l`.

Right, I think we are all agreed that "pager.tag" as it is now is
essentially worthless.

If I understand your series correctly, though, it adds pager.tag.list
but leaves "pager.tag" behaving more or less the same. It's good that we
now have a way for a user to do the thing they actually want, but it
feels like we're leaving pager.tag behind as a booby-trap.

Should we disable it entirely (or only respect it in list mode)?

At which point, I wonder if we actually need pager.tag.list at all.
Slicing up the namespace further would be valuable if there were a
command which had two pager-worthy modes, and somebody might want to
enable the pager for one but not the other. But it seems like most
commands in this boat (e.g., tag, branch, stash) really have two modes:
listing things or creating things.

Would it make sense to just have git-tag respect pager.tag in listing
mode, and otherwise ignore it completely?

One nice side effect is that it keeps the multi-level pager.X.Y
namespace clear. We've talked about transitioning to allow:

  [pager "foo"]
  enable = true
  command = some-custom-pager

to configure aspects of the pager separately for git-foo. This has two
benefits:

  1. Syntactically, it allows configuration for commands whose names
 aren't valid config keys.

  2. It would allow setting a command with "enable=false", so that "git
 foo" did not paginate, but "git -p foo" paginated with a custom
 command.

Those are admittedly minor features. And assuming we don't go crazy with
the multi-level names, we could have "pager.tag.list" live alongside
"pager.tag.command". So it's not really an objection, so much as
wondering out loud whether we can keep this as simple as possible.

> This is an attempt to implement something like that. I decided to let
> `pager.tag.list` fall back to `pager.tag` before falling back to "on".
> The default for `pager.tag` is still "off". I can see how that might
> seem confusing. However, my argument is that it would be awkward for
> `git tag -l` to ignore `pager.tag` -- we are after all running a
> subcommand of `git tag`. Also, this avoids a regression for someone
> who has set `pager.tag` and uses `git tag -l`.

Yeah, I agree that turning "pager.tag" into a complete noop is probably
a bad idea. But if we made it a silent noop for the non-list cases, that
would be OK (and the hypothetical user who set it to make `git tag -l`
work would see a strict improvement; they'd still get their paging but
not the weird breakage with non-list modes). And I think that applies
whether we have a pager.tag.list in addition or not.

> I am not moving all builtins to handling the pager on their own, but
> instead introduce a flag IGNORE_PAGER_CONFIG and use it only with the
> tag builtin. That means there's another flag to reason about, but it
> avoids moving all builtins to handling the paging themselves just to
> make one of them do something more "clever".

I think this is very sensible. It's the vast minority that would want to
enable this (git-branch is the other obvious one). At some point we may
need a plan to handle non-builtins (like stash), but that's largely
orthogonal to what you're doing here.

> The discussion mentioned above discusses various approaches. It also
> notes how the current pager.foo-configuration conflates _whether_ and
> _how_ to run a pager. Arguably, this series paints ourselves even
> further into that corner. The idea of `pager.foo.command` and
> `pager.foo.enabled` has been mentioned and this series might make such
> an approach slightly less clean, conceptually. We could introduce
> `paging.foo` as a "yes"/"no"/"maybe" to go alongside `pager.foo` which
> is then "less"/"cat"/ That's definitely outside this series, but
> should not be prohibited by it...

I think I covered my thoughts on this part above. :)

> A review would be much appreciated. Comments on the way I structured
> the series would be just as welcome as ones on the final result.

Overall the code looks good, though I'll respond with a few comments.

-Peff


Re: What's cooking in git.git (Jul 2017, #03; Mon, 10)

2017-07-11 Thread Jeff King
On Tue, Jul 11, 2017 at 11:18:17AM +0200, Lars Schneider wrote:

> Thanks for the explanation! I looked at the Git release calendar [1] and
> realized that 2.14-rc0 is scheduled for this Thursday. My assumption was
> that either on this date 2.14 will be cut from the tip of master or master
> would not change significantly after the rc0 date until the 2.14 release.

Yeah, certainly forking off a 2.14-rc branch would be a reasonable way
to do the release management. It just happens to not be the way we do
it.

One nice thing about keeping "master" as the stabilizing
release-candidate branch is that it encourages more people to run it
(versus having people manually switch to building a 2.14-rc branch).

-Peff


Re: What's cooking in git.git (Jul 2017, #03; Mon, 10)

2017-07-11 Thread Lars Schneider

> On 11 Jul 2017, at 11:11, Jeff King  wrote:
> 
> On Tue, Jul 11, 2017 at 10:28:08AM +0200, Lars Schneider wrote:
> 
>>> On 11 Jul 2017, at 00:48, Junio C Hamano  wrote:
>>> 
>>> * ls/filter-process-delayed (2017-06-30) 7 commits
>>> (merged to 'next' on 2017-07-05 at a35e644082)
>>> + convert: add "status=delayed" to filter process protocol
>>> + convert: refactor capabilities negotiation
>>> + convert: move multiple file filter error handling to separate function
>>> + convert: put the flags field before the flag itself for consistent style
>>> + t0021: write "OUT " only on success
>>> + t0021: make debug log file name configurable
>>> + t0021: keep filter log files on comparison
>>> 
>>> The filter-process interface learned to allow a process with long
>>> latency give a "delayed" response.
>>> 
>>> Will merge to 'master'.
>> 
>> Hi Junio,
>> 
>> can you already tell if this topic has still a chance to be part of 2.14?
> 
> I'm not Junio, but we should be able to reason it out. :)
> 
> It's marked as "will merge to master", which typically means it will
> happen during the next integration cycle (i.e., within a few days when
> the next "What's cooking" is written).
> 
> Since 2.14 will be cut from the tip of master in a few weeks, once it's
> merged it will definitely be in 2.14 (barring a revert, of course).
> 
> This holds true during release freeze, too. Anything that makes it to
> master is part of the release. The stopping point there is that things
> stop getting marked as "will merge to master".

Thanks for the explanation! I looked at the Git release calendar [1] and
realized that 2.14-rc0 is scheduled for this Thursday. My assumption was
that either on this date 2.14 will be cut from the tip of master or master
would not change significantly after the rc0 date until the 2.14 release.

- Lars


[1] http://tinyurl.com/gitCal



Re: What's cooking in git.git (Jul 2017, #03; Mon, 10)

2017-07-11 Thread Jeff King
On Tue, Jul 11, 2017 at 10:28:08AM +0200, Lars Schneider wrote:

> > On 11 Jul 2017, at 00:48, Junio C Hamano  wrote:
> > 
> > * ls/filter-process-delayed (2017-06-30) 7 commits
> >  (merged to 'next' on 2017-07-05 at a35e644082)
> > + convert: add "status=delayed" to filter process protocol
> > + convert: refactor capabilities negotiation
> > + convert: move multiple file filter error handling to separate function
> > + convert: put the flags field before the flag itself for consistent style
> > + t0021: write "OUT " only on success
> > + t0021: make debug log file name configurable
> > + t0021: keep filter log files on comparison
> > 
> > The filter-process interface learned to allow a process with long
> > latency give a "delayed" response.
> > 
> > Will merge to 'master'.
> 
> Hi Junio,
> 
> can you already tell if this topic has still a chance to be part of 2.14?

I'm not Junio, but we should be able to reason it out. :)

It's marked as "will merge to master", which typically means it will
happen during the next integration cycle (i.e., within a few days when
the next "What's cooking" is written).

Since 2.14 will be cut from the tip of master in a few weeks, once it's
merged it will definitely be in 2.14 (barring a revert, of course).

This holds true during release freeze, too. Anything that makes it to
master is part of the release. The stopping point there is that things
stop getting marked as "will merge to master".

-Peff


[PATCH] gc: run pre-detach operations under lock

2017-07-11 Thread Jeff King
On Tue, Jul 11, 2017 at 03:25:36AM -0400, Jeff King wrote:

> Annoyingly, the lock code interacts badly with daemonizing because that
> latter will fork to a new process. So the simple solution like:
> 
> diff --git a/builtin/gc.c b/builtin/gc.c
> index 2ba50a287..79480124a 100644
> --- a/builtin/gc.c
> +++ b/builtin/gc.c
> @@ -414,6 +414,9 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
>   if (report_last_gc_error())
>   return -1;
>  
> + if (lock_repo_for_gc(force, &pid))
> + return 0;
> +
>   if (gc_before_repack())
>   return -1;
>   /*
> 
> means that anybody looking at the lockfile will report the wrong pid
> (and thus think the lock is invalid). I guess we'd need to update it in
> place after daemonizing.

Updating it in place is a bit tricky. I came up with this hack that
makes it work, but I'm not sure if the reasoning is too gross.

-- >8 --
Subject: [PATCH] gc: run pre-detach operations under lock

We normally try to avoid having two auto-gc operations run
at the same time, because it wastes resources. This was done
long ago in 64a99eb47 (gc: reject if another gc is running,
unless --force is given, 2013-08-08).

When we do a detached auto-gc, we run the ref-related
commands _before_ detaching, to avoid confusing lock
contention. This was done by 62aad1849 (gc --auto: do not
lock refs in the background, 2014-05-25).

These two features do not interact well. The pre-detach
operations are run before we check the gc.pid lock, meaning
that on a busy repository we may run many of them
concurrently. Ideally we'd take the lock before spawning any
operations, and hold it for the duration of the program.

This is tricky, though, with the way the pid-file interacts
with the daemonize() process.  Other processes will check
that the pid recorded in the pid-file still exists. But
detaching causes us to fork and continue running under a
new pid. So if we take the lock before detaching, the
pid-file will have a bogus pid in it. We'd have to go back
and update it with the new pid after detaching. We'd also
have to play some tricks with the tempfile subsystem to
tweak the "owner" field, so that the parent process does not
clean it up on exit, but the child process does.

Instead, we can do something a bit simpler: take the lock
only for the duration of the pre-detach work, then detach,
then take it again for the post-detach work. Technically,
this means that the post-detach lock could lose to another
process doing pre-detach work. But in the long run this
works out.

That second process would then follow-up by doing
post-detach work. Unless it was in turn blocked by a third
process doing pre-detach work, and so on. This could in
theory go on indefinitely, as the pre-detach work does not
repack, and so need_to_gc() will continue to trigger.  But
in each round we are racing between the pre- and post-detach
locks. Eventually, one of the post-detach locks will win the
race and complete the full gc. So in the worst case, we may
racily repeat the pre-detach work, but we would never do so
simultaneously (it would happen via a sequence of serialized
race-wins).

Signed-off-by: Jeff King 
---
 builtin/gc.c  |  4 
 t/t6500-gc.sh | 21 +
 2 files changed, 25 insertions(+)

diff --git a/builtin/gc.c b/builtin/gc.c
index bd91f136f..5a535040f 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -414,8 +414,12 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
if (report_last_gc_error())
return -1;
 
+   if (lock_repo_for_gc(force, &pid))
+   return 0;
if (gc_before_repack())
return -1;
+   delete_tempfile(&pidfile);
+
/*
 * failure to daemonize is ok, we'll continue
 * in foreground
diff --git a/t/t6500-gc.sh b/t/t6500-gc.sh
index cc7acd101..41b0be575 100755
--- a/t/t6500-gc.sh
+++ b/t/t6500-gc.sh
@@ -95,6 +95,27 @@ test_expect_success 'background auto gc does not run if gc.log is present and re
test_line_count = 1 packs
 '
 
+test_expect_success 'background auto gc respects lock for all operations' '
+   # make sure we run a background auto-gc
+   test_commit make-pack &&
+   git repack &&
+   test_config gc.autopacklimit 1 &&
+   test_config gc.autodetach true &&
+
+   # create a ref whose loose presence we can use to detect a pack-refs run
+   git update-ref refs/heads/should-be-loose HEAD &&
+   test_path_is_file .git/refs/heads/should-be-loose &&
+
+   # now fake a concurrent gc that holds the lock; we can use our
+   # shell pid so that it looks valid.
+   hostname=$(hostname || echo unknown) &&
+   printf 
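
The take/release/retake flow argued for in the commit message can be
modeled abstractly (the `Lock` class and `auto_gc` function here are
hypothetical stand-ins; the real code uses lock_repo_for_gc() and a
pid-file via the tempfile API):

```python
# Abstract model of the locking flow: take the lock for pre-detach work,
# release it before forking to the background, then re-take it for the
# post-detach repack. A second gc that finds the lock held simply bails.

class Lock:
    def __init__(self):
        self.held = False

    def acquire(self):
        if self.held:
            return False
        self.held = True
        return True

    def release(self):
        self.held = False

def auto_gc(lock):
    log = []
    if not lock.acquire():
        return log                 # another gc is running; do nothing
    log.append("pre-detach work")  # pack-refs, reflog expire, etc.
    lock.release()                 # drop before "forking" to background
    log.append("detach")
    # may racily lose here to another process's pre-detach phase, but
    # eventually some post-detach acquisition wins and repacks
    if lock.acquire():
        log.append("post-detach work")
        lock.release()
    return log
```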

Re: [PATCH] git-p4: parse marshal output "p4 -G" in p4 changes

2017-07-11 Thread Luke Diamand
On 3 July 2017 at 23:57, Miguel Torroja  wrote:
> The option -G of p4 (python marshal output) gives more context about the
> data being output. That's useful when using the command "change -o" as
> we can distinguish between warning/error line and real change description.
>
> Some p4 triggers in the server side generate some warnings when
> executed. Unfortunately those messages are mixed with the output of
> "p4 change -o". Those extra warning lines are reported as {'code':'info'}
> in python marshal output (-G). The real change output is reported as
> {'code':'stat'}
>
> the function p4CmdList accepts a new argument: skip_info. When set to
> True it ignores any 'code':'info' entry (skip_info=True by default).
>
> A new test has been created to t9807-git-p4-submit.sh adding a p4 trigger
> that outputs extra lines with "p4 change -o" and "p4 changes"

The latest version of mt/p4-parse-G-output (09521c7a0) seems to break
t9813-git-p4-preserve-users.sh.

I don't quite know why, but I wonder if it's the change to p4CmdList() ?

Luke

>
> Signed-off-by: Miguel Torroja 
> ---
>  git-p4.py| 90 
> 
>  t/t9807-git-p4-submit.sh | 30 
>  2 files changed, 91 insertions(+), 29 deletions(-)
>
> diff --git a/git-p4.py b/git-p4.py
> index 8d151da91..a262e3253 100755
> --- a/git-p4.py
> +++ b/git-p4.py
> @@ -509,7 +509,7 @@ def isModeExec(mode):
>  def isModeExecChanged(src_mode, dst_mode):
>  return isModeExec(src_mode) != isModeExec(dst_mode)
>
> -def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None):
> +def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=True):
>
>  if isinstance(cmd,basestring):
>  cmd = "-G " + cmd
> @@ -545,6 +545,9 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None):
>      try:
>          while True:
>              entry = marshal.load(p4.stdout)
> +            if skip_info:
> +                if 'code' in entry and entry['code'] == 'info':
> +                    continue
>              if cb is not None:
>                  cb(entry)
>              else:
> @@ -879,8 +882,12 @@ def p4ChangesForPaths(depotPaths, changeRange, requestedBlockSize):
>          cmd += ["%s...@%s" % (p, revisionRange)]
>
>          # Insert changes in chronological order
> -        for line in reversed(p4_read_pipe_lines(cmd)):
> -            changes.add(int(line.split(" ")[1]))
> +        for entry in reversed(p4CmdList(cmd)):
> +            if entry.has_key('p4ExitCode'):
> +                die('Error retrieving changes descriptions ({})'.format(entry['p4ExitCode']))
> +            if not entry.has_key('change'):
> +                continue
> +            changes.add(int(entry['change']))
>
>          if not block_size:
>              break
> @@ -1526,37 +1533,62 @@ class P4Submit(Command, P4UserMap):
>
>          [upstream, settings] = findUpstreamBranchPoint()
>
> -        template = ""
> +        template = """\
> +# A Perforce Change Specification.
> +#
> +#  Change:      The change number. 'new' on a new changelist.
> +#  Date:        The date this specification was last modified.
> +#  Client:      The client on which the changelist was created.  Read-only.
> +#  User:        The user who created the changelist.
> +#  Status:      Either 'pending' or 'submitted'. Read-only.
> +#  Type:        Either 'public' or 'restricted'. Default is 'public'.
> +#  Description: Comments about the changelist.  Required.
> +#  Jobs:        What opened jobs are to be closed by this changelist.
> +#               You may delete jobs from this list.  (New changelists only.)
> +#  Files:       What opened files from the default changelist are to be added
> +#               to this changelist.  You may delete files from this list.
> +#               (New changelists only.)
> +"""
> +        files_list = []
>          inFilesSection = False
> +        change_entry = None
>          args = ['change', '-o']
>          if changelist:
>              args.append(str(changelist))
> -
> -        for line in p4_read_pipe_lines(args):
> -            if line.endswith("\r\n"):
> -                line = line[:-2] + "\n"
> -            if inFilesSection:
> -                if line.startswith("\t"):
> -                    # path starts and ends with a tab
> -                    path = line[1:]
> -                    lastTab = path.rfind("\t")
> -                    if lastTab != -1:
> -                        path = path[:lastTab]
> -                        if settings.has_key('depot-paths'):
> -                            if not [p for p in settings['depot-paths']
> -                                    if p4PathStartsWith(path, p)]:
> -                                continue
> -                        else:
> -                            if not p4PathStartsWith(path, self.depotPath):
> -                                continue
> +        for entry in 

Re: What's cooking in git.git (Jul 2017, #03; Mon, 10)

2017-07-11 Thread Lars Schneider

> On 11 Jul 2017, at 00:48, Junio C Hamano  wrote:
> 
> * ls/filter-process-delayed (2017-06-30) 7 commits
>  (merged to 'next' on 2017-07-05 at a35e644082)
> + convert: add "status=delayed" to filter process protocol
> + convert: refactor capabilities negotiation
> + convert: move multiple file filter error handling to separate function
> + convert: put the flags field before the flag itself for consistent style
> + t0021: write "OUT " only on success
> + t0021: make debug log file name configurable
> + t0021: keep filter log files on comparison
> 
> The filter-process interface learned to allow a process with long
> latency give a "delayed" response.
> 
> Will merge to 'master'.

Hi Junio,

can you already tell if this topic still has a chance to be part of 2.14?

Thanks,
Lars


Re: Weirdness with git change detection

2017-07-11 Thread Jeff King
On Tue, Jul 11, 2017 at 10:20:43AM +0200, Torsten Bögershausen wrote:

> > No problem. I actually think it would be interesting if Git could
> > somehow detect and warn about this situation. But the obvious way to do
> > that would be to re-run the clean filter directly after checkout. And
> > doing that all the time is expensive.
> 
> Would it be possible to limit the round-trip check to "git reset --hard"?
> If yes, possibly many users are willing to pay the price, if Git tells
> them "your filters don't round-trip". (Including CRLF conversions)

Anything's possible, I suppose. But I don't think I'd want that feature
turned on myself.

> > Perhaps some kind of "lint" program would be interesting to warn of
> > possible misconfigurations. Of course people would have to run it for it
> > to be useful. :)
> 
> What do you have in mind here?
> Don't we need to run some content through the filter(s)?

I was thinking of a tool that could run a series of checks on the
repository and nag about potential problems. One of them could be doing
a round-trip repo->clean->smudge for each file.

Another one might be warning about files that differ only in case.

The idea being that users could run "git lint" if they suspect something
funny is going on. I dunno. It may be a dead-end. Most such
oddities are better detected and handled during actual git operations if
we can. So this would really just be for things that are too expensive
to detect in normal operations.
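The round-trip check such a lint tool would run can be stated as a small predicate (a minimal sketch, with the filter pair modeled as plain functions; the trailing-whitespace clean filter below is an invented example, not a real git filter):

```python
def filter_round_trips(clean, smudge, committed_blob):
    """A clean/smudge filter pair round-trips for a blob when checking it
    out (smudge) and re-cleaning reproduces the committed bytes exactly.
    When this fails, no amount of 'git reset --hard' makes the diff go away."""
    return clean(smudge(committed_blob)) == committed_blob

# Example pair: identity smudge, and a clean filter that strips
# trailing whitespace from each line.
smudge = lambda b: b
clean = lambda b: b"\n".join(line.rstrip() for line in b.split(b"\n"))
```

A blob committed with trailing whitespace fails the check, which is precisely the "phantom diff" symptom reported in this thread.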

-Peff


Re: Weirdness with git change detection

2017-07-11 Thread Torsten Bögershausen



On 11/07/17 09:34, Jeff King wrote:
> On Tue, Jul 11, 2017 at 12:31:25AM -0700, Peter Eckersley wrote:
>
>> I did try to test that hypothesis by editing the filter to be a no-op,
>> but it's possible I got that wrong. My apologies for bugging the list!

Actually I like this kind of feedback, to see how people are using Git,
including the problems they have.

> No problem. I actually think it would be interesting if Git could
> somehow detect and warn about this situation. But the obvious way to do
> that would be to re-run the clean filter directly after checkout. And
> doing that all the time is expensive.

Would it be possible to limit the round-trip check to "git reset --hard"?
If yes, possibly many users are willing to pay the price, if Git tells
them "your filters don't round-trip". (Including CRLF conversions)

> Perhaps some kind of "lint" program would be interesting to warn of
> possible misconfigurations. Of course people would have to run it for it
> to be useful. :)

What do you have in mind here?
Don't we need to run some content through the filter(s)?


Re: Flurries of 'git reflog expire'

2017-07-11 Thread Jeff King
On Tue, Jul 11, 2017 at 12:35:50AM -0700, Bryan Turner wrote:

> That's a few of the reasons we've switched over. I'd imagine most
> hosting providers take a similarly "hands on" approach to controlling
> their GC. Beyond a certain scale, it seems almost unavoidable. Git
> never has more than a repository-level view of the world; only the
> hosting provider can see the big picture.

Thanks for writing this out. I agree with all of the reasons given (in
my email which I suspect crossed with yours, I just said "throttling",
but there really are a lot of other reasons).

-Peff


Re: [PATCH] use DIV_ROUND_UP

2017-07-11 Thread Jeff King
On Mon, Jul 10, 2017 at 07:08:38PM +0200, René Scharfe wrote:

> > So I think it's a true mechanical conversion, but I have to admit the
> > original is confusing. Without digging I suspect it's correct, though,
> > just because a simple bug here would mean that our ewah bitmaps totally
> > don't work. So it's probably not worth spending time on.
> 
> A few lines below there is "self->bit_size = i + 1", so the code
> calculates how many words the old and the new value are apart (or by how
> many words the bitmap needs to be extended), which becomes easier to see
> with the patch.

Yeah, I'd agree the consistent use of "i + 1" makes the end result after
your patch easier to read.

-Peff


Re: Bug: Git checkout case Insensitive branch name compare

2017-07-11 Thread Jeff King
On Mon, Jul 10, 2017 at 01:15:40PM -0700, kinchit raja wrote:

> Bugs Details:
> 
> Git checkout case Insensitive branch name compare

Right, this is known. Branch names (and all ref names) are case
sensitive in Git. Storage on a case-insensitive filesystem may lead to
confusion if you mix the cases. Sometimes it will work, and sometimes it
will create quite confusing results. One suggested path forward is to
encode characters in the filesystem. See:

  
http://public-inbox.org/git/20170705083611.jgxbp4sqogicf...@sigill.intra.peff.net/

and further down-thread:

  http://public-inbox.org/git/xmqqo9sxdwjp@gitster.mtv.corp.google.com/

for some recent discussion.
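To illustrate the encoding idea with a deliberately simplified, hypothetical scheme (this is not the scheme proposed in the linked threads; the '%' marker is an arbitrary choice for illustration): escape each uppercase letter so that two refs differing only in case map to on-disk names that stay distinct even after case folding.

```python
def encode_ref_component(name):
    """Hypothetical reversible encoding: mark each uppercase letter so
    that refs differing only in case get distinct on-disk paths even on
    a case-insensitive filesystem."""
    out = []
    for ch in name:
        if ch == '%':
            out.append('%%')        # escape the marker itself
        elif ch.isupper():
            out.append('%' + ch.lower())
        else:
            out.append(ch)
    return ''.join(out)
```

Under such a scheme "HEAD" and "head" no longer collide as filenames, which is the property the loose-ref store currently lacks on case-insensitive filesystems.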

-Peff


Re: Flurries of 'git reflog expire'

2017-07-11 Thread Bryan Turner
On Mon, Jul 10, 2017 at 9:45 PM, Andreas Krey  wrote:
> On Thu, 06 Jul 2017 10:01:05 +, Bryan Turner wrote:
> 
>> I also want to add that Bitbucket Server 5.x includes totally
>> rewritten GC handling. 5.0.x automatically disables auto GC in all
>> repositories and manages it explicitly, and 5.1.x fully removes use of
>> "git gc" in favor of running relevant plumbing commands directly.
>
> That's the part that irks me. This shouldn't be necessary - git itself
> should make sure auto GC isn't run in parallel. Now I probably can't
> evaluate whether a git upgrade would fix this, but given that you
> are going the do-gc-ourselves route I suppose it wouldn't.
>

I believe I've seen some commits on the mailing list that suggest "git
gc --auto" manages its concurrency better in newer versions than it
used to, but even then it can only manage its concurrency within a
single repository. For a hosting server with thousands, or tens of
thousands, of active repositories, there still wouldn't be any
protection against "git gc --auto" running concurrently in dozens of
them at the same time.

But it's not only about concurrency. "git gc" (and by extension "git
gc --auto") is a general purpose tool, designed to generally do what
you need, and to mostly stay out of your way while it does it. I'd
hazard to say it's not really designed for managing heavily-trafficked
repositories on busy hosting services, though, and as a result, there
are things it can't do.

For example, I can configure auto GC to run based on how many loose
objects or packs I have, but there's no heuristic to make it repack
refs when I have a lot of loose ones, or configure it to _only_ pack
refs without repacking objects or pruning reflogs. There are knobs for
various things (like "gc.*.reflogExpire"), but those don't give
complete control. Even if I set "gc.reflogExpire=never", "git gc"
still forks "git reflog expire --all" (compared to
"gc.packRefs=false", which completely prevents forking "git
pack-refs").

A trace on "git gc" shows this:
$ GIT_TRACE=1 git gc
00:10:45.058066 git.c:437   trace: built-in: git 'gc'
00:10:45.067075 run-command.c:369   trace: run_command:
'pack-refs' '--all' '--prune'
00:10:45.077086 git.c:437   trace: built-in: git
'pack-refs' '--all' '--prune'
00:10:45.084098 run-command.c:369   trace: run_command: 'reflog'
'expire' '--all'
00:10:45.093102 git.c:437   trace: built-in: git 'reflog'
'expire' '--all'
00:10:45.097088 run-command.c:369   trace: run_command: 'repack'
'-d' '-l' '-A' '--unpack-unreachable=2.weeks.ago'
00:10:45.106096 git.c:437   trace: built-in: git 'repack'
'-d' '-l' '-A' '--unpack-unreachable=2.weeks.ago'
00:10:45.107098 run-command.c:369   trace: run_command:
'pack-objects' '--keep-true-parents' '--honor-pack-keep' '--non-empty'
'--all' '--reflog' '--indexed-objects'
'--unpack-unreachable=2.weeks.ago' '--local' '--delta-base-offset'
'objects/pack/.tmp-15212-pack'
00:10:45.127117 git.c:437   trace: built-in: git
'pack-objects' '--keep-true-parents' '--honor-pack-keep' '--non-empty'
'--all' '--reflog' '--indexed-objects'
'--unpack-unreachable=2.weeks.ago' '--local' '--delta-base-offset'
'objects/pack/.tmp-15212-pack'
Counting objects: 6, done.
Delta compression using up to 16 threads.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (6/6), done.
Total 6 (delta 0), reused 6 (delta 0)
00:10:45.173161 run-command.c:369   trace: run_command: 'prune'
'--expire' '2.weeks.ago'
00:10:45.184171 git.c:437   trace: built-in: git 'prune'
'--expire' '2.weeks.ago'
00:10:45.199202 run-command.c:369   trace: run_command: 'worktree'
'prune' '--expire' '3.months.ago'
00:10:45.208193 git.c:437   trace: built-in: git
'worktree' 'prune' '--expire' '3.months.ago'
00:10:45.212198 run-command.c:369   trace: run_command: 'rerere' 'gc'
00:10:45.221223 git.c:437   trace: built-in: git 'rerere' 'gc'

The bare repositories used by Bitbucket Server:
* Don't have reflogs enabled generally, and for the ones that are
enabled "gc.*.reflogExpire" is set to "never"
* Never have worktrees, so they don't need to be pruned
* Never use rerere, so that doesn't need to GC
* Have pruning disabled if they've been forked, due to using
alternates to manage disk space

That means of all the commands "git gc" runs, under the covers, at
most only "pack-refs", "repack" and sometimes "prune" have any value.
"reflog expire --all" in particular is extremely likely to fail. Which
brings up another consideration.

"git gc --auto" has no sense of context, or adjacent behavior. Even if
it correctly guards against concurrency, it still doesn't know what
else is going on. Immediately after a push, Bitbucket Server has many
other housekeeping tasks it performs, especially around pull requests.
That means pull request refs are disproportionately likely to be
"moving" immediately after a push completes--exactly 

Re: Weirdness with git change detection

2017-07-11 Thread Jeff King
On Tue, Jul 11, 2017 at 12:31:25AM -0700, Peter Eckersley wrote:

> I did try to test that hypothesis by editing the filter to be a no-op,
> but it's possible I got that wrong. My apologies for bugging the list!

No problem. I actually think it would be interesting if Git could
somehow detect and warn about this situation. But the obvious way to do
that would be to re-run the clean filter directly after checkout. And
doing that all the time is expensive.

Perhaps some kind of "lint" program would be interesting to warn of
possible misconfigurations. Of course people would have to run it for it
to be useful. :)

-Peff


Re: Flurries of 'git reflog expire'

2017-07-11 Thread Jeff King
On Thu, Jul 06, 2017 at 10:01:05AM -0700, Bryan Turner wrote:

> I also want to add that Bitbucket Server 5.x includes totally
> rewritten GC handling. 5.0.x automatically disables auto GC in all
> repositories and manages it explicitly, and 5.1.x fully removes use of
> "git gc" in favor of running relevant plumbing commands directly. We
> moved away from "git gc" specifically to avoid the "git reflog expire
> --all", because there's no config setting that _fully disables_
> forking that process.

FWIW, I think auto-gc in general is not a good way to handle maintenance
on a busy hosting server. Repacking can be very resource hungry (both
CPU and memory), and it needs to be throttled. You _could_ throttle with
an auto-gc hook, but that isn't very elegant when it comes to
re-queueing jobs which fail or timeout.

The right model IMHO (and what GitHub uses, and what I'm guessing
Bitbucket is doing in more recent versions) is to make note of write
operations in a data structure, then use that data to schedule
maintenance in a job queue. But that can never really be part of Git
itself, as the notion of a system job queue is outside its scope.
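The model described above can be sketched as a small scheduler that coalesces write notifications into at most one pending maintenance job per repository (the class and method names here are invented for illustration; a real system would persist this state and run jobs with throttling and retry):

```python
import heapq
import time

class MaintenanceQueue:
    """Sketch: record write operations and schedule at most one pending
    maintenance job per repository, so bursts of pushes coalesce."""
    def __init__(self):
        self._queued = set()   # repos with a job already scheduled
        self._jobs = []        # min-heap of (due_time, repo)
    def note_write(self, repo, delay=60.0):
        if repo in self._queued:
            return  # coalesce: maintenance is already pending for this repo
        self._queued.add(repo)
        heapq.heappush(self._jobs, (time.time() + delay, repo))
    def pop_due(self, now=None):
        now = time.time() if now is None else now
        due = []
        while self._jobs and self._jobs[0][0] <= now:
            _, repo = heapq.heappop(self._jobs)
            self._queued.discard(repo)
            due.append(repo)
        return due
```

The key point is the dedup set: however many pushes land while a job is pending, only one maintenance run is queued, which per-repository auto-gc cannot provide across a whole fleet.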

-Peff


Re: Weirdness with git change detection

2017-07-11 Thread Peter Eckersley
I did try to test that hypothesis by editing the filter to be a no-op,
but it's possible I got that wrong. My apologies for bugging the list!

On 11 July 2017 at 00:06, Jeff King  wrote:
> On Tue, Jul 11, 2017 at 06:15:17AM +0200, Torsten Bögershausen wrote:
>
>> On 11/07/17 01:45, Peter Eckersley wrote:
>> > I have a local git repo that's in some weird state where changes
>> > appear to be detected by "git diff" and prevent operations like "git
>> > checkout" from switching branches, but those changes are not removed
>> > by a "git reset --hard" or "git stash".
>> >
>> > Here's an example of the behaviour, with "git reset --hard" failing to
>> > clear a diff in the index:
>> >
>> > https://paste.debian.net/975811/
>> >
>> > Happy to collect additional debugging information if it's useful.
>> >
>> If possible, we need to clone the repo and debug ourselves - in other
>> words, the problem must be reproducible for others.
>>
>> Is the repo public?
>
> It looks like https://github.com/AI-metrics/AI-metrics.
>
> I notice it has a .gitattributes file with:
>
>   *.ipynb filter=clean_ipynb
>
> There's a config snippet in the repo with:
>
>   [filter "clean_ipynb"]
> clean = ipynb_drop_output
> smudge = cat
>
> and the drop_output script is included. From the paste we can see that
> Peter was at commit c464aaa. Checking out that commit and running the
> script shows that it produces the differences that Git is showing.
>
> The problem is that the currently committed contents do not match the
> output of the clean filter. So even when "git reset --hard" puts the
> content from the repository back into the working tree (putting it
> through the smudge filter, which is a noop), running the clean filter on
> the result will always have a difference. Either the filter needs to be
> disabled, or the cleaned contents committed.
>
> -Peff



-- 
Peter


[BUG] detached auto-gc does not respect lock for 'reflog expire', was Re: Flurries of 'git reflog expire'

2017-07-11 Thread Jeff King
[Updating the subject since I think this really is a bug].

On Tue, Jul 11, 2017 at 06:45:53AM +0200, Andreas Krey wrote:

> > I also want to add that Bitbucket Server 5.x includes totally
> > rewritten GC handling. 5.0.x automatically disables auto GC in all
> > repositories and manages it explicitly, and 5.1.x fully removes use of
> > "git gc" in favor of running relevant plumbing commands directly.
> 
> That's the part that irks me. This shouldn't be necessary - git itself
> should make sure auto GC isn't run in parallel. Now I probably can't
> evaluate whether a git upgrade would fix this, but given that you
> are going the do-gc-ourselves route I suppose it wouldn't.

It's _supposed_ to take a lock, even in older versions. See 64a99eb47
(gc: reject if another gc is running, unless --force is given,
2013-08-08).

But it looks like before we take that lock, we sometimes run pack-refs
and reflog expire. This is due to 62aad1849 (gc --auto: do not lock refs
in the background, 2014-05-25). IMHO this is buggy; it should be
checking the lock before calling gc_before_repack() and daemonizing.

Annoyingly, the lock code interacts badly with daemonizing because the
latter will fork to a new process. So a simple solution like:

diff --git a/builtin/gc.c b/builtin/gc.c
index 2ba50a287..79480124a 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -414,6 +414,9 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
if (report_last_gc_error())
return -1;
 
+   if (lock_repo_for_gc(force, &pid))
+   return 0;
+
if (gc_before_repack())
return -1;
/*

means that anybody looking at the lockfile will report the wrong pid
(and thus think the lock is invalid). I guess we'd need to update it in
place after daemonizing.
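The pid-staleness problem is easy to demonstrate: any pid recorded before daemonizing cannot match the process that ends up holding the lock. A minimal sketch (a single fork stands in for the full daemonize sequence, and POSIX fork() is assumed):

```python
import os

# pid recorded in the lockfile *before* daemonizing
lock_pid = os.getpid()

child = os.fork()
if child == 0:
    # the "daemonized" half: its real pid no longer matches what was
    # written, so a check against the lockfile would call the lock stale
    os._exit(0 if os.getpid() != lock_pid else 1)

_, status = os.waitpid(child, 0)
stale = (os.WEXITSTATUS(status) == 0)
```

Hence the suggestion to rewrite the lockfile in place after daemonizing, so the recorded pid matches the surviving process.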

-Peff

