Re: bug: HEAD vs. head on case-insensitive filesystems
> I was going to point you to the recent thread in
>
>   http://public-inbox.org/git/87ziclb2pa@gmail.com/
>
> but I see you already participated there. So if your mail here is
> "here's a summary of how HEAD/head don't quite work", then OK, that
> might be handy. But I think the ultimate resolution is not "let's make
> them work", but "let's consistently enforce case-sensitivity in ref
> names, regardless of the underlying filesystem".

Thanks, that makes sense. Perhaps it's best to just view my original message as, "here's a new example of where 'head' doesn't work".
Re: Dropping a merge from history -- rebase or filter-branch or ...?
Hi Martin,

From the sound of it you really just want to revert the merge of the pull requests. A really good description of options for this is at

    https://git-scm.com/blog/2010/03/02/undoing-merges.html

There is also a section there about bringing the changes back in at a future date, depending on how you do the revert.

Does that page describe what you're trying to do?

Regards,
Andrew Ardill

On 8 July 2017 at 07:07, Martin Langhoff wrote:
> Hi git-folk!
>
> long time no see! I'm trying to do one of those "actually, please
> don't" things that turn out to be needed in the field.
>
> I need to open our next "for release" development branch from our
> master, but without a couple of disruptive feature branches, which
> have been merged into master already. We develop in github, so I'll
> call them Pull Requests (PRs) as gh does.
>
> So I'd like to run a filter-branch or git-rebase --interactive
> --preserve-merges that drops some PRs. Problem is, they don't work!
>
> filter-branch --commit-filter is fantastic, and gives me all the
> control I want... except that it will "skip the commit", but still use
> the trees in the later commits, so the code changes brought in by
> those commits I wanted to avoid will be there. I think the docs/help
> that discuss "skip commit" should have a big warning there!
>
> rebase --interactive --preserve-merges --keep-empty made a complete
> hash of things. Nonsense conflicts all over on the merge commits; I
> think it re-ran the merge without picking up the conflict resolutions
> we had applied.
>
> The changes we want to avoid are fairly localized -- a specific module
> got refactored in 3 stages. The rest of the history should replay
> cleanly. I don't want to delete the module.
>
> My fallback is a manually constructed revert. While still an option, I
> think it's better to have a clean stat without sizable feature-branch
> reverts.
> cheers,
>
> m
> --
>  martin.langh...@gmail.com
>  - ask interesting questions ~ http://linkedin.com/in/martinlanghoff
>  - don't be distracted by shiny stuff ~ http://github.com/martin-langhoff
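For the "revert the merge" route the linked page describes, a minimal end-to-end sketch looks like this (repo layout, file names, and commit messages are invented for the example; `-m 1` tells revert which parent of the merge is the mainline to keep):

```shell
set -e
cd "$(mktemp -d)"
git init -q repo && cd repo
git config user.email you@example.com
git config user.name "You"
echo base > base.txt && git add base.txt && git commit -qm "base"

# A feature branch gets merged, PR-style:
git checkout -qb feature
echo feature > feature.txt && git add feature.txt && git commit -qm "PR work"
git checkout -q -
git merge -q --no-ff --no-edit feature

# Back the PR out of mainline: revert the merge commit, keeping parent 1
git revert -m 1 --no-edit HEAD
```

To bring the feature back later (the "bringing the changes back" section mentioned above), the usual trick is to revert the revert on the new branch rather than re-merging the unchanged feature branch.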
Re: [PATCH] strbuf: use designated initializers in STRBUF_INIT
On Tue, Jul 11, 2017 at 07:24:07AM +0200, Johannes Sixt wrote:
> Am 11.07.2017 um 02:05 schrieb brian m. carlson:
> > I have tried compiling Git with a C++ compiler, so that I could test
> > whether that was a viable alternative for MSVC in case its C++ mode
> > supported features its C mode did not. Let's just say that the
> > compilation aborted very quickly and I gave up after a few minutes.
>
> It's 3 cleanup patches and one hacky patch with this size:
>
>  80 files changed, 899 insertions(+), 807 deletions(-)
>
> to compile with C++. It passed the test suite last time I tried. Getting rid
> of the remaining 1000+ -fpermissive warnings is a different matter, though.

Yeah, that's the size I was thinking. My goal was "easily achievable in half an hour," which it didn't seem like at the time.

> I can revive the patches if there is interest.

I'd be interested in at least a pointer to them if you have one. I think it might allow us to take advantage of desirable features that are in the intersection of C99 and C++.

--
brian m. carlson / brian with sandals: Houston, Texas, US
https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204
Re: [PATCH 3/3] grep: recurse in-process using 'struct repository'
Hi,

Brandon Williams wrote:

> Convert grep to use 'struct repository' which enables recursing into
> submodules to be handled in-process.

\o/

This will be even nicer with the changes described at
https://public-inbox.org/git/20170706202739.6056-1-sbel...@google.com/.
Until then, I fear it will cause a regression --- see (*) below.

[...]
>  Documentation/git-grep.txt |   7 -
>  builtin/grep.c             | 390 +
>  cache.h                    |   1 -
>  git.c                      |   2 +-
>  grep.c                     |  13 --
>  grep.h                     |   1 -
>  setup.c                    |  12 +-
>  7 files changed, 81 insertions(+), 345 deletions(-)

Yay, tests still pass.

[...]
> --- a/Documentation/git-grep.txt
> +++ b/Documentation/git-grep.txt
> @@ -95,13 +95,6 @@ OPTIONS
>        option the prefix of all submodule output will be the name of
>        the parent project's object.
>
> ---parent-basename ::
> -      For internal use only. In order to produce uniform output with the
> -      --recurse-submodules option, this option can be used to provide the
> -      basename of a parent's object to a submodule so the submodule
> -      can prefix its output with the parent's name rather than the SHA1 of
> -      the submodule.

Being able to get rid of this is a very nice change.

[...]
> +++ b/builtin/grep.c
[...]
> @@ -366,14 +349,10 @@ static int grep_file(struct grep_opt *opt, const char *filename)
>  {
>        struct strbuf buf = STRBUF_INIT;
>
> -      if (super_prefix)
> -              strbuf_addstr(&buf, super_prefix);
> -      strbuf_addstr(&buf, filename);
> -
>        if (opt->relative && opt->prefix_length) {
> -              char *name = strbuf_detach(&buf, NULL);
> -              quote_path_relative(name, opt->prefix, &buf);
> -              free(name);
> +              quote_path_relative(filename, opt->prefix, &buf);
> +      } else {
> +              strbuf_addstr(&buf, filename);
>        }

style micronit: can avoid these braces since both branches are single-line.

[...]
> @@ -421,284 +400,80 @@ static void run_pager(struct grep_opt *opt, const char *prefix)
>                exit(status);
>  }
>
> -static void compile_submodule_options(const struct grep_opt *opt,
> -                                      const char **argv,
> -                                      int cached, int untracked,
> -                                      int opt_exclude, int use_index,
> -                                      int pattern_type_arg)
> -{
[...]
> -      /*
> -       * Limit number of threads for child process to use.
> -       * This is to prevent potential fork-bomb behavior of git-grep as each
> -       * submodule process has its own thread pool.
> -       */
> -      argv_array_pushf(&submodule_options, "--threads=%d",
> -                       (num_threads + 1) / 2);

Being able to get rid of this is another very nice change.

[...]
> +      /* add objects to alternates */
> +      add_to_alternates_memory(submodule.objectdir);

(*) This sets up a single in-memory object store with all the processed submodules. Processed objects are never freed. This means that if I run a command like

	git grep --recurse-submodules -e neverfound HEAD

in a project with many submodules then memory consumption scales in the same way as if the project were all one repository. By contrast, without this patch, git is able to take advantage of the implicit free() when each child exits to limit its memory usage.

Worse, this increases the number of pack files git has to pay attention to, to the sum of the numbers of pack files in all the repositories processed so far. A single object lookup can take O(number of packs * log(number of objects in each pack)) time. That means performance is likely to suffer as the number of submodules increases (n^2 performance) even on systems with a lot of memory.

Once the object store is part of the repository struct and freeable, those problems go away and this patch becomes a no-brainer. What should happen until then? Should this go in "next" so we can get experience with it but with care not to let it graduate to "master"?

Aside from those two concerns, this patch looks very good from a quick skim, though I haven't reviewed it closely line-by-line.
Once we know how to go forward, I'm happy to look at it again.

Thanks,
Jonathan
Re: [PATCH 2/3] setup: have the_repository use the_index
Brandon Williams writes:

> Have the index state which is stored in 'the_repository' be a pointer to
> the in-core instead 'the_index'. This makes it easier to begin
> transitioning more parts of the code base to operate on a 'struct
> repository'.
>
> Signed-off-by: Brandon Williams
> ---
>  setup.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/setup.c b/setup.c
> index 860507e1f..b370bf3c1 100644
> --- a/setup.c
> +++ b/setup.c
> @@ -1123,6 +1123,7 @@ const char *setup_git_directory_gently(int *nongit_ok)
>                        setup_git_env();
>                }
>        }
> +      the_repository->index = &the_index;
>
>        strbuf_release();
>        strbuf_release();

I would have expected this to be going in the different direction, i.e. there is an embedded instance of index_state in a repository object, and the_repository.index is defined to be the old the_index, i.e.

	#define the_index (the_repository.index)

When a Git command recurses into submodules in-core, using the_repository to represent the top-level superproject and another repository object that represents a submodule, don't we want the repository object for the submodule to also have its own default index, without having to allocate one and point at it with the index field?

I dunno. Being able to leave the index field NULL lets you say "this is a bare repository and there is no place for the index file for it", but even if we never write out the in-core index to an index file on disk, being able to populate the in-core index that is default for the repository object from a tree-ish and iterating over it (e.g. running in-core merge of trees) is a useful thing to do, so I do not consider "index field can be set to NULL to signify a bare repository" a very strong plus.
Re: [PATCH 2/3] setup: have the_repository use the_index
On Tue, Jul 11, 2017 at 5:00 PM, Jonathan Nieder wrote:
>  /* The main repository */
> -static struct repository the_repo;
> +static struct repository the_repo = { .index = &the_index };

https://public-inbox.org/git/20170710070342.txmlwwq6gvjkw...@sigill.intra.peff.net/
specifically said we'd not use all the features today, but don't we want to keep the test balloon up in the air long enough? (So this is just a critique of the syntax; I agree on the content.)
Re: [PATCH 3/3] grep: recurse in-process using 'struct repository'
On Tue, Jul 11, 2017 at 3:04 PM, Brandon Williams wrote:
> +      if (repo_submodule_init(&submodule, superproject, path))
> +              return 0;

What happens if we go through the "return 0"? Do we rather want to print an error?

> +      /* add objects to alternates */
> +      add_to_alternates_memory(submodule.objectdir);

Not trying to make my object series more important than it is... but we really don't want to spread this add_to_alternates_memory hack. :/

I agree with Jacob that a patch with such a diffstat is a joy to review. :)

Thanks,
Stefan
Re: [PATCH 2/3] setup: have the_repository use the_index
Hi,

Brandon Williams wrote:

> Have the index state which is stored in 'the_repository' be a pointer to
> the in-core instead 'the_index'. This makes it easier to begin
> transitioning more parts of the code base to operate on a 'struct
> repository'.
>
> Signed-off-by: Brandon Williams
> ---
>  setup.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/setup.c b/setup.c
> index 860507e1f..b370bf3c1 100644
> --- a/setup.c
> +++ b/setup.c
> @@ -1123,6 +1123,7 @@ const char *setup_git_directory_gently(int *nongit_ok)
>                        setup_git_env();
>                }
>        }
> +      the_repository->index = &the_index;

I wonder if this can be done sooner. For example, does the following work? This way, 'the_repository->index == &the_index' would be an invariant that always holds, even in the early setup stage before setup_git_directory_gently has run completely.

Thanks,
Jonathan

diff --git i/repository.c w/repository.c
index edca907404..bdc1f93282 100644
--- i/repository.c
+++ w/repository.c
@@ -4,7 +4,7 @@
 #include "submodule-config.h"
 
 /* The main repository */
-static struct repository the_repo;
+static struct repository the_repo = { .index = &the_index };
 struct repository *the_repository = &the_repo;
 
 static char *git_path_from_env(const char *envvar, const char *git_dir,
Re: [PATCH 1/3] repo_read_index: don't discard the index
On Tue, Jul 11, 2017 at 3:04 PM, Brandon Williams wrote:
> Have 'repo_read_index()' behave more like the other read_index family of
> functions and don't discard the index if it has already been populated.

instead rely on the quick return of read_index_from, which has

	/* istate->initialized covers both .git/index and .git/sharedindex.xxx */
	if (istate->initialized)
		return istate->cache_nr;

such that we do not have memory leaks or other issues.

Currently we do not have a lot of callers, such that we can change the contract of the 'repo_read_index' function easily. However, going through all the callers and then looking at the implementation may hint at a desire to have repo_read_index documented in repository.h. (There is a hint in struct repository that its field index can be initialized using repo_read_index, but what does repo_read_index actually do?)
Re: [PATCH 1/3] repo_read_index: don't discard the index
Brandon Williams wrote:

> Have 'repo_read_index()' behave more like the other read_index family of
> functions and don't discard the index if it has already been populated.
>
> Signed-off-by: Brandon Williams
> ---
>  repository.c | 2 --
>  1 file changed, 2 deletions(-)

How did you discover this? E.g. was it from code inspection, or does this make the function more convenient to use for some kinds of callers?

Reviewed-by: Jonathan Nieder
[PATCH] RFC: Introduce '.gitorderfile'
Conceptually the file order as set with command line -O or via the config 'diff.orderFile' is interesting to both the author (when I run a quick git diff locally) as well as the reviewer (a patch floating on the mailing list), so it is not just the author who should be responsible for getting their config in order; a project would benefit when it could give a good default for such an order.

While the change in this RFC patch to diff.c may look uncontroversial (Oh look! it's just another knob we can turn!), the change to the newly introduced '.gitorderfile' may be more controversial. Here is my rationale for proposing it:

I want to force myself to think about the design before pointing out memory leaks and coding style, so the least I would wish for is:

	*.h
	*.c

but as we have more to look at, I would want to have the most abstract thing come first. And most abstract from the actual code is the user interaction, the documentation. I heard the claim that the git project deliberately names the directory 'Documentation/' with a capital D such that we had this property by default already. With a patch like this we could rename Documentation/ to docs and still enjoy reading the docs first.

Given this alibi, I would claim that t/ is misnamed though! I personally would prefer to review tests just after the documentation instead of after the code, as the tests are more abstract and encode promises to the user, unlike the code itself, which is the truth at the end of the day.

Signed-off-by: Stefan Beller
---

I wrote:
> offtopic: As a general thing for our patches, can we configure
> (or even convince Git in general), that headers ought to be sent *before*
> its accompanying source? I think that would help reviewers like me, who
> tend to start reading linearly and then giving random thoughts, because the
> header prepares the reviewer for the source code with expectations. Also
> by having it the other way around, the review first focuses on design
> (Is this function signature sane; the docs said it would do X while not
> doing Y, is that sane?) instead of code.

and hence I came up with this patch, as I think we would want to expose such a good feature ('diff.orderFile') even for those who are not looking for it themselves.

Thanks,
Stefan

 .gitorderfile |  6 ++++++
 diff.c        | 11 +++++++++++
 2 files changed, 17 insertions(+)
 create mode 100644 .gitorderfile

diff --git a/.gitorderfile b/.gitorderfile
new file mode 100644
index 00..5131ede927
--- /dev/null
+++ b/.gitorderfile
@@ -0,0 +1,6 @@
+Documentation/*
+t/*
+*.sh
+*.h
+*.c
+Makefile

diff --git a/diff.c b/diff.c
index 00b4c86698..8d537db06a 100644
--- a/diff.c
+++ b/diff.c
@@ -3398,6 +3398,17 @@ void diff_setup(struct diff_options *options)
 	if (diff_indent_heuristic)
 		DIFF_XDL_SET(options, INDENT_HEURISTIC);
 
+	if (!diff_order_file_cfg) {
+		struct stat st;
+		int c = lstat(".gitorderfile", &st);
+		if (c == 0 && S_ISREG(st.st_mode))
+			diff_order_file_cfg = ".gitorderfile";
+		else if (c < 0 && errno == ENOENT)
+			; /* File does not exist. no preset. */
+		else
+			die_errno("stat '.gitorderfile'");
+	}
+
 	options->orderfile = diff_order_file_cfg;
 
 	if (diff_no_prefix) {
-- 
2.13.2.695.g117ddefdb4
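The default this patch wants to introduce can already be tried today with the existing `diff.orderFile` machinery; a small self-contained sketch (file names invented) that mimics the proposed '.gitorderfile' ordering:

```shell
set -e
cd "$(mktemp -d)"
git init -q && git config user.email you@example.com && git config user.name "You"
printf 'int x;\n' > a.c
printf 'extern int x;\n' > a.h
printf 'all:\n' > Makefile
git add . && git commit -qm "initial"

# Touch all three files, then diff with an order file that puts headers
# before sources and the Makefile last, like the proposed .gitorderfile:
echo '/* c */' >> a.c
echo '/* h */' >> a.h
echo '# m'     >> Makefile
printf '*.h\n*.c\nMakefile\n' > order
git -c diff.orderFile=order diff --name-only
```

The last command lists a.h before a.c before Makefile regardless of alphabetical order, which is exactly the reviewer-friendly presentation the RFC argues for.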
[PATCH] git-p4: parse marshal output "p4 -G" in p4 changes
The option -G of p4 (python marshal output) gives more context about the data being output. That's useful when using the command "change -o" as we can distinguish between warning/error lines and the real change description.

Some p4 triggers on the server side generate warnings when executed. Unfortunately those messages are mixed with the output of "p4 change -o". Those extra warning lines are reported as {'code':'info'} in python marshal output (-G). The real change output is reported as {'code':'stat'}.

The function p4CmdList accepts a new argument: skip_info. When set to True it ignores any 'code':'info' entry (skip_info=True by default).

A new test has been created in t9807-git-p4-submit.sh adding a p4 trigger that outputs extra lines with "p4 change -o" and "p4 changes".

Signed-off-by: Miguel Torroja
---
 git-p4.py                | 92
 t/t9807-git-p4-submit.sh | 30
 2 files changed, 92 insertions(+), 30 deletions(-)

diff --git a/git-p4.py b/git-p4.py
index 8d151da91..1facf32db 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -509,7 +509,7 @@ def isModeExec(mode):
 def isModeExecChanged(src_mode, dst_mode):
     return isModeExec(src_mode) != isModeExec(dst_mode)
 
-def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None):
+def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=True):
 
     if isinstance(cmd,basestring):
         cmd = "-G " + cmd
@@ -545,6 +545,9 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None):
     try:
         while True:
             entry = marshal.load(p4.stdout)
+            if skip_info:
+                if 'code' in entry and entry['code'] == 'info':
+                    continue
             if cb is not None:
                 cb(entry)
             else:
@@ -879,8 +882,12 @@ def p4ChangesForPaths(depotPaths, changeRange, requestedBlockSize):
             cmd += ["%s...@%s" % (p, revisionRange)]
 
         # Insert changes in chronological order
-        for line in reversed(p4_read_pipe_lines(cmd)):
-            changes.add(int(line.split(" ")[1]))
+        for entry in reversed(p4CmdList(cmd)):
+            if entry.has_key('p4ExitCode'):
+                die('Error retrieving changes descriptions ({})'.format(entry['p4ExitCode']))
+            if not entry.has_key('change'):
+                continue
+            changes.add(int(entry['change']))
 
         if not block_size:
             break
@@ -1494,7 +1501,7 @@ class P4Submit(Command, P4UserMap):
             c['User'] = newUser
             input = marshal.dumps(c)
 
-            result = p4CmdList("change -f -i", stdin=input)
+            result = p4CmdList("change -f -i", stdin=input, skip_info=False)
             for r in result:
                 if r.has_key('code'):
                     if r['code'] == 'error':
@@ -1526,37 +1533,62 @@ class P4Submit(Command, P4UserMap):
 
         [upstream, settings] = findUpstreamBranchPoint()
 
-        template = ""
+        template = """\
+# A Perforce Change Specification.
+#
+#  Change:      The change number. 'new' on a new changelist.
+#  Date:        The date this specification was last modified.
+#  Client:      The client on which the changelist was created. Read-only.
+#  User:        The user who created the changelist.
+#  Status:      Either 'pending' or 'submitted'. Read-only.
+#  Type:        Either 'public' or 'restricted'. Default is 'public'.
+#  Description: Comments about the changelist. Required.
+#  Jobs:        What opened jobs are to be closed by this changelist.
+#               You may delete jobs from this list. (New changelists only.)
+#  Files:       What opened files from the default changelist are to be added
+#               to this changelist. You may delete files from this list.
+#               (New changelists only.)
+"""
+        files_list = []
         inFilesSection = False
+        change_entry = None
         args = ['change', '-o']
         if changelist:
             args.append(str(changelist))
-
-        for line in p4_read_pipe_lines(args):
-            if line.endswith("\r\n"):
-                line = line[:-2] + "\n"
-            if inFilesSection:
-                if line.startswith("\t"):
-                    # path starts and ends with a tab
-                    path = line[1:]
-                    lastTab = path.rfind("\t")
-                    if lastTab != -1:
-                        path = path[:lastTab]
-                        if settings.has_key('depot-paths'):
-                            if not [p for p in settings['depot-paths']
-                                    if p4PathStartsWith(path, p)]:
-                                continue
-                        else:
-                            if not p4PathStartsWith(path, self.depotPath):
-                                continue
+        for entry in p4CmdList(args):
+            if not entry.has_key('code'):
+                continue
+            if entry['code']
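The filtering this patch adds can be sketched outside of git-p4 like so. This is a Python 3 stand-in (the real git-p4 is Python 2 and reads `p4 -G` output from a subprocess pipe, so keys there are plain str); the function name and the in-memory buffer are invented for the example:

```python
import io
import marshal

def parse_p4_marshal(stream, skip_info=True):
    """Yield entries from a 'p4 -G'-style marshal stream.

    With skip_info=True, server chatter reported as {'code': 'info'}
    is dropped, mirroring the skip_info behaviour added to p4CmdList.
    """
    while True:
        try:
            entry = marshal.load(stream)
        except EOFError:
            # marshal.load signals end of stream with EOFError
            break
        if skip_info and entry.get(b'code') == b'info':
            continue
        yield entry

# Simulate "p4 change -o" output polluted by a chatty server trigger:
buf = io.BytesIO()
marshal.dump({b'code': b'info', b'data': b'trigger warning'}, buf)
marshal.dump({b'code': b'stat', b'Change': b'new'}, buf)
buf.seek(0)

entries = list(parse_p4_marshal(buf))  # only the 'stat' entry survives
```

This is why marshal mode is preferable to line-based parsing here: each record carries its own 'code' tag, so trigger noise can be told apart from the real change spec structurally rather than by guessing at line formats.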
Re: "groups of files" in Git?
For starters, let me say that I consider myself a mere advanced beginner Git user, and I haven`t used Perforce ever before (did some reading now), but still, for what it`s worth, here are my thoughts, please bear with me :) Do feel free to correct me if I miss something.

On 11/07/2017 19:39, Lars Schneider wrote:
>
>> On 11 Jul 2017, at 17:45, Nikolay Shustov wrote:
>>
>> Hi,
>> I have been recently struggling with migrating my development workflow
>> from Perforce to Git, all because of the following thing:
>>
>> I have to work on several features in the same code tree parallel, in
>> the same Perforce workspace. The major reason why I cannot work on one
>> feature then on another is just because I have to make sure that the
>> changes in the related areas of the product play together well.
>>
>> With Perforce, I can have multiple changelists opened, that group the
>> changed files as needed.
>>
>> With Git I cannot seem to find the possibility to figure out how to
>> achieve the same result. And the problem is that putting change sets
>> on different Git branches (or workdirs, or whatever Git offers that
>> makes the changes to be NOT in the same source tree) is not a viable
>> option for me as I would have to re-build code as I re-integrate the
>> changes between the branches (or whatever changes separation Git
>> feature is used).
>> Build takes time and resources and considering that I have to do it on
>> multiple platforms (I do cross-platform development) it really
>> denominates the option of not having multiple changes in the same code
>> tree.
>>
>> Am I ignorant about some Git feature/way of using Git that would help?
>> Is it worth considering adding to Git a feature like "group of files"
>> that would offer some virtual grouping of the locally changed files
>> in the checked-out branch?
>
> Interesting question that came up at my workplace, too.
>
> Here is what I suggested:
> 1. Keep working on a single branch and make commits for all features
> 2. If you make a commit, prefix the commit message with the feature name
> 3. After you are done with a feature create a new feature branch based on
>    your combined feature branch. Use `git rebase -i` [1] to remove all
>    commits that are not relevant for the feature. Alternatively you could
>    cherry pick the relevant commits [2] if this is faster.
>
> I wonder what others think about this solution. Maybe there is a better
> solution that I overlooked?
>
> - Lars
>
> [1] https://robots.thoughtbot.com/git-interactive-rebase-squash-amend-rewriting-history
> [2] http://think-like-a-git.net/sections/rebase-from-the-ground-up/cherry-picking-explained.html

This "single branch, related commits" approach is exactly what came to my mind as well.

But isn`t a Perforce "changelist" an "atomic" group of changes - like "commit" in Git, "changeset" in Team Foundation Version Control, etc...? If so, it would mean that this "multiple pending changelists" flow would/should be translated to "multiple pending commits" in Git, where in the end _a single_ Perforce "changelist" is _a single_ Git "commit".

Might be this is where the confusion is coming from: trying to fit the natural Git "multiple commits per feature" (but "single feature per branch") concept into the "single changelist per feature" Perforce concept, described/required here?

I don`t think there is a firm concept of such "multiple pending commits" in Git, but the point is the author is having multiple changelists/commits, and actively updates them (the _same ones_), until they`re ready to be merged (submitted, in Perforce)... which you can do in Git, and quite easily :)

So... In Git, you can create a separate commit for each changelist you have in Perforce (all these commits being on the same branch, as desired). Then, when you would have wanted to "update" the pending Perforce changelist (not sure what the corresponding command is in Perforce), you would just `git commit` your current state with an additional "--squash" or "--fixup" parameter (depending on whether you would like to add more description to the existing/original commit message, or not), and the original commit SHA1.

In the end, when everything is tested together and you would like to commit features separately (like submitting changelists in Perforce), you would just need to `git rebase -i --autosquash` your branch, where Git would squash all your "update" commits (fixup/squash ones, that is) into the original/initial ones you made as per your changelists/features. No need for manual rearranging, cherry-picking, or whatever.

An example flow, with two "changelists" for two features (I`ll be using capital letters A, B, C... instead of commit SHA1, for simplicity):

	... do some "Feature 1" work...
	$ git commit -m "Feature 1"
	... do some "Feature 2" work...
	$ git commit -m "Feature 2"
	... do some "Feature 1" work...
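The fixup/autosquash flow described above can be run end to end like this (branch, file, and commit names are invented; `GIT_SEQUENCE_EDITOR=:` accepts the generated todo list non-interactively, standing in for the interactive step):

```shell
set -e
cd "$(mktemp -d)"
git init -q && git config user.email you@example.com && git config user.name "You"
echo base > base.txt && git add base.txt && git commit -qm "base"

# One commit per "changelist", all on the same branch:
echo one > f1 && git add f1 && git commit -qm "Feature 1"
echo two > f2 && git add f2 && git commit -qm "Feature 2"

# "Update the pending changelist" for Feature 1 with a fixup commit
# (equivalently: git commit --fixup=<sha-of-Feature-1>):
echo more >> f1 && git add f1 && git commit -qm "fixup! Feature 1"

# When everything is ready, fold the updates back into their features:
GIT_SEQUENCE_EDITOR=: git rebase -i --autosquash HEAD~3
```

After the rebase the history is back to one commit per feature: "Feature 1" now contains both f1 changes, the "fixup!" commit is gone, and "Feature 2" is untouched.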
Re: [PATCH 3/3] grep: recurse in-process using 'struct repository'
On Tue, Jul 11, 2017 at 3:04 PM, Brandon Williams wrote:
> Convert grep to use 'struct repository' which enables recursing into
> submodules to be handled in-process.
>
> Signed-off-by: Brandon Williams
> ---
>  Documentation/git-grep.txt |   7 -
>  builtin/grep.c             | 390 +
>  cache.h                    |   1 -
>  git.c                      |   2 +-
>  grep.c                     |  13 --
>  grep.h                     |   1 -
>  setup.c                    |  12 +-
>  7 files changed, 81 insertions(+), 345 deletions(-)

No real indepth comments here, but it's nice to see how much code reduction this has enabled!

Thanks,
Jake

> diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
> index 5033483db..720c7850e 100644
> --- a/Documentation/git-grep.txt
> +++ b/Documentation/git-grep.txt
> @@ -95,13 +95,6 @@ OPTIONS
>         option the prefix of all submodule output will be the name of
>         the parent project's object.
>
> ---parent-basename ::
> -       For internal use only. In order to produce uniform output with the
> -       --recurse-submodules option, this option can be used to provide the
> -       basename of a parent's object to a submodule so the submodule
> -       can prefix its output with the parent's name rather than the SHA1 of
> -       the submodule.
> -
>  -a::
>  --text::
>         Process binary files as if they were text.
> diff --git a/builtin/grep.c b/builtin/grep.c
> index fa351c49f..0b0a8459e 100644
> --- a/builtin/grep.c
> +++ b/builtin/grep.c
> @@ -28,13 +28,7 @@ static char const * const grep_usage[] = {
>         NULL
>  };
>
> -static const char *super_prefix;
>  static int recurse_submodules;
> -static struct argv_array submodule_options = ARGV_ARRAY_INIT;
> -static const char *parent_basename;
> -
> -static int grep_submodule_launch(struct grep_opt *opt,
> -                                const struct grep_source *gs);
>
>  #define GREP_NUM_THREADS_DEFAULT 8
>  static int num_threads;
> @@ -186,10 +180,7 @@ static void *run(void *arg)
>                         break;
>
>                 opt->output_priv = w;
> -               if (w->source.type == GREP_SOURCE_SUBMODULE)
> -                       hit |= grep_submodule_launch(opt, &w->source);
> -               else
> -                       hit |= grep_source(opt, &w->source);
> +               hit |= grep_source(opt, &w->source);
>                 grep_source_clear_data(&w->source);
>                 work_done(w);
>         }
> @@ -327,21 +318,13 @@ static int grep_oid(struct grep_opt *opt, const struct object_id *oid,
>  {
>         struct strbuf pathbuf = STRBUF_INIT;
>
> -       if (super_prefix) {
> -               strbuf_add(&pathbuf, filename, tree_name_len);
> -               strbuf_addstr(&pathbuf, super_prefix);
> -               strbuf_addstr(&pathbuf, filename + tree_name_len);
> +       if (opt->relative && opt->prefix_length) {
> +               quote_path_relative(filename + tree_name_len, opt->prefix, &pathbuf);
> +               strbuf_insert(&pathbuf, 0, filename, tree_name_len);
>         } else {
>                 strbuf_addstr(&pathbuf, filename);
>         }
>
> -       if (opt->relative && opt->prefix_length) {
> -               char *name = strbuf_detach(&pathbuf, NULL);
> -               quote_path_relative(name + tree_name_len, opt->prefix, &pathbuf);
> -               strbuf_insert(&pathbuf, 0, name, tree_name_len);
> -               free(name);
> -       }
> -
>  #ifndef NO_PTHREADS
>         if (num_threads) {
>                 add_work(opt, GREP_SOURCE_OID, pathbuf.buf, path, oid);
> @@ -366,14 +349,10 @@ static int grep_file(struct grep_opt *opt, const char *filename)
>  {
>         struct strbuf buf = STRBUF_INIT;
>
> -       if (super_prefix)
> -               strbuf_addstr(&buf, super_prefix);
> -       strbuf_addstr(&buf, filename);
> -
>         if (opt->relative && opt->prefix_length) {
> -               char *name = strbuf_detach(&buf, NULL);
> -               quote_path_relative(name, opt->prefix, &buf);
> -               free(name);
> +               quote_path_relative(filename, opt->prefix, &buf);
> +       } else {
> +               strbuf_addstr(&buf, filename);
>         }
>
>  #ifndef NO_PTHREADS
> @@ -421,284 +400,80 @@ static void run_pager(struct grep_opt *opt, const char *prefix)
>                 exit(status);
>  }
>
> -static void compile_submodule_options(const struct grep_opt *opt,
> -                                     const char **argv,
> -                                     int cached, int untracked,
> -                                     int opt_exclude, int use_index,
> -                                     int pattern_type_arg)
> -{
> -       struct grep_pat *pattern;
> -
> -       if (recurse_submodules)
> -               argv_array_push(&submodule_options, "--recurse-submodules");
> -
> -       if (cached)
> -               argv_array_push(&submodule_options, "--cached");
> -       if (!use_index)
> -               argv_array_push(&submodule_options,
Re: [RFC PATCH 3/3] sha1_file: add promised blob hook support
On Tue, Jul 11, 2017 at 12:48 PM, Jonathan Tanwrote: > Teach sha1_file to invoke a hook whenever a blob is requested and > unavailable but is promised. The hook is a shell command that can be > configured through "git config"; this hook takes in a list of hashes and > writes (if successful) the corresponding objects to the repo's local > storage. > > This is meant as a temporary measure to ensure that all Git commands > work in such a situation. Future patches will update some commands to > either tolerate promised blobs (without invoking the hook) or be more > efficient in invoking the promised blob hook. > > In order to determine the code changes in sha1_file.c necessary, I > investigated the following: > (1) functions in sha1_file that take in a hash, without the user > regarding how the object is stored (loose or packed) > (2) functions in sha1_file that operate on packed objects (because I > need to check callers that know about the loose/packed distinction > and operate on both differently, and ensure that they can handle > the concept of objects that are neither loose nor packed) > > (1) is handled by the modification to sha1_object_info_extended(). > > For (2), I looked at for_each_packed_object and at the packed-related > functions that take in a hash. 
For for_each_packed_object, the callers > either already work or are fixed in this patch: > - reachable - only to find recent objects > - builtin/fsck - already knows about promised blobs > - builtin/cat-file - fixed in this commit > > Callers of the other functions do not need to be changed: > - parse_pack_index >- http - indirectly from http_get_info_packs > - find_pack_entry_one >- this searches a single pack that is provided as an argument; the > caller already knows (through other means) that the sought object > is in a specific pack > - find_sha1_pack >- fast-import - appears to be an optimization to not store a > file if it is already in a pack >- http-walker - to search through a struct alt_base >- http-push - to search through remote packs > - has_sha1_pack >- builtin/fsck - already knows about promised blobs >- builtin/count-objects - informational purposes only (check if loose > object is also packed) >- builtin/prune-packed - check if object to be pruned is packed (if > not, don't prune it) >- revision - used to exclude packed objects if requested by user >- diff - just for optimization > > An alternative design that I considered but rejected: > > - Adding a hook whenever a packed blob is requested, not on any blob. >That is, whenever we attempt to search the packfiles for a blob, if >it is missing (from the packfiles and from the loose object storage), >to invoke the hook (which must then store it as a packfile), open the >packfile the hook generated, and report that the blob is found in >that new packfile. This reduces the amount of analysis needed (in >that we only need to look at how packed blobs are handled), but >requires that the hook generate packfiles (or for sha1_file to pack >whatever loose objects are generated), creating one packfile for each >missing blob and potentially very many packfiles that must be >linearly searched. 
>   This may be tolerable now for repos that only have
>   a few missing blobs (for example, repos that only want to exclude
>   large blobs), and might be tolerable in the future if we have
>   batching support for the most commonly used commands, but is not
>   tolerable now for repos that exclude a large amount of blobs.
>
> Signed-off-by: Jonathan Tan
>
> +core.promisedBlobCommand::
> +	If set, whenever a blob in the local repo is attempted to be read, but
> +	is both missing and a promised blob, invoke this shell command to
> +	generate or obtain that blob before reporting an error. This shell
> +	command should take one or more hashes, each terminated by a newline,
> +	as standard input, and (if successful) should write the corresponding
> +	objects to the local repo (packed or loose).

Any git command invoked here behaves differently because GIT_IGNORE_PROMISED_BLOBS is set. GIT_IGNORE_PROMISED_BLOBS needs documentation.

> +int request_promised_blobs(const struct oid_array *blobs)

> +	if (i == blobs->nr)
> +		/* Nothing to fetch */

s/to fetch/to get/? Might be too nitpicky. So this function takes an array of blobs and obtains any that are promised.

> +
> +	argv[0] = promised_blob_command;
> +	cp.argv = argv;
> +	cp.env = env;

I had the impression that we'd favor .args and .env_array nowadays, such that we'd not need to construct the NULL-terminated array ourselves (e.g. whenever we'd change argv, we'd need to make sure it contains enough NULLs at the end):

  argv_array_push(&cp.args, promised_blob_command);
  argv_array_push(&cp.env_array,
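To make the stdin protocol of the proposed core.promisedBlobCommand concrete, here is a minimal, hypothetical hook sketched in Python. Only the input format (one hash per newline-terminated line) comes from the quoted documentation; the parsing helper and the fetch step named in the comment are illustrative assumptions, not part of the patch:

```python
def read_requested_hashes(stream):
    """Parse the hook's standard input as described for
    core.promisedBlobCommand: one or more object hashes, each
    terminated by a newline."""
    return [line.strip() for line in stream if line.strip()]

# In a real hook 'stream' would be sys.stdin; for each hash the hook
# would then obtain the object and write it into the local repo
# (packed or loose), e.g. via some server-specific fetch command
# (hypothetical -- the patch does not prescribe how objects are obtained).
demo = read_requested_hashes(["abc123\n", "def456\n"])
print(demo)
```

The hook only has to honor the stdin contract; how it actually retrieves the objects is left entirely to the repository's configuration.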
Re: [PATCH] git-p4: parse marshal output "p4 -G" in p4 changes
Hi Luke,

My bad, as I didn't check that case. It was p4CmdList as you said: the default value of the new field skip_info (set to True) ignores any info messages, while the script is waiting for a valid message. If I set it to False, then it does return an info entry and it accepts the submit change. I'm sending another patch update.

On Tue, Jul 11, 2017 at 10:35 AM, Luke Diamand wrote:
> On 3 July 2017 at 23:57, Miguel Torroja wrote:
>> The option -G of p4 (python marshal output) gives more context about the
>> data being output. That's useful when using the command "change -o" as
>> we can distinguish between warning/error line and real change description.
>>
>> Some p4 triggers in the server side generate some warnings when
>> executed. Unfortunately those messages are mixed with the output of
>> "p4 change -o". Those extra warning lines are reported as {'code':'info'}
>> in python marshal output (-G). The real change output is reported as
>> {'code':'stat'}
>>
>> the function p4CmdList accepts a new argument: skip_info. When set to
>> True it ignores any 'code':'info' entry (skip_info=True by default).
>>
>> A new test has been created to t9807-git-p4-submit.sh adding a p4 trigger
>> that outputs extra lines with "p4 change -o" and "p4 changes"
>
> The latest version of mt/p4-parse-G-output (09521c7a0) seems to break
> t9813-git-p4-preserve-users.sh.
>
> I don't quite know why, but I wonder if it's the change to p4CmdList() ?
> > Luke > >> >> Signed-off-by: Miguel Torroja >> --- >> git-p4.py| 90 >> >> t/t9807-git-p4-submit.sh | 30 >> 2 files changed, 91 insertions(+), 29 deletions(-) >> >> diff --git a/git-p4.py b/git-p4.py >> index 8d151da91..a262e3253 100755 >> --- a/git-p4.py >> +++ b/git-p4.py >> @@ -509,7 +509,7 @@ def isModeExec(mode): >> def isModeExecChanged(src_mode, dst_mode): >> return isModeExec(src_mode) != isModeExec(dst_mode) >> >> -def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None): >> +def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=True): >> >> if isinstance(cmd,basestring): >> cmd = "-G " + cmd >> @@ -545,6 +545,9 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', >> cb=None): >> try: >> while True: >> entry = marshal.load(p4.stdout) >> +if skip_info: >> +if 'code' in entry and entry['code'] == 'info': >> +continue >> if cb is not None: >> cb(entry) >> else: >> @@ -879,8 +882,12 @@ def p4ChangesForPaths(depotPaths, changeRange, >> requestedBlockSize): >> cmd += ["%s...@%s" % (p, revisionRange)] >> >> # Insert changes in chronological order >> -for line in reversed(p4_read_pipe_lines(cmd)): >> -changes.add(int(line.split(" ")[1])) >> +for entry in reversed(p4CmdList(cmd)): >> +if entry.has_key('p4ExitCode'): >> +die('Error retrieving changes descriptions >> ({})'.format(entry['p4ExitCode'])) >> +if not entry.has_key('change'): >> +continue >> +changes.add(int(entry['change'])) >> >> if not block_size: >> break >> @@ -1526,37 +1533,62 @@ class P4Submit(Command, P4UserMap): >> >> [upstream, settings] = findUpstreamBranchPoint() >> >> -template = "" >> +template = """\ >> +# A Perforce Change Specification. >> +# >> +# Change: The change number. 'new' on a new changelist. >> +# Date:The date this specification was last modified. >> +# Client: The client on which the changelist was created. Read-only. >> +# User:The user who created the changelist. >> +# Status: Either 'pending' or 'submitted'. Read-only. 
>> +# Type:Either 'public' or 'restricted'. Default is 'public'. >> +# Description: Comments about the changelist. Required. >> +# Jobs:What opened jobs are to be closed by this changelist. >> +# You may delete jobs from this list. (New changelists only.) >> +# Files: What opened files from the default changelist are to be >> added >> +# to this changelist. You may delete files from this list. >> +# (New changelists only.) >> +""" >> +files_list = [] >> inFilesSection = False >> +change_entry = None >> args = ['change', '-o'] >> if changelist: >> args.append(str(changelist)) >> - >> -for line in p4_read_pipe_lines(args): >> -if line.endswith("\r\n"): >> -line = line[:-2] + "\n" >> -if inFilesSection: >> -if line.startswith("\t"): >> -# path starts and ends with a tab >> -path = line[1:] >> -lastTab = path.rfind("\t") >> -
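The skip_info idea being discussed in this thread can be sketched in standalone Python. This is an illustration of filtering marshalled "p4 -G" records, not git-p4's actual p4CmdList (which additionally manages the p4 subprocess, and under Python 2 would see byte-string keys):

```python
import io
import marshal

def parse_p4_entries(stream, skip_info=True):
    """Read the marshalled dictionaries that 'p4 -G' emits, optionally
    dropping {'code': 'info'} records (e.g. server trigger warnings)
    so that only real results such as {'code': 'stat'} remain -- the
    same idea as the skip_info argument added to p4CmdList."""
    entries = []
    while True:
        try:
            entry = marshal.load(stream)  # raises EOFError at end of stream
        except EOFError:
            break
        if skip_info and entry.get('code') == 'info':
            continue
        entries.append(entry)
    return entries

# Simulated 'p4 -G change -o' output: a trigger warning followed by
# the real change spec.
raw = marshal.dumps({'code': 'info', 'data': 'trigger warning'}) + \
      marshal.dumps({'code': 'stat', 'Change': 'new'})
print(parse_p4_entries(io.BytesIO(raw)))
```

With skip_info=False the caller sees both records, which is how a caller such as p4ChangesForPaths can still distinguish real 'change' entries from noise.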
Re: "groups of files" in Git?
Nikolay Shustov wrote:
> With Perforce, I can have multiple changelists opened, that group the
> changed files as needed.
>
> With Git I cannot seem to finding the possibility to figure out how to
> achieve the same result. And the problem is that putting change sets
> on different Git branches (or workdirs, or whatever Git offers that
> makes the changes to be NOT in the same source tree) is not a viable
> option from me as I would have to re-build code as I re-integrate the
> changes between the branches (or whatever changes separation Git
> feature is used).
> Build takes time and resources and considering that I have to do it on
> multiple platforms (I do cross-platform development) it really
> denominates the option of not having multiple changes in the same code
> tree.
>
> Am I ignorant about some Git feature/way of using Git that would help?
> Is it worth considering adding to Git a feature like "group of files"
> that would offer some virtutal grouping of the locally changed files
> in the checked-out branch?

I never used Perforce and I'm not even sure I understand your problem, but I thought I'd mention something that nobody else seems to have yet (unless I missed it):

First, one thing that seems obvious to me from your description is that these "parallel features" you work on are obviously interdependent, therefore I would rather consider the whole thing as a single feature. Therefore, it makes sense to me to work in a single "topic branch".

This doesn't preclude one from separating the changes in logically sensible pieces. Indeed this is par for the course in Git and people do it all the time by dividing the bulk of changes into a carefully chosen series of commits. I think the most common way of doing this is to simply work on the whole thing and once you're happy with it you use "git rebase --interactive" in order to "prettify" your history.
But, and here comes the part I think nobody mentioned yet, if your feature work is considerably large or spans a considerably long time it may be undesirable to postpone all that work until the very end (perhaps by then you already forgot important information, or perhaps too many changes have accumulated so reviewing them all becomes significantly less efficient). In that case, one solution is to use a "patch management system" which will let you do that work incrementally (going back and forth as needed). If you know mercurial, this is "hg mq". I don't think Git has any such system built-in, but I know there are at least these external tools that integrate with Git: https://git.wiki.kernel.org/index.php/Interfaces,_frontends,_and_tools#Patch-management_Interface_layers Feel free to ignore this if I totally misunderstood your use case. Cheers.
Re: [RFC PATCH 2/3] sha1-array: support appending unsigned char hash
On Tue, Jul 11, 2017 at 12:48 PM, Jonathan Tan wrote:
> In a subsequent patch, sha1_file will need to append object names in the
> form of "unsigned char *" to oid arrays. Teach sha1-array support for
> that.
>
> Signed-off-by: Jonathan Tan

This breaks the oid/sha1 barrier? I would have expected the caller to do a

  oid_array_append_oid(, sha1_to_oid(sha1));

with sha1_to_oid working off some static memory, such that the call to oid_array_append_oid (which we have as oid_array_append) is just as cheap?

Could you say more in the commit message on why we conflate sha1s and oids here?

Thanks,
Stefan
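The call-site pattern Stefan suggests (convert the raw hash first, keep the array oid-only) can be illustrated with a rough Python sketch. The names mirror the C functions under discussion, but the types are deliberately simplified and this is not the proposed implementation:

```python
import binascii

class OidArray:
    """Stands in for 'struct oid_array': it stores object names only
    in their oid form and never sees raw sha1 buffers."""
    def __init__(self):
        self.oids = []

    def append(self, oid):  # corresponds to oid_array_append
        self.oids.append(oid)

def sha1_to_oid(sha1_bytes):
    """Wrap a raw 20-byte hash as an oid (here simply its hex name;
    in C this would fill a 'struct object_id' from the unsigned
    char pointer)."""
    return binascii.hexlify(sha1_bytes).decode()

# The suggested pattern: convert at the call site, then append, so
# the array API stays on one side of the oid/sha1 barrier.
arr = OidArray()
arr.append(sha1_to_oid(b'\x00' * 20))
print(arr.oids)
```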
[PATCH 1/3] repo_read_index: don't discard the index
Have 'repo_read_index()' behave more like the other read_index family of functions and don't discard the index if it has already been populated.

Signed-off-by: Brandon Williams
---
 repository.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/repository.c b/repository.c
index edca90740..8e60af1d5 100644
--- a/repository.c
+++ b/repository.c
@@ -235,8 +235,6 @@ int repo_read_index(struct repository *repo)
 {
 	if (!repo->index)
 		repo->index = xcalloc(1, sizeof(*repo->index));
-	else
-		discard_index(repo->index);
 
 	return read_index_from(repo->index, repo->index_file);
 }
-- 
2.13.2.932.g7449e964c-goog
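The net effect of the patch can be shown with a small sketch (illustrative Python, not git's C code): after the change, a second read reuses the already-populated in-core index instead of discarding and re-reading it:

```python
class Repo:
    """Sketch of repo_read_index() after the patch: allocate the
    in-core index on first use, and keep it if already populated.
    Before the patch, a populated index was discarded and re-read
    from disk on every call."""
    def __init__(self, index_file):
        self.index_file = index_file
        self.index = None
        self.disk_reads = 0

    def read_index(self):
        if self.index is None:
            # stand-in for read_index_from(repo->index, repo->index_file)
            self.index = {'entries': []}
            self.disk_reads += 1
        return self.index

repo = Repo('.git/index')
first = repo.read_index()
second = repo.read_index()
print(first is second, repo.disk_reads)
```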
[PATCH 3/3] grep: recurse in-process using 'struct repository'
Convert grep to use 'struct repository' which enables recursing into submodules to be handled in-process. Signed-off-by: Brandon Williams--- Documentation/git-grep.txt | 7 - builtin/grep.c | 390 + cache.h| 1 - git.c | 2 +- grep.c | 13 -- grep.h | 1 - setup.c| 12 +- 7 files changed, 81 insertions(+), 345 deletions(-) diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt index 5033483db..720c7850e 100644 --- a/Documentation/git-grep.txt +++ b/Documentation/git-grep.txt @@ -95,13 +95,6 @@ OPTIONS option the prefix of all submodule output will be the name of the parent project's object. ---parent-basename :: - For internal use only. In order to produce uniform output with the - --recurse-submodules option, this option can be used to provide the - basename of a parent's object to a submodule so the submodule - can prefix its output with the parent's name rather than the SHA1 of - the submodule. - -a:: --text:: Process binary files as if they were text. diff --git a/builtin/grep.c b/builtin/grep.c index fa351c49f..0b0a8459e 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -28,13 +28,7 @@ static char const * const grep_usage[] = { NULL }; -static const char *super_prefix; static int recurse_submodules; -static struct argv_array submodule_options = ARGV_ARRAY_INIT; -static const char *parent_basename; - -static int grep_submodule_launch(struct grep_opt *opt, -const struct grep_source *gs); #define GREP_NUM_THREADS_DEFAULT 8 static int num_threads; @@ -186,10 +180,7 @@ static void *run(void *arg) break; opt->output_priv = w; - if (w->source.type == GREP_SOURCE_SUBMODULE) - hit |= grep_submodule_launch(opt, >source); - else - hit |= grep_source(opt, >source); + hit |= grep_source(opt, >source); grep_source_clear_data(>source); work_done(w); } @@ -327,21 +318,13 @@ static int grep_oid(struct grep_opt *opt, const struct object_id *oid, { struct strbuf pathbuf = STRBUF_INIT; - if (super_prefix) { - strbuf_add(, filename, tree_name_len); - strbuf_addstr(, 
super_prefix); - strbuf_addstr(, filename + tree_name_len); + if (opt->relative && opt->prefix_length) { + quote_path_relative(filename + tree_name_len, opt->prefix, ); + strbuf_insert(, 0, filename, tree_name_len); } else { strbuf_addstr(, filename); } - if (opt->relative && opt->prefix_length) { - char *name = strbuf_detach(, NULL); - quote_path_relative(name + tree_name_len, opt->prefix, ); - strbuf_insert(, 0, name, tree_name_len); - free(name); - } - #ifndef NO_PTHREADS if (num_threads) { add_work(opt, GREP_SOURCE_OID, pathbuf.buf, path, oid); @@ -366,14 +349,10 @@ static int grep_file(struct grep_opt *opt, const char *filename) { struct strbuf buf = STRBUF_INIT; - if (super_prefix) - strbuf_addstr(, super_prefix); - strbuf_addstr(, filename); - if (opt->relative && opt->prefix_length) { - char *name = strbuf_detach(, NULL); - quote_path_relative(name, opt->prefix, ); - free(name); + quote_path_relative(filename, opt->prefix, ); + } else { + strbuf_addstr(, filename); } #ifndef NO_PTHREADS @@ -421,284 +400,80 @@ static void run_pager(struct grep_opt *opt, const char *prefix) exit(status); } -static void compile_submodule_options(const struct grep_opt *opt, - const char **argv, - int cached, int untracked, - int opt_exclude, int use_index, - int pattern_type_arg) -{ - struct grep_pat *pattern; - - if (recurse_submodules) - argv_array_push(_options, "--recurse-submodules"); - - if (cached) - argv_array_push(_options, "--cached"); - if (!use_index) - argv_array_push(_options, "--no-index"); - if (untracked) - argv_array_push(_options, "--untracked"); - if (opt_exclude > 0) - argv_array_push(_options, "--exclude-standard"); - - if (opt->invert) - argv_array_push(_options, "-v"); - if (opt->ignore_case) - argv_array_push(_options, "-i"); - if (opt->word_regexp) - argv_array_push(_options, "-w"); - switch
[PATCH 2/3] setup: have the_repository use the_index
Have the index state which is stored in 'the_repository' be a pointer to the in-core index 'the_index'. This makes it easier to begin transitioning more parts of the code base to operate on a 'struct repository'.

Signed-off-by: Brandon Williams
---
 setup.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/setup.c b/setup.c
index 860507e1f..b370bf3c1 100644
--- a/setup.c
+++ b/setup.c
@@ -1123,6 +1123,7 @@ const char *setup_git_directory_gently(int *nongit_ok)
 			setup_git_env();
 		}
 	}
+	the_repository->index = &the_index;
 
 	strbuf_release();
 	strbuf_release();
-- 
2.13.2.932.g7449e964c-goog
[PATCH 0/3] Convert grep to recurse in-process
This series utilizes the new 'struct repository' in order to convert grep to be able to recurse into submodules in-process, much like how ls-files was converted to recurse in-process. The result is a much smaller code footprint due to not needing to compile an argv array of options to be used when launching a process for operating on a submodule.

Brandon Williams (3):
  repo_read_index: don't discard the index
  setup: have the_repository use the_index
  grep: recurse in-process using 'struct repository'

 Documentation/git-grep.txt |   7 -
 builtin/grep.c             | 390 +
 cache.h                    |   1 -
 git.c                      |   2 +-
 grep.c                     |  13 --
 grep.h                     |   1 -
 repository.c               |   2 -
 setup.c                    |  13 +-
 8 files changed, 82 insertions(+), 347 deletions(-)

-- 
2.13.2.932.g7449e964c-goog
Re: [RFC PATCH 1/3] promised-blob, fsck: introduce promised blobs
On Tue, Jul 11, 2017 at 12:48 PM, Jonathan Tanwrote: > Currently, Git does not support repos with very large numbers of blobs > or repos that wish to minimize manipulation of certain blobs (for > example, because they are very large) very well, even if the user > operates mostly on part of the repo, because Git is designed on the > assumption that every blob referenced by a tree object is available > somewhere in the repo storage. > > As a first step to reducing this problem, introduce the concept of > promised blobs. Each Git repo can contain a list of promised blobs and > their sizes at $GIT_DIR/objects/promisedblob. This patch contains > functions to query them; functions for creating and modifying that file > will be introduced in later patches. > > A repository that is missing a blob but has that blob promised is not > considered to be in error, so also teach fsck this. Here I wondered what this file looks like, in a later patch you add documentation: +objects/promisedblob:: + This file records the sha1 object names and sizes of promised + blobs. + but that leaves more fundamental questions: * Is that a list of sha1s, separated by LF? (CRLF? How would Windows users interact with it, are they supposed to ever modify this file?) * Similar to .git/packed-refs, would we want to have a first line that has some sort of comment? * In the first question I assumed a linear list, will that be a sorted list, or do we want to have some fancy data structure here? (critbit tree, bloom filter) * is the contents in ASCII or binary? (What are the separators?) * In the future I presume we'd want to quickly answer "Is X in the promised blobs list?" so would it be possible (in the future) to improve the searching e.g. binary search? * ... I'll read on to see my questions answered, but I'd guess others would prefer to see it in the docs. :) > + * A mmap-ed byte array of size (missing_blob_nr * ENTRY_SIZE). 
Each > + * ENTRY_SIZE-sized entry consists of the SHA-1 of the promised blob and its > + * 64-bit size in network byte order. The entries are sorted in ascending > SHA-1 > + * order. This seems to be the answer to the docs. :) So binary representation, the size variable takes a fixed amount, such that the n-th element in the file is at n * ENTRY_SIZE. > + if (st.st_size % ENTRY_SIZE) { > + die("Size of %s is not a multiple of %d", filename, > ENTRY_SIZE); > + } So it looks as if the file format is an array of ENTRY_SIZE. Similar to other files, would we want to prefix the file with a 4 letter magic number and a version number? > + prepare_promised_blobs(); > + result = sha1_entry_pos(promised_blobs, ENTRY_SIZE, 0, 0, > + promised_blob_nr, promised_blob_nr, > oid->hash); IIRC, this is a binary search. > +int for_each_promised_blob(each_promised_blob_fn cb, void *data) > +{ > + struct object_id oid; > + int i, r; > + > + prepare_promised_blobs(); > + for (i = 0; i < promised_blob_nr; i++) { > + memcpy(oid.hash, _blobs[i * ENTRY_SIZE], > + GIT_SHA1_RAWSZ); > + r = cb(, data); > + if (r) > + return r; > + } > + return 0; > +} > diff --git a/promised-blob.h b/promised-blob.h > new file mode 100644 > index 0..a303ea1ff > --- /dev/null > +++ b/promised-blob.h > @@ -0,0 +1,14 @@ > +#ifndef PROMISED_BLOB_H > +#define PROMISED_BLOB_H > + > +/* > + * Returns 1 if oid is the name of a promised blob. If size is not NULL, also > + * returns its size. > + */ > +extern int is_promised_blob(const struct object_id *oid, > + unsigned long *size); > + > +typedef int each_promised_blob_fn(const struct object_id *oid, void *data); > +int for_each_promised_blob(each_promised_blob_fn, void *); > + > +#endif > diff --git a/t/t3907-promised-blob.sh b/t/t3907-promised-blob.sh > new file mode 100755 > index 0..827072004 > --- /dev/null > +++ b/t/t3907-promised-blob.sh > @@ -0,0 +1,29 @@ > +#!/bin/sh > + > +test_description='promised blobs' > + > +. 
./test-lib.sh > + > +test_expect_success 'fsck fails on missing blobs' ' > + rm -rf repo && We should not need to delete the repo here, this is the first test? > + rm repo/.git/objects/$(echo $HASH | cut -c1-2)/$(echo $HASH | cut > -c3-40) && Later down the road, do we want to have a (plumbing) command to move an object from standard blob to promised blob (as of now I'd think it would perform this rm call as well as an insert into the promised blobs file) ? (Well, we'd also have to think about how to get objects out of a pack) With such a command you can easily write your own custom filter to free up blobs again. > + test_must_fail git -C repo fsck test_i18n_grep missing out ? maybe, too? (Maybe that is already tested elsewhere, so no need for it) > +' > + >
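Putting the quoted answers together, the on-disk format under discussion can be modelled in a few lines of Python. This is a sketch of the format described in the quoted comment (a flat array of fixed-size entries: raw 20-byte SHA-1 plus 64-bit size in network byte order, sorted ascending) with a binary search like sha1_entry_pos — not git's implementation:

```python
import struct

ENTRY_SIZE = 20 + 8  # raw SHA-1 + 64-bit size in network byte order

def build_promisedblob_file(sizes_by_sha1):
    """Serialize {raw_sha1: size} into the flat array the patch
    describes: fixed-size entries, ascending SHA-1 order."""
    return b''.join(sha1 + struct.pack('>Q', size)
                    for sha1, size in sorted(sizes_by_sha1.items()))

def promised_blob_size(data, sha1):
    """Binary search over the (mmap-ed, in C) array; returns the
    promised size, or None if the blob is not listed."""
    lo, hi = 0, len(data) // ENTRY_SIZE
    while lo < hi:
        mid = (lo + hi) // 2
        entry = data[mid * ENTRY_SIZE:(mid + 1) * ENTRY_SIZE]
        if entry[:20] == sha1:
            return struct.unpack('>Q', entry[20:])[0]
        if entry[:20] < sha1:
            lo = mid + 1
        else:
            hi = mid
    return None

data = build_promisedblob_file({b'\x01' * 20: 123, b'\xff' * 20: 7})
print(promised_blob_size(data, b'\xff' * 20))
```

The fixed ENTRY_SIZE is what makes both the `st.st_size % ENTRY_SIZE` sanity check and O(log n) lookup possible; Stefan's open questions (magic number, version header, fancier structures) would change this layout.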
Re: [RFC/WIP PATCH] object store classification
Stefan Beller writes:

>> At the implementation level, it should have a
>> hashtable (lazily populated) for all the objects in a single
>> $GIT_OBJECT_DIRECTORY, grafts/replace info, and a set of pointers to
>> other object-store instances that are its alternate object stores.
>
> So one repository has one or more object stores?

One repository foo/.git/ has one foo/.git/objects/ directory, so it has its own single object store. That object store may refer to another object store by having foo/.git/objects/info/alternates.

Similarly, foo/.git/objects/info/grafts and foo/.git/refs/replace/ would belong to the single object store repository foo/.git/ has.

> I would expect that most of the time the question from above
> "give me info on the object I can refer to with this object name"
> is asked with the additional information: "and I know it is in this
> repository", so we rather want to have
>
> lookup_object(struct *repo, char *name);
>
> instead of
>
> lookup_object(struct *object_store, char *name);

Absolutely. That is why repository has its own single object_store, which may refer to other object_stores.
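The shape being converged on here — each store owning one object directory's (lazily populated) table plus pointers to its alternates, with callers asking the repository — can be sketched as follows. This is an illustrative Python model of the design discussion, not a commitment to any C structure layout:

```python
class ObjectStore:
    """One per $GIT_OBJECT_DIRECTORY: its own object table plus
    pointers to the stores named in objects/info/alternates."""
    def __init__(self, objects=None, alternates=()):
        self.objects = dict(objects or {})   # object name -> object info
        self.alternates = list(alternates)

    def lookup(self, name):
        if name in self.objects:
            return self.objects[name]
        # fall through to alternate object stores, recursively
        for alt in self.alternates:
            hit = alt.lookup(name)
            if hit is not None:
                return hit
        return None

class Repository:
    """A repository owns exactly one object store of its own."""
    def __init__(self, object_store):
        self.object_store = object_store

def lookup_object(repo, name):
    """The caller-facing form argued for above: callers name the
    repository and need not care whether the answer came from the
    main store or an alternate."""
    return repo.object_store.lookup(name)

shared = ObjectStore({'abc123': 'blob'})
repo = Repository(ObjectStore(alternates=[shared]))
print(lookup_object(repo, 'abc123'))
```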
Re: [PATCH] commit & merge: modularize the empty message validator
Kaartic Sivaraamwrites: > In the context of "git merge" the meaning of an "empty message" > is one that contains no line of text. This is not in line with > "git commit" where an "empty message" is one that contains only > whitespaces and/or signed-off-by lines. This could cause surprises > to users who are accustomed to the meaning of an "empty message" > of "git commit". > > Prevent such surprises by ensuring the meaning of an empty 'merge > message' to be in line with that of an empty 'commit message'. This > is done by separating the empty message validator from 'commit' and > making it stand-alone. > > Signed-off-by: Kaartic Sivaraam > --- > I have made an attempt to solve the issue by separating the concerned > function as I found no reason against it. > > I've tried to name them with what felt appropriate and concise to me. > Let me know if it's alright. I probably would have avoided a pair of new files just to house a single function. I anticipate that the last helper function in commit.c at the top-level would become relevant to this topic, and because of that, I would have added this function at the end of the file if I were doing this patch. > @@ -772,7 +773,7 @@ static void prepare_to_commit(struct commit_list > *remoteheads) > } > read_merge_msg(); > strbuf_stripspace(, 0 < option_edit); > - if (!msg.len) > + if (!msg.len || message_is_empty(, 0)) I do not see much point in checking !msg.len here. The function immediately returns by hitting the termination condition of the outermost loop and this is not a performance-critical codepath. I think the "validation" done with the rest_is_empty() is somewhat bogus. Why should we reject a commit without a message and a trailer block with only signed-off-by lines, while accepting a commit without a message and a trailer block as long as the trailer block has something equally meaningless by itself, like "Helped-by:"? 
I think we should inspect the proposed commit log message taken from the editor, find its tail ignoring the trailing comment using ignore_non_trailer, and further separate the result into (, , ) using the same logic used by the interpret-trailers tool, and then complain when turns out to be empty, to be truly useful and consistent. And for that eventual future, merging the logic used in commit and merge might be a good first step. Having said all that, I am not sure "Prevent such surprises" is a problem that is realistic to begin with. When a user sees the editor buffer in "git merge", it is pre-populated with at least a single line of message "Merge branch 'foo'", possibly followed by the summary of the side branch being merged, so unless the user deliberately removes everything and then add a sign-off line (because we do not usually add one), there is no room for "such surprises" in the first place. It does not _hurt_ to diagnose such a crazy case, but it feels a bit lower priority. So from the point of "let's improve what merge does", this change looks to me a borderline "Meh"; but to improve the "why sign-off is so special and behave differently from helped-by when deciding if there is any log?" situation, having a separate helper function that is shared across multiple codepaths that accept edited result may be a good idea.
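The distinction being debated above can be sketched in Python. This is a rough model of what "git commit" treats as an empty message (blank lines, comment lines, and Signed-off-by trailers do not count as content) — an illustration of the behaviour, not the actual rest_is_empty() C code:

```python
def message_is_empty(msg, comment_char='#'):
    """Return True if the buffer contains no 'real' message: only
    whitespace, comment lines, and Signed-off-by trailers."""
    for line in msg.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(comment_char):
            continue
        if stripped.lower().startswith('signed-off-by:'):
            continue
        return False  # some actual content found
    return True

# A merge buffer carrying only a sign-off is "empty" under this rule,
# matching 'git commit'; a bare line-count check would accept it.
print(message_is_empty("\n# comment\nSigned-off-by: A U Thor <a@example.com>\n"))
```

Note how this also exhibits the inconsistency Junio points out: a lone "Helped-by:" line would count as content and be accepted, while a lone "Signed-off-by:" would not.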
Re: "groups of files" in Git?
> On Tue, Jul 11, 2017 at 1:39 PM, Lars Schneider >wrote: >> >>> On 11 Jul 2017, at 17:45, Nikolay Shustov wrote: >>> >>> Hi, >>> I have been recently struggling with migrating my development workflow >>> from Perforce to Git, all because of the following thing: >>> >>> I have to work on several features in the same code tree parallel, in >>> the same Perforce workspace. The major reason why I cannot work on one >>> feature then on another is just because I have to make sure that the >>> changes in the related areas of the product play together well. >>> >>> With Perforce, I can have multiple changelists opened, that group the >>> changed files as needed. >>> >>> With Git I cannot seem to finding the possibility to figure out how to >>> achieve the same result. And the problem is that putting change sets >>> on different Git branches (or workdirs, or whatever Git offers that >>> makes the changes to be NOT in the same source tree) is not a viable >>> option from me as I would have to re-build code as I re-integrate the >>> changes between the branches (or whatever changes separation Git >>> feature is used). >>> Build takes time and resources and considering that I have to do it on >>> multiple platforms (I do cross-platform development) it really >>> denominates the option of not having multiple changes in the same code >>> tree. >>> >>> Am I ignorant about some Git feature/way of using Git that would help? >>> Is it worth considering adding to Git a feature like "group of files" >>> that would offer some virtutal grouping of the locally changed files >>> in the checked-out branch? >> >> Interesting question that came up at my workplace, too. >> >> Here is what I suggested: >> 1. Keep working on a single branch and make commits for all features >> 2. If you make a commit, prefix the commit message with the feature name >> 3. After you are done with a feature create a new feature branch based on >> your combined feature branch. 
Use `git rebase -i` [1] to remove all >> commits that are not relevant for the feature. Alternatively you could >> cherry pick the relevant commits [2] if this is faster. >> >> I wonder what others think about this solution. Maybe there is a better >> solution that I overlooked? >> >> - Lars >> >> [1] >> https://robots.thoughtbot.com/git-interactive-rebase-squash-amend-rewriting-history >> [2] >> http://think-like-a-git.net/sections/rebase-from-the-ground-up/cherry-picking-explained.html >> > On 11 Jul 2017, at 19:54, Nikolay Shustov wrote: > > Thank you for the idea, however I am having troubles with basically > maintaining the uncommitted groups of files: I would prefer the clear > distinction that "those files belong to feature A" and "these files > belong to feature B", before I commit anything. Committing separately > every change for feature A and for feature B would probably a good > option unless I have many changes and then cherry-picking the proper > commits to create a single changeset for the integration would become > a nightmare. I see. Why so complicated with gitattributes then? How about this: Let's say you start working on featureX that affects file1 and file2 and featureY that affects file8 and file9 1. Create aliases to add the files: $ git config --local alias.featx 'add file1 file2' $ git config --local alias.featy 'add file8 file9' 2. Work on the features. Whenever you have something ready for featureX run this: $ git featx $ git commit Whenever you have something ready for featureY run this: $ git featy $ git commit Wouldn't that work? - Lars
Re: "groups of files" in Git?
Hi,
I will choose a bit of a less diplomatic path here. Instead of trying
to tell you how you can make git fit your needs, I would say that you
shouldn't. I have two arguments:

1. It's always painful when you try to use a tool in some way it's not
intended or used to work. If you're doing something different from
everyone else using that tool, you're probably doing something wrong! I
doubt that your case is so special, so my suggestion is to either use
git the way most people use it, with one branch for each feature, or
not use git at all, since perforce seems to fit your workstyle better.

2. Git is a snapshot-based SCM system. That means that each commit
uniquely identifies a version of the code. With your system (as well as
any time you're not committing all changed files) the commit is never
tested. You have no way of actually knowing whether your two changes
are dependent or not. Of course you can guess, but it's still a guess,
and in your current way of working with perforce you have no way of
knowing whether your changesets have a dependency between each other,
since you never test them individually.

Please let me know if you feel that I've missed something. I can see
four solutions:

1. Have each feature in a commit and simply run your tests every few
commits, so you don't have to run them for each commit.
2. Improve your build and test time. I'm sure there are things here to
improve.
3. Continue to use perforce. If I recall correctly, perforce even has a
git integration today.
4. Use integration branches in git and run the tests on that branch.
This can be easy to do if you write some scripts for it.

Good luck!
--
Fredrik Gustafsson
phone: +46 733-608274
e-mail: iv...@iveqy.com
website: http://www.iveqy.com
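Option 4 above (an integration branch combining the per-feature branches for testing) can be sketched in a few lines of shell. Everything here is illustrative: the branch names `feature/backend` and `feature/frontend`, the file names, and the final `ls` standing in for a real test suite are all made up for the demo:

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"

# Demo repo with two hypothetical feature branches.
git init -q .
git config user.email you@example.com
git config user.name "You"
git commit -q --allow-empty -m "initial"
base=$(git symbolic-ref --short HEAD)   # remember the starting branch

git branch feature/backend
git branch feature/frontend

git checkout -q feature/backend
echo b >backend.txt && git add backend.txt && git commit -qm "backend work"

git checkout -q feature/frontend
echo f >frontend.txt && git add frontend.txt && git commit -qm "frontend work"

# Throwaway integration branch: merge every feature branch, then test.
git checkout -qB integration "$base"
for branch in $(git for-each-ref --format='%(refname:short)' refs/heads/feature/); do
	git merge -q --no-edit "$branch"
done

# Both features are now combined in one tree; run your tests here.
ls backend.txt frontend.txt
```

The integration branch is disposable: it can be recreated from scratch whenever any feature branch changes, and the individual branches remain individually testable, which addresses argument 2 above.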
Re: [RFC/WIP PATCH] object store classification
On Tue, Jul 11, 2017 at 11:46 AM, Junio C Hamano wrote:
> Brandon Williams writes:
>
>>> the_repository -> the_object_store
>>> but the object store is a complex beast having different hash tables
>>> for the different alternates.
>>
>> After looking at the patch and some of the comments here I think that
>> this is probably the best approach with a few tweaks (which may be
>> completely unfounded because I'm not familiar with all the object store
>> code).
>>
>> In an OO world I would envision a single object (let's say 'struct
>> object_store') which is responsible for managing a repository's objects
>> no matter where the individual objects came from (main object store or
>> an alternate for that repository). And if I understand correctly the
>> single hash table that exists now caches objects like this.
>
> I would say that conceptually an object-store object has an
> interface to answer "give me info on the object I can refer to with
> this object name".

ok.

> At the implementation level, it should have a
> hashtable (lazily populated) for all the objects in a single
> $GIT_OBJECT_DIRECTORY, grafts/replace info, and a set of pointers to
> other object-store instances that are its alternate object stores.

So one repository has one or more object stores? I would expect that
most of the time the question from above, "give me info on the object
I can refer to with this object name", is asked with the additional
information "and I know it is in this repository", so we rather want
to have

    lookup_object(struct repository *repo, const char *name);

instead of

    lookup_object(struct object_store *store, const char *name);

because the caller most likely would not care whether the object comes
from an alternate or from the main object store? (There may be special
cases in e.g. repacking/gc where we want to instruct the repacker to
only repack the main object store, ignoring any alternates or such,
but any other command would not care, AFAICT.)

So we could have the convenience function

    lookup_object(struct repository *repo, const char *name)
    {
        foreach_objectstore(repo, lookup_object_in_single_store, &object);
        if (object)
            return object;
        return NULL;
    }

but such code is so closely related to storing objects that I would
prefer to see it in the object store code (object.{h,c}) instead of
the repository code.

Also my initial plan was to have all objects (including from any
alternate) in a single hashmap per repository, as that seems to be
most efficient, assuming all alternates fit into memory. This whole
object-store object orientation was only started to avoid the memory
pressure from lots of submodule objects polluting the main object
store. If we have a hashmap per alternate, our performance would
degrade with the number of alternates. (Assuming the hypothetical
worst case of one alternate per object, the lookup time would be
linear given the above function; I wonder if there are users that
heavily use lots of alternates such that this performance
characteristic would be measurable in the code to be written.)

> You'd need to have a mechanism to avoid creating cycles in these
> pointers, of course.

This is another complication with many hashmaps (=object stores).

In the future, is it likely that we would want to drop an alternate
store from within a running process? If so we'd want to have hashmaps
per alternate; otherwise I only see disadvantages for more than one
hashmap (-> more than one object store) per repository.

And with that said, I think we can make a wrapper struct object_store
that would encapsulate all needed variables.

+	struct object_store {
+		struct object **obj_hash;
+		int nr_objs, obj_hash_size;
+	} objects;

But instead of defining it at the repository, we'd rather define it in
object.h.

Stefan
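For readers unfamiliar with the feature this thread is modelling in-process: today, a repository can already borrow objects from another object directory via `objects/info/alternates`, and that file is what "a set of pointers to other object stores" refers to. A quick demonstration (repo names `upstream` and `borrower` are made up for the demo):

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"

# "upstream" owns an object; "borrower" finds it through an alternate.
git init -q upstream
git -C upstream config user.email you@example.com
git -C upstream config user.name "You"
git -C upstream commit -q --allow-empty -m "in upstream"
commit=$(git -C upstream rev-parse HEAD)

git init -q borrower
# Register upstream's object directory as an alternate object store.
echo "$(pwd)/upstream/.git/objects" >borrower/.git/objects/info/alternates

# The borrower can read the object even though its own store is empty.
git -C borrower cat-file -t "$commit"
```

Object lookups in `borrower` fall through to the alternate when the main store misses, which is exactly the layered lookup the proposed `struct object_store` pointers would represent in memory.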
[RFC PATCH 3/3] sha1_file: add promised blob hook support
Teach sha1_file to invoke a hook whenever a blob is requested and
unavailable but is promised. The hook is a shell command that can be
configured through "git config"; this hook takes in a list of hashes
and writes (if successful) the corresponding objects to the repo's
local storage.

This is meant as a temporary measure to ensure that all Git commands
work in such a situation. Future patches will update some commands to
either tolerate promised blobs (without invoking the hook) or be more
efficient in invoking the promised blob hook.

In order to determine the necessary code changes in sha1_file.c, I
investigated the following:
(1) functions in sha1_file that take in a hash, without the user
    regarding how the object is stored (loose or packed)
(2) functions in sha1_file that operate on packed objects (because I
    need to check callers that know about the loose/packed distinction
    and operate on both differently, and ensure that they can handle
    the concept of objects that are neither loose nor packed)

(1) is handled by the modification to sha1_object_info_extended().

For (2), I looked at for_each_packed_object and at the packed-related
functions that take in a hash.
For for_each_packed_object, the callers either already work or are
fixed in this patch:
- reachable - only to find recent objects
- builtin/fsck - already knows about promised blobs
- builtin/cat-file - fixed in this commit

Callers of the other functions do not need to be changed:
- parse_pack_index
  - http - indirectly from http_get_info_packs
- find_pack_entry_one
  - this searches a single pack that is provided as an argument; the
    caller already knows (through other means) that the sought object
    is in a specific pack
- find_sha1_pack
  - fast-import - appears to be an optimization to not store a file if
    it is already in a pack
  - http-walker - to search through a struct alt_base
  - http-push - to search through remote packs
- has_sha1_pack
  - builtin/fsck - already knows about promised blobs
  - builtin/count-objects - informational purposes only (check if
    loose object is also packed)
  - builtin/prune-packed - check if object to be pruned is packed (if
    not, don't prune it)
  - revision - used to exclude packed objects if requested by user
  - diff - just for optimization

An alternative design that I considered but rejected:

- Adding a hook whenever a packed blob is requested, not on any blob.
  That is, whenever we attempt to search the packfiles for a blob, if
  it is missing (from the packfiles and from the loose object
  storage), to invoke the hook (which must then store it as a
  packfile), open the packfile the hook generated, and report that the
  blob is found in that new packfile. This reduces the amount of
  analysis needed (in that we only need to look at how packed blobs
  are handled), but requires that the hook generate packfiles (or for
  sha1_file to pack whatever loose objects are generated), creating
  one packfile for each missing blob and potentially very many
  packfiles that must be linearly searched.
This may be tolerable now for repos that only have a few missing blobs
(for example, repos that only want to exclude large blobs), and might
be tolerable in the future if we have batching support for the most
commonly used commands, but is not tolerable now for repos that
exclude a large amount of blobs.

Signed-off-by: Jonathan Tan
---
 Documentation/config.txt               |  8
 Documentation/gitrepository-layout.txt |  8
 builtin/cat-file.c                     |  9
 promised-blob.c                        | 75 ++
 promised-blob.h                        | 13 ++
 sha1_file.c                            | 44 +---
 t/t3907-promised-blob.sh               | 36
 7 files changed, 179 insertions(+), 14 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index d5c9c4cab..c293ac921 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -393,6 +393,14 @@
 The default is false, except linkgit:git-clone[1] or linkgit:git-init[1]
 will probe and set core.ignoreCase true if appropriate when the repository
 is created.
 
+core.promisedBlobCommand::
+	If set, whenever a blob in the local repo is attempted to be read, but
+	is both missing and a promised blob, invoke this shell command to
+	generate or obtain that blob before reporting an error. This shell
+	command should take one or more hashes, each terminated by a newline,
+	as standard input, and (if successful) should write the corresponding
+	objects to the local repo (packed or loose).
+
 core.precomposeUnicode::
 	This option is only used by Mac OS implementation of Git. When
 	core.precomposeUnicode=true, Git reverts the unicode decomposition
diff --git
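Wiring up the hook described above would look like the sketch below. Note that `core.promisedBlobCommand` only exists with this RFC series applied, not in released Git, and `fetch-blobs.sh` is a hypothetical script of yours that reads newline-terminated hashes on stdin and writes the corresponding objects into the repo:

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
git init -q .

# Point the (RFC-only) hook setting at a hypothetical fetch script.
# git config will happily store the key even on a Git without the
# series, which is all this sketch demonstrates.
git config core.promisedBlobCommand "$HOME/bin/fetch-blobs.sh"

# Per the commit message, the hook receives hashes on standard input,
# one per line, e.g.:
#   printf '%s\n' "$missing_hash" | "$HOME/bin/fetch-blobs.sh"

git config core.promisedBlobCommand
```

On a Git build with the series, any object read that misses both the loose and packed stores but hits the promised-blob list would invoke this command before erroring out.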
[RFC PATCH 2/3] sha1-array: support appending unsigned char hash
In a subsequent patch, sha1_file will need to append object names in
the form of "unsigned char *" to oid arrays. Teach sha1-array support
for that.

Signed-off-by: Jonathan Tan
---
 sha1-array.c | 7 +++
 sha1-array.h | 1 +
 2 files changed, 8 insertions(+)

diff --git a/sha1-array.c b/sha1-array.c
index 838b3bf84..6e0e35391 100644
--- a/sha1-array.c
+++ b/sha1-array.c
@@ -9,6 +9,13 @@ void oid_array_append(struct oid_array *array, const struct object_id *oid)
 	array->sorted = 0;
 }
 
+void oid_array_append_sha1(struct oid_array *array, const unsigned char *sha1)
+{
+	ALLOC_GROW(array->oid, array->nr + 1, array->alloc);
+	hashcpy(array->oid[array->nr++].hash, sha1);
+	array->sorted = 0;
+}
+
 static int void_hashcmp(const void *a, const void *b)
 {
 	return oidcmp(a, b);
diff --git a/sha1-array.h b/sha1-array.h
index 04b075633..3479959e4 100644
--- a/sha1-array.h
+++ b/sha1-array.h
@@ -11,6 +11,7 @@ struct oid_array {
 #define OID_ARRAY_INIT { NULL, 0, 0, 0 }
 
 void oid_array_append(struct oid_array *array, const struct object_id *oid);
+void oid_array_append_sha1(struct oid_array *array, const unsigned char *sha1);
 int oid_array_lookup(struct oid_array *array, const struct object_id *oid);
 void oid_array_clear(struct oid_array *array);
-- 
2.13.2.932.g7449e964c-goog
[RFC PATCH 1/3] promised-blob, fsck: introduce promised blobs
Currently, Git does not support repos with very large numbers of blobs
or repos that wish to minimize manipulation of certain blobs (for
example, because they are very large) very well, even if the user
operates mostly on part of the repo, because Git is designed on the
assumption that every blob referenced by a tree object is available
somewhere in the repo storage.

As a first step to reducing this problem, introduce the concept of
promised blobs. Each Git repo can contain a list of promised blobs and
their sizes at $GIT_DIR/objects/promisedblob. This patch contains
functions to query them; functions for creating and modifying that
file will be introduced in later patches.

A repository that is missing a blob but has that blob promised is not
considered to be in error, so also teach fsck this.

Signed-off-by: Jonathan Tan
---
 Makefile                 |  1 +
 builtin/fsck.c           | 13 +++
 promised-blob.c          | 95
 promised-blob.h          | 14 +++
 t/t3907-promised-blob.sh | 29 +++
 t/test-lib-functions.sh  |  6 +++
 6 files changed, 158 insertions(+)
 create mode 100644 promised-blob.c
 create mode 100644 promised-blob.h
 create mode 100755 t/t3907-promised-blob.sh

diff --git a/Makefile b/Makefile
index 9c9c42f8f..e96163269 100644
--- a/Makefile
+++ b/Makefile
@@ -828,6 +828,7 @@ LIB_OBJS += preload-index.o
 LIB_OBJS += pretty.o
 LIB_OBJS += prio-queue.o
 LIB_OBJS += progress.o
+LIB_OBJS += promised-blob.o
 LIB_OBJS += prompt.o
 LIB_OBJS += quote.o
 LIB_OBJS += reachable.o
diff --git a/builtin/fsck.c b/builtin/fsck.c
index 99dea7adf..7454be7f1 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -15,6 +15,7 @@
 #include "progress.h"
 #include "streaming.h"
 #include "decorate.h"
+#include "promised-blob.h"
 
 #define REACHABLE 0x0001
 #define SEEN      0x0002
@@ -223,6 +224,9 @@ static void check_reachable_object(struct object *obj)
 	if (!(obj->flags & HAS_OBJ)) {
 		if (has_sha1_pack(obj->oid.hash))
 			return; /* it is in pack - forget about it */
+		if (obj->type == OBJ_BLOB &&
+		    is_promised_blob(&obj->oid, NULL))
+			return;
 		printf("missing %s %s\n",
 		       printable_type(obj), describe_object(obj));
 		errors_found |= ERROR_REACHABLE;
@@ -642,6 +646,13 @@ static int mark_packed_for_connectivity(const struct object_id *oid,
 	return 0;
 }
 
+static int mark_promised_blob_for_connectivity(const struct object_id *oid,
+					       void *data)
+{
+	mark_object_for_connectivity(oid);
+	return 0;
+}
+
 static char const * const fsck_usage[] = {
 	N_("git fsck [<options>] [<object>...]"),
 	NULL
@@ -701,6 +712,8 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
 	if (connectivity_only) {
 		for_each_loose_object(mark_loose_for_connectivity, NULL, 0);
 		for_each_packed_object(mark_packed_for_connectivity, NULL, 0);
+		for_each_promised_blob(mark_promised_blob_for_connectivity,
+				       NULL);
 	} else {
 		fsck_object_dir(get_object_directory());
diff --git a/promised-blob.c b/promised-blob.c
new file mode 100644
index 000000000..493808ed2
--- /dev/null
+++ b/promised-blob.c
@@ -0,0 +1,95 @@
+#include "cache.h"
+#include "promised-blob.h"
+#include "sha1-lookup.h"
+#include "strbuf.h"
+
+#define ENTRY_SIZE (GIT_SHA1_RAWSZ + 8)
+/*
+ * A mmap-ed byte array of size (missing_blob_nr * ENTRY_SIZE). Each
+ * ENTRY_SIZE-sized entry consists of the SHA-1 of the promised blob and its
+ * 64-bit size in network byte order. The entries are sorted in ascending
+ * SHA-1 order.
+ */
+static char *promised_blobs;
+static int64_t promised_blob_nr = -1;
+
+static void prepare_promised_blobs(void)
+{
+	char *filename;
+	int fd;
+	struct stat st;
+
+	if (promised_blob_nr >= 0)
+		return;
+
+	if (getenv("GIT_IGNORE_PROMISED_BLOBS")) {
+		promised_blob_nr = 0;
+		return;
+	}
+
+	filename = xstrfmt("%s/promisedblob", get_object_directory());
+	fd = git_open(filename);
+	if (fd < 0) {
+		if (errno == ENOENT) {
+			promised_blob_nr = 0;
+			goto cleanup;
+		}
+		perror("prepare_promised_blobs");
+		die("Could not open %s", filename);
+	}
+	if (fstat(fd, &st)) {
+		perror("prepare_promised_blobs");
+		die("Could not stat %s", filename);
+	}
+	if (st.st_size == 0) {
+		promised_blob_nr = 0;
+		goto cleanup;
+	}
+	if (st.st_size % ENTRY_SIZE) {
+		die("Size of %s is not a multiple of %d", filename, ENTRY_SIZE);
+	}
+
+	promised_blobs =
[RFC PATCH 0/3] Partial clone: promised blobs (formerly "missing blobs")
These patches are part of a set of patches implementing partial clone,
as you can see here:

https://github.com/jonathantanmy/git/tree/partialclone

In that branch, clone with batch checkout works, as you can see in the
README. The code and tests are generally done, but some patches are
still missing documentation and commit messages.

These 3 patches implement the foundational concept - formerly known as
"missing blobs" in the "missing blob manifest", I decided to call them
"promised blobs". The repo knows their object names and sizes. It also
does not have the blobs themselves, but can be configured to know how
to fetch them.

An older version of these patches was sent as a single demonstration
patch in versions 1 to 3 of [1]. In there, Junio suggested that I have
only one file containing missing blob information. I have made that
suggested change in this version.

One thing remaining is to add a repository extension [2] so that older
versions of Git fail immediately instead of trying to read missing
blobs, but I thought I'd send these first in order to get some initial
feedback.

[1] https://public-inbox.org/git/cover.1497035376.git.jonathanta...@google.com/
[2] Documentation/technical/repository-version.txt

Jonathan Tan (3):
  promised-blob, fsck: introduce promised blobs
  sha1-array: support appending unsigned char hash
  sha1_file: add promised blob hook support

 Documentation/config.txt               |   8 ++
 Documentation/gitrepository-layout.txt |   8 ++
 Makefile                               |   1 +
 builtin/cat-file.c                     |   9 ++
 builtin/fsck.c                         |  13 +++
 promised-blob.c                        | 170 +
 promised-blob.h                        |  27 ++
 sha1-array.c                           |   7 ++
 sha1-array.h                           |   1 +
 sha1_file.c                            |  44 ++---
 t/t3907-promised-blob.sh               |  65 +
 t/test-lib-functions.sh                |   6 ++
 12 files changed, 345 insertions(+), 14 deletions(-)
 create mode 100644 promised-blob.c
 create mode 100644 promised-blob.h
 create mode 100755 t/t3907-promised-blob.sh

-- 
2.13.2.932.g7449e964c-goog
Re: git config --help not functional on bad config
On Tue, Jul 11, 2017 at 12:13:59PM -0700, Stefan Beller wrote:

> > There are other die calls hidden deeper. For instance:
> >
> >   $ git -c core.ignorecase=foo help config
> >   fatal: bad numeric config value 'foo' for 'core.ignorecase': invalid unit
> >
> > Those ones are hard to fix without changing the call signature of
> > git_config_bool().
>
> Good point. While looking at it, parse_help_format can also die,
> so building a safe git help config is hard:
>
>   git config --global help.format foo
>   # everything is broken, how do I fix it?
>   git help config # breaks, too, for the same reason as you outlined :/

Yeah, I think that should be switched to return an error, and probably
all errors from git_help_config() should issue a warning and still
return 0.

-Peff
Re: git config --help not functional on bad config
On Tue, Jul 11, 2017 at 12:08 PM, Jeff King wrote:
> On Tue, Jul 11, 2017 at 12:05:01PM -0700, Stefan Beller wrote:
>
>> > diff --git a/builtin/help.c b/builtin/help.c
>> > index 334a8494a..c42dfc9e9 100644
>> > --- a/builtin/help.c
>> > +++ b/builtin/help.c
>> > @@ -273,7 +273,7 @@ static int git_help_config(const char *var, const char *value, void *cb)
>> >  	if (starts_with(var, "man."))
>> >  		return add_man_viewer_info(var, value);
>> >
>> > -	return git_default_config(var, value, cb);
>> > +	return 0;
>>
>> Instead of ignoring any default config, we could do
>>
>>     git_default_config(var, value, cb);
>>     /* ignore broken config, we may be the help call for config */
>>     return 0;
>>
>> I looked through git_default_config which seems to only die
>> with useful error messages for compression related settings,
>> but these we may want to convert to errors instead of dies,
>> when going this way.
>
> There are other die calls hidden deeper. For instance:
>
>   $ git -c core.ignorecase=foo help config
>   fatal: bad numeric config value 'foo' for 'core.ignorecase': invalid unit
>
> Those ones are hard to fix without changing the call signature of
> git_config_bool().

Good point. While looking at it, parse_help_format can also die,
so building a safe git help config is hard:

  git config --global help.format foo
  # everything is broken, how do I fix it?
  git help config # breaks, too, for the same reason as you outlined :/
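For the "how do I fix it?" question above, there is a practical escape hatch: `git config` itself does not interpret `help.format`, so it can still remove the bad value even while `git help` is broken. A sketch, using a throwaway HOME so your real global config is untouched:

```shell
#!/bin/sh
set -e

# Reproduce the broken state in a throwaway HOME.
HOME=$(mktemp -d)
export HOME
git config --global help.format foo

# "git help config" would now die on the bad value, but "git config"
# does not parse help.format semantically, so it can still unset it:
git config --global --unset help.format

# The bad entry is gone again (git config exits non-zero on a missing key).
git config --global help.format || echo "help.format is unset"
```

Editing `~/.gitconfig` by hand works equally well; the point is only that the repair path does not go through the code that dies.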
Re: git config --help not functional on bad config
On Tue, Jul 11, 2017 at 12:05:01PM -0700, Stefan Beller wrote:

> > diff --git a/builtin/help.c b/builtin/help.c
> > index 334a8494a..c42dfc9e9 100644
> > --- a/builtin/help.c
> > +++ b/builtin/help.c
> > @@ -273,7 +273,7 @@ static int git_help_config(const char *var, const char *value, void *cb)
> >  	if (starts_with(var, "man."))
> >  		return add_man_viewer_info(var, value);
> >
> > -	return git_default_config(var, value, cb);
> > +	return 0;
>
> Instead of ignoring any default config, we could do
>
>     git_default_config(var, value, cb);
>     /* ignore broken config, we may be the help call for config */
>     return 0;
>
> I looked through git_default_config which seems to only die
> with useful error messages for compression related settings,
> but these we may want to convert to errors instead of dies,
> when going this way.

There are other die calls hidden deeper. For instance:

  $ git -c core.ignorecase=foo help config
  fatal: bad numeric config value 'foo' for 'core.ignorecase': invalid unit

Those ones are hard to fix without changing the call signature of
git_config_bool().

-Peff
Re: git config --help not functional on bad config
On Tue, Jul 11, 2017 at 10:53 AM, Jeff King wrote:
> On Tue, Jul 11, 2017 at 03:49:21PM +0100, Peter Krefting wrote:
>
>> That's fine. However, when trying to look for help, it is not that useful:
>>
>>   $ git config --help
>>   error: malformed value for branch.autosetuprebase
>>   fatal: bad config variable 'branch.autosetuprebase' in file '.git/config' at line 24
>>
>> Perhaps it should allow "--help" to go through even if the configuration is
>> bad?
>
> Yes, I agree the current behavior is poor. What's happening under the
> hood is that "--help" for any command runs "git help config", which in
> turn looks at the config to pick up things like help.format.
>
> But it also loads git_default_config(), which I suspect isn't actually
> useful. It goes all the way back to 70087cdbd (git-help: add
> "help.format" config variable., 2007-12-15), and it looks like it was
> probably added just to match other config callbacks.
>
> So I think we could probably just do this:
>
> diff --git a/builtin/help.c b/builtin/help.c
> index 334a8494a..c42dfc9e9 100644
> --- a/builtin/help.c
> +++ b/builtin/help.c
> @@ -273,7 +273,7 @@ static int git_help_config(const char *var, const char *value, void *cb)
>  	if (starts_with(var, "man."))
>  		return add_man_viewer_info(var, value);
>
> -	return git_default_config(var, value, cb);
> +	return 0;

Instead of ignoring any default config, we could do

    git_default_config(var, value, cb);
    /* ignore broken config, we may be the help call for config */
    return 0;

I looked through git_default_config which seems to only die
with useful error messages for compression related settings,
but these we may want to convert to errors instead of dies,
when going this way.

Thanks,
Stefan
Re: [RFC/WIP PATCH] object store classification
Brandon Williams writes:

>> the_repository -> the_object_store
>> but the object store is a complex beast having different hash tables
>> for the different alternates.
>
> After looking at the patch and some of the comments here I think that
> this is probably the best approach with a few tweaks (which may be
> completely unfounded because I'm not familiar with all the object store
> code).
>
> In an OO world I would envision a single object (let's say 'struct
> object_store') which is responsible for managing a repository's objects
> no matter where the individual objects came from (main object store or
> an alternate for that repository). And if I understand correctly the
> single hash table that exists now caches objects like this.

I would say that conceptually an object-store object has an
interface to answer "give me info on the object I can refer to with
this object name".

At the implementation level, it should have a hashtable (lazily
populated) for all the objects in a single $GIT_OBJECT_DIRECTORY,
grafts/replace info, and a set of pointers to other object-store
instances that are its alternate object stores.

You'd need to have a mechanism to avoid creating cycles in these
pointers, of course.
Re: "groups of files" in Git?
Thank you for #3. As for 1+2, the documentation says:

"Each line in gitattributes file is of form: pattern attr1 attr2 ...
... When the pattern matches the path in question, the attributes
listed on the line are given to the path."

My understanding is that to have a bunch of files in separate
directories share the same attribute, I would have to, for each file,
create a separate gitattributes line with the exact path/filename and
the needed attribute. Is that what you are suggesting, or did I
misunderstand the idea?

On Tue, Jul 11, 2017 at 2:19 PM, Stefan Beller wrote:
> On Tue, Jul 11, 2017 at 11:10 AM, Nikolay Shustov wrote:
>> Thank you for the explanation. The example about backend and frontend
>> is relevant even though I normally have to deal with more layers at
>> the same time.
>>
>> However, in my case I have the thing that you have already tried to
>> address, partially: the changes always align with file boundaries BUT
>> not with directory boundaries. Imagine you have the stack of backend,
>> data transport and frontend layers. The feature has to touch all three
>> layers, thus resulting in changes in apparently different
>> directories. Thus, making the distinction by the pathspec (if I
>> understood it right from reading the documentation) would not help.
>>
>> The attributes could be a solution, if I could:
>> 1. create an attribute designated to the feature
>> 2. "mark" uncommitted files in different directories with that attribute
>
> 1+2 should be answered by the gitattributes man page
> https://git-scm.com/docs/gitattributes
>
>> 3. filter the list of unchanged files with such attribute
>
> This sounds like one of
>
>     git status ':(attr:backend) .'
>     git status ':(exclude,attr:backend) .'
>
>> 4. create commit for the files only with the certain attribute
>>
>> You've kindly demonstrated that #4 is doable; however I could not
>> clearly get from the Git documentation if #1 - #3 are achievable...
>> Could you point me to the right place in the documentation, please?
Re: [PATCH 1/7] api-builtin.txt: document SUPPORT_SUPER_PREFIX
On 07/11, Martin Ågren wrote:
> On 11 July 2017 at 00:50, Brandon Williams wrote:
> > On 07/10, Martin Ågren wrote:
> >> Commit 74866d75 ("git: make super-prefix option", 2016-10-07) introduced
> >> SUPPORT_SUPER_PREFIX as a builtin flag without documenting it in
> >> api-builtin.txt. The next patch will add another flag, so document
> >> SUPPORT_SUPER_PREFIX, thereby bringing the documentation up to date with
> >> the available flags.
> >>
> >> Signed-off-by: Martin Ågren
> >> ---
> >>  Documentation/technical/api-builtin.txt | 4 ++++
> >>  1 file changed, 4 insertions(+)
> >>
> >> diff --git a/Documentation/technical/api-builtin.txt b/Documentation/technical/api-builtin.txt
> >> index 22a39b929..60f442822 100644
> >> --- a/Documentation/technical/api-builtin.txt
> >> +++ b/Documentation/technical/api-builtin.txt
> >> @@ -42,6 +42,10 @@ where options is the bitwise-or of:
> >>  	on bare repositories.
> >>  	This only makes sense when `RUN_SETUP` is also set.
> >>
> >> +`SUPPORT_SUPER_PREFIX`::
> >> +
> >> +	The builtin supports --super-prefix.
> >> +
> >>  . Add `builtin/foo.o` to `BUILTIN_OBJS` in `Makefile`.
> >>
> >>  Additionally, if `foo` is a new command, there are 3 more things to do:
> >
> > I added SUPER_PREFIX as an implementation detail when trying to recurse
> > submodules using multiple processes. Now that we have a 'struct
> > repository' my plan is to remove SUPER_PREFIX in its entirety. Since
> > this won't happen overnight it may still take a bit till it's removed,
> > so maybe it makes sense to better document it until that happens?
>
> I could add something about how this is "temporary" although I have no
> idea about the timeframe. ;-)

It's probably better to ensure that things are documented (even if
they are temporary), but you don't need to mark it as such. And as far
as timeframe, that's my burden to bear ;)

> > Either way I think that this sort of documentation better lives in the
> > code as it is easier to keep up to date.
>
> Agreed and I believe that's the long-term goal. I could replace this
> preparatory patch with another one where I move api-builtin.txt into
> builtin.h or git.c (and document SUPPORT_SUPER_PREFIX if that's
> wanted). That's probably better, since right now, patch 2/7 adds
> basically the same documentation for the new flag in two places.

I know that sort of migration may be out of the scope of your series
(since it's just documentation), though you're more than welcome to do
the work as it would be much appreciated by me and a few other people :)

--
Brandon Williams
Re: "groups of files" in Git?
On Tue, Jul 11, 2017 at 11:10 AM, Nikolay Shustov wrote:
> Thank you for the explanation. The example about backend and frontend
> is relevant even though I normally have to deal with more layers at
> the same time.
>
> However, in my case I have the thing that you have already tried to
> address, partially: the changes always align with file boundaries BUT
> not with directory boundaries. Imagine you have the stack of backend,
> data transport and frontend layers. The feature has to touch all three
> layers, thus resulting in changes in apparently different
> directories. Thus, making the distinction by the pathspec (if I
> understood it right from reading the documentation) would not help.
>
> The attributes could be a solution, if I could:
> 1. create an attribute designated to the feature
> 2. "mark" uncommitted files in different directories with that attribute

1+2 should be answered by the gitattributes man page
https://git-scm.com/docs/gitattributes

> 3. filter the list of unchanged files with such attribute

This sounds like one of

    git status ':(attr:backend) .'
    git status ':(exclude,attr:backend) .'

> 4. create commit for the files only with the certain attribute
>
> You've kindly demonstrated that #4 is doable; however I could not
> clearly get from the Git documentation if #1 - #3 are achievable...
> Could you point me to the right place in the documentation, please?
Re: "groups of files" in Git?
Thank you for the explanation. The example about backend and frontend
is relevant even though I normally have to deal with more layers at
the same time.

However, in my case I have the thing that you have already tried to
address, partially: the changes always align with file boundaries BUT
not with directory boundaries. Imagine you have the stack of backend,
data transport and frontend layers. The feature has to touch all three
layers, thus resulting in changes in apparently different directories.
Thus, making the distinction by the pathspec (if I understood it right
from reading the documentation) would not help.

The attributes could be a solution, if I could:
1. create an attribute designated to the feature
2. "mark" uncommitted files in different directories with that attribute
3. filter the list of unchanged files with such attribute
4. create commit for the files only with the certain attribute

You've kindly demonstrated that #4 is doable; however I could not
clearly get from the Git documentation if #1 - #3 are achievable...
Could you point me to the right place in the documentation, please?

On Tue, Jul 11, 2017 at 1:27 PM, Junio C Hamano wrote:
> Nikolay Shustov writes:
>
>> I have to work on several features in the same code tree in parallel,
>> in the same Perforce workspace. The major reason why I cannot work on
>> one feature and then on another is just because I have to make sure
>> that the changes in the related areas of the product play together
>> well.
>>
>> With Perforce, I can have multiple changelists opened, that group the
>> changed files as needed.
>>
>> With Git I cannot seem to find a way to achieve the same result. And
>> the problem is that putting change sets on different Git branches (or
>> workdirs, or whatever Git offers that makes the changes NOT be in the
>> same source tree) is not a viable option for me ...
>
> Naturally. If these separate changes need to work together, it is
> way too inconvenient if these changes do not appear in a single
> unified working tree to be built and tested.
>
>> Is it worth considering adding to Git a feature like "group of files"
>> that would offer some virtual grouping of the locally changed files
>> in the checked-out branch?
>
> Let's step back and let me make sure I understand you correctly.
> You want to work on a system with two distinct areas (say, the
> frontend and the backend), that have to work together, but you want
> to make two commits, one for each area. You make changes for both
> areas in your working tree, build and test them to make sure the
> whole thing works well together, and at the end, you make two
> commits.
>
> In your real project, you may be doing more than two areas and more
> than two commits, but is the above a good degenerate case that
> shows the basic idea? If not, then please disregard all of the
> following.
>
> You can make partial commits in Git. In the simplest case, you may
> have two separate files backend.py and frontend.py; you make edits
> to both files and then make two commits:
>
>     $ git commit backend.py
>     $ git commit frontend.py
>
> Changes to some files may contain both changes for the backend and
> for the frontend that do not allow you to separate commits at file
> boundary, and Git even lets you handle such a case. If you have a
> third file in addition to the above two, called common.py, you could
> instead
>
>     $ git add backend.py
>     $ git add -p common.py
>
> to prepare the index to contain only the changes for the backend
> part ("add -p" lets you interactively pick and choose the hunks
> relevant to the backend part), and conclude the commit for the
> backend part with
>
>     $ git commit  ;# no paths arguments
>
> and then, when all the remaining changes are for the frontend part,
> you can follow it with
>
>     $ git commit -a
>
> to make another commit for the frontend part.
>
> A short answer to your question, provided I understood you
> correctly, is "no, there is no way to say 'backend.py, backend-2.py,
> ...' are the backend things and call it a filegroup", accompanied by
> "a filegroup would only be effective when changes align with file
> boundary".
>
> And if your response is "but most of the time changes align with
> file boundary", a counter-response is "and most of the time changes
> align with directory boundary (in a well structured project, at
> least), so you can do 'git commit backend/' for all the backend part
> without having to name all the paths anyway".
>
> There is an experimental "attributes-limited pathspec" feature in
> recent versions of Git, which lets you assign arbitrary sets of
> labels to each path, and using that you may be able to do
>
>     $ git commit ':(attr:filegroup=backend).'
>
> and I suspect that would be the closest thing you would want (read
> about 'pathspec' in 'git help glossary')
Re: [PATCH 4/4] hook: add a simple first example
On Tue, 2017-07-11 at 09:03 -0700, Junio C Hamano wrote: > But would it even be useful to enable it? Just for the note, I thought it was considered useful (at least to someone) due to its presence in the sample script.
Re: [PATCH 4/4] hook: add a simple first example
On Tue, 2017-07-11 at 09:03 -0700, Junio C Hamano wrote: > But would it even be useful to enable it? The commented-out "git > status" output already lists the modified paths, so I do not think > anything is gained by adding 'diff --cached --name-status' there. The script *doesn't* add the output of "diff --cached --name-status" near the status output. It adds it above the first comment, and of course in uncommented form. So, should we remove it (diff --cached --name-status) altogether?
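For context, the behaviour under discussion can be reproduced with a stripped-down `prepare-commit-msg` hook that prepends the staged name-status list, uncommented, above whatever message Git has prepared (a simplified sketch, not the exact sample shipped with Git):

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.email you@example.com
git config user.name "You"
echo hello > greeting.txt
git add -A && git commit -qm 'initial'

# hypothetical hook: prepend "git diff --cached --name-status" output,
# uncommented, above the prepared commit message
cat > .git/hooks/prepare-commit-msg <<'EOF'
#!/bin/sh
msgfile="$1"
{ git diff --cached --name-status; cat "$msgfile"; } > "$msgfile.new" &&
mv "$msgfile.new" "$msgfile"
EOF
chmod +x .git/hooks/prepare-commit-msg

echo world >> greeting.txt
git add greeting.txt
git commit -qm 'update greeting'
# the uncommented line ends up recorded in the message itself
git log -1 --format=%B
```

Because the line is not a `#` comment, it is not stripped at commit time, which is exactly why the shipped sample keeps it commented out.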
Re: [RFC/WIP PATCH] object store classification
On 07/10, Stefan Beller wrote: > On Fri, Jul 7, 2017 at 9:50 AM, Junio C Hamanowrote: > > Ben Peart writes: > > > >> For more API/state design purity, I wonder if there should be an > >> object_store structure that is passed to each of the object store APIs > >> instead of passing the repository object. The repository object could > >> then contain an instance of the object_store structure. > >> > >> That said, I haven't take a close look at all the code in object.c to > >> see if all the data needed can be cleanly abstracted into an > >> object_store structure. > > > > My gut feeling was it is just the large hashtable that keeps track of > > objects we have seen, but the object replacement/grafts and other > > things may also want to become per-repository. > > This is similar to the_index which is referenced by the_repository. > But as we do not have anything like the_object_store already, we are > free to design it, as the required work that needs to be put in is the > same. > > With the object replacements/grafts coming up as well as alternates, > we definitely want that to be per repository, the question is if we rather > want > > the_repository -> many object_stores (one for each, alternate, grafts, > and the usual place at $GIT_DIR/objects > where the object_store is a hashmap, maybe an additional describing > string or path. > > or > > the_repository -> the_object_store > but the object store is a complex beast having different hash tables > for the different alternates. After looking at the patch and some of the comments here I think that this is probably the best approach with a few tweaks (which may be completely unfounded because I'm not familiar with all the object store code). In an OO world I would envision a single object (let's say 'struct object_store') which is responsible for managing a repository's objects no matter where the individual objects came from (main object store or an alternate for that repository). 
And if I understand correctly the single hash table that exists now caches objects like this. I also think that such a 'struct object_store' should probably be an opaque type to a majority of the code base. This means that it probably shouldn't have its definition in 'repository.h'. As far as API, I think it should be similar to the new repo_config (old one too, though it was implicit) API in that the code base doesn't need to know about 'struct configset', it just passes a pointer to the repository and then the 'struct configset' which is stored in the repository is operated on under the hood. This way the code base would just query for an object using the repository as a handle like: get_object(repo, OID); and not: get_object(repo->object_store, OID); Of course under the hood it would be preferable to have the functions operate on the object_store struct explicitly. > > or > > the_repository -> the_object_store_hash_map > which is this patch that would try to put any object related to this > repository into the same hashmap and the hashmap is not special > for each of the different object locations. > > > > > >> One concern I have is that the global state refactoring effort will > >> just result in all the global state getting moved into a single > >> (global) repository object thus limiting it's usefulness. I think we do need to think about this, but it shouldn't be too much of a concern right now. The first step is to get enough of the object store object oriented such that you can have two object stores corresponding to two different repositories working in parallel. > > > > I actually am not worried about it that much, and I say this with > > the background of having done the same "grouping a set of global > > state variables into a single structure and turning them into a > > single default instance" for the_index. 
Whether you like it or not, > > the majority of operations do work on the default instance---that > > was why the operations could live with just "a set of global state > > variables" in the first place, and the convenience compatibility > > macros that allow you to operate on the fields of the default > > instance as if they were separate variables have been a huge > > typesaver that also reduces the cognitive burden. I'd expect that > > the same will hold for the new "repository" and the "object_store" > > abstractions. > > Sounds reasonable to expect. > > Thanks, > Stefan -- Brandon Williams
Re: "groups of files" in Git?
Thank you for the idea, however I am having trouble with basically maintaining the uncommitted groups of files: I would prefer the clear distinction that "those files belong to feature A" and "these files belong to feature B", before I commit anything. Committing separately every change for feature A and for feature B would probably be a good option, unless I have many changes; then cherry-picking the proper commits to create a single changeset for the integration would become a nightmare. On Tue, Jul 11, 2017 at 1:39 PM, Lars Schneider wrote: > >> On 11 Jul 2017, at 17:45, Nikolay Shustov wrote: >> >> Hi, >> I have been recently struggling with migrating my development workflow >> from Perforce to Git, all because of the following thing: >> >> I have to work on several features in the same code tree parallel, in >> the same Perforce workspace. The major reason why I cannot work on one >> feature then on another is just because I have to make sure that the >> changes in the related areas of the product play together well. >> >> With Perforce, I can have multiple changelists opened, that group the >> changed files as needed. >> >> With Git I cannot seem to finding the possibility to figure out how to >> achieve the same result. And the problem is that putting change sets >> on different Git branches (or workdirs, or whatever Git offers that >> makes the changes to be NOT in the same source tree) is not a viable >> option from me as I would have to re-build code as I re-integrate the >> changes between the branches (or whatever changes separation Git >> feature is used). >> Build takes time and resources and considering that I have to do it on >> multiple platforms (I do cross-platform development) it really >> denominates the option of not having multiple changes in the same code >> tree. >> >> Am I ignorant about some Git feature/way of using Git that would help? 
>> Is it worth considering adding to Git a feature like "group of files" >> that would offer some virtutal grouping of the locally changed files >> in the checked-out branch? > > Interesting question that came up at my workplace, too. > > Here is what I suggested: > 1. Keep working on a single branch and make commits for all features > 2. If you make a commit, prefix the commit message with the feature name > 3. After you are done with a feature create a new feature branch based on >your combined feature branch. Use `git rebase -i` [1] to remove all >commits that are not relevant for the feature. Alternatively you could >cherry pick the relevant commits [2] if this is faster. > > I wonder what others think about this solution. Maybe there is a better > solution that I overlooked? > > - Lars > > [1] > https://robots.thoughtbot.com/git-interactive-rebase-squash-amend-rewriting-history > [2] > http://think-like-a-git.net/sections/rebase-from-the-ground-up/cherry-picking-explained.html >
Re: git config --help not functional on bad config
On Tue, Jul 11, 2017 at 03:49:21PM +0100, Peter Krefting wrote: > That's fine. However, when trying to look for help, it is not that useful: > > $ git config --help > error: malformed value for branch.autosetuprebase > fatal: bad config variable 'branch.autosetuprebase' in file '.git/config' > at line 24 > > Perhaps it should allow "--help" to go through even if the configuration is > bad? Yes, I agree the current behavior is poor. What's happening under the hood is that "--help" for any command runs "git help config", which in turn looks at the config to pick up things like help.format. But it also loads git_default_config(), which I suspect isn't actually useful. It goes all the way back to 70087cdbd (git-help: add "help.format" config variable., 2007-12-15), and it looks like it was probably added just to match other config callbacks. So I think we could probably just do this: diff --git a/builtin/help.c b/builtin/help.c index 334a8494a..c42dfc9e9 100644 --- a/builtin/help.c +++ b/builtin/help.c @@ -273,7 +273,7 @@ static int git_help_config(const char *var, const char *value, void *cb) if (starts_with(var, "man.")) return add_man_viewer_info(var, value); - return git_default_config(var, value, cb); + return 0; } static struct cmdnames main_cmds, other_cmds; which makes your case work. -Peff
Re: "groups of files" in Git?
> The way of Git is that a commit (snapshot) by definition describes a > set of files (The set of all files in the project). So If you need two > features > there at the same time, you probably want it in the same commit. Thank you, but if I wanted these two features to be in the same commit, I would have no reason to see them as two distinct groups. I mean, groups of uncommitted files. The general problem of not having multiple features in the same code tree is the cost of doing multiple builds and integration testing runs. Now I imagine there could be a workaround of having two features developed on different branches and then merging them into a 3rd branch for building/testing; however this introduces the overhead of maintaining at least two code trees: one for "dirty changes" where I do the code changes that are not guaranteed to be even build-able, and another "build/test" code tree. Plus merging the changes from one to another. A bit too much, IMHO. On Tue, Jul 11, 2017 at 1:18 PM, Stefan Beller wrote: > On Tue, Jul 11, 2017 at 8:45 AM, Nikolay Shustov > wrote: >> Hi, >> I have been recently struggling with migrating my development workflow >> from Perforce to Git, all because of the following thing: >> >> I have to work on several features in the same code tree parallel, in >> the same Perforce workspace. The major reason why I cannot work on one >> feature then on another is just because I have to make sure that the >> changes in the related areas of the product play together well. > > So in that case the features are not independent, but related to each other? > In that case you want to have these things in the same working tree as > well as in the same branch. 
> > Take a look at git.git itself, for example: > > git clone git://github.com/git/git > git log --oneline --graph > > You will see a lot of "Merge X into master/maint" commits, but then > you may want to dive into each feature by: > > git log --oneline e83e71c5e1 > > for example and then you'll see lots of commits (that were developed > in the same branch), but that are closely related. However they are > different enough to be in different commits. (different features, as > I understand) > >> With Perforce, I can have multiple changelists opened, that group the >> changed files as needed. >> >> With Git I cannot seem to finding the possibility to figure out how to >> achieve the same result. And the problem is that putting change sets >> on different Git branches (or workdirs, or whatever Git offers that >> makes the changes to be NOT in the same source tree) is not a viable >> option from me as I would have to re-build code as I re-integrate the >> changes between the branches (or whatever changes separation Git >> feature is used). > > you would merge the branches and then run the tests/integration. Yes that > seems cumbersome. > >> Build takes time and resources and considering that I have to do it on >> multiple platforms (I do cross-platform development) it really >> denominates the option of not having multiple changes in the same code >> tree. >> >> Am I ignorant about some Git feature/way of using Git that would help? >> Is it worth considering adding to Git a feature like "group of files" >> that would offer some virtutal grouping of the locally changed files >> in the checked-out branch? > > The way of Git is that a commit (snapshot) by definition describes a > set of files (The set of all files in the project). So If you need two > features > there at the same time, you probably want it in the same commit. 
> > If they are different enough such that you could have them independently, > but really want to test them together, your testing may need to become > more elaborate (test a merge of all feature branches) I would think. > >> >> Thanks in advance, >> - Nikolay
Re: "groups of files" in Git?
> On 11 Jul 2017, at 17:45, Nikolay Shustovwrote: > > Hi, > I have been recently struggling with migrating my development workflow > from Perforce to Git, all because of the following thing: > > I have to work on several features in the same code tree parallel, in > the same Perforce workspace. The major reason why I cannot work on one > feature then on another is just because I have to make sure that the > changes in the related areas of the product play together well. > > With Perforce, I can have multiple changelists opened, that group the > changed files as needed. > > With Git I cannot seem to finding the possibility to figure out how to > achieve the same result. And the problem is that putting change sets > on different Git branches (or workdirs, or whatever Git offers that > makes the changes to be NOT in the same source tree) is not a viable > option from me as I would have to re-build code as I re-integrate the > changes between the branches (or whatever changes separation Git > feature is used). > Build takes time and resources and considering that I have to do it on > multiple platforms (I do cross-platform development) it really > denominates the option of not having multiple changes in the same code > tree. > > Am I ignorant about some Git feature/way of using Git that would help? > Is it worth considering adding to Git a feature like "group of files" > that would offer some virtutal grouping of the locally changed files > in the checked-out branch? Interesting question that came up at my workplace, too. Here is what I suggested: 1. Keep working on a single branch and make commits for all features 2. If you make a commit, prefix the commit message with the feature name 3. After you are done with a feature create a new feature branch based on your combined feature branch. Use `git rebase -i` [1] to remove all commits that are not relevant for the feature. Alternatively you could cherry pick the relevant commits [2] if this is faster. 
I wonder what others think about this solution. Maybe there is a better solution that I overlooked? - Lars [1] https://robots.thoughtbot.com/git-interactive-rebase-squash-amend-rewriting-history [2] http://think-like-a-git.net/sections/rebase-from-the-ground-up/cherry-picking-explained.html
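A rough sketch of step 3 with message prefixes like "[A]" (branch names and the grep pattern are illustrative, and this assumes the [A] commits apply cleanly on their own):

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.email you@example.com
git config user.name "You"
echo base > base.txt && git add -A && git commit -qm 'base'
base=$(git rev-parse HEAD)

# 1+2: one combined branch, each commit message prefixed per feature
git checkout -qb combined
echo a1 >  a.txt && git add a.txt && git commit -qm '[A] add a'
echo b1 >  b.txt && git add b.txt && git commit -qm '[B] add b'
echo a2 >> a.txt && git commit -qam '[A] extend a'

# 3: rebuild a clean feature branch from only the [A] commits
git checkout -qb feature-A "$base"
git rev-list --reverse --grep='^\[A\]' "$base"..combined |
while read commit; do
	git cherry-pick "$commit" >/dev/null || exit 1
done
git log --oneline
```

This is the cherry-pick variant of Lars's suggestion; `git rebase -i` onto the base commit, deleting the non-[A] lines, would get you to the same place interactively.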
Re: "groups of files" in Git?
Nikolay Shustov writes: > I have to work on several features in the same code tree parallel, in > the same Perforce workspace. The major reason why I cannot work on one > feature then on another is just because I have to make sure that the > changes in the related areas of the product play together well. > > With Perforce, I can have multiple changelists opened, that group the > changed files as needed. > > With Git I cannot seem to finding the possibility to figure out how to > achieve the same result. And the problem is that putting change sets > on different Git branches (or workdirs, or whatever Git offers that > makes the changes to be NOT in the same source tree) is not a viable > option from me ... Naturally. If these separate changes need to work together, it is way too inconvenient if these changes do not appear in a single unified working tree to be built and tested. > Is it worth considering adding to Git a feature like "group of files" > that would offer some virtutal grouping of the locally changed files > in the checked-out branch? Let's step back and let me make sure I understand you correctly. You want to work on a system with two distinct areas (say, the frontend and the backend), that have to work together, but you want to make two commits, one for each area. You make changes for both areas in your working tree, build and test them to make sure the whole thing works well together, and at the end, you make two commits. In your real project, you may be doing more than two areas and more than two commits, but is the above a good degenerate case that shows the basic idea? If not, then please disregard all of the following. You can make partial commits in Git. 
In the simplest case, you may have two separate files backend.py and frontend.py, you make edits to both files and then make two commits: $ git commit backend.py $ git commit frontend.py Changes to some files may contain both changes for the backend and for the frontend, which does not allow you to separate commits at file boundary, and Git even lets you handle such a case. If you have a third file in addition to the above two, called common.py, you could instead $ git add backend.py $ git add -p common.py to prepare the index to contain only the changes for the backend part ("add -p" lets you interactively pick and choose the hunks relevant to the backend part), and conclude the commit for the backend part with $ git commit ;# no path arguments and then when all the remaining changes are for the frontend part, you can follow it with $ git commit -a to make another commit for the frontend part. A short answer to your question, provided I understood you correctly, is "no, there is no way to say 'backend.py, backend-2.py, ...' are the backend things and call it a filegroup", accompanied by "a filegroup would only be effective when changes align with file boundary". And if your response is "but most of the time changes align with file boundary", a counter-response is "and most of the time changes align with directory boundary (in a well structured project, at least), so you can do 'git commit backend/' for all the backend part without having to name all the paths anyway". There is an experimental "attributes-limited pathspec" feature in recent versions of Git, which lets you assign arbitrary sets of labels to each path, and using that you may be able to do $ git commit ':(attr:filegroup=backend).' and I suspect that would be the closest thing you would want (read about 'pathspec' in 'git help glossary').
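The file-boundary case described above can be tried end to end like this (file names and messages are illustrative; the interactive `add -p` step is left out since it needs a terminal):

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.email you@example.com
git config user.name "You"
echo b0 > backend.py && echo f0 > frontend.py
git add -A && git commit -qm 'initial'

# edit both areas in the same working tree
echo b1 >> backend.py
echo f1 >> frontend.py

# commit only the backend change, splitting at the file boundary
git commit -qm 'backend: change' backend.py
# everything still uncommitted belongs to the frontend
git commit -qam 'frontend: change'
git log --oneline
```

The key point is that `git commit <path>` records the named paths regardless of what else is dirty, so the two features never have to leave the one working tree.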
RE: "groups of files" in Git?
-----Original Message----- On July 11, 2017 11:45 AM Nikolay Shustov wrote: >I have been recently struggling with migrating my development workflow from >Perforce to Git, all because of the following thing: >I have to work on several features in the same code tree parallel, in the same >Perforce workspace. The major reason why I cannot >work on one feature then on another is just because I have to make sure that >the changes in the related areas of the product play together well. >With Perforce, I can have multiple changelists opened, that group the changed >files as needed. >With Git I cannot seem to finding the possibility to figure out how to achieve >the same result. And the problem is >that putting change sets on different Git branches (or workdirs, or whatever >Git offers that makes the changes to >be NOT in the same source tree) is not a viable option from me as I would have >to re-build code as I re-integrate >the changes between the branches (or whatever changes separation Git feature >is used). >Build takes time and resources and considering that I have to do it on >multiple platforms (I do cross-platform development) it really denominates the >option of not having multiple changes in the same code tree. Change sets are core Git functionality. When you commit, you commit a group of changes across multiple files, not a single file at a time as in most legacy SCM systems. Individual features are typically managed using topic branches that can be switched (using checkout) rapidly, which in your case will cause a localized rebuild based on what files were swapped. If you need something slightly different than topic branches, do multiple clones off a base integration branch. This will give you multiple working source trees. Switch each clone to its own branch and work with them locally. If you need to move changes from one branch to another, commit, push on one branch, and pull/merge on the other branch. 
You could also use stash to accomplish similar things, but I wouldn't. Cheers, Randall
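The multiple-clones workflow described above, sketched against a local bare repository standing in for the shared remote (all names are illustrative):

```shell
set -e
cd "$(mktemp -d)"
git init -q --bare shared.git
git clone -q shared.git work-a 2>/dev/null
git clone -q shared.git work-b 2>/dev/null

# develop feature A in its own clone/working tree and publish it
cd work-a
git config user.email you@example.com && git config user.name "You"
git checkout -qb topic-a
echo one > feature.txt && git add -A && git commit -qm 'topic-a: feature'
git push -q origin topic-a

# pick the work up in the other clone, each tree keeping its own branch
cd ../work-b
git config user.email you@example.com && git config user.name "You"
git fetch -q origin
git checkout -qb topic-b origin/topic-a
```

Each clone keeps its own build artifacts, so switching features never forces a full rebuild in the other tree; the cost is the push/fetch round-trip whenever work moves between them.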
Re: "groups of files" in Git?
On Tue, Jul 11, 2017 at 8:45 AM, Nikolay Shustovwrote: > Hi, > I have been recently struggling with migrating my development workflow > from Perforce to Git, all because of the following thing: > > I have to work on several features in the same code tree parallel, in > the same Perforce workspace. The major reason why I cannot work on one > feature then on another is just because I have to make sure that the > changes in the related areas of the product play together well. So in that case the features are not independent, but related to each other? In that case you want to have these things in the same working tree as well as in the same branch. Take a look at git.git itself, for example: git clone git://github.com/git/git git log --oneline --graph You will see a lot of "Merge X into master/maint" commits, but then you may want to dive into each feature by: git log --oneline e83e71c5e1 for example and then you'll see lots of commits (that were developed in the same branch), but that are closely related. However they are different enough to be in different commits. (different features, as I understand) > With Perforce, I can have multiple changelists opened, that group the > changed files as needed. > > With Git I cannot seem to finding the possibility to figure out how to > achieve the same result. And the problem is that putting change sets > on different Git branches (or workdirs, or whatever Git offers that > makes the changes to be NOT in the same source tree) is not a viable > option from me as I would have to re-build code as I re-integrate the > changes between the branches (or whatever changes separation Git > feature is used). you would merge the branches and then run the tests/integration. Yes that seems cumbersome. > Build takes time and resources and considering that I have to do it on > multiple platforms (I do cross-platform development) it really > denominates the option of not having multiple changes in the same code > tree. 
> > Am I ignorant about some Git feature/way of using Git that would help? > Is it worth considering adding to Git a feature like "group of files" > that would offer some virtutal grouping of the locally changed files > in the checked-out branch? The way of Git is that a commit (snapshot) by definition describes a set of files (The set of all files in the project). So If you need two features there at the same time, you probably want it in the same commit. If they are different enough such that you could have them independently, but really want to test them together, your testing may need to become more elaborate (test a merge of all feature branches) I would think. > > Thanks in advance, > - Nikolay
Re: [PATCH 0/7] tag: more fine-grained pager-configuration
Jeff King writes: >> I see you CC'ed Peff who's passionate in this area, so these patches >> are in good hands already ;-) I briefly skimmed your patches myself, >> and did not spot anything glaringly wrong. > > Heh, I don't think of "paging tag output" as one of my passions, but you > may be right. :) Perhaps not "tag output", but paging is the area where you are the person who has spent the most time. > I left a few comments on the code and direction, but I agree that > overall it looks pretty good. And I'm very impressed with the attention > to detail for a first-time contributor. Yes.
Re: Weirdness with git change detection
Junio C Hamano writes: > Peter Eckersley writes: > >> I have a local git repo that's in some weird state where changes >> appear to be detected by "git diff" and prevent operations like "git >> checkout" from switching branches, but those changes are not removed >> by a "git reset --hard" or "git stash". >> >> Here's an example of the behaviour, with "git reset --hard" failing to >> clear a diff in the index: >> >> https://paste.debian.net/975811/ >> >> Happy to collect additional debugging information if it's useful. > > Do you have any funny clean-smudge pair that does not round-trip? Ah, nevermind. Peff's analysis looks correct. Thanks for a report that provoked a good discussion.
Re: [PATCH 4/4] hook: add a simple first example
Kaartic Sivaraam writes: > I was trying to ask, "Is there any way to change the second example > (diff --name-status) to make it work with "commit --amend" so that it > could be uncommented by default?" But would it even be useful to enable it? The commented-out "git status" output already lists the modified paths, so I do not think anything is gained by adding 'diff --cached --name-status' there.
Re: Weirdness with git change detection
Peter Eckersley writes: > I have a local git repo that's in some weird state where changes > appear to be detected by "git diff" and prevent operations like "git > checkout" from switching branches, but those changes are not removed > by a "git reset --hard" or "git stash". > > Here's an example of the behaviour, with "git reset --hard" failing to > clear a diff in the index: > > https://paste.debian.net/975811/ > > Happy to collect additional debugging information if it's useful. Do you have any funny clean-smudge pair that does not round-trip?
Re: What's cooking in git.git (Jul 2017, #03; Mon, 10)
Lars Schneider writes: > Thanks for the explanation! I looked at the Git release calendar [1] and > realized that 2.14-rc0 is scheduled for this Thursday. My assumption was > that either on this date 2.14 will be cut from the tip of master or master > would not change significantly after the rc0 date until the 2.14 release. Your assumption is still correct. Undercooked topics may be reclassified to "Will cook in 'next'" at any time. Thanks. > [1] http://tinyurl.com/gitCal
Re: What's cooking in git.git (Jul 2017, #03; Mon, 10)
Jeff Kingwrites: > On Tue, Jul 11, 2017 at 10:28:08AM +0200, Lars Schneider wrote: > >> > On 11 Jul 2017, at 00:48, Junio C Hamano wrote: >> > >> > * ls/filter-process-delayed (2017-06-30) 7 commits >> > (merged to 'next' on 2017-07-05 at a35e644082) >> > + convert: add "status=delayed" to filter process protocol >> > + convert: refactor capabilities negotiation >> > + convert: move multiple file filter error handling to separate function >> > + convert: put the flags field before the flag itself for consistent style >> > + t0021: write "OUT " only on success >> > + t0021: make debug log file name configurable >> > + t0021: keep filter log files on comparison >> > >> > The filter-process interface learned to allow a process with long >> > latency give a "delayed" response. >> > >> > Will merge to 'master'. >> >> Hi Junio, >> >> can you already tell if this topic has still a chance to be part of 2.14? > > I'm not Junio, but we should be able to reason it out. :) I am ;-). > It's marked as "will merge to master", which typically means it will > happen during the next integration cycle (i.e., within a few days when > the next "What's cooking" is written). Just like being in 'next' is not a promise for a topic to be in the upcoming release (or any future one for that matter), "Will merge to X" merely means "With the current shape of the series and with the best of our current knowledge, this is thought to be mergeable to X at some point in the future". When a more urgent topic that conflicts heavily with it appears, when a serious bug is found in the topic, etc., "our current best knowledge" may be invalidated. Anticipating such an event that changes our assumption happening, I try to keep a topic in 'next' for at least a week, if it is a non-trivial topic that changes more than a few dozen lines (not counting tests and docs). For things that touch deeper guts of the system, I'd prefer to cook it longer. For more trivial things, it may only be a day or two. 
So "at some point in the future" varies depending on the weight of a topic.

> Since 2.14 will be cut from the tip of master in a few weeks, once
> it's merged it will definitely be in 2.14 (barring a revert, of
> course). This holds true during release freeze, too. Anything
> that makes it to master is part of the release.

This is correct.

> The stopping point there is that things stop getting marked as
> "will merge to master".

This is correct, if you allow for the possibility that a thing that used to be marked as "Will merge to 'master'" gets demoted to "Will cook in 'next'" (and you need to anticipate that possibility---I try to demote topics that got into 'next' just before -rc0 to "Will cook" when tagging -rc0, unless they are fixes that are not too involved).

I probably should have marked this as "Will cook in 'next'" in the first place. The practice has been to blindly promote "Will merge to 'next'" to "Will merge to 'master'" by default when a topic gets merged to 'next', and then within the first week try to find a chance to re-read the topic and decide whether to demote it to "Will cook".
"groups of files" in Git?
Hi,

I have recently been struggling with migrating my development workflow from Perforce to Git, all because of the following thing: I have to work on several features in the same code tree in parallel, in the same Perforce workspace. The major reason why I cannot work on one feature and then on another is that I have to make sure the changes in the related areas of the product play together well.

With Perforce, I can have multiple changelists open, which group the changed files as needed. With Git, I cannot figure out how to achieve the same result. And putting the change sets on different Git branches (or workdirs, or whatever Git offers that keeps the changes out of the same source tree) is not a viable option for me, as I would have to re-build the code as I re-integrate the changes between the branches (or whichever change-separation feature of Git is used). Builds take time and resources, and considering that I have to build on multiple platforms (I do cross-platform development), that really rules out any option that does not keep multiple changes in the same code tree.

Am I ignorant of some Git feature or way of using Git that would help? Is it worth considering adding to Git a feature like "group of files" that would offer some virtual grouping of the locally changed files in the checked-out branch?

Thanks in advance,
- Nikolay
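One partial workaround for the situation described above is to keep every change in the same checked-out tree and only group files at commit time via the index, staging one "changelist" worth of pathspecs per commit. A minimal sketch (the repository and file names below are made up for illustration):

```shell
# Group edits in ONE worktree by staging pathspecs per "changelist",
# so nothing needs rebuilding when switching between groups of files.
set -e
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Example"
git config user.email "example@example.com"

# Two in-flight "features" edited side by side in the same tree:
echo "net code" > net.c
echo "ui code"  > ui.c

# Commit only the "net" group; the "ui" edits stay in the worktree:
git add net.c
git commit -q -m "net: first cut"

# Later, commit the "ui" group separately:
git add ui.c
git commit -q -m "ui: first cut"

git log --oneline
```

For finer-grained grouping than whole files, `git add -p` stages individual hunks, which is the closest analogue to per-changelist control within a single tree.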
Re: [PATCH] strbuf: use designated initializers in STRBUF_INIT
Ben Peart writes:

>> If this patch can survive a few releases without complaint,
>> then we can feel more confident that designated initializers
>> are widely supported by our user base. It also is an
>> indication that other C99 features may be supported, but not
>> a guarantee (e.g., gcc had designated initializers before
>> C99 existed).
>
> Correct. MSVC also supports designated initializers but does not
> fully support C99.

Thanks for a datapoint. Just so that people do not misunderstand, it is not our goal to declare that you now need a fully C99 compiler to build Git.

When deciding which shell scripting tools, with which options, to use in our scripted Porcelains and tests, we use POSIX.1 as a rough guideline. We say "let's not use this, as it is not even in POSIX", but we never say "use of it is OK because it is in POSIX", and we sometimes even say "even though it is in POSIX, various implementations of it are buggy and it does not work properly in practice" about certain things [*1*].

C89 has been the base of such a guideline for our C programs, and people must not view this patch as an attempt to raise the base to C99. It is rather an attempt to see how far we can safely raise the base by checking some selected useful new features [*2*] that we have had occasion to wish were available, and designated initializers for structure fields is one of them.

We may find out that, after starting with "C89, plus this and that feature that are known to be commonly supported", the guideline may become more succinct as "C99, minus this and that feature that are not usable on some platforms", if it turns out that the majority of the systems that are not fully C99 have all of the things we care about. We do not know yet, and we are only at the beginning of the journey to find it out.

[Footnote]

*1* Like `backtick` command substitutions, which are very hard to nest properly and impossible to mix portably with quoting.

*2* New relative to our current practice, that is.
git config --help not functional on bad config
So, I set the wrong value for a configuration option, and git tells me:

  $ git config branch.autoSetupRebase false
  $ git log
  error: malformed value for branch.autosetuprebase
  fatal: bad config variable 'branch.autosetuprebase' in file '.git/config' at line 24

That's fine. However, when trying to look for help, it is not that useful:

  $ git config --help
  error: malformed value for branch.autosetuprebase
  fatal: bad config variable 'branch.autosetuprebase' in file '.git/config' at line 24

Perhaps it should allow "--help" to go through even if the configuration is bad? After a "git config --unset branch.autosetuprebase" everything works again, but I had to look that up by manually calling "man git-config" (afterwards I realized I could go outside the repo to be unaffected by the config, but that probably wouldn't have helped if I had put this in my global config).

-- 
\\// Peter - http://www.softwolves.pp.se/
Re: [PATCH/RFC] commit & merge: modularize the empty message validator
Sorry, I forgot to add the RFC suffix to [PATCH]. Please consider it present.

-- 
Kaartic
[PATCH 4/4] hook: add a simple first example
Add a simple example that replaces an outdated example that was removed. This ensures that there is at least one simple example illustrating what the hook can do just by enabling it. Also, update the documentation.

Signed-off-by: Kaartic Sivaraam
---
 Documentation/githooks.txt                 | 3 +++
 templates/hooks--prepare-commit-msg.sample | 9 ++---
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/Documentation/githooks.txt b/Documentation/githooks.txt
index fdc01aa25..59f38efba 100644
--- a/Documentation/githooks.txt
+++ b/Documentation/githooks.txt
@@ -121,6 +121,9 @@
 it is not suppressed by the `--no-verify` option. A non-zero exit means a failure of the hook and aborts the commit. It should not be used as replacement for pre-commit hook.
+The sample `prepare-commit-msg` hook that comes with Git removes the
+help message found in the commented portion of the commit template.
+
 commit-msg
 ~~

diff --git a/templates/hooks--prepare-commit-msg.sample b/templates/hooks--prepare-commit-msg.sample
index 87d770592..dc707e46e 100755
--- a/templates/hooks--prepare-commit-msg.sample
+++ b/templates/hooks--prepare-commit-msg.sample
@@ -9,20 +9,23 @@
 #
 # To enable this hook, rename this file to "prepare-commit-msg".
 
-# This hook includes two examples.
+# This hook includes three examples. The first one removes the
+# "# Please enter the commit message..." help message.
 #
-# The first includes the output of "git diff --name-status -r"
+# The second includes the output of "git diff --name-status -r"
 # into the message, just before the "git status" output. It is
 # commented because it doesn't cope with --amend or with squashed
 # commits.
 #
-# The second example adds a Signed-off-by line to the message, that can
+# The third example adds a Signed-off-by line to the message, that can
 # still be edited. This is rarely a good idea.
 
 COMMIT_MSG_FILE=$1
 COMMIT_SOURCE=$2
 SHA1=$3
 
+@PERL_PATH@ -i.bak -ne 'print unless(m/^. Please enter the commit message/..m/^#$/)' "$COMMIT_MSG_FILE"
+
 # case "$COMMIT_SOURCE,$SHA1" in
 # ,|template,)
 #@PERL_PATH@ -i.bak -pe '
-- 
2.13.2.957.g457671ade
[PATCH] commit & merge: modularize the empty message validator
In the context of "git merge", the meaning of an "empty message" is one that contains no line of text. This is not in line with "git commit", where an "empty message" is one that contains only whitespace and/or signed-off-by lines. This could cause surprises to users who are accustomed to the meaning of an "empty message" in "git commit".

Prevent such surprises by ensuring the meaning of an empty 'merge message' is in line with that of an empty 'commit message'. This is done by separating the empty message validator from 'commit' and making it stand-alone.

Signed-off-by: Kaartic Sivaraam
---
I have made an attempt to solve the issue by separating the concerned function, as I found no reason against it. I've tried to name things with what felt appropriate and concise to me. Let me know if it's alright.

 Makefile            |  1 +
 builtin/commit.c    | 39 +--
 builtin/merge.c     |  3 ++-
 message-validator.c | 34 ++
 message-validator.h |  6 ++
 5 files changed, 48 insertions(+), 35 deletions(-)
 create mode 100644 message-validator.c
 create mode 100644 message-validator.h

diff --git a/Makefile b/Makefile
index ffa6da71b..c1c26e434 100644
--- a/Makefile
+++ b/Makefile
@@ -783,6 +783,7 @@
 LIB_OBJS += merge.o
 LIB_OBJS += merge-blobs.o
 LIB_OBJS += merge-recursive.o
 LIB_OBJS += mergesort.o
+LIB_OBJS += message-validator.o
 LIB_OBJS += mru.o
 LIB_OBJS += name-hash.o
 LIB_OBJS += notes.o

diff --git a/builtin/commit.c b/builtin/commit.c
index 8d1cac062..4c3112bb4 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -33,6 +33,7 @@
 #include "notes-utils.h"
 #include "mailmap.h"
 #include "sigchain.h"
+#include "message-validator.h"
 
 static const char * const builtin_commit_usage[] = {
 	N_("git commit [] [--] ..."),
@@ -979,41 +980,11 @@ static int prepare_to_commit(const char *index_file, const char *prefix,
 	return 1;
 }
 
-static int rest_is_empty(struct strbuf *sb, int start)
-{
-	int i, eol;
-	const char *nl;
-
-	/* Check if the rest is just whitespace and Signed-off-by's. */
-	for (i = start; i < sb->len; i++) {
-		nl = memchr(sb->buf + i, '\n', sb->len - i);
-		if (nl)
-			eol = nl - sb->buf;
-		else
-			eol = sb->len;
-
-		if (strlen(sign_off_header) <= eol - i &&
-		    starts_with(sb->buf + i, sign_off_header)) {
-			i = eol;
-			continue;
-		}
-		while (i < eol)
-			if (!isspace(sb->buf[i++]))
-				return 0;
-	}
-
-	return 1;
-}
-
-/*
- * Find out if the message in the strbuf contains only whitespace and
- * Signed-off-by lines.
- */
-static int message_is_empty(struct strbuf *sb)
+static int is_empty(struct strbuf *sb)
 {
 	if (cleanup_mode == CLEANUP_NONE && sb->len)
 		return 0;
-	return rest_is_empty(sb, 0);
+	return message_is_empty(sb, 0);
 }
 
 /*
@@ -1035,7 +1006,7 @@ static int template_untouched(struct strbuf *sb)
 	if (!skip_prefix(sb->buf, tmpl.buf, ))
 		start = sb->buf;
 	strbuf_release();
-	return rest_is_empty(sb, start - sb->buf);
+	return message_is_empty(sb, start - sb->buf);
 }
 
 static const char *find_author_by_nickname(const char *name)
@@ -1744,7 +1715,7 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
 		fprintf(stderr, _("Aborting commit; you did not edit the message.\n"));
 		exit(1);
 	}
-	if (message_is_empty() && !allow_empty_message) {
+	if (is_empty() && !allow_empty_message) {
 		rollback_index_files();
 		fprintf(stderr, _("Aborting commit due to empty commit message.\n"));
 		exit(1);

diff --git a/builtin/merge.c b/builtin/merge.c
index 703827f00..625cfb848 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -31,6 +31,7 @@
 #include "gpg-interface.h"
 #include "sequencer.h"
 #include "string-list.h"
+#include "message-validator.h"
 
 #define DEFAULT_TWOHEAD (1<<0)
 #define DEFAULT_OCTOPUS (1<<1)
@@ -772,7 +773,7 @@ static void prepare_to_commit(struct commit_list *remoteheads)
 	}
 	read_merge_msg();
 	strbuf_stripspace(, 0 < option_edit);
-	if (!msg.len)
+	if (!msg.len || message_is_empty(, 0))
 		abort_commit(remoteheads, _("Empty commit message."));
 	strbuf_release(_msg);
 	strbuf_addbuf(_msg, );

diff --git a/message-validator.c b/message-validator.c
new file mode 100644
index 0..32feb4e26
--- /dev/null
+++ b/message-validator.c
@@ -0,0 +1,34 @@
+#include "git-compat-util.h"
+#include "sequencer.h"
+#include "strbuf.h"
+#include "message-validator.h"
+
+/*
+ * Find out if the message in the strbuf contains only whitespace and
+ * Signed-off-by
[PATCH 3/4] hook: add sign-off using "interpret-trailers"
The sample hook to prepare the commit message before a commit allows users to opt in to adding the sign-off to the commit message. The sign-off is added at a place that isn't consistent with the "-s" option of "git commit". Further, it could go out of view in certain cases.

Add the sign-off in a way similar to the "-s" option of "git commit", using git's interpret-trailers command. It works well in all cases except when the user invokes "git commit" without any arguments. In that case, manually add a new line after the first line to ensure the result is consistent with the output of the "-s" option.

Signed-off-by: Kaartic Sivaraam
---
 templates/hooks--prepare-commit-msg.sample | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/templates/hooks--prepare-commit-msg.sample b/templates/hooks--prepare-commit-msg.sample
index eb5912163..87d770592 100755
--- a/templates/hooks--prepare-commit-msg.sample
+++ b/templates/hooks--prepare-commit-msg.sample
@@ -32,4 +32,8 @@ SHA1=$3
 # esac
 
 # SOB=$(git var GIT_AUTHOR_IDENT | sed -n 's/^\(.*>\).*$/Signed-off-by: \1/p')
-# grep -qs "^$SOB" "$COMMIT_MSG_FILE" || echo "$SOB" >> "$COMMIT_MSG_FILE"
+# git interpret-trailers --in-place --trailer "$SOB" "$COMMIT_MSG_FILE"
+# if test -z "$COMMIT_SOURCE"
+# then
+# @PERL_PATH@ -i.bak -pe 'print "\n" if !$first_line++' "$COMMIT_MSG_FILE"
+# fi
-- 
2.13.2.957.g457671ade
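The behavior the patch relies on can be tried directly from a shell; `git interpret-trailers` places the trailer after the message body, the same position `git commit -s` uses (the message text and identity below are made up for illustration):

```shell
# Append a Signed-off-by trailer the way the revised sample hook does.
# (Message and identity are made up for illustration.)
SOB="Signed-off-by: A U Thor <author@example.com>"
printf 'commit subject\n\nsome body text\n' |
	git interpret-trailers --trailer "$SOB"
```

With `--in-place` and a filename instead of stdin, the file is rewritten directly, which is the form the hook uses on `$COMMIT_MSG_FILE`.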
[PATCH 4/4] hook: add a simple first example
Add a simple example that replaces an outdated example that was removed. This ensures that there is at least one simple example illustrating what the hook can do just by enabling it. Also, update the documentation.

Signed-off-by: Kaartic Sivaraam
---
 Documentation/githooks.txt                 | 3 +++
 templates/hooks--prepare-commit-msg.sample | 5 -
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/Documentation/githooks.txt b/Documentation/githooks.txt
index fdc01aa25..59f38efba 100644
--- a/Documentation/githooks.txt
+++ b/Documentation/githooks.txt
@@ -121,6 +121,9 @@
 it is not suppressed by the `--no-verify` option. A non-zero exit means a failure of the hook and aborts the commit. It should not be used as replacement for pre-commit hook.
+The sample `prepare-commit-msg` hook that comes with Git removes the
+help message found in the commented portion of the commit template.
+
 commit-msg
 ~~

diff --git a/templates/hooks--prepare-commit-msg.sample b/templates/hooks--prepare-commit-msg.sample
index 87d770592..f0228c1cb 100755
--- a/templates/hooks--prepare-commit-msg.sample
+++ b/templates/hooks--prepare-commit-msg.sample
@@ -9,7 +9,8 @@
 #
 # To enable this hook, rename this file to "prepare-commit-msg".
 
-# This hook includes two examples.
+# This hook includes three examples. The first one removes the
+# "# Please enter the commit message..." help message.
 #
 # The first includes the output of "git diff --name-status -r"
 # into the message, just before the "git status" output. It is
@@ -23,6 +24,8 @@
 COMMIT_MSG_FILE=$1
 COMMIT_SOURCE=$2
 SHA1=$3
 
+@PERL_PATH@ -i.bak -ne 'print unless(m/^. Please enter the commit message/..m/^#$/)' "$COMMIT_MSG_FILE"
+
 # case "$COMMIT_SOURCE,$SHA1" in
 # ,|template,)
 #@PERL_PATH@ -i.bak -pe '
-- 
2.13.2.957.g457671ade
[PATCH 2/4] hook: name the positional variables
It's always nice to have named variables instead of positional variables, as they communicate their purpose well. Appropriately name the positional variables of the hook to make it easier to see what's going on.

Signed-off-by: Kaartic Sivaraam
---
 templates/hooks--prepare-commit-msg.sample | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/templates/hooks--prepare-commit-msg.sample b/templates/hooks--prepare-commit-msg.sample
index 279ddc1a7..eb5912163 100755
--- a/templates/hooks--prepare-commit-msg.sample
+++ b/templates/hooks--prepare-commit-msg.sample
@@ -19,14 +19,17 @@
 # The second example adds a Signed-off-by line to the message, that can
 # still be edited. This is rarely a good idea.
 
+COMMIT_MSG_FILE=$1
+COMMIT_SOURCE=$2
+SHA1=$3
 
-# case "$2,$3" in
+# case "$COMMIT_SOURCE,$SHA1" in
 # ,|template,)
 #@PERL_PATH@ -i.bak -pe '
 # print "\n" . `git diff --cached --name-status -r`
-# if /^#/ && $first++ == 0' "$1" ;;
+# if /^#/ && $first++ == 0' "$COMMIT_MSG_FILE" ;;
 # *) ;;
 # esac
 
 # SOB=$(git var GIT_AUTHOR_IDENT | sed -n 's/^\(.*>\).*$/Signed-off-by: \1/p')
-# grep -qs "^$SOB" "$1" || echo "$SOB" >> "$1"
+# grep -qs "^$SOB" "$COMMIT_MSG_FILE" || echo "$SOB" >> "$COMMIT_MSG_FILE"
-- 
2.13.2.957.g457671ade
[PATCH 1/4] hook: cleanup script
Prepare the 'prepare-commit-msg' sample script for upcoming changes. Preparation includes removal of an example that has outlived its purpose. The example is the one that comments out the "Conflicts:" part of a merge commit message. It isn't relevant anymore, as that is done by default since 261f315b ("merge & sequencer: turn "Conflicts:" hint into a comment", 2014-08-28).

Further, update the relevant comments in the sample script and update the documentation.

Signed-off-by: Kaartic Sivaraam
---
 Documentation/githooks.txt                 |  3 ---
 templates/hooks--prepare-commit-msg.sample | 24 ++--
 2 files changed, 10 insertions(+), 17 deletions(-)

diff --git a/Documentation/githooks.txt b/Documentation/githooks.txt
index 706091a56..fdc01aa25 100644
--- a/Documentation/githooks.txt
+++ b/Documentation/githooks.txt
@@ -121,9 +121,6 @@
 it is not suppressed by the `--no-verify` option. A non-zero exit means a failure of the hook and aborts the commit. It should not be used as replacement for pre-commit hook.
-The sample `prepare-commit-msg` hook that comes with Git comments
-out the `Conflicts:` part of a merge's commit message.
-
 commit-msg
 ~~

diff --git a/templates/hooks--prepare-commit-msg.sample b/templates/hooks--prepare-commit-msg.sample
index 86b8f227e..279ddc1a7 100755
--- a/templates/hooks--prepare-commit-msg.sample
+++ b/templates/hooks--prepare-commit-msg.sample
@@ -9,28 +9,24 @@
 #
 # To enable this hook, rename this file to "prepare-commit-msg".
 
-# This hook includes three examples. The first comments out the
-# "Conflicts:" part of a merge commit.
+# This hook includes two examples.
 #
-# The second includes the output of "git diff --name-status -r"
+# The first includes the output of "git diff --name-status -r"
 # into the message, just before the "git status" output. It is
 # commented because it doesn't cope with --amend or with squashed
 # commits.
 #
-# The third example adds a Signed-off-by line to the message, that can
+# The second example adds a Signed-off-by line to the message, that can
 # still be edited. This is rarely a good idea.
 
-case "$2,$3" in
-  merge,)
-@PERL_PATH@ -i.bak -ne 's/^/# /, s/^# #/#/ if /^Conflicts/ .. /#/; print' "$1" ;;
-# ,|template,)
-# @PERL_PATH@ -i.bak -pe '
-# print "\n" . `git diff --cached --name-status -r`
-# if /^#/ && $first++ == 0' "$1" ;;
-
-  *) ;;
-esac
+# case "$2,$3" in
+# ,|template,)
+#@PERL_PATH@ -i.bak -pe '
+# print "\n" . `git diff --cached --name-status -r`
+# if /^#/ && $first++ == 0' "$1" ;;
+# *) ;;
+# esac
 
 # SOB=$(git var GIT_AUTHOR_IDENT | sed -n 's/^\(.*>\).*$/Signed-off-by: \1/p')
 # grep -qs "^$SOB" "$1" || echo "$SOB" >> "$1"
-- 
2.13.2.957.g457671ade
Re: [PATCH 0/7] tag: more fine-grained pager-configuration
On Tue, Jul 11, 2017 at 03:50:39PM +0200, Martin Ågren wrote:

> > Would it make sense to just have git-tag respect pager.tag in listing
> > mode, and otherwise ignore it completely?
>
> Yes. I doubt anyone would notice, and no-one should mind the
> change (famous last words).
>
> It does mean there's a precedent for individual commands getting to
> choose when to honor pager.foo. If more such exceptions are added, at
> some point some command will learn to ignore pager.foo in some
> particular situation, and someone will consider it a regression.

Yes, perhaps. One could even argue that "git tag foo" is OK to page and somebody would consider it a regression not to respect pager.tag. It seems useless to me, but at least it's not totally broken like "git tag -a foo" is. But I find it pretty unlikely, as it doesn't produce copious output.

I'd worry more about eventually finding a command with two copious-output modes, where somebody wants to distinguish between them. If we can't come up with a plausible example now, I'm inclined to deal with it if and when it ever comes up. Right now I just don't want to paint ourselves into a corner, and I think we can always add more config later (e.g., tag.pageFooCommand=true or something; that's not great, but I'm mostly betting that it will never come up).

> Well, I respect your hunch about .enable and .command and I certainly
> don't want to take things in a direction that makes that approach less
> clean. You have convinced me that I will instead try to teach git tag
> to be more clever about when to use the pager, at least to see what
> that looks like.

Thanks. I was the original proponent of "pager.tag.list", and only after reading your series today did I think "gee, this seems unnecessarily complex". So it's entirely possible I'm missing a corner case that you may uncover through discussion or experimenting.

> Let's call such a "git tag" the "future git tag". Just to convince
> myself I've thought through the implications -- how would
> pager.tag.enable=true affect that future git tag? Would it be fair to
> say that enable=false means "definitely don't start the pager (unless
> --paginate)" and that enable=true means "feel free to use it (unless
> --no-paginate)"? The future git tag would default to using
> enable=true. Would --paginate also be "feel free to use it", or rather
> "the pager must be used"?

I think the rules would be:

  1. If --paginate is given, do that. Likewise for --no-pager.

  2. Otherwise, respect tag.pager.enable if it is set.

  3. Otherwise, use the built-in default.

  4. Any time the pager is run, respect tag.pager.command if it is set.

And then there are two optional bits:

  2a. If tag.pager is set to a boolean, behave as if tag.pager.enable is
      set to the same boolean. If it's set to a string, behave as if
      tag.pager.enable is set to "true" and tag.pager.command is set to
      the string. That should give us backwards compatibility.

  2b. If tag.pager.command is set but tag.pager.enable isn't, possibly
      we should assume that tag.pager.enable is set to "true". That
      makes the common case of "page and use this command" a single
      config block instead of two. It matters less for "tag", where
      paging would be the default.

So I think that matches what you asked above, but to be clear on the final question: --paginate should always be "must be used". And that meshes nicely with implementing it via the git.c wrapper, where it takes precedence way before we ever hit the setup_auto_pager() call.

-Peff
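The precedence rules sketched above can be expressed as a small decision function. This is a hypothetical illustration of the proposal, not Git's implementation; names like `cli_flag` are made up for the sketch, and `None` stands for "not set":

```python
# Hypothetical sketch of the proposed pager-decision precedence.
# None means "not configured"; this is NOT Git's actual implementation.

def pager_decision(cli_flag, enable_cfg, command_cfg, builtin_default,
                   default_command="less"):
    """Return (use_pager, command) following the proposed rules:
    1. --paginate/--no-pager wins outright.
    2. Otherwise respect the .enable config if set (2b: a set .command
       with an unset .enable counts as enabled).
    3. Otherwise fall back to the command's built-in default.
    4. Whenever paging, a set .command overrides the default pager.
    """
    if cli_flag is not None:            # rule 1
        use = cli_flag
    elif enable_cfg is not None:        # rule 2
        use = enable_cfg
    elif command_cfg is not None:       # rule 2b
        use = True
    else:                               # rule 3
        use = builtin_default
    command = command_cfg if command_cfg is not None else default_command
    return use, command                 # rule 4 folded into `command`

def normalize_legacy(value):
    """Rule 2a: map a legacy boolean-or-string pager config value
    to an (enable_cfg, command_cfg) pair."""
    if isinstance(value, bool):
        return value, None
    return True, value
```

For example, `pager_decision(None, False, "mypager", True)` returns `(False, "mypager")`: the explicit enable=false suppresses paging even though a custom command is configured, which is exactly benefit 2 of the `[pager "foo"]` scheme quoted earlier (a custom command that only kicks in under `git -p foo`).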
Re: [PATCH 2/7] git.c: let builtins opt for handling `pager.foo` themselves
On Tue, Jul 11, 2017 at 03:46:08PM +0200, Martin Ågren wrote:

> > Can this ever trigger in execv_dashed_external()? We should only get
> > there if get_builtin() returned NULL in the first place. Otherwise, we'd
> > have run and exited via handle_builtin().
>
> I can trigger it with this:
>
>   $ git -c pager.tag="echo paging" -c pager.tag.list=no -c alias.t=tag t -l
>
> where the alias is what triggers it and the two pager configurations
> demonstrate the effect.

Interesting. As you noted below, I think the dashed external we exec should be choosing whether to run the pager. I suspect what's happening is that execv_dashed_external() says "a-ha, we're running 'tag', and I know how to check its pager config". But IMHO that is wrong in the first place (it just never really made a difference until your series).

That's just a guess, though. I didn't dig.

-Peff
Re: [PATCH 0/7] tag: more fine-grained pager-configuration
On 11 July 2017 at 12:19, Jeff King wrote:
> On Mon, Jul 10, 2017 at 11:55:13PM +0200, Martin Ågren wrote:
>
>> Using, e.g., `git -c pager.tag tag -a new-tag` results in errors such
>> as "Vim: Warning: Output is not to a terminal" and a garbled terminal.
>> A user who makes use of `git tag -a` and `git tag -l` will probably
>> choose not to configure `pager.tag` or to set it to "no", so that `git
>> tag -a` will actually work, at the cost of not getting the pager with
>> `git tag -l`.
>
> Right, I think we are all agreed that "pager.tag" as it is now is
> essentially worthless.
>
> If I understand your series correctly, though, it adds pager.tag.list
> but leaves "pager.tag" behaving more or less the same. It's good that we
> now have a way for a user to do the thing they actually want, but it
> feels like we're leaving pager.tag behind as a booby-trap.
>
> Should we disable it entirely (or only respect it in list mode)?
>
> At which point, I wonder if we actually need pager.tag.list at all.
> Slicing up the namespace further would be valuable if there were a
> command which had two pager-worthy modes, and somebody might want to
> enable the pager for one but not the other. But it seems like most
> commands in this boat (e.g., tag, branch, stash) really have two modes:
> listing things or creating things.
>
> Would it make sense to just have git-tag respect pager.tag in listing
> mode, and otherwise ignore it completely?

Yes. I doubt anyone would notice, and no-one should mind the change (famous last words).

It does mean there's a precedent for individual commands getting to choose when to honor pager.foo. If more such exceptions are added, at some point some command will learn to ignore pager.foo in some particular situation, and someone will consider it a regression.

> One nice side effect is that it keeps the multi-level pager.X.Y
> namespace clear. We've talked about transitioning to allow:
>
>   [pager "foo"]
>       enable = true
>       command = some-custom-pager
>
> to configure aspects of the pager separately for git-foo. This has two
> benefits:
>
>   1. Syntactically, it allows configuration for commands whose names
>      aren't valid config keys.
>
>   2. It would allow setting a command with "enable=false", so that "git
>      foo" did not paginate, but "git -p foo" paginated with a custom
>      command.
>
> Those are admittedly minor features. And assuming we don't go crazy with
> the multi-level names, we could have "pager.tag.list" live alongside
> "pager.tag.command". So it's not really an objection, so much as
> wondering out loud whether we can keep this as simple as possible.

Well, I respect your hunch about .enable and .command and I certainly don't want to take things in a direction that makes that approach less clean. You have convinced me that I will instead try to teach git tag to be more clever about when to use the pager, at least to see what that looks like.

Let's call such a "git tag" the "future git tag". Just to convince myself I've thought through the implications -- how would pager.tag.enable=true affect that future git tag? Would it be fair to say that enable=false means "definitely don't start the pager (unless --paginate)" and that enable=true means "feel free to use it (unless --no-paginate)"? The future git tag would default to using enable=true. Would --paginate also be "feel free to use it", or rather "the pager must be used"?

At some point, I thought about "true"/"false"/"maybe", where "maybe" would be what the future git tag implements. Of course, there's a fair chance not everyone will agree on what exactly should be paged with "maybe". So it's back to adding various knobs. ;)

Anyway, this is more me thinking out loud. I'll drop pager.tag.list in v2 and will instead make pager.tag more clever. That should force me to think through this some more.

>> This is an attempt to implement something like that. I decided to let
>> `pager.tag.list` fall back to `pager.tag` before falling back to "on".
>> The default for `pager.tag` is still "off". I can see how that might
>> seem confusing. However, my argument is that it would be awkward for
>> `git tag -l` to ignore `pager.tag` -- we are after all running a
>> subcommand of `git tag`. Also, this avoids a regression for someone
>> who has set `pager.tag` and uses `git tag -l`.
>
> Yeah, I agree that turning "pager.tag" into a complete noop is probably
> a bad idea. But if we made it a silent noop for the non-list cases, that
> would be OK (and the hypothetical user who set it to make `git tag -l`
> work would see a strict improvement; they'd still get their paging but
> not the weird breakage with non-list modes). And I think that applies
> whether we have a pager.tag.list in addition or not.

Good thinking. Thanks a lot for your comments. I appreciate it.

Martin
Re: [PATCH 6/7] tag: make git tag -l consider new config `pager.tag.list`
On 11 July 2017 at 12:41, Jeff King wrote:
> On Mon, Jul 10, 2017 at 11:55:19PM +0200, Martin Ågren wrote:
>
>> diff --git a/builtin/tag.c b/builtin/tag.c
>> index e0f129872..e96ef7d70 100644
>> --- a/builtin/tag.c
>> +++ b/builtin/tag.c
>> @@ -446,6 +446,8 @@ int cmd_tag(int argc, const char **argv, const char *prefix)
>>
>>  	argc = parse_options(argc, argv, prefix, options, git_tag_usage, 0);
>>
>> +	if (cmdmode == 'l')
>> +		setup_auto_pager("tag.list", 0);
>>  	setup_auto_pager("tag", 0);
>
> Ideally this would kick in whenever we are in list mode, even if the
> user didn't say "-l". So:
>
>   $ git tag
>
> should probably respect the list config. Likewise, certain options like
> "--contains" trigger list mode. I think the pager setup could just be
> bumped a few lines down, past the "if (!cmdmode)" block that sets up
> those defaults.

Oops, that's embarrassing. Thanks. That means the paging starts slightly later, but what happens in those few lines looks fairly trivial, so there shouldn't be any real difference in behavior. I'll add one or two tests.

Martin
Re: [PATCH 3/7] git.c: provide setup_auto_pager()
On 11 July 2017 at 12:37, Jeff King wrote:
> On Mon, Jul 10, 2017 at 11:55:16PM +0200, Martin Ågren wrote:
>
>> +void setup_auto_pager(const char *cmd, int def)
>> +{
>> +	if (use_pager != -1)
>> +		return;
>
> I think you probably also want to return early here if pager_in_use()
> is true. If a script runs "git tag -l", you wouldn't want to start a new
> pager when the outer script has already been paged (either via config,
> or via "git -p").

Good point. Thanks.

Martin
Re: [PATCH 2/7] git.c: let builtins opt for handling `pager.foo` themselves
On 11 July 2017 at 12:24, Jeff King wrote:
> On Mon, Jul 10, 2017 at 11:55:15PM +0200, Martin Ågren wrote:
>
>> To allow individual builtins to make more informed decisions about when
>> to respect `pager.foo`, introduce a flag IGNORE_PAGER_CONFIG. If the flag
>> is set, do not check `pager.foo`. This applies to two code-paths -- one
>> in run_builtin() and one in execv_dashed_external().
>
> Can this ever trigger in execv_dashed_external()? We should only get
> there if get_builtin() returned NULL in the first place. Otherwise, we'd
> have run and exited via handle_builtin().

I can trigger it with this:

  $ git -c pager.tag="echo paging" -c pager.tag.list=no -c alias.t=tag t -l

where the alias is what triggers it and the two pager configurations demonstrate the effect.

> So I think this hunk:
>
>> @@ -543,11 +550,14 @@ static void execv_dashed_external(const char **argv)
>>  {
>>  	struct child_process cmd = CHILD_PROCESS_INIT;
>>  	int status;
>> +	struct cmd_struct *builtin;
>>
>>  	if (get_super_prefix())
>>  		die("%s doesn't support --super-prefix", argv[0]);
>>
>> -	if (use_pager == -1)
>> +	builtin = get_builtin(argv[0]);
>> +	if (use_pager == -1 &&
>> +	    !(builtin && builtin->option & IGNORE_PAGER_CONFIG))
>>  		use_pager = check_pager_config(argv[0]);
>>  	commit_pager_choice();
>
> ...can just go away.

If I remove this, the call I gave above will page although it shouldn't, and it doesn't if I keep this hunk. There's this in run_argv: "If we tried alias and futzed with our environment, it no longer is safe to invoke builtins directly in general. We have to spawn them as dashed externals." There's also a NEEDSWORK.

Although, thinking about it, I'm not sure why the child process doesn't set up the paging correctly when I remove this hunk. Maybe it's something related to my using "-c", or something about the launching of child processes. Those are both areas where I lack knowledge. Will look into it.

Martin
Re: [PATCH 4/4] hook: add a simple first example
On Mon, 2017-07-10 at 13:02 -0700, Junio C Hamano wrote:
> Kaartic Sivaraam writes:
>
> > I made an attempt to make the second example work with amending,
> > with the aim of making it suitable for usage out of the box. It
> > seems that it's not easy to make it work, as the status of a file
> > cannot be determined correctly when the index while amending
> > introduces changes to a file that has a change in the commit being
> > amended.
> >
> > Is there any way in which the second example could be made to work
> > with amending without much effort? I'm asking this assuming something
> > might have happened, since the script was added, that could ease the
> > task.
>
> Sorry, but I do not understand what you are asking here.

I was trying to ask, "Is there any way to change the second example
(diff --name-status) to make it work with "commit --amend" so that it
could be uncommented by default?" If there was a way, then patch 4/4
could be dropped, as the name-status example would be enough to make
the script live (I think).

> After going back and checking 1/4, I realize that I misread the patch.
> You did keep the commented out 'diff --name-status' thing, so it still
> has three---it just lost one half of the original "first" example. So
> please disregard my earlier "do we still have three, not two?"

Actually speaking, I did think of promoting the second example to the
first to make the sub-patches independent of each other. I held back as
I thought it would be overkill. Anyways, I'll just overkill it!
Re: [PATCH 3/4] hook: add signature using "interpret-trailers"
Note: Re-attempting to send mail due to rejection

On Mon, 2017-07-10 at 16:13 +0100, Ramsay Jones wrote:
>
> Again, s/signature/sign-off/g, or similar (including subject line).
>

Thanks! Forgot that in the course of splitting the patches and
modifying them.

--
Kaartic
Re: [PATCH 0/7] tag: more fine-grained pager-configuration
On Mon, Jul 10, 2017 at 03:42:14PM -0700, Junio C Hamano wrote: > > A review would be much appreciated. Comments on the way I > > structured the series would be just as welcome as ones on the > > final result. > > I see you CC'ed Peff who's passionate in this area, so these patches > are in good hands already ;-) I briefly skimmed your patches myself, > and did not spot anything glaringly wrong. Heh, I don't think of "paging tag output" as one of my passions, but you may be right. :) I left a few comments on the code and direction, but I agree that overall it looks pretty good. And I'm very impressed with the attention to detail for a first-time contributor. -Peff
Re: [PATCH 7/7] tag: make git tag -l use pager by default
On Mon, Jul 10, 2017 at 11:55:20PM +0200, Martin Ågren wrote: > The previous patch introduced `pager.tag.list`. Its default was to use > the value of `pager.tag` or, if that was also not set, to fall back to > "off". > > Change that fallback value to "on". Let the default value for > `pager.tag` remain at "off". Update documentation and tests. Thanks for splitting this out. The default setting is definitely orthogonal to allowing the config. I've been running with this default for several years now (using the patch I showed in the earlier thread you linked). It _does_ occasionally annoy me, but I think overall it's an improvement. So it seems like a good thing to try, and we can see how people respond as they try it out. -Peff
Re: [PATCH 6/7] tag: make git tag -l consider new config `pager.tag.list`
On Mon, Jul 10, 2017 at 11:55:19PM +0200, Martin Ågren wrote:

> diff --git a/builtin/tag.c b/builtin/tag.c
> index e0f129872..e96ef7d70 100644
> --- a/builtin/tag.c
> +++ b/builtin/tag.c
> @@ -446,6 +446,8 @@ int cmd_tag(int argc, const char **argv, const char *prefix)
>
>          argc = parse_options(argc, argv, prefix, options, git_tag_usage, 0);
>
> +        if (cmdmode == 'l')
> +                setup_auto_pager("tag.list", 0);
>          setup_auto_pager("tag", 0);

Ideally this would kick in whenever we are in list mode, even if the
user didn't say "-l". So:

  $ git tag

should probably respect the list config. Likewise, certain options like
"--contains" trigger list mode. I think the pager setup could just be
bumped a few lines down, past the "if (!cmdmode)" block that sets up
those defaults.

-Peff
Re: [PATCH 5/7] tag: handle `pager.tag`-configuration within the builtin
On Mon, Jul 10, 2017 at 11:55:18PM +0200, Martin Ågren wrote: > Use the mechanisms introduced in two earlier patches to ignore > `pager.tag` in git.c and let the `git tag` builtin handle it on its own. > > This is in preparation for the next patch, where we will want to handle > slightly different configuration variables depending on which options > are used with `git tag`. For this reason, place the call to > setup_auto_pager() after the options have been parsed. > > No functional change is intended. That said, there is a window between > where the pager is started before and after this patch, and if an error > occurs within this window, as of this patch the error message might not > be paged where it would have been paged before. Since > operation-parsing has to happen inside this window, a difference can be > seen with, e.g., `git -c pager.tag="echo pager is used" tag > --unknown-option`. This change in paging-behavior should be acceptable > since it only affects erroneous usages. Thanks for carefully thinking through the details. I agree that it's an acceptable change of behavior. -Peff
Re: [PATCH 3/7] git.c: provide setup_auto_pager()
On Mon, Jul 10, 2017 at 11:55:16PM +0200, Martin Ågren wrote:

> +void setup_auto_pager(const char *cmd, int def)
> +{
> +        if (use_pager != -1)
> +                return;

I think you probably also want to return early here if pager_in_use()
is true. If a script runs "git tag -l", you wouldn't want to start a new
pager when the outer script has already been paged (either via config,
or via "git -p").

> +        use_pager = check_pager_config(cmd);
> +
> +        if (use_pager == -1) {
> +                struct strbuf buf = STRBUF_INIT;
> +                size_t len;
> +
> +                strbuf_addstr(&buf, cmd);
> +                len = buf.len;
> +                while (use_pager == -1 && len--) {
> +                        if (buf.buf[len] == '.') {
> +                                strbuf_setlen(&buf, len);
> +                                use_pager = check_pager_config(buf.buf);
> +                        }
> +                }
> +                strbuf_release(&buf);
> +        }

This looks good. I wondered if we could fold this all into a single
loop, rather than having the extra check_pager_config() beforehand
(which confused me for a half-second at first). But I tried and I don't
think the result ended up any more readable.

-Peff
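[Editor's note: the lookup order implemented by the quoted loop -- the full dotted name first, then each shorter prefix obtained by truncating at a dot -- can be modeled in a few lines of Python. This is only an illustrative sketch of the C logic above, not git's code; the function name is made up.]

```python
def pager_config_candidates(cmd):
    """Model the lookup order of the quoted setup_auto_pager() code:
    check pager.<cmd> first, then strip dotted suffixes right-to-left."""
    names = ["pager." + cmd]
    # mirror the while-loop: truncate at each '.' from the end
    while "." in cmd:
        cmd = cmd.rsplit(".", 1)[0]
        names.append("pager." + cmd)
    return names

print(pager_config_candidates("tag.list"))  # ['pager.tag.list', 'pager.tag']
```

So for `git tag -l`, `pager.tag.list` wins if set, and `pager.tag` is consulted only as a fallback.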
Re: [PATCH 2/7] git.c: let builtins opt for handling `pager.foo` themselves
On Mon, Jul 10, 2017 at 11:55:15PM +0200, Martin Ågren wrote:

> To allow individual builtins to make more informed decisions about when
> to respect `pager.foo`, introduce a flag IGNORE_PAGER_CONFIG. If the flag
> is set, do not check `pager.foo`. This applies to two code-paths -- one
> in run_builtin() and one in execv_dashed_external().

Can this ever trigger in execv_dashed_external()? We should only get
there if get_builtin() returned NULL in the first place. Otherwise, we'd
run and exited via handle_builtin().

So I think this hunk:

> @@ -543,11 +550,14 @@ static void execv_dashed_external(const char **argv)
>  {
>          struct child_process cmd = CHILD_PROCESS_INIT;
>          int status;
> +        struct cmd_struct *builtin;
>
>          if (get_super_prefix())
>                  die("%s doesn't support --super-prefix", argv[0]);
>
> -        if (use_pager == -1)
> +        builtin = get_builtin(argv[0]);
> +        if (use_pager == -1 &&
> +            !(builtin && builtin->option & IGNORE_PAGER_CONFIG))
>                  use_pager = check_pager_config(argv[0]);
>          commit_pager_choice();

...can just go away.

And that highlights the issue with externals; we don't have any
internal database that says "these ones handle their own pager". We
don't even have a list of all the possibilities, since users can drop
whatever they like into their $PATH. So we'd have to make a
(backwards-incompatible) decision that pager.* doesn't work for
external commands, and they must manually trigger the pager themselves.
I'm undecided on whether that's reasonable or not (certainly it's what
git-stash would want, but it may be hurting other commands).

Anyway, I think that's out of scope for your series, which is just
trying to make the builtins work better.

-Peff
Re: [PATCH 0/7] tag: more fine-grained pager-configuration
On Mon, Jul 10, 2017 at 11:55:13PM +0200, Martin Ågren wrote:

> Using, e.g., `git -c pager.tag tag -a new-tag` results in errors such
> as "Vim: Warning: Output is not to a terminal" and a garbled terminal.
> A user who makes use of `git tag -a` and `git tag -l` will probably
> choose not to configure `pager.tag` or to set it to "no", so that `git
> tag -a` will actually work, at the cost of not getting the pager with
> `git tag -l`.

Right, I think we are all agreed that "pager.tag" as it is now is
essentially worthless.

If I understand your series correctly, though, it adds pager.tag.list
but leaves "pager.tag" behaving more or less the same. It's good that
we now have a way for a user to do the thing they actually want, but it
feels like we're leaving pager.tag behind as a booby-trap. Should we
disable it entirely (or only respect it in list mode)?

At which point, I wonder if we actually need pager.tag.list at all.
Slicing up the namespace further would be valuable if there were a
command which had two pager-worthy modes, and somebody might want to
enable the pager for one but not the other. But it seems like most
commands in this boat (e.g., tag, branch, stash) really have two modes:
listing things or creating things. Would it make sense to just have
git-tag respect pager.tag in listing mode, and otherwise ignore it
completely?

One nice side effect is that it keeps the multi-level pager.X.Y
namespace clear. We've talked about transitioning to allow:

  [pager "foo"]
          enable = true
          command = some-custom-pager

to configure aspects of the pager separately for git-foo. This has two
benefits:

  1. Syntactically, it allows configuration for commands whose names
     aren't valid config keys.

  2. It would allow setting a command with "enable=false", so that
     "git foo" did not paginate, but "git -p foo" paginated with a
     custom command.

Those are admittedly minor features.
And assuming we don't go crazy with the multi-level names, we could have "pager.tag.list" live alongside "pager.tag.command". So it's not really an objection, so much as wonder out loud whether we can keep this as simple as possible. > This is an attempt to implement something like that. I decided to let > `pager.tag.list` fall back to `pager.tag` before falling back to "on". > The default for `pager.tag` is still "off". I can see how that might > seem confusing. However, my argument is that it would be awkward for > `git tag -l` to ignore `pager.tag` -- we are after all running a > subcommand of `git tag`. Also, this avoids a regression for someone > who has set `pager.tag` and uses `git tag -l`. Yeah, I agree that turning "pager.tag" into a complete noop is probably a bad idea. But if we made it a silent noop for the non-list cases, that would be OK (and the hypothetical user who set it to make `git tag -l` work would see a strict improvement; they'd still get their paging but not the weird breakage with non-list modes). And I think that applies whether we have a pager.tag.list in addition or not. > I am not moving all builtins to handling the pager on their own, but > instead introduce a flag IGNORE_PAGER_CONFIG and use it only with the > tag builtin. That means there's another flag to reason about, but it > avoids moving all builtins to handling the paging themselves just to > make one of them do something more "clever". I think this is very sensible. It's the vast minority that would want to enable this (git-branch is the other obvious one). At some point we may need a plan to handle non-builtins (like stash), but that's largely orthogonal to what you're doing here. > The discussion mentioned above discusses various approaches. It also > notes how the current pager.foo-configuration conflates _whether_ and > _how_ to run a pager. Arguably, this series paints ourselves even > further into that corner. 
> The idea of `pager.foo.command` and `pager.foo.enabled` has been
> mentioned and this series might make such an approach slightly less
> clean, conceptually. We could introduce `paging.foo` as a
> "yes"/"no"/"maybe" to go alongside `pager.foo` which is then
> "less"/"cat"/ That's definitely outside this series, but should not be
> prohibited by it...

I think I covered my thoughts on this part above. :)

> A review would be much appreciated. Comments on the way I structured
> the series would be just as welcome as ones on the final result.

Overall the code looks good, though I'll respond with a few comments.

-Peff
Re: What's cooking in git.git (Jul 2017, #03; Mon, 10)
On Tue, Jul 11, 2017 at 11:18:17AM +0200, Lars Schneider wrote: > Thanks for the explanation! I looked at the Git release calendar [1] and > realized that 2.14-rc0 is scheduled for this Thursday. My assumption was > that either on this date 2.14 will be cut from the tip of master or master > would not change significantly after the rc0 date until the 2.14 release. Yeah, certainly forking off a 2.14-rc branch would be a reasonable way to do the release management. It just happens to not be the way we do it. One nice thing about keeping "master" as the stabilizing release-candidate branch is that it encourages more people to run it (versus having people manually switch to building a 2.14-rc branch). -Peff
Re: What's cooking in git.git (Jul 2017, #03; Mon, 10)
> On 11 Jul 2017, at 11:11, Jeff King wrote:
>
> On Tue, Jul 11, 2017 at 10:28:08AM +0200, Lars Schneider wrote:
>
>>> On 11 Jul 2017, at 00:48, Junio C Hamano wrote:
>>>
>>> * ls/filter-process-delayed (2017-06-30) 7 commits
>>>   (merged to 'next' on 2017-07-05 at a35e644082)
>>>   + convert: add "status=delayed" to filter process protocol
>>>   + convert: refactor capabilities negotiation
>>>   + convert: move multiple file filter error handling to separate function
>>>   + convert: put the flags field before the flag itself for consistent style
>>>   + t0021: write "OUT <size>" only on success
>>>   + t0021: make debug log file name configurable
>>>   + t0021: keep filter log files on comparison
>>>
>>> The filter-process interface learned to allow a process with long
>>> latency give a "delayed" response.
>>>
>>> Will merge to 'master'.
>>
>> Hi Junio,
>>
>> can you already tell if this topic has still a chance to be part of 2.14?
>
> I'm not Junio, but we should be able to reason it out. :)
>
> It's marked as "will merge to master", which typically means it will
> happen during the next integration cycle (i.e., within a few days when
> the next "What's cooking" is written).
>
> Since 2.14 will be cut from the tip of master in a few weeks, once it's
> merged it will definitely be in 2.14 (barring a revert, of course).
>
> This holds true during release freeze, too. Anything that makes it to
> master is part of the release. The stopping point there is that things
> stop getting marked as "will merge to master".

Thanks for the explanation! I looked at the Git release calendar [1] and
realized that 2.14-rc0 is scheduled for this Thursday. My assumption was
that either on this date 2.14 would be cut from the tip of master, or
master would not change significantly after the rc0 date until the 2.14
release.

- Lars

[1] http://tinyurl.com/gitCal
Re: What's cooking in git.git (Jul 2017, #03; Mon, 10)
On Tue, Jul 11, 2017 at 10:28:08AM +0200, Lars Schneider wrote:

> > On 11 Jul 2017, at 00:48, Junio C Hamano wrote:
> >
> > * ls/filter-process-delayed (2017-06-30) 7 commits
> >   (merged to 'next' on 2017-07-05 at a35e644082)
> >   + convert: add "status=delayed" to filter process protocol
> >   + convert: refactor capabilities negotiation
> >   + convert: move multiple file filter error handling to separate function
> >   + convert: put the flags field before the flag itself for consistent style
> >   + t0021: write "OUT <size>" only on success
> >   + t0021: make debug log file name configurable
> >   + t0021: keep filter log files on comparison
> >
> > The filter-process interface learned to allow a process with long
> > latency give a "delayed" response.
> >
> > Will merge to 'master'.
>
> Hi Junio,
>
> can you already tell if this topic has still a chance to be part of 2.14?

I'm not Junio, but we should be able to reason it out. :)

It's marked as "will merge to master", which typically means it will
happen during the next integration cycle (i.e., within a few days when
the next "What's cooking" is written).

Since 2.14 will be cut from the tip of master in a few weeks, once it's
merged it will definitely be in 2.14 (barring a revert, of course).

This holds true during release freeze, too. Anything that makes it to
master is part of the release. The stopping point there is that things
stop getting marked as "will merge to master".

-Peff
[PATCH] gc: run pre-detach operations under lock
On Tue, Jul 11, 2017 at 03:25:36AM -0400, Jeff King wrote:

> Annoyingly, the lock code interacts badly with daemonizing because the
> latter will fork to a new process. So a simple solution like:
>
> diff --git a/builtin/gc.c b/builtin/gc.c
> index 2ba50a287..79480124a 100644
> --- a/builtin/gc.c
> +++ b/builtin/gc.c
> @@ -414,6 +414,9 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
>                  if (report_last_gc_error())
>                          return -1;
>
> +                if (lock_repo_for_gc(force, &pid))
> +                        return 0;
> +
>                  if (gc_before_repack())
>                          return -1;
>                  /*
>
> means that anybody looking at the lockfile will report the wrong pid
> (and thus think the lock is invalid). I guess we'd need to update it in
> place after daemonizing.

Updating it in place is a bit tricky. I came up with this hack that
makes it work, but I'm not sure if the reasoning is too gross.
We'd have to go back and update it with the new pid after detaching. We'd also have to play some tricks with the tempfile subsystem to tweak the "owner" field, so that the parent process does not clean it up on exit, but the child process does. Instead, we can do something a bit simpler: take the lock only for the duration of the pre-detach work, then detach, then take it again for the post-detach work. Technically, this means that the post-detach lock could lose to another process doing pre-detach work. But in the long run this works out. That second process would then follow-up by doing post-detach work. Unless it was in turn blocked by a third process doing pre-detach work, and so on. This could in theory go on indefinitely, as the pre-detach work does not repack, and so need_to_gc() will continue to trigger. But in each round we are racing between the pre- and post-detach locks. Eventually, one of the post-detach locks will win the race and complete the full gc. So in the worst case, we may racily repeat the pre-detach work, but we would never do so simultaneously (it would happen via a sequence of serialized race-wins). 
Signed-off-by: Jeff King
---
 builtin/gc.c  |  4 ++++
 t/t6500-gc.sh | 21 +++++++++++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/builtin/gc.c b/builtin/gc.c
index bd91f136f..5a535040f 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -414,8 +414,12 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
                 if (report_last_gc_error())
                         return -1;

+                if (lock_repo_for_gc(force, &pid))
+                        return 0;
                 if (gc_before_repack())
                         return -1;
+                delete_tempfile(&pidfile);
+
                 /*
                  * failure to daemonize is ok, we'll continue
                  * in foreground
diff --git a/t/t6500-gc.sh b/t/t6500-gc.sh
index cc7acd101..41b0be575 100755
--- a/t/t6500-gc.sh
+++ b/t/t6500-gc.sh
@@ -95,6 +95,27 @@ test_expect_success 'background auto gc does not run if gc.log is present and re
         test_line_count = 1 packs
 '

+test_expect_success 'background auto gc respects lock for all operations' '
+        # make sure we run a background auto-gc
+        test_commit make-pack &&
+        git repack &&
+        test_config gc.autopacklimit 1 &&
+        test_config gc.autodetach true &&
+
+        # create a ref whose loose presence we can use to detect a pack-refs run
+        git update-ref refs/heads/should-be-loose HEAD &&
+        test_path_is_file .git/refs/heads/should-be-loose &&
+
+        # now fake a concurrent gc that holds the lock; we can use our
+        # shell pid so that it looks valid.
+        hostname=$(hostname || echo unknown) &&
+        printf
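[Editor's note: the take-release-retake dance described in the commit message can be modeled with a few stand-in callables. This is only an illustrative Python sketch of the control flow, not git's C code; every name here is made up.]

```python
def auto_gc(try_lock, unlock, daemonize, pre_detach_work, repack):
    """Model of the locking scheme: hold the pid-lock only around the
    pre-detach work, drop it before daemonizing (forking would leave a
    stale pid in the lock file), then retake it under the new pid."""
    if not try_lock():            # another gc holds the lock: quietly bail
        return
    pre_detach_work()             # pack-refs, reflog expire, ...
    unlock()                      # our pid is about to change; release first
    daemonize()                   # fork into the background
    if not try_lock():            # lost the race to another gc's pre-detach
        return
    repack()                      # the expensive post-detach work
    unlock()
```

As the commit message argues, the post-detach `try_lock` may lose a race, but some racing process eventually wins its post-detach lock and completes the full gc.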
Re: [PATCH] git-p4: parse marshal output "p4 -G" in p4 changes
On 3 July 2017 at 23:57, Miguel Torroja wrote:
> The option -G of p4 (python marshal output) gives more context about the
> data being output. That's useful when using the command "change -o", as
> we can distinguish between warning/error lines and the real change
> description.
>
> Some p4 triggers on the server side generate warnings when executed.
> Unfortunately those messages are mixed with the output of "p4 change -o".
> Those extra warning lines are reported as {'code':'info'} in python
> marshal output (-G). The real change output is reported as
> {'code':'stat'}.
>
> The function p4CmdList accepts a new argument: skip_info. When set to
> True it ignores any 'code':'info' entry (skip_info=True by default).
>
> A new test has been added to t9807-git-p4-submit.sh, adding a p4 trigger
> that outputs extra lines with "p4 change -o" and "p4 changes".

The latest version of mt/p4-parse-G-output (09521c7a0) seems to break
t9813-git-p4-preserve-users.sh. I don't quite know why, but I wonder if
it's the change to p4CmdList()?
Luke

> Signed-off-by: Miguel Torroja
> ---
>  git-p4.py                | 90 ++++++++++++++++++++++++++++++++-----------
>  t/t9807-git-p4-submit.sh | 30 ++++++++++++++++++++++++++++++
>  2 files changed, 91 insertions(+), 29 deletions(-)
>
> diff --git a/git-p4.py b/git-p4.py
> index 8d151da91..a262e3253 100755
> --- a/git-p4.py
> +++ b/git-p4.py
> @@ -509,7 +509,7 @@ def isModeExec(mode):
>  def isModeExecChanged(src_mode, dst_mode):
>      return isModeExec(src_mode) != isModeExec(dst_mode)
>
> -def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None):
> +def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=True):
>
>      if isinstance(cmd,basestring):
>          cmd = "-G " + cmd
> @@ -545,6 +545,9 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None):
>      try:
>          while True:
>              entry = marshal.load(p4.stdout)
> +            if skip_info:
> +                if 'code' in entry and entry['code'] == 'info':
> +                    continue
>              if cb is not None:
>                  cb(entry)
>              else:
> @@ -879,8 +882,12 @@ def p4ChangesForPaths(depotPaths, changeRange, requestedBlockSize):
>              cmd += ["%s...@%s" % (p, revisionRange)]
>
>          # Insert changes in chronological order
> -        for line in reversed(p4_read_pipe_lines(cmd)):
> -            changes.add(int(line.split(" ")[1]))
> +        for entry in reversed(p4CmdList(cmd)):
> +            if entry.has_key('p4ExitCode'):
> +                die('Error retrieving changes descriptions ({})'.format(entry['p4ExitCode']))
> +            if not entry.has_key('change'):
> +                continue
> +            changes.add(int(entry['change']))
>
>          if not block_size:
>              break
> @@ -1526,37 +1533,62 @@ class P4Submit(Command, P4UserMap):
>
>          [upstream, settings] = findUpstreamBranchPoint()
>
> -        template = ""
> +        template = """\
> +# A Perforce Change Specification.
> +#
> +#  Change:      The change number. 'new' on a new changelist.
> +#  Date:        The date this specification was last modified.
> +#  Client:      The client on which the changelist was created. Read-only.
> +#  User:        The user who created the changelist.
> +#  Status:      Either 'pending' or 'submitted'. Read-only.
> +#  Type:        Either 'public' or 'restricted'. Default is 'public'.
> +#  Description: Comments about the changelist. Required.
> +#  Jobs:        What opened jobs are to be closed by this changelist.
> +#               You may delete jobs from this list. (New changelists only.)
> +#  Files:       What opened files from the default changelist are to be added
> +#               to this changelist. You may delete files from this list.
> +#               (New changelists only.)
> +"""
> +        files_list = []
>          inFilesSection = False
> +        change_entry = None
>          args = ['change', '-o']
>          if changelist:
>              args.append(str(changelist))
> -
> -        for line in p4_read_pipe_lines(args):
> -            if line.endswith("\r\n"):
> -                line = line[:-2] + "\n"
> -            if inFilesSection:
> -                if line.startswith("\t"):
> -                    # path starts and ends with a tab
> -                    path = line[1:]
> -                    lastTab = path.rfind("\t")
> -                    if lastTab != -1:
> -                        path = path[:lastTab]
> -                        if settings.has_key('depot-paths'):
> -                            if not [p for p in settings['depot-paths']
> -                                    if p4PathStartsWith(path, p)]:
> -                                continue
> -                        else:
> -                            if not p4PathStartsWith(path, self.depotPath):
> -                                continue
> +        for entry in
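[Editor's note: the skip_info behaviour described in the patch can be demonstrated standalone with Python's marshal module, which is the format "p4 -G" emits. This is only a sketch mimicking the patched p4CmdList loop, not git-p4 itself; the function name is made up.]

```python
import io
import marshal

def read_p4_entries(stream, skip_info=True):
    """Read consecutive marshalled dicts the way 'p4 -G' produces them,
    dropping {'code': 'info'} records (e.g. trigger warnings) when asked."""
    entries = []
    while True:
        try:
            entry = marshal.load(stream)
        except EOFError:
            break
        if skip_info and entry.get('code') == 'info':
            continue
        entries.append(entry)
    return entries

# fake "p4 -G change -o" output: a trigger warning followed by the real spec
fake = marshal.dumps({'code': 'info', 'data': 'some trigger warning'}) + \
       marshal.dumps({'code': 'stat', 'Change': 'new'})
print(read_p4_entries(io.BytesIO(fake)))  # only the 'stat' entry survives
```

(Real p4 output may use byte-string keys depending on the Python version; the filtering idea is the same.)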
Re: What's cooking in git.git (Jul 2017, #03; Mon, 10)
> On 11 Jul 2017, at 00:48, Junio C Hamanowrote: > > * ls/filter-process-delayed (2017-06-30) 7 commits > (merged to 'next' on 2017-07-05 at a35e644082) > + convert: add "status=delayed" to filter process protocol > + convert: refactor capabilities negotiation > + convert: move multiple file filter error handling to separate function > + convert: put the flags field before the flag itself for consistent style > + t0021: write "OUT " only on success > + t0021: make debug log file name configurable > + t0021: keep filter log files on comparison > > The filter-process interface learned to allow a process with long > latency give a "delayed" response. > > Will merge to 'master'. Hi Junio, can you already tell if this topic has still a chance to be part of 2.14? Thanks, Lars
Re: Weirdness with git change detection
On Tue, Jul 11, 2017 at 10:20:43AM +0200, Torsten Bögershausen wrote: > > No problem. I actually think it would be interesting if Git could > > somehow detect and warn about this situation. But the obvious way to do > > that would be to re-run the clean filter directly after checkout. And > > doing that all the time is expensive. > > Would it be possible to limit the round-trip-check to "git reset --hard" ? > If yes, possibly many users are willing to pay the price, if Git tells > the "your filters don't round-trip". (Including CRLF conversions) Anything's possible, I suppose. But I don't think I'd want that feature turned on myself. > > Perhaps some kind of "lint" program would be interesting to warn of > > possible misconfigurations. Of course people would have to run it for it > > to be useful. :) > > What do you have in mind here ? > Don't we need to run some content through the filter(s)? I was thinking of a tool that could run a series of checks on the repository and nag about potential problems. One of them could be doing a round-trip repo->clean->smudge for each file. Another one might be warning about files that differ only in case. The idea being that users could run "git lint" if they suspect something funny is going on. I dunno. It may be a dead-end. Most such oddities are better detected and handled during actual git operations if we can. So this would really just be for things that are too expensive to detect in normal operations. -Peff
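[Editor's note: the round-trip probe discussed above could look roughly like this. A hypothetical sketch of one "git lint" check, not an existing git feature; `clean_cmd` stands in for a `filter.<driver>.clean` command line.]

```python
import subprocess

def clean_filter_roundtrips(clean_cmd, worktree_bytes, blob_bytes):
    """Run the configured clean filter over the worktree contents and
    compare the result with what the repository actually stores."""
    cleaned = subprocess.run(clean_cmd, input=worktree_bytes,
                             capture_output=True, check=True).stdout
    return cleaned == blob_bytes

# a no-op filter (here plain 'cat') round-trips trivially
print(clean_filter_roundtrips(["cat"], b"hello\n", b"hello\n"))
```

A lint tool would run this per path with a clean filter configured, warning whenever the cleaned worktree contents diverge from the stored blob.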
Re: Weirdness with git change detection
On 11/07/17 09:34, Jeff King wrote:
> On Tue, Jul 11, 2017 at 12:31:25AM -0700, Peter Eckersley wrote:
>> I did try to test that hypothesis by editing the filter to be a no-op,
>> but it's possible I got that wrong. My apologies for bugging the list!

Actually I like this kind of feedback, to see how people are using Git,
including the problems they have.

> No problem. I actually think it would be interesting if Git could
> somehow detect and warn about this situation. But the obvious way to do
> that would be to re-run the clean filter directly after checkout. And
> doing that all the time is expensive.

Would it be possible to limit the round-trip-check to "git reset --hard"?
If yes, possibly many users are willing to pay the price, if Git tells
them "your filters don't round-trip". (Including CRLF conversions)

> Perhaps some kind of "lint" program would be interesting to warn of
> possible misconfigurations. Of course people would have to run it for
> it to be useful. :)

What do you have in mind here?
Don't we need to run some content through the filter(s)?
Re: Flurries of 'git reflog expire'
On Tue, Jul 11, 2017 at 12:35:50AM -0700, Bryan Turner wrote: > That's a few of the reasons we've switched over. I'd imagine most > hosting providers take a similarly "hands on" approach to controlling > their GC. Beyond a certain scale, it seems almost unavoidable. Git > never has more than a repository-level view of the world; only the > hosting provider can see the big picture. Thanks for writing this out. I agree with all of the reasons given (in my email which I suspect crossed with yours, I just said "throttling", but there really are a lot of other reasons). -Peff
Re: [PATCH] use DIV_ROUND_UP
On Mon, Jul 10, 2017 at 07:08:38PM +0200, René Scharfe wrote: > > So I think it's a true mechanical conversion, but I have to admit the > > original is confusing. Without digging I suspect it's correct, though, > > just because a simple bug here would mean that our ewah bitmaps totally > > don't work. So it's probably not worth spending time on. > > A few lines below there is "self->bit_size = i + 1", so the code > calculates how many words the old and the new value are apart (or by how > many words the bitmap needs to be extended), which becomes easier to see > with the patch. Yeah, I'd agree the consistent use of "i + 1" makes the end result after your patch easier to read. -Peff
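[Editor's note: for reference, DIV_ROUND_UP is plain integer ceiling division; a quick model of the C macro's behavior. The macro itself lives in git's headers -- this Python equivalent is only illustrative.]

```python
def div_round_up(n, d):
    """Equivalent of C's DIV_ROUND_UP(n, d), i.e. ((n) + (d) - 1) / (d)
    with truncating integer division; assumes n >= 0 and d > 0."""
    return (n + d - 1) // d

# e.g. how many 64-bit words are needed to hold a given number of bits
print(div_round_up(65, 64))  # -> 2
```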
Re: Bug: Git checkout case Insensitive branch name compare
On Mon, Jul 10, 2017 at 01:15:40PM -0700, kinchit raja wrote: > Bugs Details: > > Git checkout case Insensitive branch name compare Right, this is known. Branch names (and all ref names) are case sensitive in Git. Storage on a case-insensitive filesystem may lead to confusion if you mix the cases. Sometimes it will work, and sometimes it will create quite confusing results. One suggested path forward is to encode characters in the filesystem. See: http://public-inbox.org/git/20170705083611.jgxbp4sqogicf...@sigill.intra.peff.net/ and further down-thread: http://public-inbox.org/git/xmqqo9sxdwjp@gitster.mtv.corp.google.com/ for some recent discussion. -Peff
Re: Flurries of 'git reflog expire'
On Mon, Jul 10, 2017 at 9:45 PM, Andreas Krey wrote:
> On Thu, 06 Jul 2017 10:01:05 +, Bryan Turner wrote:
>
>> I also want to add that Bitbucket Server 5.x includes totally
>> rewritten GC handling. 5.0.x automatically disables auto GC in all
>> repositories and manages it explicitly, and 5.1.x fully removes use of
>> "git gc" in favor of running relevant plumbing commands directly.
>
> That's the part that irks me. This shouldn't be necessary - git itself
> should make sure auto GC isn't run in parallel. Now I probably can't
> evaluate whether a git upgrade would fix this, but given that you
> are going the do-gc-ourselves route I suppose it wouldn't.

I believe I've seen some commits on the mailing list that suggest "git
gc --auto" manages its concurrency better in newer versions than it
used to, but even then it can only manage its concurrency within a
single repository. For a hosting server with thousands, or tens of
thousands, of active repositories, there still wouldn't be any
protection against "git gc --auto" running concurrently in dozens of
them at the same time.

But it's not only about concurrency. "git gc" (and by extension "git gc
--auto") is a general-purpose tool, designed to generally do what you
need, and to mostly stay out of your way while it does it. I'd hazard
to say it's not really designed for managing heavily-trafficked
repositories on busy hosting services, though, and as a result there
are things it can't do. For example, I can configure auto GC to run
based on how many loose objects or packs I have, but there's no
heuristic to make it repack refs when I have a lot of loose ones, or to
configure it to _only_ pack refs without repacking objects or pruning
reflogs. There are knobs for various things (like "gc.*.reflogExpire"),
but those don't give complete control. Even if I set
"gc.reflogExpire=never", "git gc" still forks "git reflog expire --all"
(compared to "gc.packRefs=false", which completely prevents forking
"git pack-refs").
A trace on "git gc" shows this:

$ GIT_TRACE=1 git gc
00:10:45.058066 git.c:437        trace: built-in: git 'gc'
00:10:45.067075 run-command.c:369 trace: run_command: 'pack-refs' '--all' '--prune'
00:10:45.077086 git.c:437        trace: built-in: git 'pack-refs' '--all' '--prune'
00:10:45.084098 run-command.c:369 trace: run_command: 'reflog' 'expire' '--all'
00:10:45.093102 git.c:437        trace: built-in: git 'reflog' 'expire' '--all'
00:10:45.097088 run-command.c:369 trace: run_command: 'repack' '-d' '-l' '-A' '--unpack-unreachable=2.weeks.ago'
00:10:45.106096 git.c:437        trace: built-in: git 'repack' '-d' '-l' '-A' '--unpack-unreachable=2.weeks.ago'
00:10:45.107098 run-command.c:369 trace: run_command: 'pack-objects' '--keep-true-parents' '--honor-pack-keep' '--non-empty' '--all' '--reflog' '--indexed-objects' '--unpack-unreachable=2.weeks.ago' '--local' '--delta-base-offset' 'objects/pack/.tmp-15212-pack'
00:10:45.127117 git.c:437        trace: built-in: git 'pack-objects' '--keep-true-parents' '--honor-pack-keep' '--non-empty' '--all' '--reflog' '--indexed-objects' '--unpack-unreachable=2.weeks.ago' '--local' '--delta-base-offset' 'objects/pack/.tmp-15212-pack'
Counting objects: 6, done.
Delta compression using up to 16 threads.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (6/6), done.
Total 6 (delta 0), reused 6 (delta 0)
00:10:45.173161 run-command.c:369 trace: run_command: 'prune' '--expire' '2.weeks.ago'
00:10:45.184171 git.c:437        trace: built-in: git 'prune' '--expire' '2.weeks.ago'
00:10:45.199202 run-command.c:369 trace: run_command: 'worktree' 'prune' '--expire' '3.months.ago'
00:10:45.208193 git.c:437        trace: built-in: git 'worktree' 'prune' '--expire' '3.months.ago'
00:10:45.212198 run-command.c:369 trace: run_command: 'rerere' 'gc'
00:10:45.221223 git.c:437        trace: built-in: git 'rerere' 'gc'

The bare repositories used by Bitbucket Server:

* Don't have reflogs enabled generally, and for the ones that are
  enabled "gc.*.reflogExpire" is set to "never"
* Never have worktrees, so they don't need to be pruned
* Never use rerere, so that doesn't need to GC
* Have pruning disabled if they've been forked, due to using
  alternates to manage disk space

That means of all the commands "git gc" runs under the covers, at
most only "pack-refs", "repack" and sometimes "prune" have any value.
"reflog expire --all" in particular is extremely likely to fail.

Which brings up another consideration. "git gc --auto" has no sense
of context, or adjacent behavior. Even if it correctly guards against
concurrency, it still doesn't know what else is going on. Immediately
after a push, Bitbucket Server has many other housekeeping tasks it
performs, especially around pull requests. That means pull request
refs are disproportionately likely to be "moving" immediately after a
push completes--exactly
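The "run the relevant plumbing commands directly" approach Bryan describes can be sketched roughly like this. This is my own assumption of what such a command set might look like, based on the trace above, not Bitbucket's actual code:

```python
# Sketch (my assumption, not Bitbucket's implementation): run only the
# "git gc" sub-commands that have value for a bare hosting repository,
# and never fork "git reflog expire --all" at all.
import subprocess
import tempfile

def maintain(repo):
    """Run the useful subset of 'git gc' steps against a bare repo."""
    for args in (
        ["pack-refs", "--all", "--prune"],
        ["repack", "-d", "-l", "-A", "--unpack-unreachable=2.weeks.ago"],
        ["prune", "--expire", "2.weeks.ago"],
    ):
        subprocess.run(["git", "-C", repo] + args, check=True)

# Demonstrate against a throwaway bare repository.
repo = tempfile.mkdtemp()
subprocess.run(["git", "init", "--bare", "--quiet", repo], check=True)
maintain(repo)
```

Note that because the commands are invoked individually, skipping "reflog expire" is a matter of simply not running it, rather than hunting for a config knob that fully disables it.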
Re: Weirdness with git change detection
On Tue, Jul 11, 2017 at 12:31:25AM -0700, Peter Eckersley wrote:

> I did try to test that hypothesis by editing the filter to be a no-op,
> but it's possible I got that wrong. My apologies for bugging the list!

No problem. I actually think it would be interesting if Git could
somehow detect and warn about this situation. But the obvious way to
do that would be to re-run the clean filter directly after checkout,
and doing that all the time is expensive.

Perhaps some kind of "lint" program would be interesting to warn of
possible misconfigurations. Of course people would have to run it for
it to be useful. :)

-Peff
Re: Flurries of 'git reflog expire'
On Thu, Jul 06, 2017 at 10:01:05AM -0700, Bryan Turner wrote:

> I also want to add that Bitbucket Server 5.x includes totally
> rewritten GC handling. 5.0.x automatically disables auto GC in all
> repositories and manages it explicitly, and 5.1.x fully removes use of
> "git gc" in favor of running relevant plumbing commands directly. We
> moved away from "git gc" specifically to avoid the "git reflog expire
> --all", because there's no config setting that _fully disables_
> forking that process.

FWIW, I think auto-gc in general is not a good way to handle
maintenance on a busy hosting server. Repacking can be very resource
hungry (both CPU and memory), and it needs to be throttled. You
_could_ throttle with an auto-gc hook, but that isn't very elegant
when it comes to re-queueing jobs which fail or time out.

The right model IMHO (and what GitHub uses, and what I'm guessing
Bitbucket is doing in more recent versions) is to make note of write
operations in a data structure, then use that data to schedule
maintenance in a job queue. But that can never really be part of Git
itself, as the notion of a system job queue is outside its scope.

-Peff
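The "note write operations, then schedule maintenance from a job queue" model Peff describes could be sketched like this. All names and the threshold heuristic here are mine, purely for illustration; they are not GitHub's or Bitbucket's actual code:

```python
# Sketch (my own toy model) of out-of-band maintenance scheduling:
# record pushes per repository, enqueue a repository for maintenance
# once it has seen enough writes, and let throttled workers drain the
# queue instead of relying on in-process auto-gc.
from collections import Counter

class MaintenanceScheduler:
    def __init__(self, threshold=2):
        self.writes = Counter()   # pushes seen per repository
        self.queue = []           # repositories due for maintenance
        self.threshold = threshold

    def note_write(self, repo):
        self.writes[repo] += 1
        # Enqueue at most once; a real system would also throttle and
        # re-queue jobs that fail or time out.
        if self.writes[repo] >= self.threshold and repo not in self.queue:
            self.queue.append(repo)

    def next_job(self):
        return self.queue.pop(0) if self.queue else None

scheduler = MaintenanceScheduler()
scheduler.note_write("a.git")
scheduler.note_write("a.git")   # second push crosses the threshold
scheduler.note_write("b.git")   # only one push, not yet due
```

The point of the model is the separation: Git only reports that a write happened, and everything about when and how aggressively to repack lives in the hosting system's queue.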
Re: Weirdness with git change detection
I did try to test that hypothesis by editing the filter to be a no-op,
but it's possible I got that wrong. My apologies for bugging the list!

On 11 July 2017 at 00:06, Jeff King wrote:
> On Tue, Jul 11, 2017 at 06:15:17AM +0200, Torsten Bögershausen wrote:
>
>> On 11/07/17 01:45, Peter Eckersley wrote:
>> > I have a local git repo that's in some weird state where changes
>> > appear to be detected by "git diff" and prevent operations like "git
>> > checkout" from switching branches, but those changes are not removed
>> > by a "git reset --hard" or "git stash".
>> >
>> > Here's an example of the behaviour, with "git reset --hard" failing to
>> > clear a diff in the index:
>> >
>> > https://paste.debian.net/975811/
>> >
>> > Happy to collect additional debugging information if it's useful.
>>
>> If possible, we need to clone the repo and debug ourselves - in other
>> words, the problem must be reproducible for others.
>>
>> Is the repo public?
>
> It looks like https://github.com/AI-metrics/AI-metrics.
>
> I notice it has a .gitattributes file with:
>
>     *.ipynb filter=clean_ipynb
>
> There's a config snippet in the repo with:
>
>     [filter "clean_ipynb"]
>         clean = ipynb_drop_output
>         smudge = cat
>
> and the drop_output script is included. From the paste we can see that
> Peter was at commit c464aaa. Checking out that commit and running the
> script shows that it produces the differences that Git is showing.
>
> The problem is that the currently committed contents do not match the
> output of the clean filter. So even when "git reset --hard" puts the
> content from the repository back into the working tree (putting it
> through the smudge filter, which is a no-op), running the clean filter
> on the result will always have a difference. Either the filter needs
> to be disabled, or the cleaned contents committed.
>
> -Peff

--
Peter
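Peff's explanation can be reduced to a toy model. This is entirely my sketch, with a stand-in filter, not git's code or the repo's real ipynb_drop_output script:

```python
# Toy model of why "git reset --hard" can never clear this diff: the
# committed contents were never run through the clean filter, so
# re-cleaning whatever gets checked out always produces a difference.
def clean(data):
    # Stand-in for the repo's clean filter (hypothetical behavior).
    return data.replace("OUTPUT", "")

def smudge(data):
    # The repo configures "smudge = cat", i.e. a no-op.
    return data

committed = "cell OUTPUT"       # contents committed without cleaning
worktree = smudge(committed)    # what "git reset --hard" checks out
dirty = clean(worktree) != committed  # what "git diff" effectively compares

# Peff's second fix: commit the cleaned contents. Once the blob is a
# fixed point of clean(), the diff disappears for good.
fixed = clean(committed)
```

Running the model shows `dirty` is True no matter how many times the reset is repeated, while `fixed` round-trips cleanly.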
[BUG] detached auto-gc does not respect lock for 'reflog expire', was Re: Flurries of 'git reflog expire'
[Updating the subject since I think this really is a bug].

On Tue, Jul 11, 2017 at 06:45:53AM +0200, Andreas Krey wrote:

> > I also want to add that Bitbucket Server 5.x includes totally
> > rewritten GC handling. 5.0.x automatically disables auto GC in all
> > repositories and manages it explicitly, and 5.1.x fully removes use of
> > "git gc" in favor of running relevant plumbing commands directly.
>
> That's the part that irks me. This shouldn't be necessary - git itself
> should make sure auto GC isn't run in parallel. Now I probably can't
> evaluate whether a git upgrade would fix this, but given that you
> are going the do-gc-ourselves route I suppose it wouldn't.

It's _supposed_ to take a lock, even in older versions. See 64a99eb47
(gc: reject if another gc is running, unless --force is given,
2013-08-08).

But it looks like before we take that lock, we sometimes run pack-refs
and reflog expire. This is due to 62aad1849 (gc --auto: do not lock
refs in the background, 2014-05-25). IMHO this is buggy; it should be
checking the lock before calling gc_before_repack() and daemonizing.

Annoyingly, the lock code interacts badly with daemonizing, because
the latter will fork to a new process. So the simple solution like:

diff --git a/builtin/gc.c b/builtin/gc.c
index 2ba50a287..79480124a 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -414,6 +414,9 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 			if (report_last_gc_error())
 				return -1;
 
+			if (lock_repo_for_gc(force, &pid))
+				return 0;
+
 			if (gc_before_repack())
 				return -1;

means that anybody looking at the lockfile will report the wrong pid
(and thus think the lock is invalid). I guess we'd need to update it
in place after daemonizing.

-Peff
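The bad interaction between the lock and daemonize() can be illustrated with a minimal model. The pids and the hostname here are hypothetical, and this is my sketch of the mechanism, not git's actual lockfile code:

```python
# Model of the problem: gc.pid records the pid of the process that took
# the lock, but daemonize() forks, so the background gc runs under a
# different pid than the one recorded in the lockfile. Anyone validating
# the lock against running processes will conclude it is stale.
import os
import tempfile

def write_lock(path, pid):
    # git's gc.pid lockfile stores "<pid> <hostname>".
    with open(path, "w") as f:
        f.write("%d localhost" % pid)

def locker_pid(path):
    with open(path) as f:
        return int(f.read().split()[0])

lock = os.path.join(tempfile.mkdtemp(), "gc.pid")
parent_pid = 1234   # hypothetical pid that takes the lock pre-daemonize
child_pid = 5678    # hypothetical pid of the forked background process
write_lock(lock, parent_pid)
# After the fork, the lockfile still names parent_pid, not child_pid,
# which is why Peff suggests rewriting it in place after daemonizing.
```

The fix he sketches is exactly that last comment: keep the early lock, but update the recorded pid once the process id is final.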