Re: [RFC PATCHv4] repack: rewrite the shell script in C.
Am 21.08.2013 00:36, schrieb Stefan Beller: I think I got all the suggestions except the use of git_path/mkpathdup. I replaced mkpathdup by mkpath where possible, but it's still not perfect. I'll wait for the dokumentation patch of Jonathan, before changing all these occurences forth and back again. I trust Jonathan's judgement of how to use git_path, mkpath, and mkpathdup more than my own. So, please take my earlier comments in this regard with an appropriately large grain of salt. Below there is just the diff against RFC PATCHv4, however I'll send the whole patch as well. Thanks, that is VERY helpful! I'll comment here and have a look at the full patch later. ... int cmd_repack(int argc, const char **argv, const char *prefix) { You should move the opening brace to the next line, which would then not be empty anymore. ... @@ -217,34 +217,34 @@ int cmd_repack(int argc, const char **argv, const char *prefix) { argv_array_push(cmd_args, packtmp); memset(cmd, 0, sizeof(cmd)); - cmd.argv = argv_array_detach(cmd_args, NULL); + cmd.argv = cmd_args.argv; cmd.git_cmd = 1; cmd.out = -1; cmd.no_stdin = 1; - if (run_command(cmd)) + if (start_command(cmd)) return 1; You should have an int ret here and use it like ret = start_command(cmd); if (ret) return ret; to retain any exit codes from the sub-process. I know, the script didn't preserve it: names=$(git pack-objects ...) || exit 1 but that was not idiomatic as it should have been written as names=$(git pack-objects ...) || exit to forward the failure exit code. - struct string_list names = STRING_LIST_INIT_DUP; - struct string_list rollback = STRING_LIST_INIT_DUP; - - char line[1024]; - int counter = 0; - FILE *out = xfdopen(cmd.out, r); Nice! I missed these decl-after-stmt in my earlier review. + count_packs = 0; + out = xfdopen(cmd.out, r); while (fgets(line, sizeof(line), out)) { /* a line consists of 40 hex chars + '\n' */ - assert(strlen(line) == 41); + if (strlen(line) != 41) + die(repack: Expecting 40 character sha1 lines only from pack-objects.); I agree with Jonathan that you should use strbuf_getline() here. line[40] = '\0'; string_list_append(names, line); - counter++; + count_packs++; } - if (!counter) - printf(Nothing new to pack.\n); + if (finish_command(cmd)) + return 1; Same as above here: ret = finish_command(cmd); if (ret) return ret; I would prefer to see argv_array_clear(cmd_args); here, i.e., at the end of the current use rather than later at the beginning of the next use. (Ditto for the other uses of cmd_args.) fclose(out); This should happen before finish_command(). It doesn't matter if there are no errors, but if things go awry, closing the channel before finish_command() avoids deadlocks. + if (!count_packs !quiet) + printf(Nothing new to pack.\n); + ... @@ -301,33 +299,33 @@ int cmd_repack(int argc, const char **argv, const char *prefix) { for_each_string_list_item(item, names) { for (ext = 0; ext 2; ext++) { char *fname, *fname_old; + struct stat statbuffer; fname = mkpathdup(%s/pack-%s%s, packdir, item-string, exts[ext]); - fname_old = mkpathdup(%s-%s%s, packtmp, item-string, exts[ext]); - stat(fname_old, statbuffer); - statbuffer.st_mode = ~S_IWUSR | ~S_IWGRP | ~S_IWOTH; - chmod(fname_old, statbuffer.st_mode); + fname_old = mkpath(%s-%s%s, packtmp, item-string, exts[ext]); + if (!stat(fname_old, statbuffer)) { + statbuffer.st_mode = ~S_IWUSR | ~S_IWGRP | ~S_IWOTH; This is still wrong: it should be one of ... = ~S_IWUSR ~S_IWGRP ~S_IWOTH; ... = ~(S_IWUSR | S_IWGRP | S_IWOTH); + chmod(fname_old, statbuffer.st_mode); + } if (rename(fname_old, fname)) - die(Could not rename packfile: %s - %s, fname_old, fname); + die_errno(_(renaming '%s' failed), fname_old); free(fname); - free(fname_old); } } ... Everything else looks OK. But as I said, mkpath() may have to be reverted to mkpathdup() as per Jonathans comments. -- Hannes -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More
Re: [RFC PATCHv4] repack: rewrite the shell script in C.
I didn't look at functions above cmd_repack. Am 20.08.2013 01:23, schrieb Stefan Beller: +int cmd_repack(int argc, const char **argv, const char *prefix) { + + int pack_everything = 0; + int pack_everything_but_loose = 0; + int delete_redundant = 0; + char *unpack_unreachable = NULL; + int window = 0, window_memory = 0; + int depth = 0; + int max_pack_size = 0; + int no_reuse_delta = 0, no_reuse_object = 0; + int no_update_server_info = 0; + int quiet = 0; + int local = 0; + char *packdir, *packtmp; + struct child_process cmd; + struct string_list_item *item; + struct string_list existing_packs = STRING_LIST_INIT_DUP; + struct stat statbuffer; + int ext; + char *exts[2] = {.idx, .pack}; + + struct option builtin_repack_options[] = { Are the long forms of options your invention? + OPT_BOOL('a', all, pack_everything, + N_(pack everything in a single pack)), + OPT_BOOL('A', all-but-loose, pack_everything_but_loose, + N_(same as -a, and turn unreachable objects loose)), --all-but-loose does not express what the help text says. The long form of -A is --all --unpack-unreachable, so it is really just a short option for convenience. It does not need its own long form. + OPT_BOOL('d', delete-redundant, delete_redundant, + N_(remove redundant packs, and run git-prune-packed)), + OPT_BOOL('f', no-reuse-delta, no_reuse_delta, + N_(pass --no-reuse-delta to git-pack-objects)), + OPT_BOOL('F', no-reuse-object, no_reuse_object, + N_(pass --no-reuse-object to git-pack-objects)), Do we want to allow --no-no-reuse-delta and --no-no-reuse-object? + OPT_BOOL('n', NULL, no_update_server_info, + N_(do not run git-update-server-info)), No long option name? + OPT__QUIET(quiet, N_(be quiet)), + OPT_BOOL('l', local, local, + N_(pass --local to git-pack-objects)), Good. + OPT_STRING(0, unpack-unreachable, unpack_unreachable, N_(approxidate), + N_(with -A, do not loosen objects older than this Packing constraints)), Packing constraints is a section heading, not a continuation of the previous help text. + OPT_INTEGER(0, window, window, + N_(size of the window used for delta compression)), This help text is suboptimal as the option is a count, not a size in the narrow sense. But that can be changed later (as it would affect other tools as well, I guess). + OPT_INTEGER(0, window-memory, window_memory, + N_(same as the above, but limit memory size instead of entries count)), + OPT_INTEGER(0, depth, depth, + N_(limits the maximum delta depth)), + OPT_INTEGER(0, max-pack-size, max_pack_size, + N_(maximum size of each packfile)), + OPT_END() + }; Good. + + git_config(repack_config, NULL); + + argc = parse_options(argc, argv, prefix, builtin_repack_options, + git_repack_usage, 0); + + sigchain_push_common(remove_pack_on_signal); Good. + packdir = mkpathdup(%s/pack, get_object_directory()); + packtmp = mkpathdup(%s/.tmp-%d-pack, packdir, getpid()); Should this not be packdir = xstrdup(git_path(pack)); packtmp = xstrdup(git_path(pack/.tmp-%d-pack, getpid())); Perhaps make packdir and packtmp global so that the strings need not be duplicated in get_pack_filenames and remove_temporary_files? + + remove_temporary_files(); Yes, the shell script had this. But is it really necessary? + + struct argv_array cmd_args = ARGV_ARRAY_INIT; Declaration after statement. + argv_array_push(cmd_args, pack-objects); + argv_array_push(cmd_args, --keep-true-parents); + argv_array_push(cmd_args, --honor-pack-keep); + argv_array_push(cmd_args, --non-empty); + argv_array_push(cmd_args, --all); + argv_array_push(cmd_args, --reflog); + + if (window) + argv_array_pushf(cmd_args, --window=%u, window); + + if (window_memory) + argv_array_pushf(cmd_args, --window-memory=%u, window_memory); + + if (depth) + argv_array_pushf(cmd_args, --depth=%u, depth); + + if (max_pack_size) + argv_array_pushf(cmd_args, --max_pack_size=%u, max_pack_size); + + if (no_reuse_delta) + argv_array_pushf(cmd_args, --no-reuse-delta); + + if (no_reuse_object) + argv_array_pushf(cmd_args, --no-reuse-object); no_reuse_delta
Re: [RFC PATCHv4] repack: rewrite the shell script in C.
On 08/20/2013 03:31 PM, Johannes Sixt wrote: Are the long forms of options your invention? I tried to keep strong similarity with the shell script for ease of review. In the shellscript the options where put in variables having these names, so for example there was: -f) no_reuse=--no-reuse-delta ;; -F) no_reuse=--no-reuse-object ;; So I used these variable names as well in here. And as I assumed the variables are meaningful in itself. In the shell script they may be meaningful, but with the option parser in the C version, I overlooked the possibility for --no-option being possible as you noted below. Maybe we should inverse the logic and have the variables and options called reuse-delta and being enabled by default. +OPT_BOOL('a', all, pack_everything, +N_(pack everything in a single pack)), +OPT_BOOL('A', all-but-loose, pack_everything_but_loose, +N_(same as -a, and turn unreachable objects loose)), --all-but-loose does not express what the help text says. The long form of -A is --all --unpack-unreachable, so it is really just a short option for convenience. It does not need its own long form. Ok, I'll keep that in mind, and will only use the varialbe tied to -A to set the -a and --unpack-unreachable variable. +OPT_BOOL('d', delete-redundant, delete_redundant, +N_(remove redundant packs, and run git-prune-packed)), +OPT_BOOL('f', no-reuse-delta, no_reuse_delta, +N_(pass --no-reuse-delta to git-pack-objects)), +OPT_BOOL('F', no-reuse-object, no_reuse_object, +N_(pass --no-reuse-object to git-pack-objects)), Do we want to allow --no-no-reuse-delta and --no-no-reuse-object? see above, I'd try not to. +OPT_BOOL('n', NULL, no_update_server_info, +N_(do not run git-update-server-info)), No long option name? This is also a negated option, so as above, maybe we could have --update_server_info and --no-update_server_info respectively. Talking about the shortform then: Is it possible to negate the shortform? +OPT__QUIET(quiet, N_(be quiet)), +OPT_BOOL('l', local, local, +N_(pass --local to git-pack-objects)), Good. +OPT_STRING(0, unpack-unreachable, unpack_unreachable, N_(approxidate), +N_(with -A, do not loosen objects older than this Packing constraints)), Packing constraints is a section heading, not a continuation of the previous help text. +OPT_INTEGER(0, window, window, +N_(size of the window used for delta compression)), This help text is suboptimal as the option is a count, not a size in the narrow sense. But that can be changed later (as it would affect other tools as well, I guess). +OPT_INTEGER(0, window-memory, window_memory, +N_(same as the above, but limit memory size instead of entries count)), +OPT_INTEGER(0, depth, depth, +N_(limits the maximum delta depth)), +OPT_INTEGER(0, max-pack-size, max_pack_size, +N_(maximum size of each packfile)), +OPT_END() +}; Good. + +git_config(repack_config, NULL); + +argc = parse_options(argc, argv, prefix, builtin_repack_options, +git_repack_usage, 0); + +sigchain_push_common(remove_pack_on_signal); Good. +packdir = mkpathdup(%s/pack, get_object_directory()); +packtmp = mkpathdup(%s/.tmp-%d-pack, packdir, getpid()); Should this not be packdir = xstrdup(git_path(pack)); packtmp = xstrdup(git_path(pack/.tmp-%d-pack, getpid())); Perhaps make packdir and packtmp global so that the strings need not be duplicated in get_pack_filenames and remove_temporary_files? ok + +remove_temporary_files(); Yes, the shell script had this. But is it really necessary? Well I can drop it if it's not needed. It actually should implement rm -f $PACKTMP-* and then the trap 'rm -f $PACKTMP-*' 0 1 2 3 15 as well. + +struct argv_array cmd_args = ARGV_ARRAY_INIT; Declaration after statement. will fix. +argv_array_push(cmd_args, pack-objects); +argv_array_push(cmd_args, --keep-true-parents); +argv_array_push(cmd_args, --honor-pack-keep); +argv_array_push(cmd_args, --non-empty); +argv_array_push(cmd_args, --all); +argv_array_push(cmd_args, --reflog); + +if (window) +argv_array_pushf(cmd_args, --window=%u, window); + +if (window_memory) +argv_array_pushf(cmd_args, --window-memory=%u, window_memory); + +if (depth) +argv_array_pushf(cmd_args, --depth=%u, depth); + +if (max_pack_size) +argv_array_pushf(cmd_args, --max_pack_size=%u, max_pack_size); + +if (no_reuse_delta) +argv_array_pushf(cmd_args, --no-reuse-delta); + +if (no_reuse_object) +
Re: [RFC PATCHv4] repack: rewrite the shell script in C.
Am 20.08.2013 17:08, schrieb Stefan Beller: On 08/20/2013 03:31 PM, Johannes Sixt wrote: You cannot run_command() and then later read its output! You must split it into start_command(), read stdout, finish_command(). Thanks for this hint. Could that explain rare non-deterministic failures in the test suite? Yes, it's a possible explanation. -- Hannes -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCHv4] repack: rewrite the shell script in C.
Am 20.08.2013 17:08, schrieb Stefan Beller: On 08/20/2013 03:31 PM, Johannes Sixt wrote: Are the long forms of options your invention? I tried to keep strong similarity with the shell script for ease of review. In the shellscript the options where put in variables having these names, so for example there was: -f) no_reuse=--no-reuse-delta ;; -F) no_reuse=--no-reuse-object ;; So I used these variable names as well in here. And as I assumed the variables are meaningful in itself. In the shell script they may be meaningful, but with the option parser in the C version, I overlooked the possibility for --no-option being possible as you noted below. Maybe we should inverse the logic and have the variables and options called reuse-delta and being enabled by default. That's what git repack-objects does, which gets it passed to eventually. But I think Johannes also wanted to point out that the git-repack.sh doesn't recognize --no-reuse-delta, --all etc.. I think it's better to introduce new long options in a separate patch. Switching the programming language is big enough of a change already. :) +OPT_BOOL('f', no-reuse-delta, no_reuse_delta, +N_(pass --no-reuse-delta to git-pack-objects)), +OPT_BOOL('F', no-reuse-object, no_reuse_object, +N_(pass --no-reuse-object to git-pack-objects)), Do we want to allow --no-no-reuse-delta and --no-no-reuse-object? see above, I'd try not to. The declaration above allows --reuse-delta, --no-reuse-delta and --no-no-reuse-delta to be used. The latter looks funny, but I don't think we need to forbid it. That said, dropping the no- and thus declaring them the same way as repack-objects is a good idea. +OPT_BOOL('n', NULL, no_update_server_info, +N_(do not run git-update-server-info)), No long option name? This is also a negated option, so as above, maybe we could have --update_server_info and --no-update_server_info respectively. Talking about the shortform then: Is it possible to negate the shortform? Words in long options are separated by dashes, so --update-server-info. The no- prefix is provided for free by parseopt, unless the flag PARSE_OPT_NONEG is given. There is no automatic way to provide a short option that negates another short option. You can build such a pair explicitly using OPTION_BIT and OPTION_NEGBIT or with OPTION_SET_INT and different values. +if (pack_everything + pack_everything_but_loose == 0) { +argv_array_push(cmd_args, --unpacked); +argv_array_push(cmd_args, --incremental); +} else { +struct string_list fname_list = STRING_LIST_INIT_DUP; +get_pack_filenames(packdir, fname_list); +for_each_string_list_item(item, fname_list) { +char *fname; +fname = mkpathdup(%s/%s.keep, packdir, item-string); +if (stat(fname, statbuffer) S_ISREG(statbuffer.st_mode)) { t7700-repack.sh --valgrind fails and flags that line... if (!stat(fname, statbuffer) ... ... but with this fix it runs fine. I suspect that explains you sporadic test failures. But you are using file_exists() later. That should be good enough here as well, no? will do. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCHv4] repack: rewrite the shell script in C.
On 08/20/2013 03:31 PM, Johannes Sixt wrote: +packdir = mkpathdup(%s/pack, get_object_directory()); +packtmp = mkpathdup(%s/.tmp-%d-pack, packdir, getpid()); Should this not be packdir = xstrdup(git_path(pack)); packtmp = xstrdup(git_path(pack/.tmp-%d-pack, getpid())); Just a question for documentational purpose. ;) Am I right suggesting the following: `mkpathdup`:: Use parameters to build the path on the filesystem, i.e. create required folders and then return a duplicate of that path. The caller is responsible to free the memory `xstrdup`:: Duplicates the given string, making the caller responsible to free the return value. (No side effects to fs, other global memory). Basically the same as man 2 strdup with errorhandling. `git_path`:: Returns a pointer to a static string buffer, so it can just be used once or must be duplicated using xstrdup. The path given is relative and is inside the repository. Stefan signature.asc Description: OpenPGP digital signature
Re: [RFC PATCHv4] repack: rewrite the shell script in C.
Stefan Beller wrote: On 08/20/2013 03:31 PM, Johannes Sixt wrote: +packdir = mkpathdup(%s/pack, get_object_directory()); +packtmp = mkpathdup(%s/.tmp-%d-pack, packdir, getpid()); Should this not be packdir = xstrdup(git_path(pack)); packtmp = xstrdup(git_path(pack/.tmp-%d-pack, getpid())); Just a question for documentational purpose. ;) Am I right suggesting the following: `mkpathdup`:: Use parameters to build the path on the filesystem, i.e. create required folders and then return a duplicate of that path. The caller is responsible to free the memory Right. mkpathdup is basically just mkpath composed with xstrdup, except that it avoids stomping on mkpath's buffers. The corresponding almost-shortcut for xstrdup(git_path(s)) is git_pathdup(s). But that's a minor detail. Maybe a new Documentation/technical/api-paths.txt is in order. Thanks, Jonathan -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCHv4] repack: rewrite the shell script in C.
So here is an update of git-repack Thanks for all the reviews and annotations! I think I got all the suggestions except the use of git_path/mkpathdup. I replaced mkpathdup by mkpath where possible, but it's still not perfect. I'll wait for the dokumentation patch of Jonathan, before changing all these occurences forth and back again. What would be perfect here would be a function which just does string processing and returning, so fname = create_string(fmt, ...); or with duplication: fname = create_string_dup(fmt, ...); Ah wait! There are struct str_buf, but these would require more lines (init, add to buffer, get as char*) Below there is just the diff against RFC PATCHv4, however I'll send the whole patch as well. Thanks, Stefan --8-- From e544eb9b7bdea6c2000c5f0d3043845fb901e90b Mon Sep 17 00:00:00 2001 From: Stefan Beller stefanbel...@googlemail.com Date: Wed, 21 Aug 2013 00:35:18 +0200 Subject: [PATCH] Suggestions of reviewers --- builtin/repack.c | 104 +++ 1 file changed, 51 insertions(+), 53 deletions(-) diff --git a/builtin/repack.c b/builtin/repack.c index a87900e..9fbe636 100644 --- a/builtin/repack.c +++ b/builtin/repack.c @@ -67,7 +67,7 @@ void get_pack_filenames(char *packdir, struct string_list *fname_list) struct dirent *e; char *path, *suffix, *fname; - path = mkpathdup(%s/pack, get_object_directory()); + path = mkpath(%s/pack, get_object_directory()); suffix = .pack; dir = opendir(path); @@ -78,7 +78,6 @@ void get_pack_filenames(char *packdir, struct string_list *fname_list) string_list_append_nodup(fname_list, fname); } } - free(path); closedir(dir); } @@ -88,14 +87,25 @@ void remove_pack(char *path, char* sha1) int ext = 0; for (ext = 0; ext 3; ext++) { char *fname; - fname = mkpathdup(%s/%s%s, path, sha1, exts[ext]); + fname = mkpath(%s/%s%s, path, sha1, exts[ext]); unlink(fname); - free(fname); } } int cmd_repack(int argc, const char **argv, const char *prefix) { + char *exts[2] = {.idx, .pack}; + char *packdir, *packtmp, line[1024]; + struct child_process cmd; + struct string_list_item *item; + struct argv_array cmd_args = ARGV_ARRAY_INIT; + struct string_list names = STRING_LIST_INIT_DUP; + struct string_list rollback = STRING_LIST_INIT_DUP; + struct string_list existing_packs = STRING_LIST_INIT_DUP; + int count_packs, ext; + FILE *out; + + /* variables to be filled by option parsing */ int pack_everything = 0; int pack_everything_but_loose = 0; int delete_redundant = 0; @@ -107,24 +117,17 @@ int cmd_repack(int argc, const char **argv, const char *prefix) { int no_update_server_info = 0; int quiet = 0; int local = 0; - char *packdir, *packtmp; - struct child_process cmd; - struct string_list_item *item; - struct string_list existing_packs = STRING_LIST_INIT_DUP; - struct stat statbuffer; - int ext; - char *exts[2] = {.idx, .pack}; struct option builtin_repack_options[] = { - OPT_BOOL('a', all, pack_everything, + OPT_BOOL('a', NULL, pack_everything, N_(pack everything in a single pack)), - OPT_BOOL('A', all-but-loose, pack_everything_but_loose, + OPT_BOOL('A', NULL, pack_everything_but_loose, N_(same as -a, and turn unreachable objects loose)), - OPT_BOOL('d', delete-redundant, delete_redundant, + OPT_BOOL('d', NULL, delete_redundant, N_(remove redundant packs, and run git-prune-packed)), - OPT_BOOL('f', no-reuse-delta, no_reuse_delta, + OPT_BOOL('f', NULL, no_reuse_delta, N_(pass --no-reuse-delta to git-pack-objects)), - OPT_BOOL('F', no-reuse-object, no_reuse_object, + OPT_BOOL('F', NULL, no_reuse_object, N_(pass --no-reuse-object to git-pack-objects)), OPT_BOOL('n', NULL, no_update_server_info, N_(do not run git-update-server-info)), @@ -154,9 +157,6 @@ int cmd_repack(int argc, const char **argv, const char *prefix) { packdir = mkpathdup(%s/pack, get_object_directory()); packtmp = mkpathdup(%s/.tmp-%d-pack, packdir, getpid()); - remove_temporary_files(); - - struct argv_array cmd_args = ARGV_ARRAY_INIT; argv_array_push(cmd_args, pack-objects); argv_array_push(cmd_args, --keep-true-parents); argv_array_push(cmd_args, --honor-pack-keep); @@ -191,7 +191,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix) {
Re: [RFC PATCHv4] repack: rewrite the shell script in C.
Stefan Beller wrote: I think I got all the suggestions except the use of git_path/mkpathdup. I replaced mkpathdup by mkpath where possible, but it's still not perfect. No, mkpathdup is generally better unless you know what you're doing. I'll wait for the dokumentation patch of Jonathan, I never promised to write one. :) I would have preferred to have a rough draft with the results of your investigations so far to start from. Oh well. I'll look into it tonight. Thanks, Jonathan -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html