
Re: Guidelines for error messages

2017-07-28 Thread Junio C Hamano
Kaartic Sivaraam  writes:

> On reading the CodingGuidelines, I saw a section that specifies rules
> about the structure ...
> ...
> That makes me wonder, has the guideline changed?
> Is this something that must be fixed?
> Am I missing something?

This applies not just to the message format but to ANY guideline
we have around here.

As we did not have perfect foresight when we started the project,
the guidelines grew out of the best practices we learned over time.
We try to catch violations in new code during the review process,
so that we won't add violators, but we do _not_ actively hunt for
existing violators and change them _only_ to fix them, which would
be unneeded code churn.  Instead, over time, as people notice
violations, most often while doing real work in the vicinity of
the offending code, we fix them as preparatory clean-up patches
before the real work happens.


Re: [PATCH/RFC] setup: update error message to be more meaningful

2017-07-28 Thread Junio C Hamano
Kaartic Sivaraam  writes:

> Though the message seems to be the most fitting one, I'm a little
> reluctant to use it, as it "might" create a wrong picture in the minds
> of users, making them think this would be the case in other situations
> too, which we know is not true. For example,
>
>
> git log -p --full-diff master --stat
>
> git commit -v Makefile --amend

These are accepted not by design but by accident.  

I do not think we even document that you are allowed to do these in
"log" and "commit" manual pages, and we should discourage them (I do
not even mind eventually removing these with the usual deprecation
dance, but I do not think it is worth the noise).



Guidelines for error messages

2017-07-28 Thread Kaartic Sivaraam
Hello all,

On reading the CodingGuidelines, I saw a section that specifies rules
about the structure and formatting of error messages. I have reproduced
it below,


Error Messages

 - Do not end error messages with a full stop.

 - Do not capitalize ("unable to open %s", not "Unable to open %s")

 - Say what the error is first ("cannot open %s", not "%s: cannot open")


Having used git for quite a while, I have seen error messages that
do seem to violate the above guidelines (at least the first two
points). A few are:


Aborting commit due to empty commit message.


fatal: There is no merge to abort (MERGE_HEAD missing).


fatal: You have not concluded your merge (MERGE_HEAD exists).
Please, commit your changes before you merge.


error: Empty commit message.
Not committing merge; use 'git commit' to complete the merge.


error: pathspec 'hell' did not match any file(s) known to git.


fatal: Not a valid object name foo


fatal: ambiguous argument 'foo..bar': unknown revision or path not in the 
working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'


A few that do seem to follow the guidelines are:

fatal: could not read log file 'imaginary_file': No such file or directory

fatal: no input file given for in-place editing



That makes me wonder, has the guideline changed?
Is this something that must be fixed?
Am I missing something?

-- 
Kaartic


Re: [PATCH/RFC] setup: update error message to be more meaningful

2017-07-28 Thread Kaartic Sivaraam
On Wed, 2017-07-26 at 13:09 -0700, Junio C Hamano wrote:
> Jonathan Nieder  writes:
> 
> > For an initial guess: in the example
> > 
> > git grep test -n
> > 
> > ...
> >  2. Focus on "argument" instead of "filename" so that the message
> > could still apply: something like
> > 
> > fatal: option '-n' must come before non-option arguments
> 
> I think this one is the most sensible.  There may or may not be a
> file called "test" in the working tree, and the user may or may not
> have meant to look for a pattern "test".  What is wrong in the sample
> command line is that "test" is not a dashed option and yet it has a
> dashed option "-n" after it, and your version clearly explains it.
Though the message seems to be the most fitting one, I'm a little
reluctant to use it, as it "might" create a wrong picture in the minds
of users, making them think this would be the case in other situations
too, which we know is not true. For example,


git log -p --full-diff master --stat

git commit -v Makefile --amend


The above are valid commands and produce the expected result even though
the 'option argument' comes after the 'non-option' argument. Thus, I
thought a "general statement" like the one above might create a wrong
picture in the minds of users. Any thoughts about how to overcome this?

-- 
Kaartic


Re: reftable [v3]: new ref storage format

2017-07-28 Thread Shawn Pearce
On Fri, Jul 28, 2017 at 7:18 PM, Michael Haggerty  wrote:
> On Fri, Jul 28, 2017 at 3:12 PM, Shawn Pearce  wrote:
>> I'm with you this far, and like the {min,max}_update_index in the
>> header. I'm concerned about update_index in 32 bits. At some point you
>> need to reset the counter, or the repository is broken. 4b updates is
>> enough for anyone? I'd feel better about this being a 64 bit field.
>
> Yes, I was a little bit nervous about 32 bits, too. But that's a *lot*
> of updates: one per second for 136 years. If that limit were ever
> reached, there could be a compaction step, where any update indices
> that don't have associated reflog entries are "compacted out" of the
> numerical sequence and the remaining indices are renumbered
> contiguously.

I considered that, but it's a bit of a pain for the writer to renumber
the remaining records.

> But it's OK with me to make it 64 bits. Usually those extra bytes
> would appear as zeros and so should prefix- and zlib-compress
> well.

That was my thought. Within a single reference these will prefix
compress right out, and zlib will fix any sins within the log block at
restart points.


Re: reftable [v3]: new ref storage format

2017-07-28 Thread Michael Haggerty
On Fri, Jul 28, 2017 at 3:12 PM, Shawn Pearce  wrote:
> I'm with you this far, and like the {min,max}_update_index in the
> header. I'm concerned about update_index in 32 bits. At some point you
> need to reset the counter, or the repository is broken. 4b updates is
> enough for anyone? I'd feel better about this being a 64 bit field.

Yes, I was a little bit nervous about 32 bits, too. But that's a *lot*
of updates: one per second for 136 years. If that limit were ever
reached, there could be a compaction step, where any update indices
that don't have associated reflog entries are "compacted out" of the
numerical sequence and the remaining indices are renumbered
contiguously.

But it's OK with me to make it 64 bits. Usually those extra bytes
would appear as zeros and so should prefix- and zlib-compress
well.

Michael


Re: [RFC PATCH 1/4] environment, fsck: introduce lazyobject extension

2017-07-28 Thread Junio C Hamano
Jonathan Tan  writes:

>> >  extern int repository_format_precious_objects;
>> > +extern char *repository_format_lazy_object;
>> 
>> This is not a new problem, but I think these two should be
>> called repository_extension_$NAME not repository_format_$NAME.
>
> Looking at the original commit 067fbd4 ("introduce "preciousObjects"
> repository extension", 2015-06-24), it seems that this was so named to
> be analogous to the existing "struct repository_format { int version;
> ...}" => "int repository_format_version;". The existing
> repository_format_$NAME thus seems reasonable to me.

OK.  They smell like "repository extension" to me, but probably the
fully spelled name of the concept is "repository format extension",
so using the word "format" out of that phrase sounds OK to me.

Thanks.


Re: [RFC PATCH 1/4] environment, fsck: introduce lazyobject extension

2017-07-28 Thread Jonathan Tan
On Thu, 27 Jul 2017 11:55:46 -0700
Junio C Hamano  wrote:

> My reading hiccupped after the first sentence, as the problem
> description made it sound like this was a boolean ("are we using
> lazy object feature?"), after reading "data type string".  And then
> "the command in that option" made me hiccup one more time, as it did
> not "click" that "in that option" was trying to say that the string
> is used as the command name (or is it a whole command line?  The
> leading part of the command line to which some arguments are
> appended before it gets invoked as a command? or what?).
> 
> Logically, I think it is more like
> 
>  - extensions.lazyobject can be set to tell Git to consider missing
>objects in certain cases are not errors;
> 
>  - the value of extensions.lazyobject variable must be a string,
>which is used to name the command to lazily make the object
>"appear" in the repository on demand.

OK, I'll update the commit message in the next reroll.
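For context, repository-format extensions live under the `extensions` section of `.git/config` and require `core.repositoryformatversion` to be 1 (as with the `preciousObjects` extension referenced below). A purely illustrative sketch; the value `fetch-blobs` is invented, and its exact semantics are what this thread is discussing:

```ini
[core]
	repositoryformatversion = 1

[extensions]
	# hypothetical value: names the command Git would run to make
	# a missing object appear in the repository on demand
	lazyobject = fetch-blobs
```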

> >  extern int repository_format_precious_objects;
> > +extern char *repository_format_lazy_object;
> 
> This is not a new problem, but I think these two should be
> called repository_extension_$NAME not repository_format_$NAME.

Looking at the original commit 067fbd4 ("introduce "preciousObjects"
repository extension", 2015-06-24), it seems that this was so named to
be analogous to the existing "struct repository_format { int version;
...}" => "int repository_format_version;". The existing
repository_format_$NAME thus seems reasonable to me.


Re: [PATCH 2/2] t6500: mark tests as SHA1 reliant

2017-07-28 Thread Junio C Hamano
Stefan Beller  writes:

> The first test marked relies on hard coded sha1:
>
>   # We need to create two objects whose sha1s start with 17
>   # since this is what git gc counts.  As it happens, these
>   # two blobs will do so.
>   test_commit 263 &&
>   test_commit 410 &&
>
> The next two seem to rely on state from the first one, I did not
> investigate.

I am moderately negative on this approach, if it is meant to suggest
the final shape of our test suite that patch 1/2 started.

This script may be a good example you can use to demonstrate a much
better approach.  As the above comment in the test shows, we want to
create two objects whose object names begin with "17", and running
test_commit with 263 and 410 at this point in the test was a way to
achieve that when Git uses SHA-1 as its hash.

When we use a hash different from SHA-1, the exact strings 263 and
410 may change, but we should be able to find two other strings that
have the same property (i.e. they result in objects that share the
prefix "17").  Perhaps a better way forward for this kind of test is
to parameterize these hardcoded constants and make it easier to use
different values without having to change the rest of the script
when we switch the hash function?  So perhaps have something like

case "$GIT_HASH_FUNCTION" in
SHA-1)
	TEST_17_1="263 410" ;;
CORRUPT-SHA-1)
	TEST_17_1="something else" ;;
esac

near the top of the script and update the above two with:

for token in $TEST_17_1
do
test_commit "$token" || return 1
done &&

would prepare us to switch to SHA-256 or whatever hash function we
choose to use in the future?

t1512 is another example that is full of such tests; we would want
to keep their moral equivalents in a world with a different hash,
instead of reducing the test coverage.


Re: reftable [v3]: new ref storage format

2017-07-28 Thread Shawn Pearce
On Thu, Jul 27, 2017 at 7:28 AM, Michael Haggerty  wrote:
> On Sat, Jul 22, 2017 at 11:29 AM, Shawn Pearce  wrote:
>> 3rd iteration of the reftable storage format.
>>
>> [...]
>> ### Objectives
>>
>> - Near constant time lookup for any single reference, even when the
>>   repository is cold and not in process or kernel cache.
>> - Near constant time verification a SHA-1 is referred to by at least
>>   one reference (for allow-tip-sha1-in-want).
>> - Efficient lookup of an entire namespace, such as `refs/tags/`.
>> - Support atomic push `O(size_of_update)` operations.
>> - Combine reflog storage with ref storage.
>
> I think that the optimization enabled by obj_records is only
> interesting for server-side repositories that will frequently be
> fetched from, and which additionally have allow-tip-sha1-in-want
> turned on, right? So could we make it explicitly optional?

Correct. It's already optional. Maybe I'm just not clear enough about
that in the document. I agree it's not required in the file, and
writers can choose to omit obj_record.


>>  obj record
>>
>> An `obj_record` describes a single object abbreviation, and the blocks
>> containing references using that unique abbreviation:
>>
>> varint( prefix_length )
>> varint( (suffix_length << 3) | cnt_3 )
>> suffix
>> varint( cnt_rest )?
>> varint( block_delta )+
>>
> [...]
>> Each record contains `block_count` number of block identifiers for ref
>> blocks.  The `block_count` is determined by:
>>
>> block_count = cnt_3
>> if (cnt_3 == 0x7) {
>>   block_count += cnt_rest
>> }
>>
>> The `cnt_rest` field is only present when `block_count >= 0x7` and
>> could overflow the `cnt_3` field available in the record start.  This
>> encoding scheme is used as the vast majority of abbreviations are
>> only one reference (or at most a few), and unlikely to exceed 6 blocks.
>
> I agree that this `cnt_3` handling is overly cute. There will never be
> records with `block_count == 0`, will there? Then I propose that
> `cnt_3` be set to zero if `block_count` is greater than 7, in which
> case the full block count is stored as a varint. It's simpler, and I
> think the only `block_count`s for which this scheme costs a byte more
> than your scheme are implausible: 128-134, 16383-16390, etc.

That is a great idea, thanks. I've updated the draft to reflect your
approach of cnt_3 = 0 indicating that the actual count follows in a varint.


>> [...]
>> ### Log block format
>
> I'm still not happy about how you store reflogs. Here are my objections:
>
> * You store reflogs by appending a high-resolution time to the
> refname. This is awkward:
>   * There is always some uncertainty about uniqueness, even with a
> high-resolution clock. If you want to eliminate that uncertainty, you
> have to read the last entry in the reflog for the reference *for every
> update* to be sure that your timestamp is not already "in use".
>   * There is a need to invent microsecond values on systems with
> low-resolution clocks, again to ensure uniqueness.
>   * If there is clock skew, then either you lose information about the
> *order* of reference updates (which I would consider unacceptable), or
> you might have to invent fictional times for new updates to retain the
> order relative to previous updates. But inventing fictional times is
> very problematic: suppose somebody's clock is incorrectly set to the
> year 2020 when they make a commit. As far as Git is concerned, that is
> plausible, so the reflog entry gets that timestamp. Then the user
> corrects the clock and makes another commit. We would either be forced
> to rewrite the old reflog entry(ies) or to give a timestamp in 2020 to
> the new update.
> * With reftable storage, multiple-reference updates can be done
> atomically. But the reflogs don't retain the information about which
> references were updated together.
> * Even if all reflogs are retained, it is not possible to deduce the
> global state of all references at a moment in the past.
>
> I propose to solve all of the above problems, plus some others, as follows:
>
> Let's keep an integer `update_index`, which starts at zero and
> increases by one with each atomic reference update (i.e., it increases
> by only one even if the update touches multiple references).
>
> Let's store reflog entries under keys `(refname, update_index)`, and
> store the timestamp of the entry *within* the record rather than as
> part of the key. The tuple `(refname, update_index)` has to be encoded
> somehow that ensures the desired order; see below. This encoding
> ensures that the order of reference updates is well-defined regardless
> of clock skew, and that one can use the reflog to determine which
> references were updated together and what the complete state of
> references was after any update (assuming that the user retains all
> reflogs).
>
> Let's store in the header of each reftable file the `min_update_index`
> and 

Re: [PATCH 1/2] alter hash function: expose test dependencies on sha1

2017-07-28 Thread Junio C Hamano
Stefan Beller  writes:

> DO NOT APPLY.
>
> Alter the hash function such that with this patch
> any dependency on sha1 in tests will make the test
> fail. This patch applied on master yields this list:
>
> ./t-basic.sh
> 
> ./t8008-blame-formats.sh
>
> Signed-off-by: Stefan Beller 
> ---
>  sha1dc/sha1.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Everybody knew and said that we need to make tests less dependent on
the exact hash, in such a way that the ones that currently depend on
it would test the morally equivalent thing with a new hash before we
can go forward.  And everybody said that one way to start is to
build a Git with a different hash function and try to run the tests.

This is the first report of doing so, and the large list above is
the biggest contribution this message makes to the "let's wean
ourselves off of SHA-1" journey.

Thanks for starting this.

>
> diff --git a/sha1dc/sha1.c b/sha1dc/sha1.c
> index 25eded1399..e18acee9ca 100644
> --- a/sha1dc/sha1.c
> +++ b/sha1dc/sha1.c
> @@ -1756,7 +1756,7 @@ static void sha1_process(SHA1_CTX* ctx, const uint32_t block[16])
>  void SHA1DCInit(SHA1_CTX* ctx)
>  {
>   ctx->total = 0;
> - ctx->ihv[0] = 0x67452301;
> + ctx->ihv[0] = 0x07452301;
>   ctx->ihv[1] = 0xEFCDAB89;
>   ctx->ihv[2] = 0x98BADCFE;
>   ctx->ihv[3] = 0x10325476;


Re: [PATCH 2/2] t6500: mark tests as SHA1 reliant

2017-07-28 Thread Stefan Beller
On Fri, Jul 28, 2017 at 10:59 AM, Junio C Hamano  wrote:
> There is another one that is better suited for this demonstration patch,
> whose sole point is about creating a bunch of objects whose prefixes
> conflict with each other to test the abbreviation machinery.

'Another one' meaning in another test file? I'll go look for these
tests.

The first patch produces lots of false positives, so maybe I'll need
to work on that (is there another code path for sha1?).

Thanks,
Stefan


[PATCH] Documentation/checkout: clarify submodule HEADs to be detached

2017-07-28 Thread Stefan Beller
Signed-off-by: Stefan Beller 
---

  On top of sb/submodule-recursive-checkout-detach-head
  
  I also checked other man pages such as read-tree, which
  already mentions the behavior as the fix implements:
  
   --[no-]recurse-submodules
   Using --recurse-submodules will update the content of all
   initialized submodules according to the commit recorded in the
   superproject by calling read-tree recursively, also setting the
   submodules HEAD to be detached at that commit.

 Documentation/git-checkout.txt | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Documentation/git-checkout.txt b/Documentation/git-checkout.txt
index d6399c0af8..84bd323a00 100644
--- a/Documentation/git-checkout.txt
+++ b/Documentation/git-checkout.txt
@@ -262,6 +262,8 @@ section of linkgit:git-add[1] to learn how to operate the `--patch` mode.
local modifications in a submodule would be overwritten the checkout
will fail unless `-f` is used. If nothing (or --no-recurse-submodules)
is used, the work trees of submodules will not be updated.
+   Just like linkgit:git-submodule[1], this will detach the
+   submodules HEAD.
 
 <branch>::
Branch to checkout; if it refers to a branch (i.e., a name that,
-- 
2.14.0.rc0.3.g6c2e499285



[PATCH] tests: ensure fsck fails on corrupt packfiles

2017-07-28 Thread Jonathan Tan
t1450-fsck.sh does not have a test that checks fsck's behavior when a
packfile is invalid. It does have a test for when an object in a
packfile is invalid, but in that test, the packfile itself is valid.

Add such a test.

Signed-off-by: Jonathan Tan 
---
I think the existing packfile corruption test (t5303) is good enough
that we would notice such things, but just in case we want to test fsck
specifically, here's a patch.
---
 t/t1450-fsck.sh | 16 
 1 file changed, 16 insertions(+)

diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index bb89e1a5d..4087150db 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -608,6 +608,22 @@ test_expect_success 'fsck errors in packed objects' '
! grep corrupt out
 '
 
+test_expect_success 'fsck fails on corrupt packfile' '
+   hsh=$(git commit-tree -m mycommit HEAD^{tree}) &&
+   pack=$(echo $hsh | git pack-objects .git/objects/pack/pack) &&
+
+   # Corrupt the first byte of the first object. (It contains 3 type bits,
+   # at least one of which is not zero, so setting the first byte to 0 is
+   # sufficient.)
+   chmod a+w .git/objects/pack/pack-$pack.pack &&
+   printf '\0' | dd of=.git/objects/pack/pack-$pack.pack bs=1 conv=notrunc seek=12 &&
+
+   test_when_finished "rm -f .git/objects/pack/pack-$pack.*" &&
+   remove_object $hsh &&
+   test_must_fail git fsck 2>out &&
+   test_i18ngrep "checksum mismatch" out
+'
+
 test_expect_success 'fsck finds problems in duplicate loose objects' '
rm -rf broken-duplicate &&
git init broken-duplicate &&
-- 
2.14.0.rc0.400.g1c36432dff-goog



Re: [PATCH v2] contrib/rerere-train: optionally overwrite existing resolutions

2017-07-28 Thread Raman Gupta
On 26/07/17 04:41 PM, Junio C Hamano wrote:
> Raman Gupta  writes:
> 
>> On 26/07/17 02:05 PM, Junio C Hamano wrote:
>>> I haven't tried this patch, but would this work well with options
>>> meant for the 'git rev-list --parents "$@"' that grabs the list of
>>> merge commits to learn from?  e.g.
>>>
>>> $ contrib/rerere-train.sh -n 4 --merges master
>>> $ contrib/rerere-train.sh --overwrite -n 4 --merges master
>>> $ contrib/rerere-train.sh -n 4 --overwrite --merges master
>>>
>>> I do not think it is necessary to make the last one work; as long as
>>> the first two work as expected, we are good even if the last one
>>> dies with a sensible message e.g. "options X, Y and Z must be given
>>> before other options" (currently "X, Y and Z" consists only of
>>> "--overwrite", but I think you get what I mean).
>>
>> You're right -- I didn't try all the cases. I wasn't able to figure
>> out how to get `rev-parse --parseopt` to deal with this situation, so
>> I did it manually. I'm not super-happy with the result, but it does
>> work. Look for PATCH v3.
> 
> Yes, I think you could squash the two case arms in the later loop
> into one i.e.
> 
>   -h|--help|-o|--overwrite)
>   die "please don't." ;;

I considered that but decided the non-collapsed version better
supports this list growing in the future.

> but still the repetition does look ugly.

Agreed.

> As a contrib/ material, I do not care too deeply about it, though.
> 
> Will queue.

Thanks.

Regards,
Raman



Re: [PATCH 2/2] t6500: mark tests as SHA1 reliant

2017-07-28 Thread Junio C Hamano
There is another one that is better suited for this demonstration patch,
whose sole point is about creating a bunch of objects whose prefixes
conflict with each other to test the abbreviation machinery.



On Fri, Jul 28, 2017 at 10:18 AM, Stefan Beller  wrote:
> The first test marked relies on hard coded sha1:
>
> # We need to create two objects whose sha1s start with 17
> # since this is what git gc counts.  As it happens, these
> # two blobs will do so.
> test_commit 263 &&
> test_commit 410 &&
>
> The next two seem to rely on state from the first one, I did not
> investigate.
>
> Signed-off-by: Stefan Beller 
> ---
>  t/t6500-gc.sh | 6 +++---
>  t/test-lib.sh | 4 
>  2 files changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/t/t6500-gc.sh b/t/t6500-gc.sh
> index 41b0be575d..3900baa01d 100755
> --- a/t/t6500-gc.sh
> +++ b/t/t6500-gc.sh
> @@ -43,7 +43,7 @@ test_expect_success 'gc is not aborted due to a stale symref' '
> )
>  '
>
> -test_expect_success 'auto gc with too many loose objects does not attempt to create bitmaps' '
> +test_expect_success SHA1 'auto gc with too many loose objects does not attempt to create bitmaps' '
> test_config gc.auto 3 &&
> test_config gc.autodetach false &&
> test_config pack.writebitmaps true &&
> @@ -77,7 +77,7 @@ run_and_wait_for_auto_gc () {
> doesnt_matter=$(git gc --auto 9>&1)
>  }
>
> -test_expect_success 'background auto gc does not run if gc.log is present and recent but does if it is old' '
> +test_expect_success SHA1 'background auto gc does not run if gc.log is present and recent but does if it is old' '
> test_commit foo &&
> test_commit bar &&
> git repack &&
> @@ -95,7 +95,7 @@ test_expect_success 'background auto gc does not run if gc.log is present and re
> test_line_count = 1 packs
>  '
>
> -test_expect_success 'background auto gc respects lock for all operations' '
> +test_expect_success SHA1 'background auto gc respects lock for all operations' '
> # make sure we run a background auto-gc
> test_commit make-pack &&
> git repack &&
> diff --git a/t/test-lib.sh b/t/test-lib.sh
> index 1b6e53f78a..a5a54c6d4a 100644
> --- a/t/test-lib.sh
> +++ b/t/test-lib.sh
> @@ -1127,6 +1127,10 @@ test_lazy_prereq JGIT '
> type jgit
>  '
>
> +test_lazy_prereq SHA1 '
> +   false
> +'
> +
>  # SANITY is about "can you correctly predict what the filesystem would
>  # do by only looking at the permission bits of the files and
>  # directories?"  A typical example of !SANITY is running the test
> --
> 2.14.0.rc0.3.g6c2e499285
>


[PATCH 1/2] alter hash function: expose test dependencies on sha1

2017-07-28 Thread Stefan Beller
DO NOT APPLY.

Alter the hash function such that with this patch
any dependency on sha1 in tests will make the test
fail. This patch applied on master yields this list:

./t-basic.sh
./t0002-gitfile.sh
./t0005-signals.sh
./t0013-sha1dc.sh
./t0021-conversion.sh
./t0090-cache-tree.sh
./t1001-read-tree-m-2way.sh
./t1007-hash-object.sh
./t1011-read-tree-sparse-checkout.sh
./t1013-read-tree-submodule.sh
./t1100-commit-tree-options.sh
./t1200-tutorial.sh
./t1300-repo-config.sh
./t1304-default-acl.sh
./t1400-update-ref.sh
./t1411-reflog-show.sh
./t1450-fsck.sh
./t1507-rev-parse-upstream.sh
./t1512-rev-parse-disambiguation.sh
./t1700-split-index.sh
./t2011-checkout-invalid-head.sh
./t2013-checkout-submodule.sh
./t2015-checkout-unborn.sh
./t2017-checkout-orphan.sh
./t2022-checkout-paths.sh
./t2101-update-index-reupdate.sh
./t2107-update-index-basic.sh
./t2203-add-intent.sh
./t3033-merge-toplevel.sh
./t3102-ls-tree-wildcards.sh
./t3103-ls-tree-misc.sh
./t3201-branch-contains.sh
./t3301-notes.sh
./t3305-notes-fanout.sh
./t3306-notes-prune.sh
./t3308-notes-merge.sh
./t3309-notes-merge-auto-resolve.sh
./t3310-notes-merge-manual-resolve.sh
./t3311-notes-merge-fanout.sh
./t3400-rebase.sh
./t3404-rebase-interactive.sh
./t3405-rebase-malformed.sh
./t3408-rebase-multi-line.sh
./t3415-rebase-autosquash.sh
./t3419-rebase-patch-id.sh
./t3421-rebase-topology-linear.sh
./t3501-revert-cherry-pick.sh
./t3502-cherry-pick-merge.sh
./t3503-cherry-pick-root.sh
./t3506-cherry-pick-ff.sh
./t3509-cherry-pick-merge-df.sh
./t3600-rm.sh
./t3700-add.sh
./t3701-add-interactive.sh
./t3702-add-edit.sh
./t3903-stash.sh
./t3905-stash-include-untracked.sh
./t4002-diff-basic.sh
./t4007-rename-3.sh
./t4008-diff-break-rewrite.sh
./t4010-diff-pathspec.sh
./t4011-diff-symlink.sh
./t4013-diff-various.sh
./t4014-format-patch.sh
./t4015-diff-whitespace.sh
./t4020-diff-external.sh
./t4022-diff-rewrite.sh
./t4029-diff-trailing-space.sh
./t4030-diff-textconv.sh
./t4033-diff-patience.sh
./t4034-diff-words.sh
./t4039-diff-assume-unchanged.sh
./t4042-diff-textconv-caching.sh
./t4044-diff-index-unique-abbrev.sh
./t4045-diff-relative.sh
./t4048-diff-combined-binary.sh
./t4050-diff-histogram.sh
./t4052-stat-output.sh
./t4054-diff-bogus-tree.sh
./t4060-diff-submodule-option-diff-format.sh
./t4126-apply-empty.sh
./t4151-am-abort.sh
./t4202-log.sh
./t4205-log-pretty-formats.sh
./t4208-log-magic-pathspec.sh
./t4211-line-log.sh
./t4300-merge-tree.sh
./t5150-request-pull.sh
./t5300-pack-object.sh
./t5306-pack-nobase.sh
./t5308-pack-detect-duplicates.sh
./t5309-pack-delta-cycles.sh
./t5313-pack-bounds-checks.sh
./t5512-ls-remote.sh
./t5515-fetch-merge-logic.sh
./t5516-fetch-push.sh
./t5520-pull.sh
./t5521-pull-options.sh
./t6000-rev-list-misc.sh
./t6012-rev-list-simplify.sh
./t6020-merge-df.sh
./t6022-merge-rename.sh
./t6024-recursive-merge.sh
./t6030-bisect-porcelain.sh
./t6031-merge-filemode.sh
./t6035-merge-dir-to-symlink.sh
./t6300-for-each-ref.sh
./t6500-gc.sh
./t7003-filter-branch.sh
./t7012-skip-worktree-writing.sh
./t7063-status-untracked-cache.sh
./t7102-reset.sh
./t7106-reset-unborn-branch.sh
./t7112-reset-submodule.sh
./t7201-co.sh
./t7400-submodule-basic.sh
./t7506-status-submodule.sh
./t7507-commit-verbose.sh
./t7508-status.sh
./t7600-merge.sh
./t7607-merge-overwrite.sh
./t7609-merge-co-error-msgs.sh
./t8008-blame-formats.sh

Signed-off-by: Stefan Beller 
---
 sha1dc/sha1.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sha1dc/sha1.c b/sha1dc/sha1.c
index 25eded1399..e18acee9ca 100644
--- a/sha1dc/sha1.c
+++ b/sha1dc/sha1.c
@@ -1756,7 +1756,7 @@ static void sha1_process(SHA1_CTX* ctx, const uint32_t block[16])
 void SHA1DCInit(SHA1_CTX* ctx)
 {
ctx->total = 0;
-   ctx->ihv[0] = 0x67452301;
+   ctx->ihv[0] = 0x07452301;
ctx->ihv[1] = 0xEFCDAB89;
ctx->ihv[2] = 0x98BADCFE;
ctx->ihv[3] = 0x10325476;
-- 
2.14.0.rc0.3.g6c2e499285



[PATCH 2/2] t6500: mark tests as SHA1 reliant

2017-07-28 Thread Stefan Beller
The first test marked relies on hard coded sha1:

# We need to create two objects whose sha1s start with 17
# since this is what git gc counts.  As it happens, these
# two blobs will do so.
test_commit 263 &&
test_commit 410 &&

The next two seem to rely on state from the first one, I did not
investigate.

Signed-off-by: Stefan Beller 
---
 t/t6500-gc.sh | 6 +++---
 t/test-lib.sh | 4 ++++
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/t/t6500-gc.sh b/t/t6500-gc.sh
index 41b0be575d..3900baa01d 100755
--- a/t/t6500-gc.sh
+++ b/t/t6500-gc.sh
@@ -43,7 +43,7 @@ test_expect_success 'gc is not aborted due to a stale symref' '
)
 '
 
-test_expect_success 'auto gc with too many loose objects does not attempt to create bitmaps' '
+test_expect_success SHA1 'auto gc with too many loose objects does not attempt to create bitmaps' '
test_config gc.auto 3 &&
test_config gc.autodetach false &&
test_config pack.writebitmaps true &&
@@ -77,7 +77,7 @@ run_and_wait_for_auto_gc () {
doesnt_matter=$(git gc --auto 9>&1)
 }
 
-test_expect_success 'background auto gc does not run if gc.log is present and recent but does if it is old' '
+test_expect_success SHA1 'background auto gc does not run if gc.log is present and recent but does if it is old' '
test_commit foo &&
test_commit bar &&
git repack &&
@@ -95,7 +95,7 @@ test_expect_success 'background auto gc does not run if gc.log is present and re
test_line_count = 1 packs
 '
 
-test_expect_success 'background auto gc respects lock for all operations' '
+test_expect_success SHA1 'background auto gc respects lock for all operations' '
# make sure we run a background auto-gc
test_commit make-pack &&
git repack &&
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 1b6e53f78a..a5a54c6d4a 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1127,6 +1127,10 @@ test_lazy_prereq JGIT '
type jgit
 '
 
+test_lazy_prereq SHA1 '
+   false
+'
+
 # SANITY is about "can you correctly predict what the filesystem would
 # do by only looking at the permission bits of the files and
 # directories?"  A typical example of !SANITY is running the test
-- 
2.14.0.rc0.3.g6c2e499285
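The `SHA1` prereq above is hard-wired to `false`, so every marked test is simply skipped. A real prereq would presumably detect the hash function in use; one way (sketched in Python, with `repo_uses_sha1` as a hypothetical helper name, not anything in test-lib.sh) is to hash a known object and compare against git's well-known empty-blob name:

```python
import hashlib

def git_blob_sha1(data: bytes) -> str:
    # Git names a blob by sha1("blob <size>\0" + contents).
    return hashlib.sha1(b"blob %d\x00" % len(data) + data).hexdigest()

# Well-known object name of the empty blob under SHA-1.
EMPTY_BLOB_SHA1 = "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"

def repo_uses_sha1() -> bool:
    # A real prereq would run `git hash-object --stdin </dev/null` in
    # the repository; here we just demonstrate the comparison.
    return git_blob_sha1(b"") == EMPTY_BLOB_SHA1

print(repo_uses_sha1())  # True
```

With a prereq like that, the marked tests would keep running on SHA-1 repositories and be skipped only once the hash actually changes.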



[RFD PATCH 0/2] How to migrate our test infrastructure away from sha1

2017-07-28 Thread Stefan Beller
Brian got the ball rolling with his object-id patches to fix up the
code base, but eventually we also need to discuss how we approach our
test infrastructure.

C.f.:
* jb/t8008-cleanup (2017-07-26) 1 commit
 - t8008: rely on rev-parse'd HEAD instead of sha1 value
* sb/t4005-modernize (Remove hard coded sha1 values.)

The second patch presents an alternative approach: instead of making a
test hash-agnostic, it simply marks the test as requiring the SHA1
prerequisite.

The first patch is just an aid to show which tests need addressing,
although it yields some false positives (i.e., tests that fail for
reasons other than the hash dependency, such as data corruption).

Stefan 

Stefan Beller (2):
  alter hash function: expose test dependencies on sha1
  t6500: mark tests as SHA1 reliant

 sha1dc/sha1.c | 2 +-
 t/t6500-gc.sh | 6 +++---
 t/test-lib.sh | 4 ++++
 3 files changed, 8 insertions(+), 4 deletions(-)

-- 
2.14.0.rc0.3.g6c2e499285



Re: [PATCH] hash: Allow building with the external sha1dc library

2017-07-28 Thread Ævar Arnfjörð Bjarmason
On Fri, Jul 28, 2017 at 5:58 PM, Ævar Arnfjörð Bjarmason
 wrote:
> On Tue, Jul 25, 2017 at 7:57 AM, Takashi Iwai  wrote:
>> Some distros provide SHA1 collision detect code as a shared library.
>> It's the very same code as we have in git tree, and git can link with
>> it as well; at least, it may make maintenance easier, according to our
>> security guys.
>>
>> This patch allows user to build git linking with the external sha1dc
>> library instead of the built-in sha1dc code.  User needs to define
>> DC_SHA1_EXTERNAL explicitly.  As default, the built-in sha1dc code is
>> used like before.
>
> This whole thing sounds sensible. I reviewed this (but like Junio
> haven't tested it with a lib) and I think it would be worth noting the
> following in the commit message / Makefile documentation:
>
> * The "sha1detectcoll" *.so name for the "sha1collisiondetection"
> library is not something you (or SUSE, presumably) made up; it's a name
> that sha1collisiondetection.git itself creates for its library. I think
> the Makefile docs you've added here are a bit confusing: you talk
> about the "external sha1collisiondetection library" but then link
> against "sha1detectcoll". It's worth calling out this difference in the
> docs IMO, i.e. talk about the sha1detectcoll.so library form of
> sha1collisiondetection, not the sha1collisiondetection project name as
> a library.
>
> * It might be worth noting that this is *not* linking against the same
> code we ship ourselves due to the difference in defining
> SHA1DC_INIT_SAFE_HASH_DEFAULT for the git project's needs in the one
> we build, hence your need to have a git_SHA1DCInit() wrapper whereas
> we call SHA1DCInit() directly. It might be interesting to note that
> the library version will always be *slightly* slower (although the
> difference will be trivial).
>
> * Nothing in your commit message or docs explains why DC_SHA1_LINK is
> needed. We don't have these sorts of variables for other external
> libraries we link to, why the difference?
>
> Some other things I observed:
>
> * We now have much of the same header code copy/pasted between
> sha1dc_git.h and sha1dc_git_ext.h, did you consider just always
> including the former but making what it's doing conditional on
> DC_SHA1_EXTERNAL? I don't know if it would be worth it from a cursory
> glance, but again your commit message doesn't list that among options
> considered & discarded.
>
> * I think it makes sense to spew out a "not both!" error in the
> Makefile if you set DC_SHA1_EXTERNAL=Y and DC_SHA1_SUBMODULE=Y. See my
> 94da9193a6 ("grep: add support for PCRE v2", 2017-06-01) for an
> example of how to do this.
>
> * The whole business of "#include <sha1.h>" looks very fragile; are
> there really no other packages in e.g. suse that ship a sha1.h? Debian
> has libmd-dev that ships /usr/include/sha1.h, which conflicts with this:
> https://packages.debian.org/search?searchon=contents&keywords=sha1.h&mode=exactfilename&suite=unstable&arch=any
>
> Shipping a sha1.h as opposed to a sha1collisiondetection.h or
> sha1detectcoll.h or whatever seems like a *really* bad decision by
> upstream that should be the subject of at least seeing if they'll take
> a pull request to fix it before you package it or before we include
> something that'll probably need to be fixed / worked around anyway in
> Git.

I sent this last bit a tad too soon in a checkout of sha1collisiondetection.git:

$ make PREFIX=/tmp/local install >/dev/null 2>&1 && find /tmp/local/ -type f
/tmp/local/include/sha1dc/sha1.h
/tmp/local/bin/sha1dcsum
/tmp/local/bin/sha1dcsum_partialcoll
/tmp/local/lib/libsha1detectcoll.a
/tmp/local/lib/libsha1detectcoll.so.1.0.0
/tmp/local/lib/libsha1detectcoll.la

So the upstream library expects you (and it's documented in their README) to do:

#include <sha1dc/sha1.h>

But your patch is just doing:

#include <sha1.h>

At best this seems like a trivial bug and at worst us encoding some
Suse-specific packaging convention in git, since other distros would
presumably want to package this in /usr/include/sha1dc/sha1.h as
upstream suggests. I.e. using the ambiguous sha1.h name is not
something upstream's doing by default, it's something you're doing in
your package.

>> Signed-off-by: Takashi Iwai 
>> ---
>>  Makefile | 12 
>>  hash.h   |  4 +++-
>>  sha1dc_git_ext.c | 11 +++
>>  sha1dc_git_ext.h | 25 +
>>  4 files changed, 51 insertions(+), 1 deletion(-)
>>  create mode 100644 sha1dc_git_ext.c
>>  create mode 100644 sha1dc_git_ext.h
>>
>> diff --git a/Makefile b/Makefile
>> index 461c845d33cb..f1a262d56254 100644
>> --- a/Makefile
>> +++ b/Makefile
>> @@ -162,6 +162,12 @@ all::
>>  # algorithm. This is slower, but may detect attempted collision attacks.
>>  # Takes priority over other *_SHA1 knobs.
>>  #
>> +# Define DC_SHA1_EXTERNAL in addition to DC_SHA1 if you want to build / link
>> +# git with the external sha1collisiondetection library.
>> 

Re: [PATCH] hash: Allow building with the external sha1dc library

2017-07-28 Thread Ævar Arnfjörð Bjarmason
On Tue, Jul 25, 2017 at 7:57 AM, Takashi Iwai  wrote:
> Some distros provide SHA1 collision detect code as a shared library.
> It's the very same code as we have in git tree, and git can link with
> it as well; at least, it may make maintenance easier, according to our
> security guys.
>
> This patch allows user to build git linking with the external sha1dc
> library instead of the built-in sha1dc code.  User needs to define
> DC_SHA1_EXTERNAL explicitly.  As default, the built-in sha1dc code is
> used like before.

This whole thing sounds sensible. I reviewed this (but like Junio
haven't tested it with a lib) and I think it would be worth noting the
following in the commit message / Makefile documentation:

* The "sha1detectcoll" *.so name for the "sha1collisiondetection"
library is not something you (or SUSE, presumably) made up; it's a name
that sha1collisiondetection.git itself creates for its library. I think
the Makefile docs you've added here are a bit confusing: you talk
about the "external sha1collisiondetection library" but then link
against "sha1detectcoll". It's worth calling out this difference in the
docs IMO, i.e. talk about the sha1detectcoll.so library form of
sha1collisiondetection, not the sha1collisiondetection project name as
a library.

* It might be worth noting that this is *not* linking against the same
code we ship ourselves due to the difference in defining
SHA1DC_INIT_SAFE_HASH_DEFAULT for the git project's needs in the one
we build, hence your need to have a git_SHA1DCInit() wrapper whereas
we call SHA1DCInit() directly. It might be interesting to note that
the library version will always be *slightly* slower (although the
difference will be trivial).

* Nothing in your commit message or docs explains why DC_SHA1_LINK is
needed. We don't have these sorts of variables for other external
libraries we link to, why the difference?

Some other things I observed:

* We now have much of the same header code copy/pasted between
sha1dc_git.h and sha1dc_git_ext.h, did you consider just always
including the former but making what it's doing conditional on
DC_SHA1_EXTERNAL? I don't know if it would be worth it from a cursory
glance, but again your commit message doesn't list that among options
considered & discarded.

* I think it makes sense to spew out a "not both!" error in the
Makefile if you set DC_SHA1_EXTERNAL=Y and DC_SHA1_SUBMODULE=Y. See my
94da9193a6 ("grep: add support for PCRE v2", 2017-06-01) for an
example of how to do this.

* The whole business of "#include <sha1.h>" looks very fragile; are
there really no other packages in e.g. suse that ship a sha1.h? Debian
has libmd-dev that ships /usr/include/sha1.h, which conflicts with this:
https://packages.debian.org/search?searchon=contents&keywords=sha1.h&mode=exactfilename&suite=unstable&arch=any

Shipping a sha1.h as opposed to a sha1collisiondetection.h or
sha1detectcoll.h or whatever seems like a *really* bad decision by
upstream that should be the subject of at least seeing if they'll take
a pull request to fix it before you package it or before we include
something that'll probably need to be fixed / worked around anyway in
Git.

> Signed-off-by: Takashi Iwai 
> ---
>  Makefile | 12 
>  hash.h   |  4 +++-
>  sha1dc_git_ext.c | 11 +++
>  sha1dc_git_ext.h | 25 +
>  4 files changed, 51 insertions(+), 1 deletion(-)
>  create mode 100644 sha1dc_git_ext.c
>  create mode 100644 sha1dc_git_ext.h
>
> diff --git a/Makefile b/Makefile
> index 461c845d33cb..f1a262d56254 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -162,6 +162,12 @@ all::
>  # algorithm. This is slower, but may detect attempted collision attacks.
>  # Takes priority over other *_SHA1 knobs.
>  #
> +# Define DC_SHA1_EXTERNAL in addition to DC_SHA1 if you want to build / link
> +# git with the external sha1collisiondetection library.
> +# Without this option (the default), git is built with its own sha1dc code.
> +# If any extra linker options are required, define them in the
> +# DC_SHA1_LINK variable in addition.
> +#
>  # Define DC_SHA1_SUBMODULE in addition to DC_SHA1 to use the
>  # sha1collisiondetection shipped as a submodule instead of the
>  # non-submodule copy in sha1dc/. This is an experimental option used
> @@ -1472,6 +1478,11 @@ ifdef APPLE_COMMON_CRYPTO
> BASIC_CFLAGS += -DSHA1_APPLE
>  else
> DC_SHA1 := YesPlease
> +ifdef DC_SHA1_EXTERNAL
> +   LIB_OBJS += sha1dc_git_ext.o
> +   BASIC_CFLAGS += -DSHA1_DC -DDC_SHA1_EXTERNAL
> +   EXTLIBS += $(DC_SHA1_LINK) -lsha1detectcoll
> +else
>  ifdef DC_SHA1_SUBMODULE
> LIB_OBJS += sha1collisiondetection/lib/sha1.o
> LIB_OBJS += sha1collisiondetection/lib/ubc_check.o
> @@ -1492,6 +1503,7 @@ endif
>  endif
>  endif
>  endif
> +endif
>
>  ifdef SHA1_MAX_BLOCK_SIZE
> LIB_OBJS += compat/sha1-chunked.o
> diff --git a/hash.h b/hash.h
> index bef3e630a093..dce327d58d07 100644
> --- a/hash.h
> +++ b/hash.h
> 

Re: requesting permission to use some Git for Windows code

2017-07-28 Thread Marius Storm-Olsen

Ditto, on any of my portions, if any left :)

--
.marius

On 7/28/2017 00:31, Brandon Casey wrote:

Hi Philippe,

Please feel free to use my portions of the mentioned works under the GPLv3.

-Brandon

On Tue, Jul 25, 2017 at 6:53 AM, Johannes Schindelin
 wrote:

Hi Philippe,

I am not quite certain whether I have replied to this earlier or not.
Under the assumption that I did not, I'll send this mail; Cc:ed to the
mailing lists as discussed privately.

On Fri, 23 Jun 2017, Philippe Joyez wrote:


This message is to request the permission to use code chunks from Git
for Windows in GNU TeXmacs , to which I contribute.
The main developer of TeXmacs is Joris van der Hoeven (in cc).

Context:

Just like Git, TeXmacs originated on *nix platforms and was subsequently
ported to windows using MinGW. Naturally, some issues we have in that
port are the very same Git for Windows has faced.

One specific problem you have solved and that TeXmacs still hasn't, is
dealing with unicode filenames. By taking relevant pieces of code in Git
for windows, I could easily come up with a patch that enables TeXmacs to
handle unicode filenames in windows.

Now, the problem is that Git code is GPL V2, while TeXmacs is GPL V3:
Incorporating my patch in TeXmacs' trunk would be a violation of GPL
V2... /unless/ we are granted the permission to do so by the authors of
the code. This is precisely the reason for this message.

It is great that you can make use of the code!

As to the licensing problem, I agree it is a hassle. The biggest obstacle
is that you have to have the consent of all the authors.

You hereby have mine.


The chunks of code we would like to reuse are from these Git for Windows
files:
git-compat-util.h

This file is quite large, maybe you can cut down on the authors to contact
by restricting the `git annotate`/`git log`/`git shortlog` calls to
specific parts, using the `-L ,` option?


ctype.c

$ git shortlog -nse ctype.c
  5  Junio C Hamano 
  4  René Scharfe 
  2  Nguyễn Thái Ngọc Duy 
  1  Ben Walton 
  1  Brandon Casey 
  1  Gary V. Vaughan 
  1  Linus Torvalds 
  1  Namhyung Kim 

I *think* Ben Walton's change (189c860c9ec (kwset: use unsigned char to
store values with high-bit set, 2015-03-02)) is not copyright-able, as it
only changes the type from signed to unsigned. But I am not a lawyer ;-)

Likewise, Namhyung Kim's change (1a191a22959 (ctype.c only wants
git-compat-util.h, 2012-02-10)) only changes which header is included.
That seems to be a too-obvious/too-trivial change to me.

Also, it looks as if removing a comma as was done in 4b05548fc05 (enums:
omit trailing comma for portability, 2010-05-14) by Gary V. Vaughan would
not merit any copyright.

If in doubt, you could simply take the version of ctype.c with those
changes reverted as basis of your work.

You still have to get the consent of Junio, René, Duy, Brandon and Linus
to relicense the file's contents.


compat/mingw.c

I count 35 authors other than myself for that file... Maybe you can narrow
down what you need?


mingw.h

Still 29 authors other than me...


win32.h

This is more manageable, as it only saw three authors. But then, you could
simply reimplement the functionality, it's just two functions, and I do
not think that get_file_attr() is implemented in the best way: we have a
function called err_win_to_posix() in compat/mingw.c which is much more
complete.

Having said that, err_win_to_posix() is still not implemented in the best
way. The best way is to abuse Windows' own (undocumented) _dosmaperr()
function along with the information in the header files winerror.h and
errno.h to generate the mapping. Those two files, as per mingw-w64's
headers, have the very nice preamble:

 /**
  * This file has no copyright assigned and is placed in the Public 
Domain.
  * This file is part of the mingw-w64 runtime package.
  * No warranty is given; refer to the file DISCLAIMER.PD within this
  * package.
  */

Therefore, the result has no copyright assigned and is placed in the
Public Domain and we can do the very same, too.

As I wanted to have a Windows error -> errno mapping that I could
relicense as I see fit, anyway, I took this as an excellent opportunity to
generate exactly that.
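The shape of such a mapping can be sketched as follows (Python used purely for illustration; the codes shown are a handful of well-known winerror.h values, and the EINVAL fallback is an assumption of this sketch, not necessarily what the generated header or err_win_to_posix() uses):

```python
import errno

# A few well-known Windows error codes (winerror.h values) and the
# POSIX errno each should map to. A generated table covers hundreds.
WIN_TO_ERRNO = {
    2:   errno.ENOENT,   # ERROR_FILE_NOT_FOUND
    3:   errno.ENOENT,   # ERROR_PATH_NOT_FOUND
    5:   errno.EACCES,   # ERROR_ACCESS_DENIED
    8:   errno.ENOMEM,   # ERROR_NOT_ENOUGH_MEMORY
    183: errno.EEXIST,   # ERROR_ALREADY_EXISTS
}

def win_err_to_posix(code, default=errno.EINVAL):
    # Fall back to a generic error for codes the table doesn't know.
    return WIN_TO_ERRNO.get(code, default)

print(errno.errorcode[win_err_to_posix(2)])  # ENOENT
```

The point of generating the table from winerror.h and errno.h is exactly to avoid maintaining such a dictionary by hand.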

Please find the header attached. Here is how I generated that header file:

-- snip --
cat >/tmp/generrmap.c <
#include 

static void map_code(unsigned long code, const char *id);

int _main(int argc, char **argv)
{
 printf("/* This file has no copyright assigned and is placed in the "
 "Public Domain. */\\n"
 "\\n"
 "#ifndef WINERR2ERRNO_H\\n"
 "#define WINERR2ERRNO_H\\n"
  

Re: [RFC PATCH 0/4] Some patches for fsck for missing objects

2017-07-28 Thread Ben Peart



On 7/27/2017 1:25 PM, Jonathan Tan wrote:

On Wed, 26 Jul 2017 23:42:38 +
"brian m. carlson"  wrote:


I looked at this and I like the direction it's going.  It's pretty
simple and straightforward, which I also like.




ditto, simple is good!


Thanks.


What I'd recommend is that instead of making lazyObject a string, we
make it an integer representing a protocol version.  We then add a
different config setting that is the actual program to invoke, using the
given protocol version.  This lets us change the way we speak to the
tool without breaking backwards compatibility, and it also allows us to
simply check the lazyObject script for supported protocols up front.


That's possible too. As for version negotiation, I think we'll end up
using a protocol similar to the clean/smudge long-running process
protocol (as documented as gitattributes), so that does not need to be
taken care of here, but making lazyObject be the version integer is fine
too.



I was also thinking the way to retrieve the missing objects could be a
versioned negotiation via the long-running process protocol.


That said, I'm a fan of including versions to make our life easier later 
if we decide to adopt an entirely different model for retrieving the 
missing objects.


Re: [RFC PATCH 2/4] fsck: support refs pointing to lazy objects

2017-07-28 Thread Ben Peart



On 7/27/2017 7:50 PM, Jonathan Tan wrote:

On Thu, 27 Jul 2017 11:59:47 -0700
Junio C Hamano  wrote:


@@ -438,6 +438,14 @@ static int fsck_handle_ref(const char *refname, const struct object_id *oid,
  
  	obj = parse_object(oid);

if (!obj) {
+   if (repository_format_lazy_object) {
+   /*
+* Increment default_refs anyway, because this is a
+* valid ref.
+*/
+   default_refs++;
+   return 0;
+   }
error("%s: invalid sha1 pointer %s", refname, oid_to_hex(oid));
errors_found |= ERROR_REACHABLE;


At this point, do we know (or can we tell) if this is a missing
object or a file exists as a loose object but is corrupt?  If we
could, it would be nice to do this only for the former to avoid
sweeping a real corruption that is unrelated to the lazy fetch under
the rug.


Before all this is run, there is a check over all loose and packed
objects and I've verified that this check reports failure in
corrupt-object situations (see test below).

It is true that parse_object() cannot report the difference, but since
fsck has already verified non-corruptness, I don't think it needs to
know the difference at this point.


+test_expect_success 'fsck fails on lazy object pointed to by ref' '
+   rm -rf repo &&
+   test_create_repo repo &&
+   test_commit -C repo 1 &&
+
+   A=$(git -C repo commit-tree -m a HEAD^{tree}) &&
+
+   # Reference $A only from ref, and delete it
+   git -C repo branch mybranch "$A" &&
+   delete_object repo "$A" &&
+
+   test_must_fail git -C repo fsck
+'


And a new test that uses a helper different from delete_object
(perhaps call it corrupt_object?) can be used to make sure that we
complain in that case here.


I agree that object corruption can cause this specific part of the
production code to falsely work. But I think that this specific part of
the code can and should rely on object corruption being checked
elsewhere. (I usually don't like to assume that other components work
and will continue to work, but in this case, I think that fsck checking
for object corruption is very foundational and should be relied upon.)

But if we think that defense "in depth" is a good idea, I have no
problem adding such tests (like the one below).



It's good to know that object corruption is already being checked 
elsewhere in fsck.  I agree that you should be able to rely on that as 
long as it is guaranteed to run.  I'd rather see a single location in 
the code that holds the logic to detect corrupted objects rather than 
two copies (or worse, two different versions).


Are there also tests for validating the object corruption detection 
code?  If not, adding some (like below) would be great.




---
delete_object () {
rm $1/.git/objects/$(echo $2 | cut -c1-2)/$(echo $2 | cut -c3-40)
}

corrupt_object () {
chmod a+w $1/.git/objects/$(echo $2 | cut -c1-2)/$(echo $2 | cut -c3-40) &&
echo CORRUPT >$1/.git/objects/$(echo $2 | cut -c1-2)/$(echo $2 | cut -c3-40)
}

setup_object_in_reflog () {
rm -rf repo &&
test_create_repo repo &&
test_commit -C repo 1 &&

A=$(git -C repo commit-tree -m a HEAD^{tree}) &&
B=$(git -C repo commit-tree -m b HEAD^{tree}) &&

# Reference $A only from reflog
git -C repo branch mybranch "$A" &&
git -C repo branch -f mybranch "$B"
}

test_expect_success 'lazy object in reflog' '
setup_object_in_reflog &&
delete_object repo "$A" &&
test_must_fail git -C repo fsck &&
git -C repo config core.repositoryformatversion 1 &&
git -C repo config extensions.lazyobject "arbitrary string" &&
git -C repo fsck
'

test_expect_success 'corrupt loose object in reflog' '
setup_object_in_reflog &&
corrupt_object repo "$A" &&
test_must_fail git -C repo fsck &&
git -C repo config core.repositoryformatversion 1 &&
git -C repo config extensions.lazyobject "arbitrary string" &&
test_must_fail git -C repo fsck
'

test_expect_success 'missing packed object in reflog' '
setup_object_in_reflog &&
git -C repo repack -a &&
delete_object repo "$A" &&
chmod a+w repo/.git/objects/pack/*.pack &&
echo CORRUPT >"$(echo repo/.git/objects/pack/*.pack)" &&
test_must_fail git -C repo fsck &&
git -C repo config core.repositoryformatversion 1 &&
git -C repo config extensions.lazyobject "arbitrary string" &&
test_must_fail git -C repo fsck
'



Re: [RFC PATCH 1/4] environment, fsck: introduce lazyobject extension

2017-07-28 Thread Ben Peart



On 7/27/2017 2:55 PM, Junio C Hamano wrote:

Jonathan Tan  writes:


Currently, Git does not support repos with very large numbers of objects
or repos that wish to minimize manipulation of certain blobs (for
example, because they are very large) very well, even if the user
operates mostly on part of the repo, because Git is designed on the
assumption that every referenced object is available somewhere in the
repo storage.



I very much like the direction this is taking.  Handling missing objects 
gracefully is an important part of the overall partial clone support.



Introduce a new repository extension option "extensions.lazyobject", of
data type string. If this is set in a repository, Git will tolerate
objects being missing in that repository.  When Git needs those objects,
it will invoke the command in that option.




I'm very glad you are doing this.  Having never used precious objects I 
was unaware of the extensions concept but it looks like a great way to 
deal with this repo specific option.



My reading hiccupped after the first sentence, as the problem
description made it sound like this was a boolean ("are we using
lazy object feature?"), after reading "data type string".  And then
"the command in that option" made me hiccup one more time, as it did
not "click" that "in that option" was trying to say that the string
is used as the command name (or is it a whole command line?  The
leading part of the command line to which some arguments are
appended before it gets invoked as a command? or what?).

Logically, I think it is more like

  - extensions.lazyobject can be set to tell Git to consider missing
objects in certain cases are not errors;

  - the value of extensions.lazyobject variable must be a string,
which is used to name the command to lazily make the object
"appear" in the repository on demand.


Teach fsck about the new state of affairs. In this commit, teach fsck
that missing objects referenced from the reflog are not an error case;
in future commits, fsck will be taught about other cases.


In any case, sounds like a small and good first step.



I agree completely with the feedback on the description.  That is quite 
the run-on sentence. :)



+
+`lazyObject`
+~~~~~~~~~~~~
+
+When the config key `extensions.lazyObject` is set to a string, Git
+tolerates objects being missing in the repository. This string contains
+the command to be run whenever a missing object is needed.


This has the same issue, but to a lesser degree, as there is "string
contains".  What the command will do (e.g. "makes the object
magically appear in the object store" or "emits the contents of the
object to its standard output, so that Git can hash it to make it
appear in the object store"), and how it is used (e.g. "this is a
single command name and nothing else", "this is a leading part of
command line and arguments are appended at the end, before the whole
thing is passed to the shell to be executed", etc.) will need to be
clarified in the final version of the series (not necessarily at
this step---the series can elaborate in the later patches).


diff --git a/builtin/fsck.c b/builtin/fsck.c
index fb0947009..1cfb8d98c 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -402,7 +402,7 @@ static void fsck_handle_reflog_oid(const char *refname, struct object_id *oid,
 			xstrfmt("%s@{%"PRItime"}", refname, timestamp));
obj->flags |= USED;
mark_object_reachable(obj);
-   } else {
+   } else if (!repository_format_lazy_object) {
error("%s: invalid reflog entry %s", refname, oid_to_hex(oid));
errors_found |= ERROR_REACHABLE;
}
diff --git a/cache.h b/cache.h
index 6c8242340..9e6bc0a21 100644
--- a/cache.h
+++ b/cache.h
@@ -853,10 +853,12 @@ extern int grafts_replace_parents;
  #define GIT_REPO_VERSION 0
  #define GIT_REPO_VERSION_READ 1
  extern int repository_format_precious_objects;
+extern char *repository_format_lazy_object;


This is not a new problem, but I think these two should be
called repository_extension_$NAME not repository_format_$NAME.


diff --git a/t/t0410-lazy-object.sh b/t/t0410-lazy-object.sh
new file mode 100755
index 0..36442531f
--- /dev/null
+++ b/t/t0410-lazy-object.sh
@@ -0,0 +1,32 @@
+#!/bin/sh
+
+test_description='lazy object'
+
+. ./test-lib.sh
+
+delete_object () {
+   rm $1/.git/objects/$(echo $2 | cut -c1-2)/$(echo $2 | cut -c3-40)
+}
+
+test_expect_success 'fsck fails on lazy object in reflog' '
+   test_create_repo repo &&
+   test_commit -C repo 1 &&
+
+   A=$(git -C repo commit-tree -m a HEAD^{tree}) &&
+   B=$(git -C repo commit-tree -m b HEAD^{tree}) &&
+
+   # Reference $A only from reflog, and delete it
+   git -C repo branch mybranch "$A" &&
+   git -C repo branch -f mybranch "$B" &&
+   delete_object repo "$A" &&



Re: [PATCH] packed_ref_store: handle a packed-refs file that is a symlink

2017-07-28 Thread Michael Haggerty
On Thu, Jul 27, 2017 at 12:40 PM, Junio C Hamano  wrote:
> Jeff King  writes:
>
>> On Thu, Jul 27, 2017 at 10:19:48AM -0700, Junio C Hamano wrote:
>>
>>> Makes sense.  Makes me wonder why we need a separate .new file
>>> (instead of writing into the .lock instead), but that is a different
>>> issue.
>>
>> It comes from 42dfa7ece (commit_packed_refs(): use a staging file
>> separate from the lockfile, 2017-06-23). That commit explains that we
>> want to be able to put the new contents into service before we release
>> the lock. But it doesn't say why that's useful.
>
> By being able to hold the lock on packed-refs longer, I guess
> something like this becomes possible:
>
>  * hold the lock on packed-refs
>  * hold the lock on loose ref A, B, C, ...
>  * update packed-refs to include the freshest values of these refs
>  * start serving packed-refs without releasing the lock
>  * for X in A, B, C...: delete the loose ref X and unlock X
>  * unlock the packed-refs

Hmmm, I thought that I explained somewhere why this is useful, but I
can't find it now.

The code even after this patch series is

Prepare:
1. lock loose refs
Finish:
2. perform creates and updates of loose refs, releasing their locks as we go
3. perform deletes of loose refs
4. lock packed-refs file
5. repack_without_refs() (delete packed versions of refs to be deleted)
6. commit new packed-refs file (which also unlocks it)
7. release locks on loose refs that were deleted

This is problematic because after a loose reference is deleted, it
briefly (between steps 3 and 5) reveals any packed version of the
reference, which might even point at a commit that has been GCed. It
is even worse if the attempt to acquire the packed refs lock in step 4
fails, because then the repo could be left in that broken state.
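The exposure window can be seen with a toy model of git's lookup rule (a loose ref shadows its packed counterpart); the dict-based model below is purely illustrative, not git's code:

```python
# Toy model of ref resolution: a loose ref shadows the packed one.
loose = {"refs/heads/topic": "new-value"}
packed = {"refs/heads/topic": "stale-value"}

def resolve(ref):
    # git checks the loose ref first, then falls back to packed-refs.
    if ref in loose:
        return loose[ref]
    return packed.get(ref)

# Deletion order in the pre-patch code: loose first (step 3),
# packed-refs rewritten only later (step 5).
del loose["refs/heads/topic"]
seen_in_window = resolve("refs/heads/topic")  # a concurrent reader here
del packed["refs/heads/topic"]

print(seen_in_window)  # "stale-value": the deleted ref briefly resurfaced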

Clearly we can avoid the second problem by acquiring the packed-refs
lock during the prepare phase. You might also attempt to fix the first
problem by deleting the references from the packed-refs file before we
delete the corresponding loose references:

Prepare:
1. lock loose refs
2. lock packed-refs file
Finish:
3. perform creates and updates of loose refs, releasing their locks as we go
4. repack_without_refs() (delete packed versions of refs to be deleted)
5. commit new packed-refs file (which also unlocks it)
6. perform deletes of loose refs
7. release locks on loose refs that were deleted

But now we have a different problem. pack-refs does the following:

A. lock packed-refs file
B. read loose refs
C. commit new packed-refs file (which also unlocks it)
Then, for each loose reference that should be pruned, delete it in a
transaction with REF_ISPRUNING, which means:
D. lock the loose reference
E. lock packed-refs file
F. check that the reference's value is the same as the value just
written to packed-refs
G. delete the loose ref file if it still exists
H. unlock the loose reference

If a reference deletion is running at the same time as a pack-refs,
you could have a race where steps A and B of a pack-refs run between
steps 5 and 6 of a reference deletion. The pack-refs would see the
loose reference and would pack it into the packed-refs file, leaving
the reference at its old value even though the user was told that the
reference was deleted.

If a new packed-refs file can be activated while retaining the lock on
the file, then a reference update could keep the packed-refs file
locked during step 6 of the second algorithm, which would block step A
of pack-refs from beginning.

I have the feeling that there might be other, more improbable races
lurking here, too.

The eternal problems with races in the files backend is a main reason
that I'm enthusiastic about the reftable proposal.

Michael