Low interest rate loan offer

2018-03-09 Thread CREDIT SUISSE


Dear Sir or Madam,

My name is Catherine Graf of CREDIT SUISSE. If you are looking for a
personal loan, or financing for investments, project development
financing, business expansion, or debt financing, then this is the
right place to get money. The interest rate is 1%. Should you be
interested in one of our financial offerings, please fill out the
following form.

First and last name:
Desired loan amount:
Sex:
Period:

Kind regards,
Katharina Graf
Director
E-Mail: creditsuisselo...@outlook.com


Git Merge contributor summit notes

2018-03-09 Thread Alex Vandiver
It was great to meet some of you in person!  Some notes from the
Contributor Summit at Git Merge are below.  Taken in haste, so
my apologies if there are any mis-statements.

 - Alex




  "Does anyone think there's a compelling reason for git to exist?"
- peff


Partial clone (Jeff Hostetler / Jonathan Tan)
---------------------------------------------
 - Request that the server not send everything
 - Motivated by getting Windows into git
 - Also by not having to fetch large blobs that are in-tree
 - Allows client to request a clone that excludes some set of objects, with 
incomplete packfiles
 - Decoration on objects that include promise for later on-demand backfill
 - In `master`, have a way of:
   - omitting all blobs
   - omitting large blobs
   - sparse checkout specification stored on server
 - Hook in read_object to fetch objects in bulk
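
   A rough per-object sketch of the shape of that hook (the real aim is
   bulk prefetch; fetch_object() and repository_format_partial_clone
   are names from the lazy-fetch series referenced later in this digest
   and may differ):

       static void *read_object_with_backfill(const unsigned char *sha1,
                                              enum object_type *type,
                                              unsigned long *size)
       {
               void *data = read_object(sha1, type, size);

               if (!data && repository_format_partial_clone) {
                       /* ask the promisor remote to backfill, then retry */
                       fetch_object(repository_format_partial_clone, sha1);
                       data = read_object(sha1, type, size);
               }
               return data;
       }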

 - Future work:
   - A way to fetch blobsizes for virtual checkouts
   - Give me new blobs that this tree references relative to now
   - Omit some subset of trees
   - Modify other commits to exclude omitted blobs
   - Protocol v2 may have better verbs for sparse specification, etc

Questions:
 - Reference server implementation?
   - In git itself
   - VSTS does not support
 - What happens if a commit becomes unreachable?  Does promise still apply?
   - Probably yes?
   - If the promise is broken, probably crashes
   - Can differentiate between promise that was made, and one that wasn't
   => Demanding commitment from server to never GC seems like a strong promise
 - Interactions with external object db
   - promises include bulk fetches, as opposed to external db, which is 
one-at-a-time
   - dry-run semantics to determine which objects will be needed
   - very important for small objects, like commits/trees (which is not in 
`master`, only blobs)
   - perhaps for protocol V2
 - server has to promise more, requires some level of online operation
   - annotate that only some refs are forever?
   - requires enabling the "fetch any SHA" flags
   - rebasing might require now-missing objects?
 - No, to build on them you must have fetched them
 - Well, building on someone else's work may mean you don't have all of them
   - server is less aggressive about GC'ing by keeping "weak references" when 
there are promises?
   - hosting requires that you be able to forcibly remove information
 - being able to know where a reference came from?
   - as being able to know why an object was needed, for more advanced logic
 - Does `git grep` attempt to fetch blobs that are deferred?
   - will always attempt to fetch
   - one fetch per object, even!
   - might not be true for sparse checkouts
   - Maybe limit to skipping "binary files"?
   - Currently sparse checkout grep "works" because grep defaults to looking at 
the index, not the commit
   - Does the above behavior for grepping revisions
   - Don't yet have a flag to exclude grep on non-fetched objects
   - Should `git grep -L` die if it can't fetch the file?
   - Need a config option for "should we die, or try to move on"?
 - What's the endgame?  Only a few codepaths that are aware, or threaded 
through everywhere?
   - Fallback to fetch on demand means there's an almost-reasonable fallback
   - Better prediction with bulk fetching
   - Are most commands going to _need_ to be sensitive to it?
   - GVFS has a caching server in the building
   - A few git commands have been disabled (see recent mail from Stolee); those 
are likely candidates for code that needs to be aware of de-hydrated objects
 - Is there an API to know what objects are actually local?
   - No external API
   - GVFS has a REST API
 - Some way to later ask about files?
   - "virtualized filesystem"?
   - hook to say "focus on this world of files"
   - GVFS writes out your index currently
 - Will this always require turning off reachability checks?
   - Possibly
 - Shallow clones, instead of partial?
   - Don't download the history, just the objects
   - More of a protocol V2 property
   - Having all of the trees/commits make this reasonable
 - GVFS vs this?
   - GVFS was a first pass
   - Now trying to mainstream productize that
   - Goal is to remove features from GVFS and replace with this

Protocol V2 (Brandon)
---------------------
 - Main problem is that forward compatibility negotiation wasn't possible
 - Found a way to sneak in the V2 negotiation via side-channel in all transports
 - "environment variable" GIT_PROTOCOL which server can detect
 - Ability to transmit and ignore, or not transmit, means forward/backward 
compat
 - HTTP header / environment variable
 - ...so now what?
 - Keep as similar as possible, but more layout changes to remove bad 
characteristics
 - Like fixing flush semantics
 - Remove ref advertisement (250M of refs every fetch from Android!)
 - Capabilities are currently in first packet, 1K limit
 - First response is capabilities from the server, 
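
A minimal sketch of that side-channel on the server side (assuming the
"version=2" value; an old server never looks at GIT_PROTOCOL, which is
what makes this forward/backward compatible):

    #include <stdlib.h>
    #include <string.h>

    static int determine_protocol_version(void)
    {
            const char *proto = getenv("GIT_PROTOCOL");

            if (proto && !strcmp(proto, "version=2"))
                    return 2;   /* client asked for v2 */
            return 0;           /* fall back to the original protocol */
    }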

Re: [PATCH/RFC v3 07/12] pack-objects: move in_pack out of struct object_entry

2018-03-09 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy   writes:

> Instead of using 8 bytes (on 64 bit arch) to store a pointer to a
> pack, use an index instead, since the number of packs should be
> relatively small.
>
> This limits the number of packs we can handle to 16k. For now if you hit
> 16k pack files limit, pack-objects will simply fail [1].
>
> This technically saves 7 bytes. But we don't see any of that in
> practice due to padding. The saving becomes real when we pack this
> struct tighter later.

Somehow 7 and 16k do not add up.

We use 8 bytes in the original code, and a solution that potentially
saves 7 bytes would use only 1 byte instead of the original 8, which
would allow us to index/identify 1<<8 == 256 packs, but for some reason
we can handle up to 16k.
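
The two numbers can both be right if the index is a bitfield rather
than a standalone byte.  A sketch of one reading (OE_IN_PACK_BITS = 14
and the field name are assumptions here, not taken from the patch):
2^14 = 16384 packs, while the 14 bits are packed next to existing
bitfields, so the pointer's 8 bytes shrink to roughly 1 byte of
bitfield space, hence "saves 7 bytes".

    #define OE_IN_PACK_BITS 14  /* assumed: 1 << 14 == 16k packs */

    struct object_entry_sketch {
            /* ... other fields ... */
            unsigned in_pack_idx : OE_IN_PACK_BITS; /* replaces struct packed_git * */
    };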

> [1] The escape hatch is .keep file to limit the non-kept pack files
> below 16k limit. Then you can go for another pack-objects run to
> combine another 16k pack files. Repeat until you're satisfied.

;-)

> +static inline unsigned int oe_add_pack(struct packing_data *pack,
> +                                       struct packed_git *p)
> +{
> +        if (pack->in_pack_count >= (1 << OE_IN_PACK_BITS))
> +                die(_("too many packs to handle in one go. "
> +                      "Please add .keep files to exclude\n"
> +                      "some pack files and keep the number "
> +                      "of non-kept files below %d."),
> +                    1 << OE_IN_PACK_BITS);

OK.


Re: [PATCH/RFC v3 04/12] pack-objects: use bitfield for object_entry::depth

2018-03-09 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy   writes:

> + Maximum value is 4095.
>   ...
> + if (depth > (1 << OE_DEPTH_BITS))
> + die(_("delta chain depth %d is greater than maximum limit %d"),
> + depth, (1 << OE_DEPTH_BITS));
> +

Do I see an off-by-one here?  Ie.

if ((1 << OE_DEPTH_BITS) <= depth)
die("depth that is 4096 or deeper is too much");



Re: Is context/directory sensitive config possible or even a good idea?

2018-03-09 Thread Junio C Hamano
Eddy Petrișor  writes:

> When I saw the include.path feature ...

Do you mean includeIf.<condition>.path?
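
For reference, the condition in that feature is a gitdir pattern, so
the use case described further down in this digest could look roughly
like this in ~/.gitconfig (paths are illustrative):

    [includeIf "gitdir:~/src/android/"]
            path = ~/src/android/.gitconfig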


Re: [PATCH/RFC v3 02/12] pack-objects: turn type and in_pack_type to bitfields

2018-03-09 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy   writes:

> @@ -1570,7 +1576,7 @@ static void drop_reused_delta(struct object_entry *entry)
>   entry->depth = 0;
>  
>   oi.sizep = &entry->size;
> - oi.typep = &entry->type;
> + oi.typep = &type;
>   if (packed_object_info(entry->in_pack, entry->in_pack_offset, &oi) < 0) {
>   /*
>* We failed to get the info from this pack for some reason;
> @@ -1580,6 +1586,10 @@ static void drop_reused_delta(struct object_entry *entry)
>*/
>   entry->type = sha1_object_info(entry->idx.oid.hash,
>  &entry->size);

The comment immediately before this pre-context reads as such:

/*
 * We failed to get the info from this pack for some reason;
 * fall back to sha1_object_info, which may find another copy.
 * And if that fails, the error will be recorded in entry->type
 * and dealt with in prepare_pack().
 */

The rest of the code relies on the ability of entry->type to record
the error by storing an invalid (negative) type; otherwise, it cannot
detect an error where (1) the entry in _this_ pack was corrupt, and
(2) we wished to find another copy of the object elsewhere (which
would overwrite the negative entry->type we assign here), but we
didn't find any.

How should we propagate the error we found here down the control
flow in this new code?
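
One conceivable way to keep that signal once "type" becomes an
unsigned bitfield (purely a sketch; the field names here are made up,
not taken from the series) is an explicit validity bit:

    struct object_entry_sketch {
            unsigned type : 3;       /* TYPE_BITS; meaningful only when type_valid */
            unsigned type_valid : 1; /* cleared when the object info lookup failed */
    };

    /* callers would then test !entry->type_valid instead of entry->type < 0 */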

> + } else {
> + if (type < 0)
> + die("BUG: invalid type %d", type);
> + entry->type = type;

The BUG() on this side is sensible, as packed_object_info()
shouldn't report success when it stored negative result in *oi.typep
anyway.

>   unsigned char in_pack_header_size;
> + unsigned type:TYPE_BITS;
> + unsigned in_pack_type:TYPE_BITS; /* could be delta */



Re: [PATCH/RFC v3 01/12] pack-objects: a bit of document about struct object_entry

2018-03-09 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy   writes:

> The role of this comment block becomes more important after we shuffle
> fields around to shrink this struct. It will be much harder to see what
> field is related to what. This also documents the holes in this struct
> according to pahole.
>
> A couple of notes on shrinking the struct:
>
> 1) The reader may notice one thing from this document and the shrinking
> business. If "delta" is NULL, all other delta-related fields should be
> irrelevant. We could group all these in a separate struct and replace
> them all with a pointer to this struct (allocated separately).
>
> This does not help much though since 85% of objects are deltified
> (source: linux-2.6.git). The gain is only from non-delta objects, which
> is not that significant.

OK.

> 2) The field in_pack_offset and idx.offset could be merged. But we need
> to be very careful. Up until the very last phase (object writing),
> idx.offset is not used and can hold in_pack_offset. Then idx.offset will
> be updated with _destination pack's_ offset, not source's. But since we
> always write delta's bases first, and we only use in_pack_offset in
> writing phase when we reuse objects, we should be ok?

By separating the processing in strict phases, I do think the result
would be OK, but at the same time, that does smell like an
invitation for future bugs.

> +/*
> + * basic object info
> + * -----------------
> + * idx.oid is filled up before delta searching starts. idx.crc32
> + * is only valid after the object is written down and will be used for
> + * generating the index. idx.offset will be both gradually set and
> + * used in writing phase (base objects get offset first, then deltas
> + * refer to them)

Here, I'd feel that "written out" somehow would sound more natural
than "written down", but that is perhaps because I've seen it used
elsewhere and I am confusing familiarity with naturalness.  In any
case, if we mean "written to the resulting packdata stream", saying
that to be more explicit is probably a good idea.  We compute crc32
and learn the offset for each object as we write them to the result.

> + * If a delta is cached in memory and is compressed, "delta" points to
> + * the data and z_delta_size contains the compressed size. If it's

Isn't it "delta_data" (as opposed to "delta") that points at the cached delta
data?

> + * uncompressed [1], z_delta_size must be zero. delta_size is always
> + * the uncompressed size and must be valid even if the delta is not
> + * cached. Delta recreation technically only depends on "delta"
> + * pointer, but delta_size is still used to verify it's the same as
> + * before.
> + *
> + * [1] during try_delta phase we don't bother with compressing because
> + * the delta could be quickly replaced with a better one.
> + */
>  struct object_entry {
>   struct pack_idx_entry idx;
>   unsigned long size; /* uncompressed size */
> @@ -28,6 +74,7 @@ struct object_entry {
>   unsigned tagged:1; /* near the very tip of refs */
>   unsigned filled:1; /* assigned write-order */
>  
> + /* XXX 28 bits hole, try to pack */
>   /*
>* State flags for depth-first search used for analyzing delta cycles.
>*
> @@ -40,6 +87,7 @@ struct object_entry {
>   DFS_DONE
>   } dfs_state;
>   int depth;
> + /* size: 136, padding: 4 */
>  };
>  
>  struct packing_data {


Is context/directory sensitive config possible or even a good idea?

2018-03-09 Thread Eddy Petrișor
Hello,

In some situations one person could be involved in various
multi-repository projects and need to use different user.email
information, depending on the project.

A simple example can be using the project specific email identity for
all repositories part of Android, while using the personal email for
personal projects on github.

The developer has currently only 2 options, AFAICT:
a) use --global identity for the personal email and set the Android
identity in all project repos
b) use the reverse, store Android identity in --global gitconfig, and
make sure to set the identity at repo level for personal projects.

Both solutions are suboptimal, especially if the project in question
has many repositories and the number of github projects is high,
because the user must remember to set the non-global user identity
either in all github repos or in each Android repo, after checkout.

In my view, there should be a middle ground, having a
context/directory dependent setting could be a solution.

My idea is something like "for all repos cloned in subdirs of
directory ~/src/android, use identity johm@android.org", so the
only thing to remember to make this work is to make sure all android
checkouts are done under ~/src/android, which most people already do
because they need a clean home dir.

When I saw the include.path feature I thought this was exactly that,
but when testing it did not work as I would have liked; the included
config is simply added to the global config if I add this to the
~/.gitconfig file and the target file exists.

I understand the use case I am thinking of is not the one that was
addressed by the include.path feature, so I am thinking git would need
some other setting like
'contextsensitive.include=/home/johndoe/src/android/.gitconfig' which
could mean: include the settings from this config if we're in a dir at
the same level as the file, or in a directory which has that dir as
parent or ancestor.


What I see already:
1) it would be a performance hit if the test for "do we need to apply
any context sensitive setting" must be done on every git command; for
the identity usecase it might be less of a problem if only the
subcommands that need user.* compared the prefix of the current dir
with all contextsensitive.include settings, applying only the longest
match
2) what if the contextsensitive.include file also includes another
contextsensitive.include, could that lead to ambiguous situations?
3) having the feature could allow the user to set other project
specific settings. Nice usecases: if Android uses --no-ff
(merge.ff=false) in ~/src/android/.gitconfig, contextsensitive.include
would ensure the expected strategy is used, and that rebase is the
preferred behaviour for pull, instead of merge


-- 
Eddy Petrișor


Re: git stash push -u always warns "pathspec '...' did not match any files"

2018-03-09 Thread Junio C Hamano
Marc Strapetz  writes:

> Thanks, I can confirm that the misleading warning message is fixed.
>
> What I've noticed now is that when using -u option, Git won't warn if
> the pathspec is actually not matching a file. Also, an empty stash may
> be created.

S..., does it mean that the patch Thomas posted and you
confirmed trades one issue for another of similar
gravity?  Is this something we want to "fix" without adding
another breakage?



Re: [PATCH v11 10/10] convert: add round trip check based on 'core.checkRoundtripEncoding'

2018-03-09 Thread Eric Sunshine
On Fri, Mar 9, 2018 at 3:22 PM, Junio C Hamano  wrote:
> Eric Sunshine  writes:
>> On Fri, Mar 9, 2018 at 12:35 PM,   wrote:
>>> /* Don't encode to the default encoding */
>>> -   if (!strcasecmp(value, default_encoding))
>>> +   if (is_encoding_utf8(value) && is_encoding_utf8(default_encoding))
>>> return NULL;
>>
>> This change belongs in 6/10, not 10/10, methinks.
>
> It is actually worse than that, no?  When default_encoding is
> (somehow) configured not to be UTF-8, e.g. "Shift_JIS", we used to
> avoid converting from Shift_JIS to Shift_JIS, but the optimization
> no longer happens with this code.
>
> In any case, I think same_encoding() is probably a good thing to use
> here at step 6/10, so the point is moot, I guess.

Agreed, same_encoding() would be superior.


Re: [PATCH v11 10/10] convert: add round trip check based on 'core.checkRoundtripEncoding'

2018-03-09 Thread Junio C Hamano
Eric Sunshine  writes:

> On Fri, Mar 9, 2018 at 12:35 PM,   wrote:
>> [...]
>> Add 'core.checkRoundtripEncoding', which contains a comma separated
>> list of encodings, to define for what encodings Git should check the
>> conversion round trip if they are used in the 'working-tree-encoding'
>> attribute.
>> [...]
>> Signed-off-by: Lars Schneider 
>> ---
>> diff --git a/convert.c b/convert.c
>> @@ -1150,7 +1227,7 @@ static const char *git_path_check_encoding(struct attr_check_item *check)
>> /* Don't encode to the default encoding */
>> -   if (!strcasecmp(value, default_encoding))
>> +   if (is_encoding_utf8(value) && is_encoding_utf8(default_encoding))
>> return NULL;
>
> This change belongs in 6/10, not 10/10, methinks.

It is actually worse than that, no?  When default_encoding is
(somehow) configured not to be UTF-8, e.g. "Shift_JIS", we used to
avoid converting from Shift_JIS to Shift_JIS, but the optimization
no longer happens with this code.

In any case, I think same_encoding() is probably a good thing to use
here at step 6/10, so the point is moot, I guess.


Re: [PATCH v11 10/10] convert: add round trip check based on 'core.checkRoundtripEncoding'

2018-03-09 Thread Eric Sunshine
On Fri, Mar 9, 2018 at 12:35 PM,   wrote:
> [...]
> Add 'core.checkRoundtripEncoding', which contains a comma separated
> list of encodings, to define for what encodings Git should check the
> conversion round trip if they are used in the 'working-tree-encoding'
> attribute.
> [...]
> Signed-off-by: Lars Schneider 
> ---
> diff --git a/convert.c b/convert.c
> @@ -1150,7 +1227,7 @@ static const char *git_path_check_encoding(struct attr_check_item *check)
> /* Don't encode to the default encoding */
> -   if (!strcasecmp(value, default_encoding))
> +   if (is_encoding_utf8(value) && is_encoding_utf8(default_encoding))
> return NULL;

This change belongs in 6/10, not 10/10, methinks.


Re: [PATCH v3] fetch-pack.c: use oidset to check existence of loose object

2018-03-09 Thread Junio C Hamano
Takuto Ikuta  writes:

> In repository having large number of remote refs, because to check

Isn't this "When fetching from a repository with a large number of
refs,"?  The number of refs (whether it is local or remote-tracking)
the local side has has nothing to do with the issue you are
addressing, no?

> existence of each refs in local repository, 'git fetch' ends up doing a
> lot of lstat(2) calls to see if it exists in loose form, which makes it
> slow.

Other than that, the above description reads much better and makes
the result easier to understand.

> This patch enumerates loose objects in hashmap beforehand and uses it to
> check existence instead of using lstat(2) to improve performance of
> fetch-pack for repositories having large number of remote refs compared
> to the number of loose objects.

We'd rather write this paragraph as if giving an order to the
codebase "to be like so", e.g.

Instead of making as many lstat(2) calls as the refs the
remote side advertised to see if these objects exist in the
loose form, first enumerate all the existing loose objects
in hashmap beforehand and use it to check existence of
them...
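
In code, that enumeration amounts to something like the following
sketch (using the existing for_each_loose_object() iterator; the
callback name is illustrative):

    static int add_loose_object_to_set(const struct object_id *oid,
                                       const char *path, void *data)
    {
            oidset_insert(data, oid);  /* remember every loose object once */
            return 0;
    }

    /* ... once, before walking the advertised refs: */
    struct oidset loose_oid_set = OIDSET_INIT;
    for_each_loose_object(add_loose_object_to_set, &loose_oid_set, 0);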

> I took time stat of `git fetch` disabling quickfetch, so that fetch-pack

I still do not know if a benchmark with quickfetch disabled gives
relevant numbers, for reasons I gave earlier.  The relative numbers
between Linux and MacBook look quite convincing, as they illustrate
differences of lstat(2) performance on these platforms.

>   for (ref = *refs; ref; ref = ref->next) {
>   struct object *o;
> + unsigned int flags = OBJECT_INFO_QUICK;
>  
> - if (!has_object_file_with_flags(&ref->old_oid,
> - OBJECT_INFO_QUICK))
> - continue;
> + if (!oidset_contains(&loose_oid_set, &ref->old_oid)) {
> + /* I know this does not exist in the loose form,
> +  * so check if it exists in a non-loose form.
> +  */

/*
 * Our multi-line comment looks like this,
 * with opening slash-asterisk and closing
 * asterisk-slash on their own lines.
 */

Thanks.


Re: [PATCH] fetch-pack.c: use oidset to check existence of loose object

2018-03-09 Thread Junio C Hamano
Junio C Hamano  writes:

>> Yes. But I think the default limit for the number of loose objects, 7000,
>> gives us small overhead when we do enumeration of all objects.
>
> Hmph, I didn't see the code that does the estimation of loose object
> count before starting to enumerate, though.

Another thing the code could do to avoid negative consequences on
projects that look quite different from yours (e.g. the other side
does not have insane number of refs, but there are locally quite a
few loose objects) is to count how many entries are on *refs list
before we decide to enumerate all loose objects.  When the refs list
is relatively shorter than the estimated number of loose objects
(you can actually do the estimation based on sampling, or just rely
on your assumed 7k), it may be a win _not_ to trigger the new code
you are adding to this codepath with this patch.  I would imagine
that the simplest implementation may just count 

for (ref = *refs, count = 0; ref && count++ < LIMIT; ref = ref->next)
;
use_oidset_optim = (LIMIT <= count);

assuming your "up to 7k loose objects" and then experimenting to
determine the LIMIT which is a rough number of refs that makes the
oidset optimization worthwhile.

We need a bit better/descriptive name for the LIMIT, if we go that
route, though.

Thanks.


 


Re: [PATCH v11 08/10] convert: advise canonical UTF encoding names

2018-03-09 Thread Junio C Hamano
lars.schnei...@autodesk.com writes:

> From: Lars Schneider 
>
> The canonical name of a UTF encoding has the format UTF, dash, number,
> and optionally a byte order in upper case (e.g. UTF-8 or UTF-16BE).
> Some iconv versions support alternative names without a dash or with
> lower case characters.
>
> To avoid problems between different iconv versions, always suggest the
> canonical UTF names in advise messages.
>
> Signed-off-by: Lars Schneider 
> ---

I think it is probably better to squash this to earlier step,
i.e. jumping straight to the endgame solution.

> diff --git a/convert.c b/convert.c
> index b80d666a6b..9a3ae7cce1 100644
> --- a/convert.c
> +++ b/convert.c
> @@ -279,12 +279,20 @@ static int validate_encoding(const char *path, const char *enc,
>   "BOM is prohibited in '%s' if encoded as %s");
>   /*
>* This advice is shown for UTF-??BE and UTF-??LE encodings.
> +  * We cut off the last two characters of the encoding name
> +  # to generate the encoding name suitable for BOMs.
>*/

I somehow thought that I saw "s/#/*/" in somebody's response during
the previous round?

>   const char *advise_msg = _(
>   "The file '%s' contains a byte order "
> - "mark (BOM). Please use %.6s as "
> + "mark (BOM). Please use UTF-%s as "
>   "working-tree-encoding.");
> - advise(advise_msg, path, enc);
> + const char *stripped = "";
> + char *upper = xstrdup_toupper(enc);
> + upper[strlen(upper)-2] = '\0';
> + if (!skip_prefix(upper, "UTF-", &stripped))
> + skip_prefix(stripped, "UTF", &stripped);
> + advise(advise_msg, path, stripped);
> + free(upper);

If this codepath is ever entered with "enc" that does not begin with
"UTF" (e.g. "Shift_JIS", which is impossible in the current code,
but I'll talk about future-proofing here), then neither of these
skip_prefix will trigger, and then you'd end up suggesting to use
"UTF-" that is nonsense.  Perhaps initialize stripped to NULL and
force advise to segv to catch such a programmer error?


Re: [PATCH v11 07/10] convert: check for detectable errors in UTF encodings

2018-03-09 Thread Junio C Hamano
lars.schnei...@autodesk.com writes:

> + const char *advise_msg = _(
> + "The file '%s' contains a byte order "
> + "mark (BOM). Please use %.6s as "
> + "working-tree-encoding.");

I know that this will go away in a later step, but why ".6"?

> + advise(advise_msg, path, enc);


Re: [PATCH v11 06/10] convert: add 'working-tree-encoding' attribute

2018-03-09 Thread Junio C Hamano
lars.schnei...@autodesk.com writes:

> +static const char *default_encoding = "UTF-8";
> +
> ...
> +static const char *git_path_check_encoding(struct attr_check_item *check)
> +{
> + const char *value = check->value;
> +
> + if (ATTR_UNSET(value) || !strlen(value))
> + return NULL;
> +
> + if (ATTR_TRUE(value) || ATTR_FALSE(value)) {
> + error(_("working-tree-encoding attribute requires a value"));
> + return NULL;
> + }

Hmph, so we decide to be loud but otherwise ignore an undefined
configuration?  Shouldn't we rather die instead to avoid touching
the user data in unexpected ways?

> +
> + /* Don't encode to the default encoding */
> + if (!strcasecmp(value, default_encoding))
> + return NULL;

Is this an optimization to avoid "recode one encoding to the same
encoding" no-op overhead?  We already have the optimization in the
same spirit in many existing codepaths that have nothing to do with
w-t-e, and I think we should share the code.  Two pieces of thought
come to mind.

One is a lot smaller in scale: Is same_encoding() sufficient for
this callsite instead of strcasecmp()?

The other one is a lot bigger: Looking at all the existing callers
of same_encoding() that call reencode_string() when it returns false,
would it make sense to drop same_encoding() and move the optimization
to reencode_string() instead?
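
The bigger option would look roughly like this (a hypothetical
reshaping, not what the series currently does):

    /* hypothetical: short-circuit the no-op conversion in the helper itself */
    char *reencode_string(const char *in, const char *out_encoding,
                          const char *in_encoding)
    {
            if (same_encoding(out_encoding, in_encoding))
                    return NULL; /* caller keeps using the original buffer */
            return reencode_string_len(in, strlen(in), out_encoding,
                                       in_encoding, NULL);
    }

    /*
     * The subtle part: callers today treat a NULL return as a failed
     * conversion, not as "no conversion was needed".
     */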

I suspect that the answer to the smaller one is "yes, and even if
not, it should be easy to enhance/extend same_encoding() to make it
do what we want it to, and such a change will benefit even existing
callers."  The answer to the larger one is likely "the optimization
is not about skipping only reencode_string() call but other things
are subtly different among callers of same_encoding(), so such a
refactoring would not be all that useful."

The above still holds for the code after 10/10 touches this part.


Re: [PATCH v10 3/9] strbuf: add a case insensitive starts_with()

2018-03-09 Thread Ævar Arnfjörð Bjarmason

On Fri, Mar 09 2018, Junio C. Hamano jotted:

> Lars Schneider  writes:
>
>> I think following the boost lib makes most sense. Therefore,
>> I would like to go with "istarts_with". OK with you?
>
> I don't care too deeply; if we took starts_with() from there, where
> what we now want is defined as istarts_with(), then that sounds like
> a good thing to do.

I don't care either, but just a note that we had this exact discussion
around this time last year when I added a starts_with_icase() [1][2]
which eventually got dropped.

1. https://public-inbox.org/git/xmqqpohao2hw@gitster.mtv.corp.google.com/
2. https://public-inbox.org/git/20170326121654.22035-4-ava...@gmail.com/


Re: [PATCH v11 07/10] convert: check for detectable errors in UTF encodings

2018-03-09 Thread Lars Schneider

> On 09 Mar 2018, at 20:00, Junio C Hamano  wrote:
> 
> lars.schnei...@autodesk.com writes:
> 
>> +const char *advise_msg = _(
>> +"The file '%s' contains a byte order "
>> +"mark (BOM). Please use %.6s as "
>> +"working-tree-encoding.");
> 
> I know that this will go away in a later step, but why ".6"?

I deleted the original comment in the rebase, sorry:

/*
 * This advice is shown for UTF-??BE and UTF-??LE
 * encodings. We truncate the encoding name to 6
 * chars with %.6s to cut off the last two "byte
 * order" characters.
 */

- Lars


Re: [PATCH v11 07/10] convert: check for detectable errors in UTF encodings

2018-03-09 Thread Junio C Hamano
lars.schnei...@autodesk.com writes:

> + const char *advise_msg = _(
> + "The file '%s' contains a byte order "
> + "mark (BOM). Please use %.6s as "
> + "working-tree-encoding.");

I know that this will go away in a later step, but why ".6"?

> + advise(advise_msg, path, enc);


Re: [PATCH] fetch-pack.c: use oidset to check existence of loose object

2018-03-09 Thread Junio C Hamano
Takuto Ikuta  writes:

> Yes, I just wanted to say 'git fetch' invokes fetch-pack.
> fetch-pack is skipped by quickfetch when running git fetch repeatedly
> while the remote has no update. So I disabled it to see the
> performance of fetch-pack. In the chromium repository, the master
> branch is updated several times an hour, so git fetch invokes
> fetch-pack at such a frequency.

I do understand that if you run "git fetch" against the same place
in quick succession three times, the first one may cost a lot (not
just that it has to do the everything_local() computation that you
care about, it also has to actually do the network transfer), while
the second and third ones that are done before anything new happens
on the other end will not involve everything_local() overhead thanks
to quickfetch.

A "fetch" that is run against a remote that has nothing new, but
still triggers everything_local() only because quickfetch is
disabled, is an artificial case that has no relevance to the real
world, I suspect, because the quickfetch optimization is to solve
the "there is nothing to be done, still do_fetch_pack() spends so
much cycles only to realize that it does not have anything to do,
why?" issue.

Isn't running the "git fetch" command with the "--dry-run" option
many times in quick succession a lot closer to what you really want
to measure, I wonder?  That way, your first fetch won't be touching
the state of the local side to affect your second and subsequent
fetches.

>> In any case, do_fetch_pack() tries to see if all of the tip commits
>> we are going to fetch exist locally, so when you are trying a fetch
>> that grabs huge number of refs (by the way, it means that the first
>> sentence of the proposed log message is not quite true---it is "When
>> fetching a large number of refs", as it does not matter how many
>> refs _we_ have, no?), everything_local() ends up making repeated
>> calls to has_object_file_with_flags() to all of the refs.
>
> I fixed the description by changing 'refs' to 'remote refs'. In my understanding,
> git tries to check existence of remote refs even if we won't fetch such refs.

During fetch, everything_local() tries to mark common part by
walking the refs the other side advertised upon the first contact,
so it is correct that the number of checks is not reduced in a fetch
that does not fetch many refs, but the number of remote-tracking refs
you have has no effect, so I doubt such a rephrasing would make the
description more understandable.  "When fetching from a repository
with a large number of refs" is probably what you want to say, no?

> Yes. But I think the default limit for the number of loose objects, 7000,
> gives us small overhead when we do enumeration of all objects.

Hmph, I didn't see the code that does the estimation of loose object
count before starting to enumerate, though.

>> Hmm, OBJECT_INFO_QUICK optimization was added in dfdd4afc
>> ("sha1_file: teach sha1_object_info_extended more flags",
>> 2017-06-21), but since 8b4c0103 ("sha1_file: support lazily fetching
>> missing objects", 2017-12-08) it appears that passing
>> OBJECT_INFO_QUICK down the codepath does not do anything
>> interesting.  Jonathan (cc'ed), are all remaining hits from "git
>> grep OBJECT_INFO_QUICK" all dead no-ops these days?
>
> Yes, the flag is a no-op now, but let me leave it untouched in this patch.

Yeah, I do not want you to be touching that in this change.  It is
an independent/orthogonal clean-up.

Thanks.


Re: [PATCH] Support long format for log-based submodule diff

2018-03-09 Thread Junio C Hamano
Stefan Beller  writes:

>> $ git diff --submodule=log --submodule-log-detail=(long|short)
>>
>> I'm not sure what makes sense here. I welcome thoughts/discussion and
>> will provide follow-up patches.
>
> The case of merges is usually configured with --[no-]merges, or
> --min-parents=<n>.

But that is a knob that controls an irrelevant aspect of the detail
in the context of this discussion, isn't it?  This code is about "to
what degree the things that happened between two submodule commits
in an adjacent pair of commits in the superproject are summarized?"
and the current one unilaterally decides that something similar to
what you would see in the output from "log --oneline --first-parent
--left-right" is sufficient, which is a position to heavily favour
projects whose histories are very clean by either being:

 (1) totally linear, each individual commit appearing on the
 first-parent chain; or

 (2) totally topic-branch based, everything appearing as merges of
 a topic branch to the trunk

The hack Robert illustrates below is to change it to stop favouring
such projects with "clean" histories, and show "log --oneline
--no-merges --left-right".  When presented that way, clean histories
of topic-branch based projects will suffer by losing conciseness,
but clean histories of totally linear projects will still be shown
the same way, and messy history that sometimes merges, sometimes
merges mergy histories, and sometimes directly builds on the trunk
will be shown as an enumeration of individual commits in a flat way
by ignoring merges and not restricting the traversal to the first
parent chains, which would appear more uniform than what the current
code shows.

I do not see a point in introducing --min/max-parents as a knob to
control how the history is summarized.

This is a strongly related tangent, but I wonder if we can and/or
want to share more code with the codepath that prepares the log
message for a merge.  It summarizes what happened on the side branch
since it forked from the history it is joining back to (I think it
is merge.c::shortlog() that computes this) and it is quite similar
to what Robert wants to use for submodules here.  On the other hand,
in a project _without_ submodule, if you are pulling history made by
your lieutenant whose history is full of linear merges of topic
branches to the mainline, it may not be a bad idea to allow
fmt-merge-msg to alternatively show something similar to the "diff
--submodule=log" gives us, i.e. summarize the history of the side
branch being merged by just listing the commits on the first-parent
chain.  So I sense some opportunity for cross pollination here.


[PATCH v11 04/10] utf8: add function to detect prohibited UTF-16/32 BOM

2018-03-09 Thread lars . schneider
From: Lars Schneider 

Whenever a data stream is declared to be UTF-16BE, UTF-16LE, UTF-32BE
or UTF-32LE, a BOM must not be used [1]. The function returns true if
this is the case.

This function is used in a subsequent commit.

[1] http://unicode.org/faq/utf_bom.html#bom10

Signed-off-by: Lars Schneider 
---
 utf8.c | 26 ++
 utf8.h |  9 +
 2 files changed, 35 insertions(+)

diff --git a/utf8.c b/utf8.c
index 2c27ce0137..e4b99580f0 100644
--- a/utf8.c
+++ b/utf8.c
@@ -538,6 +538,32 @@ char *reencode_string_len(const char *in, int insz,
 }
 #endif
 
+static int has_bom_prefix(const char *data, size_t len,
+ const char *bom, size_t bom_len)
+{
+   return (len >= bom_len) && !memcmp(data, bom, bom_len);
+}
+
+static const char utf16_be_bom[] = {0xFE, 0xFF};
+static const char utf16_le_bom[] = {0xFF, 0xFE};
+static const char utf32_be_bom[] = {0x00, 0x00, 0xFE, 0xFF};
+static const char utf32_le_bom[] = {0xFF, 0xFE, 0x00, 0x00};
+
+int has_prohibited_utf_bom(const char *enc, const char *data, size_t len)
+{
+   return (
+ (!strcasecmp(enc, "UTF-16BE") || !strcasecmp(enc, "UTF-16LE") ||
+  !strcasecmp(enc, "UTF16BE") || !strcasecmp(enc, "UTF16LE")) &&
+ (has_bom_prefix(data, len, utf16_be_bom, sizeof(utf16_be_bom)) ||
+  has_bom_prefix(data, len, utf16_le_bom, sizeof(utf16_le_bom)))
+   ) || (
+ (!strcasecmp(enc, "UTF-32BE") || !strcasecmp(enc, "UTF-32LE") ||
+  !strcasecmp(enc, "UTF32BE") || !strcasecmp(enc, "UTF32LE")) &&
+ (has_bom_prefix(data, len, utf32_be_bom, sizeof(utf32_be_bom)) ||
+  has_bom_prefix(data, len, utf32_le_bom, sizeof(utf32_le_bom)))
+   );
+}
+
 /*
  * Returns first character length in bytes for multi-byte `text` according to
  * `encoding`.
diff --git a/utf8.h b/utf8.h
index 6bbcf31a83..0db1db4519 100644
--- a/utf8.h
+++ b/utf8.h
@@ -70,4 +70,13 @@ typedef enum {
void strbuf_utf8_align(struct strbuf *buf, align_type position, unsigned int width,
   const char *s);
 
+/*
+ * If a data stream is declared as UTF-16BE or UTF-16LE, then a UTF-16
+ * BOM must not be used [1]. The same applies for the UTF-32 equivalents.
+ * The function returns true if this rule is violated.
+ *
+ * [1] http://unicode.org/faq/utf_bom.html#bom10
+ */
+int has_prohibited_utf_bom(const char *enc, const char *data, size_t len);
+
 #endif
-- 
2.16.2



[PATCH v11 07/10] convert: check for detectable errors in UTF encodings

2018-03-09 Thread lars . schneider
From: Lars Schneider 

Check that new content is valid with respect to the user defined
'working-tree-encoding' attribute.

Signed-off-by: Lars Schneider 
---
 convert.c| 48 ++
 t/t0028-working-tree-encoding.sh | 56 
 2 files changed, 104 insertions(+)

diff --git a/convert.c b/convert.c
index aa59ecfe49..b80d666a6b 100644
--- a/convert.c
+++ b/convert.c
@@ -266,6 +266,51 @@ static int will_convert_lf_to_crlf(size_t len, struct text_stat *stats,
 
 }
 
+static int validate_encoding(const char *path, const char *enc,
+ const char *data, size_t len, int die_on_error)
+{
+   /* We only check for UTF here as UTF?? can be an alias for UTF-?? */
+   if (istarts_with(enc, "UTF")) {
+   /*
+* Check for detectable errors in UTF encodings
+*/
+   if (has_prohibited_utf_bom(enc, data, len)) {
+   const char *error_msg = _(
+   "BOM is prohibited in '%s' if encoded as %s");
+   /*
+* This advice is shown for UTF-??BE and UTF-??LE encodings.
+*/
+   const char *advise_msg = _(
+   "The file '%s' contains a byte order "
+   "mark (BOM). Please use %.6s as "
+   "working-tree-encoding.");
+   advise(advise_msg, path, enc);
+   if (die_on_error)
+   die(error_msg, path, enc);
+   else {
+   return error(error_msg, path, enc);
+   }
+
+   } else if (is_missing_required_utf_bom(enc, data, len)) {
+   const char *error_msg = _(
+   "BOM is required in '%s' if encoded as %s");
+   const char *advise_msg = _(
+   "The file '%s' is missing a byte order "
+   "mark (BOM). Please use %sBE or %sLE "
+   "(depending on the byte order) as "
+   "working-tree-encoding.");
+   advise(advise_msg, path, enc, enc);
+   if (die_on_error)
+   die(error_msg, path, enc);
+   else {
+   return error(error_msg, path, enc);
+   }
+   }
+
+   }
+   return 0;
+}
+
 static const char *default_encoding = "UTF-8";
 
 static int encode_to_git(const char *path, const char *src, size_t src_len,
@@ -291,6 +336,9 @@ static int encode_to_git(const char *path, const char *src, size_t src_len,
if (!buf && !src)
return 1;
 
+   if (validate_encoding(path, enc, src, src_len, die_on_error))
+   return 0;
+
dst = reencode_string_len(src, src_len, default_encoding, enc,
  &dst_len);
if (!dst) {
diff --git a/t/t0028-working-tree-encoding.sh b/t/t0028-working-tree-encoding.sh
index e492945a01..e8408dfe5c 100755
--- a/t/t0028-working-tree-encoding.sh
+++ b/t/t0028-working-tree-encoding.sh
@@ -62,6 +62,46 @@ test_expect_success 'check $GIT_DIR/info/attributes support' '
 
 for i in 16 32
 do
+   test_expect_success "check prohibited UTF-${i} BOM" '
+   test_when_finished "git reset --hard HEAD" &&
+
+   echo "*.utf${i}be text working-tree-encoding=utf-${i}be" >>.gitattributes &&
+   echo "*.utf${i}le text working-tree-encoding=utf-${i}LE" >>.gitattributes &&
+
+   # Here we add UTF-16 (resp. UTF-32) files with BOM (big/little-endian)
+   # but we tell Git to treat them as UTF-16BE/UTF-16LE (resp. UTF-32).
+   # In these cases the BOM is prohibited.
+   cp bebom.utf${i}be.raw bebom.utf${i}be &&
+   test_must_fail git add bebom.utf${i}be 2>err.out &&
+   test_i18ngrep "fatal: BOM is prohibited .* utf-${i}be" err.out &&
+
+   cp lebom.utf${i}le.raw lebom.utf${i}be &&
+   test_must_fail git add lebom.utf${i}be 2>err.out &&
+   test_i18ngrep "fatal: BOM is prohibited .* utf-${i}be" err.out &&
+
+   cp bebom.utf${i}be.raw bebom.utf${i}le &&
+   test_must_fail git add bebom.utf${i}le 2>err.out &&
+   test_i18ngrep "fatal: BOM is prohibited .* utf-${i}LE" err.out &&
+
+   cp lebom.utf${i}le.raw lebom.utf${i}le &&
+   test_must_fail git add lebom.utf${i}le 2>err.out &&
+   test_i18ngrep "fatal: BOM is prohibited .* utf-${i}LE" err.out
+   '
+
+   test_expect_success "check required UTF-${i} BOM" '
+   test_when_finished "git 

[PATCH v11 09/10] convert: add tracing for 'working-tree-encoding' attribute

2018-03-09 Thread lars . schneider
From: Lars Schneider 

Add the GIT_TRACE_WORKING_TREE_ENCODING environment variable to enable
tracing for content that is reencoded with the 'working-tree-encoding'
attribute. This is useful to debug encoding issues.

Signed-off-by: Lars Schneider 
---
 convert.c| 25 +
 t/t0028-working-tree-encoding.sh |  2 ++
 2 files changed, 27 insertions(+)

diff --git a/convert.c b/convert.c
index 9a3ae7cce1..d739078016 100644
--- a/convert.c
+++ b/convert.c
@@ -324,6 +324,29 @@ static int validate_encoding(const char *path, const char *enc,
return 0;
 }
 
+static void trace_encoding(const char *context, const char *path,
+  const char *encoding, const char *buf, size_t len)
+{
+   static struct trace_key coe = TRACE_KEY_INIT(WORKING_TREE_ENCODING);
+   struct strbuf trace = STRBUF_INIT;
+   int i;
+
+   strbuf_addf(&trace, "%s (%s, considered %s):\n", context, path, encoding);
+   for (i = 0; i < len && buf; ++i) {
+           strbuf_addf(
+                   &trace, "| \e[2m%2i:\e[0m %2x \e[2m%c\e[0m%c",
+                   i,
+                   (unsigned char) buf[i],
+                   (buf[i] > 32 && buf[i] < 127 ? buf[i] : ' '),
+                   ((i+1) % 8 && (i+1) < len ? ' ' : '\n')
+           );
+   }
+   strbuf_addchars(&trace, '\n', 1);
+
+   trace_strbuf(&coe, &trace);
+   strbuf_release(&trace);
+}
+
 static const char *default_encoding = "UTF-8";
 
 static int encode_to_git(const char *path, const char *src, size_t src_len,
@@ -352,6 +375,7 @@ static int encode_to_git(const char *path, const char *src, size_t src_len,
if (validate_encoding(path, enc, src, src_len, die_on_error))
return 0;
 
+   trace_encoding("source", path, enc, src, src_len);
dst = reencode_string_len(src, src_len, default_encoding, enc,
  &dst_len);
if (!dst) {
@@ -369,6 +393,7 @@ static int encode_to_git(const char *path, const char *src, size_t src_len,
return 0;
}
}
+   trace_encoding("destination", path, default_encoding, dst, dst_len);
 
strbuf_attach(buf, dst, dst_len, dst_len + 1);
return 1;
diff --git a/t/t0028-working-tree-encoding.sh b/t/t0028-working-tree-encoding.sh
index 1bee7b9f71..f68e282c5e 100755
--- a/t/t0028-working-tree-encoding.sh
+++ b/t/t0028-working-tree-encoding.sh
@@ -4,6 +4,8 @@ test_description='working-tree-encoding conversion via gitattributes'
 
 . ./test-lib.sh
 
+GIT_TRACE_WORKING_TREE_ENCODING=1 && export GIT_TRACE_WORKING_TREE_ENCODING
+
 test_expect_success 'setup test files' '
git config core.eol lf &&
 
-- 
2.16.2



[PATCH v11 06/10] convert: add 'working-tree-encoding' attribute

2018-03-09 Thread lars . schneider
From: Lars Schneider 

Git recognizes files encoded with ASCII or one of its supersets (e.g.
UTF-8 or ISO-8859-1) as text files. All other encodings are usually
interpreted as binary and consequently built-in Git text processing
tools (e.g. 'git diff') as well as most Git web front ends do not
visualize the content.

Add an attribute to tell Git what encoding the user has defined for a
given file. If the content is added to the index, then Git converts the
content to a canonical UTF-8 representation. On checkout Git will
reverse the conversion.

Signed-off-by: Lars Schneider 
---
 Documentation/gitattributes.txt  |  80 ++
 convert.c| 114 ++-
 convert.h|   1 +
 sha1_file.c  |   2 +-
 t/t0028-working-tree-encoding.sh | 144 +++
 5 files changed, 339 insertions(+), 2 deletions(-)
 create mode 100755 t/t0028-working-tree-encoding.sh

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 30687de81a..31a4f92840 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -272,6 +272,86 @@ few exceptions.  Even though...
   catch potential problems early, safety triggers.
 
 
+`working-tree-encoding`
+^^^^^^^^^^^^^^^^^^^^^^^
+
+Git recognizes files encoded in ASCII or one of its supersets (e.g.
+UTF-8, ISO-8859-1, ...) as text files. Files encoded in certain other
+encodings (e.g. UTF-16) are interpreted as binary and consequently
+built-in Git text processing tools (e.g. 'git diff') as well as most Git
+web front ends do not visualize the contents of these files by default.
+
+In these cases you can tell Git the encoding of a file in the working
+directory with the `working-tree-encoding` attribute. If a file with this
+attribute is added to Git, then Git reencodes the content from the
+specified encoding to UTF-8. Finally, Git stores the UTF-8 encoded
+content in its internal data structure (called "the index"). On checkout
+the content is reencoded back to the specified encoding.
+
+Please note that using the `working-tree-encoding` attribute may have a
+number of pitfalls:
+
+- Alternative Git implementations (e.g. JGit or libgit2) and older Git
+  versions (as of March 2018) do not support the `working-tree-encoding`
+  attribute. If you decide to use the `working-tree-encoding` attribute
+  in your repository, then it is strongly recommended to ensure that all
+  clients working with the repository support it.
+
+  For example, Microsoft Visual Studio resources files (`*.rc`) or
+  PowerShell script files (`*.ps1`) are sometimes encoded in UTF-16.
+  If you declare `*.ps1` files as UTF-16 and you add `foo.ps1` with
+  a `working-tree-encoding` enabled Git client, then `foo.ps1` will be
+  stored as UTF-8 internally. A client without `working-tree-encoding`
+  support will check out `foo.ps1` as a UTF-8 encoded file. This will
+  typically cause trouble for the users of this file.
+
+  If a Git client that does not support the `working-tree-encoding`
+  attribute adds a new file `bar.ps1`, then `bar.ps1` will be
+  stored "as-is" internally (in this example probably as UTF-16).
+  A client with `working-tree-encoding` support will interpret the
+  internal contents as UTF-8 and try to convert it to UTF-16 on checkout.
+  That operation will fail and cause an error.
+
+- Reencoding content requires resources that might slow down certain
+  Git operations (e.g. 'git checkout' or 'git add').
+
+Use the `working-tree-encoding` attribute only if you cannot store a file
+in UTF-8 encoding and if you want Git to be able to process the content
+as text.
+
+As an example, use the following attributes if your '*.ps1' files are
+UTF-16 encoded with byte order mark (BOM) and you want Git to perform
+automatic line ending conversion based on your platform.
+
+------------------------
+*.ps1  text working-tree-encoding=UTF-16
+------------------------
+
+Use the following attributes if your '*.ps1' files are UTF-16 little
+endian encoded without BOM and you want Git to use Windows line endings
+in the working directory. Please note, it is highly recommended to
+explicitly define the line endings with `eol` if the `working-tree-encoding`
+attribute is used, to avoid ambiguity.
+
+------------------------
+*.ps1  text working-tree-encoding=UTF-16LE eol=CRLF
+------------------------
+
+You can get a list of all available encodings on your platform with the
+following command:
+
+------------------------
+iconv --list
+------------------------
+
+If you do not know the encoding of a file, then you can use the `file`
+command to guess the encoding:
+
+------------------------
+file foo.ps1
+------------------------
+
+
 `ident`
 ^^^
 
diff --git a/convert.c b/convert.c
index b976eb968c..aa59ecfe49 100644
--- a/convert.c
+++ b/convert.c
@@ -7,6 +7,7 @@

[PATCH v11 03/10] strbuf: add a case insensitive starts_with()

2018-03-09 Thread lars . schneider
From: Lars Schneider 

Check in a case insensitive manner if one string is a prefix of another
string.

This function is used in a subsequent commit.

Signed-off-by: Lars Schneider 
---
 git-compat-util.h | 1 +
 strbuf.c  | 9 +
 2 files changed, 10 insertions(+)

diff --git a/git-compat-util.h b/git-compat-util.h
index 68b2ad531e..95c9b34832 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -455,6 +455,7 @@ extern void (*get_warn_routine(void))(const char *warn, va_list params);
 extern void set_die_is_recursing_routine(int (*routine)(void));
 
 extern int starts_with(const char *str, const char *prefix);
+extern int istarts_with(const char *str, const char *prefix);
 
 /*
  * If the string "str" begins with the string found in "prefix", return 1.
diff --git a/strbuf.c b/strbuf.c
index b635f0bdc4..99812b8488 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -11,6 +11,15 @@ int starts_with(const char *str, const char *prefix)
return 0;
 }
 
+int istarts_with(const char *str, const char *prefix)
+{
+   for (; ; str++, prefix++)
+   if (!*prefix)
+   return 1;
+   else if (tolower(*str) != tolower(*prefix))
+   return 0;
+}
+
 int skip_to_optional_arg_default(const char *str, const char *prefix,
 const char **arg, const char *def)
 {
-- 
2.16.2



[PATCH v11 05/10] utf8: add function to detect a missing UTF-16/32 BOM

2018-03-09 Thread lars . schneider
From: Lars Schneider 

If the endianness is not defined in the encoding name, then let's
be strict and require a BOM to avoid any encoding confusion. The
is_missing_required_utf_bom() function returns true if a required BOM
is missing.

The Unicode standard instructs to assume big-endian if there is no BOM
for UTF-16/32 [1][2]. However, the W3C/WHATWG encoding standard used
in HTML5 recommends to assume little-endian to "deal with deployed
content" [3]. Strictly requiring a BOM seems to be the safest option
for content in Git.

This function is used in a subsequent commit.

[1] http://unicode.org/faq/utf_bom.html#gen6
[2] http://www.unicode.org/versions/Unicode10.0.0/ch03.pdf
 Section 3.10, D98, page 132
[3] https://encoding.spec.whatwg.org/#utf-16le

Signed-off-by: Lars Schneider 
---
 utf8.c | 13 +
 utf8.h | 19 +++
 2 files changed, 32 insertions(+)

diff --git a/utf8.c b/utf8.c
index e4b99580f0..81c6678df1 100644
--- a/utf8.c
+++ b/utf8.c
@@ -564,6 +564,19 @@ int has_prohibited_utf_bom(const char *enc, const char *data, size_t len)
);
 }
 
+int is_missing_required_utf_bom(const char *enc, const char *data, size_t len)
+{
+   return (
+  (!strcasecmp(enc, "UTF-16") || !strcasecmp(enc, "UTF16")) &&
+  !(has_bom_prefix(data, len, utf16_be_bom, sizeof(utf16_be_bom)) ||
+has_bom_prefix(data, len, utf16_le_bom, sizeof(utf16_le_bom)))
+   ) || (
+  (!strcasecmp(enc, "UTF-32") || !strcasecmp(enc, "UTF32")) &&
+  !(has_bom_prefix(data, len, utf32_be_bom, sizeof(utf32_be_bom)) ||
+has_bom_prefix(data, len, utf32_le_bom, sizeof(utf32_le_bom)))
+   );
+}
+
 /*
  * Returns first character length in bytes for multi-byte `text` according to
  * `encoding`.
diff --git a/utf8.h b/utf8.h
index 0db1db4519..cce654a64a 100644
--- a/utf8.h
+++ b/utf8.h
@@ -79,4 +79,23 @@ void strbuf_utf8_align(struct strbuf *buf, align_type position, unsigned int wid
  */
 int has_prohibited_utf_bom(const char *enc, const char *data, size_t len);
 
+/*
+ * If the endianness is not defined in the encoding name, then we
+ * require a BOM. The function returns true if a required BOM is missing.
+ *
+ * The Unicode standard instructs to assume big-endian if there is no
+ * BOM for UTF-16/32 [1][2]. However, the W3C/WHATWG encoding standard
+ * used in HTML5 recommends to assume little-endian to "deal with
+ * deployed content" [3].
+ *
+ * Therefore, strictly requiring a BOM seems to be the safest option for
+ * content in Git.
+ *
+ * [1] http://unicode.org/faq/utf_bom.html#gen6
+ * [2] http://www.unicode.org/versions/Unicode10.0.0/ch03.pdf
+ * Section 3.10, D98, page 132
+ * [3] https://encoding.spec.whatwg.org/#utf-16le
+ */
+int is_missing_required_utf_bom(const char *enc, const char *data, size_t len);
+
 #endif
-- 
2.16.2



[PATCH v11 01/10] strbuf: remove unnecessary NUL assignment in xstrdup_tolower()

2018-03-09 Thread lars . schneider
From: Lars Schneider 

Since 3733e69464 (use xmallocz to avoid size arithmetic, 2016-02-22) we
allocate the buffer for the lower case string with xmallocz(). This
already ensures a NUL at the end of the allocated buffer.

Remove the unnecessary assignment.

Signed-off-by: Lars Schneider 
---
 strbuf.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/strbuf.c b/strbuf.c
index 1df674e919..55b7daeb35 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -781,7 +781,6 @@ char *xstrdup_tolower(const char *string)
result = xmallocz(len);
for (i = 0; i < len; i++)
result[i] = tolower(string[i]);
-   result[i] = '\0';
return result;
 }
 
-- 
2.16.2



[PATCH v11 08/10] convert: advise canonical UTF encoding names

2018-03-09 Thread lars . schneider
From: Lars Schneider 

The canonical name of a UTF encoding has the format UTF, dash, number,
and optionally a byte order in upper case (e.g. UTF-8 or UTF-16BE).
Some iconv versions support alternative names without a dash or with
lower case characters.

To avoid problems between different iconv versions, always suggest the
canonical UTF names in advise messages.

Signed-off-by: Lars Schneider 
---
 convert.c| 21 +
 t/t0028-working-tree-encoding.sh | 10 --
 2 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/convert.c b/convert.c
index b80d666a6b..9a3ae7cce1 100644
--- a/convert.c
+++ b/convert.c
@@ -279,12 +279,20 @@ static int validate_encoding(const char *path, const char *enc,
"BOM is prohibited in '%s' if encoded as %s");
/*
* This advice is shown for UTF-??BE and UTF-??LE encodings.
+* We cut off the last two characters of the encoding name
+# to generate the encoding name suitable for BOMs.
 */
const char *advise_msg = _(
"The file '%s' contains a byte order "
-   "mark (BOM). Please use %.6s as "
+   "mark (BOM). Please use UTF-%s as "
"working-tree-encoding.");
-   advise(advise_msg, path, enc);
+   const char *stripped = "";
+   char *upper = xstrdup_toupper(enc);
+   upper[strlen(upper)-2] = '\0';
+   if (!skip_prefix(upper, "UTF-", &stripped))
+   skip_prefix(stripped, "UTF", &stripped);
+   advise(advise_msg, path, stripped);
+   free(upper);
if (die_on_error)
die(error_msg, path, enc);
else {
@@ -296,10 +304,15 @@ static int validate_encoding(const char *path, const char *enc,
"BOM is required in '%s' if encoded as %s");
const char *advise_msg = _(
"The file '%s' is missing a byte order "
-   "mark (BOM). Please use %sBE or %sLE "
+   "mark (BOM). Please use UTF-%sBE or UTF-%sLE "
"(depending on the byte order) as "
"working-tree-encoding.");
-   advise(advise_msg, path, enc, enc);
+   const char *stripped = "";
+   char *upper = xstrdup_toupper(enc);
+   if (!skip_prefix(upper, "UTF-", &stripped))
+   skip_prefix(stripped, "UTF", &stripped);
+   advise(advise_msg, path, stripped, stripped);
+   free(upper);
if (die_on_error)
die(error_msg, path, enc);
else {
diff --git a/t/t0028-working-tree-encoding.sh b/t/t0028-working-tree-encoding.sh
index e8408dfe5c..1bee7b9f71 100755
--- a/t/t0028-working-tree-encoding.sh
+++ b/t/t0028-working-tree-encoding.sh
@@ -74,18 +74,22 @@ do
cp bebom.utf${i}be.raw bebom.utf${i}be &&
test_must_fail git add bebom.utf${i}be 2>err.out &&
test_i18ngrep "fatal: BOM is prohibited .* utf-${i}be" err.out &&
+   test_i18ngrep "use UTF-${i} as working-tree-encoding" err.out &&
 
cp lebom.utf${i}le.raw lebom.utf${i}be &&
test_must_fail git add lebom.utf${i}be 2>err.out &&
test_i18ngrep "fatal: BOM is prohibited .* utf-${i}be" err.out &&
+   test_i18ngrep "use UTF-${i} as working-tree-encoding" err.out &&
 
cp bebom.utf${i}be.raw bebom.utf${i}le &&
test_must_fail git add bebom.utf${i}le 2>err.out &&
test_i18ngrep "fatal: BOM is prohibited .* utf-${i}LE" err.out &&
+   test_i18ngrep "use UTF-${i} as working-tree-encoding" err.out &&
 
cp lebom.utf${i}le.raw lebom.utf${i}le &&
test_must_fail git add lebom.utf${i}le 2>err.out &&
-   test_i18ngrep "fatal: BOM is prohibited .* utf-${i}LE" err.out
+   test_i18ngrep "fatal: BOM is prohibited .* utf-${i}LE" err.out &&
+   test_i18ngrep "use UTF-${i} as working-tree-encoding" err.out
'
 
test_expect_success "check required UTF-${i} BOM" '
@@ -96,10 +100,12 @@ do
cp nobom.utf${i}be.raw nobom.utf${i} &&
test_must_fail git add nobom.utf${i} 2>err.out &&
test_i18ngrep "fatal: BOM is required .* utf-${i}" err.out &&
+   test_i18ngrep "use UTF-${i}BE or UTF-${i}LE" 

[PATCH v11 10/10] convert: add round trip check based on 'core.checkRoundtripEncoding'

2018-03-09 Thread lars . schneider
From: Lars Schneider 

UTF supports lossless conversion round tripping, and conversions between
UTF and other encodings are mostly round trip safe, as Unicode aims to be
a superset of all other character encodings. However, certain encodings
(e.g. SHIFT-JIS) are known to have round trip issues [1].

Add 'core.checkRoundtripEncoding', a comma-separated list of encodings,
to define the encodings for which Git should check the conversion round
trip when they are used in the 'working-tree-encoding' attribute.

Set SHIFT-JIS as the default value for 'core.checkRoundtripEncoding'.

[1] 
https://support.microsoft.com/en-us/help/170559/prb-conversion-problem-between-shift-jis-and-unicode

Signed-off-by: Lars Schneider 
---
 Documentation/config.txt |  6 +++
 Documentation/gitattributes.txt  |  8 
 config.c |  5 +++
 convert.c| 79 +++-
 convert.h|  1 +
 environment.c|  1 +
 t/t0028-working-tree-encoding.sh | 39 
 7 files changed, 138 insertions(+), 1 deletion(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 0e25b2c92b..7dcac9b540 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -530,6 +530,12 @@ core.autocrlf::
This variable can be set to 'input',
in which case no output conversion is performed.
 
+core.checkRoundtripEncoding::
+   A comma and/or whitespace separated list of encodings that Git
+   performs UTF-8 round trip checks on if they are used in a
+   `working-tree-encoding` attribute (see linkgit:gitattributes[5]).
+   The default value is `SHIFT-JIS`.
+
 core.symlinks::
If false, symbolic links are checked out as small plain files that
contain the link text. linkgit:git-update-index[1] and
diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 31a4f92840..aa3deae392 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -312,6 +312,14 @@ number of pitfalls:
   internal contents as UTF-8 and try to convert it to UTF-16 on checkout.
   That operation will fail and cause an error.
 
+- Reencoding content to non-UTF encodings can cause errors as the
+  conversion might not be UTF-8 round trip safe. If you suspect your
+  encoding to not be round trip safe, then add it to
+  `core.checkRoundtripEncoding` to make Git check the round trip
+  encoding (see linkgit:git-config[1]). SHIFT-JIS (Japanese character
+  set) is known to have round trip issues with UTF-8 and is checked by
+  default.
+
 - Reencoding content requires resources that might slow down certain
  Git operations (e.g. 'git checkout' or 'git add').
 
diff --git a/config.c b/config.c
index 1f003fbb90..d0ada9fcd4 100644
--- a/config.c
+++ b/config.c
@@ -1172,6 +1172,11 @@ static int git_default_core_config(const char *var, const char *value)
return 0;
}
 
+   if (!strcmp(var, "core.checkroundtripencoding")) {
+   check_roundtrip_encoding = xstrdup(value);
+   return 0;
+   }
+
if (!strcmp(var, "core.notesref")) {
notes_ref_name = xstrdup(value);
return 0;
diff --git a/convert.c b/convert.c
index d739078016..c2d24882c1 100644
--- a/convert.c
+++ b/convert.c
@@ -347,6 +347,42 @@ static void trace_encoding(const char *context, const char *path,
strbuf_release(&trace);
 }
 
+static int check_roundtrip(const char *enc_name)
+{
+   /*
+* check_roundtrip_encoding contains a string of comma and/or
+* space separated encodings (eg. "UTF-16, ASCII, CP1125").
+* Search for the given encoding in that string.
+*/
+   const char *found = strcasestr(check_roundtrip_encoding, enc_name);
+   const char *next;
+   int len;
+   if (!found)
+   return 0;
+   next = found + strlen(enc_name);
+   len = strlen(check_roundtrip_encoding);
+   return (found && (
+   /*
+* check that the found encoding is at the
+* beginning of check_roundtrip_encoding or
+* that it is prefixed with a space or comma
+*/
+   found == check_roundtrip_encoding || (
+   (isspace(found[-1]) || found[-1] == ',')
+   )
+   ) && (
+   /*
+* check that the found encoding is at the
+* end of check_roundtrip_encoding or
+* that it is suffixed with a space or comma
+*/
+   next == check_roundtrip_encoding + len || (
+   next < check_roundtrip_encoding + len &&
+   (isspace(next[0]) || 
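
A minimal way to exercise the new setting from the command line (a
sketch; the EUC-JP entry is purely illustrative, only SHIFT-JIS is
checked by default):

    git config core.checkRoundtripEncoding "SHIFT-JIS, EUC-JP"
    echo "*.txt text working-tree-encoding=SHIFT-JIS" >>.gitattributes
    # 'git add' now refuses content that does not round trip through UTF-8
    git add kanji.txt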

[PATCH v11 02/10] strbuf: add xstrdup_toupper()

2018-03-09 Thread lars . schneider
From: Lars Schneider 

Create a copy of an existing string and make all characters upper case.
Similar to xstrdup_tolower().

This function is used in a subsequent commit.

Signed-off-by: Lars Schneider 
---
 strbuf.c | 12 
 strbuf.h |  1 +
 2 files changed, 13 insertions(+)

diff --git a/strbuf.c b/strbuf.c
index 55b7daeb35..b635f0bdc4 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -784,6 +784,18 @@ char *xstrdup_tolower(const char *string)
return result;
 }
 
+char *xstrdup_toupper(const char *string)
+{
+   char *result;
+   size_t len, i;
+
+   len = strlen(string);
+   result = xmallocz(len);
+   for (i = 0; i < len; i++)
+   result[i] = toupper(string[i]);
+   return result;
+}
+
 char *xstrvfmt(const char *fmt, va_list ap)
 {
struct strbuf buf = STRBUF_INIT;
diff --git a/strbuf.h b/strbuf.h
index 14c8c10d66..df7ced53ed 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -607,6 +607,7 @@ __attribute__((format (printf,2,3)))
 extern int fprintf_ln(FILE *fp, const char *fmt, ...);
 
 char *xstrdup_tolower(const char *);
+char *xstrdup_toupper(const char *);
 
 /**
  * Create a newly allocated string using printf format. You can do this easily
-- 
2.16.2



[PATCH v11 00/10] convert: add support for different encodings

2018-03-09 Thread lars . schneider
From: Lars Schneider 

Hi,

Patches 1-5,9 are preparation and helper functions.
Patches 6-8,10 are the actual change. Patch 8 is new.

This series depends on Torsten's 8462ff43e4 (convert_to_git():
safe_crlf/checksafe becomes int conv_flags, 2018-01-13) which is
already in master.

Changes since v10:

* rename startscase_with() to istarts_with() (Duy)
* validate_encoding() advises the canonical form of the UTF
  encoding name to the user (Junio)
  --> I added this as a separate commit that could be dropped
  if desired by the reviewers.
* fix documentation for roundtrip check (Junio)
* use isspace() to check whitespace/tab delimiter
  in core.checkRoundtripEncoding (Junio)
* remove dead code in roundtrip check (Junio)
* fix invalid # in comment (Eric)
* detect UTF8 and UTF-8 as default encoding (Eric)
* make asterisk stick to the variable, not type (Junio)
* print an error if "w-t-e" does not have a proper value (Junio)
  --> BTW: I noticed that the attribute is not set to "git_attr__false"
  even if I define "-working-tree-encoding". I haven't investigated
  further yet. Might that be a bug? If yes, then this should be
  addressed in a separate patch series (see the sketch below).
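  For anyone who wants to poke at this, the attribute can be unset like
  so (a sketch; whether the in-code value then becomes "git_attr__false"
  is exactly the open question):

      echo "*.txt -working-tree-encoding" >.gitattributes
      git check-attr working-tree-encoding file.txt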

Thanks,
Lars


  RFC: 
https://public-inbox.org/git/bdb9b884-6d17-4be3-a83c-f67e2afa2...@gmail.com/
   v1: 
https://public-inbox.org/git/20171211155023.1405-1-lars.schnei...@autodesk.com/
   v2: 
https://public-inbox.org/git/2017122915.39680-1-lars.schnei...@autodesk.com/
   v3: 
https://public-inbox.org/git/20180106004808.77513-1-lars.schnei...@autodesk.com/
   v4: 
https://public-inbox.org/git/20180120152418.52859-1-lars.schnei...@autodesk.com/
   v5: https://public-inbox.org/git/20180129201855.9182-1-tbo...@web.de/
   v6: 
https://public-inbox.org/git/20180209132830.55385-1-lars.schnei...@autodesk.com/
   v7: 
https://public-inbox.org/git/20180215152711.158-1-lars.schnei...@autodesk.com/
   v8: 
https://public-inbox.org/git/20180224162801.98860-1-lars.schnei...@autodesk.com/
   v9: 
https://public-inbox.org/git/20180304201418.60958-1-lars.schnei...@autodesk.com/
  v10: 
https://public-inbox.org/git/20180307173026.30058-1-lars.schnei...@autodesk.com/

Base Ref:
Web-Diff: https://github.com/larsxschneider/git/commit/afc02ce2e0
Checkout: git fetch https://github.com/larsxschneider/git encoding-v11 && git checkout afc02ce2e0


### Interdiff (v10..v11):

diff --git a/Documentation/config.txt b/Documentation/config.txt
index d7a56054a5..7dcac9b540 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -531,10 +531,10 @@ core.autocrlf::
in which case no output conversion is performed.

 core.checkRoundtripEncoding::
-   A comma separated list of encodings that Git performs UTF-8 round
-   trip checks on if they are used in an `working-tree-encoding`
-   attribute (see linkgit:gitattributes[5]). The default value is
-   `SHIFT-JIS`.
+   A comma and/or whitespace separated list of encodings that Git
+   performs UTF-8 round trip checks on if they are used in a
+   `working-tree-encoding` attribute (see linkgit:gitattributes[5]).
+   The default value is `SHIFT-JIS`.

 core.symlinks::
If false, symbolic links are checked out as small plain files that
diff --git a/convert.c b/convert.c
index e861f1abbc..c2d24882c1 100644
--- a/convert.c
+++ b/convert.c
@@ -270,7 +270,7 @@ static int validate_encoding(const char *path, const char *enc,
  const char *data, size_t len, int die_on_error)
 {
/* We only check for UTF here as UTF?? can be an alias for UTF-?? */
-   if (startscase_with(enc, "UTF")) {
+   if (istarts_with(enc, "UTF")) {
/*
 * Check for detectable errors in UTF encodings
 */
@@ -284,12 +284,15 @@ static int validate_encoding(const char *path, const char *enc,
 */
const char *advise_msg = _(
"The file '%s' contains a byte order "
-   "mark (BOM). Please use %s as "
+   "mark (BOM). Please use UTF-%s as "
"working-tree-encoding.");
-   char *upper_enc = xstrdup_toupper(enc);
-   upper_enc[strlen(upper_enc)-2] = '\0';
-   advise(advise_msg, path, upper_enc);
-   free(upper_enc);
+   const char *stripped = "";
+   char *upper = xstrdup_toupper(enc);
+   upper[strlen(upper)-2] = '\0';
+   if (!skip_prefix(upper, "UTF-", &stripped))
+   skip_prefix(upper, "UTF", &stripped);
+   advise(advise_msg, path, stripped);
+   free(upper);
if (die_on_error)
die(error_msg, path, enc);
else {
@@ -301,12 +304,15 @@ 


Re: [PATCH v10 3/9] strbuf: add a case insensitive starts_with()

2018-03-09 Thread Junio C Hamano
Lars Schneider  writes:

> I think following the boost lib makes most sense. Therefore,
> I would like to go with "istarts_with". OK with you?

I don't care too deeply; if we took starts_with() from there, where
what we now want is defined as istarts_with(), then that sounds like
a good thing to do.

Thanks.


Re: What's cooking in git.git (Mar 2018, #02; Tue, 6)

2018-03-09 Thread Junio C Hamano
Martin Ågren  writes:

>>  Is this ready for 'next'?
>
> I am not aware of any open questions or issues. You thought out loud
> about how the series was structured, in particular about introducing a
> successful test, then redefining it, as opposed to introducing it as a
> failing test, then making it succeed. I hope I managed to motivate my
> choice better in v2 (which is what you have picked up).
>
> Duy wondered if it was sane to use a pager when we know that we are
> "--get"-ing at most one config item. In v2, I addressed this by turning
> on paging for a more careful selection of "--get"-ters.

Yeah, I am aware of these exchanges, and they are resolved nicely, I
think.  I was mostly asking if other people have concerns we haven't
thought of yet.

Let's merge this to 'next', then.

Thanks.
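
Concretely, the resulting behavior is roughly this (a sketch based on
the descriptions above; exactly which "--get"-ters page is the part v2
refined):

    git config --list                     # list mode: paged, pager.config now defaults to on
    git config --get-regexp 'alias\..*'   # get-mode with possibly long output: paged
    git config core.editor vim            # writing a value: never paged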


Re: How to use filter-branch with --state-branch?

2018-03-09 Thread Michele Locati
2018-03-09 14:23 GMT+01:00 Ian Campbell :
> On Fri, 2018-03-09 at 14:04 +0100, Michele Locati wrote:
>> Just a couple of questions:
>>
>> 1. it seems to me it's not possible to process all the branches in one
>> go. Am I right?
>
> I'm not sure, I've never done such a thing, in fact I didn't know you
> could.
>
> Really all this feature does is record the `.git/rewrite-map` (or
> whatever the correct name is) at the end of the rewrite and reinstate
> it again the next time, so it shouldn't really interact with many of
> the other options.
>
> My method for storing "last version processed" in a branch does
> conflict I suppose (since that branch would be rewritten) but that's an
> artefact of the surrounding scaffolding -- you could equally well keep
> the record in some file on the local system or in a non-branch-ish ref
> (I guess).
>
>> 2. Why do you have this line in filter.sh?
>> `rm -f .git/refs/original/refs/heads/${UPSTREAM_REWRITTEN}`
>
> TBH I'm not really sure. I originally wrote this patch many years ago,
> it's just recently that I got around to upstreaming, so my memory is
> more fuzzy than might be expected.
>
> I think perhaps I was trying to avoid this error:
>
> A previous backup already exists in $orig_namespace
> Force overwriting the backup with -f"
>
> which comes if there is an existing backup (a safety feature in the
> non-incremental case).
>
> Not quite sure why I didn't use `-f` as the message says, but I guess
> because it forces other things too which I didn't want to do?
>
> Perhaps what I should have done is make that check conditional on the
> use of --state-branch.
>
> I wonder if you could use the `original/refs/...` as the "last version
> processed"? Would be a lot less manual messing around than what I do!
>
> Ian.


I managed to get a general script that seems to work: see
https://github.com/mlocati/incremental-git-filter-branch

Thanks again, Ian.

--
Michele
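
For reference, a minimal incremental flow along the lines Ian describes
(hypothetical branch names; the index-filter is only an example):

    # first run: rewrite everything, record the commit map in a state branch
    git filter-branch --state-branch refs/heads/filter-state \
            --index-filter 'git rm -q --cached --ignore-unmatch secret.txt' \
            -- master
    git branch -f last-processed master

    # later runs: rewrite only what arrived since the last run
    git filter-branch --state-branch refs/heads/filter-state \
            --index-filter 'git rm -q --cached --ignore-unmatch secret.txt' \
            -- last-processed..master
    git branch -f last-processed master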


Re: [RFC] Rebasing merges: a jorney to the ultimate solution(RoadClear)

2018-03-09 Thread Johannes Schindelin
Hi Junio,

On Thu, 8 Mar 2018, Junio C Hamano wrote:

> Johannes Schindelin  writes:
> 
> >> If we are talking about a drastic change, a few more days may not be
> >> sufficient, but we are not in a hurry, as this already sounds like a
> >> 2.18 material anyway.
> >
> > It is not at all a drastic change. It will actually make the current patch
> > series better (simplifying the "can we fast-forward?" check).
> >
> > I just want to make sure that I already have Phillip's strategy working,
> > but it will be yet another topic branch on top of the topic branch that
> > will add support for octopus merges *after* the current --recreate-merges
> > topic branch ;-)
> 
> Oh, if the "not redoing the merge afresh, but attempt to reuse the
> previous merge" that was discussed is going to be done as an
> update/addition to the "redo the merge afresh" you already had in
> production forever (and I had in 'pu' for quite a while in various
> polished-ness during iteration), then I do prefer merging down what
> has already proven to be 'next' worthy without waiting for the
> discussion and your local verification of Phillip's new thing,
> especially given that you'll be giving an explicit control to the
> users which variant of "merge" insn will be used and the addition
> of the Phillip's thing won't be a backward-compatibility issue when
> it comes later.

I would like to stress that the `--recreate-merges` functionality has
*not* been in production forever. It is a reimplementation in pure C of
the Unix shell script that has been in production for five years
(unchanged only for the last half year, the last critical fix happened in
January last year).

So I do see the value here to use `next` as test bed, and once the patch
series will hit `next`, I will also merge it into Git for Windows (as an
EXPERIMENTAL feature).

As such, I am quite comfortable with refactoring a bit here and there, for
example how to handle the case where a `merge` can be fast-forwarded.

I think the changes I made in the last few days were an actual improvement
to readability, even if the only reason for those changes was that I
wanted to accommodate the "rebase merge commits" thing.

FWIW I am fine with bumping this down to 2.18. I got a little bit of
valuable feedback at GitMerge, e.g. that --recreate-merges does not (yet)
have a mode where it can update refs corresponding to the rebased commits.
(This could actually turn out to be independent of --recreate-merges,
even).

I already pushed out the updated branch, and it can be inspected at

https://github.com/git/git/pull/447

The reason I did not send this out yet is that I want to give it a final
look-over myself, so that I do not waste people's time again (as I did
with that monster interdiff that was pretty bogus, too).

Ciao,
Dscho


Re: [PATCH v10 7/9] convert: check for detectable errors in UTF encodings

2018-03-09 Thread Lars Schneider

> On 07 Mar 2018, at 19:04, Eric Sunshine  wrote:
> 
> On Wed, Mar 7, 2018 at 12:30 PM,   wrote:
>> Check that new content is valid with respect to the user defined
>> 'working-tree-encoding' attribute.
>> 
>> Signed-off-by: Lars Schneider 
>> ---
>> diff --git a/convert.c b/convert.c
>> @@ -266,6 +266,58 @@ static int will_convert_lf_to_crlf(size_t len, struct text_stat *stats,
>> +static int validate_encoding(const char *path, const char *enc,
>> + const char *data, size_t len, int die_on_error)
>> +{
>> +   /* We only check for UTF here as UTF?? can be an alias for UTF-?? */
>> +   if (startscase_with(enc, "UTF")) {
>> +   /*
>> +* Check for detectable errors in UTF encodings
>> +*/
>> +   if (has_prohibited_utf_bom(enc, data, len)) {
>> +   const char *error_msg = _(
>> +   "BOM is prohibited in '%s' if encoded as 
>> %s");
>> +   /*
>> +* This advice is shown for UTF-??BE and UTF-??LE encodings.
>> +* We cut off the last two characters of the encoding name
>> +# to generate the encoding name suitable for BOMs.
> 
> s/#/*/

Of course!


>> diff --git a/t/t0028-working-tree-encoding.sh b/t/t0028-working-tree-encoding.sh
>> @@ -62,6 +62,46 @@ test_expect_success 'check $GIT_DIR/info/attributes support' '
>> for i in 16 32
>> do
>> +   test_expect_success "check prohibited UTF-${i} BOM" '
>> +   test_when_finished "git reset --hard HEAD" &&
>> +
>> +   echo "*.utf${i}be text working-tree-encoding=utf-${i}be" 
>> >>.gitattributes &&
>> +   echo "*.utf${i}le text working-tree-encoding=utf-${i}le" 
>> >>.gitattributes &&
> 
> v10 is checking only hyphenated lowercase encoding name; earlier
> versions checked uppercase. For better coverage, it would be nice to
> check several combinations: all uppercase, all lowercase, mixed case,
> hyphenated, not hyphenated.
> 
> I'm not suggesting running all the tests repeatedly but rather just
> varying the format of the encoding name in these tests you're adding.
> For instance, the above could instead be:
> 
>echo "*.utf${i}be text working-tree-encoding=UTF-${i}be" >>.gitattributes 
> &&
>echo "*.utf${i}le text working-tree-encoding=utf${i}LE" >>.gitattributes &&
> 
> or something.

The casing is a good idea - I will do that. I don't want to do "hyphenated, not 
hyphenated" as this would make the tests fail on macOS (and I believe on 
Windows).


Thanks,
Lars

Re: [PATCH v10 3/9] strbuf: add a case insensitive starts_with()

2018-03-09 Thread Lars Schneider

> On 09 Mar 2018, at 00:12, Junio C Hamano  wrote:
> 
> Duy Nguyen  writes:
> 
>>> extern int starts_with(const char *str, const char *prefix);
>>> +extern int startscase_with(const char *str, const char *prefix);
>> 
>> This name is a bit hard to read. Boost [1] goes with istarts_with. I
>> wonder if it's better. If not I guess either starts_with_case or
>> starts_case_with will improve readability.
> 
> starts_with_case() sounds quite strange even though
> starts_with_icase() may make it clear that it is "a variant of
> starts_with() function that ignores case".  I dunno.
> dir.c::cmp_icase() takes the _icase suffix for not quite the way
> that is consistent with that understanding, though.

I think following the boost lib makes most sense. Therefore,
I would like to go with "istarts_with". OK with you?

- Lars


git submodule update - reset instead of checkout?

2018-03-09 Thread Andreas Krey
Hi everyone,

I've been reading up on the current state of git submodules
(our customer's customers want to use them, causing a slight
groan from our customer).

The usability thing I wonder about - with git submodule update,
is it necessary to detach the head to the current (or upstream)
head, instead of resetting the current branch to that state?

Primary interest in the question: Seeing 'detached head' scares
almost everybody. To brainstorm:

- as we can already use 'submodule update --remote' to update
  to the remote head of a branch, it would be logical to have
  that branch checked out locally (and unfortunately, potentially
  have that branch's name conflict with the remote branch setup).

- when developers more or less accidentally commit on the detached
  head, all is not lost yet (I remember this being different),
  but if the next 'submodule update' just resets again, the commit
  just made is still dropped, just as in the detached head state.

- So, we'd need to have 'submodule update' act closer to the pull or
  rebase counterparts and refuse to just lose commits (or uncommitted
  changes).

Having a checked-out branch in the submodule would have the advantage
that I can 'just' push local commits. At the moment, doing that requires
a relatively intricate dance, not at all similar to working in the
base (parent) module.

I'm working on hooks that automatically update the submodules after
a commit change (merge, pull, checkout) in the parent module, but
with the additional feature of (a) keeping a branch checked out
and (b) avoid discarding local changes. Probably means I shouldn't
invoke 'submodule update' at all, and deal with everything myself.
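
As a rough sketch of that manual approach (assuming each submodule's
remote is called 'origin' and falling back to 'master' when no branch is
configured in .gitmodules):

    # update every submodule by fast-forwarding a real branch instead of
    # detaching HEAD; --ff-only refuses to silently drop local commits
    git submodule foreach '
            branch=$(git config -f "$toplevel/.gitmodules" submodule."$name".branch || echo master)
            git fetch origin &&
            (git checkout "$branch" 2>/dev/null || git checkout -b "$branch" "origin/$branch") &&
            git merge --ff-only "origin/$branch"
    '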

Any thoughts/comments/helpful hints?

(Additional complexity: egit/jgit is in use as well, and the work model
we will arrive at probably needs to work with the current egit.)

- Andreas

-- 
"Totally trivial. Famous last words."
From: Linus Torvalds 
Date: Fri, 22 Jan 2010 07:29:21 -0800


Re: [PATCH] fetch-pack.c: use oidset to check existence of loose object

2018-03-09 Thread Takuto Ikuta
2018-03-09 3:42 GMT+09:00 Junio C Hamano :
> Takuto Ikuta  writes:
>> This patch stores existing loose objects in hashmap beforehand and use
>> it to check existence instead of using lstat.
>>
>> With this patch, the number of lstat calls in `git fetch` is reduced
>> from 411412 to 13794 for chromium repository.
>>
>> I took time stat of `git fetch` disabling quickfetch for chromium
>> repository 3 time on linux with SSD.
>
> Now you drop a clue that would help to fill in the blanks above, but
> I am not sure what the significance of your having to disable
> quickfetch in order to take measurements---it makes it sound as if
> it is an artificial problem that does not exist in real life
> (i.e. when quickfetch is not disabled), but I am getting the feeling
> that it is not what you wanted to say here.
>

Yes, I just wanted to say that 'git fetch' invokes fetch-pack.
fetch-pack is skipped by quickfetch when git fetch is run repeatedly
while the remote has no updates. So I disabled it to see the
performance of fetch-pack. In the chromium repository, the master
branch is updated several times an hour, so git fetch invokes
fetch-pack at that frequency.

> In any case, do_fetch_pack() tries to see if all of the tip commits
> we are going to fetch exist locally, so when you are trying a fetch
> that grabs huge number of refs (by the way, it means that the first
> sentence of the proposed log message is not quite true---it is "When
> fetching a large number of refs", as it does not matter how many
> refs _we_ have, no?), everything_local() ends up making repeated
> calls to has_object_file_with_flags() to all of the refs.
>

I fixed the description by changing 'refs' to 'remote refs'. In my
understanding, git tries to check the existence of remote refs even if
we won't fetch them.

> I like the idea---this turns "for each of these many things, check
> if it exists with lstat(2)" into "enumerate what exists with
> lstat(2), and then use that for the existence test"; if you need to
> try N objects for existence, and you only have M objects loose where
> N is vastly larger than M, it will be a huge win.  If you have very
> many loose objects and checking only a handful of objects for
> existence check, you would lose big, though, no?
>

Yes. But I think the default limit for the number of loose objects, 7000,
gives us only a small overhead when we enumerate all objects.

>> diff --git a/fetch-pack.c b/fetch-pack.c
>> index d97461296..1658487f7 100644
>> --- a/fetch-pack.c
>> +++ b/fetch-pack.c
>> @@ -711,6 +711,15 @@ static void mark_alternate_complete(struct object *obj)
>>   mark_complete(&obj->oid);
>>  }
>>
>> +static int add_loose_objects_to_set(const struct object_id *oid,
>> + const char *path,
>> + void *data)
>> +{
>> + struct oidset* set = (struct oidset*)(data);
>
> Style: in our codebase, asterisk does not stick to the type.
>
> struct oidset *set = (struct oidset *)(data);
>
>> @@ -719,16 +728,21 @@ static int everything_local(struct fetch_pack_args *args,
>>   int retval;
>>   int old_save_commit_buffer = save_commit_buffer;
>>   timestamp_t cutoff = 0;
>> + struct oidset loose_oid_set = OIDSET_INIT;
>> +
>> + for_each_loose_object(add_loose_objects_to_set, &loose_oid_set, 0);
>
> OK, so this is the "enumerate all loose objects" phase.
>
>>   save_commit_buffer = 0;
>>
>>   for (ref = *refs; ref; ref = ref->next) {
>>   struct object *o;
>> + unsigned int flag = OBJECT_INFO_QUICK;
>
> Hmm, OBJECT_INFO_QUICK optimization was added in dfdd4afc
> ("sha1_file: teach sha1_object_info_extended more flags",
> 2017-06-21), but since 8b4c0103 ("sha1_file: support lazily fetching
> missing objects", 2017-12-08) it appears that passing
> OBJECT_INFO_QUICK down the codepath does not do anything
> interesting.  Jonathan (cc'ed), are all remaining hits from "git
> grep OBJECT_INFO_QUICK" all dead no-ops these days?
>

Yes, the flag is a no-op now, but let me leave the flag untouched in this patch.

>> diff --git a/sha1_file.c b/sha1_file.c
>> index 1b94f39c4..c903cbcec 100644
>> --- a/sha1_file.c
>> +++ b/sha1_file.c
>> @@ -1262,6 +1262,9 @@ int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi,
>>   if (find_pack_entry(real, &e))
>>   break;
>>
>> + if (flags & OBJECT_INFO_SKIP_LOOSE)
>> + return -1;
>> +
>
> I cannot quite convince myself that this is done at the right layer;
> it smells to be at a bit too low a layer.  This change makes sense
> only to a caller that is interested in the existence test.  If the
> flag is named after what it does, i.e. "ignore loose object", then
> it does sort-of make sense, though.
>

I couldn't come up with a good alternative for this, so I followed your
flag name suggestion in patch v3.

Takuto


Re: [PATCH] fetch-pack.c: use oidset to check existence of loose object

2018-03-09 Thread Takuto Ikuta
2018-03-09 2:19 GMT+09:00 René Scharfe :
> Am 08.03.2018 um 13:06 schrieb Takuto Ikuta:
>> +static int add_loose_objects_to_set(const struct object_id *oid,
>> + const char *path,
>> + void *data)
>> +{
>> + struct oidset* set = (struct oidset*)(data);
>
> This cast is not needed (unlike in C++).  And the asterisk should be stuck
> to the variable, not the type (see Documentation/CodingGuidelines).
>
>> + oidset_insert(set, oid);
>
> In fact, you could just put "data" in here instead of "set" (without a
> cast), with no loss in readability or safety.
>

Thank you for review, changed to use data directly in v2.

>> + return 0;
>> +}
>> +
>>   static int everything_local(struct fetch_pack_args *args,
>>   struct ref **refs,
>>   struct ref **sought, int nr_sought)
>> @@ -719,16 +728,21 @@ static int everything_local(struct fetch_pack_args *args,
>>   int retval;
>>   int old_save_commit_buffer = save_commit_buffer;
>>   timestamp_t cutoff = 0;
>> + struct oidset loose_oid_set = OIDSET_INIT;
>> +
>> + for_each_loose_object(add_loose_objects_to_set, &loose_oid_set, 0);
>>
>>   save_commit_buffer = 0;
>>
>>   for (ref = *refs; ref; ref = ref->next) {
>>   struct object *o;
>> + unsigned int flag = OBJECT_INFO_QUICK;
>>
>> - if (!has_object_file_with_flags(&ref->old_oid,
>> - OBJECT_INFO_QUICK))
>> - continue;
>> + if (!oidset_contains(&loose_oid_set, &ref->old_oid))
>> + flag |= OBJECT_INFO_SKIP_LOOSE;
>>
>> + if (!has_object_file_with_flags(&ref->old_oid, flag))
>> + continue;
>>   o = parse_object(&ref->old_oid);
>>   if (!o)
>>   continue;
>> @@ -744,6 +758,8 @@ static int everything_local(struct fetch_pack_args *args,
>>   }
>>   }
>>
>> + oidset_clear(&loose_oid_set);
>> +
>
> This part looks fine to me.  (Except perhaps call the variable "flags"
> because you sometimes have two?)
>

Changed flag names.

> Why not include packed objects as well?  Probably because packs have
> indexes which can queried quickly to determine object existence, and
> because there are only few loose objects in typical repositories,
> right?
>

Correct. In my target repository, chromium, fetch-pack's slowness comes
from many lstat(2) calls on non-existent loose objects for remote refs.
I focused on removing those lstat calls.

> A similar cache was introduced by cc817ca3ef (sha1_name: cache
> readdir(3) results in find_short_object_filename()) to speed up
> finding unambiguous shorts object hashes.  I wonder if it could be
> used here as well, but I don't see an easy way.
>
>>   if (!args->no_dependents) {
>>   if (!args->deepen) {
>>   for_each_ref(mark_complete_oid, NULL);
>> diff --git a/sha1_file.c b/sha1_file.c
>> index 1b94f39c4..c903cbcec 100644
>> --- a/sha1_file.c
>> +++ b/sha1_file.c
>> @@ -1262,6 +1262,9 @@ int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi,
>>   if (find_pack_entry(real, &e))
>>   break;
>>
>> + if (flags & OBJECT_INFO_SKIP_LOOSE)
>> + return -1;
>> +
>>   /* Most likely it's a loose object. */
>>   if (!sha1_loose_object_info(real, oi, flags))
>>   return 0;
>>
>
> This early return doesn't just skip checking loose objects.  It
> also skips reloading packs and fetching missing objects for
> partial clones.  That may not be a problem for fetch-pack, but
> it means the flag has a misleading name.  Do you get the same
> performance improvement if you make it only skip that
> sha1_loose_object_info() call?
>

I changed the flag name following Junio's suggestion. If we skip the
sha1_loose_object_info call, reprepare_packed_git(...) runs for every
non-existing ref and git fetch becomes slower.

Takuto


[PATCH v3] fetch-pack.c: use oidset to check existence of loose object

2018-03-09 Thread Takuto Ikuta
In a repository with a large number of remote refs, 'git fetch' ends up
doing a lot of lstat(2) calls to check whether each ref exists in loose
form in the local repository, which makes it slow.

This patch enumerates loose objects in a hashmap beforehand and uses it
to check existence instead of using lstat(2), to improve the performance
of fetch-pack for repositories that have a large number of remote refs
compared to the number of loose objects.

With this patch, the number of lstat(2) calls in `git fetch` is reduced
from 411412 to 13794 for the chromium repository, which has more than 48
remote refs.

I took time stats of `git fetch` with quickfetch disabled, so that
fetch-pack runs, for the chromium repository, 3 times on Linux with an SSD.
* with this patch
8.105s
8.309s
7.640s
avg: 8.018s

* master
12.287s
11.175s
12.227s
avg: 11.896s

On my MacBook Air which has slower lstat(2).
* with this patch
14.501s

* master
1m16.027s

`git fetch` on a slow disk will be improved greatly.

Signed-off-by: Takuto Ikuta 
---
 cache.h  |  2 ++
 fetch-pack.c | 26 +++---
 sha1_file.c  |  3 +++
 3 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/cache.h b/cache.h
index d06932ed0..6a72f54d7 100644
--- a/cache.h
+++ b/cache.h
@@ -1773,6 +1773,8 @@ struct object_info {
 #define OBJECT_INFO_SKIP_CACHED 4
 /* Do not retry packed storage after checking packed and loose storage */
 #define OBJECT_INFO_QUICK 8
+/* Do not check loose object */
+#define OBJECT_INFO_IGNORE_LOOSE 16
extern int sha1_object_info_extended(const unsigned char *, struct object_info *, unsigned flags);
 
 /*
diff --git a/fetch-pack.c b/fetch-pack.c
index d97461296..aed4aa213 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -711,6 +711,14 @@ static void mark_alternate_complete(struct object *obj)
mark_complete(&obj->oid);
 }
 
+static int add_loose_objects_to_set(const struct object_id *oid,
+   const char *path,
+   void *data)
+{
+   oidset_insert(data, oid);
+   return 0;
+}
+
 static int everything_local(struct fetch_pack_args *args,
struct ref **refs,
struct ref **sought, int nr_sought)
@@ -719,16 +727,26 @@ static int everything_local(struct fetch_pack_args *args,
int retval;
int old_save_commit_buffer = save_commit_buffer;
timestamp_t cutoff = 0;
+   struct oidset loose_oid_set = OIDSET_INIT;
+
+   /* Enumerate all loose objects. */
+   for_each_loose_object(add_loose_objects_to_set, &loose_oid_set, 0);
 
save_commit_buffer = 0;
 
for (ref = *refs; ref; ref = ref->next) {
struct object *o;
+   unsigned int flags = OBJECT_INFO_QUICK;
 
-   if (!has_object_file_with_flags(&ref->old_oid,
-   OBJECT_INFO_QUICK))
-   continue;
+   if (!oidset_contains(&loose_oid_set, &ref->old_oid)) {
+   /* I know this does not exist in the loose form,
+* so check if it exists in a non-loose form.
+*/
+   flags |= OBJECT_INFO_IGNORE_LOOSE;
+   }
 
+   if (!has_object_file_with_flags(&ref->old_oid, flags))
+   continue;
o = parse_object(&ref->old_oid);
if (!o)
continue;
@@ -744,6 +762,8 @@ static int everything_local(struct fetch_pack_args *args,
}
}
 
+   oidset_clear(&loose_oid_set);
+
if (!args->no_dependents) {
if (!args->deepen) {
for_each_ref(mark_complete_oid, NULL);
diff --git a/sha1_file.c b/sha1_file.c
index 1b94f39c4..c0a197947 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1262,6 +1262,9 @@ int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi,
if (find_pack_entry(real, &e))
break;
 
+   if (flags & OBJECT_INFO_IGNORE_LOOSE)
+   return -1;
+
/* Most likely it's a loose object. */
if (!sha1_loose_object_info(real, oi, flags))
return 0;
-- 
2.16.2
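
One way to reproduce the lstat(2) counts quoted above on Linux (a
sketch; on newer kernels the stat calls may surface as newfstatat
instead):

    strace -f -e trace=lstat -o fetch.trace git fetch
    grep -c 'lstat(' fetch.trace   # number of lstat(2) calls

    strace -f -c git fetch         # alternatively, a per-syscall summary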



Re: How to use filter-branch with --state-branch?

2018-03-09 Thread Ian Campbell
On Fri, 2018-03-09 at 14:04 +0100, Michele Locati wrote:
> Just a couple of questions:
> 
> 1. it seems to me it's not possible to process all the branches in one
> go. Am I right?

I'm not sure, I've never done such a thing, in fact I didn't know you
could.

Really all this feature does is record the `.git/rewrite-map` (or
whatever the correct name is) at the end of the rewrite and reinstate
it again the next time, so it shouldn't really interact with many of
the other options.

My method for storing "last version processed" in a branch does
conflict I suppose (since that branch would be rewritten) but that's an
artefact of the surrounding scaffolding -- you could equally well keep
the record in some file on the local system or in a non-branch-ish ref
(I guess).

> 2. Why do you have this line in filter.sh?
> `rm -f .git/refs/original/refs/heads/${UPSTREAM_REWRITTEN}`

TBH I'm not really sure. I originally wrote this patch many years ago,
it's just recently that I got around to upstreaming, so my memory is
more fuzzy than might be expected.

I think perhaps I was trying to avoid this error:

A previous backup already exists in $orig_namespace
Force overwriting the backup with -f"

which comes if there is an existing backup (a safety feature in the
non-incremental case).

Not quite sure why I didn't use `-f` as the message says, but I guess
because it forces other things too which I didn't want to do?

Perhaps what I should have done is make that check conditional on the
use of --state-branch.

I wonder if you could use the `original/refs/...` as the "last version
processed"? Would be a lot less manual messing around than what I do!

Ian.


[PATCH v2 1/1] fetch-pack.c: use oidset to check existence of loose object

2018-03-09 Thread Takuto Ikuta
In a repository with a large number of remote refs, 'git fetch' ends up
doing a lot of lstat(2) calls to check whether each ref exists in loose
form in the local repository, which makes it slow.

This patch enumerates loose objects in a hashmap beforehand and uses it
to check existence instead of using lstat(2), to improve the performance
of fetch-pack for repositories that have a large number of remote refs
compared to the number of loose objects.

With this patch, the number of lstat(2) calls in `git fetch` is reduced
from 411412 to 13794 for the chromium repository, which has more than 48
remote refs.

I took time stats of `git fetch` with quickfetch disabled, so that
fetch-pack runs, for the chromium repository, 3 times on Linux with an SSD.
* with this patch
8.105s
8.309s
7.640s
avg: 8.018s

* master
12.287s
11.175s
12.227s
avg: 11.896s

On my MacBook Air which has slower lstat(2).
* with this patch
14.501s

* master
1m16.027s

`git fetch` on a slow disk will be improved greatly.

Signed-off-by: Takuto Ikuta 
---
 cache.h  |  2 ++
 fetch-pack.c | 26 +++---
 sha1_file.c  |  3 +++
 3 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/cache.h b/cache.h
index d06932ed0..6a72f54d7 100644
--- a/cache.h
+++ b/cache.h
@@ -1773,6 +1773,8 @@ struct object_info {
 #define OBJECT_INFO_SKIP_CACHED 4
 /* Do not retry packed storage after checking packed and loose storage */
 #define OBJECT_INFO_QUICK 8
+/* Do not check loose object */
+#define OBJECT_INFO_IGNORE_LOOSE 16
extern int sha1_object_info_extended(const unsigned char *, struct object_info *, unsigned flags);
 
 /*
diff --git a/fetch-pack.c b/fetch-pack.c
index d97461296..ef8b93424 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -711,6 +711,14 @@ static void mark_alternate_complete(struct object *obj)
mark_complete(&obj->oid);
 }
 
+static int add_loose_objects_to_set(const struct object_id *oid,
+   const char *path,
+   void *data)
+{
+   oidset_insert(data, oid);
+   return 0;
+}
+
 static int everything_local(struct fetch_pack_args *args,
struct ref **refs,
struct ref **sought, int nr_sought)
@@ -719,16 +727,26 @@ static int everything_local(struct fetch_pack_args *args,
int retval;
int old_save_commit_buffer = save_commit_buffer;
timestamp_t cutoff = 0;
+   struct oidset loose_oid_set = OIDSET_INIT;
+
+   /* Enumerate all loose objects. */
+   for_each_loose_object(add_loose_objects_to_set, &loose_oid_set, 0);
 
save_commit_buffer = 0;
 
for (ref = *refs; ref; ref = ref->next) {
struct object *o;
+   unsigned int flag = OBJECT_INFO_QUICK;
 
-   if (!has_object_file_with_flags(&ref->old_oid,
-   OBJECT_INFO_QUICK))
-   continue;
+   if (!oidset_contains(&loose_oid_set, &ref->old_oid)) {
+   /* I know this does not exist in the loose form,
+* so check if it exists in a non-loose form.
+*/
+   flag |= OBJECT_INFO_IGNORE_LOOSE;
+   }
 
+   if (!has_object_file_with_flags(&ref->old_oid, flag))
+   continue;
o = parse_object(&ref->old_oid);
if (!o)
continue;
@@ -744,6 +762,8 @@ static int everything_local(struct fetch_pack_args *args,
}
}
 
+   oidset_clear(&loose_oid_set);
+
if (!args->no_dependents) {
if (!args->deepen) {
for_each_ref(mark_complete_oid, NULL);
diff --git a/sha1_file.c b/sha1_file.c
index 1b94f39c4..c0a197947 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1262,6 +1262,9 @@ int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi,
if (find_pack_entry(real, &e))
break;
 
+   if (flags & OBJECT_INFO_IGNORE_LOOSE)
+   return -1;
+
/* Most likely it's a loose object. */
if (!sha1_loose_object_info(real, oi, flags))
return 0;
-- 
2.16.2



[PATCH v2 0/1] fetch-pack.c: use oidset to check existence of loose object

2018-03-09 Thread Takuto Ikuta
I removed an unnecessary type cast, added a comment in everything_local(...),
changed the flag name and updated the description.

Takuto Ikuta (1):
  fetch-pack.c: use oidset to check existence of loose object

 cache.h  |  2 ++
 fetch-pack.c | 26 +++---
 sha1_file.c  |  3 +++
 3 files changed, 28 insertions(+), 3 deletions(-)

--
2.16.2


Re: How to use filter-branch with --state-branch?

2018-03-09 Thread Michele Locati
2018-03-08 10:40 GMT+01:00 Ian Campbell :
>
> On Thu, 2018-03-08 at 10:25 +0100, Ævar Arnfjörð Bjarmason wrote:
>
> > > The first filter-branch call required 7168 steps, so did the second call...
> > > I also tried without the --prune option of remote update (I had to add
> > > --force to the second filter-branch), but nothing changed.
>
> You can see an example of the usage in:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/devicetree/devicetree-rebasing.git/
>
> in the `scripts/` sub dir (flow is `cronjob` → `filter.sh` → `git
> filter-branch...`).
>
> I think the big difference is rather than `--all` you need to give it
> the `previous..now` range since that is the update you wish to do
> (first time around you just give it `now`).
>
> The devicetree-rebasing scripting arranges that by keeping the previous
> in a separate branch.
>
> Ian.

Thank you for your quick reply, Ian.

Just a couple of questions:

1. it seems to me it's not possible to process all the branches in one
go. Am I right?

2. Why do you have this line in filter.sh?
`rm -f .git/refs/original/refs/heads/${UPSTREAM_REWRITTEN}`

Thank you again,
Michele


Re: [PATCH v1.1 1/2] t9400-git-cvsserver-server: don't rely on the output of 'test_cmp'

2018-03-09 Thread SZEDER Gábor
On Fri, Mar 9, 2018 at 12:44 AM, Junio C Hamano  wrote:
> SZEDER Gábor  writes:

>>> This makes "rm -f failures &&" unnecessary, no?
>>
>> Indeed, it does.
>
> OK, no need to resend, as I'll amend it locally before queuing
> (unless there is something else that needs updating, that is).

Thanks.

That 'rm' was unnecessary to begin with, because none of the previous
tests wrote and left behind a file 'failures', ever.  Though one could
argue that the original author was careful and added that 'rm failures'
as a protection against future changes to previous tests...  Dunno, we
are talking about a test that checked the stderr of 'test_cmp', so
anything is possible :)


Re: What's cooking in git.git (Mar 2018, #02; Tue, 6)

2018-03-09 Thread Duy Nguyen
On Fri, Mar 9, 2018 at 1:15 PM, Martin Ågren  wrote:
> On 7 March 2018 at 00:34, Junio C Hamano  wrote:
>
>> * ma/config-page-only-in-list-mode (2018-02-21) 3 commits
>>  - config: change default of `pager.config` to "on"
>>  - config: respect `pager.config` in list/get-mode only
>>  - t7006: add tests for how git config paginates
>>
>>  In a way similar to how "git tag" learned to honor the pager
>>  setting only in the list mode, "git config" learned to ignore the
>>  pager setting when it is used for setting values (i.e. when the
>>  purpose of the operation is not to "show").
>>
>>  Is this ready for 'next'?
>
> I am not aware of any open questions or issues. You thought out loud
> about how the series was structured, in particular about introducing a
> successful test, then redefining it, as opposed to introducing it as a
> failing test, then making it succeed. I hope I managed to motivate my
> choice better in v2 (which is what you have picked up).
>
> Duy wondered if it was sane to use a pager when we know that we are
> "--get"-ing at most one config item. In v2, I addressed this by turning
> on paging for a more careful selection of "--get"-ters.

Yeah I got busy with stuff and didn't look at it. I've just checked
what's in 'pu'. Looks good to me.
-- 
Duy


Re: recent glob expansion breakage on Windows?

2018-03-09 Thread Laszlo Ersek
On 03/08/18 23:03, Jonathan Nieder wrote:
> +git-for-windows
> Hi,
> 
> Laszlo Ersek wrote:
> 
>> Jaben reports that git-send-email is suddenly failing to expand the
>> "*.patch" glob for him, at the Windows CMD prompt:
>>
>> -
>> E:\...>git send-email --suppress-cc=author --suppress-cc=self --suppress-cc=cc --suppress-cc=sob --dry-run *.patch
>>
>> No patch files specified!
>> -
>>
>> Whereas, moving the same patch files to another subdir, and then passing
>> the subdir to git-send-email, works fine.
>>
>> I seem to have found some $^O based perl code in the git tree that
>> expands the glob manually (in place of the shell) on Windows -- however,
>> that code looks ancient ^W very stable, and doesn't seem to explain the
>> sudden breakage.
>>
>> Is it possible that a separate perl update on Jaben's Windows box is
>> causing this? Or does the breakage look consistent with a recent git change?
>>
>> Has anyone else reported something similar recently?
> 
> This reminds me of https://github.com/git-for-windows/git/issues/339.
> There, Johannes Schindelin writes (about a different command):
> 
> | This is expected because neither PowerShell nor cmd.exe nor git.exe
> | expand wildcards. Those examples you found were written with a shell
> | in mind, and the shell expands wildcards (hence Git does not think
> | it needs to).
> 
> That may or may not also apply to send-email.

Thank you for the reference -- I can't say whether closing issue #339 as
WONTFIX was justified or not, but it certainly seems inconsistent with
Jaben's earlier experience (to my understanding), i.e. that git did
expand the glob.

> In what version did it work?

Jaben, can you please answer that? (One version in which it is broken is
2.14.1.windows.1.) Can you perhaps ask your teammates about their
git/windows versions (assuming the *.patch glob is expanded correctly
for them)?

Thank you, Jonathan,
Laszlo
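
For the record, the workaround Laszlo mentions can be spelled out like
this at the CMD prompt (CMD's built-in move does its own wildcard
handling, and git-send-email accepts a directory and picks up the patch
files inside it):

    mkdir patches
    move *.patch patches
    git send-email --dry-run patches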


Re: [PATCH] Support long format for log-based submodule diff

2018-03-09 Thread Stefan Beller
On Wed, Mar 7, 2018 at 1:11 PM, Robert Dailey  wrote:
> I am experimenting with a version of submodule diff (using log style)
> that prints the commits brought in from merges, while excluding the
> merge commits themselves. This is useful in cases where a merge commit's
> summary does not fully explain the changes being merged (for example,
> for longer-lived branches).
>
> I could have gone through the effort to make this more configurable, but
> before doing that level of work I wanted to get some discussion going to
> understand first if this is a useful change and second how it should be
> configured. For example, we could allow:
>
> $ git diff --submodule=long-log
>
> Or a supplementary option such as:
>
> $ git diff --submodule=log --submodule-log-detail=(long|short)
>
> I'm not sure what makes sense here. I welcome thoughts/discussion and
> will provide follow-up patches.

The case of merges is usually configured with --[no-]merges, or
--min-parents=.

I would think we would want to have different settings per repository,
i.e. these settings would only apply to the superproject, however
we could keep the same names for submodules, such that we could do

git log --min-parents=0 --submodules=--no-merges

We started an effort to have a repository object handle in most functions
some time ago, but the option parsing for the revision walking doesn't
take a repository yet; otherwise, the generic revision parsing for
submodules would be easy to implement.

Thoughts on this generic approach?
Stefan

>
> Signed-off-by: Robert Dailey 
> ---
>  submodule.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/submodule.c b/submodule.c
> index 2967704317..a0a62ad7bd 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -428,7 +428,8 @@ static int prepare_submodule_summary(struct rev_info *rev, const char *path,
> init_revisions(rev, NULL);
> setup_revisions(0, NULL, rev, NULL);
> rev->left_right = 1;
> -   rev->first_parent_only = 1;
> +   rev->max_parents = 1;
> +   rev->first_parent_only = 0;
> left->object.flags |= SYMMETRIC_LEFT;
> add_pending_object(rev, >object, path);
> add_pending_object(rev, >object, path);
> --
> 2.13.1.windows.2
>