Change the filetype from binary to text after the file is committed to a git repo

2017-07-23 Thread tonka tonka
Hey everybody,

I have a problem with an already committed file in my repo. This git
repo was converted from svn to git some years ago. Last week I
changed some lines in a file and saw in the diff that it is marked as
binary (it's a simple .cpp file). I think on the first commit it was
detected as a UTF-16 file (on Windows). But no matter what I do, I
can't get it back to a "normal" text file (git does not detect
that), even though it is now pure UTF-8. I even replaced the whole
content of the file with just 'a' and git still says it's binary.


Is the only way to get it back to text mode to:
* copy a UTF-8 version of the original file
* delete the file
* make a commit
* add the old file as a new one

I think that will work, but it will also break my history.

Is there a better way to get this behavior without losing history?

Best regards
Tonka
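For what it's worth, Git decides text vs. binary by scanning roughly the first 8000 bytes of the blob for NUL bytes, and UTF-16 content is full of NULs, which matches the symptoms described above. A commonly suggested fix that keeps history intact is to convert the file to UTF-8 and pin the path as text in .gitattributes, rather than deleting and re-adding it. A minimal sketch in a throwaway repo (the file name foo.cpp is illustrative; assumes git and iconv are available):

```shell
set -e
cd "$(mktemp -d)" && git init -q .
git config user.email you@example.com
git config user.name you

# Simulate a UTF-16LE source file: every other byte is NUL.
printf 'i\0n\0t\0 \0m\0a\0i\0n\0' > foo.cpp
git add foo.cpp && git commit -q -m 'add UTF-16 file'

# Git's heuristic: a NUL byte in roughly the first 8000 bytes of the
# blob makes it "binary", which is why the diff says so.
head -c 8000 foo.cpp | od -An -tx1 | grep -q ' 00' && echo "looks binary to git"

# Fix without rewriting history: convert the content to UTF-8 and pin
# the path as text in .gitattributes so detection never flips again.
iconv -f UTF-16LE -t UTF-8 foo.cpp > foo.tmp && mv foo.tmp foo.cpp
printf 'foo.cpp text diff\n' > .gitattributes
git add .gitattributes foo.cpp
git commit -q -m 'convert foo.cpp to UTF-8, mark as text'
git check-attr text diff -- foo.cpp
```

With the attribute committed, subsequent diffs for the path are textual and no history rewriting is needed.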


Re: Should I store large text files on Git LFS?

2017-07-23 Thread Andrew Ardill
Hi Farshid,

On 24 July 2017 at 13:45, Farshid Zavareh  wrote:
> I'll probably test this myself, but would modifying and committing a 4GB
> text file actually add 4GB to the repository's size? I anticipate that it
> won't, since Git keeps track of the changes only, instead of storing a copy
> of the whole file (whereas this is not the case with binary files, hence the
> need for LFS).

I decided to do a little test myself. I added three versions of the same
data set (slightly different cuts of a parent data set, which I
don't have), each between 2 and 4 GB in size.
Each time I added a new version it added ~500MB to the repository, and
operations on the repository took 35-45 seconds to complete.
Running `git gc` compressed the objects fairly well, saving ~400MB of
space. I would imagine that even more space would be saved
(proportionally) if there were a lot more similar files in the repo.
The time to check out different commits didn't change much; I presume
that most of the time is spent copying the large file into the working
directory, but I didn't test that. I did test adding some other small
files, and sometimes it was slow (when cold, I think?) and other times
fast.

Overall, I think that as long as the files change rarely and the
repository remains responsive, having these large files in the
repository is OK. They're still big, and if most people will never use
them, it will be annoying for everyone to clone and check out updated
versions of the files. If you have a lot of such files, or they update
often, or most people don't need all of them, using something like
LFS will help a lot.

$ git version  # running on my windows machine at work
git version 2.6.3.windows.1

$ git init git-csv-test && cd git-csv-test
$ du -h --max-depth=2  # including here to compare after large data files are added
35K     ./.git/hooks
1.0K    ./.git/info
0       ./.git/objects
0       ./.git/refs
43K     ./.git
43K     .

$ git add data.txt  # first version of the data file, 3.2 GB
$ git commit
$ du -h --max-depth=2  # the data gets compressed down to ~580M of objects in the git store
35K     ./.git/hooks
1.0K    ./.git/info
2.0K    ./.git/logs
580M    ./.git/objects
1.0K    ./.git/refs
581M    ./.git
3.7G    .


$ git add data.txt  # second version of the data file, 3.6 GB
$ git commit
$ du -h --max-depth=1  # an extra ~520M of objects added
1.2G    ./.git
4.7G    .


$ time git add data.txt  # 42.344s - third version of the data file, 2.2 GB
$ git commit  # takes about 30 seconds to load editor
$ du -h --max-depth=1
1.7G    ./.git
3.9G    .

$ time git checkout HEAD^  # 36.509s
$ time git checkout HEAD^  # 44.658s
$ time git checkout master  # 38.267s

$ git gc
$ du -h --max-depth=1
1.3G    ./.git
3.4G    .

$ time git checkout HEAD^  # 34.743s
$ time git checkout HEAD^  # 41.226s

Regards,

Andrew Ardill


Re: Should I store large text files on Git LFS?

2017-07-23 Thread David Lang

On Mon, 24 Jul 2017, Farshid Zavareh wrote:

> I'll probably test this myself, but would modifying and committing a 4GB text
> file actually add 4GB to the repository's size? I anticipate that it won't,
> since Git keeps track of the changes only, instead of storing a copy of the
> whole file (whereas this is not the case with binary files, hence the need for
> LFS).


well, it wouldn't be 4G because text compresses well, but if the file changes 
drastically from version to version (say a quarterly report), the diff won't 
help.


David Lang


Re: Should I store large text files on Git LFS?

2017-07-23 Thread David Lang

On Mon, 24 Jul 2017, Farshid Zavareh wrote:


> I see your point. So I guess it really comes down to how the file is
> anticipated to change. If only one or two lines are going to change every
> now and then, then LFS is not really necessary. But, as you mentioned, text
> files that change drastically will affect the repository in the same way
> that binaries do.


Not quite the same way that binaries do, because text files compress well,
but close.


David Lang


Re: Should I store large text files on Git LFS?

2017-07-23 Thread Farshid Zavareh
I see your point. So I guess it really comes down to how the file is
anticipated to change. If only one or two lines are going to change every now
and then, then LFS is not really necessary. But, as you mentioned, text files
that change drastically will affect the repository in the same way that
binaries do.


> On 24 Jul 2017, at 2:13 pm, David Lang  wrote:
> 
> On Mon, 24 Jul 2017, Farshid Zavareh wrote:
> 
>> I'll probably test this myself, but would modifying and committing a 4GB 
>> text file actually add 4GB to the repository's size? I anticipate that it 
>> won't, since Git keeps track of the changes only, instead of storing a copy 
>> of the whole file (whereas this is not the case with binary files, hence the 
>> need for LFS).
> 
> well, it wouldn't be 4G because text compresses well, but if the file changes 
> drastically from version to version (say a quarterly report), the diff won't 
> help.
> 
> David Lang



Re: git gc seems to break --symbolic-full-name

2017-07-23 Thread Jacob Keller
On Sun, Jul 23, 2017 at 12:23 PM, Stas Sergeev  wrote:
> 23.07.2017 11:40, Jacob Keller wrote:
>>
>> On Fri, Jul 21, 2017 at 12:03 PM, Stas Sergeev  wrote:
>>>
>>> I wanted some kind of file to use as a
>>> build dependency for the files that need
>>> to be re-built when the head changes.
>>> This works very well besides git gc.
>>> What other method can be used as simply
>>> as that? git show-ref does not seem to be
>>> giving this.
>>
>> There's no real way to do this, and even prior to 2007 when the file
>> always existed, there's no guarantee its modification time is valid.
>>
>> I'd suggest you have a phony rule which you always run, that checks
>> the ref, and sees if it's different from "last time" and then updates
>> a different file if that's the case. Then the build can depend on the
>> generated file, and you'd be able to figure it out.
>
> OK, thanks, that looks quite simple too.
> I will have to create by hand the file that
> I expected git to already have, but it appears
> not.
>
>> What's the real goal for depending on when the ref changes?
>
> So that when users file a bug report, I can
> see at what revision the bug happened. :)
> While seemingly "just a debugging sugar", the
> hard experience shows this to be exceptionally
> useful.
> I think even the Linux kernel does something like
> this, and solves that task the hard way. For
> example, I can see a script at scripts/setlocalversion
> whose output seems to go to
> include/config/kernel.release, and there is a lot of
> logic in the top-level makefile about this.
> So not liking the fact that every project solves
> this differently, I was trying to get the solution
> directly from git. But I'll try otherwise.

Generally, I'd suggest using "git describe" to output a version based
on a tag, and as part of your build system set that in some sort of
--version output.

Thanks,
Jake
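Jacob's phony-rule suggestion can be sketched as a small helper that a `.PHONY` make target runs on every build; all names here (refresh_version, version.h, GIT_VERSION) are illustrative:

```shell
cd "$(mktemp -d)"   # demo in a scratch directory; outside a repo,
                    # `git describe` fails and we fall back to "unknown"

refresh_version() {
    # Rewrite version.h only when the described commit actually changed,
    # so targets that depend on version.h rebuild only on real changes.
    new=$(git describe --always --dirty 2>/dev/null || echo unknown)
    old=$(sed -n 's/.*GIT_VERSION "\(.*\)".*/\1/p' version.h 2>/dev/null)
    if [ "$new" != "$old" ]; then
        printf '#define GIT_VERSION "%s"\n' "$new" > version.h
        echo "version.h refreshed: $new"
    fi
}

refresh_version     # first run writes the file
refresh_version     # second run is a no-op, mtime untouched
cat version.h
```

In a Makefile, a `.PHONY: version` target would invoke this unconditionally, while the objects embedding the version depend on version.h, giving exactly the "updates a different file only when the ref changed" behavior described above.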


Re: Should I store large text files on Git LFS?

2017-07-23 Thread Farshid Zavareh
Hi Andrew.

Thanks for your reply.

I'll probably test this myself, but would modifying and committing a 4GB text 
file actually add 4GB to the repository's size? I anticipate that it won't, 
since Git keeps track of the changes only, instead of storing a copy of the 
whole file (whereas this is not the case with binary files, hence the need for 
LFS).

Kind regards,
Farshid

> On 24 Jul 2017, at 12:29 pm, Andrew Ardill  wrote:
> 
> Hi Farshid,
> 
> On 24 July 2017 at 12:01, Farshid Zavareh  wrote:
>> I've been handed a project that uses Git LFS for storing large CSV files.
>> 
>> My understanding is that the main benefit of using Git LFS is to keep the 
>> repository small for binary files, where Git can't keep track of the changes 
>> and ends up storing whole files for each revision. For a text file, that 
>> problem does not exist to begin with and Git can store only the changes. At 
>> the same time, this is going to make checkouts unnecessarily slow, not to 
>> mention the financial cost of storing the whole file for each revision.
>> 
>> Is there something I'm missing here?
> 
> Git LFS gives benefits when working on *large* files, not just large
> *binary* files.
> 
> I can imagine a few reasons for using LFS for some CSV files
> (especially the kinds of files I deal with sometimes!).
> 
> The main one is that many users don't need or want to download the
> large files, or all versions of the large file. Moreover, you probably
> don't care about changes between those files, or there would be so
> many that using the git machinery for comparing them would be
> cumbersome and ineffective.
> 
>> For me, if I was storing any CSV file over a couple of hundred
>> megabytes I would consider using something like LFS. An example would
>> be a large Dun & Bradstreet data file, which I do an analysis on
> every quarter. I want to include the file in the repository, so that
> the analysis can be replicated later on, but I don't want to add 4GB
> of data to the repo every single time the dataset gets updated (also
> every quarter). Storing that in LFS would be a good solution then.
> 
> Regards,
> 
> Andrew Ardill



Re: Should I store large text files on Git LFS?

2017-07-23 Thread Andrew Ardill
Hi Farshid,

On 24 July 2017 at 12:01, Farshid Zavareh  wrote:
> I've been handed a project that uses Git LFS for storing large CSV files.
>
> My understanding is that the main benefit of using Git LFS is to keep the 
> repository small for binary files, where Git can't keep track of the changes 
> and ends up storing whole files for each revision. For a text file, that 
> problem does not exist to begin with and Git can store only the changes. At 
> the same time, this is going to make checkouts unnecessarily slow, not to 
> mention the financial cost of storing the whole file for each revision.
>
> Is there something I'm missing here?

Git LFS gives benefits when working on *large* files, not just large
*binary* files.

I can imagine a few reasons for using LFS for some CSV files
(especially the kinds of files I deal with sometimes!).

The main one is that many users don't need or want to download the
large files, or all versions of the large file. Moreover, you probably
don't care about changes between those files, or there would be so
many that using the git machinery for comparing them would be
cumbersome and ineffective.

For me, if I was storing any CSV file over a couple of hundred
megabytes I would consider using something like LFS. An example would
be a large Dun & Bradstreet data file, which I do an analysis on
every quarter. I want to include the file in the repository, so that
the analysis can be replicated later on, but I don't want to add 4GB
of data to the repo every single time the dataset gets updated (also
every quarter). Storing that in LFS would be a good solution then.

Regards,

Andrew Ardill
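The LFS setup Andrew describes is a one-liner per pattern; `git lfs track` simply appends a filter rule to .gitattributes (the pattern below is illustrative, and git-lfs must be installed):

```
# Written to .gitattributes by `git lfs track "data/*.csv"`:
data/*.csv filter=lfs diff=lfs merge=lfs -text
```

After that, `git add` stores only a small pointer file in the repository and pushes the actual content to the LFS store, so each quarterly 4GB update costs the repo a pointer, not 4GB of history.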


Re: [PATCH] PRItime: wrap PRItime for better l10n compatibility

2017-07-23 Thread Jiang Xin
2017-07-23 10:33 GMT+08:00 Jean-Noël AVILA :
> Plus, I hope that some day, instead of translators finding out afterwards
> that a change broke i18n capabilities, developers would have some kind
> of sanity check. Requiring special versions of the i18n tooling dashes that hope.
>

It would be fun to create some tools to help l10n folks find l10n
changes in every git commit.


-- 
Jiang Xin


Should I store large text files on Git LFS?

2017-07-23 Thread Farshid Zavareh
Hey all. 

I've been handed a project that uses Git LFS for storing large CSV files.

My understanding is that the main benefit of using Git LFS is to keep the 
repository small for binary files, where Git can't keep track of the changes 
and ends up storing whole files for each revision. For a text file, that 
problem does not exist to begin with and Git can store only the changes. At the 
same time, this is going to make checkouts unnecessarily slow, not to mention 
the financial cost of storing the whole file for each revision.

Is there something I'm missing here?

Thanks

Re: [PATCH] PRItime: wrap PRItime for better l10n compatibility

2017-07-23 Thread Jiang Xin
2017-07-22 23:48 GMT+08:00 Junio C Hamano :
> Johannes Schindelin  writes:
>
>>> >> A very small hack on gettext.
>>
>> I am 100% opposed to this hack. It is already cumbersome enough to find
>> out what is involved in i18n (it took *me* five minutes to find out that
>> much of the information is in po/README, with a lot of information stored
>> *on an external site*, and I still managed to miss the `make pot` target).
>>
>> If at all, we need to make things easier instead of harder.
>>
>> Requiring potential volunteers to waste their time to compile an
>> unnecessary fork of gettext? Not so great an idea.
>>
>> Plus, each and every Git build would now have to compile their own
>> gettext, too, as the vanilla one would not handle the .po files containing
>> %!!!
>>
>> And that requirement would impact instantaneously people like me, and even
>> worse: some other packagers might be unaware of the new requirement which
>> would not be caught during the build, and neither by the test suite.
>> Double bad idea.
>
> If I understand correctly, the patch hacks the input processing of
> xgettext (which reads our source code and generates po/git.pot) so
> that when it sees PRItime, pretend that it saw PRIuMAX, causing it
> to output % in its output.
>
> In our workflow,
>
> * The po/git.pot file is updated only by the l10n coordinator,
>   and then the result is committed to our tree.
>
> * Translators build on that commit by (1) running msgmerge which
>   takes po/git.pot and wiggles its entries into their existing
>   po/$lang.po file so that po/$lang.po file has new entries from
>   po/git.pot and (2) editing po/$lang.po file.  The result is
>   committed to our tree.
>
> * The build procedure builders use runs the resulting
>   po/$lang.po files through msgfmt to produce po/$lang.mo files,
>   which will be installed.
>
> As long as the first step results in % (not % or
> anything that plain vanilla msgmerge and msgfmt do not understand),
> the second step and third step do not require any hacked version of
> gettext tools.
>
> Even though I tend to agree with your conclusion that pre-processing
> our source before passing it to xgettext is probably a better
> solution in the longer term, I think the most of the objections in
> your message come from your misunderstanding of what Jiang's patch
> does and are not based on facts.  My understanding is that
> translators do not need to compile a custom msgmerge and builders do
> not need a custom msgfmt.
>

I appreciate Junio's explanation. I totally agree.

-- 
Jiang Xin


Re: [PATCH] PRItime: wrap PRItime for better l10n compatibility

2017-07-23 Thread Jiang Xin
2017-07-22 19:28 GMT+08:00 Johannes Schindelin :
> Hi,
>
> On Sat, 22 Jul 2017, Jiang Xin wrote:
>
>> 2017-07-22 7:34 GMT+08:00 Junio C Hamano :
>> > Jiang Xin  writes:
>> >
>> >> A very small hack on gettext.
>
> I am 100% opposed to this hack.

It's really very small, see:

*  https://github.com/jiangxin/gettext/commit/b0a72643
*  
https://public-inbox.org/git/a87e7252bf9de8a87e5dc7712946f72459778d6c.1500684532.git.worldhello@gmail.com/

> It is already cumbersome enough to find
> out what is involved in i18n (it took *me* five minutes to find out that
> much of the information is in po/README, with a lot of information stored
> *on an external site*, and I still managed to miss the `make pot` target).
>
> If at all, we need to make things easier instead of harder.

If it is only the l10n coordinator's duty to generate po/git.pot, the
tweak is OK.  But if other people need to recreate po/git.pot, it's
hard, especially for those working on Mac or Windows.

>
> Requiring potential volunteers to waste their time to compile an
> unnecessary fork of gettext? Not so great an idea.
>
> Plus, each and every Git build would now have to compile their own
> gettext, too, as the vanilla one would not handle the .po files containing
> %!!!

No, only the l10n coordinator and potential po/git.pot generators are involved.

>
> So let's go with Junio's patch.

I agree.  We'll just go with the sed-then-cleanup version until we run
into ambiguities (I mean, some words other than PRItime needing to be
replaced).

-- 
Jiang Xin


Re: reftable [v2]: new ref storage format

2017-07-23 Thread Ævar Arnfjörð Bjarmason

On Sun, Jul 23 2017, Shawn Pearce jotted:

> My apologies for not responding to this piece of feedback earlier.
>
> On Wed, Jul 19, 2017 at 7:02 AM, Ævar Arnfjörð Bjarmason
>  wrote:
>> On Tue, Jul 18 2017, Shawn Pearce jotted:
>>> On Mon, Jul 17, 2017 at 12:51 PM, Junio C Hamano  wrote:
 Shawn Pearce  writes:
> where `time_sec` is the update time in seconds since the epoch.  The
> `reverse_int32` function inverses the value so lexographical ordering
> the network byte order time sorts more recent records first:
>
> reverse_int(int32 t) {
>   return 0xffffffff - t;
> }

 Is 2038 an issue, or by that time we'd all be retired together with
 this file format and it won't be our problem?
>>>
>>> Based on discussion with Michael Haggerty, this is now an 8 byte field
>>> storing microseconds since the epoch. We should be good through year
>>> .
>>
>> I think this should be s/microseconds/nanoseconds/, not because there's
>> some great need to get better resolution than nanoseconds, but because:
>>
>>  a) We already have WIP code (bp/fsmonitor) that's storing 64 bit
>> nanoseconds since the epoch, albeit for the index, not for refs.
>>
>>  b) There are several filesystems that have nanosecond resolution now,
>> and it's likely more will start using that.
>
> The time in a reflog and the time returned by lstat(2) to detect dirty
> files in the working tree are unrelated. Of course we want the
> dircache to reflect the highest precision available from lstat,
> to reduce the number of files that must be content-hashed for racily
> clean detection. So if a filesystem is using nanoseconds, the dircache
> maybe should support it.
>
>> Thus:
>>
>>  x) If you use such a filesystem you'll lose time resolution with this
>> ref backend v.s. storing them on disk, which isn't itself a big
>> deal, but more importantly you lose 1=1 time mapping as you
>> transition and convert between the two.
>
> No, you won't. The reflog today ($GIT_DIR/logs) is storing second
> precision in the log record. What precision the filesystem is using as
> an mtime is irrelevant.

To this & the point above: sorry about being unclear; I'm talking about
the mtime on the modified loose ref. This format proposes to replace
both loose & packed refs, does it not? The reflog time is not the only
place where we store the mtime of a ref. On my local ext4:

$ tail -n 1 .git/logs/refs/heads/master
  Ævar Arnfjörð Bjarmason  1500852355 +0200   
commit: test
$ perl -wE 'say ~~localtime shift' 1500852355
Mon Jul 24 01:25:55 2017
$ stat -c %y .git/logs/refs/heads/master
2017-07-24 01:25:55.531379799 +0200

Of course you lose this information as soon as you "git pack-refs", but
it's there now & implicitly part of our current FS-backed on-disk
format.

So what I meant by "x" is that if, to test this new reftable backend, you
write a "git pack-reftable", you won't be able to map it 1=1 to the
mtimes you have on the fs showing when the ref was updated, but I see
now that you were perhaps never intending to use the more accurate FS
time at all for the loose refs, but just the second-resolution
reflog data.

> Further, microsecond is sufficient resolution for reflog data. From my
> benchmarking just reading a reference from a very hot reftable costs
> ~20.2 usec. Any update of a reference requires a read-compare-modify
> cycle, and so updates aren't going to be more frequent than 20 usec.

Right, I'm not arguing that it isn't sufficient, just that it's
introducing a needless variation by adding a third timestamp resolution
to git.

Even if it's not the same logical area in git (dir management vs. ref
management), code to e.g. pretty-format timestamps of sec/usec/nsec
resolution would tend to get shared, so we'd end up with 3 variants of
those instead of 2.

That's of course trivial, but so would be just deciding that ~500 years
of future proofing is good enough without any extra storage size for
those 64 bits and doing away with 1/3.

Just standardizing that makes more sense than picking the exact right
time resolution for every use case, IMO. Otherwise we'll come up with
some other thingy in the future that just needs e.g. millisecond in its
format, and then end up with 4 variants.

I also see from "Update transactions" that unlike the current loose
backend the reftable backend wouldn't support multiple writers on
multiple machines (think NFS-mounted git master) updating unrelated
refs, which would break this usec assumption (but which holds due to the
locking involved in the new backend).

>>  y) Our own code will need to juggle second resolution epochs
>> (traditional FSs, any 32bit epoch format), microseconds (this
>> proposal), and nanoseconds (new FSs, bp/fsmonitor) internally in
>> various places.
>
> But these are also unrelated areas. IMHO, the nanosecond stuff 

Re: reftable: new ref storage format

2017-07-23 Thread Shawn Pearce
On Sun, Jul 23, 2017 at 3:56 PM, Shawn Pearce  wrote:
> On Mon, Jul 17, 2017 at 6:43 PM, Michael Haggerty  
> wrote:
>> On Sun, Jul 16, 2017 at 12:43 PM, Shawn Pearce  wrote:
>>> On Sun, Jul 16, 2017 at 10:33 AM, Michael Haggerty  
>>> wrote:
>
>> * What would you think about being extravagant and making the
>> value_type a full byte? It would make the format a tiny bit easier to
>> work with, and would leave room for future enhancements (e.g.,
>> pseudorefs, peeled symrefs, support for the successors of SHA-1s)
>> without having to change the file format dramatically.
>
> I reran my 866k file with a full-byte value_type. It pushes up the
> average bytes per ref from 33 to 34, but the overall file size is
> still 28M (with 64 block size). I think it's reasonable to expand this
> to the full byte as you suggest.

FYI, I went back on this in the v3 draft I posted on Jul 22 in
https://public-inbox.org/git/CAJo=hJvxWg2J-yRiCK3szux=eym2thjt0kwo-sffooc1rkx...@mail.gmail.com/

I expanded value_type from 2 bits to 3 bits, but kept it as a bit
field in a varint. I just couldn't justify the additional byte per ref
in these large files. The prefix compression works well enough that
many refs are still able to use only a single byte for the
suffix_length << 3 | value_type varint, keeping the average at 33
bytes per ref.

The reftable format uses values 0-3, leaving 4-7 available. I reserved
4 for an arbitrary payload like MERGE_HEAD type files.


Re: reftable: new ref storage format

2017-07-23 Thread Shawn Pearce
+git@vger.kernel.org. I originally sent the below reply privately by mistake.

On Mon, Jul 17, 2017 at 6:43 PM, Michael Haggerty  wrote:
> On Sun, Jul 16, 2017 at 12:43 PM, Shawn Pearce  wrote:
>> On Sun, Jul 16, 2017 at 10:33 AM, Michael Haggerty  
>> wrote:
>
> On second thought, the idea of having HEAD (or maybe all pseudorefs)
> in the same system would open a few interesting possibilities that
> derive from having a global, atomic view of all references:
>
> 1. We could store backlinks from references to the symbolic references
> that refer to them. This would allow us to update the reflogs for
> symbolic refs properly. (Currently, there is special-case code to
> update the reflogs for HEAD when the reference that it points at is
> modified, but not for other symrefs.)

This is a good idea, but makes for some difficult transition code. We
have to keep the special case for HEAD, but other symrefs would log
when in a reftable.

> 2. We could store "peeled" versions of symbolic refs. These would have
> to be updated whenever the pointed-at reference is updated, but that
> would have two nice advantages: HEAD would usually be resolvable based
> on the top reftable in the stack, and it would be resolvable in one
> step (without having the follow the symref explicitly).

Great observation. I wish I had seen that sooner. It's a pain in the neck
to resolve symrefs, and it has caused us a few bugs in JGit on our current
non-standard storage. It depends on the back pointer being present and
accurate to ensure an update of master also updates the cached HEAD.

I'll have to mull on these a bit. I'm not folding them into my
documentation and implementation just yet.


[...]
> I'm still not quite resigned to non-Google users wanting to use blocks
> as large as 64k, but (short of doing actual experiments, yuck!) I
> can't estimate whether it would make any detectable difference in the
> real world.

I think it is only likely to matter with NFS, and then it's a balancing
act of how much of that block you needed vs. didn't need. :)

> On the other end of the spectrum, I might mention that the
> shared-storage "network.git" repositories that we use at GitHub often
> have a colossal number of references (basically, the sum of the number
> of references in all of the forks in a "repository network", including
> some hidden references that users don't see). For example, one
> "network.git" repository has 56M references(!) Mercifully, we
> currently only have to access these repositories for batch jobs, but,
> given a better reference storage backend, that might change.

A larger block size right now has the advantage of a smaller index,
which could make a single ref lookup more efficient. Otherwise, the
block size doesn't have a big impact on streaming through many
references.


>>> 2. The stacking of multiple reftable files together
[...]
>> At $DAY_JOB we can do this successfully with pack files, which are
>> larger and more costly to combine than reftable. I think we can get
>> reftable to support a reasonable stack depth.
>
> Are you saying that you merge subsets of packfiles without merging all
> of them? Does this work together with bitmaps, or do you only have
> bitmaps for the biggest packfile?
>
> We've thought about merging packfiles in that way, but don't want to
> give up the benefits of bitmaps.

Yes. We compact smaller pack files together into a larger pack file,
and try to keep a repository at:

 - 2 compacted packs, each <20 MiB
 - 1 base pack + bitmap

We issue a daily GC for any repository that isn't just 1 base pack.
But during a business day this compacting process lets us handle most
read traffic quite well, despite the bitmaps being incomplete.


>>> I haven't reviewed your proposal for storing reflogs in reftables in
[...]
>
> Those sizes don't sound that scary. Do your reflogs include
> significant information in the log messages, or are they all "push"
> "push" "push"? We record quite a bit of information in our audit_log
> entries (our equivalent of reflogs), so I would expect ours to
> compress much less well.

These were pretty sparse in the comment field, and frequent reuse of a
message. So it may not be representative of what you are storing.

> We also tend to use our audit_logs to see what was happening in a
> repository; e.g., around the time that a problem occurred. So for us
> it is useful that the entries are in chronological order across
> references, as opposed to having the entries for each reference
> grouped together. We might be the oddballs here though, and in fact it
> is possible that this would be an argument for us to stick to our
> audit_log scheme rather than use reflogs stored in reftables.

I think of reflogs as being about a single ref, not the whole repository.
So I'm inclined to say the reftable storage of them should be by ref,
then time. Anyone who wants a repository view must either scan the
entire log segment, or 

Re: Remove help advice text from git editors for interactive rebase and reword

2017-07-23 Thread Kirill Likhodedov

> On 24 Jul 2017, at 01:09 , Junio C Hamano  wrote:
> 
> Who is running "git commit --amend" and "git rebase -i" in the
> workflow of a user of your tool?  Is it the end user who types these
> commands to the shell command prompt, or does your tool formulate
> the command line and does an equivalent of system(3) to run it?
> 
> I am assuming that the answer is the latter in my response.

Yes, it is the latter case: the tool formulates the command line and forks a 
process.

> Not at all interested, as that would mean your tool will tell its
> users to set such a configuration variable and their interactive use
> of Git outside your tool will behave differently from other people
> who use vanilla Git, and they will complain to us.

That's not true, since the tool can (and would) use the `git -c 
config.var=value rebase -i` syntax to set the configuration variable just for 
this particular command, without affecting the environment.

Btw, if my proposal is so uninteresting, why were the existing advice.*
variables introduced in the first place? I don't know the motivation, but I
assume it was about making Git less wordy for experienced users. So I don't
see any difference here.

> But stepping back a bit, as you said in the parentheses, your tool
> would need to grab these "hints" from Git, instead of having a
> separate hardcoded hints that will go stale while the underlying Git
> command improves, to be able to show them "separately".  

There is no need to call Git to get these "hints". They are quite obvious,
well known, and can be hardcoded. However, I don't plan to use these hints
anyway, since they are a bit foreign to the GUI of the tool I develop. For
instance, for reword I'd like to show an editor containing just the plain
commit message that the user is about to change.




Re: Bug^Feature? fetch protects only current working tree branch

2017-07-23 Thread Junio C Hamano
Andreas Heiduk  writes:

> A `git fetch . origin/master:master` protects the currently checked-out
> branch (HEAD) unless `-u/--update-head-ok` is supplied. This avoids a
> mismatch between the index and HEAD. BUT branches which are HEADs in other
> working trees do not get that care - their state is silently screwed up.
>
> Is this intended behaviour or just an oversight while implementing
> `git worktree`?

The latter.  "git worktree" is an interesting feature and has the
potential to become useful in a wider variety of workflows than it
currently is, but end users should consider it still experimental, as
it still has many small rough edges like this one.

Patches to help improve the feature are of course very welcome.


Re: Remove help advice text from git editors for interactive rebase and reword

2017-07-23 Thread Junio C Hamano
Kirill Likhodedov  writes:

> My motivation is the following: I'm improving the Git client
> inside of IntelliJ IDEA IDE and I would like to provide only the
> plain commit message text to the user (any hints can be shown
> separately, not inside the editor).

Who is running "git commit --amend" and "git rebase -i" in the
workflow of a user of your tool?  Is it the end user who types these
commands to the shell command prompt, or does your tool formulate
the command line and does an equivalent of system(3) to run it?

I am assuming that the answer is the latter in my response.

> If there is no way to do it now, do you think it makes sense to
> provide a configuration variable for this, e.g. to introduce more
> advice.* config variables in addition to existing ones?

Not at all interested, as that would mean your tool will tell its
users to set such a configuration variable and their interactive use
of Git outside your tool will behave differently from other people
who use vanilla Git, and they will complain to us.

But I do not think adding a new command line option that only is
passed by a tool like yours when it runs "git rebase -i" via
system(3) equivalent would introduce such an issue, so that may be
workable.

But stepping back a bit, as you said in the parentheses, your tool
would need to grab these "hints" from Git, instead of having a
separate set of hardcoded hints that will go stale while the underlying Git
command improves, to be able to show them "separately".  Which means
to me that you would need to get the output Git would normally show
to the end user and do your own splitting and parsing anyway.  Which
in turn would mean that a configuration or a command line option to
squelch these, which would rob your tool the ability to read what
Git would have told to your users, would be a bad idea and not a
useful addition to the overall system.  So...




Re: [PATCH] PRItime: wrap PRItime for better l10n compatibility

2017-07-23 Thread Junio C Hamano
Jean-Noël AVILA  writes:

> Le 22/07/2017 à 02:43, Jiang Xin a écrit :
>>
>> Benefit of using the tweak version of gettext:
>>
>> 1. `make pot` can be run in a tar extract directory (without git controlled).
>
> This issue is real for packet maintainers who can patch the original
> source and run their own set of utilities outside of a git repo. This
> can be possible with Junio's proposition by writing the files to a
> temporary directory before running the xgettext, then removing the
> temporary directory.
>
> Please note that with respect to this issue, the patched xgettext
> approach is completely disruptive.

OK, so what you are saying is that my assumption that Jiang (at
least for now, and his successor l10n coordinator in sometime in the
future) would be the only one who needs to have access to the
machinery to update po/git.pot and that it does not matter that much
what that exact machinery is as long as the resulting po/git.pot
lists messages with % and other known ones because plain
vanilla tools will grok such po/git.pot file just fine, were both
too optimistic.

I think binary packagers, who update the software with their own
changes, produce their own modified po/git.pot and have that
translated into multiple languages, are capable of coping with any
method we use ourselves, but being capable of doing something and
being happy to do that thing are two different things, and we need
to aim for the latter---we should not make things unnecessarily
cumbersome for them.

So I'll leave the s/PRItime/PRIuMAX/ patch in the 'master' without
Jiang's change for 2.14-rc1.  The approach to require private
edition of xgettext, while it may technically be a fun exercise,
would not fly very well in the real world.

For those who want to work with a tarball extract without being in a
Git repository, it would be sufficient for them to run "git init &&
git commit --allow-empty -m import" immediately after extracting the
tarball, even if we require that "make pot" must be run in a clean
repository.  And I'd prefer to go that route over copying into a
temporary directory, primarily because I do not want to have to
worry about what to copy---when we know we pass $foo.c through
xgettext, we know we want to put the modified copy of $foo.c in the
temporary directory, but I do not want to even think about whether we
need to also copy the header files $foo.c "#include"s, for example.

Thanks.


Re: reftable [v2]: new ref storage format

2017-07-23 Thread Shawn Pearce
My apologies for not responding to this piece of feedback earlier.

On Wed, Jul 19, 2017 at 7:02 AM, Ævar Arnfjörð Bjarmason
 wrote:
> On Tue, Jul 18 2017, Shawn Pearce jotted:
>> On Mon, Jul 17, 2017 at 12:51 PM, Junio C Hamano  wrote:
>>> Shawn Pearce  writes:
 where `time_sec` is the update time in seconds since the epoch.  The
 `reverse_int32` function inverts the value so that lexicographic ordering
 of the network-byte-order time sorts more recent records first:

 reverse_int(int32 t) {
   return 0xffffffff - t;
 }
>>>
>>> Is 2038 an issue, or by that time we'd all be retired together with
>>> this file format and it won't be our problem?
>>
>> Based on discussion with Michael Haggerty, this is now an 8 byte field
>> storing microseconds since the epoch. We should be good through year
>> .
>
> I think this should be s/microseconds/nanoseconds/, not because there's
> some great need to get better resolution than nanoseconds, but because:
>
>  a) We already have WIP code (bp/fsmonitor) that's storing 64 bit
> nanoseconds since the epoch, albeit for the index, not for refs.
>
>  b) There are several filesystems that have nanosecond resolution now,
> and it's likely more will start using that.

The time in a reflog and the time returned by lstat(2) to detect dirty
files in the working tree are unrelated. Of course we want the
dircache to reflect the highest precision available from lstat,
to reduce the number of files that must be content-hashed for
racily-clean detection. So if a filesystem is using nanoseconds, the
dircache maybe should support it.

> Thus:
>
>  x) If you use such a filesystem you'll lose time resolution with this
> ref backend v.s. storing them on disk, which isn't itself a big
> deal, but more importantly you lose 1=1 time mapping as you
> transition and convert between the two.

No, you won't. The reflog today ($GIT_DIR/logs) is storing second
precision in the log record. What precision the filesystem is using as
an mtime is irrelevant.

Further, microseconds are sufficient resolution for reflog data. From my
benchmarking, just reading a reference from a very hot reftable costs
~20.2 usec. Any update of a reference requires a read-compare-modify
cycle, and so updates aren't going to happen more often than every 20 usec.

>  y) Our own code will need to juggle second resolution epochs
> (traditional FSs, any 32bit epoch format), microseconds (this
> proposal), and nanoseconds (new FSs, bp/fsmonitor) internally in
> various places.

But these are also unrelated areas. IMHO, the nanosecond stuff should
be confined to the dircache management code and working tree
comparison code, and not be leaking out of there. Commit objects are
still recorded with second precision, and that isn't going to change.

Therefore I decided to stick with microseconds in the reftable v3
draft that I posted on July 22nd.


Re: reftable [v3]: new ref storage format

2017-07-23 Thread Ævar Arnfjörð Bjarmason
On Sat, Jul 22, 2017 at 8:29 PM, Shawn Pearce  wrote:
> 3rd iteration of the reftable storage format.
>
> You can read a rendered version of this here:
> https://googlers.googlesource.com/sop/jgit/+/reftable/Documentation/technical/reftable.md
>
> Significant changes from v2:
> - efficient lookup by SHA-1 for allow-tip-sha1-in-want.
> - type 0x4 for FETCH_HEAD, MERGE_HEAD.
> - file size up (27.7 M in v1, 34.4 M in v3)

I had some feedback on v2 here which still applies:
https://public-inbox.org/git/87k234tti7@gmail.com/

It would be good to either get a reply to that, or, if you don't think
it's sensible for whatever reason and left it out of v3, to have a
"feedback received but discarded for " note in these summaries as
you're sending new versions.

Aside from the mail I sent I think that would be very useful in
general if there's been any other such feedback (I honestly don't know
if there has, I haven't been following this actively).


Re: git gc seems to break --symbolic-full-name

2017-07-23 Thread Stas Sergeev

On 23.07.2017 11:40, Jacob Keller wrote:

On Fri, Jul 21, 2017 at 12:03 PM, Stas Sergeev  wrote:

I wanted some kind of file to use it as a
build dependency for the files that needs
to be re-built when the head changes.
This works very well besides git gc.
What other method can be used as simply
as that? git show-ref does not seem to be
giving this.

There's no real way to do this, and even prior to 2007, when the file
always existed, there was no guarantee its modification time was valid.

I'd suggest you have a phony rule which you always run, that checks
the ref, and sees if it's different from "last time" and then updates
a different file if that's the case. Then the build can depend on the
generated file, and you'd be able to figure it out.

OK, thanks, that looks quite simple too.
I will have to create by hand the file that
I expected git to already have, but apparently
it doesn't.


What's the real goal for depending on when the ref changes?

So that when users fill in a bug report, I can
see at what revision the bug happened. :)
While seemingly "just a debugging sugar", the
hard experience shows this to be exceptionally
useful.
I think even the Linux kernel does something like
this, and solves that task the hard way. For
example I can see a script at scripts/setlocalversion
whose output seems to go to
include/config/kernel.release, plus a lot of
logic in the toplevel makefile about this.
So not liking the fact that every project solves
this differently, I was trying to get the solution
directly from git. But I'll try otherwise.


Re: [PATCH v2 00/10] tag: only respect `pager.tag` in list-mode

2017-07-23 Thread Martin Ågren
On 21 July 2017 at 00:27, Junio C Hamano  wrote:
> I tend to agree with you that 1-3/10 may be better off being a
> single patch (or 3/10 dropped, as Brandon is working on losing it
> nearby).  I would have expected 7-8/10 to be a single patch, as by
> the time a reader reaches 07/10, because of the groundwork laid by
> 04-06/10, it is obvious that the general direction is to allow the
> caller, i.e. cmd_tag(), to make a call to setup_auto_pager() only in
> some but not all circumstances, and 07/10 being faithful to the
> original behaviour (only to be updated in 08/10) is somewhat counter
> intuitive.  It is not wrong per-se; it was just unexpected.

Thanks for your comments. I will be away for a few days, but once I
get back, I'll try to produce a v3 based on this and any further
feedback.

Martin


Re: Expected behavior of "git check-ignore"...

2017-07-23 Thread Philip Oakley

From: "John Szakmeister" 
Sent: Thursday, July 20, 2017 11:37 AM

A StackOverflow user posted a question about how to reliably check
whether a file would be ignored by "git add" and expected "git
check-ignore" to return results that matched git add's behavior.  It
turns out that it doesn't.  If there is a negation rule, we end up
returning that exclude and printing it and exiting with 0 (there are
some ignored files) even though the file has been marked to not be
ignored.

Is the expected behavior of "git check-ignore" to return 0 even if the
file is not ignored when a negation is present?


I'm testing this on..
$ git --version

git version 2.10.0.windows.1







git init .
echo 'foo/*' > .gitignore
echo '!foo/bar' > .gitignore


Is this missing the >> append to get the full two-line .gitignore?
Adding a `cat .gitignore` would help check.



mkdir foo
touch foo/bar
I don't think you need these. It's the given pathnames that are checked, not 
the file system content.



git check-ignore foo/bar


Does this need the `-q` option to set the exit status?

echo $? # to display the status.





I expect the last command to return 1 (no files are ignored), but it
doesn't.  The StackOverflow user had the same expectation, and I imagine
others do as well.  OTOH, it looks like the command is really meant to
be a debugging tool--to show me the line in a .gitignore associated
with this file, if there is one.  In which case, the behavior is
correct but the return code description is a bit misleading (0 means
the file is ignored, which isn't true here).


Maybe the logic isn't that clear? Maybe it is simply detecting if any one of 
the ignore lines is active, and doesn't reset the status for a negation?


I appear to get the same response as yourself, but I haven't spent much time 
on it - I'm clearing a backlog of work at the moment.


I also tried the -v -n options, and if I swap the ignore lines around it 
still says line 2 is the one that ignores.
It gets more interesting if two paths are given `foo/bar foo/baz`, to see 
which line picks up which pathname (and with the swapped ignore lines).


Is there a test for this in the test suite?



Thoughts?  It seems like this question was asked before several years
ago but didn't get a response.

Thanks!

-John

PS The SO question is here:
https://stackoverflow.com/questions/45210790/how-to-reliably-check-whether-a-file-is-ignored-by-git


--
Philip 



Bug^Feature? fetch protects only current working tree branch

2017-07-23 Thread Andreas Heiduk
A `git fetch . origin/master:master` protects the currently checked out 
branch (HEAD) unless the `-u/--update-head-ok` is supplied. This avoids a
mismatch between the index and HEAD. BUT branches which are HEADs in other
working trees do not get that care - their state is silently screwed up.

Is this intended behaviour or just an oversight while implementing
`git worktree`?


Steps to reproduce

# setup

git clone -b master $SOMETHING xtemp
cd xtemp
git reset --hard HEAD~5 # pretend to be back some time
git worktree add ../xtemp-wt1
git worktree add ../xtemp-wt2

# test

git fetch . origin/master:master

fatal: Refusing to fetch into current branch refs/heads/master  of 
non-bare repository
fatal: The remote end hung up unexpectedly

# OK, current working tree is protected, try another one:

git fetch . origin/master:xtemp-wt1

From .
   b4d1278..6e7b60d  origin/master -> xtemp-wt1

cd ../xtemp-wt1
git status

# admire messed up working tree here

# The protection is really "current working tree", not "first/main working 
tree"!

git fetch . origin/master:master

From .
   b4d1278..6e7b60d  origin/master -> master

cd ../xtemp
git status

# now it's messed up here too

# Try with "--update-head-ok" but check first.

cd ../xtemp-wt2

git fetch . origin/master:xtemp-wt2

fatal: Refusing to fetch into current branch refs/heads/xtemp-wt2 of 
non-bare repository
fatal: The remote end hung up unexpectedly

git fetch --update-head-ok . origin/master:xtemp-wt2

From .
   b4d1278..6e7b60d  origin/master -> xtemp-wt2




Re: Remove help advice text from git editors for interactive rebase and reword

2017-07-23 Thread Alexei Lozovsky
On 23 July 2017 at 13:03, Kirill Likhodedov wrote:
> Hello,
>
> is it possible to remove the helping text which appears at the bottom
> of the Git interactive rebase editor (the one with the list of
> instructions)

I believe there is currently no way to do it. The interactive rebase
is implemented in git-rebase--interactive.sh, which always makes a call
to append_todo_help to append the help text to the todo list of commits.

> and the one which appears at the bottom of the commit editor (which
> appears on rewording a commit or squashing commits)?

This one too seems to be hardcoded in builtin/commit.c.

> I can parse and strip out the help pages (but it is not very reliable
> since the text may change in future)

I doubt the syntax of the interactive rebase todo list will ever change,
so you can reliably remove all lines that are empty or start with the
comment character: $(git config --get core.commentchar), or '#' if that's
empty or 'auto'.
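A minimal sketch of that filtering, assuming the default '#' comment character (with core.commentChar=auto the character git actually wrote would have to be detected first); the sample todo file below is fabricated for illustration:

```shell
#!/bin/sh
# Hedged sketch: strip blank lines and comment lines (the appended help
# text) from a rebase todo list.  Assumes the default '#' comment
# character; the input file here is a fabricated example.
printf 'pick 1234abc Fix a bug\n\n# Rebase e025896..efc3d17 onto e025896\n# Commands:\n' > git-rebase-todo

# Drop blank lines and comment lines, leaving only the pick/reword/... lines.
sed -e '/^[[:space:]]*$/d' -e '/^#/d' git-rebase-todo
# prints: pick 1234abc Fix a bug
```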

However, it's harder with the commit messages during --amend as the
comment character is not really fixed and can be dynamically selected
to not conflict with the characters used in the commit message if the
core.commentchar is set to 'auto'.

> However I suppose that experienced command line users could also
> benefit from such configuration, since this helping text is intended
> only for newbies and is more like a noise for advanced users.

Well, the text is appended to the todo list of commits, so it doesn't
get too much in the way of humans editing the list.


Remove help advice text from git editors for interactive rebase and reword

2017-07-23 Thread Kirill Likhodedov
Hello,

is it possible to remove the help text which appears at the bottom of the
Git interactive rebase editor (the one with the list of instructions), and the
one which appears at the bottom of the commit editor (which appears when
rewording a commit or squashing commits)?

The texts I'm talking about are:

# Rebase e025896..efc3d17 onto e025896¬
#¬
# Commands:¬
#  p, pick = use commit¬
...

and

# Please enter the commit message for your changes. Lines starting¬
# with '#' will be ignored, and an empty message aborts the commit.
# Not currently on any branch.¬
...


If there is no way to do it now, do you think it makes sense to provide a 
configuration variable for this, e.g. to introduce more advice.* config 
variables in addition to existing ones?

My motivation is the following: I'm improving the Git client inside of IntelliJ 
IDEA IDE and I would like to provide only the plain commit message text to the 
user (any hints can be shown separately, not inside the editor).

I know I can load the original commit message myself (but I prefer not to make 
extra calls when possible); and I can parse and strip out the help pages (but 
it is not very reliable since the text may change in future), so I'd appreciate 
any other solution to my problem, as well.

However, I suppose that experienced command line users could also benefit from
such a configuration, since this help text is intended only for newbies and is
more like noise for advanced users.




Re: git gc seems to break --symbolic-full-name

2017-07-23 Thread Jacob Keller
On Fri, Jul 21, 2017 at 12:03 PM, Stas Sergeev  wrote:
> I wanted some kind of file to use it as a
> build dependency for the files that needs
> to be re-built when the head changes.
> This works very well besides git gc.
> What other method can be used as simply
> as that? git show-ref does not seem to be
> giving this.

There's no real way to do this, and even prior to 2007, when the file
always existed, there was no guarantee its modification time was valid.

I'd suggest you have a phony rule which you always run, that checks
the ref, and sees if it's different from "last time" and then updates
a different file if that's the case. Then the build can depend on the
generated file, and you'd be able to figure it out.

What's the real goal for depending on when the ref changes?

Thanks,
Jake


Re: recursive grep doesn't respect --color=always inside submodules

2017-07-23 Thread Jacob Keller
On Sat, Jul 22, 2017 at 11:02 PM, Orgad Shaneh  wrote:
> Hi,
>
> When git grep --color=always is used, and the output is redirected to
> a file or a pipe, results inside submodules are not colored. Results
> in the supermodule are colored correctly.
>
> - Orgad

This occurs because color isn't passed to the recursive grep submodule
process we launch. It might be fixed if/when we switch to using the
repository object to run grep in-process. We could also patch grep to
pass the color option into the submodule.

Thanks,
Jake


index.lock porcelain interface?

2017-07-23 Thread Jason Pyeron
While working on some scripts for continuous integration, we wanted to check
if git was doing anything, before running our script.

The best we came up with was checking for the existence of index.lock or
whether a merge is in progress. MERGE_HEAD can be checked, but we chose to use git
status --porcelain=v2. Is there a better check than testing whether .git/index.lock
exists, e.g. a porcelain interface?

-Jason

--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
-   -
- Jason Pyeron  PD Inc. http://www.pdinc.us -
- Principal Consultant  10 West 24th Street #100-
- +1 (443) 269-1555 x333Baltimore, Maryland 21218   -
-   -
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-



fetch-any-blob / ref-in-want proposal

2017-07-23 Thread Orgad Shaneh
Hi,

Jonathan Tan proposed a design and a patch series for requesting a
specific ref on fetch 4 months ago[1].

Is there any progress with this?

- Orgad

[1] 
https://public-inbox.org/git/ffd92ad9-39fe-c76b-178d-6e3d6a425...@google.com/


recursive grep doesn't respect --color=always inside submodules

2017-07-23 Thread Orgad Shaneh
Hi,

When git grep --color=always is used, and the output is redirected to
a file or a pipe, results inside submodules are not colored. Results
in the supermodule are colored correctly.

- Orgad