change the filetype from binary to text after the file is committed to a git repo
Hey everybody, I have a problem with an already committed file in my repo. This git repo was converted from svn to git some years ago. Last week I changed some lines in a file and saw in the diff that it is marked as binary (it's a simple .cpp file). I think on the first commit it was detected as a utf-16 file (on windows). But no matter what I do I can't get git to treat it as a "normal" text file again, even though it is now plain utf-8. I even replaced the whole content of the file with just 'a' and git still says it's binary. Is the only way to get it back to text mode: * copy a utf-8 version of the original file * delete the file * make a commit * add the old file as a new one I think that will work, but it will also break my history. Is there a better way to get this behavior without losing history? Best regards Tonka
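(A hedged aside, not suggested in the thread itself: git decides text vs. binary with a content heuristic — roughly, a NUL byte early in the file marks it binary, which is why UTF-16 content trips it — and that heuristic can be overridden per path with attributes, without rewriting history. A sketch:)

```
# .gitattributes (checked in at the repo root)
# Force git to treat .cpp files as text for diff purposes,
# overriding the binary-content heuristic.
*.cpp text diff
```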
Re: Should I store large text files on Git LFS?
Hi Farshid,

On 24 July 2017 at 13:45, Farshid Zavareh wrote:
> I'll probably test this myself, but would modifying and committing a 4GB
> text file actually add 4GB to the repository's size? I anticipate that it
> won't, since Git keeps track of the changes only, instead of storing a copy
> of the whole file (whereas this is not the case with binary files, hence the
> need for LFS).

I decided to do a little test myself. I added three versions of the same data set (sometimes slightly different cuts of the parent data set, which I don't have), each between 2 and 4GB in size. Each time I added a new version it added ~500MB to the repository, and operations on the repository took 35-45 seconds to complete. Running `git gc` compressed the objects fairly well, saving ~400MB of space. I would imagine that even more space would be saved (proportionally) if there were a lot more similar files in the repo. The time to checkout different commits didn't change much; I presume that most of the time is spent copying the large file into the working directory, but I didn't test that. I did test adding some other small files, and sometimes it was slow (when cold, I think?) and other times fast. Overall, I think as long as the files change rarely, and the repository remains responsive, having these large files in the repository is ok. They're still big, and if most people will never use them it will be annoying for people to clone and checkout updated versions of the files. If you have a lot of the files, or they update often, or most people don't need all the files, using something like LFS will help a lot.

$ git version # running on my windows machine at work
git version 2.6.3.windows.1
$ git init git-csv-test && cd git-csv-test
$ du -h --max-depth=2 # including here to compare after large data files are added
35K  ./.git/hooks
1.0K ./.git/info
0    ./.git/objects
0    ./.git/refs
43K  ./.git
43K  .
$ git add data.txt # first version of the data file, 3.2 GB
$ git commit
$ du -h --max-depth=2 # the data gets compressed down to ~580M of objects in the git store
35K  ./.git/hooks
1.0K ./.git/info
2.0K ./.git/logs
580M ./.git/objects
1.0K ./.git/refs
581M ./.git
3.7G .
$ git add data.txt # second version of the data file, 3.6 GB
$ git commit
$ du -h --max-depth=1 # an extra ~520M of objects added
1.2G ./.git
4.7G .
$ time git add data.txt # 42.344s - third version of the data file, 2.2 GB
$ git commit # takes about 30 seconds to load editor
$ du -h --max-depth=1
1.7G ./.git
3.9G .
$ time git checkout HEAD^ # 36.509s
$ time git checkout HEAD^ # 44.658s
$ time git checkout master # 38.267s
$ git gc
$ du -h --max-depth=1
1.3G ./.git
3.4G .
$ time git checkout HEAD^ # 34.743s
$ time git checkout HEAD^ # 41.226s

Regards, Andrew Ardill
Re: Should I store large text files on Git LFS?
On Mon, 24 Jul 2017, Farshid Zavareh wrote:
> I'll probably test this myself, but would modifying and committing a 4GB
> text file actually add 4GB to the repository's size? I anticipate that it
> won't, since Git keeps track of the changes only, instead of storing a copy
> of the whole file (whereas this is not the case with binary files, hence the
> need for LFS).

well, it wouldn't be 4G because text compresses well, but if the file changes drastically from version to version (say a quarterly report), the diff won't help.

David Lang
Re: Should I store large text files on Git LFS?
On Mon, 24 Jul 2017, Farshid Zavareh wrote:
> I see your point. So I guess it really comes down to how the file is
> anticipated to change. If only one or two lines are going to change every
> now and then, then LFS is not really necessary. But, as you mentioned, text
> files that change drastically will affect the repository in the same way
> that binaries do.

Not quite the same way that binaries do, because text files compress well. But close.

David Lang
Re: Should I store large text files on Git LFS?
I see your point. So I guess it really comes down to how the file is anticipated to change. If only one or two lines are going to change every now and then, then LFS is not really necessary. But, as you mentioned, text files that change drastically will affect the repository in the same way that binaries do.

> On 24 Jul 2017, at 2:13 pm, David Lang wrote:
>
> On Mon, 24 Jul 2017, Farshid Zavareh wrote:
>
>> I'll probably test this myself, but would modifying and committing a 4GB
>> text file actually add 4GB to the repository's size? I anticipate that it
>> won't, since Git keeps track of the changes only, instead of storing a copy
>> of the whole file (whereas this is not the case with binary files, hence the
>> need for LFS).
>
> well, it wouldn't be 4G because text compresses well, but if the file changes
> drastically from version to version (say a quarterly report), the diff won't
> help.
>
> David Lang
Re: git gc seems to break --symbolic-full-name
On Sun, Jul 23, 2017 at 12:23 PM, Stas Sergeev wrote:
> 23.07.2017 11:40, Jacob Keller writes:
>>
>> On Fri, Jul 21, 2017 at 12:03 PM, Stas Sergeev wrote:
>>>
>>> I wanted some kind of file to use as a
>>> build dependency for the files that need
>>> to be re-built when the head changes.
>>> This works very well besides git gc.
>>> What other method can be used as simply
>>> as that? git show-ref does not seem to be
>>> giving this.
>>
>> There's no real way to do this, and even prior to 2007 when the file
>> always existed, there's no guarantee its modification time is valid.
>>
>> I'd suggest you have a phony rule which you always run, that checks
>> the ref, and sees if it's different from "last time" and then updates
>> a different file if that's the case. Then the build can depend on the
>> generated file, and you'd be able to figure it out.
>
> OK, thanks, that looks quite simple too.
> I will have to create the file by hand that
> I expected git to already have, but it appears
> not.
>
>> What's the real goal for depending on when the ref changes?
>
> So that when users fill in the bug report, I can
> see at what revision the bug happened. :)
> While seemingly "just a debugging sugar", the
> hard experience shows this to be exceptionally
> useful.
> I think even the linux kernel does something like
> this, and solves that task the hard way. For
> example I can see a script at scripts/setlocalversion
> whose output seems to go to
> include/config/kernel.release and a lot of
> logic in the toplevel makefile about this.
> So not liking the fact that every project solves
> this differently, I was trying to get the solution
> directly from git. But I'll try otherwise.

generally, I'd suggest using "git describe" to output a version based on tag, and as part of your build system set that in some sort of --version output of some kind.

Thanks, Jake
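The phony-rule approach Jacob describes can be sketched as a tiny shell fragment (the stamp file name `.head-stamp` and the throwaway-repo setup are purely illustrative):

```shell
#!/bin/sh
# Illustration only: set up a scratch repository to run the demo in.
cd "$(mktemp -d)" || exit 1
git init -q .
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m init

# The technique itself: rewrite .head-stamp only when HEAD actually
# moves, so a make dependency on the stamp fires only on a ref change.
refresh_stamp() {
    head=$(git rev-parse HEAD) || return 1
    if [ "$head" != "$(cat .head-stamp 2>/dev/null)" ]; then
        printf '%s\n' "$head" > .head-stamp   # mtime bumps only here
    fi
}
refresh_stamp   # first run: creates the stamp
refresh_stamp   # second run: HEAD unchanged, stamp left untouched
```

A Makefile would invoke this from an always-run phony rule and make the real targets depend on `.head-stamp`.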
Re: Should I store large text files on Git LFS?
Hi Andrew. Thanks for your reply. I'll probably test this myself, but would modifying and committing a 4GB text file actually add 4GB to the repository's size? I anticipate that it won't, since Git keeps track of the changes only, instead of storing a copy of the whole file (whereas this is not the case with binary files, hence the need for LFS). Kind regards, Farshid > On 24 Jul 2017, at 12:29 pm, Andrew Ardill wrote: > > Hi Farshid, > > On 24 July 2017 at 12:01, Farshid Zavareh wrote: >> I've been handed a project that uses Git LFS for storing large CSV files. >> >> My understanding is that the main benefit of using Git LFS is to keep the >> repository small for binary files, where Git can't keep track of the changes >> and ends up storing whole files for each revision. For a text file, that >> problem does not exist to begin with and Git can store only the changes. At >> the same time, this is going to make checkouts unnecessarily slow, not to >> mention the financial cost of storing the whole file for each revision. >> >> Is there something I'm missing here? > > Git LFS gives benefits when working on *large* files, not just large > *binary* files. > > I can imagine a few reasons for using LFS for some CSV files > (especially the kinds of files I deal with sometimes!). > > The main one is that many users don't need or want to download the > large files, or all versions of the large file. Moreover, you probably > don't care about changes between those files, or there would be so > many that using the git machinery for comparing them would be > cumbersome and ineffective. > > For me, if I was storing any CSV file over a couple of hundred > megabytes I would consider using something like LFS. An example would > be a large Dunn & Bradstreet data file, which I do an analysis on > every quarter.
> I want to include the file in the repository, so that
> the analysis can be replicated later on, but I don't want to add 4GB
> of data to the repo every single time the dataset gets updated (also
> every quarter). Storing that in LFS would be a good solution then.
>
> Regards,
>
> Andrew Ardill
Re: Should I store large text files on Git LFS?
Hi Farshid, On 24 July 2017 at 12:01, Farshid Zavareh wrote: > I've been handed a project that uses Git LFS for storing large CSV files. > > My understanding is that the main benefit of using Git LFS is to keep the > repository small for binary files, where Git can't keep track of the changes > and ends up storing whole files for each revision. For a text file, that > problem does not exist to begin with and Git can store only the changes. At > the same time, this is going to make checkouts unnecessarily slow, not to > mention the financial cost of storing the whole file for each revision. > > Is there something I'm missing here? Git LFS gives benefits when working on *large* files, not just large *binary* files. I can imagine a few reasons for using LFS for some CSV files (especially the kinds of files I deal with sometimes!). The main one is that many users don't need or want to download the large files, or all versions of the large file. Moreover, you probably don't care about changes between those files, or there would be so many that using the git machinery for comparing them would be cumbersome and ineffective. For me, if I was storing any CSV file over a couple of hundred megabytes I would consider using something like LFS. An example would be a large Dunn & Bradstreet data file, which I do an analysis on every quarter. I want to include the file in the repository, so that the analysis can be replicated later on, but I don't want to add 4GB of data to the repo every single time the dataset gets updated (also every quarter). Storing that in LFS would be a good solution then. Regards, Andrew Ardill
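(For reference, and hedged since the thread doesn't show it: opting files into LFS is done per path pattern, e.g. `git lfs track "*.csv"`, which records an attributes entry along these lines, after which history stores only small pointer files in place of the data:)

```
# .gitattributes entry written by `git lfs track "*.csv"`
*.csv filter=lfs diff=lfs merge=lfs -text
```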
Re: [PATCH] PRItime: wrap PRItime for better l10n compatibility
2017-07-23 10:33 GMT+08:00 Jean-Noël AVILA:
> Plus, I hope that some day, instead of translators finding afterwards
> that a change broke i18n capabilities, developers would have some kind
> of sanity check. Requiring special versions of i18n tooling stops this hope.

It would be fun to create some tools to help l10n guys find l10n changes on every git commit.

-- Jiang Xin
Should I store large text files on Git LFS?
Hey all. I've been handed a project that uses Git LFS for storing large CSV files. My understanding is that the main benefit of using Git LFS is to keep the repository small for binary files, where Git can't keep track of the changes and ends up storing whole files for each revision. For a text file, that problem does not exist to begin with and Git can store only the changes. At the same time, this is going to make checkouts unnecessarily slow, not to mention the financial cost of storing the whole file for each revision. Is there something I'm missing here? Thanks
Re: [PATCH] PRItime: wrap PRItime for better l10n compatibility
2017-07-22 23:48 GMT+08:00 Junio C Hamano: > Johannes Schindelin writes: > >>> >> A very small hack on gettext. >> >> I am 100% opposed to this hack. It is already cumbersome enough to find >> out what is involved in i18n (it took *me* five minutes to find out that >> much of the information is in po/README, with a lot of information stored >> *on an external site*, and I still managed to miss the `make pot` target). >> >> If at all, we need to make things easier instead of harder. >> >> Requiring potential volunteers to waste their time to compile an >> unnecessary fork of gettext? Not so great an idea. >> >> Plus, each and every Git build would now have to compile their own >> gettext, too, as the vanilla one would not handle the .po files containing >> %!!! >> >> And that requirement would impact instantaneously people like me, and even >> worse: some other packagers might be unaware of the new requirement which >> would not be caught during the build, and neither by the test suite. >> Double bad idea. > > If I understand correctly, the patch hacks the input processing of > xgettext (which reads our source code and generates po/git.pot) so > that when it sees PRItime, pretend that it saw PRIuMAX, causing it > to output % in its output. > > In our workflow, > > * The po/git.pot file is updated only by the l10n coordinator, > and then the result is committed to our tree. > > * Translators build on that commit by (1) running msgmerge which > takes po/git.pot and wiggles its entries into their existing > po/$lang.po file so that po/$lang.po file has new entries from > po/git.pot and (2) editing po/$lang.po file. The result is > committed to our tree. > > * The build procedure builders use runs the resulting > po/$lang.po files through msgfmt to produce po/$lang.mo files, > which will be installed. 
> > As long as the first step results in % (not % or > anything that plain vanilla msgmerge and msgfmt do not understand), > the second step and third step do not require any hacked version of > gettext tools. > > Even though I tend to agree with your conclusion that pre-processing > our source before passing it to xgettext is probably a better > solution in the longer term, I think the most of the objections in > your message come from your misunderstanding of what Jiang's patch > does and are not based on facts. My understanding is that > translators do not need to compile a custom msgmerge and builders do > not need a custom msgfmt. > I appreciate Junio's explanation. I totally agree. -- Jiang Xin
Re: [PATCH] PRItime: wrap PRItime for better l10n compatibility
2017-07-22 19:28 GMT+08:00 Johannes Schindelin: > Hi, > > On Sat, 22 Jul 2017, Jiang Xin wrote: > >> 2017-07-22 7:34 GMT+08:00 Junio C Hamano: >> > Jiang Xin writes: >> > >> >> A very small hack on gettext. > > I am 100% opposed to this hack. It's really very small, see: * https://github.com/jiangxin/gettext/commit/b0a72643 * https://public-inbox.org/git/a87e7252bf9de8a87e5dc7712946f72459778d6c.1500684532.git.worldhello@gmail.com/ > It is already cumbersome enough to find > out what is involved in i18n (it took *me* five minutes to find out that > much of the information is in po/README, with a lot of information stored > *on an external site*, and I still managed to miss the `make pot` target). > > If at all, we need to make things easier instead of harder. If it is only the l10n coordinator's duty to generate po/git.pot, the tweak is OK. But if other guys need to recreate po/git.pot, it's hard, especially for guys working on Mac or Windows. > > Requiring potential volunteers to waste their time to compile an > unnecessary fork of gettext? Not so great an idea. > > Plus, each and every Git build would now have to compile their own > gettext, too, as the vanilla one would not handle the .po files containing > %!!! No, only the l10n coordinator and potential po/git.pot generators are involved. > > So let's go with Junio's patch. I agree. We just go with the sed-then-cleanup version until we meet ambiguities (I mean some words other than PRItime need to be replaced). -- Jiang Xin
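The "sed-then-cleanup" approach mentioned above can be sketched roughly like this (file names and the demo setup are illustrative; the real rule would run the actual xgettext extraction on the rewritten copies):

```shell
#!/bin/sh
# Demo in a scratch directory with a one-line stand-in source file.
cd "$(mktemp -d)" || exit 1
cat > demo.c <<'EOF'
printf(_("created %"PRItime" ago"), t);
EOF

# Rewrite PRItime to PRIuMAX in temporary copies so a stock xgettext
# sees a format macro it already understands, then discard the copies.
mkdir pot-tmp
for f in *.c; do
    sed -e 's/PRItime/PRIuMAX/g' "$f" > "pot-tmp/$f"
done
# a real run would now do: xgettext ... pot-tmp/*.c > po/git.pot
grep PRIuMAX pot-tmp/demo.c
rm -r pot-tmp    # cleanup; the original sources were never touched
```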
Re: reftable [v2]: new ref storage format
On Sun, Jul 23 2017, Shawn Pearce jotted: > My apologies for not responding to this piece of feedback earlier. > > On Wed, Jul 19, 2017 at 7:02 AM, Ævar Arnfjörð Bjarmason wrote: >> On Tue, Jul 18 2017, Shawn Pearce jotted: >>> On Mon, Jul 17, 2017 at 12:51 PM, Junio C Hamano wrote: Shawn Pearce writes: > where `time_sec` is the update time in seconds since the epoch. The > `reverse_int32` function inverses the value so lexicographical ordering of > the network byte order time sorts more recent records first: > > reverse_int(int32 t) { > return 0xffffffff - t; > } Is 2038 an issue, or by that time we'd all be retired together with this file format and it won't be our problem? >>> >>> Based on discussion with Michael Haggerty, this is now an 8 byte field >>> storing microseconds since the epoch. We should be good through year . >> >> I think this should be s/microseconds/nanoseconds/, not because there's >> some great need to get better resolution than nanoseconds, but because: >> >> a) We already have WIP code (bp/fsmonitor) that's storing 64 bit >> nanoseconds since the epoch, albeit for the index, not for refs. >> >> b) There are several filesystems that have nanosecond resolution now, >> and it's likely more will start using that. > The time in a reflog and the time returned by lstat(2) to detect dirty > files in the working tree are unrelated. Of course we want the > dircache to be reflecting the highest precision available from lstat, > to reduce the number of files that must be content hashed for racily > clean detection. So if a filesystem is using nanoseconds, dircache > maybe should support it. > >> Thus: >> >> x) If you use such a filesystem you'll lose time resolution with this >> ref backend v.s. storing them on disk, which isn't itself a big >> deal, but more importantly you lose 1=1 time mapping as you >> transition and convert between the two. > > No, you won't. The reflog today ($GIT_DIR/logs) is storing second > precision in the log record.
> What precision the filesystem is using as
> an mtime is irrelevant.

To this & the point above: Sorry about being unclear, I'm talking about the mtime on the modified loose ref. This format proposes to replace both loose & packed refs, does it not? The reflog time is not the only place where we store the mtime of a ref. On my local ext4:

$ tail -n 1 .git/logs/refs/heads/master
Ævar Arnfjörð Bjarmason 1500852355 +0200 commit: test
$ perl -wE 'say ~~localtime shift' 1500852355
Mon Jul 24 01:25:55 2017
$ stat -c %y .git/logs/refs/heads/master
2017-07-24 01:25:55.531379799 +0200

Of course you lose this information as soon as you "git pack-refs", but it's there now & implicitly part of our current FS-backed on-disk format. So what I meant by "x" is that if, to test this new reftable backend, you write a "git pack-reftable" you won't be able to 1=1 map it to the mtimes you have on the fs showing when the ref was updated, but I see now that you were perhaps never intending to use the more accurate FS time at all for the loose refs, but just use the second resolution reflog data.

> Further, microsecond is sufficient resolution for reflog data. From my
> benchmarking just reading a reference from a very hot reftable costs
> ~20.2 usec. Any update of a reference requires a read-compare-modify
> cycle, and so updates aren't going to be more frequent than 20 usec.

Right, I'm not arguing that it isn't sufficient, just that it's introducing a needless variation by adding a third timestamp resolution to git. Even if it's not the same logical area in git (dir management v.s. ref management), code to e.g. pretty format timestamps of sec/usec/nsec resolution would tend to get shared, so we'd end up with 3 variants of those instead of 2. That's of course trivial, but so would be just deciding that ~500 years of future proofing is good enough without any extra storage size for those 64 bits and doing away with 1/3.
Just standardizing that makes more sense than picking the exact right time resolution for every use case IMO. Otherwise we'll come up with some other thingy in the future that just needs e.g. millisecond in its format, and then end up with 4 variants. I also see from "Update transactions" that unlike the current loose backend the reftable backend wouldn't support multiple writers on multiple machines (think NFS-mounted git master) updating unrelated refs, which would break this usec assumption (but which holds due to the locking involved in the new backend). >> y) Our own code will need to juggle second resolution epochs >> (traditional FSs, any 32bit epoch format), microseconds (this >> proposal), and nanoseconds (new FSs, bp/fsmonitor) internally in >> various places. > > But these are also unrelated areas. IMHO, the nanosecond stuff
Re: reftable: new ref storage format
On Sun, Jul 23, 2017 at 3:56 PM, Shawn Pearce wrote: > On Mon, Jul 17, 2017 at 6:43 PM, Michael Haggerty > wrote: >> On Sun, Jul 16, 2017 at 12:43 PM, Shawn Pearce wrote: >>> On Sun, Jul 16, 2017 at 10:33 AM, Michael Haggerty >>> wrote: > >> * What would you think about being extravagant and making the >> value_type a full byte? It would make the format a tiny bit easier to >> work with, and would leave room for future enhancements (e.g., >> pseudorefs, peeled symrefs, support for the successors of SHA-1s) >> without having to change the file format dramatically. > > I reran my 866k file with full byte value_type. It pushes up the > average bytes per ref from 33 to 34, but the overall file size is > still 28M (with 64 block size). I think it's reasonable to expand this > to the full byte as you suggest. FYI, I went back on this in the v3 draft I posted on Jul 22 in https://public-inbox.org/git/CAJo=hJvxWg2J-yRiCK3szux=eym2thjt0kwo-sffooc1rkx...@mail.gmail.com/ I expanded value_type from 2 bits to 3 bits, but kept it as a bit field in a varint. I just couldn't justify the additional byte per ref in these large files. The prefix compression works well enough that many refs are still able to use only a single byte for the suffix_length << 3 | value_type varint, keeping the average at 33 bytes per ref. The reftable format uses values 0-3, leaving 4-7 available. I reserved 4 for an arbitrary payload like MERGE_HEAD type files.
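The `suffix_length << 3 | value_type` packing described above can be illustrated with shell arithmetic (the concrete values are invented for the example):

```shell
#!/bin/sh
# Pack a 3-bit value_type into the low bits of the byte that also
# carries suffix_length, as in the v3 reftable draft described above.
suffix_length=4   # hypothetical: 4-byte name suffix after the shared prefix
value_type=2      # hypothetical: one of the 3-bit type codes 0-7
packed=$(( (suffix_length << 3) | value_type ))
echo "$packed"                               # prints 34

# Unpacking recovers both fields:
echo "$(( packed >> 3 )) $(( packed & 7 ))"  # prints 4 2
```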
Re: reftable: new ref storage format
+git@vger.kernel.org. I originally sent the below reply privately by mistake. On Mon, Jul 17, 2017 at 6:43 PM, Michael Haggerty wrote: > On Sun, Jul 16, 2017 at 12:43 PM, Shawn Pearce wrote: >> On Sun, Jul 16, 2017 at 10:33 AM, Michael Haggerty >> wrote: > > On second thought, the idea of having HEAD (or maybe all pseudorefs) > in the same system would open a few interesting possibilities that > derive from having a global, atomic view of all references: > > 1. We could store backlinks from references to the symbolic references > that refer to them. This would allow us to update the reflogs for > symbolic refs properly. (Currently, there is special-case code to > update the reflogs for HEAD when the reference that it points at is > modified, but not for other symrefs.) This is a good idea, but makes for some difficult transition code. We have to keep the special case for HEAD, but other symrefs would log when in a reftable. > 2. We could store "peeled" versions of symbolic refs. These would have > to be updated whenever the pointed-at reference is updated, but that > would have two nice advantages: HEAD would usually be resolvable based > on the top reftable in the stack, and it would be resolvable in one > step (without having to follow the symref explicitly). Great observation. I wish I had seen that sooner. It's a pain in the neck to resolve symrefs, and has caused us a few bugs in JGit on our current non-standard storage. It depends on the back pointer being present and accurate to ensure an update of master also updates the cached HEAD. I'll have to mull on these a bit. I'm not folding them into my documentation and implementation just yet. [...] > I'm still not quite resigned to non-Google users wanting to use blocks > as large as 64k, but (short of doing actual experiments, yuck!) I > can't estimate whether it would make any detectable difference in the > real world.
I think it is only likely to matter with NFS, and then it's a balancing act of how much of that block did you need vs. not need. :) > On the other end of the spectrum, I might mention that the > shared-storage "network.git" repositories that we use at GitHub often > have a colossal number of references (basically, the sum of the number > of references in all of the forks in a "repository network", including > some hidden references that users don't see). For example, one > "network.git" repository has 56M references(!) Mercifully, we > currently only have to access these repositories for batch jobs, but, > given a better reference storage backend, that might change. A larger block size right now has the advantage of a smaller index, which could make a single ref lookup more efficient. Otherwise, the block size doesn't have a big impact on streaming through many references. >>> 2. The stacking of multiple reftable files together [...] >> At $DAY_JOB we can do this successfully with pack files, which are >> larger and more costly to combine than reftable. I think we can get >> reftable to support a reasonable stack depth. > > Are you saying that you merge subsets of packfiles without merging all > of them? Does this work together with bitmaps, or do you only have > bitmaps for the biggest packfile? > > We've thought about merging packfiles in that way, but don't want to > give up the benefits of bitmaps. Yes. We compact smaller pack files together into a larger pack file, and try to keep a repository at: - 2 compacted packs, each <20 MiB - 1 base pack + bitmap We issue a daily GC for any repository that isn't just 1 base pack. But during a business day this compacting process lets us handle most read traffic quite well, despite the bitmaps being incomplete. >>> I haven't reviewed your proposal for storing reflogs in reftables in [...] > > Those sizes don't sound that scary.
Do your reflogs include > significant information in the log messages, or are they all "push" > "push" "push"? We record quite a bit of information in our audit_log > entries (our equivalent of reflogs), so I would expect ours to > compress much less well. These were pretty sparse in the comment field, and frequent reuse of a message. So it may not be representative of what you are storing. > We also tend to use our audit_logs to see what was happening in a > repository; e.g., around the time that a problem occurred. So for us > it is useful that the entries are in chronological order across > references, as opposed to having the entries for each reference > grouped together. We might be the oddballs here though, and in fact it > is possible that this would be an argument for us to stick to our > audit_log scheme rather than use reflogs stored in reftables. I think of reflogs about a single ref, not the whole repository. So I'm inclined to say the reftable storage of them should be by ref, then time. Anyone who wants a repository view must either scan the entire log segment, or
Re: Remove help advice text from git editors for interactive rebase and reword
> On 24 Jul 2017, at 01:09, Junio C Hamano wrote: > > Who is running "git commit --amend" and "git rebase -i" in the > workflow of a user of your tool? Is it the end user who types these > commands to the shell command prompt, or does your tool formulate > the command line and does an equivalent of system(3) to run it? > > I am assuming that the answer is the latter in my response. Yes, it is the latter case: the tool formulates the command line and forks a process. > Not at all interested, as that would mean your tool will tell its > users to set such a configuration variable and their interactive use > of Git outside your tool will behave differently from other people > who use vanilla Git, and they will complain to us. That's not true, since the tool can (and would) use the `git -c config.var=value rebase -i` syntax to set the configuration variable just for this particular command, without affecting the environment. Btw, if my proposal is so uninteresting, why were the existing advice.* variables introduced? I don't know the motivation, but assume that it was about making Git less wordy for experienced users. So I don't see any difference here. > But stepping back a bit, as you said in the parentheses, your tool > would need to grab these "hints" from Git, instead of having a > separate hardcoded hints that will go stale while the underlying Git > command improves, to be able to show them "separately". There is no need to call Git to get these "hints". They are quite obvious, well-known and can be hardcoded. However, I don't plan to use these hints anyway, since they are a bit foreign to the GUI of the tool I develop. For instance, for reword I'd like to show an editor containing just the plain commit message that the user is about to change.
Re: Bug^Feature? fetch protects only current working tree branch
Andreas Heiduk writes: > A `git fetch . origin/master:master` protects the currently checked out > branch (HEAD) unless `-u/--update-head-ok` is supplied. This avoids a > mismatch between the index and HEAD. BUT branches which are HEADs in other > working trees do not get that care - their state is silently screwed up. > > Is this intended behaviour or just an oversight while implementing > `git worktree`? The latter. "git worktree" is an interesting feature and has potential to become useful in a wider variety of workflows than it currently is, but end users should consider it still experimental, as it still has many such small rough edges like this one. Patches to help improve the feature are of course very welcome.
Re: Remove help advice text from git editors for interactive rebase and reword
Kirill Likhodedov writes: > My motivation is the following: I'm improving the Git client > inside of IntelliJ IDEA IDE and I would like to provide only the > plain commit message text to the user (any hints can be shown > separately, not inside the editor). Who is running "git commit --amend" and "git rebase -i" in the workflow of a user of your tool? Is it the end user who types these commands to the shell command prompt, or does your tool formulate the command line and does an equivalent of system(3) to run it? I am assuming that the answer is the latter in my response. > If there is no way to do it now, do you think it makes sense to > provide a configuration variable for this, e.g. to introduce more > advice.* config variables in addition to existing ones? Not at all interested, as that would mean your tool will tell its users to set such a configuration variable and their interactive use of Git outside your tool will behave differently from other people who use vanilla Git, and they will complain to us. But I do not think adding a new command line option that only is passed by a tool like yours when it runs "git rebase -i" via system(3) equivalent would introduce such an issue, so that may be workable. But stepping back a bit, as you said in the parentheses, your tool would need to grab these "hints" from Git, instead of having separate hardcoded hints that will go stale while the underlying Git command improves, to be able to show them "separately". Which means to me that you would need to get the output Git would normally show to the end user and do your own splitting and parsing anyway. Which in turn would mean that a configuration or a command line option to squelch these, which would rob your tool of the ability to read what Git would have told to your users, would be a bad idea and not a useful addition to the overall system. So...
Re: [PATCH] PRItime: wrap PRItime for better l10n compatibility
Jean-Noël AVILA writes: > Le 22/07/2017 à 02:43, Jiang Xin a écrit : >> >> Benefit of using the tweaked version of gettext: >> >> 1. `make pot` can be run in a tar extract directory (not under git control). > > This issue is real for package maintainers who can patch the original > source and run their own set of utilities outside of a git repo. This > can be possible with Junio's proposition by writing the files to a > temporary directory before running xgettext, then removing the > temporary directory. > > Please note that with respect to this issue, the patched xgettext > approach is completely disruptive. OK, so what you are saying is that both of my assumptions were too optimistic: that Jiang (at least for now, and his successor l10n coordinator sometime in the future) would be the only one who needs access to the machinery that updates po/git.pot, and that it does not matter much what that exact machinery is, as long as the resulting po/git.pot lists messages with % and other known ones, because plain vanilla tools will grok such a po/git.pot file just fine. I think binary packagers, who update the software with their own changes, produce their own modified po/git.pot, and have it translated into multiple languages, are capable of coping with any method we use ourselves. But being capable of doing something and being happy to do it are two different things, and we need to aim for the latter; we should not make things unnecessarily cumbersome for them. So I'll leave the s/PRItime/PRIuMAX/ patch in 'master' without Jiang's change for 2.14-rc1. The approach of requiring a privately patched xgettext, while it may technically be a fun exercise, would not fly very well in the real world.
For those who want to work with a tarball extract without being in a Git repository, it would be sufficient for them to run "git init && git commit --allow-empty -m import" immediately after extracting the tarball, even if we require that "make pot" must be run in a clean repository. And I'd prefer to go that route rather than copying into a temporary directory, primarily because I do not want to have to worry about what to copy: when we know we pass $foo.c through xgettext, we know we want to put the modified copy of $foo.c in the temporary directory, but I do not want to even think about whether we also need to copy the header files $foo.c "#include"s, for example. Thanks.
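The "git init and commit" route suggested above can be sketched end to end. The scratch directory below merely stands in for a real tarball extract, and the file name and committer identity are placeholders:

```shell
# Scratch directory standing in for the extracted tarball.
src=$(mktemp -d)
echo 'int main(void) { return 0; }' > "$src/hello.c"
cd "$src"

# Turn the extract into a minimal one-commit repository, per the suggestion.
git init -q .
git add -A
git -c user.name=import -c user.email=import@example.com commit -q -m import

# A "make pot" that insists on a clean repository would now find one here:
git status --porcelain    # prints nothing, since the tree is clean
```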
Re: reftable [v2]: new ref storage format
My apologies for not responding to this piece of feedback earlier. On Wed, Jul 19, 2017 at 7:02 AM, Ævar Arnfjörð Bjarmason wrote: > On Tue, Jul 18 2017, Shawn Pearce jotted: >> On Mon, Jul 17, 2017 at 12:51 PM, Junio C Hamano wrote: >>> Shawn Pearce writes: where `time_sec` is the update time in seconds since the epoch. The `reverse_int32` function inverts the value so that lexicographic ordering of the network-byte-order time sorts more recent records first: reverse_int32(int32 t) { return 0xffffffff - t; } >>> >>> Is 2038 an issue, or by that time we'd all be retired together with >>> this file format and it won't be our problem? >> >> Based on discussion with Michael Haggerty, this is now an 8 byte field >> storing microseconds since the epoch. We should be good through year >> . > I think this should be s/microseconds/nanoseconds/, not because there's > some great need to get better resolution than nanoseconds, but because: > > a) We already have WIP code (bp/fsmonitor) that's storing 64 bit > nanoseconds since the epoch, albeit for the index, not for refs. > > b) There are several filesystems that have nanosecond resolution now, > and it's likely more will start using that. The time in a reflog and the time returned by lstat(2) to detect dirty files in the working tree are unrelated. Of course we want the dircache to reflect the highest precision available from lstat, to reduce the number of files that must be content-hashed for racily-clean detection. So if a filesystem is using nanoseconds, the dircache perhaps should support it. > Thus: > > x) If you use such a filesystem you'll lose time resolution with this > ref backend vs. storing them on disk, which isn't itself a big > deal, but more importantly you lose the 1:1 time mapping as you > transition and convert between the two. No, you won't. The reflog today ($GIT_DIR/logs) stores second precision in the log record. What precision the filesystem uses for mtime is irrelevant.
Further, microsecond is sufficient resolution for reflog data. From my benchmarking, just reading a reference from a very hot reftable costs ~20.2 usec. Any update of a reference requires a read-compare-modify cycle, so updates aren't going to happen more often than once per ~20 usec. > y) Our own code will need to juggle second resolution epochs > (traditional FSs, any 32bit epoch format), microseconds (this > proposal), and nanoseconds (new FSs, bp/fsmonitor) internally in > various places. But these are also unrelated areas. IMHO, the nanosecond stuff should be confined to the dircache management code and working tree comparison code, and not leak out of there. Commit objects are still recorded with second precision, and that isn't going to change. Therefore I decided to stick with microseconds in the reftable v3 draft that I posted on July 22nd.
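The inversion trick from the quoted spec text can be checked with plain shell arithmetic (the timestamps below are made-up examples): subtracting each time from the all-ones 32-bit value makes the key of a more recent update compare smaller, so an ascending sort of the stored keys lists recent records first.

```shell
older=1500000000
newer=1500086400            # one day later
max=$(( 0xffffffff ))       # all-ones 32-bit value

# Stored key is max - t, so newer records get smaller keys.
k_older=$(( max - older ))
k_newer=$(( max - newer ))
echo "key(newer)=$k_newer key(older)=$k_older"
```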
Re: reftable [v3]: new ref storage format
On Sat, Jul 22, 2017 at 8:29 PM, Shawn Pearce wrote: > 3rd iteration of the reftable storage format. > > You can read a rendered version of this here: > https://googlers.googlesource.com/sop/jgit/+/reftable/Documentation/technical/reftable.md > > Significant changes from v2: > - efficient lookup by SHA-1 for allow-tip-sha1-in-want. > - type 0x4 for FETCH_HEAD, MERGE_HEAD. > - file size up (27.7 M in v1, 34.4 M in v3) I had some feedback on v2 here which still applies: https://public-inbox.org/git/87k234tti7@gmail.com/ It would be good to either get a reply to that, or, if you don't think it's sensible for whatever reason and left it out of v3, to have a "feedback received but discarded for " note in these summaries as you send new versions. Aside from the mail I sent, I think that would be very useful in general if there's been any other such feedback (I honestly don't know if there has; I haven't been following this actively).
Re: git gc seems to break --symbolic-full-name
23.07.2017 11:40, Jacob Keller writes: On Fri, Jul 21, 2017 at 12:03 PM, Stas Sergeev wrote: I wanted some kind of file to use as a build dependency for the files that need to be re-built when the head changes. This works very well except for git gc. What other method can be used as simply as that? git show-ref does not seem to give this. There's no real way to do this, and even prior to 2007, when the file always existed, there was no guarantee its modification time was valid. I'd suggest you have a phony rule which you always run, that checks the ref, sees if it's different from "last time", and then updates a different file if that's the case. Then the build can depend on the generated file, and you'd be able to figure it out. OK, thanks, that looks quite simple too. I will have to create by hand the file that I expected git to already have, but apparently it does not. What's the real goal for depending on when the ref changes? So that when users fill in a bug report, I can see at what revision the bug happened. :) While seemingly "just a debugging sugar", hard experience shows this to be exceptionally useful. I think even the linux kernel does something like this, and solves the task the hard way. For example I can see a script at scripts/setlocalversion whose output seems to go to include/config/kernel.release, and a lot of logic in the toplevel makefile about this. So, not liking the fact that every project solves this differently, I was trying to get the solution directly from git. But I'll try otherwise.
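The phony-rule idea suggested here can be sketched as follows (head.gen is a made-up file name, and the scratch repository only makes the example self-contained): run the check unconditionally, and rewrite the generated file only when HEAD actually moved, so make's timestamp comparison fires only on real changes.

```shell
# Scratch repository standing in for the real project.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git -c user.name=t -c user.email=t@example.com commit -q --allow-empty -m one

# The body of the always-run phony rule:
head=$(git rev-parse HEAD)
if [ ! -f head.gen ] || [ "$head" != "$(cat head.gen)" ]; then
    printf '%s\n' "$head" > head.gen   # only now does the mtime change
fi
```

Targets that need the current revision would then depend on head.gen rather than on anything under .git/.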
Re: [PATCH v2 00/10] tag: only respect `pager.tag` in list-mode
On 21 July 2017 at 00:27, Junio C Hamano wrote: > I tend to agree with you that 1-3/10 may be better off being a > single patch (or 3/10 dropped, as Brandon is working on losing it > nearby). I would have expected 7-8/10 to be a single patch, as by > the time a reader reaches 07/10, because of the groundwork laid by > 04-06/10, it is obvious that the general direction is to allow the > caller, i.e. cmd_tag(), to make a call to setup_auto_pager() only in > some but not all circumstances, and 07/10 being faithful to the > original behaviour (only to be updated in 08/10) is somewhat > counterintuitive. It is not wrong per se; it was just unexpected. Thanks for your comments. I will be away for a few days, but once I get back, I'll try to produce a v3 based on this and any further feedback. Martin
Re: Expected behavior of "git check-ignore"...
From: "John Szakmeister" Sent: Thursday, July 20, 2017 11:37 AM A StackOverflow user posted a question about how to reliably check whether a file would be ignored by "git add", and expected "git check-ignore" to return results that matched git add's behavior. It turns out that it doesn't. If there is a negation rule, we end up returning that exclude, printing it, and exiting with 0 (meaning "some files are ignored") even though the file has been marked to not be ignored. Is the expected behavior of "git check-ignore" to return 0 even if the file is not ignored when a negation is present? I'm testing this on: $ git --version git version 2.10.0.windows.1 git init . echo 'foo/*' > .gitignore echo '!foo/bar' > .gitignore Is this missing the >> append to get the full two-line .gitignore? Adding a `cat .gitignore` would help check. mkdir foo touch foo/bar I don't think you need these. It's the given pathnames that are checked, not the file system content. git check-ignore foo/bar Does this need the `-q` option to set the exit status? echo $? # to display the status. I expect the last command to return 1 (no files are ignored), but it doesn't. The StackOverflow user had the same expectation, and I imagine others do as well. OTOH, it looks like the command is really meant to be a debugging tool: to show me the line in a .gitignore associated with this file, if there is one. In which case, the behavior is correct but the return code description is a bit misleading (0 means the file is ignored, which isn't true here). Maybe the logic isn't that clear? Maybe it is simply detecting whether any one of the ignore lines matches, without resetting the status for a negation? I appear to get the same response as you, but I haven't spent much time on it - I'm clearing a backlog of work at the moment. I also tried the -v -n options, and if I swap the ignore lines around it still says line 2 is the one that ignores.
It gets more interesting if two paths are given, `foo/bar foo/baz`, to see which line picks up which pathname (and likewise with the swapped ignore lines). Is there a test for this in the test suite? Thoughts? It seems like this question was asked several years ago but didn't get a response. Thanks! -John PS The SO question is here: https://stackoverflow.com/questions/45210790/how-to-reliably-check-whether-a-file-is-ignored-by-git -- Philip
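For reference, here is the reproduction with the `>>` fix applied (in a scratch repository; since the behavior of check-ignore on the negated path differed across Git versions, no particular exit status is claimed for that final command):

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q .
echo 'foo/*'    >  .gitignore
echo '!foo/bar' >> .gitignore      # append, so both rules survive
cat .gitignore                     # confirm the two-line ignore file

# foo/other clearly matches 'foo/*'; check-ignore exits 0 for it.
git check-ignore -v foo/other

# foo/bar matches only the negation; the thread saw exit status 0 here
# on 2.10-era Git, which is the surprising behavior under discussion.
git check-ignore -v foo/bar || echo "foo/bar reported as not ignored"
```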
Bug^Feature? fetch protects only current working tree branch
A `git fetch . origin/master:master` protects the currently checked out branch (HEAD) unless `-u/--update-head-ok` is supplied. This avoids a mismatch between the index and HEAD. BUT branches which are HEADs in other working trees do not get that care: their state is silently screwed up. Is this intended behaviour or just an oversight while implementing `git worktree`? Steps to reproduce:

# setup
git clone -b master $SOMETHING xtemp
cd xtemp
git reset --hard HEAD~5    # pretend to be back some time
git worktree add ../xtemp-wt1
git worktree add ../xtemp-wt2

# test
git fetch . origin/master:master
fatal: Refusing to fetch into current branch refs/heads/master of non-bare repository
fatal: The remote end hung up unexpectedly

# OK, current working tree is protected, try another one:
git fetch . origin/master:xtemp-wt1
From . b4d1278..6e7b60d origin/master -> xtemp-wt1
cd ../xtemp-wt1
git status    # admire messed up working tree here

# The protection is really "current working tree", not "first/main working tree"!
git fetch . origin/master:master
From . b4d1278..6e7b60d origin/master -> master
cd ../xtemp
git status    # now it's messed up here too

# Try with "--update-head-ok" but check first.
cd ../xtemp-wt2
git fetch . origin/master:xtemp-wt2
fatal: Refusing to fetch into current branch refs/heads/xtemp-wt2 of non-bare repository
fatal: The remote end hung up unexpectedly
git fetch --update-head-ok . origin/master:xtemp-wt2
From . b4d1278..6e7b60d origin/master -> xtemp-wt2
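Until fetch learns to protect branches checked out in other working trees, a caller can do the check itself. A hedged sketch follows (scratch repository; note that the main worktree also counts, which is what makes the guard trip in this self-contained demo):

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q .
git -c user.name=t -c user.email=t@example.com commit -q --allow-empty -m one
branch=$(git symbolic-ref --short HEAD)

# Refuse to update a local branch that any worktree has checked out;
# "git worktree list --porcelain" emits one "branch refs/heads/..." line
# for every worktree that has a branch checked out.
if git worktree list --porcelain | grep -qx "branch refs/heads/$branch"; then
    echo "refusing: $branch is checked out in some worktree" >&2
else
    git fetch . "origin/$branch:$branch"
fi
```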
Re: Remove help advice text from git editors for interactive rebase and reword
On 23 July 2017 at 13:03, Kirill Likhodedov wrote: > Hello, > > is it possible to remove the helping text which appears at the bottom > of the Git interactive rebase editor (the one with the list of > instructions) I believe currently there is no way to do it. The interactive rebase is implemented in git-rebase--interactive.sh, which always makes a call to append_todo_help to append the help text to the todo list of commits. > and the one which appears at the bottom of the commit editor (which > appears on rewording a commit or squashing commits)? This one too seems to be hardcoded, in builtin/commit.c. > I can parse and strip out the help pages (but it is not very reliable > since the text may change in future) I doubt the syntax of the interactive rebase todo list will ever change, so you can reliably remove all lines that are empty or start with $(git config --get core.commentchar), or '#' if that's empty or 'auto'. However, it's harder with the commit messages during --amend, as the comment character is not really fixed there: when core.commentchar is set to 'auto', it can be dynamically selected to avoid conflicting with characters used in the commit message. > However I suppose that experienced command line users could also > benefit from such configuration, since this helping text is intended > only for newbies and is more like a noise for advanced users. Well, the text is appended to the todo list of commits, so it does not get too much in the way of humans editing the list.
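The stripping approach described above, sketched in shell (the todo file is a made-up sample): resolve core.commentChar first, treating both "unset" and "auto" as '#', which, as noted, is not fully reliable when 'auto' ends up picking a different character.

```shell
# A sample rebase todo list to clean up.
todo=$(mktemp)
printf '%s\n' 'pick 1234567 first' '# Rebase a..b onto a' '' 'pick 89abcde second' > "$todo"

# Resolve the comment character; unset and 'auto' both fall back to '#'.
cc=$(git config --get core.commentChar || echo '#')
if [ "$cc" = auto ] || [ -z "$cc" ]; then cc='#'; fi

# Drop comment lines and blank lines, keeping only the instructions.
grep -v "^[$cc]" "$todo" | grep -v '^[[:space:]]*$'
```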
Remove help advice text from git editors for interactive rebase and reword
Hello, is it possible to remove the helping text which appears at the bottom of the Git interactive rebase editor (the one with the list of instructions), and the one which appears at the bottom of the commit editor (which appears on rewording a commit or squashing commits)? The texts I'm talking about are: # Rebase e025896..efc3d17 onto e025896 # # Commands: # p, pick = use commit ... and # Please enter the commit message for your changes. Lines starting # with '#' will be ignored, and an empty message aborts the commit. # Not currently on any branch. ... If there is no way to do it now, do you think it makes sense to provide a configuration variable for this, e.g. to introduce more advice.* config variables in addition to the existing ones? My motivation is the following: I'm improving the Git client inside of the IntelliJ IDEA IDE and I would like to provide only the plain commit message text to the user (any hints can be shown separately, not inside the editor). I know I can load the original commit message myself (but I prefer not to make extra calls when possible); and I can parse and strip out the help text (but that is not very reliable since the text may change in the future), so I'd appreciate any other solution to my problem as well. However, I suppose that experienced command line users could also benefit from such a configuration, since this helping text is intended only for newbies and is more like noise for advanced users.
Re: git gc seems to break --symbolic-full-name
On Fri, Jul 21, 2017 at 12:03 PM, Stas Sergeev wrote: > I wanted some kind of file to use as a > build dependency for the files that need > to be re-built when the head changes. > This works very well except for git gc. > What other method can be used as simply > as that? git show-ref does not seem to > give this. There's no real way to do this, and even prior to 2007, when the file always existed, there was no guarantee its modification time was valid. I'd suggest you have a phony rule which you always run, that checks the ref, sees if it's different from "last time", and then updates a different file if that's the case. Then the build can depend on the generated file, and you'd be able to figure it out. What's the real goal for depending on when the ref changes? Thanks, Jake
Re: recursive grep doesn't respect --color=always inside submodules
On Sat, Jul 22, 2017 at 11:02 PM, Orgad Shaneh wrote: > Hi, > > When git grep --color=always is used, and the output is redirected to > a file or a pipe, results inside submodules are not colored. Results > in the supermodule are colored correctly. > > - Orgad This occurs because the color setting isn't passed to the recursive grep process we launch for each submodule. It might be fixed if/when we switch to using the repository object to run grep in-process. We could also patch grep to pass the color option down into the submodule. Thanks, Jake
index.lock porcelain interface?
While working on some scripts for continuous integration, we wanted to check whether git was doing anything before running our script. The best we came up with was checking for the existence of index.lock, or whether a merge was in progress. MERGE_HEAD can be checked directly, but we chose to use git status --porcelain=v2. Is there a better check than "does .git/index.lock exist", e.g. a porcelain interface? -Jason -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- - - - Jason Pyeron PD Inc. http://www.pdinc.us - - Principal Consultant 10 West 24th Street #100- - +1 (443) 269-1555 x333Baltimore, Maryland 21218 - - - -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
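For what it's worth, the lock-file heuristic from the question can be written directly against the git directory (a sketch; the lock is created by hand here only to make the example self-contained). Note that it is inherently racy, since a lock can appear right after the check, so it can only ever be best-effort:

```shell
# Scratch repository, with a lock file standing in for an in-flight command.
repo=$(mktemp -d)
cd "$repo"
git init -q .
gitdir=$(git rev-parse --git-dir)
touch "$gitdir/index.lock"

# index.lock => some command holds the index; MERGE_HEAD => merge in progress.
if [ -e "$gitdir/index.lock" ] || [ -e "$gitdir/MERGE_HEAD" ]; then
    echo "git appears busy (or a stale lock is present); deferring CI job"
fi
```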
fetch-any-blob / ref-in-want proposal
Hi, Jonathan Tan proposed a design and a patch series for requesting a specific ref on fetch 4 months ago[1]. Is there any progress with this? - Orgad [1] https://public-inbox.org/git/ffd92ad9-39fe-c76b-178d-6e3d6a425...@google.com/
recursive grep doesn't respect --color=always inside submodules
Hi, When git grep --color=always is used, and the output is redirected to a file or a pipe, results inside submodules are not colored. Results in the supermodule are colored correctly. - Orgad