Re: [PATCH v2 00/16] First class shallow clone
Duy Nguyen wrote:
> On Wed, Jul 24, 2013 at 3:30 PM, Piotr Krukowiecki wrote:
>> (resending, as my phone mail client decided to send it in html, sorry
>> about that)
>>
>> On Wed, Jul 24, 2013 at 3:57 AM, Duy Nguyen wrote:
>>> On Wed, Jul 24, 2013 at 5:33 AM, Philip Oakley wrote:
>>>> There have been comments on the git-user list about the problem of
>>>> accidental adding of large files which then make the repo's
>>>> footprint pretty large as one use case [Git is consuming very much
>>>> RAM]. The bigFileThreshold being one way of spotting such files as
>>>> separate objects, and 'trimming' them.
>>>
>>> I think rewriting history to remove those accidents is better than
>>> working around it (the same as for an accidentally committed
>>> password). We might be able to spot problems early, maybe warn the
>>> user at commit time that they have added an exceptionally large
>>> blob, maybe before push time..
>>
>> I can imagine a situation where large files were part of the project
>> at some point in history (they were required to build/use it) and
>> later were removed because the build/project has changed.
>>
>> It would be useful to have the history for log/blame/etc even if you
>> could not build/use old versions. A warning when checking
>> out/branching such an incomplete tree would be needed.
>
> That's what shallow clone is for. You fetch the latest (not including
> old large blobs) and work on top. For archaeology, make a full clone.
> Or do you mean log/blame/etc on other paths that don't touch big
> blobs, and the clone is still incomplete?

Yes, for example if large files were removed recently, the
last-n-commits shallow clone would be useless from a blame/log POV.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 00/16] First class shallow clone
On Wed, Jul 24, 2013 at 3:30 PM, Piotr Krukowiecki wrote:
> (resending, as my phone mail client decided to send it in html, sorry
> about that)
>
> On Wed, Jul 24, 2013 at 3:57 AM, Duy Nguyen wrote:
>> On Wed, Jul 24, 2013 at 5:33 AM, Philip Oakley wrote:
>>> There have been comments on the git-user list about the problem of
>>> accidental adding of large files which then make the repo's
>>> footprint pretty large as one use case [Git is consuming very much
>>> RAM]. The bigFileThreshold being one way of spotting such files as
>>> separate objects, and 'trimming' them.
>>
>> I think rewriting history to remove those accidents is better than
>> working around it (the same as for an accidentally committed
>> password). We might be able to spot problems early, maybe warn the
>> user at commit time that they have added an exceptionally large
>> blob, maybe before push time..
>
> I can imagine a situation where large files were part of the project
> at some point in history (they were required to build/use it) and
> later were removed because the build/project has changed.
>
> It would be useful to have the history for log/blame/etc even if you
> could not build/use old versions. A warning when checking
> out/branching such an incomplete tree would be needed.

That's what shallow clone is for. You fetch the latest (not including
old large blobs) and work on top. For archaeology, make a full clone.
Or do you mean log/blame/etc on other paths that don't touch big blobs,
and the clone is still incomplete?
--
Duy
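[The "work shallow now, go deep for archaeology" workflow Duy describes can be sketched with stock git commands. This is an illustrative end-to-end demo in a throwaway local repository; the paths, commit messages, and identity flags are made up for self-containment, and the file:// URL is used only so --depth takes the transport code path even for a local source.]

```shell
# Build a tiny repo with two commits standing in for "old" and "current".
git init -q src
git -C src -c user.email=you@example.com -c user.name=you \
    commit -q --allow-empty -m "old history (large blobs live here, say)"
git -C src -c user.email=you@example.com -c user.name=you \
    commit -q --allow-empty -m "current work"

# Shallow clone: only the tip commit is fetched.
git clone -q --depth=1 "file://$PWD/src" work
git -C work rev-list --count HEAD    # 1 -- just the tip

# Later, for archaeology (full log/blame), deepen into a complete clone:
git -C work fetch -q --unshallow
git -C work rev-list --count HEAD    # 2 -- full history
```

The --unshallow option already existed at the time of this thread (git 1.8.3), which is what makes "make a full clone [later]" a cheap follow-up rather than a re-clone.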
Re: [PATCH v2 00/16] First class shallow clone
(resending, as my phone mail client decided to send it in html, sorry
about that)

On Wed, Jul 24, 2013 at 3:57 AM, Duy Nguyen wrote:
> On Wed, Jul 24, 2013 at 5:33 AM, Philip Oakley wrote:
>> There have been comments on the git-user list about the problem of
>> accidental adding of large files which then make the repo's
>> footprint pretty large as one use case [Git is consuming very much
>> RAM]. The bigFileThreshold being one way of spotting such files as
>> separate objects, and 'trimming' them.
>
> I think rewriting history to remove those accidents is better than
> working around it (the same as for an accidentally committed
> password). We might be able to spot problems early, maybe warn the
> user at commit time that they have added an exceptionally large blob,
> maybe before push time..

I can imagine a situation where large files were part of the project
at some point in history (they were required to build/use it) and
later were removed because the build/project has changed.

It would be useful to have the history for log/blame/etc even if you
could not build/use old versions. A warning when checking
out/branching such an incomplete tree would be needed.
--
Piotr Krukowiecki
Re: [PATCH v2 00/16] First class shallow clone
From: "Duy Nguyen"
Sent: Wednesday, July 24, 2013 2:57 AM
> On Wed, Jul 24, 2013 at 5:33 AM, Philip Oakley wrote:
>> In some sense a project with a sub-module is a narrow clone, split at
>> a 'commit' object.
>
> Yes, except narrow clone is more flexible. You have to decide the
> split boundary at commit time for a sub-module, while you decide the
> same at clone time for a narrow clone.

True. It was the thought experiment part I was referring to.

>> There have been comments on the git-user list about the problem of
>> accidental adding of large files which then make the repo's
>> footprint pretty large as one use case [Git is consuming very much
>> RAM]. The bigFileThreshold being one way of spotting such files as
>> separate objects, and 'trimming' them.
>
> I think rewriting history to remove those accidents is better than
> working around it (the same as for an accidentally committed
> password). We might be able to spot problems early, maybe warn the
> user at commit time that they have added an exceptionally large blob,
> maybe before push time..

Again, it was a thought experiment which related to a recent git-user
list comment. I'd expect a real use case could be a team where one
member who is preparing documentation adds a [large] video to his
branch, and others then get a bit concerned when they try to track it /
pull it, as they really don't want it yet. The guy may have many
versions on the central repo before a final rebase leaves a single
compressed version. Colleagues may want to review the text surrounding
it but not pull the video itself. (remembering 50% of 'idiots' are
twice as dumb as the average... ;-)

> The "Git is consuming very much RAM" part is not right. We try to
> keep memory usage under a limit regardless of the size of a blob.
> There may be some cases we haven't fixed yet. Reports are welcome.

I think this was a Windows user, but reports do pop up every now and
again. Sometimes it's disc pressure, or just perceived slowness (from
others).

Philip
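[For the "spotting such files" half of the discussion above: there is no built-in "list my biggest blobs" command, but a common recipe pipes rev-list into cat-file. This sketch is my own illustration, not part of the series under review; it assumes a modern-enough git whose cat-file supports --batch-check format strings.]

```shell
# List the ten largest blobs in the repository, with their paths --
# e.g. before deciding what to trim or rewrite out of history.
git rev-list --objects --all |
git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
awk '$1 == "blob" { print $3, $4 }' |
sort -rn | head -n 10
```

The %(rest) placeholder carries the path that rev-list --objects appends after each object name, which is why the two commands compose cleanly.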
Re: [PATCH v2 00/16] First class shallow clone
On Wed, Jul 24, 2013 at 5:33 AM, Philip Oakley wrote:
> In some sense a project with a sub-module is a narrow clone, split at
> a 'commit' object.

Yes, except narrow clone is more flexible. You have to decide the split
boundary at commit time for a sub-module, while you decide the same at
clone time for a narrow clone.

> There have been comments on the git-user list about the problem of
> accidental adding of large files which then make the repo's footprint
> pretty large as one use case [Git is consuming very much RAM]. The
> bigFileThreshold being one way of spotting such files as separate
> objects, and 'trimming' them.

I think rewriting history to remove those accidents is better than
working around it (the same as for an accidentally committed password).
We might be able to spot problems early, maybe warn the user at commit
time that they have added an exceptionally large blob, maybe before
push time..

The "Git is consuming very much RAM" part is not right. We try to keep
memory usage under a limit regardless of the size of a blob. There may
be some cases we haven't fixed yet. Reports are welcome.
--
Duy
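[Duy's "warn user at commit time" is not a built-in check, but it is exactly what a pre-commit hook can do today. A minimal sketch, entirely my own illustration: the 5 MB threshold is arbitrary, and paths containing newlines are not handled.]

```shell
#!/bin/sh
# .git/hooks/pre-commit (illustrative): warn when a staged blob is
# exceptionally large, before the commit is made.
limit=$((5 * 1024 * 1024))

git diff --cached --name-only --diff-filter=AM |
while read -r path; do
    # ":$path" names the blob as staged in the index.
    size=$(git cat-file -s ":$path" 2>/dev/null) || continue
    if [ "$size" -gt "$limit" ]; then
        echo "warning: '$path' is $size bytes (> $limit)" >&2
    fi
done
```

To reject instead of warn, the hook would exit non-zero; warning keeps the behaviour advisory, matching the "maybe warn" phrasing above.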
Re: [PATCH v2 00/16] First class shallow clone
From: "Duy Nguyen"
Sent: Tuesday, July 23, 2013 2:20 AM
> On Tue, Jul 23, 2013 at 6:41 AM, Philip Oakley wrote:
>> From: "Nguyễn Thái Ngọc Duy"
>> Subject: [PATCH v2 00/16] First class shallow clone
>>
>> It's nice to see that shallow can be a first class clone.
>>
>> Thinking outside the box, does this infrastructure offer the
>> opportunity to maybe add a date-based depth option that would
>> establish the shallow watermark based on date rather than count.
>> (e.g. the "deepen" SP depth could
>
> I've been carefully avoiding the deepen issues because, as you see,
> it's complicated. But no, this series does not enable or disable new
> deepen mechanisms. They can always be added as protocol extensions.
> Still thinking if it's worth exposing a (restricted form of) rev-list
> to the protocol..

Interesting idea.

>> have an alternate with a leading 'T' to indicate a time limit rather
>> than revision count - I'm expecting such a format would be an error
>> for existing servers).
>>
>> My other thought was this style of cut limit list may also allow a
>> big file limit to do a similar process of listing objects (e.g.
>> blobs) that are size-shallow in the repo, though it may be a long
>> list on some repos, or with a small size limit.
>
> This one, on the other hand, changes the "shape" of the repo (now
> with holes) and might need to go through the same process we do with
> this series. Maybe we should prepare for it now. Do you have a use
> case for size-based filtering? What can we do with a repo with some
> arbitrary blobs missing? Another form of this is narrow clone, where
> we cut by paths, not by blob size. Narrow clone sounds more useful to
> me because it's easier to control what we leave out.

In some sense a project with a sub-module is a narrow clone, split at a
'commit' object.

There have been comments on the git-user list about the problem of
accidental adding of large files which then make the repo's footprint
pretty large as one use case [Git is consuming very much RAM]. The
bigFileThreshold being one way of spotting such files as separate
objects, and 'trimming' them.

It doesn't feel right to 'track files and directories' as paths for
doing a narrow clone - it'd probably fall into the same trap as
tracking file renames. However, if one tracks trees and blobs (as a
list of sha1 values, possibly with their source path) then it should be
possible to allow work on the repo with those empty directories/files
in the same manner as is used for sub-modules, possibly with some form
of git-link file as an alternate marker. The thought process is to map
sub-module working onto the other object types (blobs and trees). The
user would be unable to edit the trimmed files/directories anyway, so
their sha1 values can't change, allowing them to be included in the
next commit in the branch series.

Philip
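[The "split at a 'commit' object" property Philip is leaning on is visible in how a submodule is recorded in a tree: as a gitlink entry of mode 160000 and type 'commit', whose target object need not exist in the local object database. A small sketch of that mechanism -- the path name and all-ones commit id below are made up:]

```shell
# In a fresh repo, record a gitlink entry by hand. Git does not require
# the referenced commit to be present locally -- the same property that
# would let trimmed trees/blobs be carried as opaque sha1s.
git init -q demo && cd demo
git update-index --add --cacheinfo 160000 \
    1111111111111111111111111111111111111111 vendored

# The resulting tree carries a 'commit' entry, not a blob or tree:
git ls-tree "$(git write-tree)"
# 160000 commit 1111111111111111111111111111111111111111	vendored
```

Extending this from 'commit' entries to arbitrary blob/tree sha1s is precisely the unimplemented part of the narrow-clone idea being discussed.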
Re: [PATCH v2 00/16] First class shallow clone
On Tue, Jul 23, 2013 at 11:08 AM, Junio C Hamano wrote:
> Duy Nguyen writes:
>
>> This one, on the other hand, changes the "shape" of the repo (now
>> with holes) and might need to go through the same process we do with
>> this series. Maybe we should prepare for it now. Do you have a use
>> case for size-based filtering? What can we do with a repo with some
>> arbitrary blobs missing? Another form of this is narrow clone, where
>> we cut by paths, not by blob size. Narrow clone sounds more useful
>> to me because it's easier to control what we leave out.
>
> I was about to say "Hear, hear", but then stopped with a question to
> myself: why are these "some people do not want them" paths in the
> same repository in the first place?

I think there are situations where splitting repos is not the best
choice, but I can't think of any. There's one case though where such
"some people" exist: when they migrate from another version control
system to git and do not want to change the directory layout (because
it used to work ok, because of the cost of updating the toolchain...)
--
Duy
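[For Duy's migration case, the closest mechanism that already existed at the time is sparse checkout: it narrows the working tree but still fetches and stores every object, whereas the narrow clone discussed above would cut paths out of the transfer itself. A sketch inside an existing clone; the docs/ pattern is illustrative:]

```shell
# Limit the *working tree* to docs/ only. The object store still holds
# everything -- this is the gap narrow clone would close.
git config core.sparseCheckout true
echo 'docs/' > .git/info/sparse-checkout
git read-tree -mu HEAD    # working tree now contains only docs/
```

Paths outside the patterns are removed from the working tree and marked skip-worktree in the index, so status/commit keep working on the narrowed view.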
Re: [PATCH v2 00/16] First class shallow clone
Duy Nguyen writes:

> This one, on the other hand, changes the "shape" of the repo (now with
> holes) and might need to go through the same process we do with this
> series. Maybe we should prepare for it now. Do you have a use case for
> size-based filtering? What can we do with a repo with some arbitrary
> blobs missing? Another form of this is narrow clone, where we cut by
> paths, not by blob size. Narrow clone sounds more useful to me because
> it's easier to control what we leave out.

I was about to say "Hear, hear", but then stopped with a question to
myself: why are these "some people do not want them" paths in the
same repository in the first place?
Re: [PATCH v2 00/16] First class shallow clone
On Tue, Jul 23, 2013 at 6:41 AM, Philip Oakley wrote:
> From: "Nguyễn Thái Ngọc Duy"
> Subject: [PATCH v2 00/16] First class shallow clone
>
> It's nice to see that shallow can be a first class clone.
>
> Thinking outside the box, does this infrastructure offer the
> opportunity to maybe add a date-based depth option that would
> establish the shallow watermark based on date rather than count.
> (e.g. the "deepen" SP depth could

I've been carefully avoiding the deepen issues because, as you see,
it's complicated. But no, this series does not enable or disable new
deepen mechanisms. They can always be added as protocol extensions.
Still thinking if it's worth exposing a (restricted form of) rev-list
to the protocol..

> have an alternate with a leading 'T' to indicate a time limit rather
> than revision count - I'm expecting such a format would be an error
> for existing servers).
>
> My other thought was this style of cut limit list may also allow a
> big file limit to do a similar process of listing objects (e.g.
> blobs) that are size-shallow in the repo, though it may be a long
> list on some repos, or with a small size limit.

This one, on the other hand, changes the "shape" of the repo (now with
holes) and might need to go through the same process we do with this
series. Maybe we should prepare for it now. Do you have a use case for
size-based filtering? What can we do with a repo with some arbitrary
blobs missing? Another form of this is narrow clone, where we cut by
paths, not by blob size. Narrow clone sounds more useful to me because
it's easier to control what we leave out.
--
Duy
Re: [PATCH v2 00/16] First class shallow clone
From: "Nguyễn Thái Ngọc Duy"
Subject: [PATCH v2 00/16] First class shallow clone

It's nice to see that shallow can be a first class clone.

Thinking outside the box, does this infrastructure offer the
opportunity to maybe add a date-based depth option that would establish
the shallow watermark based on date rather than count. (e.g. the
"deepen" SP depth could have an alternate with a leading 'T' to
indicate a time limit rather than revision count - I'm expecting such a
format would be an error for existing servers).

My other thought was this style of cut limit list may also allow a big
file limit to do a similar process of listing objects (e.g. blobs) that
are size-shallow in the repo, though it may be a long list on some
repos, or with a small size limit.

Philip

> v2 includes:
>
> - fix Junio comments, especially the one that may lead to incomplete
>   commit islands.
> - fix send-pack setting up a temporary shallow file but never passing
>   it to index-pack/unpack-objects (also fix the tests to catch this)
> - support smart http
> - add core.noshallow for repos that wish to be always complete
> - fix locally cloning a shallow repository
> - make upload-pack pass --shallow-file to pack-objects in order to
>   remove duplicate object counting code just for the shallow case.
>
> Nguyễn Thái Ngọc Duy (16):
>   send-pack: forbid pushing from a shallow repository
>   {receive,upload}-pack: advertise shallow graft information
>   connect.c: teach get_remote_heads to parse "shallow" lines
>   Move setup_alternate_shallow and write_shallow_commits to shallow.c
>   fetch-pack: support fetching from a shallow repository
>   {send,receive}-pack: support pushing from a shallow clone
>   send-pack: support pushing to a shallow clone
>   upload-pack: let pack-objects do the object counting in shallow case
>   pack-protocol.txt: a bit about smart http
>   Add document for command arguments for supporting smart http
>   {fetch,upload}-pack: support fetching from a shallow clone via smart http
>   receive-pack: support pushing to a shallow clone via http
>   send-pack: support pushing from a shallow clone via http
>   git-clone.txt: remove shallow clone limitations
>   config: add core.noshallow to prevent turning a repo into a shallow one
>   clone: use git protocol for cloning shallow repo locally
>
>  Documentation/config.txt                  |   5 +
>  Documentation/git-clone.txt               |   7 +-
>  Documentation/git-fetch-pack.txt          |  11 +-
>  Documentation/git-receive-pack.txt        |  16 ++-
>  Documentation/git-send-pack.txt           |   9 +-
>  Documentation/git-upload-pack.txt         |  13 ++-
>  Documentation/technical/pack-protocol.txt |  76 -
>  builtin/clone.c                           |  14 ++-
>  builtin/fetch-pack.c                      |   6 +-
>  builtin/receive-pack.c                    |  76 +++--
>  builtin/send-pack.c                       |   7 +-
>  cache.h                                   |   4 +-
>  commit.h                                  |  27 +
>  config.c                                  |   5 +
>  connect.c                                 |  12 +-
>  environment.c                             |   1 +
>  fetch-pack.c                              |  90 ++-
>  fetch-pack.h                              |   1 +
>  remote-curl.c                             |   4 +-
>  send-pack.c                               |  57 +-
>  send-pack.h                               |   4 +-
>  shallow.c                                 | 147 +
>  t/t5530-upload-pack-error.sh              |   3 -
>  t/t5536-fetch-shallow.sh (new +x)         | 141
>  t/t5537-push-shallow.sh (new +x)          | 176 ++
>  t/t5601-clone.sh                          |   7 ++
>  transport.c                               |  14 ++-
>  upload-pack.c                             | 132 ++
>  28 files changed, 858 insertions(+), 207 deletions(-)
>  create mode 100755 t/t5536-fetch-shallow.sh
>  create mode 100755 t/t5537-push-shallow.sh
> --
> 1.8.2.83.gc99314b
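[Philip's 'T'-prefixed deepen could ride on the pack protocol's existing pkt-line framing: each line is prefixed with a 4-byte hex length that counts the prefix itself. The 'T' form is only the proposal in this thread, not an implemented capability; this merely shows what such a request line would look like on the wire, using an epoch timestamp from around the time of this thread:]

```shell
# Frame a hypothetical time-based deepen request as a pkt-line.
# The 4-digit hex length prefix includes its own four bytes.
line='deepen T1374624000'
printf '%04x%s\n' $(( ${#line} + 4 )) "$line"
# 0016deepen T1374624000
```

As Philip notes, a server that only understands "deepen <count>" would reject this line, so it would have to be gated behind a new capability rather than sent unconditionally.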
[PATCH v2 00/16] First class shallow clone
v2 includes:

- fix Junio comments, especially the one that may lead to incomplete
  commit islands.
- fix send-pack setting up a temporary shallow file but never passing
  it to index-pack/unpack-objects (also fix the tests to catch this)
- support smart http
- add core.noshallow for repos that wish to be always complete
- fix locally cloning a shallow repository
- make upload-pack pass --shallow-file to pack-objects in order to
  remove duplicate object counting code just for the shallow case.

Nguyễn Thái Ngọc Duy (16):
  send-pack: forbid pushing from a shallow repository
  {receive,upload}-pack: advertise shallow graft information
  connect.c: teach get_remote_heads to parse "shallow" lines
  Move setup_alternate_shallow and write_shallow_commits to shallow.c
  fetch-pack: support fetching from a shallow repository
  {send,receive}-pack: support pushing from a shallow clone
  send-pack: support pushing to a shallow clone
  upload-pack: let pack-objects do the object counting in shallow case
  pack-protocol.txt: a bit about smart http
  Add document for command arguments for supporting smart http
  {fetch,upload}-pack: support fetching from a shallow clone via smart http
  receive-pack: support pushing to a shallow clone via http
  send-pack: support pushing from a shallow clone via http
  git-clone.txt: remove shallow clone limitations
  config: add core.noshallow to prevent turning a repo into a shallow one
  clone: use git protocol for cloning shallow repo locally

 Documentation/config.txt                  |   5 +
 Documentation/git-clone.txt               |   7 +-
 Documentation/git-fetch-pack.txt          |  11 +-
 Documentation/git-receive-pack.txt        |  16 ++-
 Documentation/git-send-pack.txt           |   9 +-
 Documentation/git-upload-pack.txt         |  13 ++-
 Documentation/technical/pack-protocol.txt |  76 -
 builtin/clone.c                           |  14 ++-
 builtin/fetch-pack.c                      |   6 +-
 builtin/receive-pack.c                    |  76 +++--
 builtin/send-pack.c                       |   7 +-
 cache.h                                   |   4 +-
 commit.h                                  |  27 +
 config.c                                  |   5 +
 connect.c                                 |  12 +-
 environment.c                             |   1 +
 fetch-pack.c                              |  90 ++-
 fetch-pack.h                              |   1 +
 remote-curl.c                             |   4 +-
 send-pack.c                               |  57 +-
 send-pack.h                               |   4 +-
 shallow.c                                 | 147 +
 t/t5530-upload-pack-error.sh              |   3 -
 t/t5536-fetch-shallow.sh (new +x)         | 141
 t/t5537-push-shallow.sh (new +x)          | 176 ++
 t/t5601-clone.sh                          |   7 ++
 transport.c                               |  14 ++-
 upload-pack.c                             | 132 ++
 28 files changed, 858 insertions(+), 207 deletions(-)
 create mode 100755 t/t5536-fetch-shallow.sh
 create mode 100755 t/t5537-push-shallow.sh
--
1.8.2.83.gc99314b
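[The core.noshallow knob listed above is what this series proposes; it may not match what any released Git later shipped, so check your version's documentation before relying on it. As described in the cover letter, it would let a repository refuse to ever become shallow:]

```ini
# .git/config -- per this series' proposal (hypothetical for released Git)
[core]
	noshallow = true
```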