Re: [PATCH v2 00/16] First class shallow clone

2013-07-24 Thread Piotr Krukowiecki
Duy Nguyen wrote:
>On Wed, Jul 24, 2013 at 3:30 PM, Piotr Krukowiecki wrote:
>> (resending, as my phone mail client decided to send it in html, sorry
>> about that)
>>
>> On Wed, Jul 24, 2013 at 3:57 AM, Duy Nguyen wrote:
>>> On Wed, Jul 24, 2013 at 5:33 AM, Philip Oakley wrote:
>>>> There have been comments on the git-user list about the
>>>> problem of accidental adding of large files which then make the
>>>> repo's footprint pretty large as one use case [Git is consuming
>>>> very much RAM]. The bigFileThreshold being one way of spotting
>>>> such files as separate objects, and 'trimming' them.
>>>
>>> I think rewriting history to remove those accidents is better than
>>> working around it (the same for accidentally committing password).
>>> We might be able to spot problems early, maybe warn user at commit
>>> time that they have added an exceptionally large blob, maybe before
>>> push time..
>>
>> I can imagine a situation where large files were part of the project
>> at some point in history (they were required to build/use it) and
>> later were removed because build/project has changed.
>>
>> It would be useful to have the history for log/blame/etc even if you
>> could not build/use old versions. A warning when checking
>> out/branching such incomplete tree would be needed.
>
>That's what shallow clone is for. You fetch the latest (not including
>old large blobs) and work on top. For archaeology, make a full clone.
>Or do you mean log/blame/etc other paths that don't touch big blobs,
>and the clone is still incomplete?

Yes. For example, if the large files were removed only recently, a
shallow clone of the last n commits would be useless from a blame/log
point of view.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 00/16] First class shallow clone

2013-07-24 Thread Duy Nguyen
On Wed, Jul 24, 2013 at 3:30 PM, Piotr Krukowiecki wrote:
> (resending, as my phone mail client decided to send it in html, sorry
> about that)
>
> On Wed, Jul 24, 2013 at 3:57 AM, Duy Nguyen wrote:
>> On Wed, Jul 24, 2013 at 5:33 AM, Philip Oakley wrote:
>>> There have been comments on the git-user list about the
>>> problem of accidental adding of large files which then make the repo's
>>> footprint pretty large as one use case [Git is consuming very much RAM]. The
>>> bigFileThreshold being one way of spotting such files as separate objects,
>>> and 'trimming' them.
>>
>> I think rewriting history to remove those accidents is better than
>> working around it (the same for accidentally committing password). We
>> might be able to spot problems early, maybe warn user at commit time
>> that they have added an exceptionally large blob, maybe before push
>> time..
>
> I can imagine a situation where large files were part of the project
> at some point in history (they were required to build/use it) and
> later were removed because build/project has changed.
>
> It would be useful to have the history for log/blame/etc even if you
> could not build/use old versions. A warning when checking
> out/branching such incomplete tree would be needed.

That's what shallow clone is for. You fetch the latest (not including
old large blobs) and work on top. For archaeology, make a full clone.
Or do you mean log/blame/etc other paths that don't touch big blobs,
and the clone is still incomplete?
--
Duy


Re: [PATCH v2 00/16] First class shallow clone

2013-07-24 Thread Piotr Krukowiecki
(resending, as my phone mail client decided to send it in html, sorry
about that)

On Wed, Jul 24, 2013 at 3:57 AM, Duy Nguyen wrote:
> On Wed, Jul 24, 2013 at 5:33 AM, Philip Oakley wrote:
>> There have been comments on the git-user list about the
>> problem of accidental adding of large files which then make the repo's
>> footprint pretty large as one use case [Git is consuming very much RAM]. The
>> bigFileThreshold being one way of spotting such files as separate objects,
>> and 'trimming' them.
>
> I think rewriting history to remove those accidents is better than
> working around it (the same for accidentally committing password). We
> might be able to spot problems early, maybe warn user at commit time
> that they have added an exceptionally large blob, maybe before push
> time..

I can imagine a situation where large files were part of the project
at some point in history (they were required to build/use it) and
later were removed because build/project has changed.

It would be useful to have the history for log/blame/etc even if you
could not build/use old versions. A warning when checking
out/branching such incomplete tree would be needed.

-- 
Piotr Krukowiecki


Re: [PATCH v2 00/16] First class shallow clone

2013-07-24 Thread Philip Oakley

From: "Duy Nguyen"
Sent: Wednesday, July 24, 2013 2:57 AM
> On Wed, Jul 24, 2013 at 5:33 AM, Philip Oakley wrote:
>> In some sense a project with a sub-module is a narrow clone, split at a
>> 'commit' object.
>
> Yes, except narrow clone is more flexible. You have to decide the
> split boundary at commit time for sub-module, while you decide the
> same at clone time for narrow clone.

True. It was the thought experiment part I was referring to.

>> There have been comments on the git-user list about the
>> problem of accidental adding of large files which then make the repo's
>> footprint pretty large as one use case [Git is consuming very much RAM]. The
>> bigFileThreshold being one way of spotting such files as separate objects,
>> and 'trimming' them.
>
> I think rewriting history to remove those accidents is better than
> working around it (the same for accidentally committing password). We
> might be able to spot problems early, maybe warn user at commit time
> that they have added an exceptionally large blob, maybe before push
> time..

Again, it was a thought experiment which related to a recent git-user
list comment.

I'd expect a real use case could be a team where one member who is
preparing documentation adds a [large] video to his branch and others
then get a bit concerned when they try to track it / pull it as they
really don't want it yet. The guy may have many versions on the central
repo before a final rebase has a single compressed version. Colleagues
may want to review the text surrounding it but not pull the video
itself. (remembering 50 % of 'idiots' are twice as dumb as the
average... ;-)

> The "Git is consuming very much RAM" part is not right. We try to keep
> memory usage under a limit regardless of the size of a blob. There may
> be some cases we haven't fixed yet. Reports are welcome.

I think this was a Windows user, but reports do pop up every now and
again. Sometimes it's disk pressure, or just perceived slowness (from
others).

> --
> Duy

Philip




Re: [PATCH v2 00/16] First class shallow clone

2013-07-23 Thread Duy Nguyen
On Wed, Jul 24, 2013 at 5:33 AM, Philip Oakley wrote:
> In some sense a project with a sub-module is a narrow clone, split at a
> 'commit' object.

Yes, except narrow clone is more flexible. You have to decide the
split boundary at commit time for sub-module, while you decide the
same at clone time for narrow clone.

> There have been comments on the git-user list about the
> problem of accidental adding of large files which then make the repo's
> footprint pretty large as one use case [Git is consuming very much RAM]. The
> bigFileThreshold being one way of spotting such files as separate objects,
> and 'trimming' them.

I think rewriting history to remove those accidents is better than
working around it (the same for accidentally committing password). We
might be able to spot problems early, maybe warn user at commit time
that they have added an exceptionally large blob, maybe before push
time..
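
A minimal sketch of the commit-time warning suggested above, assuming
the caller already knows the staged paths and their sizes in bytes.
The names find_oversized and format_warning and the 50 MiB default are
hypothetical, chosen only for illustration; nothing like this is part
of the series.

```python
from typing import Iterable, List, Tuple

# Hypothetical default; the thread does not propose a concrete limit.
DEFAULT_LIMIT = 50 * 1024 * 1024  # 50 MiB


def find_oversized(staged: Iterable[Tuple[str, int]],
                   limit: int = DEFAULT_LIMIT) -> List[Tuple[str, int]]:
    """Return (path, size) pairs whose size exceeds `limit`, largest first."""
    big = [(path, size) for path, size in staged if size > limit]
    return sorted(big, key=lambda item: item[1], reverse=True)


def format_warning(oversized: List[Tuple[str, int]]) -> str:
    """Build the text a pre-commit or pre-push check could print."""
    lines = ["warning: exceptionally large blobs staged:"]
    for path, size in oversized:
        lines.append(f"  {path} ({size / (1024 * 1024):.1f} MiB)")
    return "\n".join(lines)
```

The same check could run before push instead of at commit time; only
the source of the (path, size) list would differ.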

The "Git is consuming very much RAM" part is not right. We try to keep
memory usage under a limit regardless of the size of a blob. There may
be some cases we haven't fixed yet. Reports are welcome.
-- 
Duy


Re: [PATCH v2 00/16] First class shallow clone

2013-07-23 Thread Philip Oakley

From: "Duy Nguyen"
Sent: Tuesday, July 23, 2013 2:20 AM
> On Tue, Jul 23, 2013 at 6:41 AM, Philip Oakley wrote:
>> From: "Nguyễn Thái Ngọc Duy"
>> Subject: [PATCH v2 00/16] First class shallow clone
>>
>> It's nice to see that shallow can be a first class clone.
>>
>> Thinking outside the box, does this infrastructure offer the opportunity to
>> maybe add a date based depth option that would establish the shallow
>> watermark based on date rather than count. (e.g. the "deepen" SP depth could
>
> I've been carefully avoiding the deepen issues because, as you see,
> it's complicated. But no, this series does not enable or disable new
> deepen mechanisms. They can always be added as protocol extensions.
> Still thinking if it's worth exposing a (restricted form of) rev-list
> to the protocol..

Interesting idea.

>> have an alternate with a leading 'T' to indicate a time limit rather than
>> revision count - I'm expecting such a format would be an error for existing
>> servers).
>>
>> My other thought was this style of cut limit list may also allow a big file
>> limit to do a similar process of listing objects (e.g. blobs) that are
>> size-shallow in the repo, though it may be a long list on some repos, or with
>> a small size limit.
>
> This one, on the other hand, changes the "shape" of the repo (now with
> holes) and might need to go through the same process we do with this
> series. Maybe we should prepare for it now. Do you have a use case for
> size-based filtering? What can we do with a repo with some arbitrary
> blobs missing? Another form of this is narrow clone, where we cut by
> paths, not by blob size. Narrow clone sounds more useful to me because
> it's easier to control what we leave out.


In some sense a project with a sub-module is a narrow clone, split at a
'commit' object. There have been comments on the git-user list about the
problem of accidentally adding large files, which then makes the repo's
footprint pretty large, as one use case [Git is consuming very much
RAM]. The bigFileThreshold is one way of spotting such files as
separate objects, and 'trimming' them.

It doesn't feel right to 'track files and directories' as paths for
doing a narrow clone - it'd probably fall into the same trap as tracking
file renames. However, if one tracks trees and blobs (as a list of sha1
values, possibly with their source path) then it should be possible to
allow work on the repo with those empty directories/files in the same
manner as is used for sub-modules, possibly with some form of git-link
file as an alternate marker.

The thought process is to map sub-module working onto the other object
types (blobs and trees). The user would be unable to edit the trimmed
files/directories anyway, so their sha1 values can't change, allowing
them to be included in the next commit in the branch series.
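
A rough sketch of that last point, under heavy simplification: the
tree is modelled as a flat path -> sha1 mapping rather than nested
tree objects, and next_tree and its arguments are hypothetical names.
The only git fact relied on is how a blob's sha1 is computed. Trimmed
entries are carried into the next tree by their recorded sha1 without
ever reading their content.

```python
import hashlib
from typing import Dict


def blob_sha1(content: bytes) -> str:
    """Object name git gives a blob: SHA-1 over the header
    'blob <size>' followed by a NUL byte, then the content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def next_tree(present: Dict[str, bytes],
              trimmed: Dict[str, str]) -> Dict[str, str]:
    """Map each path to a sha1 for the next commit.

    `present` holds paths whose content is available locally; `trimmed`
    holds paths whose content is missing but whose sha1 was recorded.
    """
    overlap = present.keys() & trimmed.keys()
    if overlap:
        raise ValueError(f"paths both present and trimmed: {sorted(overlap)}")
    entries = {path: blob_sha1(data) for path, data in present.items()}
    entries.update(trimmed)  # carried over unchanged, content never read
    return entries
```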


Philip


> --
> Duy




Re: [PATCH v2 00/16] First class shallow clone

2013-07-22 Thread Duy Nguyen
On Tue, Jul 23, 2013 at 11:08 AM, Junio C Hamano wrote:
> Duy Nguyen writes:
>
>> This one, on the other hand, changes the "shape" of the repo (now with
>> holes) and might need to go through the same process we do with this
>> series. Maybe we should prepare for it now. Do you have a use case for
>> size-based filtering? What can we do with a repo with some arbitrary
>> blobs missing? Another form of this is narrow clone, where we cut by
>> paths, not by blob size. Narrow clone sounds more useful to me because
>> it's easier to control what we leave out.
>
> I was about to say "Hear, hear", but then stopped with a question to
> myself: why are these "some people do not want them" paths in the
> same repository in the first place?

I think there are situations where splitting repos is not the best
choice, but I can't think of any. There is one case, though, where such
"some people" exist: when they migrate from another version control
system to git and do not want to change the directory layout (because
it used to work ok, because of the cost of updating the toolchain...)
--
Duy


Re: [PATCH v2 00/16] First class shallow clone

2013-07-22 Thread Junio C Hamano
Duy Nguyen writes:

> This one, on the other hand, changes the "shape" of the repo (now with
> holes) and might need to go through the same process we do with this
> series. Maybe we should prepare for it now. Do you have a use case for
> size-based filtering? What can we do with a repo with some arbitrary
> blobs missing? Another form of this is narrow clone, where we cut by
> paths, not by blob size. Narrow clone sounds more useful to me because
> it's easier to control what we leave out.

I was about to say "Hear, hear", but then stopped with a question to
myself: why are these "some people do not want them" paths in the
same repository in the first place?


Re: [PATCH v2 00/16] First class shallow clone

2013-07-22 Thread Duy Nguyen
On Tue, Jul 23, 2013 at 6:41 AM, Philip Oakley wrote:
> From: "Nguyễn Thái Ngọc Duy" 
> Subject: [PATCH v2 00/16] First class shallow clone
>
> It's nice to see that shallow can be a first class clone.
>
> Thinking outside the box, does this infrastructure offer the opportunity to
> maybe add a date based depth option that would establish the shallow
> watermark based on date rather than count. (e.g. the "deepen" SP depth could

I've been carefully avoiding the deepen issues because, as you see,
it's complicated. But no, this series does not enable or disable new
deepen mechanisms. They can always be added as protocol extensions.
Still thinking if it's worth exposing a (restricted form of) rev-list
to the protocol..

> have an alternate with a leading 'T' to indicate a time limit rather than
> revision count - I'm expecting such a format would be an error for existing
> servers).
>
> My other thought was this style of cut limit list may also allow a big file
> limit to do a similar process of listing objects (e.g. blobs) that are
> size-shallow in the repo, though it may be a long list on some repos, or with
> a small size limit.

This one, on the other hand, changes the "shape" of the repo (now with
holes) and might need to go through the same process we do with this
series. Maybe we should prepare for it now. Do you have a use case for
size-based filtering? What can we do with a repo with some arbitrary
blobs missing? Another form of this is narrow clone, where we cut by
paths, not by blob size. Narrow clone sounds more useful to me because
it's easier to control what we leave out.
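
To illustrate the difference between the two cuts, here is a toy
comparison over a made-up listing of path -> blob size. Both function
names are hypothetical and this is not how either feature would be
implemented in git; it only shows which objects each policy leaves
out.

```python
from typing import Dict, List, Set

# Toy listing of path -> blob size in bytes; the values are made up.
Listing = Dict[str, int]


def omit_by_size(listing: Listing, limit: int) -> Set[str]:
    """Size-based cut: leave out every blob larger than `limit`."""
    return {path for path, size in listing.items() if size > limit}


def omit_by_path(listing: Listing, keep_prefixes: List[str]) -> Set[str]:
    """Path-based (narrow) cut: leave out everything outside the kept prefixes."""
    return {path for path in listing
            if not any(path.startswith(prefix) for prefix in keep_prefixes)}
```

With the size cut, the holes can land in any directory, depending on
what happens to be large. With the path cut, the omitted set follows
directly from the prefixes the user chose.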
--
Duy


Re: [PATCH v2 00/16] First class shallow clone

2013-07-22 Thread Philip Oakley

From: "Nguyễn Thái Ngọc Duy" 
Subject: [PATCH v2 00/16] First class shallow clone

It's nice to see that shallow can be a first class clone.

Thinking outside the box, does this infrastructure offer the opportunity
to maybe add a date based depth option that would establish the shallow
watermark based on date rather than count? (e.g. the "deepen" SP depth
could have an alternate with a leading 'T' to indicate a time limit
rather than revision count - I'm expecting such a format would be an
error for existing servers).
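
To make the idea concrete, here is a sketch of how a receiver might
tell the two forms apart. The 'T' prefix is only the suggestion made
above, not an existing protocol feature, and parse_deepen is a
hypothetical helper. The timestamp is assumed to be seconds since the
epoch.

```python
from typing import Tuple


def parse_deepen(line: str) -> Tuple[str, int]:
    """Parse 'deepen <n>' or the hypothetical 'deepen T<unix-time>'.

    Returns ("count", n) or ("time", unix_time); raises ValueError otherwise.
    """
    keyword, sep, arg = line.strip().partition(" ")
    if keyword != "deepen" or not sep or not arg:
        raise ValueError(f"not a deepen line: {line!r}")
    if arg.startswith("T"):
        kind, digits = "time", arg[1:]
    else:
        kind, digits = "count", arg
    # Accept plain ASCII digits only.
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"bad deepen argument: {arg!r}")
    return kind, int(digits)
```

A parser that accepts only digits after "deepen" would reject the 'T'
form, which is consistent with the expectation that existing servers
would treat it as an error.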


My other thought was that this style of cut limit list may also allow a
big file limit to do a similar process of listing objects (e.g. blobs)
that are size-shallow in the repo, though it may be a long list on some
repos, or with a small size limit.


Philip





[PATCH v2 00/16] First class shallow clone

2013-07-20 Thread Nguyễn Thái Ngọc Duy
v2 includes:

 - address Junio's comments, especially the one about code that may
   lead to incomplete commit islands.
 - fix send-pack setting up a temporary shallow file but never passing
   it to index-pack/unpack-objects (also fix the tests to catch this)
 - support smart http
 - add core.noshallow for repos that wish to be always complete
 - fix locally cloning a shallow repository
 - make upload-pack pass --shallow-file to pack-objects in order to
   remove duplicate object counting code just for the shallow case.

Nguyễn Thái Ngọc Duy (16):
  send-pack: forbid pushing from a shallow repository
  {receive,upload}-pack: advertise shallow graft information
  connect.c: teach get_remote_heads to parse "shallow" lines
  Move setup_alternate_shallow and write_shallow_commits to shallow.c
  fetch-pack: support fetching from a shallow repository
  {send,receive}-pack: support pushing from a shallow clone
  send-pack: support pushing to a shallow clone
  upload-pack: let pack-objects do the object counting in shallow case
  pack-protocol.txt: a bit about smart http
  Add document for command arguments for supporting smart http
  {fetch,upload}-pack: support fetching from a shallow clone via smart http
  receive-pack: support pushing to a shallow clone via http
  send-pack: support pushing from a shallow clone via http
  git-clone.txt: remove shallow clone limitations
  config: add core.noshallow to prevent turning a repo into a shallow one
  clone: use git protocol for cloning shallow repo locally

 Documentation/config.txt                  |   5 +
 Documentation/git-clone.txt               |   7 +-
 Documentation/git-fetch-pack.txt          |  11 +-
 Documentation/git-receive-pack.txt        |  16 ++-
 Documentation/git-send-pack.txt           |   9 +-
 Documentation/git-upload-pack.txt         |  13 ++-
 Documentation/technical/pack-protocol.txt |  76 -
 builtin/clone.c                           |  14 ++-
 builtin/fetch-pack.c                      |   6 +-
 builtin/receive-pack.c                    |  76 +++--
 builtin/send-pack.c                       |   7 +-
 cache.h                                   |   4 +-
 commit.h                                  |  27 +
 config.c                                  |   5 +
 connect.c                                 |  12 +-
 environment.c                             |   1 +
 fetch-pack.c                              |  90 ++-
 fetch-pack.h                              |   1 +
 remote-curl.c                             |   4 +-
 send-pack.c                               |  57 +-
 send-pack.h                               |   4 +-
 shallow.c                                 | 147 +
 t/t5530-upload-pack-error.sh              |   3 -
 t/t5536-fetch-shallow.sh (new +x)         | 141
 t/t5537-push-shallow.sh (new +x)          | 176 ++
 t/t5601-clone.sh                          |   7 ++
 transport.c                               |  14 ++-
 upload-pack.c                             | 132 ++
 28 files changed, 858 insertions(+), 207 deletions(-)
 create mode 100755 t/t5536-fetch-shallow.sh
 create mode 100755 t/t5537-push-shallow.sh

-- 
1.8.2.83.gc99314b
