RFE: git-patch-id should handle patches without leading "diff"

2018-12-07 Thread Konstantin Ryabitsev
Hi, all:

Every now and again I come across a patch sent to LKML -- usually
produced by quilt -- without a leading "diff a/foo b/foo" line. E.g.:

https://lore.kernel.org/lkml/20181125185004.151077...@linutronix.de/

I am guessing quilt does not bother including the leading "diff a/foo
b/foo" line because it is redundant with the next two lines; however,
the result is still a valid patch that git-am recognizes.

If I pipe that patch through git-patch-id, it produces nothing, but if
I add the leading "diff" line, like so:

diff a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c

then it properly returns "fb3ae17451bc619e3d7f0dd647dfba2b9ce8992e".

Can we please teach git-patch-id to work without the leading "diff
a/foo b/foo" line, the same way git-am does?
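
In the meantime, a possible workaround is to synthesize the missing
line from the "---" header before piping. A rough sketch, assuming
standard "--- a/file" headers (the awk bit is purely illustrative, not
something git ships):

  awk '/^--- a\//{ p = substr($2, 3); print "diff a/" p " b/" p }
       { print }' message.patch | git patch-id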

Best,
-K


Re: insteadOf and git-request-pull output

2018-11-15 Thread Konstantin Ryabitsev
On Thu, Nov 15, 2018 at 07:54:32PM +0100, Ævar Arnfjörð Bjarmason wrote:
> > I think that if we use the "principle of least surprise," insteadOf
> > rules shouldn't be applied for git-request-pull URLs.
> 
> I haven't used request-pull so I don't have much of an opinion on this,
> but do you think the same applies to 'git remote get-url <name>'?
> 
> I.e. should it also show the original unmunged URL, or the munged one as
> it does now?

I don't know; maybe both? Unlike git-request-pull, this does not expose
the insteadOf URL to anyone other than the person who set it up, so
even if it does return the munged URL, that wouldn't be unexpected.
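
For what it's worth, both are already visible today if you know where
to look -- at least as I understand the current behavior:

  $ git config remote.origin.url    # raw URL as written in .git/config
  $ git ls-remote --get-url origin  # URL after insteadOf rewriting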

-K


insteadOf and git-request-pull output

2018-11-15 Thread Konstantin Ryabitsev
Hi, all:

It looks like setting url.insteadOf rules alters the output of
git-request-pull. I'm not sure that's the intended use of insteadOf,
which is supposed to rewrite URLs for local use, not to expose them
publicly (but I may be wrong). E.g.:

$ git request-pull HEAD^ git://foo.example.com/example | grep example
  git://foo.example.com/example

$ git config url.ssh://bar.insteadOf git://foo

$ git request-pull HEAD^ git://foo.example.com/example | grep example
  ssh://bar.example.com/example

I think that if we use the "principle of least surprise," insteadOf
rules shouldn't be applied for git-request-pull URLs.

Best,
-K


Re: Generate more revenue from Git

2018-05-17 Thread Konstantin Ryabitsev

Michal:

This is strictly a development list. If you would like to discuss any
and all monetization features, please feel free to reach out to me via
email.

Regards,
-K

On Thu, May 17, 2018 at 04:45:18PM +0300, Michal Sapozhnikov wrote:
> Hi,
>
> I would like to schedule a quick call this week.
>
> What's the best way to schedule a 15 minute call?
>
> Thanks,
> --
> Michal Sapozhnikov | Business Manager, Luminati SDK | +972-50-2826778 | Skype:
> live:michals_43
> http://luminati.io/sdk
>
> On 10-May-18 14:04, 7d (by eremind) wrote:
>
>> Hi,
>>
>> I am writing with the hope of talking to the appropriate person who
>> handles the app's monetization.
>> If it makes sense to have a call, let me know how your schedule looks.
>>
>> Best Regards,



Re: worktrees vs. alternates

2018-05-16 Thread Konstantin Ryabitsev
On 05/16/18 15:37, Jeff King wrote:
> Yes, that's pretty close to what we do at GitHub. Before doing any
> repacking in the mother repo, we actually do the equivalent of:
> 
>   git fetch --prune ../$id.git +refs/*:refs/remotes/$id/*
>   git repack -Adl
> 
> from each child to pick up any new objects to de-duplicate (our "mother"
> repos are not real repos at all, but just big shared-object stores).

Yes, I keep thinking of doing the same, too -- instead of using
torvalds/linux.git for alternates, have an internal repo where objects
from all forks are stored. This conversation may finally give me the
shove I've been needing to poke at this. :)
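
Roughly what I have in mind, as a sketch (the paths are invented for
illustration):

  # a bare repo that exists only to hold objects from all the forks
  git init --bare /srv/objstore/kernel-all.git
  # each fork then borrows from it via alternates
  echo /srv/objstore/kernel-all.git/objects \
      > /srv/git/some-fork.git/objects/info/alternates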

Is your delta-islands patch heading into upstream, or is that something
that's going to remain external?

> I say "equivalent" because those commands can actually be a bit slow. So
> we do some hacky tricks like directly moving objects in the filesystem.
> 
> In theory the fetch means that it's safe to actually prune in the mother
> repo, but in practice there are still races. They don't come up often,
> but if you have enough repositories, they do eventually. :)

I feel like a whitepaper on "how we deal with bajillions of forks at
GitHub" would be nice. :) I was previously told that such a paper is
unlikely to be written because of all the custom-built things at GH,
but I would be very happy if that turned out not to be the case.

Best,
-- 
Konstantin Ryabitsev
Director, IT Infrastructure Security
The Linux Foundation



Re: worktrees vs. alternates

2018-05-16 Thread Konstantin Ryabitsev
On 05/16/18 15:23, Jeff King wrote:
> I implemented "repack -k", which keeps all objects and just rolls them
> into the new pack (along with any currently-loose unreachable objects).
> Aside from corner cases (e.g., where somebody accidentally added a 20GB
> file to an otherwise 100MB-repo and then rolled it back), it usually
> doesn't significantly affect the repository size.

Hmm... I should read manpages more often! :)

So, do you suggest that this is a better approach:

- mother repos: "git repack -adk"
- child repos: "git repack -Adl" (followed by prune)

Currently, we do "-Adl" regardless, but we already track whether a repo
is used as an alternate anywhere (so we don't prune it), and we can use
different flags if that improves performance.
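
In other words, something like this (a sketch; the prune expiry is an
arbitrary example):

  git -C mother.git repack -a -d -k   # keep all objects in the new pack
  git -C child.git repack -A -d -l    # local objects only
  git -C child.git prune --expire=2.weeks.ago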

Best,
-- 
Konstantin Ryabitsev
Director, IT Infrastructure Security
The Linux Foundation



Re: worktrees vs. alternates

2018-05-16 Thread Konstantin Ryabitsev
On 05/16/18 15:03, Martin Fick wrote:
>> I'm undecided about that. On the one hand this does create
>> lots of small files and inevitably causes (some)
>> performance degradation. On the other hand, I don't want
>> to keep useless objects in the pack, because that would
>> also cause performance degradation for people cloning the
>> "mother repo." If my assumptions on any of that are
>> incorrect, I'm happy to learn more.
> My suggestion is to use science, not logic or hearsay. :) 
> i.e. test it!

I think the answer will be "it depends." In many of our cases, the repos
that need those loose objects are rarely accessed -- usually because
they are forks with older data (which is why they need objects that are
no longer used by the mother repo). Therefore, the performance impact of
occasionally touching a handful of loose objects will be fairly
negligible. This is especially true on non-spinning media, where seek
times are low anyway. Having slimmer packs for the mother repo would be
more beneficial in this case.

On the other hand, if the "child repo" is frequently used, then the
impact of needing a bunch of loose objects would be greater. For the
sake of simplicity, I think I'll leave things as they are -- it's
cheaper to fix this by reducing seek times than by applying complicated
logic that tries to optimize on a per-repo basis.

Best,
-- 
Konstantin Ryabitsev
Director, IT Infrastructure Security
The Linux Foundation



Re: worktrees vs. alternates

2018-05-16 Thread Konstantin Ryabitsev
On 05/16/18 14:26, Martin Fick wrote:
> If you are going to keep the unreferenced objects around 
> forever, it might be better to keep them around in packed 
> form?

I'm undecided about that. On the one hand this does create lots of small
files and inevitably causes (some) performance degradation. On the other
hand, I don't want to keep useless objects in the pack, because that
would also cause performance degradation for people cloning the "mother
repo." If my assumptions on any of that are incorrect, I'm happy to
learn more.

Best,
-- 
Konstantin Ryabitsev
Director, IT Infrastructure Security
The Linux Foundation



Re: worktrees vs. alternates

2018-05-16 Thread Konstantin Ryabitsev
On 05/16/18 14:02, Ævar Arnfjörð Bjarmason wrote:
> 
> On Wed, May 16 2018, Konstantin Ryabitsev wrote:
> 
>> Maybe git-repack can be told to only borrow parent objects if they are
>> in packs. Anything not in packs should be hardlinked into the child
>> repo. That's my wishful thinking for the day. :)
> 
> Can you elaborate on how this would help?
> 
> We're just going to create loose objects on interactive "git commit",
> presumably you're not adding someone's working copy as the alternate.

The loose objects I'm thinking of are those that are generated when we
do "git repack -Ad" -- this takes all unreachable objects and loosens
them (see man git-repack for more info). Normally, these would be pruned
after a certain period, but we're deliberately keeping them around
forever, just in case another repo relies on them via alternates. I want
those repos to "claim" these loose objects via hardlinks, so that we
can run git-prune on the mother repo instead of dragging all the
unreachable objects along forever.
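
As a purely hypothetical sketch of that "claim" step (this is not a
git feature today; the paths are illustrative):

  # hardlink the mother's loose objects into the fork, after which
  # the mother repo could be pruned safely
  cd /srv/git/mother.git/objects
  find [0-9a-f][0-9a-f] -type f | while read obj; do
      mkdir -p "/srv/git/fork.git/objects/$(dirname "$obj")"
      ln -f "$obj" "/srv/git/fork.git/objects/$obj"
  done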

> Otherwise if it's just being pushed to all those pushes are going to be
> in packs, and the packs may contain e.g. pushes for the "pu" branch or
> whatever, which are objects that'll go away.

There are lots of cases where unreachable objects in one repo would
never become unreachable in another -- for example, if the author had
stopped updating it.

Hope this helps.

Best,
-- 
Konstantin Ryabitsev
Director, IT Infrastructure Security
The Linux Foundation



Re: worktrees vs. alternates

2018-05-16 Thread Konstantin Ryabitsev
On 05/16/18 13:14, Martin Fick wrote:
> On Wednesday, May 16, 2018 10:58:19 AM Konstantin Ryabitsev 
> wrote:
>>
>> 1. Find every repo mentioning the parent repository in
>> their alternates 2. Repack them without the -l switch
>> (which copies all the borrowed objects into those repos)
>> 3. Once all child repos have been repacked this way, prune
>> the parent repo (it's safe now)
> 
> This is probably only true if the repos are in read-only 
> mode?  I suspect this is still racy on a busy server with no 
> downtime.

We don't actually do this anywhere. :) It's a feature I keep hoping to
add one day to grokmirror, but keep putting off because of various
considerations. As you can imagine, if we have 300 forks of linux.git
all using torvalds/linux.git as their alternates, then repacking them
all without -l would balloon our disk usage 300-fold. At this time it's
just cheaper to keep a bunch of loose objects around forever at the cost
of decreased performance.

Maybe git-repack can be told to only borrow parent objects if they are
in packs. Anything not in packs should be hardlinked into the child
repo. That's my wishful thinking for the day. :)

Best,
-- 
Konstantin Ryabitsev
Director, IT Infrastructure Security
The Linux Foundation



Re: worktrees vs. alternates

2018-05-16 Thread Konstantin Ryabitsev

On Wed, May 16, 2018 at 05:34:34PM +0200, Ævar Arnfjörð Bjarmason wrote:

> I may have missed some edge case, but I believe this entire workaround
> isn't needed if you guarantee that the parent repo doesn't contain any
> objects that will get un-referenced.


You can't guarantee that, because the parent repo can have its history
rewritten either via a forced push, or via a rebase. Obviously, this
won't happen in something like torvalds/linux.git, which is why it's
pretty safe to alternate off of that repo for us, but codeaurora.org
repos aren't always strictly-ff (e.g. because they may rebase themselves
based on what is in upstream AOSP repos) -- so objects in them may
become unreferenced and pruned away, corrupting any repos using them for
alternates.


>> I'm very interested in GVFS, because it would certainly make my life
>> easier maintaining source.codeaurora.org, which is many thousands of
>> repos that are mostly forks of the same stuff. However, GVFS appears to
>> only exist for Windows (hint-hint, nudge-nudge). :)


> This should make you happy:
>
> https://arstechnica.com/gadgets/2017/11/microsoft-and-github-team-up-to-take-git-virtual-file-system-to-macos-linux/
>
> But I don't know what the current status is or where it can be followed.


Very good to know, thanks!

-K


Re: worktrees vs. alternates

2018-05-16 Thread Konstantin Ryabitsev
On 05/16/18 09:02, Derrick Stolee wrote:
> This is the biggest difference. You cannot have the same ref checked out
> in multiple worktrees, as they both may edit that ref. The alternates
> allow you to share data in a "read only" fashion. If you have one repo
> that is the "base" repo that manages that objects dir, then that is
> probably a good way to reduce the duplication. I'm not familiar with
> what happens when a "child" repo does 'git gc' or 'git repack', will it
> delete the local objects that is sees exist in the alternate?

The parent repo is not keeping track of any other repositories that may
be using it for alternates, which is why you basically:

1. never run auto-gc in the parent repo
2. repack it manually using -Ad to keep loose objects that other repos
may be borrowing (but we don't know if they are)
3. never prune the parent repo, because this may delete objects other
repos are borrowing

Very infrequently you may consider this extra set of maintenance steps:

1. Find every repo mentioning the parent repository in their alternates
2. Repack them without the -l switch (which copies all the borrowed
objects into those repos)
3. Once all child repos have been repacked this way, prune the parent
repo (it's safe now)
4. Repack child repos again, this time with the -l flag, to get your
savings back.
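
Spelled out as commands, the cycle looks roughly like this (a sketch
that assumes all the forks live under one directory):

  for child in /srv/git/forks/*.git; do
      git -C "$child" repack -A -d      # no -l: copy borrowed objects in
  done
  git -C /srv/git/mother.git prune      # safe: children are self-contained
  for child in /srv/git/forks/*.git; do
      git -C "$child" repack -A -d -l   # re-deduplicate against the mother
  done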

I would heartily love a way to teach git-repack to recognize when an
object it's borrowing from the parent repo is in danger of being pruned.
The cheapest way of doing this would probably be to hardlink loose
objects into its own objects directory and only consider "safe" objects
those that are part of the parent repository's pack. This should make
alternates a lot safer, just in case git-prune happens to run by accident.

> GVFS uses alternates in this same way: we create a drive-wide "shared
> object cache" that GVFS manages. We put our prefetch packs filled with
> commits and trees in there, and any loose objects that are downloaded
> via the object virtualization are placed as loose objects in the
> alternate. We also store the multi-pack-index and commit-graph in that
> alternate. This means that the only objects in each src dir are those
> created by the developer doing their normal work.

I'm very interested in GVFS, because it would certainly make my life
easier maintaining source.codeaurora.org, which is many thousands of
repos that are mostly forks of the same stuff. However, GVFS appears to
only exist for Windows (hint-hint, nudge-nudge). :)

Best,
-- 
Konstantin Ryabitsev
Director, IT Infrastructure Security
The Linux Foundation



Re: main url for linking to git source?

2018-05-07 Thread Konstantin Ryabitsev
On Tue, May 08, 2018 at 01:51:30AM +, brian m. carlson wrote:
> On Mon, May 07, 2018 at 11:15:46AM -0700, Stefan Beller wrote:
> > There I would try to mirror Junios list of "public repositories"
> > https://git-blame.blogspot.com/p/git-public-repositories.html
> > without officially endorsing one over another.
> 
> I think I would also prefer a list of available repositories over a
> hard-coded choice.  It may be that some places (say, Australia) have
> better bandwidth to one over the other, and users will be able to have a
> better experience with certain mirrors.
> 
> While I'm sympathetic to the idea of referring to kernel.org because
> it's open-source and non-profit, users outside of North America are
> likely to have a less stellar experience with its mirrors, since they're
> all in North America.

I'm a bit worried that I'll come across as some kind of annoying pedant,
but git.kernel.org is actually 6 different systems available in the US,
Europe, Hong Kong, and Australia. :)

We use GeoDNS to map users to the nearest server (I know, GeoDNS is not
the best, but it's what we have for free).

-K



Re: main url for linking to git source?

2018-05-07 Thread Konstantin Ryabitsev
On 05/07/18 07:38, Johannes Schindelin wrote:
>> The git-scm.com site currently links to https://github.com/git/git for
>> the (non-tarball) source code. Somebody raised the question[1] of
>> whether it should point to kernel.org instead.
>>
>> Do people find one interface more or less pleasing than the other? Do we
>> want to prefer kernel.org as more "official" or less commercial?
> 
> I don't really care about "official" vs "commercial", as kernel.org is
> also run by a business, so it is all "commercial" to me.

Kernel.org is a registered US non-profit organization, managed by a
non-profit industry consortium (The Linux Foundation). The entire stack
behind kernel.org is free software, excepting any firmware blobs on the
physical hardware.

I'm not trying to influence anyone's opinion of where the links should
be pointing at, but it's important to point out that kernel.org and
GitHub serve different purposes:

- kernel.org provides free-as-in-liberty archive hosting on a platform
that is not locked into any vendor.

- github.com provides an integrated development infrastructure that is
fully closed-source, excepting the protocols.

Best,
-- 
Konstantin Ryabitsev
Director, IT Infrastructure Security
The Linux Foundation



Re: Is offloading to GPU a worthwhile feature?

2018-04-09 Thread Konstantin Ryabitsev
On 04/08/18 09:59, Jakub Narebski wrote:
>> This is an entirely idle pondering kind of question, but I wanted to
>> ask. I recently discovered that some edge providers are starting to
>> offer systems with GPU cards in them -- primarily for clients that need
>> to provide streaming video content, I guess. As someone who needs to run
>> a distributed network of edge nodes for a fairly popular git server, I
>> wondered if git could at all benefit from utilizing a GPU card for
>> something like delta calculations or compression offload, or if benefits
>> would be negligible.
> 
> The problem is that you need to transfer the data from the main memory
> (host memory) geared towards low-latency thanks to cache hierarchy, to
> the GPU memory (device memory) geared towards bandwidth and parallel
> access, and back again.  So to make sense the time for copying data plus
> the time to perform calculations on GPU (and not all kinds of
> computations can be sped up on GPU -- you need a fine-grained, massively
> data-parallel task) must be less than the time to perform calculations on
> the CPU (with multi-threading).

Would something like this be well-suited for tasks like routine fsck,
repacking, and bitmap generation? Those are the kinds of workloads I was
imagining it would fit best.

> Also you would need to keep non-GPU and GPGPU code in sync.  Some parts
> of code do not change much; and there are also solutions to generate dual
> code from one source.
> 
> Still, it might be a good idea,

I'm still totally the wrong person to be implementing this, but I do
have access to Packet.net's edge systems, which carry powerful GPUs for
projects that need them for video streaming services. It seems a shame
to have them sitting idle if I could offload some of the RAM- and
CPU-hungry tasks, like repacking, to run there.

Best,
-- 
Konstantin Ryabitsev
Director, IT Infrastructure Security
The Linux Foundation



Re: The most efficient way to test if repositories share the same objects

2018-03-23 Thread Konstantin Ryabitsev
On 03/22/18 17:44, Junio C Hamano wrote:
> Wouldn't it be more efficient to avoid doing so one-by-one?  
> That is, wouldn't
> 
>   rev-list --max-parents=0 --all
> 
> be a bit faster than
> 
>   for-each-ref |
>   while read object type refname
>   do
>   rev-list --max-parents=0 $refname
>   done
> 
> I wonder?

Yeah, you're right -- I forgot that we can pass --all. The check takes
30 seconds, which is a lot better than 12 hours. :) It's a bit heavy
still, but msm kernel repos are one of the heaviest outliers, so let me
try to run with this.
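
For the archives, the shape of that check is something like this
(a sketch):

  git -C repoA.git rev-list --max-parents=0 --all | sort -u > /tmp/rootsA
  git -C repoB.git rev-list --max-parents=0 --all | sort -u > /tmp/rootsB
  comm -12 /tmp/rootsA /tmp/rootsB    # any output means shared roots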

Thanks for the suggestion!

Best,
-- 
Konstantin Ryabitsev
Director, IT Infrastructure Security
The Linux Foundation



Re: The most efficient way to test if repositories share the same objects

2018-03-22 Thread Konstantin Ryabitsev
On 03/22/18 15:35, Junio C Hamano wrote:
> I am not sure how Konstantin defines "the most efficient", but if it
> is "with the smallest number of bits exchanged between the
> repositories", then the answer would probably be to find the root
> commit(s) in each repository and if they share any common root(s).
> If there isn't then there is no hope to share objects between them,
> of course.

Hmm... so, this is a cool idea that I'd like to use, but there are two
annoying gotchas:

1. I cannot assume that refs/heads/master is meaningful -- my problem is
actually with something like
https://source.codeaurora.org/quic/la/kernel/msm-3.18 -- you will find
that master is actually unborn and there are 7700 other heads (don't get
me started on that unless you're buying me a lot of drinks).

2. Even if there is a HEAD I know I can use, it's pretty slow on large
repos (e.g. linux.git):

$ time git rev-list --max-parents=0 HEAD
a101ad945113be3d7f283a181810d76897f0a0d6
cd26f1bd6bf3c73cc5afe848677b430ab342a909
be0e5c097fc206b863ce9fe6b3cfd6974b0110f4
1da177e4c3f41524e886b7f1b8a0c1fc7321cac2

real    0m6.311s
user    0m6.153s
sys     0m0.110s

If I try to do this for each of the 7700 heads, this will take roughly
12 hours.

My current strategy has been pretty much:

git -C msm-3.10.git show-ref --tags -s | sort -u > /tmp/refs1
git -C msm-3.18.git show-ref --tags -s | sort -u > /tmp/refs2

and then checking if an intersection of these matches at least half of
refs in either repo:


#!/usr/bin/env python
import numpy

refs1 = numpy.array(open('/tmp/refs1').readlines())
refs2 = numpy.array(open('/tmp/refs2').readlines())

in_common = len(numpy.intersect1d(refs1, refs2))
if in_common > len(refs1)/2 or in_common > len(refs2)/2:
    print('Lots of shared refs')
else:
    print('None or too few shared refs')


This works well enough, at least for repos with lots of shared tags,
but it will miss potentially large repos that have only heads, which
may be pointing at commits that aren't necessarily the same between the
two repos.
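
(Incidentally, since both files are already sorted, the same
intersection test can be done without numpy:

  comm -12 /tmp/refs1 /tmp/refs2 | wc -l

though the threshold logic would still need a little arithmetic around
it.)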

Thanks for your help!

Best,
-- 
Konstantin Ryabitsev
Director, IT Infrastructure Security
The Linux Foundation



The most efficient way to test if repositories share the same objects

2018-03-22 Thread Konstantin Ryabitsev
Hi, all:

What is the most efficient way to test if repoA and repoB share common
commits? My goal is to automatically figure out if repoB can benefit
from setting alternates to repoA and repacking. I currently do it by
comparing the output of "show-ref --tags -s", but that does not work for
repos without tags.

Best,
-- 
Konstantin Ryabitsev



Is offloading to GPU a worthwhile feature?

2018-02-27 Thread Konstantin Ryabitsev
Hi, all:

This is an entirely idle pondering kind of question, but I wanted to
ask. I recently discovered that some edge providers are starting to
offer systems with GPU cards in them -- primarily for clients that need
to provide streaming video content, I guess. As someone who needs to run
a distributed network of edge nodes for a fairly popular git server, I
wondered if git could at all benefit from utilizing a GPU card for
something like delta calculations or compression offload, or if benefits
would be negligible.

I realize this would be silly amounts of work. But, if it's worth it,
perhaps we can benefit from all the GPU computation libs written for
cryptocoin mining and use them for something good. :)

Best,
-- 
Konstantin Ryabitsev



Re: Repacking a repository uses up all available disk space

2016-06-12 Thread Konstantin Ryabitsev
On Sun, Jun 12, 2016 at 05:38:04PM -0400, Jeff King wrote:
> > - When attempting to repack, creates millions of files and eventually
> >   eats up all available disk space
> 
> That means these objects fall into the unreachable category. Git will
> prune unreachable loose objects after a grace period based on the
> filesystem mtime of the objects; the default is 2 weeks.
> 
> For unreachable packed objects, their mtime is jumbled in with the rest
> of the objects in the packfile.  So Git's strategy is to "eject" such
> objects from the packfiles into individual loose objects, and let them
> "age out" of the grace period individually.
> 
> Generally this works just fine, but there are corner cases where you
> might have a very large number of such objects, and the loose storage is
> much more expensive than the packed (e.g., because each object is stored
> individually, not as a delta).
> 
> It sounds like this is the case you're running into.
> 
> The solution is to lower the grace period time, with something like:
> 
>   git gc --prune=5.minutes.ago
> 
> or even:
> 
>   git gc --prune=now

You are correct: this solves the problem. However, I'm curious. The usual
maintenance for these repositories is a regular run of:

- git fsck --full
- git repack -Adl -b --pack-kept-objects
- git pack-refs --all
- git prune

The reason it's split into repack + prune instead of just gc is because
we use alternates to save on disk space and try not to prune repos that
are used as alternates by other repos in order to avoid potential
corruption.

Is there something I'm not doing that I should be doing in order to
avoid the same problem?
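
In other words, would it be enough to shorten the grace period in that
last step, along these lines (only for repos that nothing else borrows
from)?

  git prune --expire=5.minutes.ago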

Thanks for your help.

Regards,
-- 
Konstantin Ryabitsev
Linux Foundation Collab Projects
Montréal, Québec


Repacking a repository uses up all available disk space

2016-06-12 Thread Konstantin Ryabitsev
Hello:

I have a problematic repository that:

- Takes up 9GB on disk
- Passes 'git fsck --full' with no errors
- When cloned with --mirror, takes up 38M on the target system
- When attempting to repack, creates millions of files and eventually
  eats up all available disk space

Repacking the result of 'git clone --mirror' shows no problem, so it's
got to be something really weird with that particular instance of the
repository.

If anyone is interested in poking at this particular problem to figure
out what causes the repack process to eat up all available disk space,
you can find the tarball of the problematic repository here:

http://mricon.com/misc/src.git.tar.xz (warning: 6.6GB)

You can clone the non-problematic version of this repository from
git://codeaurora.org/quic/chrome4sdp/breakpad/breakpad/src.git

Best,
-- 
Konstantin Ryabitsev
Linux Foundation Collab Projects
Montréal, Québec


Re: Resumable git clone?

2016-03-02 Thread Konstantin Ryabitsev
On Wed, Mar 02, 2016 at 12:41:20AM -0800, Junio C Hamano wrote:
> Josh Triplett <j...@joshtriplett.org> writes:
> 
> > If you clone a repository, and the connection drops, the next attempt
> > will have to start from scratch.  This can add significant time and
> > expense if you're on a low-bandwidth or metered connection trying to
> > clone something like Linux.
> 
> For this particular issue, your friendly k.org administrator already
> has a solution.  Torvalds/linux.git is made into a bundle weekly
> with
> 
> $ git bundle create clone.bundle --all
> 
> and the result placed on k.org CDN.  So low-bandwidth cloners can
> grab it over resumable http, clone from the bundle, and then fill
> the most recent part by fetching from k.org already.

I finally got around to documenting this here:
https://kernel.org/cloning-linux-from-a-bundle.html
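
The dance itself is short. Roughly, per that page (the exact CDN path
is elided here):

  wget -c https://.../clone.bundle    # -c makes the download resumable
  git clone clone.bundle linux
  cd linux
  git remote set-url origin \
      git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
  git fetch origin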

> The tooling to allow this kind of "bundle" (and possibly other forms
> of "CDN offload" material) transparently used by "git clone" was the
> proposal by Shawn Pearce mentioned elsewhere in this thread.

To reiterate, I believe that would be an awesome feature.

Regards,
-- 
Konstantin Ryabitsev
Linux Foundation Collab Projects
Montréal, Québec


Re: [PATCH v2 3/3] http-backend: spool ref negotiation requests to buffer

2015-05-25 Thread Konstantin Ryabitsev
On 20 May 2015 at 03:37, Jeff King <p...@peff.net> wrote:
> +   /* partial read from read_in_full means we hit EOF */
> +   len += cnt;
> +   if (len < alloc) {
> +           *out = buf;
> +           warning("request size was %lu", (unsigned long)len);
> +           return len;
> +   }

Jeff:

This patch appears to work well -- the only complaint I have is that I
now have "warning: request size was NNN" all over my error logs. :) Is
it supposed to convey an actual warning message, or is it merely a
debug statement?

Best,
-- 
Konstantin Ryabitsev
Sr. Systems Administrator
Linux Foundation Collab Projects
541-224-6067
Montréal, Québec


Re: Sources for 3.18-rc1 not uploaded

2014-10-20 Thread Konstantin Ryabitsev
On 20/10/14 02:28 PM, Junio C Hamano wrote:
> I have to wonder why 10f343ea (archive: honor tar.umask even for pax
> headers, 2014-08-03) is a problem but an earlier change v1.8.1.1~8^2
> (archive-tar: split long paths more carefully, 2013-01-05), which
> also should have broken bit-for-bit compatibility, went unnoticed,
> though.  What I am getting at is that correcting past mistakes in
> the output should not be forbidden unconditionally with a complaint
> like this.

I think Greg actually ran into that one, and uses a separate 1.7 git
tree for this reason.

I can update our servers to git 2.1 (which most of them already have),
which should help with previous incompatibilities -- but not the future
ones, obviously. :)

-K


Re: Sources for 3.18-rc1 not uploaded

2014-10-20 Thread Konstantin Ryabitsev
On 20/10/14 06:28 PM, brian m. carlson wrote:
>> Junio, quite frankly, I don't think that that fix was a good idea. I'd
>> suggest having a *separate* umask for the pax headers, so that we do
>> not break this long-lasting stability of git archive output in ways
>> that are unfixable and not compatible. kernel.org has relied (for a
>> *long* time) on being able to just upload the signature of the
>> resulting tar-file, because both sides can generate the same tar-file
>> bit-for-bit.
> It sounds like kernel.org has a bug, then.  Perhaps that's the
> appropriate place to fix the issue.

It's not a bug, it's a feature (TM). KUP relies on git-archive's ability
to create identical tar archives across platforms and versions. The
benefit is that Linus or Greg can create a detached PGP signature
against a tarball created from "git archive [tag]" on their system, and
just tell kup to create the same archive remotely, thus saving them the
trouble of uploading 80MB each time they cut a release.
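
The local side of that flow looks roughly like this (the tag name is
just an example; kup's exact invocation and file naming differ):

  git archive --prefix=linux-3.18-rc1/ v3.18-rc1 > linux-3.18-rc1.tar
  gpg --armor --detach-sign linux-3.18-rc1.tar   # small signature file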

With their frequent travel to places where upload bandwidth is both slow
and unreliable, not having to upload hundreds of MBs each time they cut
a release is very handy and certainly helps keep kernel releases on
schedule.

So, while it's fair to point out that git-archive was never intended to
always create bit-for-bit identical outputs, it would be *very nice* if
this remained in place, as at least one large-ish deployment (us) finds
it really handy.

-K



Re: git archive --format=tar output changed from 1.8.1 to 1.8.2.1

2013-01-31 Thread Konstantin Ryabitsev
On 31/01/13 12:41 PM, Greg KH wrote:
> Ugh, uploading a 431MB file, over a flaky wireless connection (I end up
> doing lots of kernel releases while traveling), would be a horrible
> change.  I'd rather just keep using the same older version of git that
> kernel.org is running instead.

Well, we do accept compressed archives, so you would be uploading about
80MB instead of 431MB, but that would still be a problem for anyone
releasing large tarballs over unreliable connections. I know you
routinely do 2-3 releases at once, so that would still mean uploading
120-180MB.

I don't have immediate statistics on how many people release using kup
--tar, but I know that at least you and Linus rely on that exclusively.


Regards,
-- 
Konstantin Ryabitsev
Systems Administrator
Linux Foundation, kernel.org
Montréal, Québec


