[RFC] Proposal for a new config-based git signing interface
Hello,

This is a follow-up on my previous emails related to the proposal of a new signing interface:

https://public-inbox.org/git/CACi-FhDeAZecXSM36zroty6kpf2BCWLS=0r+duwub96lqfk...@mail.gmail.com/T/#r43cbf31b86642ab5118e6e7b3d4098bade5f5a0a
https://public-inbox.org/git/Z2XOTcGuVovMKhcdrrO08KWI2I7L9s0CyFITvvj3jkmGTQPB6FkCiyOtTm6GdYWbnf25dsPD8M08kDCuD37EE1B-sxHQ3se9Kn1zVBrCPZw=@pm.me/T/#u
https://public-inbox.org/git/N31G34oKnfr3MVifk42-Kt3YtM_3fHuCp3V1cpGOK5f1jn1vbg1TaSCy9ukI-YD8qRfu4xMcHcPc78xFE0MSwJQWNrSvuQuer9wSNugNRLg=@pm.me/T/#u
https://public-inbox.org/git/8AMhjK19PJ35u3LCR57IvtAzOBN5bKK2vUn0Ns-4mmZzK9U14W5CGW5R8aITNXBm78J4Z7nd09RTVKW2pGaB4PnF7p2PireF_vzRST8DngE=@pm.me/T/#u
https://public-inbox.org/git/0oTOrSdJdIaEfs3NVkfRmLxjYRvUPkucwwaXPuhCjS2QL3ztRJLfIlBkcpjSRiZQaY70SKSkg8_w20rxnuD4Vu3IbRcGOZM-fht8G7ySEHk=@pm.me/T/#u
https://public-inbox.org/git/T4zS1hogOjySpdv7lDjVaZV83KKSeK9fx8m33SIo-e_BH4RtKcm67btmGzTPeflbRnQr7mWjTpObB0hCkX8VkGZElkQbLEgbrETg6Aq4nUg=@pm.me/T/#u
https://public-inbox.org/git/74R10RrvOffzj20d_Owd_1WFMh1bWq8mIhEEBSzbhkHfbvW5BLHZj-L-AgHYnpqkxgZdCfW5b72GoIvKHucQz7tdiGZEzietp0IKpU1_wuI=@pm.me/T/#u

The main feedback we received from the previous RFCs was that the drivers for external signing tools were still written in C and that we should move toward a configuration-based interface. I've been thinking about how to go about it and would love to have your feedback on my proposed approach:

- Implement updated user configuration to define signing tools
- Implement a tool-agnostic signing interface in C code
- Add the possibility to use bash helper scripts to drive additional tools in case the default interface doesn't work as intended
- Allow the same configuration aliases to be passed as command line arguments

You can find below a detailed description of the proposed config and command line options:

https://hackmd.io/ZHsddYXkSmyb6rYajdyGLg
https://hackmd.io/yxS9nfiQSvmRZntcfnHOGQ

The configuration part would look like this:

```
[signing]
	format = openpgp
[signing "openpgp"]
	program = "/usr/bin/gpg"
	keyring = "--keyring pubring.kbx --no-default-keyring"
	identity = "--local-user \"Jane Committer \""
	sign = "--sign --status-fd=2 --detach-sign --ascii"
	verify = "--verify --status-fd=2"
[signing "openpgp.signature"]
	regex = "^-BEGIN PGP SIGNATURE-$[^-]*^-END PGP SIGNATURE-$"
	multiline = true
```

The equivalent command line to do a digitally signed commit looks like:

```
git commit \
	--sign --signing-format=openpgp \
	--signing-openpgp-program="/usr/bin/gpg" \
	--signing-openpgp-keyring="--keyring pubring.kbx --no-default-keyring" \
	--signing-openpgp-identity="--local-user \"Jane Committer \"" \
	--signing-openpgp-sign="--sign --status-fd=2 --detach-sign --ascii"
```

Cheers,
Ibrahim
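To illustrate the helper-script escape hatch mentioned above, here is a minimal sketch of how a driver could assemble a tool invocation from such configuration. The standalone `sign.cfg` file, the `cfg` helper, and printing the command instead of executing it are all illustrative assumptions, not part of the proposal:

```shell
#!/bin/sh
# Sketch of a config-driven signing helper. It reads the proposed
# [signing "<format>"] keys from a standalone file and assembles the
# command line a real driver would run with the payload on stdin.

cat >sign.cfg <<'EOF'
[signing]
	format = openpgp
[signing "openpgp"]
	program = /usr/bin/gpg
	sign = --sign --status-fd=2 --detach-sign --ascii
EOF

cfg() { git config --file sign.cfg "$1"; }

format=$(cfg signing.format)
program=$(cfg "signing.$format.program")
signargs=$(cfg "signing.$format.sign")

# A real helper would "exec" this with the payload on stdin;
# here we just show what would be run.
echo "$program $signargs"
```

In real use the helper would `exec $program $signargs`, reading the to-be-signed payload on stdin and writing the detached signature on stdout, with status messages on fd 2 as the proposed keys suggest.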
Re: [Proposal] git am --check
Duy Nguyen writes:

> On Mon, Jun 3, 2019 at 4:29 PM Christian Couder wrote:
>>
>> On Sun, Jun 2, 2019 at 7:38 PM Drew DeVault wrote:
>> >
>> > This flag would behave similarly to git apply --check, or in other words would exit with a nonzero status if the patch is not applicable, without actually applying the patch otherwise.
>>
>> `git am` uses the same code as `git apply` to apply patches, so there should be no difference between `git am --check` and `git apply --check`.
>
> One difference (that still annoys me) is "git apply" must be run at topdir. "git am" can be run anywhere and it will automatically find topdir.
>
> "git am" can also consume multiple patches, so it's some extra work if we just use "git apply" directly, although I don't think that's a very good argument for "am --check".

Another difference is that "am" has a preprocessing phase, performed by mailsplit, that deals with MIME garbage, which "apply" will totally choke on without even attempting to cope.

I haven't carefully read the "proposal" or any RFC patches yet, but would/should the command make a commit if the patch cleanly applies? I wonder if a "--dry-run" option is more useful (i.e. one that checks and reports with the exit status *if* the command without "--dry-run" would cleanly succeed, but never makes a commit or touches the index or the working tree), given that the motivating use case is a Git-aware MUA that helps the user by saying "if you are busy you could perhaps skip this message, as the patch would not apply to your tree anyway".
Re: [Proposal] git am --check
On Mon, Jun 3, 2019 at 4:29 PM Christian Couder wrote:
>
> On Sun, Jun 2, 2019 at 7:38 PM Drew DeVault wrote:
> >
> > This flag would behave similarly to git apply --check, or in other words would exit with a nonzero status if the patch is not applicable without actually applying the patch otherwise.
>
> `git am` uses the same code as `git apply` to apply patches, so there should be no difference between `git am --check` and `git apply --check`.

One difference (that still annoys me) is "git apply" must be run at topdir. "git am" can be run anywhere and it will automatically find topdir.

"git am" can also consume multiple patches, so it's some extra work if we just use "git apply" directly, although I don't think that's a very good argument for "am --check".
--
Duy
Re: [Proposal] git am --check
On Sun, Jun 2, 2019 at 7:38 PM Drew DeVault wrote:
>
> This flag would behave similarly to git apply --check, or in other words would exit with a nonzero status if the patch is not applicable without actually applying the patch otherwise.

`git am` uses the same code as `git apply` to apply patches, so there should be no difference between `git am --check` and `git apply --check`.

> Rationale: I'm working on an email client which has some git integration, and when you scroll over a patch I want to quickly test its applicability and show an indication of the result.
>
> Thoughts on the approach are welcome; my initial naive patch just tried to add --check to the apply flags but that didn't work as I had hoped. Will take another crack at a patch soon(ish).

Could you tell us about what didn't work as you hoped? And how would `git am --check` be different from `git apply --check`?
[Proposal] git am --check
This flag would behave similarly to git apply --check, or in other words would exit with a nonzero status if the patch is not applicable without actually applying the patch otherwise.

Rationale: I'm working on an email client which has some git integration, and when you scroll over a patch I want to quickly test its applicability and show an indication of the result.

Thoughts on the approach are welcome; my initial naive patch just tried to add --check to the apply flags but that didn't work as I had hoped. Will take another crack at a patch soon(ish).
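Until such a flag exists, the applicability test can be approximated with `git apply --check` run from the topdir, per Duy's observation later in the thread. This is only a sketch: it skips the mailsplit/MIME preprocessing `git am` would do, and the throwaway repository below exists solely to make the example self-contained.

```shell
#!/bin/sh
# Approximate "git am --check" by dry-running "git apply --check"
# from the repository topdir. Demo repo and patch are illustrative.
set -e
dir=$(mktemp -d); cd "$dir"
git init -q
git config user.email you@example.com
git config user.name You
echo one >file; git add file
git -c commit.gpgsign=false commit -qm init

echo two >file
git diff >good.patch       # a patch that applies cleanly to HEAD
git checkout -- file

check() {
    # Find the topdir the way "git am" effectively does, so this can be
    # run from any subdirectory; the patch path must be absolute.
    top=$(git rev-parse --show-toplevel)
    if git -C "$top" apply --check "$PWD/$1" 2>/dev/null
    then result=ok
    else result=fail
    fi
}

check good.patch; first=$result
echo tampered >file        # make the context no longer match
check good.patch; second=$result
echo "clean tree: $first, dirty tree: $second"
```

The exit status of `git apply --check` is exactly the signal the email-client use case needs; what it cannot do is split a raw mailbox into patches first, which is the gap `git am --check` (or `--dry-run`) would fill.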
Re: Proposal: object negotiation for partial clones
Hi,

Matthew DeVore wrote:
> On 2019/05/09, at 11:00, Jonathan Tan wrote:
>> - Supporting any combination of filter means that we have more to implement and test, especially if we want to support more filters in the future. In particular, the different filters (e.g. blob, tree) have different code paths now in Git. One way to solve it would be to combine everything into one monolith, but I would like to avoid it if possible (after having to deal with revision walking a few times...)
>
> I don’t believe there is any need to introduce monolithic code. The bulk of the filter implementation is in list-objects-filter.c, and I don’t think the file will get much longer with an additional filter that “combines” the existing filter. The new filter is likely simpler than the sparse filter. Once I add the new filter and send out the initial patch set, we can discuss splitting up the file, if it appears to be necessary.
>
> My idea - if it is not clear already - is to add another OO-like interface to list-objects-filter.c which parallels the 5 that are already there.

Sounds good to me. For what it's worth, my assumption has always been that we would eventually want the filters to be stackable. So I'm glad you're looking into it. Jonathan's reminder to clean up as you go is a welcome one.

Thanks,
Jonathan
Re: Proposal: object negotiation for partial clones
> On 2019/05/09, at 11:00, Jonathan Tan wrote:
>
> Thanks for the numbers. Let me think about it some more, but I'm still reluctant to introduce multiple filter support in the protocol and the implementation for the following reasons:

Correction to the original command - I was tweaking it in the middle of running it, and introduced an error that I didn’t notice. Here is one that will work for an entire repo:

$ git rev-list --objects --filter=blob:none HEAD: | awk '{print $1}' | xargs -n 1 git cat-file -s | awk '{ total += $1; print total }'

When run to completion, Chromium totaled 17 301 144 bytes.

> - For large projects like Linux and Chromium, it may be reasonable to expect that an infrequent checkout would result in a few-megabyte download.

Anyone developing on Chromium would definitely consider a 17 MB original clone to be an improvement over the status quo, but it is still not ideal. And the 17 MB initial download is only incurred once *assuming* the next idea is implemented:

> - (After some in-office discussion) It may be possible to mitigate much of that by sending root trees that we have as "have" (e.g. by consulting the reflog), and that wouldn't need any protocol change.

This would complicate the code - not in Git itself, but in my FUSE-related logic. We would have to explore the reflog and try to find the closest commits in history to the target commit being checked out. This is sounding a bit hacky and round-about, and it assumes that at the FUSE layer we can detect when a checkout is happening cleanly and sufficiently early (rather than when one of the sub-sub-trees is being accessed).

> - Supporting any combination of filter means that we have more to implement and test, especially if we want to support more filters in the future. In particular, the different filters (e.g. blob, tree) have different code paths now in Git. One way to solve it would be to combine everything into one monolith, but I would like to avoid it if possible (after having to deal with revision walking a few times...)

I don’t believe there is any need to introduce monolithic code. The bulk of the filter implementation is in list-objects-filter.c, and I don’t think the file will get much longer with an additional filter that “combines” the existing filter. The new filter is likely simpler than the sparse filter. Once I add the new filter and send out the initial patch set, we can discuss splitting up the file, if it appears to be necessary.

My idea - if it is not clear already - is to add another OO-like interface to list-objects-filter.c which parallels the 5 that are already there.
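The corrected measurement pipeline above can be packaged into a small script. The throwaway repository below is an assumption so the example runs anywhere; point the same pipeline at a real clone of Linux or Chromium to reproduce the numbers quoted in the thread.

```shell
#!/bin/sh
# Total the on-disk size of the non-blob objects reachable from HEAD's
# tree - roughly what a blob:none fetch of one commit must transfer.
set -e
dir=$(mktemp -d); cd "$dir"
git init -q
git config user.email you@example.com
git config user.name You
mkdir sub
echo hello >sub/file
git add sub/file
git -c commit.gpgsign=false commit -qm init

# "HEAD:" names the tree of HEAD, limiting the walk to one commit.
total=$(git rev-list --objects --filter=blob:none HEAD: |
	awk '{print $1}' |
	xargs -n 1 git cat-file -s |
	awk '{ t += $1 } END { print t }')
echo "bytes of tree objects: $total"
```

For this two-tree demo repo the total is a few dozen bytes; the point of the pipeline is the shape of the walk, not the number.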
Re: Proposal: Remembering message IDs sent with git send-email
On 2019-05-09 11:51 AM, Emily Shaffer wrote:
> I'm still not sure I see the value of the extra header proposed here. I'd appreciate an explanation of how you think it would be used, Drew.

I'm not just thinking about your run-of-the-mill mail reader, but also mail readers which are aware of git and could use it to provide git-specific features for browsing patchsets. Distinguishing it from the mechanism used for normal conversation allows us to have fewer heuristics in such software.
Re: Proposal: Remembering message IDs sent with git send-email
Drew DeVault wrote:
> --in-reply-to=ask doesn't exist, that's what I'm looking to add. This convenient storage mechanism is exactly what I'm talking about. Sorry for the confusion.

Using Net::NNTP to query NNTP servers using ->xover([recent-ish range]) to scan for Message-IDs and Subjects matching the current ident could be an option, too. It could cache the xover result for --dry-run and format-patch cases; and Net::NNTP is a standard Perl module.

Going online to do this query also benefits people who work across different machines/environments, as it's one less thing to sync.

Fwiw, this list has:

nntp://news.gmane.org/gmane.comp.version-control.git
nntp://news.public-inbox.org/inbox.comp.version-control.git

And there's a bunch of kernel lists at nntp://nntp.lore.kernel.org/
Re: Proposal: Remembering message IDs sent with git send-email
On Thu, May 09, 2019 at 12:50:25PM -0400, Drew DeVault wrote:
> On 2019-05-08 5:19 PM, Emily Shaffer wrote:
> > What I think might be useful (and what I was hoping you were going to talk about when I saw the subject line) would be if the Message-Id is conveniently stored during `git send-email` on v1 and somehow saved in a useful place in order to apply to the In-Reply-To field on v2 automatically upon `git format-patch -v2`. I'll admit I didn't know about --in-reply-to=ask and that helps with the pain point I've experienced sending out v2 before.
>
> --in-reply-to=ask doesn't exist, that's what I'm looking to add. This convenient storage mechanism is exactly what I'm talking about. Sorry for the confusion.

Looking at the documentation, I suppose I hadn't realized before that --thread will generate a Message-Id for your cover letter. It does seem like we could teach --thread to check for the previous patch's cover letter in the directory provided by -o. Of course, this wouldn't work if the author was generating v2 and didn't have the v1 files available (i.e. a different workstation or a different author picking up the set).

I'm still not sure I see the value of the extra header proposed here. I'd appreciate an explanation of how you think it would be used, Drew. I don't know much about emailed workflows outside of Git; is this something likely to be useful to other communities?

- Emily
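For authors who do keep the v1 files around, the lookup Emily describes can be scripted today. This is only a sketch: the `outgoing/` directory, the `v1-0000-cover-letter.patch` file name, and the sample cover letter are assumptions standing in for real `git format-patch -v1 -o outgoing/` output.

```shell
#!/bin/sh
# Recover the Message-Id of a previously generated v1 cover letter so
# it can be fed to "git format-patch -v2 --in-reply-to=...".
set -e
mkdir -p outgoing
cat >outgoing/v1-0000-cover-letter.patch <<'EOF'
From 1234567890abcdef1234567890abcdef12345678 Mon Sep 17 00:00:00 2001
Message-Id: <cover.1557400000.v1.git.jane@example.com>
Subject: [PATCH v1 0/3] example series
EOF

# grep -i tolerates both "Message-Id:" and "Message-ID:" spellings.
msgid=$(grep -i -m 1 '^message-id:' outgoing/v1-0000-cover-letter.patch |
	sed 's/.*<\(.*\)>.*/\1/')
echo "git format-patch -v2 --in-reply-to=$msgid ..."
```

A built-in version of this lookup is essentially what teaching `--thread` to consult the `-o` directory would amount to.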
Re: Proposal: object negotiation for partial clones
> On 2019/05/07, at 11:34, Jonathan Tan wrote:
>
> > To get an enumeration of available objects, don't you need to use only "blob:none"? Combining filters (once that's implemented) will get all objects only up to a certain depth.
> >
> > Combining "tree:" and "blob:none" would allow us to reduce the number of trees transmitted, but I would imagine that the savings would be significant only for very large repositories. Do you have a specific use case in mind that isn't solved by "blob:none"?
>
> I am interested in supporting large repositories. The savings seem to be larger than one may expect. I tried the following command on two huge repos to find out how much it costs to fetch “blob:none” for a single commit:
>
> $ git rev-list --objects --filter=blob:none HEAD: | xargs -n 2 bash -c 'git cat-file -s $1' | awk '{ total += $1; print total }'
>
> Note the “:” after HEAD - this limits it to the current commit.
>
> And the results were:
> - Linux: 2 684 054 bytes
> - Chromium: > 16 139 570 bytes (then I got tired of waiting for it to finish)

Thanks for the numbers. Let me think about it some more, but I'm still reluctant to introduce multiple filter support in the protocol and the implementation for the following reasons:

- For large projects like Linux and Chromium, it may be reasonable to expect that an infrequent checkout would result in a few-megabyte download.

- (After some in-office discussion) It may be possible to mitigate much of that by sending root trees that we have as "have" (e.g. by consulting the reflog), and that wouldn't need any protocol change.

- Supporting any combination of filter means that we have more to implement and test, especially if we want to support more filters in the future. In particular, the different filters (e.g. blob, tree) have different code paths now in Git. One way to solve it would be to combine everything into one monolith, but I would like to avoid it if possible (after having to deal with revision walking a few times...)
Re: Proposal: Remembering message IDs sent with git send-email
On 2019-05-08 5:19 PM, Emily Shaffer wrote:
> What I think might be useful (and what I was hoping you were going to talk about when I saw the subject line) would be if the Message-Id is conveniently stored during `git send-email` on v1 and somehow saved in a useful place in order to apply to the In-Reply-To field on v2 automatically upon `git format-patch -v2`. I'll admit I didn't know about --in-reply-to=ask and that helps with the pain point I've experienced sending out v2 before.

--in-reply-to=ask doesn't exist, that's what I'm looking to add. This convenient storage mechanism is exactly what I'm talking about. Sorry for the confusion.
Re: Proposal: Remembering message IDs sent with git send-email
On Wed, May 08, 2019 at 07:10:13PM -0400, Drew DeVault wrote:
> I want to gather some thoughts about this. Say you've written a patch series and are getting ready to send a -v2. If you set --in-reply-to=ask, it'll show you a list of emails you've recently sent, and their subject lines, and ask you to pick one to use the message ID from. It'll set the In-Reply-To header to your selection.

It sounds to me like you mean to call this during `git format-patch` - that is, `git format-patch -v2 --cover-letter --in-reply-to=ask master..branch -o branch/`. That should set the In-Reply-To: header on your cover letter. There's also the possibility that you mean `git send-email --in-reply-to=ask branch/v2*` - in which case I imagine the In-Reply-To: is added as the message is sent, but not added to the cover letter text file.

> I'd also like to add a custom header, X-Patch-Supersedes: , with a similar behavior & purpose.

Is the hope to store the message ID you choose from --in-reply-to=ask into the X-Patch-Supersedes: header? I'm not sure I understand what you're trying to solve; if you use `git format-patch --in-reply-to` it sounds like the X-Patch-Supersedes: and In-Reply-To: would be redundant. Is it possible you mean you want (sorry for pseudocode scribblings) [PATCH v2 1/1]->X-Patch-Supersedes = [PATCH 1/1]->Message-Id ? I think that wouldn't look good in a threaded mail client?

> Thoughts?

Or maybe I totally misunderstood :) What I think might be useful (and what I was hoping you were going to talk about when I saw the subject line) would be if the Message-Id is conveniently stored during `git send-email` on v1 and somehow saved in a useful place in order to apply to the In-Reply-To field on v2 automatically upon `git format-patch -v2`. I'll admit I didn't know about --in-reply-to=ask and that helps with the pain point I've experienced sending out v2 before.

- Emily
Proposal: Remembering message IDs sent with git send-email
I want to gather some thoughts about this. Say you've written a patch series and are getting ready to send a -v2. If you set --in-reply-to=ask, it'll show you a list of emails you've recently sent, and their subject lines, and ask you to pick one to use the message ID from. It'll set the In-Reply-To header to your selection.

I'd also like to add a custom header, X-Patch-Supersedes: , with a similar behavior & purpose.

Thoughts?
Re: Proposal: object negotiation for partial clones
> On 2019/05/07, at 11:34, Jonathan Tan wrote:
>
> To get an enumeration of available objects, don't you need to use only "blob:none"? Combining filters (once that's implemented) will get all objects only up to a certain depth.
>
> Combining "tree:" and "blob:none" would allow us to reduce the number of trees transmitted, but I would imagine that the savings would be significant only for very large repositories. Do you have a specific use case in mind that isn't solved by "blob:none"?

I am interested in supporting large repositories. The savings seem to be larger than one may expect. I tried the following command on two huge repos to find out how much it costs to fetch “blob:none” for a single commit:

$ git rev-list --objects --filter=blob:none HEAD: | xargs -n 2 bash -c 'git cat-file -s $1' | awk '{ total += $1; print total }'

Note the “:” after HEAD - this limits it to the current commit.

And the results were:
- Linux: 2 684 054 bytes
- Chromium: > 16 139 570 bytes (then I got tired of waiting for it to finish)
Re: Proposal: object negotiation for partial clones
> > My main question is: we can get the same list of objects (in the form of tree objects) if we fetch with "blob:none" filter. Admittedly, we will get extra data (file names, etc.) - if the extra bandwidth saving is necessary, this should be called out. (And some of the savings will be offset by the fact that we will actually need some of those tree objects.)
>
> That's a very good point. The data the first request gives us is basically the tree objects minus file names and modes. So I think a better feature to implement would be combining of multiple filters. That way, the client can combine "tree:" and "blob:none" and basically get an "enumeration" of available objects.

To get an enumeration of available objects, don't you need to use only "blob:none"? Combining filters (once that's implemented) will get all objects only up to a certain depth.

Combining "tree:" and "blob:none" would allow us to reduce the number of trees transmitted, but I would imagine that the savings would be significant only for very large repositories. Do you have a specific use case in mind that isn't solved by "blob:none"?
Re: Proposal: object negotiation for partial clones
Matthew DeVore wrote:
> On 2019/05/06, at 12:46, Jonathan Nieder wrote:
>> Ah, interesting. When this was discussed before, the proposal has been that the client can say "have" anyway. They don't have the commit and all referenced objects, but they have the commit and a *promise* that they can obtain all referenced objects, which is almost as good. That's what "git fetch" currently implements.
>
> Doesn’t that mean the “have” may indicate that the client has the entire repository already, even though it’s only a partial clone? If so, and the client then intends to ask for some tree plus trees and blobs 2-3 levels deeper, how would the server distinguish between those objects the client *really* has and those that were just promised to them? Because the whole purpose of this hypothetical request is to get a bunch of promises fulfilled, of which 0-99% are fulfilled already.

For blobs, the answer is simple: the server returns any object explicitly named in a "want", even if the client already should have it. For trees, the current behavior is the same: if you declare that you "have" everything, then if you "want" a tree with filter tree:2, you only get that tree. So here there's already room for improvement.

[...]

> Maybe something like this (conceptually based on the original proposal)?
>
> 1. Client sends a request for an object or objects with an extra flag which means “I can’t really tell you what I already have since it’s a chaotic subset of the object database of the repo”
>
> 2. Server responds back with a set of objects, represented by deltas if that is how the server has them on disk, along with a list of object-IDs needed in order to resolve the content of all the objects. These object-IDs can go several layers of deltas back, and they go back as far as it takes to get to an object stored in its entirety by the server.
>
> 3. Client responds back with another request (this time the extra flag sent from step 1 is not necessary) which has “want”s for every object the server named which the client already has.
>
> Very hand-wavey, but I think you see my idea.

The only downside I see is that the list of objects may itself be large, and the server has to check reachability for each one. But maybe that's fine.

Perhaps after that initial response, instead of sending the list of individual objects the client wants, it could send a list of relevant objects it has (combined with the original set of "want"s). That could be a smaller request, and it means less work for the server to check each "want" for reachability. What do you think?

[...]

> That's a very good point. The data the first request gives us is basically the tree objects minus file names and modes. So I think a better feature to implement would be combining of multiple filters. That way, the client can combine "tree:" and "blob:none" and basically get an "enumeration" of available objects.

This might be simpler. Combining filters would be useful for other uses, too.

Thanks,
Jonathan
Re: Proposal: object negotiation for partial clones
> On 2019/05/06, at 12:46, Jonathan Nieder wrote:
>
> Hi,
>
> Jonathan Tan wrote:
>> Matthew DeVore wrote:
>>> I'm considering implementing a feature in the Git protocol which would enable efficient and accurate object negotiation when the client is a partial clone. I'd like to refine and get some validation of my approach before I start to write any code, so I've written a proposal for anyone interested to review. Your comments would be appreciated.
>>
>> Thanks. Let me try to summarize: The issue is that, during a fetch, normally the client can say "have" to inform the server that it has a commit and all its referenced objects (barring shallow lines), but we can't do the same if the client is a partial clone (because having a commit doesn't necessarily mean that we have all referenced objects).
>
> Ah, interesting. When this was discussed before, the proposal has been that the client can say "have" anyway. They don't have the commit and all referenced objects, but they have the commit and a *promise* that they can obtain all referenced objects, which is almost as good. That's what "git fetch" currently implements.

Doesn’t that mean the “have” may indicate that the client has the entire repository already, even though it’s only a partial clone? If so, and the client then intends to ask for some tree plus trees and blobs 2-3 levels deeper, how would the server distinguish between those objects the client *really* has and those that were just promised to them? Because the whole purpose of this hypothetical request is to get a bunch of promises fulfilled, of which 0-99% are fulfilled already.

> For blob filters, if I ignore the capability advertisements (there's an optimization that hasn't yet been implemented to allow single-round-trip fetches), the current behavior takes the same number of round trips as this proposal. Where the current approach has been lacking is in delta base selection during fetch-on-demand. Ideas for improving that?

Maybe something like this (conceptually based on the original proposal)?

1. Client sends a request for an object or objects with an extra flag which means “I can’t really tell you what I already have since it’s a chaotic subset of the object database of the repo”

2. Server responds back with a set of objects, represented by deltas if that is how the server has them on disk, along with a list of object-IDs needed in order to resolve the content of all the objects. These object-IDs can go several layers of deltas back, and they go back as far as it takes to get to an object stored in its entirety by the server.

3. Client responds back with another request (this time the extra flag sent from step 1 is not necessary) which has “want”s for every object the server named which the client already has.

Very hand-wavey, but I think you see my idea.
Re: Proposal: object negotiation for partial clones
On Mon, May 6, 2019 at 12:28 PM Jonathan Tan wrote:
>
> > I'm considering implementing a feature in the Git protocol which would enable efficient and accurate object negotiation when the client is a partial clone. I'd like to refine and get some validation of my approach before I start to write any code, so I've written a proposal for anyone interested to review. Your comments would be appreciated.
>
> Thanks. Let me try to summarize: The issue is that, during a fetch, normally the client can say "have" to inform the server that it has a commit and all its referenced objects (barring shallow lines), but we can't do the same if the client is a partial clone (because having a commit doesn't necessarily mean that we have all referenced objects). And not doing this means that the server sends a lot of unnecessary objects in the sent packfile. The solution is to do the fetch in 2 parts: one to get the list of objects that would be sent, and after the client filters that, one to get the objects themselves.
>
> It was unclear to me whether this is meant for (1) fetches directly initiated by the user that fetch commits (e.g. "git fetch origin", reusing the configured "core.partialclonefilter") and/or for (2) lazy fetching of missing objects. My assumption is that this is only for (2).

Yes, that was my intention. The client doesn't really know anything about the hashes reported, so it can't really make an informed selection from the candidate list given by the server after the first request. I guess if we wanted to just reject *all* objects on the initial clone, this feature would make that possible. But that can also be achieved more embracively with a better filter system.

> My main question is: we can get the same list of objects (in the form of tree objects) if we fetch with "blob:none" filter. Admittedly, we will get extra data (file names, etc.) - if the extra bandwidth saving is necessary, this should be called out. (And some of the savings will be offset by the fact that we will actually need some of those tree objects.)

That's a very good point. The data the first request gives us is basically the tree objects minus file names and modes. So I think a better feature to implement would be combining of multiple filters. That way, the client can combine "tree:" and "blob:none" and basically get an "enumeration" of available objects.

> Assuming that we do need that bandwidth saving, here's my review of that document.
>
> The document describes the 1st request exactly as I envision - a specific parameter sent by the client, and the server responds with a list of object names.
>
> For the 2nd request, the document describes it as repeating the original query of the 1st request while also giving the full list of objects wanted as "choose-refs". I'm still not convinced that repeating the original query is necessary - I would just give the list of objects as wants. The rationale given for repeating the original query is:
>
> > The original query is helpful because it means the server only needs to do a single reachability check, rather than many separate ones.
>
> But this omits the fact that, if doing it the document's way, the server needs to perform an object walk in addition to the "single reachability check", and it is not true that if doing it my way, "many separate ones" need to be done, because the server can check reachability of all objects at once.

After considering more carefully how reachability works (and getting your explanation of it out-of-band), I would assume that my approach is no better than marginally faster, and possibly worse, than just doing a plain reachability check of multiple objects using the current implementation. My current priorities preclude this kind of benchmarking+micro-optimization. So I believe what is more important to me is to simply enable combining multiple filters.

> Also, my way means that supporting the 2nd request does not require any code or protocol change - it already works today. Assuming we follow my approach, the discussion thus lies in supporting the 1st request.
>
> Some more thoughts:
>
> - Changes in server and client scalability: Currently, the server checks reachability of all wants, then enumerates, then sends all objects. With this change, the server checks reachability of all wants, then enumerates, then sends an object list, then checks reachability of all objects in the filtered list, then sends some objects. There is additional overhead in the extra reachability check and lists of
Re: Proposal: object negotiation for partial clones
Hi, Jonathan Tan wrote: > Matthew DeVore wrote: >> I'm considering implementing a feature in the Git protocol which would >> enable efficient and accurate object negotiation when the client is a >> partial clone. I'd like to refine and get some validation of my >> approach before I start to write any code, so I've written a proposal >> for anyone interested to review. Your comments would be appreciated. > > Thanks. Let me try to summarize: The issue is that, during a fetch, > normally the client can say "have" to inform the server that it has a > commit and all its referenced objects (barring shallow lines), but we > can't do the same if the client is a partial clone (because having a > commit doesn't necessarily mean that we have all referenced objects). Ah, interesting. When this was discussed before, the proposal has been that the client can say "have" anyway. They don't have the commit and all referenced objects, but they have the commit and a *promise* that they can obtain all referenced objects, which is almost as good. That's what "git fetch" currently implements. But there's a hitch: when doing the fetch-on-demand for an object access, the client currently does not say "have". Sure, even there, they have a *promise* that they can obtain all referenced objects, but this could get out of hand: the first pack may contain a delta against an object the client doesn't have, triggering another fetch which contains a delta against another object they don't have, and so on. Too many round trips. > And not doing this means that the server sends a lot of unnecessary > objects in the sent packfile. The solution is to do the fetch in 2 > parts: one to get the list of objects that would be sent, and after the > client filters that, one to get the objects themselves. This helps with object selection but not with delta base selection. For object selection, I think the current approach already works okay, at least where tree and blob filters are involved. 
For commit filters, in the current approach the fetch-on-demand sends way too much because there's no "filter=commit:none" option to pass. Is that what this proposal aims to address? For blob filters, if I ignore the capability advertisements (there's an optimization that hasn't yet been implemented to allow single-round-trip fetches), the current behavior takes the same number of round trips as this proposal. Where the current approach has been lacking is in delta base selection during fetch-on-demand. Ideas for improving that? Thanks, Jonathan
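The fetch-on-demand behavior discussed above can be observed with a local partial clone. A minimal sketch, assuming a git with partial-clone support (repository names and file contents are made up):

```shell
# A server repo with two versions of a file, with uploadpack configured
# to honor filter requests.
dir=$(mktemp -d) && cd "$dir"
git init -q server && cd server
git config uploadpack.allowfilter true
echo v1 > f.txt && git add f.txt
git -c user.email=a@b.c -c user.name=A commit -qm one
echo v2 > f.txt
git -c user.email=a@b.c -c user.name=A commit -qam two
cd ..
# Partial clone: blobs are omitted except what the checkout itself needs.
git clone -q --no-local --filter=blob:none "file://$dir/server" client
cd client
# The historical blob was never transferred:
git rev-list --objects --missing=print HEAD | grep '^?'
# Accessing it triggers a fetch-on-demand from the promisor remote:
git show 'HEAD^:f.txt'
```

Each such on-demand fetch is a separate round trip, which is the delta-base and round-trip cost being weighed in this thread.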
Re: Proposal: object negotiation for partial clones
> I'm considering implementing a feature in the Git protocol which would > enable efficient and accurate object negotiation when the client is a > partial clone. I'd like to refine and get some validation of my > approach before I start to write any code, so I've written a proposal > for anyone interested to review. Your comments would be appreciated. Thanks. Let me try to summarize: The issue is that, during a fetch, normally the client can say "have" to inform the server that it has a commit and all its referenced objects (barring shallow lines), but we can't do the same if the client is a partial clone (because having a commit doesn't necessarily mean that we have all referenced objects). And not doing this means that the server sends a lot of unnecessary objects in the sent packfile. The solution is to do the fetch in 2 parts: one to get the list of objects that would be sent, and after the client filters that, one to get the objects themselves. It was unclear to me whether this is meant for (1) fetches directly initiated by the user that fetch commits (e.g. "git fetch origin", reusing the configured "core.partialclonefilter") and/or for (2) lazy fetching of missing objects. My assumption is that this is only for (2). My main question is: we can get the same list of objects (in the form of tree objects) if we fetch with "blob:none" filter. Admittedly, we will get extra data (file names, etc.) - if the extra bandwidth saving is necessary, this should be called out. (And some of the savings will be offset by the fact that we will actually need some of those tree objects.) Assuming that we do need that bandwidth saving, here's my review of that document. The document describes the 1st request exactly as I envision - a specific parameter sent by the client, and the server responds with a list of object names. 
For the 2nd request, the document describes it as repeating the original query of the 1st request while also giving the full list of objects wanted as "choose-refs". I'm still not convinced that repeating the original query is necessary - I would just give the list of objects as wants. The rationale given for repeating the original query is: > The original query is helpful because it means the server only needs > to do a single reachability check, rather than many separate ones. But this omits the fact that, if doing it the document's way, the server needs to perform an object walk in addition to the "single reachability check", and it is not true that if doing it my way, "many separate ones" need to be done because the server can check reachability of all objects at once. Also, my way means that supporting the 2nd request does not require any code or protocol change - it already works today. Assuming we follow my approach, the discussion thus lies in supporting the 1st request. Some more thoughts: - Changes in server and client scalability: Currently, the server checks reachability of all wants, then enumerates, then sends all objects. With this change, the server checks reachability of all wants, then enumerates, then sends an object list, then checks reachability of all objects in the filtered list, then sends some objects. There is additional overhead in the extra reachability check and lists of objects being sent twice (once by server and once by client), but sending fewer objects means that I/O (server, network, client) and disk space usage (client) is reduced. - Usefulness outside partial clone: If the user ever wants a list of objects referenced by an object but without their file names, the user could use this, but I can't think of such a scenario.
Re: Proposal: object negotiation for partial clones
Hi, Matthew DeVore wrote: > I'm considering implementing a feature in the Git protocol which would > enable efficient and accurate object negotiation when the client is a > partial clone. I'd like to refine and get some validation of my > approach before I start to write any code, so I've written a proposal > for anyone interested to review. Your comments would be appreciated. Yay! Thanks for looking into this, and sorry I didn't respond sooner. I know the doc has a "use case" section, but I suppose I am not sure that I understand the use case yet. Is this about improving the filter syntax to handle features like directory listing? Or is this about being able to make better use of deltas in a partial clone, to decrease bandwidth consumption and overhead that is proportional to size? Thanks, Jonathan
Proposal: object negotiation for partial clones
Hello, I'm considering implementing a feature in the Git protocol which would enable efficient and accurate object negotiation when the client is a partial clone. I'd like to refine and get some validation of my approach before I start to write any code, so I've written a proposal for anyone interested to review. Your comments would be appreciated. Remember this is a publicly-accessible document so be sure to not discuss any confidential topics in the comments! Tiny URL: http://tinyurl.com/yxz747cy Full URL: https://docs.google.com/document/d/1bcDKCgd2Dw5Cl6H9TrNi0ekqzaT8rbyK8EpPE3RcvPA/edit# Thank you, Matt
Re: [GSoC] [RFC] Proposal: Teach git stash to handle unmerged index entries.
Junio C Hamano writes: > As to the design, it does not quite matter if you add four or more > separate trees to represent stage #[0123] entries in the index to > the already octopus merge commit that represents a stash entry ... I forgot that I was planning to expand on this part while writing the message I am following up. There are a few things you must take into account while designing a new format for a stash entry:

- Your new feature will *NOT* be the last extension to the stash subsystem. Always leave room for other developers to extend it further, without breaking backward compatibility when your new feature is not in use.

- Even though you may never have encountered it in your projects, higher stage entries can have duplicates. When merging two branches into your current branch, and there are three merge bases for such an octopus merge, the system (and the index format) is designed to allow a merge backend to store 3 stage #1 entries (because there are that many common ancestor versions in the example), 1 stage #2 entry (because there is only one "current branch" a merge is made into) and 2 stage #3 entries (because there are that many other branches you are merging into the current branch), all for the same path.

So, a design that says:

A stash entry in the current system is recorded as a merge commit, whose tree represents the state of the tracked working tree files, whose first parent records the HEAD commit the stash entry was created on, and whose second parent records the tree that would have been created if "git write-tree" were done on the index when the stash entry was created. Optionally, it can have the third parent whose tree records the state of untracked files. Let's add three more parents.
IOW, the fourth parent's tree records the result of "git write-tree" of the index after removing all the entries other than those at stage #1 and moving the remainder from stage #1 down to stage #0, and similarly the fifth is for stage #2 and the sixth is for stage #3.

is bad on multiple counts:

- It does not say what should happen to the third parent when this new "record unmerged state" feature is used without using the "record untracked paths" feature.

- It does not allow multiple stage #1 and/or stage #3 entries.

For the first point, I think a trick to record the same commit as the first parent may be a good hack to say "this is not used"; we might need to allow commit-tree not to complain about duplicate parents if we go that route. For the second one, there may be multiple solutions. A quick-and-dirty and obvious way may be to add only one new parent to the merge commit that represents a stash entry (i.e. the fourth parent). Make that new parent a merge of three commits, each of which represents what was in stage #1, stage #2 and stage #3 (we can reuse the second parent of the stash entry that usually records the index state to store stage #0 entries). As we allow multiple stage #1 or stage #3 entries in the index, and there is no fundamental reason why we should not allow multiple stage #2 entries, make each of these three commits able to represent multiple entries at the same stage, perhaps by:

- iterating over the index and counting the maximum occurrence of the same path at the same stage #$n;

- making that stage #$n commit a merge of that many parent commits.

The tree recorded in that stage #$n commit can be an empty tree. I am not saying this is a good design. I am merely showing the expected level of detail when your design gets into a presentable shape and is shared with the list. Have fun.
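The merge-commit shape of today's stash entries, which the designs above extend, can be verified with plumbing. A minimal sketch, assuming git is on PATH (identity and file names are made up):

```shell
dir=$(mktemp -d) && cd "$dir"
git init -q demo && cd demo
g() { git -c user.email=a@b.c -c user.name=A "$@"; }
echo base > f.txt && g add f.txt && g commit -qm base
echo wip > f.txt
g stash push -q -m WIP
# A stash entry is a merge commit: its first parent is the HEAD it was
# created on, and its second parent records the index state.
git cat-file -p 'stash@{0}'
git rev-parse 'stash@{0}^1' HEAD
```

The two `rev-parse` lines are identical because stashing does not move HEAD; an untracked-files stash would add a third parent, and the proposals above add more.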
Re: [GSoC] [RFC] Proposal: Teach git stash to handle unmerged index entries.
Kapil Jain writes: > Plan to implement the project. > > Objective: > > Description: > > Implementation Idea: > > Relevant Discussions: > > Idea Execution Plan: Divided into 2 parts. Two things missing before the implementation idea are the design and, more importantly, the success criteria. What lets you and your mentor declare victory? As to the design, it does not quite matter if you add four or more separate trees to represent stage #[0123] entries in the index to the already octopus merge commit that represents a stash entry (i.e. when keeping the untracked ones, I think the stash entry's "result of the merge" tree records the state of the tracked files in the working tree, and the "result of the merge" commit records the then-current HEAD, a commit that records the state of the index and another commit that records the state of the untracked files, as its parents---that's already a 3-parent octopus). The fact that a stash entry is represented as a merge commit is a mere implementation detail, and there is *NO* need to worry about resolving merge conflicts while recording a stash. If the result of this GSoC task is to be at all usable together with the current version in a backward compatible way, you must record these extra states as extra parents of the merge, so it is sort of given already that you'd be using some form of an octopus merge. The real challenge would be how the unstashing part of such a stash entry that records unmerged state should work. Personally I do not think it will be very useful to allow unstashing such a stash entry on top of any arbitrary commit---rather, I suspect that the user would want to come back to the exact HEAD the user had trouble resolving conflicts at, without having to first check it out. IOW, a usual way to use "git stash" is:

    $ git checkout topic
    $ edit edit edit
    ... I am happily hacking away ...
    ... the boss appears with an ultra-urgent task ...
    $ git stash save -m WIP
    $ git checkout master
    $ edit-and-build-and-test
    $ git commit
    ... now the emergency is over ...
    $ git checkout topic
    ... sync with the work others may have done on topic
    ... while I was dealing with the boss
    $ git pull --rebase origin topic
    $ git stash pop

IOW, it is expected to be applied on top of an updated commit. But I have a moderately strong suspicion that a stash that holds unmerged state (i.e. a conflicted merge in progress) is created with a use case in mind which is very different from the normal one. When creating such a stash entry, the above sequence would go more like this:

    $ git checkout topic
    $ git merge ...
    ... oops, conflicted, and it takes time to resolve ...
    $ edit edit inspect edit
    ... the boss appears
    $ git stash save -m "Merge in progress"
    $ git checkout master
    ... deal with the emergency the same way ...
    $ git checkout topic
    ... go back to the conflict resolution first without
    ... touching what may have happened on the branch in
    ... the meantime---a human brain cannot afford to deal
    ... with two or more parallel conflicts at the same
    ... time.
    $ git stash pop
    ... now deal with the conflict we were looking at
    ... before the boss interrupted us.
    $ edit inspect edit
    ... be satisfied with the result
    $ git commit
    ... now let's see if others have something else that
    ... is interesting
    $ git pull --rebase origin topic

And if we assume that the primary use of a stash for a conflicted state is to bring us back to the exact state (rather than allowing us to pretend as if we started from a different HEAD), it might even make sense to teach the "git stash pop" step to barf if HEAD does not match the first parent of the merge commit that represents the stash entry being applied (again, stash^{tree} is the working tree, stash^1 is the then-current HEAD). That would make the application side a lot simpler and manageable by developers who are not intimately familiar with the code. Others may disagree with the above assumption (i.e.
"a stash for a conflicted state does not have to be applicable"), though, making your task a lot harder ;-). Quite honestly, I do not think you can design a system that attempts to "stash apply/pop" a recorded unmerged state on top of any arbitrary HEAD and leave a state useful for the end user to deal with when the "stash apply/pop" step itself introduces _new_ conflicts due to the differences between the then-current HEAD the stash entry is based on and the HEAD the "stash apply" is attempted on top of. Even the current "stash apply/pop with the change between the HEAD and the index" does punt when it cannot make a clean application, and that is without any unmerged entries in the recorded index state. The key point is "a state useful for the end user"---it is
[GSoC] [RFC] Proposal: Teach git stash to handle unmerged index entries.
Plan to implement the project.

Objective: Teach git stash to handle unmerged index entries.

Description: When the index is unmerged, git stash refuses to do anything. That is unnecessary, though, as it could easily craft e.g. an octopus merge of the various stages. A subsequent git stash apply can detect that octopus and re-generate the unmerged index.

Implementation Idea: Performing an octopus merge of all `stage n` (n>0) unmerged index entries could solve the problem, but what if there are conflicts while merging? In that case, we would store (commit) the conflicted state, so it can be regenerated when git stash is applied. How to store the conflicted files? Create a tree from the merge using `git-write-tree` and then commit that tree using `git-commit-tree`.

Relevant Discussions:
https://colabti.org/irclogger/irclogger_log/git-devel?date=2019-04-05#l92
https://colabti.org/irclogger/irclogger_log/git-devel?date=2019-04-09#l47

Idea Execution Plan: Divided into 2 parts.

Part 1: Store the unmerged index entries. This part will work with `git stash push`. stash.sh: this file would be changed to accommodate the implementation below.

Step 1: Extract all the unmerged entries from the index file and store them in a temporary index file. read-cache.c: this file is responsible for reading the index file, so this implementation will probably end up there.

Step 2: cache-tree.c: study and implement a slightly modified version of the function `write_index_as_tree()`:

    int write_index_as_tree(struct object_id *oid, struct index_state *index_state, const char *index_path, int flags, const char *prefix);

This function is responsible for writing a tree from the index file. Currently, the index must be in a fully merged state in this function, and we are dealing with its exact opposite. So a version that writes a tree for unmerged index entries will be implemented.

Step 3: write-tree.c: some possible changes will go here, so as to use the modified version of the write_index_as_tree() function.
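The unmerged entries Step 1 would extract can be produced and inspected with stock commands. A sketch, assuming git is on PATH (branch and file names are made up):

```shell
dir=$(mktemp -d) && cd "$dir"
git init -q demo && cd demo
g() { git -c user.email=a@b.c -c user.name=A "$@"; }
echo base > f.txt && g add f.txt && g commit -qm base
g checkout -qb side
echo side > f.txt && g commit -qam side
g checkout -q -
echo ours > f.txt && g commit -qam ours
# The merge conflicts, which is exactly the state being stashed here.
g merge side >/dev/null 2>&1 || true
# The index now holds one entry per stage: 1 = base, 2 = ours, 3 = theirs.
git ls-files -u
```

`git ls-files -u` prints mode, object id, stage number, and path for each unmerged entry; those are the entries a modified write_index_as_tree() would have to represent.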
Step 4: Use git-commit-tree to commit the written tree and store the hash in some file, say `stash_conflicting_merge`.

Step 5: Write tests for all the implementation up to this point.

Part 2: Retrieve the tree hash and regenerate the state of the repository as it was earlier.

Step 6: Modify the implementation of `git stash apply` to regenerate the committed tree.

Step 7: Write tests.
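The plumbing the later steps build on can be sketched with a temporary index. Note the plan's modified write_index_as_tree() would additionally accept unmerged entries; this sketch uses an ordinary merged index, and the ref and file names are illustrative:

```shell
dir=$(mktemp -d) && cd "$dir"
git init -q demo && cd demo
git -c user.email=a@b.c -c user.name=A commit -q --allow-empty -m base
echo data > f.txt
# Step 1's shape: stage entries into a temporary index, leaving
# .git/index untouched.
GIT_INDEX_FILE=.git/tmp-index git update-index --add f.txt
# Steps 2-3: write that index out as a tree object.
tree=$(GIT_INDEX_FILE=.git/tmp-index git write-tree)
# Step 4: commit the tree and record the hash somewhere stable.
commit=$(git -c user.email=a@b.c -c user.name=A commit-tree -m snapshot "$tree")
git update-ref refs/stash-conflicting-merge "$commit"
git cat-file -t "$commit"
```

Storing the hash under a ref (rather than a loose file) also keeps the commit safe from `git gc`, which a bare hash in `stash_conflicting_merge` would not.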
Re: [GSoC][RFC] Proposal: Make pack access code thread-safe
On 2019-04-08 21:36, Matheus Tavares Bernardino wrote: > On Mon, Apr 8, 2019 at 4:19 PM Philip Oakley wrote: >> >> Hi Matheus >> >> On 08/04/2019 18:04, Matheus Tavares Bernardino wrote: Another "32-bit problem" should also be expressly considered during the GSoC work because of the MS Windows definition of uInt / long to be only 32 bits, leading to much of the Git code failing on the Git for Windows port and on the Git LFS (for Windows) for packs and files greater than 4 GB. https://github.com/git-for-windows/git/issues/1063 >> >>> Thanks for pointing it out. I didn't get it, though, if your >>> suggestion was to also propose tackling this issue in this GSoC >>> project. Was it that? I read the link but it seems to be a kind of >>> unrelated problem from what I'm planning to do with the pack access >>> code (which is thread-safety). I may have understood this wrongly, >>> though. Please, let me know if that's the case :) >>> >> The main point was to avoid accidental regressions by re-introducing >> simple 'longs' where memsized types were more appropriate. >> >> Torsten has already done a lot of work at >> https://github.com/tboegi/git/tree/tb.190402_1552_convert_size_t_only_git_master_181124_mk_size_t > > Got it. Thanks, Philip! > >> HTH >> Philip >> (I'm off line for a few days) Thanks for the reminder - I will probably send something out in the next few days/weeks.
Re: [GSoC][RFC] Proposal: Make pack access code thread-safe
On Mon, Apr 8, 2019 at 4:19 PM Philip Oakley wrote: > > Hi Matheus > > On 08/04/2019 18:04, Matheus Tavares Bernardino wrote: > >> Another "32-bit problem" should also be expressly considered during the > >> GSoC work because of the MS Windows definition of uInt / long to be only > >> 32 bits, leading to much of the Git code failing on the Git for Windows > >> port and on the Git LFS (for Windows) for packs and files greater than > >> 4 GB. https://github.com/git-for-windows/git/issues/1063 > > > Thanks for pointing it out. I didn't get it, though, if your > > suggestion was to also propose tackling this issue in this GSoC > > project. Was it that? I read the link but it seems to be a kind of > > unrelated problem from what I'm planning to do with the pack access > > code (which is thread-safety). I may have understood this wrongly, > > though. Please, let me know if that's the case :) > > > The main point was to avoid accidental regressions by re-introducing > simple 'longs' where memsized types were more appropriate. > > Torsten has already done a lot of work at > https://github.com/tboegi/git/tree/tb.190402_1552_convert_size_t_only_git_master_181124_mk_size_t Got it. Thanks, Philip! > HTH > Philip > (I'm off line for a few days)
Re: [GSoC][RFC] Proposal: Make pack access code thread-safe
Hi Matheus

On 08/04/2019 18:04, Matheus Tavares Bernardino wrote:
>> Another "32-bit problem" should also be expressly considered during the GSoC work because of the MS Windows definition of uInt / long to be only 32 bits, leading to much of the Git code failing on the Git for Windows port and on the Git LFS (for Windows) for packs and files greater than 4 GB. https://github.com/git-for-windows/git/issues/1063
> Thanks for pointing it out. I didn't get it, though, if your suggestion was to also propose tackling this issue in this GSoC project. Was it that? I read the link but it seems to be a kind of unrelated problem from what I'm planning to do with the pack access code (which is thread-safety). I may have understood this wrongly, though. Please, let me know if that's the case :)

The main point was to avoid accidental regressions by re-introducing simple 'longs' where memsized types were more appropriate. Torsten has already done a lot of work at https://github.com/tboegi/git/tree/tb.190402_1552_convert_size_t_only_git_master_181124_mk_size_t

HTH
Philip
(I'm off line for a few days)
Re: [GSoC][RFC] Proposal: Make pack access code thread-safe
On Mon, Apr 8, 2019 at 6:26 AM Philip Oakley wrote: > > On 08/04/2019 02:23, Duy Nguyen wrote: > > On Mon, Apr 8, 2019 at 5:52 AM Christian Couder > > wrote: > >>> Git has a very optimized mechanism to compactly store > >>> objects (blobs, trees, commits, etc.) in packfiles[2]. These files are > >>> created by[3]: > >>> > >>> 1. listing objects; > >>> 2. sorting the list with some good heuristics; > >>> 3. traversing the list with a sliding window to find similar objects in > >>> the window, in order to do delta decomposing; > >>> 4. compress the objects with zlib and write them to the packfile. > >>> > >>> What we are calling pack access code in this document, is the set of > >>> functions responsible for retrieving the objects stored at the > >>> packfiles. This process consists, roughly speaking, in three parts: > >>> > >>> 1. Locate and read the blob from packfile, using the index file; > >>> 2. If the blob is a delta, locate and read the base object to apply the > >>> delta on top of it; > >>> 3. Once the full content is read, decompress it (using zlib inflate). > >>> > >>> Note: There is a delta cache for the second step so that if another > >>> delta depends on the same base object, it is already in memory. This > >>> cache is global; also, the sliding windows, are global per packfile. > >> Yeah, but the sliding windows are used only when creating pack files, > >> not when reading them, right? > > These windows are actually for reading. We used to just mmap the whole > > pack file in the early days but that was impossible for 4+ GB packs on > > 32-bit platforms, which was one of the reasons, I think, that sliding > > windows were added, to map just the parts we want to read. 
> > Another "32-bit problem" should also be expressly considered during the > GSoC work because of the MS Windows definition of uInt / long to be only > 32 bits, leading to much of the Git code failing on the Git for Windows > port and on the Git LFS (for Windows) for packs and files greater than > 4 GB. https://github.com/git-for-windows/git/issues/1063 Thanks for pointing it out. I didn't get it, though, if your suggestion was to also propose tackling this issue in this GSoC project. Was it that? I read the link but it seems to be a kind of unrelated problem from what I'm planning to do with the pack access code (which is thread-safety). I may have understood this wrongly, though. Please, let me know if that's the case :) > Mainly it is just substitution of size_t for long, but there can be > unexpected coercions when mixed data types get coerced down to a local > 32-bit long. This is made worse by it being implementation defined, so > one needs to be explicit about some casts up to pointer/memsized types. > >>> # Points to work on > >>> > >>> * Investigate pack access call chains and look for non-thread-safe > >>> operations on them. > >>> * Protect packfile.c read-and-write global variables, such as > >>> pack_open_windows, pack_open_fds, etc., using mutexes. > >> Do you want to work on making both packfile reading and packfile > >> writing thread safe? Or just packfile reading? > > Packfile writing is probably already or pretty close to thread-safe > > (at least the main writing code path in git-pack-objects; the > > streaming blobs to a pack, I'm not so sure). > -- > Philip
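Both halves of the picture being discussed, pack creation (steps 1-4) and the pack access read path, can be poked at with stock commands. A rough sketch with made-up content:

```shell
dir=$(mktemp -d) && cd "$dir"
git init -q demo && cd demo
g() { git -c user.email=a@b.c -c user.name=A "$@"; }
seq 1 200 > f.txt && g add f.txt && g commit -qm one
seq 1 201 > f.txt && g commit -qam two
# Creation: enumerate, sort, search a window for delta bases, deflate.
g repack -adq
# Read path: verify-pack locates every object via the .idx file, resolves
# delta chains, and inflates, then prints per-object statistics.
git verify-pack -v .git/objects/pack/pack-*.idx | tail -n 6
```

The `-v` statistics end with a histogram ("non delta: ... objects", "chain length = ...") showing how many stored objects must go through the delta-resolution step of the read path.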
Re: [GSoC][RFC] Proposal: Make pack access code thread-safe
On Sun, Apr 7, 2019 at 7:52 PM Christian Couder wrote: > > Hi Matheus > > On Sun, Apr 7, 2019 at 10:48 PM Matheus Tavares Bernardino > wrote: > > > > This is my proposal for GSoC with the subject "Make pack access code > > thread-safe". > > Thanks! > > > I'm late in schedule but I would like to ask for your > > comments on it. Any feedback will be highly appreciated. > > > > The "rendered" version can be seen here: > > https://docs.google.com/document/d/1QXT3iiI5zjwusplcZNf6IbYc04-9diziVKdOGkTHeIU/edit?usp=sharing > > Thanks for the link! > > > Besides administrative questions and contributions to FLOSS projects, at > > FLUSP, I’ve been mentoring people who want to start contributing to the > > Linux Kernel and now, to Git, as well. > > Nice! Do you have links about that? Unfortunately not :( Maybe just the mentoring slides (e.g. https://flusp.ime.usp.br/materials/Kernel_Primeiros_Passos.pdf). But they are all in Portuguese, so I don't know whether it would be valuable to add them here... > > # The Project > > > > As direct as possible, the goal with this project is to make more of > > Git’s codebase thread-safe, so that we can improve parallelism in > > various commands. The motivation behind this are the complaints from > > developers experiencing slow Git commands when working with large > > repositories[1], such as chromium and Android. And since nowadays, most > > personal computers have multi-core CPUs, it is a natural step trying to > > improve parallel support so that we can better use the available resources. > > > > With this in mind, pack access code is a good target for improvement, > > since it’s used by many Git commands (e.g., checkout, grep, blame, diff, > > log, etc.). This section of the codebase is still sequential and has > > many global states, which should be protected before we can work to > > improve parallelism.
> > I think it's better if global state can be made local or perhaps > removed, rather than protected (though of course that's not always > possible). Indeed! I just added this to the docs version. Thanks > > ## The Pack Access Code > > > > To better describe what the pack access code is, we must talk about > > Git’s object storing (in a simplified way): > > Maybe s/storing/storage/ Thanks. Already changed. > > Besides what are called loose objects, > > s/loose object/loose object files/ Done, thanks! > > Git has a very optimized mechanism to compactly store > > objects (blobs, trees, commits, etc.) in packfiles[2]. These files are > > created by[3]: > > > > 1. listing objects; > > 2. sorting the list with some good heuristics; > > 3. traversing the list with a sliding window to find similar objects in > > the window, in order to do delta decomposing; > > 4. compress the objects with zlib and write them to the packfile. > > > > What we are calling pack access code in this document, is the set of > > functions responsible for retrieving the objects stored at the > > packfiles. This process consists, roughly speaking, in three parts: > > > > 1. Locate and read the blob from packfile, using the index file; > > 2. If the blob is a delta, locate and read the base object to apply the > > delta on top of it; > > 3. Once the full content is read, decompress it (using zlib inflate). > > > > Note: There is a delta cache for the second step so that if another > > delta depends on the same base object, it is already in memory. This > > cache is global; also, the sliding windows, are global per packfile. > > Yeah, but the sliding windows are used only when creating pack files, > not when reading them, right? > > > If these steps were thread-safe, the ability to perform the delta > > reconstruction (together with the delta cache lookup) and zlib inflation > > in parallel could bring a good speedup. 
At git-blame, for example, > > 24%[4] of the time is spent in the call stack originated at > > read_object_file_extended. Not only this but once we have this big > > section of the codebase thread-safe, we can work to parallelize even > > more work at higher levels of the call stack. Therefore, with this > > project, we aim to make room for many future optimizations in many Git > > commands. > > Nice. > > > # Plan > > > > I will probably be working mainly with packfile.c, sha1-file.c, > > object-store.h, object.c and pack.h, however, I may also need to tackle > > other files. I will be focusing on the following three pack access call > > chains, found in git-g
Re: [GSoC][RFC] Proposal: Make pack access code thread-safe
On Mon, Apr 8, 2019 at 3:58 AM Christian Couder wrote: > > On Mon, Apr 8, 2019 at 5:32 AM Duy Nguyen wrote: > > > > On Mon, Apr 8, 2019 at 8:23 AM Duy Nguyen wrote: > > > > > > On Mon, Apr 8, 2019 at 5:52 AM Christian Couder > > > wrote: > > > > > Git has a very optimized mechanism to compactly store > > > > > objects (blobs, trees, commits, etc.) in packfiles[2]. These files are > > > > > created by[3]: > > > > > > > > > > 1. listing objects; > > > > > 2. sorting the list with some good heuristics; > > > > > 3. traversing the list with a sliding window to find similar objects > > > > > in > > > > > the window, in order to do delta decomposing; > > > > > 4. compress the objects with zlib and write them to the packfile. > > > > > > > > > > What we are calling pack access code in this document, is the set of > > > > > functions responsible for retrieving the objects stored at the > > > > > packfiles. This process consists, roughly speaking, in three parts: > > > > > > > > > > 1. Locate and read the blob from packfile, using the index file; > > > > > 2. If the blob is a delta, locate and read the base object to apply > > > > > the > > > > > delta on top of it; > > > > > 3. Once the full content is read, decompress it (using zlib inflate). > > > > > > > > > > Note: There is a delta cache for the second step so that if another > > > > > delta depends on the same base object, it is already in memory. This > > > > > cache is global; also, the sliding windows, are global per packfile. > > > > > > > > Yeah, but the sliding windows are used only when creating pack files, > > > > not when reading them, right? > > > > > > These windows are actually for reading. We used to just mmap the whole > > > pack file in the early days but that was impossible for 4+ GB packs on > > > 32-bit platforms, which was one of the reasons, I think, that sliding > > > windows were added, to map just the parts we want to read. 
> > > > To clarify (I think I see why you mentioned pack creation now), there > > are actually two window concepts. core.packedGitWindowSize is about > > reading pack files. pack.window is for generating pack files. The > > second window should already be thread-safe since we do all the > > heuristics to find best base object candidates in threads. > > Yeah, it is not very clear in the proposal which windows it is talking > about as I think a window is first mentioned when describing the steps > to create a packfile in: > > "3. traversing the list with a sliding window to find similar objects > in the window, in order to do delta decomposing;" > > Also the proposal plans to "Protect packfile.c read-and-write global > variables ..." which made me wonder if it was also about improving > thread safety when generating pack files. Sorry, it is indeed unclear. The idea here was to say that variables which are both read and updated in code that must be thread-safe should be protected. I will refactor this, thanks. Oh, also I'm targeting just packfile reading. The explanation on how packfiles are created was written just as context, but perhaps it led to some confusion about the proposal's objective. Thanks for this feedback too. > Thanks for clarifying!
Re: [GSoC][RFC] Proposal: Make pack access code thread-safe
On Mon, Apr 8, 2019 at 12:32 AM Duy Nguyen wrote: > > On Mon, Apr 8, 2019 at 8:23 AM Duy Nguyen wrote: > > > > On Mon, Apr 8, 2019 at 5:52 AM Christian Couder > > wrote: > > > > Git has a very optimized mechanism to compactly store > > > > objects (blobs, trees, commits, etc.) in packfiles[2]. These files are > > > > created by[3]: > > > > > > > > 1. listing objects; > > > > 2. sorting the list with some good heuristics; > > > > 3. traversing the list with a sliding window to find similar objects in > > > > the window, in order to do delta decomposing; > > > > 4. compress the objects with zlib and write them to the packfile. > > > > > > > > What we are calling pack access code in this document, is the set of > > > > functions responsible for retrieving the objects stored at the > > > > packfiles. This process consists, roughly speaking, in three parts: > > > > > > > > 1. Locate and read the blob from packfile, using the index file; > > > > 2. If the blob is a delta, locate and read the base object to apply the > > > > delta on top of it; > > > > 3. Once the full content is read, decompress it (using zlib inflate). > > > > > > > > Note: There is a delta cache for the second step so that if another > > > > delta depends on the same base object, it is already in memory. This > > > > cache is global; also, the sliding windows, are global per packfile. > > > > > > Yeah, but the sliding windows are used only when creating pack files, > > > not when reading them, right? > > > > These windows are actually for reading. We used to just mmap the whole > > pack file in the early days but that was impossible for 4+ GB packs on > > 32-bit platforms, which was one of the reasons, I think, that sliding > > windows were added, to map just the parts we want to read. > > To clarify (I think I see why you mentioned pack creation now), there > are actually two window concepts. core.packedGitWindowSize is about > reading pack files. pack.window is for generating pack files. 
The > second window should already be thread-safe since we do all the > heuristics to find best base object candidates in threads. I was indeed confusing these two concepts, thanks for clarifying it! I took a quick look at the usage of core.packedGitWindowSize around the code (in packfile.c) and it seems to be already thread-safe (I may be wrong, though). > -- > Duy
Re: [GSoC][RFC] Proposal: Make pack access code thread-safe
On 08/04/2019 02:23, Duy Nguyen wrote: On Mon, Apr 8, 2019 at 5:52 AM Christian Couder wrote: Git has a very optimized mechanism to compactly store objects (blobs, trees, commits, etc.) in packfiles[2]. These files are created by[3]: 1. listing objects; 2. sorting the list with some good heuristics; 3. traversing the list with a sliding window to find similar objects in the window, in order to do delta decomposing; 4. compress the objects with zlib and write them to the packfile. What we are calling pack access code in this document, is the set of functions responsible for retrieving the objects stored at the packfiles. This process consists, roughly speaking, in three parts: 1. Locate and read the blob from packfile, using the index file; 2. If the blob is a delta, locate and read the base object to apply the delta on top of it; 3. Once the full content is read, decompress it (using zlib inflate). Note: There is a delta cache for the second step so that if another delta depends on the same base object, it is already in memory. This cache is global; also, the sliding windows, are global per packfile. Yeah, but the sliding windows are used only when creating pack files, not when reading them, right? These windows are actually for reading. We used to just mmap the whole pack file in the early days but that was impossible for 4+ GB packs on 32-bit platforms, which was one of the reasons, I think, that sliding windows were added, to map just the parts we want to read. Another "32-bit problem" should also be expressly considered during the GSoC work because of the MS Windows definition of uInt / long to be only 32 bits, leading to much of the Git code failing on the Git for Windows port and on the Git LFS (for Windows) for packs and files greater than 4Gb. https://github.com/git-for-windows/git/issues/1063 Mainly it is just substitution of size_t for long, but there can be unexpected coercions when mixed data types get coerced down to a local 32-bit long. 
This is made worse by it being implementation-defined, so one needs to be explicit about some casts up to pointer/memory-sized types. # Points to work on * Investigate pack access call chains and look for non-thread-safe operations on them. * Protect packfile.c read-and-write global variables, such as pack_open_windows, pack_open_fds, etc., using mutexes. Do you want to work on making both packfile reading and packfile writing thread-safe? Or just packfile reading? Packfile writing is probably already thread-safe, or pretty close to it (at least the main writing code path in git-pack-objects; the streaming of blobs to a pack, I'm not so sure). -- Philip
Re: [GSoC][RFC] Proposal: Make pack access code thread-safe
On Mon, Apr 8, 2019 at 5:32 AM Duy Nguyen wrote: > > On Mon, Apr 8, 2019 at 8:23 AM Duy Nguyen wrote: > > > > On Mon, Apr 8, 2019 at 5:52 AM Christian Couder > > wrote: > > > > Git has a very optimized mechanism to compactly store > > > > objects (blobs, trees, commits, etc.) in packfiles[2]. These files are > > > > created by[3]: > > > > > > > > 1. listing objects; > > > > 2. sorting the list with some good heuristics; > > > > 3. traversing the list with a sliding window to find similar objects in > > > > the window, in order to do delta decomposing; > > > > 4. compress the objects with zlib and write them to the packfile. > > > > > > > > What we are calling pack access code in this document, is the set of > > > > functions responsible for retrieving the objects stored at the > > > > packfiles. This process consists, roughly speaking, in three parts: > > > > > > > > 1. Locate and read the blob from packfile, using the index file; > > > > 2. If the blob is a delta, locate and read the base object to apply the > > > > delta on top of it; > > > > 3. Once the full content is read, decompress it (using zlib inflate). > > > > > > > > Note: There is a delta cache for the second step so that if another > > > > delta depends on the same base object, it is already in memory. This > > > > cache is global; also, the sliding windows, are global per packfile. > > > > > > Yeah, but the sliding windows are used only when creating pack files, > > > not when reading them, right? > > > > These windows are actually for reading. We used to just mmap the whole > > pack file in the early days but that was impossible for 4+ GB packs on > > 32-bit platforms, which was one of the reasons, I think, that sliding > > windows were added, to map just the parts we want to read. > > To clarify (I think I see why you mentioned pack creation now), there > are actually two window concepts. core.packedGitWindowSize is about > reading pack files. pack.window is for generating pack files. 
The > second window should already be thread-safe since we do all the > heuristics to find best base object candidates in threads. Yeah, it is not very clear in the proposal which window it is talking about, as I think a window is first mentioned when describing the steps to create a packfile in: "3. traversing the list with a sliding window to find similar objects in the window, in order to do delta decomposing;" Also the proposal plans to "Protect packfile.c read-and-write global variables ..." which made me wonder if it was also about improving thread safety when generating pack files. Thanks for clarifying!
Re: [GSoC][RFC] Proposal: Make pack access code thread-safe
On Mon, Apr 8, 2019 at 8:23 AM Duy Nguyen wrote: > > On Mon, Apr 8, 2019 at 5:52 AM Christian Couder > wrote: > > > Git has a very optimized mechanism to compactly store > > > objects (blobs, trees, commits, etc.) in packfiles[2]. These files are > > > created by[3]: > > > > > > 1. listing objects; > > > 2. sorting the list with some good heuristics; > > > 3. traversing the list with a sliding window to find similar objects in > > > the window, in order to do delta decomposing; > > > 4. compress the objects with zlib and write them to the packfile. > > > > > > What we are calling pack access code in this document, is the set of > > > functions responsible for retrieving the objects stored at the > > > packfiles. This process consists, roughly speaking, in three parts: > > > > > > 1. Locate and read the blob from packfile, using the index file; > > > 2. If the blob is a delta, locate and read the base object to apply the > > > delta on top of it; > > > 3. Once the full content is read, decompress it (using zlib inflate). > > > > > > Note: There is a delta cache for the second step so that if another > > > delta depends on the same base object, it is already in memory. This > > > cache is global; also, the sliding windows, are global per packfile. > > > > Yeah, but the sliding windows are used only when creating pack files, > > not when reading them, right? > > These windows are actually for reading. We used to just mmap the whole > pack file in the early days but that was impossible for 4+ GB packs on > 32-bit platforms, which was one of the reasons, I think, that sliding > windows were added, to map just the parts we want to read. To clarify (I think I see why you mentioned pack creation now), there are actually two window concepts. core.packedGitWindowSize is about reading pack files. pack.window is for generating pack files. The second window should already be thread-safe since we do all the heuristics to find best base object candidates in threads. -- Duy
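For reference, the two window concepts Duy distinguishes live in different config sections. A sketch of how each would appear in a git config file (the values are only illustrative; if memory serves, the documented defaults are 1 GiB on 64-bit platforms for core.packedGitWindowSize and 10 for pack.window):

```ini
[core]
	# bounds each mmap window used when READING pack files
	packedGitWindowSize = 1g

[pack]
	# size of the delta-search window used when CREATING pack files
	window = 10
```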
Re: [GSoC][RFC] Proposal: Make pack access code thread-safe
On Mon, Apr 8, 2019 at 5:52 AM Christian Couder wrote: > > Git has a very optimized mechanism to compactly store > > objects (blobs, trees, commits, etc.) in packfiles[2]. These files are > > created by[3]: > > > > 1. listing objects; > > 2. sorting the list with some good heuristics; > > 3. traversing the list with a sliding window to find similar objects in > > the window, in order to do delta decomposing; > > 4. compress the objects with zlib and write them to the packfile. > > > > What we are calling pack access code in this document, is the set of > > functions responsible for retrieving the objects stored at the > > packfiles. This process consists, roughly speaking, in three parts: > > > > 1. Locate and read the blob from packfile, using the index file; > > 2. If the blob is a delta, locate and read the base object to apply the > > delta on top of it; > > 3. Once the full content is read, decompress it (using zlib inflate). > > > > Note: There is a delta cache for the second step so that if another > > delta depends on the same base object, it is already in memory. This > > cache is global; also, the sliding windows, are global per packfile. > > Yeah, but the sliding windows are used only when creating pack files, > not when reading them, right? These windows are actually for reading. We used to just mmap the whole pack file in the early days but that was impossible for 4+ GB packs on 32-bit platforms, which was one of the reasons, I think, that sliding windows were added, to map just the parts we want to read. > > # Points to work on > > > > * Investigate pack access call chains and look for non-thread-safe > > operations on then. > > * Protect packfile.c read-and-write global variables, such as > > pack_open_windows, pack_open_fds and etc., using mutexes. > > Do you want to work on making both packfile reading and packfile > writing thread safe? Or just packfile reading? 
Packfile writing is probably already thread-safe, or pretty close to it (at least the main writing code path in git-pack-objects; the streaming of blobs to a pack, I'm not so sure). -- Duy
Re: [GSoC][RFC v3] Proposal: Improve consistency of sequencer commands
Hi Rohit, On Sun, Apr 7, 2019 at 2:17 PM Rohit Ashiwal wrote: > > On Sun, 7 Apr 2019 09:15:30 +0200 Christian Couder > wrote: > > > As we are close to the deadline (April 9th) for proposal submissions, > > I think it's a good idea to already upload your draft proposal on the > > GSoC site. I think you will be able to upload newer versions until the > > deadline, but uploading soon avoid possible last minute issues and > > mistakes. > > Sure, I'll upload my proposal as soon as possible. Great! > > It looks like you copy pasted the Git Rev News article without > > updating the content. The improvement has been released a long time > > ago. > > The intention was to document how the project started and *major* milestones > or > turning points of the project. Here they are. Yeah, the intention is good, though it would be nice if the details were a bit more polished. > > Maybe s/rebases/rebase/ > > Yes, :P > > > It seems to me that there has been more recent work than this and also > > perhaps interesting suggestions and discussions about possible > > sequencer related improvements on the mailing list. > > Again the idea was to document earlier stages of project, "recent" discussions > have been on the optimizations which are not exactly relevant. I think there were ideas (from Elijah) about using the sequencer in the regular (non interactive) rebase too. > Should I write more about recent developments? I think Alban's GSoC project was relevant too. So yeah, if you have time after uploading your proposal to the GSoC web site, it would be nice if you can update it with a bit more information about what happened recently. Thanks, Christian.
Re: [GSoC][RFC] Proposal: Make pack access code thread-safe
Hi Matheus On Sun, Apr 7, 2019 at 10:48 PM Matheus Tavares Bernardino wrote: > > This is my proposal for GSoC with the subject "Make pack access code > thread-safe". Thanks! > I'm late in schedule but I would like to ask for your > comments on it. Any feedback will be highly appreciated. > > The "rendered" version can be seen here: > https://docs.google.com/document/d/1QXT3iiI5zjwusplcZNf6IbYc04-9diziVKdOGkTHeIU/edit?usp=sharing Thanks for the link! > Besides administrative questions and contributions to FLOSS projects, at > FLUSP, I’ve been mentoring people who want to start contributing to the > Linux Kernel and now, to Git, as well. Nice! Do you have links about that? > # The Project > > As direct as possible, the goal with this project is to make more of > Git’s codebase thread-safe, so that we can improve parallelism in > various commands. The motivation behind this are the complaints from > developers experiencing slow Git commands when working with large > repositories[1], such as chromium and Android. And since nowadays, most > personal computers have multi-core CPUs, it is a natural step trying to > improve parallel support so that we can better use the available resources. > > With this in mind, pack access code is a good target for improvement, > since it’s used by many Git commands (e.g., checkout, grep, blame, diff, > log, etc.). This section of the codebase is still sequential and has > many global states, which should be protected before we can work to > improve parallelism. I think it's better if global state can be made local or perhaps removed, rather than protected (though of course that's not always possible). > ## The Pack Access Code > > To better describe what the pack access code is, we must talk about > Git’s object storing (in a simplified way): Maybe s/storing/storage/ > Besides what are called loose objects, s/loose object/loose object files/ > Git has a very optimized mechanism to compactly store > objects (blobs, trees, commits, etc.) 
in packfiles[2]. These files are > created by[3]: > > 1. listing objects; > 2. sorting the list with some good heuristics; > 3. traversing the list with a sliding window to find similar objects in > the window, in order to do delta decomposing; > 4. compress the objects with zlib and write them to the packfile. > > What we are calling pack access code in this document, is the set of > functions responsible for retrieving the objects stored at the > packfiles. This process consists, roughly speaking, in three parts: > > 1. Locate and read the blob from packfile, using the index file; > 2. If the blob is a delta, locate and read the base object to apply the > delta on top of it; > 3. Once the full content is read, decompress it (using zlib inflate). > > Note: There is a delta cache for the second step so that if another > delta depends on the same base object, it is already in memory. This > cache is global; also, the sliding windows, are global per packfile. Yeah, but the sliding windows are used only when creating pack files, not when reading them, right? > If these steps were thread-safe, the ability to perform the delta > reconstruction (together with the delta cache lookup) and zlib inflation > in parallel could bring a good speedup. At git-blame, for example, > 24%[4] of the time is spent in the call stack originated at > read_object_file_extended. Not only this but once we have this big > section of the codebase thread-safe, we can work to parallelize even > more work at higher levels of the call stack. Therefore, with this > project, we aim to make room for many future optimizations in many Git > commands. Nice. > # Plan > > I will probably be working mainly with packfile.c, sha1-file.c, > object-store.h, object.c and pack.h, however, I may also need to tackle > other files. 
I will be focusing on the following three pack access call > chains, found in git-grep and/or git-blame: > > read_object_file → repo_read_object_file → read_object_file_extended → > read_object → oid_object_info_extended → find_pack_entry → > fill_pack_entry → find_pack_entry_one → bsearch_pack and > nth_packed_object_offset > > oid_object_info → oid_object_info_extended → > > read_object_with_reference → read_object_file → > > Ideally, at the end of the project, it will be possible to call > read_object_file, oid_object_info and read_object_with_reference with > thread-safety, so that these operations can be, later, performed in > parallel. > > Here are some threads on Git’s mailing list where I started discussing > my project: > > * > https://public-inbox.org/git/CAHd-oW7onvn4ugEjXzAX_OSVEfCboH3-FnGR00dU8iaoc+b8=q...@mail.gmail.com/ > * >
[GSoC][RFC] Proposal: Make pack access code thread-safe
Hi, everyone This is my proposal for GSoC with the subject "Make pack access code thread-safe". I'm behind schedule, but I would like to ask for your comments on it. Any feedback will be highly appreciated. The "rendered" version can be seen here: https://docs.google.com/document/d/1QXT3iiI5zjwusplcZNf6IbYc04-9diziVKdOGkTHeIU/edit?usp=sharing I kindly ask you to read the text at the google docs link, because in the conversion to plain text I noticed it discards some information :( But for those who prefer to comment by email, here it is: Thanks, Matheus Tavares === Making pack access code thread-safe April, 2019 # Contact Info Name Matheus Tavares Bernardino Timezone GMT-3 Email matheus.bernard...@usp.br IRC Nick matheustavares on #git-devel Phone [...] Postal address [...] Github https://github.com/MatheusBernardino/ Gitlab https://gitlab.com/MatheusTavares # About me I’m a senior student at the University of São Paulo (USP), pursuing a Bachelor’s degree in Computer Science. Currently, I’m finishing a one-year undergraduate research project in High-Performance Computing. The goal of this project was to accelerate astrophysical software for black hole studies using GPUs. Also, I’m working as a teaching assistant for IME-USP’s Concurrent and Parallel Programming course, giving lectures and developing/grading programming assignments. Besides parallel and high-performance computing, I’m very passionate about software development in general, especially low-level coding, and about FLOSS. # About me and FLOSS ## Linux Kernel Last year, I started contributing to the Linux Kernel in the IIO subsystem, together with a group of colleagues. I worked with another student to move the ad2s90 module out of the staging area to the Kernel’s mainline, which we accomplished by the end of the year. In total, I authored 11 patches and co-authored 3 (all of which are already in Torvalds’ repo).
If you want to know more about my contributions to the Linux Kernel, take a look at the Appendix section. ## FLUSP: FLOSS at USP After the amazing experience of contributing to the Linux Kernel, we decided to found FLUSP: FLOSS at USP, a group open to undergraduate and graduate students that aims to contribute to FLOSS software. Since then, the group has grown and evolved a lot: currently, we have members contributing to the Kernel, GCC, IGT GPU Tools, Git and some projects of our own such as KernelWorkflow. And in recognition of our endeavor with free software, we received some donations from AnalogDevices and DigitalOcean. Besides administrative questions and contributions to FLOSS projects, at FLUSP, I’ve been mentoring people who want to start contributing to the Linux Kernel and now, to Git, as well. # About me and Git I joined the Git community in February and, so far, I have sent the following patches: clone: test for our behavior on odd objects/* content clone: better handle symlinked files at .git/objects/ dir-iterator: add flags parameter to dir_iterator_begin clone: copy hidden paths at local clone clone: extract function from copy_or_link_directory clone: use dir-iterator to avoid explicit dir traversal clone: Replace strcmp by fspathcmp And three more patches for git.github.io: rn-50: Add git-send-email links to light readings SoC-2019-Microprojects: Remove git-credential-cache SoC-2019-Microprojects: Remove all trailing spaces Participating in FLUSP, I’ve also been part of some Git-related activities: * I actively helped to organize a Git workshop for newcomer students. * I’ve written an article on our website to help people configure and use git-send-email to send patches.
* I’ve been writing a ‘First steps at Git’ article (not finished yet), in which I’m recording what I’ve learned in the Git community so far, from downloading the source, subscribing to the mailing list and joining the IRC channel, to using travis-ci and sending patches. # The Project Put as directly as possible, the goal of this project is to make more of Git’s codebase thread-safe, so that we can improve parallelism in various commands. The motivation behind this comes from complaints by developers experiencing slow Git commands when working with large repositories[1], such as chromium and Android. And since most personal computers nowadays have multi-core CPUs, it is a natural step to try to improve parallel support so that we can better use the available resources. With this in mind, the pack access code is a good target for improvement, since it’s used by many Git commands (e.g., checkout, grep, blame, diff, log, etc.). This section of the codebase is still sequential and has a lot of global state, which should be protected before we can work to improve parallelism. ## The Pack Access Code To better describe what the pack access code is, we must talk about Git’s object storing (in a
Re: [GSoC][RFC v3] Proposal: Improve consistency of sequencer commands
Hey Chris! On Sun, 7 Apr 2019 09:15:30 +0200 Christian Couder wrote: > As we are close to the deadline (April 9th) for proposal submissions, > I think it's a good idea to already upload your draft proposal on the > GSoC site. I think you will be able to upload newer versions until the > deadline, but uploading soon avoids possible last minute issues and > mistakes. Sure, I'll upload my proposal as soon as possible. > It looks like you copy pasted the Git Rev News article without > updating the content. The improvement has been released a long time > ago. The intention was to document how the project started and *major* milestones or turning points of the project. Here they are. > Maybe s/rebases/rebase/ Yes, :P > It seems to me that there has been more recent work than this and also > perhaps interesting suggestions and discussions about possible > sequencer related improvements on the mailing list. Again the idea was to document the earlier stages of the project; the "recent" discussions have been about optimizations, which are not exactly relevant. Should I write more about recent developments? Regards Rohit
Re: [GSoC][RFC v3] Proposal: Improve consistency of sequencer commands
Hi Rohit, On Fri, Apr 5, 2019 at 11:32 PM Rohit Ashiwal wrote: > > Here is one more iteration of my draft proposal[1]. RFC. Nice, thanks for iterating on this! As we are close to the deadline (April 9th) for proposal submissions, I think it's a good idea to already upload your draft proposal on the GSoC site. I think you will be able to upload newer versions until the deadline, but uploading soon avoid possible last minute issues and mistakes. In the version you upload, please add one or more links to the discussion of your proposal on the mailing list. > ### List of Contributions at Git: > > Repo |Status |Title > --||--- > [git/git][8] | [Will merge in master][13] | > [Micro][3]**:** Use helper functions in test script > [git-for-windows/git][9] | Merged and released| > [#2077][4]**:** [FIX] git-archive error, gzip -cn : command not found. > [git-for-windows/build-extra][10] | Merged and released| > [#235][5]**:** installer: Fix version of installer and installed file. Nice! > Overview > > Since when it was created in 2005, the `git rebase` command has been > implemented using shell scripts that are calling other git commands. Commands > like `git format-patch` to create a patch series for some commits, and then > `git am` to apply the patch series on top of a different commit in case of > regular rebase and the interactive rebase calls `git cherry-pick` repeatedly > for the same. > > Neither of these approaches has been very efficient though, and the main > reason > behind that is that repeatedly calling a git command has a significant > overhead. Even the regular git rebase would do that as `git am` had been > implemented by launching `git apply` on each of the patches. > > The overhead is especially big on Windows where creating a new process is > quite > slow, but even on other Operating Systems it requires setting up everything > from scratch, then reading the index from disk, and then, after performing > some > changes, writing the index back to the disk. 
> > Stephan Beyer \ tried to introduce git-sequencer as his GSoC > 2008 [project][6] which executed a sequence of git instructions to \ or > \ and the sequence was given by a \ or through `stdin`. The > git-sequencer wants to become the common backend for git-am, git-rebase and > other git commands, so as to improve performance, since then it eliminated the > need to spawn a new process. > > Unfortunately, most of the code did not get merged during the SoC period but > he > continued his contributions to the project along with Christian Couder > \ and then mentor Daniel Barkalow > \. > > The project was continued by Ramkumar Ramachandra \ in > [2011][7], extending its domain to git-cherry-pick. The sequencer code got > merged and it was now possible to "continue" and "abort" when cherry-picking > or > reverting many commits. > > A patch series by Christian Couder \ was merged in > [2016][16] to the `master` branch that makes `git am` call `git apply`’s > internal functions without spawning the latter as a separate process. So the > regular rebase will be significantly faster especially on Windows and for big > repositories in the next Git feature release. It looks like you copy pasted the Git Rev News article without updating the content. The improvement has been released a long time ago. > Despite the success (of GSoC '11), Dscho had to improve a lot of things to > make > it possible to reuse the sequencer in the interactive rebases making it > faster. Maybe s/rebases/rebase/ > His work can be found [here][15]. It seems to me that there has been more recent work than this and also perhaps interesting suggestions and discussions about possible sequencer related improvements on the mailing list. > The learnings from all those works will serve as a huge headstart this year > for > me. > > As of now, there are still some inconsistencies among these commands, e.g., > there is no `--skip` flag in `git-cherry-pick` while one exists for > `git-rebase`. 
This project aims to remove inconsistencies in how the command > line options are handled.
[GSoC][RFC v3] Proposal: Improve consistency of sequencer commands
Hiya Here is one more iteration of my draft proposal[1]. RFC. Thanks Rohit [1]: https://gist.github.com/r1walz/5588d11065d5231ee451c0136400610e -- >8 -- # Improve consistency of sequencer commands ## About Me ### Personal Information

Name | Rohit Ashiwal
--- | ---
Major | Computer Science and Engineering
E-mail | \
IRC | __rohit
Skype | rashiwal
Ph no | [ ph_no ]
Github | [r1walz](https://github.com/r1walz/)
Linkedin | [rohit-ashiwal](https://linkedin.com/in/rohit-ashiwal/)
Address | [ Address ]
Postal Code | [ postal_code ]
Time Zone | IST (UTC +0530)

### Background I am a sophomore at the [Indian Institute of Technology Roorkee][1], pursuing my bachelor's degree in Computer Science and Engineering. I was introduced to programming at a very early stage of my life. Since then, I've been trying out new technologies by taking up various projects and participating in contests. I am passionate about system software development and competitive programming, and I also actively contribute to open-source projects. At college, I joined the Mobile Development Group ([MDG][2]), IIT Roorkee - a student group that fosters mobile development within the campus. I have been an active part of the Git community since February of this year, contributing to git-for-windows. ### Dev-Env I am fluent in C/C++, Java and Shell Scripting; I can also program in Python and JavaScript. I use both Ubuntu 18.04 and Windows 10 x64 on my laptop. I prefer Linux for development unless the work is specific to Windows. \ VCS **:** git \ Editor **:** VS Code with gdb integrated ## Contributions to Open Source My contributions to open source have helped me learn to quickly understand the flow of an existing codebase and enabled me to edit it or add new features.
### List of Contributions at Git:

Repo | Status | Title
--- | --- | ---
[git/git][8] | [Will merge in master][13] | [Micro][3]**:** Use helper functions in test script
[git-for-windows/git][9] | Merged and released | [#2077][4]**:** [FIX] git-archive error, gzip -cn : command not found.
[git-for-windows/build-extra][10] | Merged and released | [#235][5]**:** installer: Fix version of installer and installed file.

## The Project ### _Improve consistency of sequencer commands_ Overview Since it was created in 2005, the `git rebase` command has been implemented using shell scripts that call other git commands: the regular rebase uses `git format-patch` to create a patch series for some commits and then `git am` to apply the series on top of a different commit, while the interactive rebase calls `git cherry-pick` repeatedly for the same purpose. Neither of these approaches has been very efficient, though, and the main reason is that repeatedly calling a git command has a significant overhead. Even the regular git rebase would do that, as `git am` had been implemented by launching `git apply` on each of the patches. The overhead is especially big on Windows, where creating a new process is quite slow, but even on other operating systems it requires setting up everything from scratch, then reading the index from disk, and then, after performing some changes, writing the index back to the disk. Stephan Beyer \ tried to introduce git-sequencer as his GSoC 2008 [project][6], which executed a sequence of git instructions to \ or \, and the sequence was given by a \ or through `stdin`. git-sequencer was meant to become the common backend for git-am, git-rebase and other git commands, so as to improve performance, since it eliminated the need to spawn a new process. Unfortunately, most of the code did not get merged during the SoC period, but he continued his contributions to the project along with Christian Couder \ and then-mentor Daniel Barkalow \.
The project was continued by Ramkumar Ramachandra \ in [2011][7], extending its domain to git-cherry-pick. The sequencer code got merged, and it became possible to "continue" and "abort" when cherry-picking or reverting many commits. A patch series by Christian Couder \ that makes `git am` call `git apply`'s internal functions without spawning the latter as a separate process was merged into the `master` branch in [2016][16], so the regular rebase will be significantly faster, especially on Windows and for big repositories, in the next Git feature release. Despite the success (of GSoC '11), D
Re: [GSoC][RFC] proposal: convert git-submodule to builtin script
Hi, On Tue, Apr 2, 2019 at 10:34 PM Khalid Ali wrote: > > My name is Khalid Ali and I am looking to convert the git-submodule to > a builtin C script. The link below contains my first proposal draft > [1] and my microproject is at [2]. My main concern is that my second > task is not verbose enough. I am not sure if I should add a specific > breakdown of large items within the submodule command. There was a GSoC project about the same subject a few years ago: https://public-inbox.org/git/CAME+mvXtA6iZNfErTX5tYB-o-5xa1yesAG5h=ip_z2_zl_k...@mail.gmail.com/ I think you should take a look at the work that was done (merged and not merged) and report about it in your proposal. Thanks, Christian.
Re: [GSoC][RFC] proposal: convert git-submodule to builtin script
First of all, thank you so much for the detailed feedback. I wasn't sure how much to include in the proposal, but I see it still needs a lot of work. > When you talk about "Convert each main task in git-submodule into a C > function." and "If certain functionality is missing, add it to the correct > script.", it is a good idea to back that up by concrete examples. > > Like, study `git-submodule.sh` and extract the list of "main tasks", and > then mention that in your proposal. I see that you listed 9 main tasks, > but it is not immediately clear whether you extracted that list from the > usage text, from the manual page, or from the script itself. If the latter > (which I think would be the best, given the goal of converting the code in > that script), it would make a ton of sense to mention the function names > and maybe add a permalink to the corresponding code (you could use e.g. > GitHub's permalinks). Yes, I actually did extract the tasks straight from git-submodule.sh. I will definitely add the appropriate function names and permalinks to the proposal. > And then look at one of those main tasks, think of something that you > believe should be covered in the test suite, describe it, then figure out > whether it is already covered. If it is, mention that, together with the > location, otherwise state which script would be the best location, and > why. Ah, alright. I'll have a look at the test suite to see what is covered and include a section in my proposal. > Besides, if you care to have a bit of a deeper look into the > `git-submodule.sh` script, you will see a peculiar pattern in some of the > subcommands, e.g. in `cmd_foreach`: > https://github.com/git/git/blob/v2.21.0/git-submodule.sh#L320-L349 > > Essentially, it spends two handfuls of lines on option parsing, and then > the real business logic is performed by the `submodule--helper`, which is > *already* a built-in. 
> > Even better: most of that business logic is implemented in a file that has > the very file name you proposed already: `submodule.c`. > > So if I were you, I would add a section to your proposal (which in the end > would no doubt dwarf the existing sections) that has as subsections each > of those commands in `git-submodule.sh` that do *not* yet follow this > pattern "parse options then hand off to submodule--helper". > > I would then study the commit history of the ones that *do* use the > `submodule--helper` to see how they were converted, what conventions were > used, whether there were recurring patterns, etc. > > In each of those subsections, I would then discuss what the > still-to-be-converted commands do, try to find the closest command that > already uses the `submodule--helper`, and then assess what it would take > to convert them, how much code it would probably need, whether it could > reuse parts that are already in `submodule.c`, etc. I definitely noticed the option parsing in multiple parts of the function, but the pattern didn't click until you mentioned it. I'll do as you recommended and take a look at submodule.c to see how the code and functionality in git-submodule.sh can be merged. > Judging from past projects to convert scripts to C, I would say that the > most successful strategy was to chomp off manageable parts and move them > from the script to C. I am sure that you will find tons of good examples > for this strategy by looking at the commit history of `git-submodule.sh` > and then searching for the corresponding patches in the Git mailing list > archive (e.g. https://public-inbox.org/git/). > > Do not expect those "chomped off" parts to hit `master` very quickly, > though. 
Most likely, you would work on one patch series (very closely with > your mentor at first, to avoid unnecessary blocks and to get a better feel > for the way the Git community works right from the start), then, when that > patch series is robust and solid and ready to be contributed, you would > send it to the Git mailing list and immediately start working on the next > patch series, all the while the reviews will trickle in. Those reviews > will help you to improve the patch series, and it is a good idea to > incorporate the good suggestions, and to discuss the ones you think are > not necessary, for a few days before sending the next patch series > iteration. > > Essentially, you will work in parallel on a few patch series at all times. > Those patch series stack on top of each other, and they should, one after > the other, make it into `pu` first, then, when they are considered ready > for testing into `next`, and eventually to `master`. Whenever you > contribute a new patch series iteration, you then rebase the remaining > patch series on top.
Re: [GSoC][RFC] proposal: convert git-submodule to builtin script
Hi, On Tue, 2 Apr 2019, Khalid Ali wrote: > My name is Khalid Ali and I am looking to convert the git-submodule to > a builtin C script. The link below contains my first proposal draft > [1] and my microproject is at [2]. My main concern is that my second > task is not verbose enough. I am not sure if I should add a specific > breakdown of large items within the submodule command. Nice! Please note that while I used to be the mentor who basically helped all of the GSoC/Outreachy students through their "convert to built-in" projects in the recent years, I am not available to mentor this year. Having said that, I think I can help you to improve your proposal. When you talk about "Convert each main task in git-submodule into a C function." and "If certain functionality is missing, add it to the correct script.", it is a good idea to back that up by concrete examples. Like, study `git-submodule.sh` and extract the list of "main tasks", and then mention that in your proposal. I see that you listed 9 main tasks, but it is not immediately clear whether you extracted that list from the usage text, from the manual page, or from the script itself. If the latter (which I think would be the best, given the goal of converting the code in that script), it would make a ton of sense to mention the function names and maybe add a permalink to the corresponding code (you could use e.g. GitHub's permalinks). And then look at one of those main tasks, think of something that you believe should be covered in the test suite, describe it, then figure out whether it is already covered. If it is, mention that, together with the location, otherwise state which script would be the best location, and why. Further, I would like to caution you about "If there is still some time"... The `git-submodule.sh` script weighs in with just over 1,000 lines. 
We had three GSoC projects to convert scripts last year; the scripts they converted weighed in (at the time) at 750 lines for `git-stash.sh`, 674 lines for `git-rebase.sh` and 1,036 lines for `git-rebase--interactive.sh`, respectively. That last number should be taken with a big grain of salt, as it is not quite the number of lines that were converted: as part of the GSoC project, the `git-rebase--preserve-merges.sh` script was split out, never intended to be converted, but to be deprecated instead (in favor of `git rebase -r`), and there were "only" some 283 lines to be converted to C remaining after that. Out of those three, the project converting the smallest number of lines clearly got integrated first (and there was actually time to do more stuff in that project, and those things are partially still being cooked). The converted `git stash` is still not in `master`... So... converting 1,000 lines of code is quite a challenge for 3 months. Having said that, I would not consider your project a failure if even "only" as much as half of the lines of code were converted to C. Besides, if you care to have a bit of a deeper look into the `git-submodule.sh` script, you will see a peculiar pattern in some of the subcommands, e.g. in `cmd_foreach`: https://github.com/git/git/blob/v2.21.0/git-submodule.sh#L320-L349 Essentially, it spends two handfuls of lines on option parsing, and then the real business logic is performed by the `submodule--helper`, which is *already* a built-in. Even better: most of that business logic is implemented in a file that has the very file name you proposed already: `submodule.c`. So if I were you, I would add a section to your proposal (which in the end would no doubt dwarf the existing sections) that has as subsections each of those commands in `git-submodule.sh` that do *not* yet follow this pattern "parse options then hand off to submodule--helper".
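As a rough illustration, the "parse options, then hand off" pattern described here looks something like the following sketch. The subcommand name `frobnicate` is made up for illustration; real subcommands such as `cmd_foreach` differ in detail, so treat this as a shape rather than actual git-submodule.sh code:

```shell
# Hypothetical sketch of a git-submodule.sh subcommand that only
# parses options and then delegates to the built-in helper.
# The subcommand name is illustrative, not real git code.
cmd_frobnicate () {
	quiet=
	while test $# -ne 0
	do
		case "$1" in
		-q|--quiet)
			quiet=--quiet
			;;
		--)
			shift
			break
			;;
		*)
			break
			;;
		esac
		shift
	done
	# All real business logic lives in the built-in helper:
	git submodule--helper frobnicate $quiet "$@"
}
```

Converting such a subcommand to C is then mostly a matter of moving the remaining option parsing into the helper and deleting the shell wrapper.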
I would then study the commit history of the ones that *do* use the `submodule--helper` to see how they were converted, what conventions were used, whether there were recurring patterns, etc. In each of those subsections, I would then discuss what the still-to-be-converted commands do, try to find the closest command that already uses the `submodule--helper`, and then assess what it would take to convert them, how much code it would probably need, whether it could reuse parts that are already in `submodule.c`, etc. > Outside of the draft, I was wondering whether this should be > implemented through multiple patches to the master branch or through a > separate, long-running feature branch that will be merged at the end > of the GSoC timeline? Judging from past projects to convert scripts to C, I would say that the most successful strategy was to chomp off manageable parts and move them from the script to C. I am sure that you will find tons of good examples for this strategy by looking at the commit history of `git-submodule.sh` and then searching for the corresponding patches in the Git mailing list archive (e.g. https://public-inbox.org/git/).
[GSoC][RFC] proposal: convert git-submodule to builtin script
Hi, My name is Khalid Ali and I am looking to convert the git-submodule to a builtin C script. The link below contains my first proposal draft [1] and my microproject is at [2]. My main concern is that my second task is not verbose enough. I am not sure if I should add a specific breakdown of large items within the submodule command. Outside of the draft, I was wondering whether this should be implemented through multiple patches to the master branch or through a separate, long-running feature branch that will be merged at the end of the GSoC timeline? Feedback is greatly appreciated! [1] https://docs.google.com/document/d/1olGG8eJxFoMNyGt-4uMiTD3LjRYx15pttg67AJYliu8/edit?usp=sharing [2] https://public-inbox.org/git/20190402014115.22478-1-khalludi...@gmail.com/
Re: [GSoC][RFC] Proposal: Improve consistency of sequencer commands
> Deprecating am-based rebases only takes a little more > work, but it might expand to use up a lot of time. > > > Relevant Work > > = > > Dscho and I had a talk on how a non-am backend should implement `git rebase > > --whitespace=fix`, which he warned may become a large project (as it turns > > out it is a sub-task in one of the proposed ideas[0]), we were trying to > > integrate this on git-for-windows first. > > Keeping that warning in mind, I discussed this project with Rafael and he > > suggested > > (with a little bit of uncertainty in mind) that I should work on implementing > > a git-diff flag that generates a patch that when applied, will remove > > whitespace > > errors which I am currently working on. > It's awesome that you're looking in to this, but it may make more > sense to knock out the easy parts of this project first. That way the > project gets some value out of your work for sure, you gain confidence > and familiarity with the codebase, and then you can tackle the more > difficult items. Of course, if you're just exploring to learn what's > possible in order to write the proposal, that's fine, I just think > once you start on this project, it'd make more sense to do the easier > ones first. Yes, I'm looking into the code to get some clear vision. > Hope that helps, Yes! The vision is now clearer. Thanks Elijah. :) > Elijah Thanks for the review Rohit
Re: [GSoC][RFC] Proposal: Improve consistency of sequencer commands
Hi Christian On 2019-03-23 22:17 UTC Christian Couder <> wrote: > On Fri, Mar 22, 2019 at 4:12 PM Rohit Ashiwal > wrote: > > > > Hey People > > > > I am Rohit Ashiwal and here my first draft of the proposal for the project > > titled: `Improve consistency of sequencer commands' this summer. I need your > > feedback and more than that I need help to improve the timeline of this > > proposal since it looks very weak. Basically, it lacks the "how" component > > as I don't know much about the codebase in detail. > > > > Thanks > > Rohit > > > > PS: Point one is missing in the timeline from the ideas page[0], can someone > > explain what exactly it wants? > > You mean this point from the idea page: > > "The suggestion to fix an interrupted rebase-i or cherry-pick due to a > commit that became empty via git reset HEAD (in builtin/commit.c) > instead of git rebase --skip or git cherry-pick --skip ranges from > annoying to confusing. (Especially since other interrupted am’s and > rebases both point to am/rebase –skip.). Note that git cherry-pick > --skip is not yet implemented, so that would have to be added first." Yes. > or something else? > > By the way it is not very clear if the proposal uses markdown or > another related format and if it is also possible (and perhaps even > better visually) to see it somewhere else (maybe on GitHub). If that's > indeed possible, please provide a link. It is a good thing though to > still also send it attached to an email, so that it can be easily > reviewed and commented on by people who prefer email discussions. This was intentional. Here is the link to the proposal hosted at gist.github.com[1]; for those who prefer a text-only version, here[2] is the mailing list link. > > List of Contributions at Git: > > - > > Status: Merge in next revision > > Maybe "Merged into the 'next' branch" > > > git/git: > > [Micro](3): Use helper functions in test script. 
> > Please give more information than that, for example you could point to > the commit in the next branch on GitHub and perhaps to the what's > cooking email from Junio where it can be seen that the patch has been > merged into next and what's its current status. Current proposal has added links to those commits. > > Status: Merged > > git-for-windows/git: > > [#2077](4): [FIX] git-archive error, gzip -cn : command not found. This was released in v2.21.0 [3] > > Status: Merged > > git-for-windows/build-extra: > > [#235](5): installer: Fix version of installer and installed file. > > For Git for Windows contributions I think a link to the pull request > is enough. It could be nice to know though if the commits are part of > a released version. > > The Project: `Improve consistency of sequencer commands' > > > > > > Overview > > > > git-sequencer was introduced by Stephan Beyer as his > > GSoC 2008 project[6]. It executed a sequence of git instructions to > > or and the sequence was given by a or through stdin. The > > git-sequencer wants to become the common backend for git-am, git-rebase > > and other git commands. The project was continued by Ramkumar > > > > in 2011[7], converting it to a builtin and extending its domain to > > git-cherry-pick. > > Yeah, you can say that it was another GSoC project and maybe give his > full name (Ramkumar Ramachandra). > > There have been more related work to extend usage of the sequencer > after these GSoC projects, at least from Dscho and maybe from Alban > Gruin and Elijah too. I would be nice if you could document that a > bit. > > > As of now, there are still some inconsistencies among these commands, e.g., > > there is no `--skip` flag in `git-cherry-pick` while one exists for > > `git-rebase`. > > This project aims to remove inconsistencies in how the command line options > > are > > handled. 
> > > > Points to work on: > > -- > > - Add `git cherry-pick --skip` > > - Implement flags that am-based rebases support, but not interactive > > or merge based, in interactive/merge based rebases > > Maybe the flags could be listed. > > > - [Bonus] Deprecate am-based rebases > > - [Bonus] Make a flag to allow rebase to rewrite commit messages that > > refer to older commits that were also rebased > > This part of your proposal ("Points to work on") looks weak to me. > > Please try to add more details about what you plan to do, how you would describe the new flags in the documentation, which *.c *.h and test files might be changed, etc.
Re: [GSoC][RFC] Proposal: Improve consistency of sequencer commands
Hi Rohit! On Fri, Mar 22, 2019 at 8:12 AM Rohit Ashiwal wrote: > > Hey People > > I am Rohit Ashiwal and here my first draft of the proposal for the project > titled: `Improve consistency of sequencer commands' this summer. I need your > feedback and more than that I need help to improve the timeline of this > proposal since it looks very weak. Basically, it lacks the "how" component > as I don't know much about the codebase in detail. > > Thanks > Rohit > > PS: Point one is missing in the timeline from the ideas page[0], can someone > explain what exactly it wants? I don't understand the question; could you restate it? > Points to work on: > -- > - Add `git cherry-pick --skip` I'd reword this section as 'Consistently suggest --skip for operations that have such a concept'.[1] Adding a --skip flag to cherry-pick is useful, but was only meant as a step. Let me explain in more detail and use another comparison point. Each of the git commands cherry-pick, merge, and rebase takes the flags "--continue" and "--abort"; but they didn't always do so, and continuing or aborting an operation often used special case-specific commands for each (e.g. git reset --hard (or later --merge) to abort a merge, git commit to continue it, etc.) Those commands don't necessarily make sense to users, whereas ' --continue' and ' --abort' do make intuitive sense and are thus memorable. We want the same for --skip. Both am-based rebases and am itself will give advice to the user to use 'git rebase --skip' or 'git am --skip' when a patch isn't needed. That's good. In contrast, interactive-based rebases and cherry-pick will suggest that the user run 'git reset' (with no arguments). The place that suggests that command should instead suggest either 'git rebase --skip' or 'git cherry-pick --skip', depending on which operation is in progress. The first step for doing that is making sure that cherry-pick actually has a '--skip' option.
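For concreteness, here is a throwaway-repo sketch of the session this consistency would enable. Note this is hypothetical relative to the thread: `git cherry-pick --skip` did not exist at the time (the flag eventually landed in git 2.23), so the last step assumes a modern git:

```shell
#!/bin/sh
# Build a scratch repo, provoke a cherry-pick conflict, then resolve
# it with the same resume verb rebase and am already offer: --skip.
# (Requires git >= 2.23; at the time of this thread the flag was
# only a proposal.)
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name tester
git config user.email tester@example.com
main=$(git symbolic-ref --short HEAD)   # "master" or "main"

echo base >file && git add file && git commit -qm base
git branch side                          # side starts at "base"
echo ours >file && git commit -qam ours
git checkout -q side
echo theirs >file && git commit -qam theirs
git checkout -q "$main"

# This pick conflicts: "theirs" vs "ours" on the same line of file.
git cherry-pick side >/dev/null 2>&1 || echo "conflict, as expected"
git cherry-pick --skip                   # drop the commit, end the operation
```

After `--skip` the working tree is clean again and the conflicting commit has simply been left out, exactly matching the behavior of `git rebase --skip`.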
> - Implement flags that am-based rebases support, but not interactive > or merge based, in interactive/merge based rebases The "merge-based" rebase backend was deleted in 2.21.0, with all its special flags reimplemented on top of the interactive backend. So we can omit the deleted backend from the descriptions (instead just talk about the am-based and interactive backends). > - [Bonus] Deprecate am-based rebases > - [Bonus] Make a flag to allow rebase to rewrite commit messages that > refer to older commits that were also rebased I'd reorder these two. I suspect the second won't be too hard and will provide a new user-visible feature, while the former will hopefully not be visible to users; if the former has more than cosmetic differences visible to users, it might transform the problem into more of a social problem than a technical one or just make it into something we can't do. > Proposed Timeline > - > + Community Bonding (May 6th - May 26th): > - Introduction to community > - Get familiar with the workflow > - Study and understand the workflow and implementation of the project > in detail > > + Phase 1 (May 27th - June 23rd): > - Start with implementing `git cherry-pick --skip` > - Write new tests for the just introduced flag(s) > - Analyse the requirements and differences of am-based and other > rebases flags Writing or finding tests to trigger all the --skip codepaths might be the biggest part of this phase. Implementing `git cherry-pick --skip` just involves making it run the code that `git reset` invokes. Then you change the error message to reference ` --skip` instead of `git reset`. What you're calling phase 1 here isn't quite microproject sized, but it should be relatively quick and easy; I'd plan to spend much more of your time on phase 2. > + Phase 2 (June 24th - July 21st): > - Introduce flags of am-based rebases to other kinds. > - Add tests for the same. 
You should probably mention the individual cases from "INCOMPATIBLE FLAGS" of the git rebase manpage. Also, some advice for order of tackling these: I think you should probably do --ignore-whitespace first; my guess is that one is the easiest. Close up would be --committer-date-is-author-date and --ignore-date. Re-reading, I'm not sure -C even makes sense at all; it might be that the solution is just accepting the flag and ignoring it, or perhaps it remains the one flag the interactive backend won't support, or maybe there is something that makes sense to be done. There'd need to be a little investigation for that one, but it might tur
Re: [GSoC][RFC] Proposal: Improve consistency of sequencer commands
Hi Rohit, On Fri, Mar 22, 2019 at 4:12 PM Rohit Ashiwal wrote: > > Hey People > > I am Rohit Ashiwal and here my first draft of the proposal for the project > titled: `Improve consistency of sequencer commands' this summer. I need your > feedback and more than that I need help to improve the timeline of this > proposal since it looks very weak. Basically, it lacks the "how" component > as I don't know much about the codebase in detail. > > Thanks > Rohit > > PS: Point one is missing in the timeline from the ideas page[0], can someone > explain what exactly it wants? You mean this point from the idea page: "The suggestion to fix an interrupted rebase-i or cherry-pick due to a commit that became empty via git reset HEAD (in builtin/commit.c) instead of git rebase --skip or git cherry-pick --skip ranges from annoying to confusing. (Especially since other interrupted am’s and rebases both point to am/rebase –skip.). Note that git cherry-pick --skip is not yet implemented, so that would have to be added first." or something else? By the way it is not very clear if the proposal uses markdown or another related format and if it is also possible (and perhaps even better visually) to see it somewhere else (maybe on GitHub). If that's indeed possible, please provide a link. It is a good thing though to still also send it attached to an email, so that it can be easily reviewed and commented on by people who prefer email discussions. > List of Contributions at Git: > - > Status: Merge in next revision Maybe "Merged into the 'next' branch" > git/git: > [Micro](3): Use helper functions in test script. Please give more information than that, for example you could point to the commit in the next branch on GitHub and perhaps to the what's cooking email from Junio where it can be seen that the patch has been merged into next and what's its current status. > Status: Merged > git-for-windows/git: > [#2077](4): [FIX] git-archive error, gzip -cn : command not found. 
> > Status: Merged > git-for-windows/build-extra: > [#235](5): installer: Fix version of installer and installed file. For Git for Windows contributions I think a link to the pull request is enough. It could be nice to know though if the commits are part of a released version. > The Project: `Improve consistency of sequencer commands' > > > Overview > > git-sequencer was introduced by Stephan Beyer as his > GSoC 2008 project[6]. It executed a sequence of git instructions to > or and the sequence was given by a or through stdin. The > git-sequencer wants to become the common backend for git-am, git-rebase > and other git commands. The project was continued by Ramkumar > > in 2011[7], converting it to a builtin and extending its domain to > git-cherry-pick. Yeah, you can say that it was another GSoC project and maybe give his full name (Ramkumar Ramachandra). There has been more related work to extend usage of the sequencer after these GSoC projects, at least from Dscho and maybe from Alban Gruin and Elijah too. It would be nice if you could document that a bit. > As of now, there are still some inconsistencies among these commands, e.g., > there is no `--skip` flag in `git-cherry-pick` while one exists for > `git-rebase`. > This project aims to remove inconsistencies in how the command line options > are > handled. > Points to work on: > -- > - Add `git cherry-pick --skip` > - Implement flags that am-based rebases support, but not interactive > or merge based, in interactive/merge based rebases Maybe the flags could be listed. > - [Bonus] Deprecate am-based rebases > - [Bonus] Make a flag to allow rebase to rewrite commit messages that > refer to older commits that were also rebased This part of your proposal ("Points to work on") looks weak to me. Please try to add more details about what you plan to do, how you would describe the new flags in the documentation, which *.c *.h and test files might be changed, etc. 
> Proposed Timeline > - > + Community Bonding (May 6th - May 26th): > - Introduction to community > - Get familiar with the workflow > - Study and understand the workflow and implementation of the project > in detail > > + Phase 1 (May 27th - June 23rd): > - Start with implementing `git cherry-pick --skip` > - Write new tests for the just introduced flag(s) > - Analyse the requirements and differences of am-based and other > rebases flags > > + Phase 2 (June 24th - July 21st): > - Introduce flags of am-based
[GSoC][RFC] Proposal: Improve consistency of sequencer commands
Hey People

I am Rohit Ashiwal and here is my first draft of the proposal for the project titled `Improve consistency of sequencer commands' this summer. I need your feedback, and more than that I need help to improve the timeline of this proposal since it looks very weak. Basically, it lacks the "how" component as I don't know much about the codebase in detail.

Thanks
Rohit

PS: Point one is missing in the timeline from the ideas page[0], can someone explain what exactly it wants?

## Improve consistency of sequencer commands

## About Me

Personal Information
---+---
Name       | Rohit Ashiwal
Major      | Computer Science and Engineering
E-mail     | rohit.ashiwal...@gmail.com
IRC        | __rohit
Skype      | rashiwal
Ph no      | [ ph_no ]
Github     | r1walz
Linkedin   | rohit-ashiwal
Address    | [ Address ]
Postal Code| [ postal_code ]
Time Zone  | IST (UTC +0530)
---+---

Background
--

I am a sophomore at the Indian Institute of Technology Roorkee[1], pursuing my bachelor's degree in Computer Science and Engineering. I was introduced to programming at a very early stage of my life. Since then, I've been trying out new technologies by taking up various projects and participating in contests. I am passionate about system software development and competitive programming, and I also actively contribute to open-source projects. At college, I joined the Mobile Development Group [MDG](2), IIT Roorkee - a student group that fosters mobile development within the campus. I have been an active part of the Git community since February of this year, contributing to git-for-windows.

Dev-Env
---

I am fluent in C/C++, Java, and shell scripting; I can also program in Python and JavaScript. I use both Ubuntu 18.04 and Windows 10 x64 on my laptop. I prefer Linux for development unless the work is specific to Windows.
VCS: git
Editor: VS Code with gdb integrated

Contributions to Open Source
--

My contributions to open source have helped me gain experience in understanding the flow of any pre-written code at a rapid pace and enabled me to edit/add new features.

List of Contributions at Git:
-

Status: Merge in next revision
git/git:
  [Micro](3): Use helper functions in test script.

Status: Merged
git-for-windows/git:
  [#2077](4): [FIX] git-archive error, gzip -cn : command not found.

Status: Merged
git-for-windows/build-extra:
  [#235](5): installer: Fix version of installer and installed file.

The Project: `Improve consistency of sequencer commands'
--

Overview

git-sequencer was introduced by Stephan Beyer as his GSoC 2008 project[6]. It executed a sequence of git instructions to or and the sequence was given by a or through stdin. The git-sequencer wants to become the common backend for git-am, git-rebase and other git commands. The project was continued by Ramkumar in 2011[7], converting it to a builtin and extending its domain to git-cherry-pick. As of now, there are still some inconsistencies among these commands, e.g., there is no `--skip` flag in `git-cherry-pick` while one exists for `git-rebase`. This project aims to remove inconsistencies in how the command line options are handled.
Points to work on:
--
- Add `git cherry-pick --skip`
- Implement flags that am-based rebases support, but not interactive or merge based, in interactive/merge based rebases
- [Bonus] Deprecate am-based rebases
- [Bonus] Make a flag to allow rebase to rewrite commit messages that refer to older commits that were also rebased

Proposed Timeline
-
+ Community Bonding (May 6th - May 26th):
  - Introduction to community
  - Get familiar with the workflow
  - Study and understand the workflow and implementation of the project in detail

+ Phase 1 (May 27th - June 23rd):
  - Start with implementing `git cherry-pick --skip`
  - Write new tests for the just introduced flag(s)
  - Analyse the requirements and differences of am-based and other rebases flags

+ Phase 2 (June 24th - July 21st):
  - Introduce flags of am-based rebases to other kinds.
  - Add tests for the same.

+ Phase 3 (July 22nd - August 19th):
  - Act on [Bonus] features
  - Documentation
  - Clean up tasks

Relevant Work
=
D
Re: Proposal: Output should push to different servers in parallel
Ævar Arnfjörð Bjarmason writes: > This seems like a reasonable idea, until such time as someone submits > patches to implement this in git you can do this with some invocation of > GNU parallel -k, i.e. operate on N remotes in parallel, and use the -k > option to buffer up all their output and present it in sequence. Stopping the message there makes it sound like a polite way to say "a generic tool that lets you do this on anything, not limited to Git, is already available, and a solution specific to Git is unwanted." I wanted to follow up with something that says "The 'parallel' tool works in the meantime, but here are examples of very useful things that we would not be able to live without that 'parallel' wouldn't let us do, and we need a Git-specific solution to obtain that", but I am coming up empty, so perhaps indeed we do not want a Git-specific solution ;-)
Re: Proposal: Output should push to different servers in parallel
On Wed, Feb 06 2019, Victor Porton wrote:

> I experienced a slowdown in Git pushing when I push to more than one server.
>
> I propose:
>
> Run push to several servers in parallel.
>
> Not to mix the output, nevertheless serialize the output, that is for
> example cache the output of the second server push and start to output
> it immediately after the first server push is finished.
>
> This approach combines the advantages of the current way (I suppose it
> is so) to serialize pushes: first push to the first server, then to
> the second, etc. and of my idea to push in parallel.
>
> I think the best way would be use multithreading, but multiprocessing
> would be a good quick solution.

This seems like a reasonable idea. Until such time as someone submits patches to implement this in git, you can do this with some invocation of GNU parallel's -k option, i.e. operate on N remotes in parallel, and use -k to buffer up all their output and present it in sequence.
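As a concrete illustration of that suggestion, the following self-contained script creates a throwaway repository with two local "remotes" and pushes to both through GNU parallel. The remote names `alpha`/`beta` and the paths are invented for the demo, and the script bails out quietly if `parallel` is not installed:

```shell
set -e
command -v parallel >/dev/null 2>&1 || { echo "GNU parallel not installed; skipping demo"; exit 0; }

work=$(mktemp -d)
git init -q "$work/repo"
git -C "$work/repo" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m demo
for r in alpha beta; do
    git init -q --bare "$work/$r"
    git -C "$work/repo" remote add "$r" "$work/$r"
done

# -k buffers each job's output and prints it in submission order,
# so the pushes run concurrently but their output is not interleaved.
git -C "$work/repo" remote | parallel -k "git -C '$work/repo' push {} HEAD 2>&1"
```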
Proposal: Output should push to different servers in parallel
I experienced a slowdown in Git pushing when I push to more than one server.

I propose: run pushes to several servers in parallel. Do not interleave the output; instead serialize it, that is, for example, cache the output of the second server's push and start printing it immediately after the first server's push has finished.

This approach combines the advantages of the current behaviour (which, I assume, serializes pushes: first push to the first server, then to the second, and so on) with my idea of pushing in parallel.

I think the best way would be to use multithreading, but multiprocessing would be a good quick solution.
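The buffering behaviour described above can be sketched in plain shell: run the pushes as background jobs, capture each remote's output in a temporary file, and print the files in remote order once both jobs finish. The repository and the remote names `server1`/`server2` below are invented for the demo:

```shell
set -e
work=$(mktemp -d)
git init -q "$work/repo"
git -C "$work/repo" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m demo
for r in server1 server2; do
    git init -q --bare "$work/$r"
    git -C "$work/repo" remote add "$r" "$work/$r"
done

out1=$(mktemp); out2=$(mktemp)
git -C "$work/repo" push server1 HEAD >"$out1" 2>&1 &   # both pushes run ...
git -C "$work/repo" push server2 HEAD >"$out2" 2>&1 &   # ... in parallel
wait
cat "$out1" "$out2"   # output is emitted serialized: server1 first, then server2
```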
Proposal
Hello , My name is Sgt Major John Dailey. I am here in Afghanistan , I came upon a project I think we can work together on. I and my partner (1st Lt. Daniel Farkas ) have the sum of $15 Million United State Dollars which we got from a Crude Oil Deal in Iraq before he was killed by an explosion while on a Vehicle Patrol. Due to this incident, I want you to receive these funds on my behalf as far as I can be assured that my share will be safe in your care until I complete my service here in Afghanistan and come over to meet with you. Since we are working here for an Official capacity, I cannot keep these funds hence by contacting you. I Guarantee and Assure you that this is risk free. I just need your acceptance to help me receive these funds and all is done. Since the death of my partner, my life is not guaranteed here anymore, so I have decided to share these funds with you. I am also offering you 40% of this money for the assistance you will give to me. One passionate appeal I will make to you, is for you not to discuss this matter with anybody, should you have reasons to reject this offer, please and please destroy this message as any leakage of this information will be too bad for us as soldiers here in Afghanistan. I do not know how long we will remain here, and I have been shot, wounded and survived so many suicide bomb attacks, this and other reasons have prompted me to reach out to you for help. I honestly want this matter to be resolved immediately, please contact me as soon as possible on my e-mail address which is my only way of communication. Yours In Service, SGM John Dailey
Proposal
I wish to discuss a proposal with you, please contact me via email for more details immediately.
Greetings in the name of God, Business proposal in God we trust
Greetings in the name of God Dear Friend Greetings in the name of God,please let this not sound strange to you for my only surviving lawyer who would have done this died early this year.I prayed and got your email id from your country guestbook. I am Mrs Suran Yoda from London,I am 72 years old,i am suffering from a long time cancer of the lungs which also affected my brain,from all indication my conditions is really deteriorating and it is quite obvious that,according to my doctors they have advised me that i may not live for the next two months,this is because the cancer stage has gotten to a very bad stage.I am married to (Dr Andrews Yoda) who worked with the Embassy of United Kingdom in South Africa for nine years,Before he died in 2004. I was bred up from a motherless babies home and was married to my late husband for Thirty years without a child,my husband died in a fatal motor accident Before his death we were true believers.Since his death I decided not to re-marry,I sold all my inherited belongings and deposited all the sum of $6.5 Million dollars with Bank in South Africa.Though what disturbs me mostly is the cancer. Having known my condition I decided to donate this fund to church,i want you as God fearing person,to also use this money to fund church,orphanages and widows,I took this decision,before i rest in peace because my time will so on be up. The Bible made us to understand that blessed are the hands that giveth. I took this decision because I don`t have any child that will inherit this money and my husband's relatives are not Christians and I don`t want my husband hard earned money to be misused by unbelievers. I don`t want a situation where these money will be used in an ungodly manner,hence the reason for taking this bold decision.I am not afraid of death hence i know where am going.Presently,I'm with my laptop in a hospital here in London where I have been undergoing treatment for cancer of the lungs. 
As soon as I receive your reply I shall give you the contact of the Bank.I will also issue you a letter of authority that will prove you as the new beneficiary of my fund.Please assure me that you will act accordingly as I stated.Hoping to hear from you soon. Remain blessed in the name of the Lord. Yours in Christ, Mrs Suran Yoda
Proposal
Hello I have a business proposal of mutual benefits i would like to discuss with you,i asked before and i still await your positive response thanks.
Proposal
Hello I have a business proposal of mutual benefits i would like to discuss with you.
Business Proposal
I am Sgt.Brenda Wilson, originally from Lake Jackson Texas USA.I personally made a special research and I came across your information. I am presently writing this mail to you from U.S Military base Kabul Afghanistan I have a secured business proposal for you. Reply for more details via my private E-mail ( brendawilson...@hotmail.com )
Proposal
-- Good day, i know you do not know me personally but i have checked your profile and i see generosity in you, There's an urgent offer attach to your name here in the office of Mr. Fawaz KhE. Al Saleh Member of the Board of Directors, Kuveyt Türk Participation Bank (Turkey) and head of private banking and wealth management Regards, Mr. Fawaz KhE. Al Saleh
BUSINESS INTEREST/ PROPOSAL
Hello RE:BUSINESS INQUIRY/ PROPOSAL How are you doing today, i hope this mail finds you in a good and convenient position! My name is ZHAO DONG. I am the senior manager for Procurement, Hong Kong Refining Company (Sinopec Group Inc) I have been mandated to source crude oil from Libya for supply to our refineries. However, I have been able to establish a good relationship with the senior management of the Azzawya Oil Refining Company, Libya. I am now looking for a competent middle man to stand in between my company, Hong Kong Refining Company and the Azzawya Oil Refining Company of Libya for the sale and purchase of 2 Million Barrels Monthly for 36 Months. This is in order to take home a commission of USD5 to USD7 per barrel. This amount is payable to the middle man as commission. On your response I will give you further details you may need and proof of my identity. Kindly reply directly to zhaodong...@gmail.com or zhaodon...@yandex.com for further vital details you may need. Best Regards ZHAO DONG
Proposal
-- Hello I have been trying to contact you. Did you get my business proposal? Best Regards, Miss.Victoria Mehmet
Lucrative Business Proposal
-- Dear Friend, I would like to discuss a very important issue with you. I am writing to find out if this is your valid email. Please, let me know if this email is valid Kind regards Adrien Saif Attorney to Quatif Group of Companies
Proposal
-- Hello I have been trying to contact you. Did you get my business proposal? Best Regards, Miss.Zeliha ömer faruk Esentepe Mahallesi Büyükdere Caddesi Kristal Kule Binasi No:215 Sisli - Istanbul, Turke
Proposal
Hello Greetings to you please i have a business proposal for you contact me for more detailes asap thanks. Best Regards, Miss.Zeliha ömer faruk Esentepe Mahallesi Büyükdere Caddesi Kristal Kule Binasi No:215 Sisli - Istanbul, Turkey
Business Proposal
Hello, I am Mr. Alan Austin, I am currently working with Credit suisse Bank London. I saw your contact during my private search and I have a deep believe that you will be very honest, committed and capable of assisting in this business venture. I am an account officer to late Dr. Manzoor Hassan who died with his entire family in Syria, It is based on this that I am contacting you to stand as the beneficiary to my late client so that his funds in our custody will be released and paid to you as the beneficiary to the deceased. It is important you respond back to me with your full name and address, including your direct phone number to enable me give you full details of this transaction and more information about my late client who left huge amount of money in our Bank. I will provide you with all the necessary information, documents and proof to legally back up the claim from the different offices concerned for the smooth transfer of the fund to any of your accounts as the true beneficiary. Yours Sincerely, Mr. Alan Austin