Re: [RFC/PATCH] Supporting non-blob notes
On Mon, Feb 24, 2014 at 11:27 AM, wrote:
> Johan Herland wrote on 02/24/2014 02:29:10:
>> I've been thinking about this for a while now, and I find myself
>> agreeing more and more with Junio's argument in the linked thread.
>>
>> I think notes are fundamentally - like file contents from Git's POV -
>> an unstructured stream of bytes. Any real structure in a git note is
>> imposed by the surrounding application/context, and having Git impose
>> its own object model onto the contents of notes would likely be an
>> unnecessary distraction.
>
> OTOH, it looks like a good idea to allow the surrounding application/context
> to benefit from existing infrastructure. I identified so far:
>
> (i) diffing/grepping trees
> (ii) efficiency of indexing through notes fanout

All of my proposed alternatives store some sort of reference to the "real" data in a notes object; even when using a tree object directly as a note, the notes tree itself only stores a SHA1 reference to the tree object. As such, all alternatives (a) through (e) (even including your RFC) benefit from indexing through the notes fanout, and I'm not sure what is gained by attaching the "real" data more directly to the notes.

In all of (a) through (e), the lookup of a specific commit's testrun logs always starts with looking up the notes associated with that commit. Once that is done, the remainder of the work is about resolving that reference and retrieving the associated resource. Whether that consists of loading an HTTP URL, fetching a remote Git repo, or looking up a local tree object is ultimately an implementation detail, and does not affect the indexing itself.

> (iii) reachability
> (iv) content packing

These four criteria/requirements apply to your specific use case, but they do not necessarily apply to _all_ use cases.
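To make the fanout point concrete, here is a toy Python model. It is an assumption for illustration, not Git's notes.c: the notes tree is flattened into a dict from path to the object name of the note, and the annotated commit's SHA1 is split into two-hex-digit directory levels ("ab/cdef..."). The lookup is the same whether the object it returns is a blob holding a URL or a tree holding logs. The helper names, the `max_fanout` limit and the object names are all made up.

```python
from __future__ import annotations


def note_path(annotated: str, fanout: int) -> str:
    """Path of the note for `annotated` at a given fanout depth.

    fanout=0 -> 'abcdef...', fanout=1 -> 'ab/cdef...', fanout=2 -> 'ab/cd/ef...'
    """
    if len(annotated) != 40 or any(c not in "0123456789abcdef" for c in annotated):
        raise ValueError("expected a 40-digit lowercase hex object name")
    levels = [annotated[2 * i:2 * i + 2] for i in range(fanout)]
    return "/".join(levels + [annotated[2 * fanout:]])


def lookup_note(notes_tree: dict[str, str], annotated: str, max_fanout: int = 3) -> str | None:
    """Return the object name stored as the note for `annotated`, if any.

    Each fanout depth is tried in turn. The result is only a reference, so this
    step costs the same whatever kind of object the note turns out to be.
    """
    for fanout in range(max_fanout + 1):
        note = notes_tree.get(note_path(annotated, fanout))
        if note is not None:
            return note
    return None


# Hypothetical notes tree using one level of fanout.
commit = "ab" + "12" * 19
notes_tree = {note_path(commit, 1): "cd" * 20}
print(note_path(commit, 1))
print(lookup_note(notes_tree, commit))
```

Real Git chooses the fanout depth from the number of notes; the sketch leaves that choice to the caller and simply probes every depth.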
I can easily imagine a slightly different scenario: for example, a company setting with highly-available internal servers, where testrun logs are of interest mainly to a small subset of users (e.g. most developers look at them only very occasionally). Now assume there is already a (third-party) system in place for archiving and indexing the testrun logs (i.e. providing (i), (ii) and (iv)), and direct reachability (iii) is not desired, as including the testrun logs in the repo would add nothing but bloat for most users. In this scenario, simply adding a note with the appropriate URL to the third-party service would be a sufficient and preferable solution.

>> In Yann's example, the testrun logs are probably best structured as a
>> hierarchy of files, but that does not necessarily mean that they MUST
>> be stored as a Git tree object (with accompanying sub-trees and
>> blobs). For example, one could imagine many different solutions for
>> storing the testrun logs:
>>
>> (a) Storing the logs statically on some server, and putting the
>> corresponding URL in a notes blob. Reachability is manual/on-demand
>> (by retrieving the URL).
>
> Would require redoing (ii) and (iv) in a way that does not impair (i).
>
>> (b) Storing the logs in a .tar.gz archive, and adding that archive as
>> a blob note. Reachability is implicit/automatic (by unpacking the
>> archive).
>
> Interferes with (i) and (iv), i.e. does not allow benefiting from similarity
> between the contents of (unpacked) notes.
>
>> (c) Storing the logs on some ref in an external repo, and putting the
>> repo URL + ref in a notes blob. Reachability is manual/on-demand (by
>> cloning/fetching the repo).
>> (d) Storing the logs on some ref/commit in the same repo, and putting
>> the ref/commit name in a notes blob. Reachability depends on the
>> application/user to sync the ref/commit along with the notes.
>
> Better than (a), but still does not address (ii).
> And indeed, my intent was to let the notes live in a separate "fork" repo,
> so ordinary users need not fetch the testrun contents systematically with the
> code.

Just to clarify, my alternatives (except for (e) below) were not intended to satisfy the exact criteria for your use case, but only to demonstrate that there exist a variety of solutions for a variety of slightly different problems. When we consider adding significant complexity to the notes code, we must justify that with real and tangible benefits, not only for your exact use case, but preferably also for a larger group of related use cases. So far I don't see how allowing the direct use of tree objects as notes would benefit more than your specific use case...

>> (e) Storing the logs in a commit, putting the commit name in a blob
>> note, and then creating/rewriting the notes history to include the
>> commit in its ancestry. Reachability is automatic (i.e. follows the
>> notes), but the application must control/manipulate the notes history.
>
> And finally, that one does address all points in my case.
>
>> Whichever of these (or other) solutions is most appropriate depends on
>> the particular application/context, and (from Git's perspective), none
Re: [RFC/PATCH] Supporting non-blob notes
Johan Herland wrote on 02/24/2014 02:29:10:
> On Wed, Feb 19, 2014 at 12:10 AM, Duy Nguyen wrote:
> > On Tue, Feb 18, 2014 at 9:46 PM, Johan Herland wrote:
> >> On Mon, Feb 17, 2014 at 11:48 AM, wrote:
> >>> The recent "git-note -C changes commit type?" thread
> >>> ( http://thread.gmane.org/gmane.comp.version-control.git/241950 ) looks
> >>> like a good occasion to discuss possible uses of non-blob notes.
> >>>
> >>> The use-case we're thinking about is the storage of testrun logs as
> >>> notes (think: being able to justify that a given set of tests were
> >>> successfully run on a given revision).
> >>
> >> I think this is a good use of notes, and organizing the testrun logs
> >> into a tree of files seems like a natural way to proceed.
> >
> > Notes from the previous attempt to store trees as notes (something to
> > watch out maybe, when you do it again)
> >
> > http://article.gmane.org/gmane.comp.version-control.git/197712
>
> Thanks for that link. It is good to see that these issues have been
> considered/discussed previously.

Yes, it sheds some useful light on the problem, thanks.

> I've been thinking about this for a while now, and I find myself
> agreeing more and more with Junio's argument in the linked thread.
>
> I think notes are fundamentally - like file contents from Git's POV -
> an unstructured stream of bytes. Any real structure in a git note is
> imposed by the surrounding application/context, and having Git impose
> its own object model onto the contents of notes would likely be an
> unnecessary distraction.

OTOH, it looks like a good idea to allow the surrounding application/context to benefit from existing infrastructure.
I identified so far:

(i) diffing/grepping trees
(ii) efficiency of indexing through notes fanout
(iii) reachability
(iv) content packing

> In Yann's example, the testrun logs are probably best structured as a
> hierarchy of files, but that does not necessarily mean that they MUST
> be stored as a Git tree object (with accompanying sub-trees and
> blobs). For example, one could imagine many different solutions for
> storing the testrun logs:
>
> (a) Storing the logs statically on some server, and putting the
> corresponding URL in a notes blob. Reachability is manual/on-demand
> (by retrieving the URL).

Would require redoing (ii) and (iv) in a way that does not impair (i).

> (b) Storing the logs in a .tar.gz archive, and adding that archive as
> a blob note. Reachability is implicit/automatic (by unpacking the
> archive).

Interferes with (i) and (iv), i.e. does not allow benefiting from similarity between the contents of (unpacked) notes.

> (c) Storing the logs on some ref in an external repo, and putting the
> repo URL + ref in a notes blob. Reachability is manual/on-demand (by
> cloning/fetching the repo).
> (d) Storing the logs on some ref/commit in the same repo, and putting
> the ref/commit name in a notes blob. Reachability depends on the
> application/user to sync the ref/commit along with the notes.

Better than (a), but still does not address (ii). And indeed, my intent was to let the notes live in a separate "fork" repo, so ordinary users need not fetch the testrun contents systematically with the code.

> (e) Storing the logs in a commit, putting the commit name in a blob
> note, and then creating/rewriting the notes history to include the
> commit in its ancestry. Reachability is automatic (i.e. follows the
> notes), but the application must control/manipulate the notes history.

And finally, that one does address all points in my case.
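Point (iv) can be illustrated with Git's object naming. A blob's name is the SHA1 of a "blob <size>\0" header followed by the content, which is the same formula `git hash-object` uses. As a result, a log file that is byte-identical across two testruns becomes a single shared object when the logs are stored unpacked. Two .tar.gz archives, by contrast, are two unrelated blobs. The sketch below uses made-up log contents. It models only exact duplicates, not the delta compression that packfiles also apply to blobs that are merely similar.

```python
from __future__ import annotations

import hashlib


def git_blob_id(content: bytes) -> str:
    """Object name Git assigns to a blob: SHA1 of 'blob <size>\\0' + content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def blob_ids(files: dict[str, bytes]) -> dict[str, str]:
    """Map each log path to the name of the blob it would be stored as."""
    return {path: git_blob_id(data) for path, data in files.items()}


# Two hypothetical testruns on consecutive revisions.
run1 = {"build.log": b"compiler ok\n", "tests/unit.log": b"42 passed\n"}
run2 = {"build.log": b"compiler ok\n", "tests/unit.log": b"41 passed, 1 failed\n"}

ids1, ids2 = blob_ids(run1), blob_ids(run2)
shared = set(ids1.values()) & set(ids2.values())
print(f"{len(shared)} of {len(ids2)} blobs in run 2 are already stored by run 1")
```

Here the unchanged `build.log` maps to the same blob in both runs, so only the differing unit-test log adds a new object.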
> Whichever of these (or other) solutions is most appropriate depends on
> the particular application/context, and (from Git's perspective), none
> of them are inherently superior to any of the others. Even the question
> of whether testrun logs should or should not be reachable by default
> depends on the surrounding application/context.

Wouldn't it make sense to mention these possibilities in the git-notes manpage, to help people use the mechanism as intended?

> Now, the intention of Yann's RFC is to store the testrun logs directly
> in a notes _tree_. This is not too different from alternative (e)
> above, in that reachability is automatic. However, instead of having
> the surrounding application manipulate the notes history to ensure
> reachability, the RFC would rather teach Git's notes code to
> accommodate the (likely rather special) case of having a note that is
> BOTH structured like (or at least easily mapped to) a Git tree object,
> AND that should be automatically reachable.

Incidentally, proposal (e) would allow the use of commits, although doing so would probably cause problems, since not all of the children of the commit used as an annotation would have the same relationship to their parent. Are you suggesting using a slightly different mechanism than the "parent" relationship?

> Even though there is a certain elegance to
Re: [RFC/PATCH] Supporting non-blob notes
On Wed, Feb 19, 2014 at 12:10 AM, Duy Nguyen wrote:
> On Tue, Feb 18, 2014 at 9:46 PM, Johan Herland wrote:
>> On Mon, Feb 17, 2014 at 11:48 AM, wrote:
>>> The recent "git-note -C changes commit type?" thread
>>> (http://thread.gmane.org/gmane.comp.version-control.git/241950) looks
>>> like a good occasion to discuss possible uses of non-blob notes.
>>>
>>> The use-case we're thinking about is the storage of testrun logs as
>>> notes (think: being able to justify that a given set of tests were
>>> successfully run on a given revision).
>>
>> I think this is a good use of notes, and organizing the testrun logs
>> into a tree of files seems like a natural way to proceed.
>
> Notes from the previous attempt to store trees as notes (something to
> watch out maybe, when you do it again)
>
> http://article.gmane.org/gmane.comp.version-control.git/197712

Thanks for that link. It is good to see that these issues have been considered/discussed previously.

I've been thinking about this for a while now, and I find myself agreeing more and more with Junio's argument in the linked thread.

I think notes are fundamentally - like file contents from Git's POV - an unstructured stream of bytes. Any real structure in a git note is imposed by the surrounding application/context, and having Git impose its own object model onto the contents of notes would likely be an unnecessary distraction.

In Yann's example, the testrun logs are probably best structured as a hierarchy of files, but that does not necessarily mean that they MUST be stored as a Git tree object (with accompanying sub-trees and blobs). For example, one could imagine many different solutions for storing the testrun logs:

(a) Storing the logs statically on some server, and putting the corresponding URL in a notes blob. Reachability is manual/on-demand (by retrieving the URL).
(b) Storing the logs in a .tar.gz archive, and adding that archive as a blob note. Reachability is implicit/automatic (by unpacking the archive).
(c) Storing the logs on some ref in an external repo, and putting the repo URL + ref in a notes blob. Reachability is manual/on-demand (by cloning/fetching the repo).
(d) Storing the logs on some ref/commit in the same repo, and putting the ref/commit name in a notes blob. Reachability depends on the application/user to sync the ref/commit along with the notes.
(e) Storing the logs in a commit, putting the commit name in a blob note, and then creating/rewriting the notes history to include the commit in its ancestry. Reachability is automatic (i.e. follows the notes), but the application must control/manipulate the notes history.

Whichever of these (or other) solutions is most appropriate depends on the particular application/context, and (from Git's perspective), none of them are inherently superior to any of the others. Even the question of whether testrun logs should or should not be reachable by default depends on the surrounding application/context.

Now, the intention of Yann's RFC is to store the testrun logs directly in a notes _tree_. This is not too different from alternative (e) above, in that reachability is automatic. However, instead of having the surrounding application manipulate the notes history to ensure reachability, the RFC would rather teach Git's notes code to accommodate the (likely rather special) case of having a note that is BOTH structured like (or at least easily mapped to) a Git tree object, AND that should be automatically reachable.

Even though there is a certain elegance to storing such a tree object directly as a notes object, there is AFAICS no other inherent advantage (e.g. performance- or functionality-wise) to following that approach. I'm not at all sure that it justifies increasing the complexity of the notes code.
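Alternative (e) comes down to what a notes commit object contains. A commit is a small text object that lists its tree and any number of parent lines. Giving the notes commit a second parent (the commit holding the logs) is therefore enough to make the logs reachable from the notes ref. The Python sketch below builds such an object by hand; every object name, the identity line and the message are placeholders.

With real Git, the same kind of object could be produced with the plumbing command `git commit-tree <tree> -p <old-notes-commit> -p <logs-commit>` and then recorded with `git update-ref refs/notes/commits <new-commit>`. Using the default notes ref here is an assumption.

```python
from __future__ import annotations

import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """SHA1 object name Git computes for an object of the given type."""
    header = f"{kind} {len(body)}\0".encode("ascii")
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree: str, parents: list[str], ident: str, message: str) -> bytes:
    """Raw commit object: tree line, one 'parent' line per parent, author/committer, message."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {ident}", f"committer {ident}", "", message]
    return ("\n".join(lines) + "\n").encode("utf-8")


# Placeholder object names (not real objects).
notes_tree = "1" * 40             # notes tree after adding the blob that names the logs commit
previous_notes_commit = "2" * 40  # current tip of the notes ref
logs_commit = "3" * 40            # commit whose tree holds the testrun logs
ident = "Test Runner <ci@example.com> 1393200000 +0100"

body = commit_body(notes_tree, [previous_notes_commit, logs_commit],
                   ident, "Notes added by test runner")
print(body.decode())
print("new notes commit:", git_object_id("commit", body))
```

A command-line option for supplying additional parents to a notes commit would amount to appending to `parents` in this model.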
Furthermore, considering the RFC's original intention of also making commit and tag objects directly usable as notes, and realizing the fundamental difficulties in teaching Git to handle this (outlined in my previous email in this thread), I must conclude that the simplicity and flexibility of something like alternative (e) above far outweighs the added code complexity to support allowing any object type to be used as a note.

Maybe we should instead consider making it easier to do alternative (e), by providing a command-line option for supplying additional parents to a notes commit?

...Johan

[1]: The only "structure" in notes contents expected by Git is the text format expected when showing notes with "git log", or when editing/appending notes with your default text editor. However, these are typically bypassed and/or customized by an external application storing custom data in notes.

--
Johan Herland, www.herland.net
--
To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/PATCH] Supporting non-blob notes
On Tue, Feb 18, 2014 at 9:46 PM, Johan Herland wrote:
> On Mon, Feb 17, 2014 at 11:48 AM, wrote:
>> The recent "git-note -C changes commit type?" thread
>> (http://thread.gmane.org/gmane.comp.version-control.git/241950) looks
>> like a good occasion to discuss possible uses of non-blob notes.
>>
>> The use-case we're thinking about is the storage of testrun logs as
>> notes (think: being able to justify that a given set of tests were
>> successfully run on a given revision).
>
> I think this is a good use of notes, and organizing the testrun logs
> into a tree of files seems like a natural way to proceed.

Notes from the previous attempt to store trees as notes (something to watch out maybe, when you do it again)

http://article.gmane.org/gmane.comp.version-control.git/197712
--
Duy
Re: [RFC/PATCH] Supporting non-blob notes
On Mon, Feb 17, 2014 at 11:48 AM, wrote:
> The recent "git-note -C changes commit type?" thread
> (http://thread.gmane.org/gmane.comp.version-control.git/241950) looks
> like a good occasion to discuss possible uses of non-blob notes.
>
> The use-case we're thinking about is the storage of testrun logs as
> notes (think: being able to justify that a given set of tests were
> successfully run on a given revision).

I think this is a good use of notes, and organizing the testrun logs into a tree of files seems like a natural way to proceed.

> Here is a proof-of-concept patch (that applies to 1.8.4.2) I've been
> playing with. Because of the -C behaviour described in this other
> thread, I opted for a new -o flag that would not mess with the object
> argument. This patch is very minimalist, and just allows storing a
> tree note (currently any type of object, but that's easy to restrict
> if we want to), and retrieving it.

I think we must think _very_ carefully about which object types we allow to be stored in notes trees. As far as I can see, your use case (storing testrun logs) is covered nicely by allowing tree objects as notes, and I think that's where we should start. The notes tree is itself a tree object, and storing sub-trees within it is not new or unusual to Git at all. Reachability is nicely covered by how Git already handles sub-trees. Obviously we must flesh out how the notes-related parts of the code deal with trees (see below), but that does not really affect the rest of Git, and should therefore be relatively uncontroversial.

If we go on to _commit_ objects, they are currently only referenced from tree objects as "gitlink"s (with the special mode 160000). If you were to put one of these in a notes tree, you would get the same semantics as a "gitlink", i.e. Git handles that part of the tree as a submodule where a different submodule repo is (to be) checked out.
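The difference between a tree-as-note and a commit-as-gitlink shows up when reachability is computed. Below is a toy breadth-first walk over a hand-made object graph. All names are hypothetical, and the walk is a simplification of what fetch, push and gc actually do. Tree entries with mode 40000 and commit parents are followed. Entries with the gitlink mode 160000 are skipped, because the commit they name is expected to live in a submodule's repository. For comparison, the graph also contains a notes commit with an extra parent (the additional-parent mechanism discussed in this thread).

```python
from __future__ import annotations

from collections import deque

GITLINK = "160000"  # tree-entry mode Git uses for submodule commits

# Toy object store: name -> list of (link type, target name).
# Commits link via "tree" and "parent"; trees link via their entry modes.
STORE: dict[str, list[tuple[str, str]]] = {
    "notes-commit": [("tree", "notes-tree"), ("parent", "logs-commit")],
    "notes-tree": [
        ("100644", "url-note-blob"),     # ordinary blob note
        ("40000", "logs-tree-note"),     # tree used directly as a note
        (GITLINK, "commit-as-gitlink"),  # commit stored as a gitlink
    ],
    "url-note-blob": [],
    "logs-tree-note": [("100644", "unit-log-blob")],
    "unit-log-blob": [],
    "logs-commit": [("tree", "logs-commit-tree")],
    "logs-commit-tree": [],
    "commit-as-gitlink": [("tree", "gitlinked-tree")],
    "gitlinked-tree": [],
}


def reachable(store: dict[str, list[tuple[str, str]]], start: str) -> set[str]:
    """Breadth-first walk from `start` that does not descend into gitlinks."""
    seen: set[str] = set()
    queue = deque([start])
    while queue:
        name = queue.popleft()
        if name in seen:
            continue
        seen.add(name)
        for link, target in store[name]:
            if link == GITLINK:
                continue  # submodule commit: not required to exist in this repo
            queue.append(target)
    return seen


found = reachable(STORE, "notes-commit")
print(sorted(found))
```

In this model the tree note and the extra-parent commit are both reached, while the gitlinked commit and its tree are not. Those unreached objects are the ones a fetch or push would leave behind.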
The commit is NOT considered/required to be reachable, and would therefore not be automatically communicated by a fetch or push. So if you want commits in a notes tree to be handled differently from commits-as-gitlinks, you would have to tweak all the code in Git that deals with gitlinks. You would have to introduce a differentiation between your "commits-as-gitlinks" and "commits-as-notes", either by reserving another special mode number, or by otherwise making the rest of Git notes-aware. All of this comes in addition to teaching the notes-related code how to deal with commits (i.e. how to display them, etc.). In other words, before you embark on this, you need a convincing argument for why allowing commits-as-notes is really necessary and worth it in the end.

Please also consider that you _can_ support commits-as-notes by the mechanism I suggested in the previous thread: Store the commit SHA1 in a note-as-blob, and then amend the notes commit to include the commit SHA1 as an additional parent. It's not very elegant, but it solves the reachability problem.

If we go even further and want to allow ANY git object as a note, then we must also consider tag objects, which AFAIK have never before been stored inside a tree. Here we are really entering uncharted territory...

So for now (and in the absence of a convincing use case for notes-as-commits), I suggest you only look at notes-as-trees. The first consequence of this is probably that your added -o/--object option should be renamed. -t/--tree is not taken, AFAICS...

> Johan Herland wrote:
>> Obviously, it would not make sense to use refs/notes/history while
>> displaying the commit log ("git log --notes=history"), as the raw
>> commit object would be shown in the log.
>
> Currently, a non-blob commit is just not displayed at all.
> And rather than displaying the raw object, we have a number of options available,
> starting with object's sha1, to more elaborate presentations depending
> on the type of object (commit info, tree hierarchy, etc, as "git notes
> show" already does). This PoC shows that it can be dealt with later.

I'm only considering the notes-as-tree case here...

I assume that if you organize your notes in tree objects, then you probably have more information in there than is useful to display in the textual output from "git log". Also, you probably have special-purpose scripts for initially generating those trees, and later digging into the information stored therein. Hence we should concentrate on getting the basics covered, to allow those scripts to do their thing; adding bells and whistles to "git log" for displaying notes-as-trees is much less important.

For now, "git log" should probably show a short summary when encountering a notes-as-tree. Whether that summary consists merely of the tree SHA1, or of a (relatively short) tree listing, I leave up to you. I also agree that this can be dealt with later (as long as the default behaviour is not actively harmful/confusing).

> What I envision, would be viewers like gitk simply show the
> hyperlinked sha1, a