egit vs. git behaviour (was: [RFC/WIP] Pluggable reference backends)
On Mon, 10 Mar 2014 19:39:00 +, Shawn Pearce wrote:
> Yes, this was my real concern. Eclipse users using EGit expect EGit to be compatible with git-core at the filesystem level so they can do something in EGit then switch to a shell and bang out a command, or run a script provided by their project or co-worker.

A question: Where to ask/report problems with that?

We're currently running into problems where egit doesn't push to where git would when the local and remote branches don't have the same name. It seems that egit ignores the branch.*.merge settings. Or push.default?

Andreas

--
Totally trivial. Famous last words.
From: Linus Torvalds torvalds@*.org
Date: Fri, 22 Jan 2010 07:29:21 -0800
--
To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/WIP] Pluggable reference backends
Karsten,

Thanks for your feedback!

On 03/11/2014 11:56 AM, Karsten Blees wrote:
> On 10.03.2014 12:00, Michael Haggerty wrote:
>> Reference transactions
>> ----------------------
>
> Very cool ideas indeed. However, I'm concerned a bit that transactions are conceptual overkill. How many concurrent updates do you expect in a repository? Wouldn't a single repo-wide lock suffice (and be _much_ simpler to implement with any backend, esp. file-based)?

I am mostly thinking about long-running processes, like gc and prune-refs, which need to be made race-free without blocking other processes for the whole time they are running (whereas it might be quite tolerable to have them fail or only complete part of their work in any given invocation).

Also, I work at GitHub, where we have quite a few repositories, some of which are quite active :-)

Remember that I'm not yet proposing anything like hard-core ACID reference transactions. I'm just clearing the way for various possible changes in reference handling. I listed the ideas only to whet people's appetites and motivate the refactoring, which will take a while before it bears any real fruit.

> The API you posted in [1] doesn't look very much like a transaction API either (rather like batch-updates). E.g. there's no rollback, the queue* methods cannot report failure, and there's no way to read a ref as part of the transaction. So I'm afraid that backends that support transactions out of the box (e.g. RDBMSs) will be hard to adapt to this.

Gmane is down at the moment but I assume you are referring to my patch series and the ref_transaction implementation therein.

No explicit rollback is necessary at this stage, because the commit function first locks all of the references that it wants to change (first verifying that they have the expected values), and then modifies them all. By the time the references are locked, the whole transaction is guaranteed to succeed [1]. If the locks can't all be acquired, then any locks that were obtained are released.

If a caller wants to roll back a transaction, it only needs to free the transaction instead of committing. I should probably make that clearer by renaming free_ref_transaction() to rollback_ref_transaction(). By the time we start implementing other reference backends, that function will of course have to do more. For that matter, maybe create_ref_transaction() should be renamed to begin_ref_transaction(). Now would be a good time for concrete bikeshedding suggestions about function names or other details of the API :-)

Yes, the queue_*() methods should probably later make a preliminary check of the reference's old value and return an error if the expected value is already incorrect. This would allow callers to fail fast if the transaction is doomed to failure. But that wasn't needed yet for the one existing caller, which builds up a transaction and commits it immediately, so I didn't implement it yet. And the early checks would add overhead for this caller, so maybe they should be optional anyway. Maybe these functions should already be declared to return an error status, but there should be an option passed to create_ref_transaction() that selects whether fast checks should be performed or not for that transaction.

Really, all that this first patch series does is put a different API around the mechanism that was already there, in update_refs(). There will be a lot more steps before we see anything approaching real reference transactions. But I think your (implied) suggestion, to make the API more reminiscent of something like database transactions, is a good one and I will work on it.

Cheers,
Michael

[1] "Guaranteed" here is of course relative. The commit could still fail due to the process being killed, disk errors, etc. But it can't fail due to lock contention with another git process.
--
Michael Haggerty
mhag...@alum.mit.edu
http://softwareswirl.blogspot.com/
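The commit scheme Michael describes (lock every queued reference, verify its expected old value, and only then write) can be sketched roughly as follows. This is a toy model in Python, not git-core's ref_transaction API: the class, its method names, and the in-memory dict and set standing in for ref files and lock files are all assumptions made for illustration.

```python
class RefTransaction:
    """Toy model of lock-all, verify, then write (all names hypothetical)."""

    def __init__(self, refs, locks):
        self.refs = refs      # dict: refname -> value, stands in for ref storage
        self.locks = locks    # set of refnames currently locked by anyone
        self.queued = []      # (refname, new_value, expected_old_value)

    def queue_update(self, refname, new, old):
        # new=None means "delete"; old=None means "must not exist yet".
        self.queued.append((refname, new, old))

    def rollback(self):
        # Storage has not been touched yet, so dropping the queue is enough.
        self.queued.clear()

    def commit(self):
        held = []
        try:
            # Phase 1: take every lock and check every expected old value.
            for refname, _new, old in self.queued:
                if refname in self.locks:
                    raise RuntimeError("cannot lock " + refname)
                self.locks.add(refname)
                held.append(refname)
                if self.refs.get(refname) != old:
                    raise RuntimeError("unexpected old value for " + refname)
            # Phase 2: all locks are held, so the writes cannot fail
            # because of contention with another cooperating writer.
            for refname, new, _old in self.queued:
                if new is None:
                    self.refs.pop(refname, None)
                else:
                    self.refs[refname] = new
        finally:
            # Release whatever was acquired, on success or on failure.
            for refname in held:
                self.locks.discard(refname)
            self.queued.clear()
```

In this model, a failure in phase 1 leaves the stored references untouched, which is why freeing the transaction is sufficient as a rollback at this stage.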
Re: egit vs. git behaviour (was: [RFC/WIP] Pluggable reference backends)
On Wed, Mar 12, 2014 at 3:26 AM, Andreas Krey a.k...@gmx.de wrote:
> On Mon, 10 Mar 2014 19:39:00 +, Shawn Pearce wrote:
>> Yes, this was my real concern. Eclipse users using EGit expect EGit to be compatible with git-core at the filesystem level so they can do something in EGit then switch to a shell and bang out a command, or run a script provided by their project or co-worker.
>
> A question: Where to ask/report problems with that?

EGit developers have a bug tracker. At http://eclipse.org/egit/support/ we see "File a bug" with a link to:

https://bugs.eclipse.org/bugs/enter_bug.cgi?product=EGit&rep_platform=All&op_sys=All

> We're currently running into problems that egit doesn't push to where git would when the local and remote branches aren't the same name. It seems that egit ignores the branch.*.merge settings. Or push.default?

I think this is just missing code in EGit. It's probable that they already know about it, or that many of them don't use these features in .git/config and thus don't realize they are missing.
Re: [RFC/WIP] Pluggable reference backends
On 10.03.2014 12:00, Michael Haggerty wrote:
> Reference transactions
> ----------------------

Very cool ideas indeed. However, I'm concerned a bit that transactions are conceptual overkill. How many concurrent updates do you expect in a repository? Wouldn't a single repo-wide lock suffice (and be _much_ simpler to implement with any backend, esp. file-based)?

The API you posted in [1] doesn't look very much like a transaction API either (rather like batch-updates). E.g. there's no rollback, the queue* methods cannot report failure, and there's no way to read a ref as part of the transaction. So I'm afraid that backends that support transactions out of the box (e.g. RDBMSs) will be hard to adapt to this.

Just my 2 cents,
Karsten

[1] http://article.gmane.org/gmane.comp.version-control.git/243748
[RFC/WIP] Pluggable reference backends
I have started working on pluggable ref backends. In this email I would like to share my plans and solicit feedback.

(This morning I removed this project from the GSoC ideas page, because it is unfair to ask a student to shoot at a moving target.)

Why?

Currently, the reference- and reflog-handling code in Git is too coupled to the rest of the system. There are too many places that know, for example, the difference between loose and packed refs, or that loose references are stored as files directly under $GIT_DIR/refs/heads/, or the locking protocols that have to be adhered to when managing references. This tight coupling, in turn, makes it nearly impossible to experiment with alternate reference storage schemes.

But there is a lot of potential to use alternate reference storage schemes to fix some currently-unfixable problems, and to implement some cool new features.

Unfixable problems
------------------

The on-disk format that we currently use to store references makes some problems impossible to fix:

* It is impossible to get a self-consistent snapshot of all references at a given moment in time. This makes it impossible, even in principle, to do object pruning in a 100% race-free way. (Our current workaround of not deleting objects that are less than two weeks old works in most cases but, aside from being ugly, has holes.)

* There are awkward filesystem-imposed constraints on reference naming, for example:

  * D/F conflicts (I): it is not possible to have branches named my-feature and my-feature/base at the same time.

  * D/F conflicts (II): it is not possible to have reflogs for branches named my-feature and my-feature/base at the same time. This leads to the problem that it is not, in general, possible to retain reflogs for branches that have been deleted.

  * There are additional constraints on reference names depending on the filesystem used to store them. For example, a Git repository on a case-insensitive filesystem fails in confusing ways if there are two loose references whose names differ only in case; however, packed references differing in case might work for a while. Also, reference names that include Unicode characters can have their normalization form changed if they are written on Mac OS.

* The packed-refs file has to be rewritten whenever a packed reference is deleted. (It might be nice to write 0{40} to a loose reference file to indicate that the reference has been deleted, but that would open the way for more D/F conflicts.)

Wild new ideas
--------------

So, I would like to reorganize the Git code to allow pluggable reference backends. If we had this, we could try out ideas like

* Retain the idea of loose/packed references, but encode loose reference names using a portable naming scheme before storing them to the filesystem; maybe something like

      refs/heads/Foo.42      -> refs.dir/heads.dir/%46oo%2e42
      logs/refs/heads/Foo.42 -> refs.dir/heads.dir/%46oo%2e42.log

  Yes, it looks uglier. But users shouldn't be looking in these directories anyway. This single change would prevent D/F conflicts, allow a reference to be deleted by writing 0{40} to its loose reference file, allow reflogs to be kept for deleted refs, and remove the problem of filesystem-dependent naming constraints.

* Store references in a SQLite database, to get correct transaction handling.

* Store references directly in the Git object database.

* Implement repository groups that share a common object database and also a common reference store. Each repository in a group would get a sub-namespace in the shared database, and store its references in names like

      refs/member/$MEMBERID/refs/heads/

  The member repos would act like restricted views of the shared database. This would be like a combination between alternates (with lowered risk of corruption) and gitnamespaces(7) (but usable for all git commands).

* Reference transactions that can be used across multiple Git commands. Imagine,

      export GIT_TRANSACTION=$(git transaction begin)
      trap 'git transaction rollback' ERR
      git foo ...
      git bar ...
      git baz ...
      if ! git transaction commit
      then
          # Transaction failed; all references rolled back
      else
          # Transaction succeeded; all references updated atomically
      fi
      trap '' ERR
      unset GIT_TRANSACTION

  The GIT_TRANSACTION environment variable would tell git to read from the usual references, overridden with any reference changes that have occurred during the transaction, but write any changes (including both old and new values) to the transaction. The command git transaction commit would verify that the old values listed in the transaction still agree with the current values, and then make all of the changes atomically.

  Such transactions could also be broadcast to mirrors when they are committed to keep multiple Git
Re: [RFC/WIP] Pluggable reference backends
On Mon, Mar 10, 2014 at 12:00 PM, Michael Haggerty mhag...@alum.mit.edu wrote:
> I have started working on pluggable ref backends. In this email I would like to share my plans and solicit feedback.

No comments or useful feedback yet, except that I enthusiastically approve of the objective and the plan you have for how to get there.

...Johan

--
Johan Herland, jo...@herland.net
www.herland.net
Re: [RFC/WIP] Pluggable reference backends
On Mon, Mar 10, 2014 at 4:00 AM, Michael Haggerty mhag...@alum.mit.edu wrote:
> I have started working on pluggable ref backends. In this email I would like to share my plans and solicit feedback.

Yay!

JGit already has pluggable ref backends, so it is good to see this starting in git-core. FWIW the Gerrit Code Review community is interested in this project.

> * Store references in a SQLite database, to get correct transaction handling.

No to SQLite in git-core. Using it from JGit requires building SQLite and a JNI wrapper, which makes JGit significantly less portable. I know SQLite is pretty amazing, but implementing compatibility with it from JGit will be a big nightmare for us.

> * Reference transactions that can be used across multiple Git commands. Imagine,
>
>       export GIT_TRANSACTION=$(git transaction begin)
>       trap 'git transaction rollback' ERR
>       git foo ...
>       git bar ...
>       git baz ...
>       if ! git transaction commit
>       then
>           # Transaction failed; all references rolled back
>       else
>           # Transaction succeeded; all references updated atomically
>       fi
>       trap '' ERR
>       unset GIT_TRANSACTION
>
>   The GIT_TRANSACTION environment variable would tell git to read from the usual references, overridden with any reference changes that have occurred during the transaction, but write any changes (including both old and new values) to the transaction. The command git transaction commit would verify that the old values listed in the transaction still agree with the current values, and then make all of the changes atomically.

Yay!

Gerrit Code Review really wants to get transactions implemented. So I am very much in favor of trying to improve the situation in git-core.

We want not only a transaction over 2+ references in the same repository, but we also want to perform transactions across repositories. Consider a git submodule child and parent being updated at the same time. We really want to update refs/heads/master in both repositories atomically at the central server.

> Such transactions could also be broadcast to mirrors when they are committed to keep multiple Git repositories in sync.

Ooh, this would be very interesting.

> Git hosters [1] will be likely to take advantage of alternate reference backends pretty easily, because they know which tools touch their repositories and need only update those tools. It is expected that alternate reference backends will be useful for hosters even if they don't become practical for end-users.

Alternate reference backends are absolutely useful to large hosters. The loose reference format isn't very scalable. The packed-refs file helps, but you can do better. IIRC our android.googlesource.com reference backend uses only 79 bytes per reference on average, including both the name string and the value. This super compact format is easy to hold in RAM for hundreds of busy repositories.

> For end-users it is important that their repository be readable by all of the tools that they use. So if we want to make a new format a viable option for normal Git users (let alone make it the new default format), some coordination will be needed between all of the commonly-used Git implementations (git-core, libgit2, JGit, and maybe Dulwich, Grit, ...). Whether or not this happens in real life depends on how advantageous the hypothetical new format is to Git users and is beyond the scope of this proposal.

It is sad we have this many implementations, but as one of the authors (JGit) I am happy to at least see you are worrying about compatibility with them.
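The GIT_TRANSACTION semantics quoted above (reads see the usual references overridden by pending changes, writes record both old and new values, and commit re-checks the old values) can be modelled in a few lines of Python. This is only a sketch under stated assumptions: no `git transaction` command exists, and the class and method names below are hypothetical. A real implementation would also have to make the verify-and-apply step atomic with respect to other processes, which this single-process model does not attempt.

```python
class TransactionView:
    """Hypothetical model of how refs might behave under GIT_TRANSACTION."""

    def __init__(self, base):
        self.base = base     # dict: refname -> value, the "usual references"
        self.pending = {}    # refname -> (old_value_first_seen, new_value)

    def read(self, refname):
        # Pending changes override the usual references.
        if refname in self.pending:
            return self.pending[refname][1]
        return self.base.get(refname)

    def write(self, refname, new):
        # Record the old value only on first touch; later writes keep it.
        if refname in self.pending:
            old = self.pending[refname][0]
        else:
            old = self.base.get(refname)
        self.pending[refname] = (old, new)

    def commit(self):
        # Verify that every recorded old value still matches reality.
        for refname, (old, _new) in self.pending.items():
            if self.base.get(refname) != old:
                self.pending.clear()   # failed: nothing is applied
                return False
        # All checks passed: apply every change.
        for refname, (_old, new) in self.pending.items():
            if new is None:
                self.base.pop(refname, None)
            else:
                self.base[refname] = new
        self.pending.clear()
        return True
```

Until `commit()` returns True, nothing in `base` changes, so other readers of the usual references never observe a half-finished sequence of updates.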
Re: [RFC/WIP] Pluggable reference backends
On 10.03.2014, at 15:30, Shawn Pearce spea...@spearce.org wrote:
> On Mon, Mar 10, 2014 at 4:00 AM, Michael Haggerty mhag...@alum.mit.edu wrote:
>> I have started working on pluggable ref backends. In this email I would like to share my plans and solicit feedback.
>
> Yay!

Yay, too!

> JGit already has pluggable ref backends, so it is good to see this starting in git-core. FWIW the Gerrit Code Review community is interested in this project.
>
>> * Store references in a SQLite database, to get correct transaction handling.
>
> No to SQLite in git-core. Using it from JGit requires building SQLite and a JNI wrapper, which makes JGit significantly less portable. I know SQLite is pretty amazing, but implementing compatibility with it from JGit will be a big nightmare for us.

I understood this as an example (indeed, it is listed under "Wild new ideas"), not a proposal to put this into the git core. It might be an interesting experiment in any case, and if the proposed modularity is truly achieved, it could (if there was any interest in it, that is) be implemented in an external 3rd party project.

Anyway, I am quite excited about this project. Usually, I am quite skeptical about such large scope ideas ("Yeah, cool idea, but who will pull it off, and with which resources?"). But this one seems to have a good chance of being implemented gradually and inside the main repository, with the help of feature flags. Thus, I am looking forward to Michael's announced initial patch series.

I feel that I don't know enough yet about git overall to be of much help on my own at this point. But perhaps over time some mini- or micro-projects will pop up where others can help (e.g. "adapt these 50 tests to work with the 'quagga' ref"); if they are pointed out (assuming that doing so isn't more work than just addressing them yourself ;-), I am willing to help out.

Cheers,
Max
Re: [RFC/WIP] Pluggable reference backends
On Mon, Mar 10, 2014 at 07:30:45AM -0700, Shawn Pearce wrote:
>> * Store references in a SQLite database, to get correct transaction handling.
>
> No to SQLite in git-core. Using it from JGit requires building SQLite and a JNI wrapper, which makes JGit significantly less portable. I know SQLite is pretty amazing, but implementing compatibility with it from JGit will be a big nightmare for us.

That seems like a poor reason not to implement a pluggable feature for git-core. If we implement it, then a site using only git-core can take advantage of it. Sites with JGit cannot, and would use a different pluggable storage mechanism that's supported by both. But if we don't implement it, that hurts people using only git-core, and it does not help sites using JGit at all.

That's assuming that attention spent on implementing the feature does not take away from implementing some other parallel scheme that does the same thing but does not use SQLite. I don't know what that would be offhand; mapping the ref and reflog into a relational database is pretty simple, and we get a lot of robustness and efficiency benefits for free.

We could perhaps have some kind of relational backend that uses an ODBC-like abstraction to point to a database. I have no idea if people would ever want to store refs in a real server-backed RDBMS, but I suspect Java has native support for such things.

Certainly I think we should aim for compatibility where we can, but if there's not a compatible way to do something, I don't think the limitations of one platform should drag other ones down. And that goes both ways; we had to reimplement disk-compatible EWAH from scratch in C for git-core to have bitmaps, whereas JGit just got to use a ready-made library. I don't think that was a bad thing. People in mixed-implementation environments couldn't use it, but people with JGit-only environments were free to take advantage of it.

At any rate, the repository needs to advertise "this is the ref storage mechanism I use" in the config. We're going to need to bump core.repositoryformatversion for such cases (because an old version of git should not blindly lock and write to a refs/ directory that nobody else is ever going to look at). And I'd suggest with that bump adding in something like core.refstorage, so that an implementation can say "foobar ref storage? Never heard of it" and barf. That applies whether it's because that implementation doesn't support foobar, because it's an old version that doesn't understand foobar yet, or because it was simply built without foobar support.

-Peff
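The "advertise it in the config and barf on anything unknown" idea can be sketched as a small check run before any ref access. The following Python is illustrative only: core.refstorage is just the name floated in the message above, and the default value "files", the set of supported backends, and the version limit are assumptions rather than anything git-core defines.

```python
# Hypothetical: what this particular build of the tool understands.
SUPPORTED_REF_BACKENDS = {"files"}
MAX_REPO_FORMAT_VERSION = 1


def check_ref_storage(config):
    """Return the ref backend name, or raise if this build cannot handle it.

    config is a dict of flattened, lowercase config keys to string values.
    """
    version = int(config.get("core.repositoryformatversion", "0"))
    if version > MAX_REPO_FORMAT_VERSION:
        # An older tool must refuse rather than blindly write to refs/.
        raise RuntimeError("unknown repository format version %d" % version)
    backend = config.get("core.refstorage", "files")
    if backend not in SUPPORTED_REF_BACKENDS:
        raise RuntimeError("'%s' ref storage? Never heard of it" % backend)
    return backend
```

The same refusal covers all three cases listed above: an implementation that does not support the backend, an old version that predates it, and a build compiled without it.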
Re: [RFC/WIP] Pluggable reference backends
Jeff King p...@peff.net writes:
> On Mon, Mar 10, 2014 at 07:30:45AM -0700, Shawn Pearce wrote:
>>> * Store references in a SQLite database, to get correct transaction handling.
>>
>> No to SQLite in git-core. Using it from JGit requires building SQLite and a JNI wrapper, which makes JGit significantly less portable. I know SQLite is pretty amazing, but implementing compatibility with it from JGit will be a big nightmare for us.
>
> That seems like a poor reason not to implement a pluggable feature for git-core. If we implement it, then a site using only git-core can take advantage of it. Sites with JGit cannot, and would use a different pluggable storage mechanism that's supported by both. But if we don't implement, it hurts people using only git-core, and it does not help sites using JGit at all.

Of course, the basic premise for this feature is "let's assume that our file and/or operating system suck at providing file system functionality at file name granularity". There have historically been two approaches to that problem that are not independent: a) use Linux b) kick Linus.

Option b) has been fairly successful over quite a bit of time, but at the current point of time, it has become harder to aim that kick on a single person and/or where it counts.

The database approach is an alternative approach based on kicking an alternate set of people, namely database rather than operating system providers, based on the assumption that the former have softer behinds (the "backend-based approach") making them more sensitive to kicking.

So the database approach is most promising on the "what are we going to do if our operating system vendor won't bother with sensible file system performance" angle.

Which isn't doing total system architecture a favor. Personally, I have little sympathy for helping subpar systems, keeping them on life support while they are in turn trying to squish the better systems.

But then it is not me doing the actual work, so this is no more than an idle reflection.

--
David Kastrup
Re: [RFC/WIP] Pluggable reference backends
On Mon, 10 Mar 2014, David Kastrup wrote:
> Jeff King p...@peff.net writes:
>> On Mon, Mar 10, 2014 at 07:30:45AM -0700, Shawn Pearce wrote:
>>>> * Store references in a SQLite database, to get correct transaction handling.
>>>
>>> No to SQLite in git-core. Using it from JGit requires building SQLite and a JNI wrapper, which makes JGit significantly less portable. I know SQLite is pretty amazing, but implementing compatibility with it from JGit will be a big nightmare for us.
>>
>> That seems like a poor reason not to implement a pluggable feature for git-core. If we implement it, then a site using only git-core can take advantage of it. Sites with JGit cannot, and would use a different pluggable storage mechanism that's supported by both. But if we don't implement, it hurts people using only git-core, and it does not help sites using JGit at all.
>
> Of course, the basic premise for this feature is "let's assume that our file and/or operating system suck at providing file system functionality at file name granularity". There have historically been two approaches to that problem that are not independent: a) use Linux b) kick Linus.

As a note, if this is done properly, it could allow for plugins that connect to the underlying storage system (similar to the Facebook Mercurial change).

Even for those who don't have the $ storage arrays, there may be other storage-specific hacks that can be done to detect that files haven't changed.

For example, if you use btrfs and compile into a different directory than your source, you may be able to detect that things didn't change by the fact that the filesystem didn't have to do a rewrite of the parent node.

David Lang
Re: [RFC/WIP] Pluggable reference backends
Jeff King p...@peff.net writes:
> On Mon, Mar 10, 2014 at 07:30:45AM -0700, Shawn Pearce wrote:
>>> * Store references in a SQLite database, to get correct transaction handling.
>>
>> No to SQLite in git-core. Using it from JGit requires building SQLite and a JNI wrapper, which makes JGit significantly less portable. I know SQLite is pretty amazing, but implementing compatibility with it from JGit will be a big nightmare for us.
>
> That seems like a poor reason not to implement a pluggable feature for git-core. If we implement it, then a site using only git-core can take advantage of it. Sites with JGit cannot, and would use a different pluggable storage mechanism that's supported by both. But if we don't implement, it hurts people using only git-core, and it does not help sites using JGit at all.

We would need to eventually have at least one backend that we know will play well with the different Git implementations that matter (namely, git-core, JGit and libgit2) before the feature can be widely adopted.

The first backend that is used while the plugging interface is in development can be anything and does not have to be that eventual ubiquitous one, however, as long as it is something that we do not mind carrying forever, along with that final reference backend.

I take the objection from Shawn only as being against making sqlite that final one.
Re: [RFC/WIP] Pluggable reference backends
On Mon, Mar 10, 2014 at 10:46:01AM -0700, Junio C Hamano wrote:
>>> No to SQLite in git-core. Using it from JGit requires building SQLite and a JNI wrapper, which makes JGit significantly less portable. I know SQLite is pretty amazing, but implementing compatibility with it from JGit will be a big nightmare for us.
>>
>> That seems like a poor reason not to implement a pluggable feature for git-core. If we implement it, then a site using only git-core can take advantage of it. Sites with JGit cannot, and would use a different pluggable storage mechanism that's supported by both. But if we don't implement, it hurts people using only git-core, and it does not help sites using JGit at all.
>
> We would need to eventually have at least one backend that we know will play well with the different Git implementations that matter (namely, git-core, JGit and libgit2) before the feature can be widely adopted.

I assumed that the current refs/ and logs/ code, massaged into pluggable backend form, would be the first such. And I wouldn't be surprised to see some iteration on that once it is easier to move from scheme to scheme (e.g., to use some encoding of the names on the filesystem to avoid D/F conflicts, and thus allow reflogs for deleted refs).

> The first backend that is used while the plugging interface is in development can be anything and does not have to be that eventual ubiquitous one, however, as long as it is something that we do not mind carrying forever, along with that final reference backend.
>
> I take the objection from Shawn only as being against making sqlite that final one.

Sure, I'd agree with that. I'd think something like an sqlite interface would be mainly of interest to people running busy servers. I don't know that it would make a good default.

-Peff
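For concreteness, here is one way the name encoding mentioned above could work. It is a Python sketch that reproduces the single example given in the original proposal (refs/heads/Foo.42 becoming refs.dir/heads.dir/%46oo%2e42); the exact set of characters kept unescaped, the function names, and the `reflog` flag are assumptions, since the proposal only says "maybe something like".

```python
# Assumed safe set: lowercase ASCII letters, digits, '-' and '_'.
# Uppercase letters and '.' are escaped, as in the Foo.42 example.
SAFE = set("abcdefghijklmnopqrstuvwxyz0123456789-_")


def encode_component(name):
    """Percent-encode every byte outside SAFE using lowercase hex."""
    out = []
    for byte in name.encode("utf-8"):
        ch = chr(byte)
        out.append(ch if ch in SAFE else "%%%02x" % byte)
    return "".join(out)


def encode_ref_path(refname, reflog=False):
    """Map a reference name to a filesystem path relative to the ref store.

    Directory components get a ".dir" suffix and reflogs a ".log" suffix,
    so a file, a directory and a reflog with the same stem never collide.
    """
    parts = refname.split("/")
    dirs = [encode_component(p) + ".dir" for p in parts[:-1]]
    leaf = encode_component(parts[-1]) + (".log" if reflog else "")
    return "/".join(dirs + [leaf])
```

Under these assumptions, my-feature maps to a file named my-feature while my-feature/base lives under a directory named my-feature.dir, so the two branches (and their reflogs) can coexist, and names that differ only in case no longer collide on a case-insensitive filesystem.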
Re: [RFC/WIP] Pluggable reference backends
On Mon, Mar 10, 2014 at 05:14:02PM +0100, David Kastrup wrote:

[storing refs in sqlite]

> Of course, the basic premise for this feature is "let's assume that our file and/or operating system suck at providing file system functionality at file name granularity". There have historically been two approaches to that problem that are not independent: a) use Linux b) kick Linus.

You didn't define "suck" here, but there are a number of issues with the current ref storage system. Here is a sampling:

  1. The filesystem does not present an atomic view of the data (e.g., you read "a", then while you are reading "b", somebody else updates "a"; your view is one that never existed at any point in time).

  2. Using the filesystem creates D/F conflicts between branches "foo" and "foo/bar". Because this name is a primary key even for the reflogs, we cannot easily persist reflogs after the ref is removed.

  3. We use packed-refs in conjunction with loose ones to achieve reasonable performance when there are a large number of refs. The scheme for determining the current value of a ref is complicated and error-prone (we had several race conditions that caused real data loss).

Those things can be solved through better support from the filesystem. But they were also solved decades ago by relational databases.

I generally avoid databases where possible. They lock your data up in a binary format that you can't easily touch with standard unix tools. And they introduce complexity and opportunity for bugs.

But they are also a proven technology for solving exactly the sorts of problems that some people are having with git. I do not see a reason not to consider them as an option for a pluggable refs system.

But I also do not see a reason to inflict their costs on people who do not have those problems. And that is why Michael's email is about _pluggable_ ref backends, and not "let's convert git to sqlite". I do not even know if sqlite is going to end up as an interesting option. But it will be nice to be able to experiment with it easily due to git's ref code becoming more modular.

-Peff
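To make the relational option concrete, here is a small sketch using Python's standard sqlite3 module. It is an experiment-sized illustration, not a proposed design: the table layout, column names, and function names are all assumptions. It shows how the first two issues in the list above look in a database: several refs are updated in one atomic transaction, and the ref name is an opaque key, so "foo" and "foo/bar" coexist and reflog rows outlive a deleted ref.

```python
import sqlite3


def open_ref_db(path=":memory:"):
    # isolation_level=None: no implicit transactions; we issue BEGIN ourselves.
    db = sqlite3.connect(path, isolation_level=None)
    db.execute("CREATE TABLE IF NOT EXISTS refs "
               "(name TEXT PRIMARY KEY, value TEXT NOT NULL)")
    db.execute("CREATE TABLE IF NOT EXISTS reflog "
               "(id INTEGER PRIMARY KEY, name TEXT NOT NULL, old TEXT, new TEXT)")
    return db


def read_ref(db, name):
    row = db.execute("SELECT value FROM refs WHERE name = ?", (name,)).fetchone()
    return row[0] if row else None


def update_refs(db, updates):
    """Apply (name, expected_old, new) triples atomically; None means absent."""
    db.execute("BEGIN IMMEDIATE")   # take the write lock up front
    try:
        for name, old, new in updates:
            if read_ref(db, name) != old:
                raise ValueError("unexpected old value for " + name)
            if new is None:
                db.execute("DELETE FROM refs WHERE name = ?", (name,))
            else:
                db.execute("INSERT OR REPLACE INTO refs (name, value) "
                           "VALUES (?, ?)", (name, new))
            # The reflog is a separate table, so it survives ref deletion.
            db.execute("INSERT INTO reflog (name, old, new) VALUES (?, ?, ?)",
                       (name, old, new))
    except Exception:
        db.execute("ROLLBACK")      # undo every change made in this batch
        raise
    db.execute("COMMIT")
```

The costs named above still apply: the data ends up in a binary file that standard unix tools cannot inspect, which is one reason such a backend would be an opt-in experiment rather than a default.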
Re: [RFC/WIP] Pluggable reference backends
Jeff King p...@peff.net writes: On Mon, Mar 10, 2014 at 05:14:02PM +0100, David Kastrup wrote: [storing refs in sqlite] Of course, the basic premise for this feature is let's assume that our file and/or operating system suck at providing file system functionality at file name granularity. There have been two historically approaches to that problem that are not independent: a) use Linux b) kick Linus. You didn't define suck here, but there are a number of issues with the current ref storage system. Here is a sampling: 1. The filesystem does not present an atomic view of the data (e.g., you read a, then while you are reading b, somebody else updates a; your view is one that never existed at any point in time). If there are no system calls suitable for addressing this problem that fundamentally concerns the use of the file system as a file-name addressed data store, I don't see why kick Linus would not apply here. 2. Using the filesystem creates D/F conflicts between branches foo and foo/bar. Because this name is a primary key even for the reflogs, we cannot easily persist reflogs after the ref is removed. That actually sounds more like kick Junio territory (the wonderful times when kick Linus could achieve almost anything are over). To wit: this sounds like a design shortcoming in Git's use of filesystems, not something that is actually inherent in the use of files. 3. We use packed-refs in conjunction with loose ones to achieve reasonable performance when there are a large number of refs. The scheme for determining the current value of a ref is complicated and error-prone (we had several race conditions that caused real data loss). Again, that sounds like we are talking about a scenario that is not a problem of files inherently but rather of Git's ways of managing them. Those things can be solved through better support from the filesystem. But they were also solved decades ago by relational databases. 
Relational databases that are not implemented on raw storage managed by database servers will still map their operations to file operations.

> But they are also a proven technology for solving exactly the sorts of problems that some people are having with git. I do not see a reason not to consider them as an option for a pluggable refs system.

But I think it would be wrong to try solving 2. above at the database level when its actual problem lies with the reference-filename mapping scheme.

--
David Kastrup
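For readers who have not run into it, the D/F (directory/file) conflict from point 2 can be reproduced outside Git in a few lines. This is only a sketch: it assumes refs are stored one file per ref name under a directory, and `write_loose_ref` and `try_write` are hypothetical helper names, not functions from git-core.

```python
import os
import tempfile


def write_loose_ref(git_dir, refname, sha):
    """Store a ref as one file per ref name (hypothetical helper, not git-core code)."""
    path = os.path.join(git_dir, *refname.split("/"))
    # Every parent component of the ref name has to be a directory.
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(sha + "\n")


def try_write(git_dir, refname, sha):
    """Return True if the ref could be stored, False if the filesystem refused."""
    try:
        write_loose_ref(git_dir, refname, sha)
        return True
    except OSError:
        return False


if __name__ == "__main__":
    git_dir = tempfile.mkdtemp()
    # "foo" becomes a regular file, so "foo/bar" would need it to be a directory.
    print(try_write(git_dir, "refs/heads/foo", "1" * 40))
    print(try_write(git_dir, "refs/heads/foo/bar", "2" * 40))
```

The second write is refused because `foo` already exists as a regular file; creating `foo/bar` first and `foo` second is refused in the same way. Nothing in the sketch depends on what is stored in the files, only on how ref names are mapped to paths.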
Re: [RFC/WIP] Pluggable reference backends
On 03/10/2014 04:52 PM, Jeff King wrote:

> On Mon, Mar 10, 2014 at 07:30:45AM -0700, Shawn Pearce wrote:
>
>>> * Store references in a SQLite database, to get correct transaction handling.
>>
>> No to SQLite in git-core. Using it from JGit requires building SQLite and a JNI wrapper, which makes JGit significantly less portable. I know SQLite is pretty amazing, but implementing compatibility with it from JGit will be a big nightmare for us.
>
> That seems like a poor reason not to implement a pluggable feature for git-core. If we implement it, then a site using only git-core can take advantage of it. Sites with JGit cannot, and would use a different pluggable storage mechanism that's supported by both. But if we don't implement, it hurts people using only git-core, and it does not help sites using JGit at all.

I think it's important to distinguish between two types of backend:

* Exotic backends, optimized for servers, or embedded systems, or other controlled environments where the person deploying Git can decide about the whole technology stack. Here I say "let a thousand flowers bloom". If user A wants to try an Oracle backend and only uses JGit, there's no need for him to implement the equivalent backend for git-core or libgit2.

* Mainstream backends, intended for use by end-users on their workstations and notebooks. Such backends will be pretty worthless if they are not supported more or less universally, because one user will want to use the command line and Eclipse, another Visual Studio and TortoiseGit, a third will use GitHub for Mac plus a bunch of shell scripts written by his IT department. A backend that is not supported by the big three Git implementations (git-core, libgit2, and JGit) will probably be rejected by users. Realistically there will be at most a couple of mainstream backends--in fact probably usually a single established one and occasionally a single next-generation one waiting for people to migrate slowly to it.
For mainstream backends I think it is important for the implementations to plan and coordinate ahead of time to make sure everybody's concerns are addressed.

It sounds to me like Shawn is saying "please don't make a SQLite-based backend the new default git-core backend" and Peff is saying "there is no reason that a Git hosting service shouldn't experiment with a SQLite-based backend". I see no contradiction there [1].

Also, please remember that I'm not advocating a SQLite backend or any other at this time. I'm only refactoring code to open the way for *future* flamefests :-)

Michael

[1] There might of course be a technical argument about whether a SQLite-based backend would be SO AWESOME for end-users that switching to it would be worth the extra inconvenience for the JGit folks. Personally I'm skeptical.

--
Michael Haggerty
mhag...@alum.mit.edu
http://softwareswirl.blogspot.com/
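To make the "correct transaction handling" idea from the quoted proposal a little more concrete, here is a rough sketch of what an experimental SQLite-backed ref store could look like. Nobody in this thread has proposed this schema: the `refs` table and the names `open_ref_db`, `read_ref` and `update_refs` are invented for illustration, and the sketch uses Python's standard `sqlite3` module rather than anything from git-core or JGit.

```python
import sqlite3


def open_ref_db(path=":memory:"):
    """Open a hypothetical ref database; the schema is an assumption of this sketch."""
    # isolation_level=None puts the connection in autocommit mode,
    # so the BEGIN/COMMIT/ROLLBACK statements below are issued explicitly.
    db = sqlite3.connect(path, isolation_level=None)
    db.execute(
        "CREATE TABLE IF NOT EXISTS refs (name TEXT PRIMARY KEY, sha TEXT NOT NULL)"
    )
    return db


def read_ref(db, name):
    """Return the stored value of a ref, or None if it does not exist."""
    row = db.execute("SELECT sha FROM refs WHERE name = ?", (name,)).fetchone()
    return row[0] if row else None


def update_refs(db, updates):
    """Apply all (name, expected_old, new) updates, or none of them.

    expected_old is None when the ref must not exist yet.
    Returns True on success, False if any expected value did not match.
    """
    # BEGIN IMMEDIATE takes the write lock before the old values are checked.
    db.execute("BEGIN IMMEDIATE")
    try:
        for name, expected_old, new in updates:
            if read_ref(db, name) != expected_old:
                db.execute("ROLLBACK")
                return False
            db.execute(
                "INSERT OR REPLACE INTO refs (name, sha) VALUES (?, ?)", (name, new)
            )
    except Exception:
        db.execute("ROLLBACK")
        raise
    db.execute("COMMIT")
    return True
```

A caller would pass every ref it wants to change in one `update_refs` call; if any expected old value does not match, none of the refs is modified. Whether this buys enough to justify the portability cost discussed above is exactly the open question.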
Re: [RFC/WIP] Pluggable reference backends
On Mon, Mar 10, 2014 at 2:07 PM, Michael Haggerty mhag...@alum.mit.edu wrote:

> On 03/10/2014 04:52 PM, Jeff King wrote:
>
>> On Mon, Mar 10, 2014 at 07:30:45AM -0700, Shawn Pearce wrote:
>>
>>>> * Store references in a SQLite database, to get correct transaction handling.
>>>
>>> No to SQLite in git-core. Using it from JGit requires building SQLite and a JNI wrapper, which makes JGit significantly less portable. I know SQLite is pretty amazing, but implementing compatibility with it from JGit will be a big nightmare for us.
>>
>> That seems like a poor reason not to implement a pluggable feature for git-core. If we implement it, then a site using only git-core can take advantage of it. Sites with JGit cannot, and would use a different pluggable storage mechanism that's supported by both. But if we don't implement, it hurts people using only git-core, and it does not help sites using JGit at all.
>
> I think it's important to distinguish between two types of backend:
>
> * Exotic backends, optimized for servers, or embedded systems, or other controlled environments where the person deploying Git can decide about the whole technology stack. Here I say "let a thousand flowers bloom". If user A wants to try an Oracle backend and only uses JGit, there's no need for him to implement the equivalent backend for git-core or libgit2.

FWIW I have been running JGit-derived servers using Google Bigtable for reference storage for years. So yes, in this sort of environment let people do what they think is best for them.

> * Mainstream backends, intended for use by end-users on their workstations and notebooks. Such backends will be pretty worthless if they are not supported more or less universally, because one user will want to use the command line and Eclipse, another Visual Studio and TortoiseGit, a third will use GitHub for Mac plus a bunch of shell scripts written by his IT department.
> A backend that is not supported by the big three Git implementations (git-core, libgit2, and JGit) will probably be rejected by users. Realistically there will be at most a couple of mainstream backends--in fact probably usually a single established one and occasionally a single next-generation one waiting for people to migrate slowly to it.
>
> For mainstream backends I think it is important for the implementations to plan and coordinate ahead of time to make sure everybody's concerns are addressed.

Yes, this was my real concern. Eclipse users using EGit expect EGit to be compatible with git-core at the filesystem level so they can do something in EGit then switch to a shell and bang out a command, or run a script provided by their project or co-worker. Build systems often integrate with Git to e.g. embed `git describe` output into the binary. In mainstream use, cross-compatibility of the tools within a single working directory is something that I think users have come to expect.

> It sounds to me like Shawn is saying "please don't make a SQLite-based backend the new default git-core backend" and Peff is saying "there is no reason that a Git hosting service shouldn't experiment with a SQLite-based backend". I see no contradiction there [1].

Yes. :-)

> Also, please remember that I'm not advocating a SQLite backend or any other at this time. I'm only refactoring code to open the way for *future* flamefests :-)
>
> Michael
>
> [1] There might of course be a technical argument about whether a SQLite-based backend would be SO AWESOME for end-users that switching to it would be worth the extra inconvenience for the JGit folks. Personally I'm skeptical.

If it was really that amazing, yes, we would probably support it in JGit for those that need that amazing. But I tend to think we can (usually) find a simpler format that would provide many of the same benefits with fewer of the drawbacks of locking the data up into SQLite's file format.
I'm with Peff, I kind of like the fact that most of the Git data is easy to inspect by hand, or with some simple tools written in Git's source tree. Starting with "go get this other SQLite tool first, then write this code" is a lot less fun.
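As a rough illustration of the "easy to inspect by hand" point, the sketch below looks up a ref in the file-based layout with nothing but the standard library: a loose file is tried first, then the `packed-refs` file. The function names `read_packed_refs` and `resolve_ref` are hypothetical, and the sketch deliberately ignores symbolic refs, locking, and the races mentioned earlier in the thread.

```python
import os


def read_packed_refs(git_dir):
    """Parse <git_dir>/packed-refs into a {refname: sha} dict (sketch only)."""
    refs = {}
    try:
        with open(os.path.join(git_dir, "packed-refs")) as f:
            for line in f:
                line = line.rstrip("\n")
                # '#' lines are header comments; '^' lines carry the peeled
                # value of the preceding tag and are skipped here.
                if not line or line.startswith("#") or line.startswith("^"):
                    continue
                sha, name = line.split(" ", 1)
                refs[name] = sha
    except FileNotFoundError:
        pass
    return refs


def resolve_ref(git_dir, refname):
    """Return the value of refname, preferring a loose ref over a packed one."""
    loose_path = os.path.join(git_dir, *refname.split("/"))
    try:
        with open(loose_path) as f:
            return f.read().strip()
    except OSError:
        # No readable loose file (missing, or the path is a directory):
        # fall back to packed-refs.
        pass
    return read_packed_refs(git_dir).get(refname)
```

Calling `resolve_ref(git_dir, "refs/heads/master")` on a repository's `.git` directory returns the loose value if one exists and the packed value otherwise; an equivalent check against a database file would need that database's own tooling first.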