egit vs. git behaviour (was: [RFC/WIP] Pluggable reference backends)

2014-03-12 Thread Andreas Krey

On Mon, 10 Mar 2014 19:39:00 +, Shawn Pearce wrote:
 Yes, this was my real concern. Eclipse users using EGit expect EGit to
 be compatible with git-core at the filesystem level so they can do
 something in EGit then switch to a shell and bang out a command, or
 run a script provided by their project or co-worker.

A question: Where to ask/report problems with that?

We're currently running into the problem that egit doesn't push to where
git would when the local and remote branches don't have the same name. It
seems that egit ignores the branch.*.merge settings. Or push.default?

Andreas

-- 
Totally trivial. Famous last words.
From: Linus Torvalds torvalds@*.org
Date: Fri, 22 Jan 2010 07:29:21 -0800
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC/WIP] Pluggable reference backends

2014-03-12 Thread Michael Haggerty

Karsten,

Thanks for your feedback!

On 03/11/2014 11:56 AM, Karsten Blees wrote:
 Am 10.03.2014 12:00, schrieb Michael Haggerty:
 
 Reference transactions
 ----------------------
 
 Very cool ideas indeed.
 
 However, I'm concerned a bit that transactions are conceptual
 overkill. How many concurrent updates do you expect in a repository?
 Wouldn't a single repo-wide lock suffice (and be _much_ simpler to
 implement with any backend, esp. file-based)?

I am mostly thinking about long-running processes, like gc and
prune-refs, which need to be made race-free without blocking other
processes for the whole time they are running (whereas it might be quite
tolerable to have them fail or only complete part of their work in any
given invocation).  Also, I work at GitHub, where we have quite a few
repositories, some of which are quite active :-)

Remember that I'm not yet proposing anything like hard-core ACID
reference transactions.  I'm just clearing the way for various possible
changes in reference handling.  I listed the ideas only to whet people's
appetites and motivate the refactoring, which will take a while before
it bears any real fruit.

 The API you posted in [1] doesn't look very much like a transaction
 API either (rather like batch-updates). E.g. there's no rollback, the
 queue* methods cannot report failure, and there's no way to read a
 ref as part of the transaction. So I'm afraid that backends that
 support transactions out of the box (e.g. RDBMSs) will be hard to
 adapt to this.

Gmane is down at the moment but I assume you are referring to my patch
series and the ref_transaction implementation therein.

No explicit rollback is necessary at this stage, because the commit
function first locks all of the references that it wants to change
(first verifying that they have the expected values), and then modifies
them all.  By the time the references are locked, the whole transaction
is guaranteed to succeed [1].  If the locks can't all be acquired, then
any locks that were obtained are released.
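The lock-verify-update sequence described above can be sketched roughly as follows. This is a toy in-memory model, not the code from the patch series: RefStore, RefTransaction, queue_update() and commit() are hypothetical names chosen for illustration, and a real implementation would lock files on disk.

```python
class TransactionError(Exception):
    """Raised when a transaction cannot be committed."""


class RefStore:
    """Toy in-memory ref store; a stand-in for refs on disk (assumption)."""

    def __init__(self, refs=None):
        self.refs = dict(refs or {})
        self.locked = set()

    def lock(self, name):
        if name in self.locked:
            raise TransactionError(f"{name} is locked by someone else")
        self.locked.add(name)

    def unlock(self, name):
        self.locked.discard(name)


class RefTransaction:
    """Hypothetical transaction object: queue updates, then commit them."""

    def __init__(self, store):
        self.store = store
        self.updates = {}  # name -> (expected_old, new)

    def queue_update(self, name, new, old):
        self.updates[name] = (old, new)

    def commit(self):
        held = []
        try:
            # Phase 1: lock every ref and verify its expected old value.
            for name in sorted(self.updates):
                self.store.lock(name)
                held.append(name)
                old, _ = self.updates[name]
                if self.store.refs.get(name) != old:
                    raise TransactionError(f"{name} changed unexpectedly")
            # Phase 2: all locks are held, so the writes cannot fail
            # because of contention with another process.
            for name, (_, new) in self.updates.items():
                self.store.refs[name] = new
        finally:
            # Release whatever locks were obtained, on success or failure.
            for name in held:
                self.store.unlock(name)
```

In this model, rolling back is simply dropping the RefTransaction object without calling commit(), since nothing is written before phase 2.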

If a caller wants to roll back a transaction, it only needs to free the
transaction instead of committing.  I should probably make that clearer
by renaming free_ref_transaction() to rollback_ref_transaction().  By
the time we start implementing other reference backends, that function
will of course have to do more.  For that matter, maybe
create_ref_transaction() should be renamed to begin_ref_transaction().
Now would be a good time for concrete bikeshedding suggestions about
function names or other details of the API :-)

Yes, the queue_*() methods should probably later make a preliminary
check of the reference's old value and return an error if the expected
value is already incorrect.  This would allow callers to fail fast if
the transaction is doomed to failure.  But that wasn't needed yet for
the one existing caller, which builds up a transaction and commits it
immediately, so I didn't implement it yet.  And the early checks would
add overhead for this caller, so maybe they should be optional anyway.
Maybe these functions should already be declared to return an error
status, but there should be an option passed to create_ref_transaction()
that selects whether fast checks should be performed or not for that
transaction.
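A minimal sketch of the optional fail-fast check, under the assumption that it is selected per transaction. The names Transaction, fast_checks and DoomedTransaction are made up for this illustration and are not part of the posted API.

```python
class DoomedTransaction(Exception):
    """The expected old value was already wrong when the update was queued."""


class Transaction:
    def __init__(self, refs, fast_checks=False):
        self.refs = refs                # current ref values: name -> value
        self.fast_checks = fast_checks  # hypothetical per-transaction option
        self.queued = []

    def queue_update(self, name, new, old):
        # Advisory check only: no lock is held yet, so the value can still
        # change before commit, and commit must verify it again.
        if self.fast_checks and self.refs.get(name) != old:
            raise DoomedTransaction(name)
        self.queued.append((name, new, old))
```

A caller that builds a transaction and commits it immediately would leave fast_checks off and pay for only one verification, at commit time.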

Really, all that this first patch series does is put a different API
around the mechanism that was already there, in update_refs().  There
will be a lot more steps before we see anything approaching real
reference transactions.  But I think your (implied) suggestion, to make
the API more reminiscent of something like database transactions, is a
good one and I will work on it.

Cheers,
Michael

[1] Guaranteed here is of course relative.  The commit could still
fail due to the process being killed, disk errors, etc.  But it can't
fail due to lock contention with another git process.

-- 
Michael Haggerty
mhag...@alum.mit.edu
http://softwareswirl.blogspot.com/

Re: egit vs. git behaviour (was: [RFC/WIP] Pluggable reference backends)

2014-03-12 Thread Shawn Pearce

On Wed, Mar 12, 2014 at 3:26 AM, Andreas Krey a.k...@gmx.de wrote:
 On Mon, 10 Mar 2014 19:39:00 +, Shawn Pearce wrote:
 Yes, this was my real concern. Eclipse users using EGit expect EGit to
 be compatible with git-core at the filesystem level so they can do
 something in EGit then switch to a shell and bang out a command, or
 run a script provided by their project or co-worker.

 A question: Where to ask/report problems with that?

EGit developers have a bug tracker, from:

  http://eclipse.org/egit/support/

We see "File a bug" with a link to:

  https://bugs.eclipse.org/bugs/enter_bug.cgi?product=EGit&rep_platform=All&op_sys=All

 We're currently running into problems that egit doesn't push to where
 git would when the local and remote branches aren't the same name. It
 seems that egit ignores the branch.*.merge settings. Or push.default?

I think this is just missing code in EGit. It's probable they already
know about it, or many of them don't use these features in .git/config
and thus don't realize they are missing.

Re: [RFC/WIP] Pluggable reference backends

2014-03-11 Thread Karsten Blees

Am 10.03.2014 12:00, schrieb Michael Haggerty:
 
 Reference transactions
 ----------------------
 

Very cool ideas indeed.

However, I'm concerned a bit that transactions are conceptual overkill. How 
many concurrent updates do you expect in a repository? Wouldn't a single 
repo-wide lock suffice (and be _much_ simpler to implement with any backend, 
esp. file-based)?

The API you posted in [1] doesn't look very much like a transaction API either 
(rather like batch-updates). E.g. there's no rollback, the queue* methods 
cannot report failure, and there's no way to read a ref as part of the 
transaction. So I'm afraid that backends that support transactions out of the 
box (e.g. RDBMSs) will be hard to adapt to this.

Just my 2cents,
Karsten

[1] http://article.gmane.org/gmane.comp.version-control.git/243748



[RFC/WIP] Pluggable reference backends

2014-03-10 Thread Michael Haggerty

I have started working on pluggable ref backends.  In this email I
would like to share my plans and solicit feedback.

(This morning I removed this project from the GSoC ideas page, because
it is unfair to ask a student to shoot at a moving target.)

Why?
====

Currently, the reference- and reflog-handling code in Git is too
coupled to the rest of the system.  There are too many places that
know, for example, the difference between loose and packed refs, or
that loose references are stored as files directly under
$GIT_DIR/refs/heads/, or the locking protocols that have to be adhered
to when managing references.  This tight coupling, in turn, makes it
nearly impossible to experiment with alternate reference storage
schemes.

But there is a lot of potential to use alternate reference storage
schemes to fix some currently-unfixable problems, and to implement
some cool new features.

Unfixable problems
------------------

The on-disk format that we currently use to store references makes
some problems impossible to fix:

* It is impossible to get a self-consistent snapshot of all references
  at a given moment in time.  This makes it impossible, even in
  principle, to do object pruning in a 100% race-free way.  (Our
  current workaround of not deleting objects that are less than two
  weeks old works in most cases but, aside from being ugly, has holes.)

* There are awkward filesystem-imposed constraints on reference
  naming, for example:

  * D/F conflicts (I): it is not possible to have branches named
    "my-feature" and "my-feature/base" at the same time.

  * D/F conflicts (II): it is not possible to have reflogs for
    branches named "my-feature" and "my-feature/base" at the same
    time.  This leads to the problem that it is not, in general,
    possible to retain reflogs for branches that have been deleted.

  * There are additional constraints on reference names depending on
    the filesystem used to store them.  For example, a Git repository
    on a case-insensitive filesystem fails in confusing ways if there
    are two loose references whose names differ only in case; however,
    packed references differing in case might work for a while.  Also,
    reference names that include Unicode characters can have their
    normalization form changed if they are written on Mac OS.

* The packed-refs file has to be rewritten whenever a packed reference
  is deleted.  (It might be nice to write 0{40} to a loose reference
  file to indicate that the reference has been deleted, but that would
  open the way for more D/F conflicts.)
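To make the D/F conflict concrete, here is a small sketch that stores refs as one file per name in a temporary directory, the way loose references are laid out. The helper names write_loose_ref() and try_both() are invented for this example, and the code ignores locking and packed refs entirely.

```python
import os
import tempfile


def write_loose_ref(git_dir, name, value):
    """Store a ref the way loose refs are stored: one file per ref name."""
    path = os.path.join(git_dir, *name.split("/"))
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(value + "\n")


def try_both(first, second):
    """Write two refs in order; return True if the second write succeeds."""
    with tempfile.TemporaryDirectory() as git_dir:
        write_loose_ref(git_dir, first, "1" * 40)
        try:
            write_loose_ref(git_dir, second, "2" * 40)
        except OSError:
            # "my-feature" would have to be a file and a directory at once.
            return False
        return True
```

Calling try_both("refs/heads/my-feature", "refs/heads/my-feature/base") fails in either order, because the same path component would need to be both a file and a directory.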

Wild new ideas
--------------

So, I would like to reorganize the Git code to allow pluggable
reference backends.  If we had this, we could try out ideas like

* Retain the idea of loose/packed references, but encode loose
  reference names using a portable naming scheme before storing them
  to the filesystem; maybe something like

  refs/heads/Foo.42 -> refs.dir/heads.dir/%46oo%2e42
  logs/refs/heads/Foo.42 -> refs.dir/heads.dir/%46oo%2e42.log

  Yes, it looks uglier.  But users shouldn't be looking in these
  directories anyway.  This single change would prevent D/F conflicts,
  allow a reference to be deleted by writing 0{40} to its loose
  reference file, allow reflogs to be kept for deleted refs, and
  remove the problem of filesystem-dependent naming constraints.

* Store references in a SQLite database, to get correct transaction
  handling.

* Store references directly in the Git object database.

* Implement repository groups that share a common object database
  and also a common reference store.  Each repository in a group would
  get a sub-namespace in the shared database, and store its references
  in names like refs/member/$MEMBERID/refs/heads/  The member
  repos would act like restricted views of the shared database.  This
  would be like a combination between alternates (with lowered risk of
  corruption) and gitnamespaces(7) (but usable for all git commands).

* Reference transactions that can be used across multiple Git
  commands.  Imagine,

  export GIT_TRANSACTION=$(git transaction begin)
  trap 'git transaction rollback' ERR
  git foo ...
  git bar ...
  git baz ...
  if ! git transaction commit
  then
      # Transaction failed; all references rolled back
  else
      # Transaction succeeded; all references updated atomically
  fi
  trap '' ERR
  unset GIT_TRANSACTION

  The GIT_TRANSACTION environment variable would tell git to read
  from the usual references, overridden with any reference changes
  that have occurred during the transaction, but write any changes
  (including both old and new values) to the transaction.  The command
  "git transaction commit" would verify that the old values listed in
  the transaction still agree with the current values, and then make
  all of the changes atomically.

  Such transactions could also be broadcast to mirrors when they are
  committed to keep multiple Git repositories in sync.
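The read/write behaviour described for GIT_TRANSACTION (reads see the usual references overlaid with the transaction's own changes; writes record old and new values; commit verifies the old values) might look roughly like this. OverlayTransaction and its methods are hypothetical, the shared store is a plain dict, and real atomicity across processes is out of scope for the sketch.

```python
class OverlayTransaction:
    """Hypothetical model of a multi-command reference transaction."""

    def __init__(self, base):
        self.base = base    # the usual references: name -> value
        self.changes = {}   # name -> (old value first seen, new value)

    def read(self, name):
        # Changes made inside the transaction override the usual refs.
        if name in self.changes:
            return self.changes[name][1]
        return self.base.get(name)

    def write(self, name, new):
        # Remember the old value from the first time this ref is touched.
        if name in self.changes:
            old = self.changes[name][0]
        else:
            old = self.base.get(name)
        self.changes[name] = (old, new)

    def commit(self):
        # Verify that nobody changed the refs behind our back ...
        for name, (old, _) in self.changes.items():
            if self.base.get(name) != old:
                return False
        # ... and only then apply every change (None means "deleted").
        for name, (_, new) in self.changes.items():
            if new is None:
                self.base.pop(name, None)
            else:
                self.base[name] = new
        self.changes.clear()
        return True
```

Nothing in the shared store changes until commit() succeeds, so abandoning the object is equivalent to a rollback.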

Re: [RFC/WIP] Pluggable reference backends

2014-03-10 Thread Johan Herland

On Mon, Mar 10, 2014 at 12:00 PM, Michael Haggerty mhag...@alum.mit.edu wrote:
 I have started working on pluggable ref backends.  In this email I
 would like to share my plans and solicit feedback.

No comments or useful feedback yet, except that I enthusiastically
approve of the objective and the plan you have for how to get there.


...Johan

-- 
Johan Herland, jo...@herland.net
www.herland.net

Re: [RFC/WIP] Pluggable reference backends

2014-03-10 Thread Shawn Pearce

On Mon, Mar 10, 2014 at 4:00 AM, Michael Haggerty mhag...@alum.mit.edu wrote:
 I have started working on pluggable ref backends.  In this email I
 would like to share my plans and solicit feedback.

Yay!

JGit already has pluggable ref backends, so it is good to see this
starting in git-core.

FWIW the Gerrit Code Review community is interested in this project.

 * Store references in a SQLite database, to get correct transaction
   handling.

No to SQLite in git-core. Using it from JGit requires building
SQLite and a JNI wrapper, which makes JGit significantly less
portable. I know SQLite is pretty amazing, but implementing
compatibility with it from JGit will be a big nightmare for us.

 * Reference transactions that can be used across multiple Git
   commands.  Imagine,

   export GIT_TRANSACTION=$(git transaction begin)
   trap 'git transaction rollback' ERR
   git foo ...
   git bar ...
   git baz ...
   if ! git transaction commit
   then
   # Transaction failed; all references rolled back
   else
   # Transaction succeeded; all references updated atomically
   fi
   trap '' ERR
   unset GIT_TRANSACTION

   The GIT_TRANSACTION environment variable would tell git to read
   from the usual references, overridden with any reference changes
   that have occurred during the transaction, but write any changes
   (including both old and new values) to the transaction.  The command
   git transaction commit would verify that the old values listed in
   the transaction still agree with the current values, and then make
   all of the changes atomically.

Yay!

Gerrit Code Review really wants to get transactions implemented. So I
am very much in favor of trying to improve the situation in git-core.

We want not only a transaction over 2+ references in the same
repository, but we also want to perform transactions across
repositories. Consider a git submodule child and parent being updated
at the same time. We really want to update refs/heads/master in both
repositories atomically at the central server.

   Such transactions could also be broadcast to mirrors when they are
   committed to keep multiple Git repositories in sync.

Ooh, this would be very interesting.

 Git hosters [1] will be likely to take advantage of alternate
 reference backends pretty easily, because they know which tools touch
 their repositories and need only update those tools.  It is expected
 that alternate reference backends will be useful for hosters even if
 they don't become practical for end-users.

Alternate reference backends are absolutely useful to large hosters.
The loose reference format isn't very scalable. The packed-refs helps,
but you can do better. IIRC our android.googlesource.com reference
backend uses only 79 bytes per reference on average, including both
the name string and the value. This super compact format is easy to
hold in RAM for hundreds of busy repositories.

 For end-users it is important that their repository be readable by all
 of the tools that they use.  So if we want to make a new format a
 viable option for normal Git users (let alone make it the new default
 format), some coordination will be needed between all of the
 commonly-used Git implementations (git-core, libgit2, JGit, and maybe
 Dulwich, Grit, ...).  Whether or not this happens in real life depends
 on how advantageous the hypothetical new format is to Git users and is
 beyond the scope of this proposal.

It is sad we have this many implementations, but as one of the authors
(JGit) I am happy to at least see you are worrying about compatibility
with them.

Re: [RFC/WIP] Pluggable reference backends

2014-03-10 Thread Max Horn


On 10.03.2014, at 15:30, Shawn Pearce spea...@spearce.org wrote:

 On Mon, Mar 10, 2014 at 4:00 AM, Michael Haggerty mhag...@alum.mit.edu 
 wrote:
 I have started working on pluggable ref backends.  In this email I
 would like to share my plans and solicit feedback.
 
 Yay!

Yay, too!

 JGit already has pluggable ref backends, so it is good to see this
 starting in git-core.
 
 FWIW the Gerrit Code Review community is interested in this project.
 
 * Store references in a SQLite database, to get correct transaction
  handling.
 
 No to SQLLite in git-core. Using it from JGit requires building
 SQLLite and a JNI wrapper, which makes JGit significantly less
 portable. I know SQLLite is pretty amazing, but implementing
 compatibility with it from JGit will be a big nightmare for us.

I understood this as an example (indeed, it is listed under "Wild new ideas"),
not a proposal to put this into the git core. It might be an interesting
experiment in any case, and if the proposed modularity is truly achieved, it
could (if there was any interest in it, that is) be implemented in an external
3rd party project.


Anyway, I am quite excited about this project. Usually, I am quite skeptical
about such large-scope ideas ("Yeah, cool idea, but who will pull it off, and
with which resources?"). But this one seems to have a good chance of being
implemented gradually and inside the main repository, with the help of feature
flags.

Thus, I am looking forward to Michael's announced initial patch series. I feel
that I don't know enough yet about git overall to be of much help on my own at
this point. But perhaps over time some mini- or micro-projects will pop up where
others can help (e.g. "adapt these 50 tests to work with the 'quagga' ref"); if
they are pointed out (assuming that doing so isn't more work than just
addressing them yourself ;-), I am willing to help out.


Cheers,
Max



Re: [RFC/WIP] Pluggable reference backends

2014-03-10 Thread Jeff King

On Mon, Mar 10, 2014 at 07:30:45AM -0700, Shawn Pearce wrote:

  * Store references in a SQLite database, to get correct transaction
handling.
 
 No to SQLLite in git-core. Using it from JGit requires building
 SQLLite and a JNI wrapper, which makes JGit significantly less
 portable. I know SQLLite is pretty amazing, but implementing
 compatibility with it from JGit will be a big nightmare for us.

That seems like a poor reason not to implement a pluggable feature for
git-core. If we implement it, then a site using only git-core can take
advantage of it. Sites with JGit cannot, and would use a different
pluggable storage mechanism that's supported by both. But if we don't
implement, it hurts people using only git-core, and it does not help
sites using JGit at all.

That's assuming that attention spent on implementing the feature does
not take away from implementing some other parallel scheme that does the
same thing but does not use SQLite. I don't know what that would be
offhand; mapping the ref and reflog into a relational database is pretty
simple, and we get a lot of robustness and efficiency benefits for free.
We could perhaps have some kind of relational backend that could use an
ODBC-like abstraction to point to a database. I have no idea if people
would want to ever store refs in a real server-backend RDBMS, but I
suspect Java has native support for such things.
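As a rough illustration of how simple the relational mapping could be, here is a sketch using Python's standard sqlite3 module. The table layout and the helpers open_ref_db(), read_ref() and update_refs() are assumptions made for this example, not an agreed design.

```python
import sqlite3


def open_ref_db(path=":memory:"):
    """Create the (assumed) schema: one table for refs, one for reflogs."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS refs ("
               "name TEXT PRIMARY KEY, value TEXT NOT NULL)")
    db.execute("CREATE TABLE IF NOT EXISTS reflog ("
               "id INTEGER PRIMARY KEY, name TEXT NOT NULL, "
               "old_value TEXT, new_value TEXT, message TEXT)")
    return db


def read_ref(db, name):
    row = db.execute("SELECT value FROM refs WHERE name = ?",
                     (name,)).fetchone()
    return row[0] if row else None


def update_refs(db, updates, message=""):
    """Apply (name, expected_old, new) updates atomically; True on success."""
    try:
        with db:  # commits on success, rolls back if an exception escapes
            for name, old, new in updates:
                if read_ref(db, name) != old:
                    raise ValueError(f"{name} does not have the expected value")
                db.execute("INSERT OR REPLACE INTO refs (name, value) "
                           "VALUES (?, ?)", (name, new))
                db.execute("INSERT INTO reflog (name, old_value, new_value, "
                           "message) VALUES (?, ?, ?, ?)",
                           (name, old, new, message))
        return True
    except ValueError:
        return False
```

Because ref names are just primary-key strings here, "refs/heads/foo" and "refs/heads/foo/bar" can coexist, and reflog rows can outlive the ref they describe.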

Certainly I think we should aim for compatibility where we can, but if
there's not a compatible way to do something, I don't think the
limitations of one platform should drag other ones down. And that goes
both ways; we had to reimplement disk-compatible EWAH from scratch in C
for git-core to have bitmaps, whereas JGit just got to use a ready-made
library. I don't think that was a bad thing.  People in
mixed-implementation environments couldn't use it, but people with
JGit-only environments were free to take advantage of it.

At any rate, the repository needs to advertise "this is the ref storage
mechanism I use" in the config. We're going to need to bump
core.repositoryformatversion for such cases (because an old version of
git should not blindly lock and write to a refs/ directory that nobody
else is ever going to look at). And I'd suggest with that bump adding in
something like core.refstorage, so that an implementation can say
"foobar ref storage? Never heard of it" and barf, whether it's because
that implementation doesn't support "foobar", because it's an old
version that doesn't understand "foobar" yet, or because it was simply
built without "foobar" support.
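The check suggested here could be sketched as below. Everything specific is an assumption for illustration: that the bumped core.repositoryformatversion would be 1, that the key would be called core.refstorage, and that the existing scheme would be named "files". The config is passed in as a plain dict rather than parsed from .git/config.

```python
SUPPORTED_REF_BACKENDS = {"files"}  # what this build understands (assumed)


class UnsupportedRepository(Exception):
    pass


def select_ref_backend(config):
    """Return the ref backend named by the config, or refuse to proceed."""
    version = int(config.get("core.repositoryformatversion", "0"))
    if version == 0:
        # Old-format repository: always the classic refs/ directory.
        return "files"
    if version > 1:
        raise UnsupportedRepository(
            f"unknown repository format version {version}")
    backend = config.get("core.refstorage", "files")
    if backend not in SUPPORTED_REF_BACKENDS:
        raise UnsupportedRepository(
            f"{backend} ref storage? Never heard of it")
    return backend
```

An old git that only knows version 0 would already refuse the bumped version number, which is what keeps it from writing to a refs/ directory nobody else reads.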

-Peff

Re: [RFC/WIP] Pluggable reference backends

2014-03-10 Thread David Kastrup

Jeff King p...@peff.net writes:

 On Mon, Mar 10, 2014 at 07:30:45AM -0700, Shawn Pearce wrote:

  * Store references in a SQLite database, to get correct transaction
handling.
 
 No to SQLLite in git-core. Using it from JGit requires building
 SQLLite and a JNI wrapper, which makes JGit significantly less
 portable. I know SQLLite is pretty amazing, but implementing
 compatibility with it from JGit will be a big nightmare for us.

 That seems like a poor reason not to implement a pluggable feature for
 git-core. If we implement it, then a site using only git-core can take
 advantage of it. Sites with JGit cannot, and would use a different
 pluggable storage mechanism that's supported by both. But if we don't
 implement, it hurts people using only git-core, and it does not help
 sites using JGit at all.

Of course, the basic premise for this feature is "let's assume that our
file and/or operating system suck at providing file system functionality
at file name granularity".  There have been two historical approaches
to that problem that are not independent: a) use Linux b) kick Linus.

Option b) has been fairly successful over quite a bit of time, but at
the current point of time, it has become harder to aim that kick on a
single person and/or where it counts.

The database approach is an alternative approach based on kicking an
alternate set of people, namely database rather than operating system
providers, based on the assumption that the former have softer behinds
(the backend-based approach) making them more sensitive to kicking.

So the database approach is most promising on the "what are we going to
do if our operating system vendor won't bother with sensible file system
performance" angle.  Which isn't doing total system architecture a
favor.

Personally, I have little sympathy for helping subpar systems, keeping
them on life support while they are in turn trying to squish the better
systems.

But then it is not me doing the actual work, so this is no more than an
idle reflection.

-- 
David Kastrup

Re: [RFC/WIP] Pluggable reference backends

2014-03-10 Thread David Lang


On Mon, 10 Mar 2014, David Kastrup wrote:

 Jeff King p...@peff.net writes:

 On Mon, Mar 10, 2014 at 07:30:45AM -0700, Shawn Pearce wrote:

 * Store references in a SQLite database, to get correct transaction
   handling.

 No to SQLLite in git-core. Using it from JGit requires building
 SQLLite and a JNI wrapper, which makes JGit significantly less
 portable. I know SQLLite is pretty amazing, but implementing
 compatibility with it from JGit will be a big nightmare for us.

 That seems like a poor reason not to implement a pluggable feature for
 git-core. If we implement it, then a site using only git-core can take
 advantage of it. Sites with JGit cannot, and would use a different
 pluggable storage mechanism that's supported by both. But if we don't
 implement, it hurts people using only git-core, and it does not help
 sites using JGit at all.

 Of course, the basic premise for this feature is let's assume that our
 file and/or operating system suck at providing file system functionality
 at file name granularity.  There have been two historically approaches
 to that problem that are not independent: a) use Linux b) kick Linus.

As a note, if this is done properly, it could allow for plugins that connect to
the underlying storage system (similar to the Facebook Mercurial change).

Even for those who don't have the $ storage arrays, there may be other
storage-specific hacks that can be done to detect that files haven't changed.

For example, with btrfs, if you compile into a different directory than your
source, you may be able to detect that things didn't change by the fact that the
filesystem didn't have to do a rewrite of the parent node.


David Lang

Re: [RFC/WIP] Pluggable reference backends

2014-03-10 Thread Junio C Hamano

Jeff King p...@peff.net writes:

 On Mon, Mar 10, 2014 at 07:30:45AM -0700, Shawn Pearce wrote:

  * Store references in a SQLite database, to get correct transaction
handling.
 
 No to SQLLite in git-core. Using it from JGit requires building
 SQLLite and a JNI wrapper, which makes JGit significantly less
 portable. I know SQLLite is pretty amazing, but implementing
 compatibility with it from JGit will be a big nightmare for us.

 That seems like a poor reason not to implement a pluggable feature for
 git-core. If we implement it, then a site using only git-core can take
 advantage of it. Sites with JGit cannot, and would use a different
 pluggable storage mechanism that's supported by both. But if we don't
 implement, it hurts people using only git-core, and it does not help
 sites using JGit at all.

We would need to eventually have at least one backend that we know
will play well with different Git implementations that matter
(namely, git-core, JGit and libgit2) before the feature can be
widely adopted.

The first backend that is used while the plugging-interface is in
development can be anything and does not have to be that eventual
ubiquitous one, however, as long as it is something that we do not
mind carrying forever, along with that final reference backend.  I
take the objection from Shawn only as against making sqlite that
final one.




Re: [RFC/WIP] Pluggable reference backends

2014-03-10 Thread Jeff King

On Mon, Mar 10, 2014 at 10:46:01AM -0700, Junio C Hamano wrote:

  No to SQLLite in git-core. Using it from JGit requires building
  SQLLite and a JNI wrapper, which makes JGit significantly less
  portable. I know SQLLite is pretty amazing, but implementing
  compatibility with it from JGit will be a big nightmare for us.
 
  That seems like a poor reason not to implement a pluggable feature for
  git-core. If we implement it, then a site using only git-core can take
  advantage of it. Sites with JGit cannot, and would use a different
  pluggable storage mechanism that's supported by both. But if we don't
  implement, it hurts people using only git-core, and it does not help
  sites using JGit at all.
 
 We would need to eventually have at least one backend that we know
 will play well with different Git implementations that matter
 (namely, git-core, Jgit and libgit2) before the feature can be
 widely adopted.

I assumed that the current refs/ and logs/ code, massaged into pluggable
backend form, would be the first such. And I wouldn't be surprised to
see some iteration on that once it is easier to move from scheme to
scheme (e.g., to use some encoding of the names on the filesystem to
avoid D/F conflicts, and thus allow reflogs for deleted refs).
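One possible encoding of this kind is the one sketched in the original proposal, where refs/heads/Foo.42 is stored as refs.dir/heads.dir/%46oo%2e42. The exact escaping rules below (keep only lowercase ASCII letters, digits, "-" and "_"; percent-encode every other byte in lowercase hex) are an assumption reverse-engineered from that single example, and the function names are invented.

```python
SAFE = set("abcdefghijklmnopqrstuvwxyz0123456789-_")


def encode_component(component):
    """Percent-encode every byte outside SAFE (assumed rule), lowercase hex."""
    return "".join(
        chr(b) if chr(b) in SAFE else "%{:02x}".format(b)
        for b in component.encode("utf-8")
    )


def encode_ref_path(refname):
    """Map a ref name to a relative path with ".dir"-suffixed directories."""
    *dirs, leaf = refname.split("/")
    parts = [encode_component(d) + ".dir" for d in dirs]
    parts.append(encode_component(leaf))
    return "/".join(parts)


def encode_reflog_path(refname):
    """Reflogs live next to the ref, distinguished by a ".log" suffix."""
    return encode_ref_path(refname) + ".log"
```

Since "." is always escaped inside a component, an encoded name can never end in ".dir" or ".log" by accident. "my-feature" maps to a file named my-feature while "my-feature/base" lives under my-feature.dir/, so the two no longer collide, and names differing only in case map to paths that differ in more than case.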

 The first backend that is used while the plugging-interface is in
 development can be anything and does not have to be one that
 eventual ubiquitous one, however; as long as it is something that we
 do not mind carrying it forever, along with that final reference
 backend.  I take the objection from Shawn only as against making the
 sqlite that final one.

Sure, I'd agree with that. I'd think something like an sqlite interface
would be mainly of interest to people running busy servers. I don't know
that it would make a good default.

-Peff

Re: [RFC/WIP] Pluggable reference backends

2014-03-10 Thread Jeff King

On Mon, Mar 10, 2014 at 05:14:02PM +0100, David Kastrup wrote:

 [storing refs in sqlite]

 Of course, the basic premise for this feature is let's assume that our
 file and/or operating system suck at providing file system functionality
 at file name granularity.  There have been two historically approaches
 to that problem that are not independent: a) use Linux b) kick Linus.

You didn't define "suck" here, but there are a number of issues with the
current ref storage system. Here is a sampling:

  1. The filesystem does not present an atomic view of the data (e.g.,
     you read "a", then while you are reading "b", somebody else updates
     "a"; your view is one that never existed at any point in time).

  2. Using the filesystem creates D/F conflicts between branches "foo"
     and "foo/bar". Because this name is a primary key even for the
     reflogs, we cannot easily persist reflogs after the ref is removed.

  3. We use packed-refs in conjunction with loose ones to achieve
     reasonable performance when there are a large number of refs. The
     scheme for determining the current value of a ref is complicated
     and error-prone (we had several race conditions that caused real
     data loss).
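For readers unfamiliar with item 3, the basic lookup rule can be sketched as follows: a loose ref, if present, wins over the entry in packed-refs. This is a deliberately simplified model with invented helper names; loose refs are passed in as a dict, and symbolic refs, locking and the races mentioned above are all ignored.

```python
def parse_packed_refs(text):
    """Parse packed-refs content into a dict of name -> object id."""
    packed = {}
    for line in text.splitlines():
        if not line or line.startswith("#") or line.startswith("^"):
            continue  # header comment or peeled-tag line
        value, name = line.split(" ", 1)
        packed[name] = value
    return packed


def resolve_ref(name, loose, packed):
    """A loose ref overrides a packed one; otherwise fall back to packed."""
    if name in loose:
        return loose[name]
    return packed.get(name)
```

Even in this toy form the awkward part is visible: deleting a ref means removing the loose entry and also rewriting the packed data, otherwise the stale packed value would show through again.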

Those things can be solved through better support from the filesystem.
But they were also solved decades ago by relational databases.

I generally avoid databases where possible. They lock your data up in a
binary format that you can't easily touch with standard unix tools. And
they introduce complexity and opportunity for bugs.

But they are also a proven technology for solving exactly the sorts of
problems that some people are having with git. I do not see a reason not
to consider them as an option for a pluggable refs system. But I also do
not see a reason to inflict their costs on people who do not have those
problems. And that is why Michael's email is about _pluggable_ ref
backends, and not "let's convert git to sqlite".

I do not even know if sqlite is going to end up as an interesting
option. But it will be nice to be able to experiment with it easily due
to git's ref code becoming more modular.
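
As a rough idea of what such an experiment might look like, here is a small Python sketch of a ref store on top of the standard sqlite3 module. Everything in it is an assumption made for illustration (the class name, the two methods, the one-table schema); it is not the interface proposed in Michael's series. It only shows the two properties discussed above: a batch of updates is applied all-or-nothing, and ref names are plain keys.

```python
import sqlite3


class RefUpdateError(Exception):
    """Raised when a ref's current value is not what the caller expected."""


class SqliteRefBackend:
    """Hypothetical backend: one row per ref, updates grouped in a transaction."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS refs (name TEXT PRIMARY KEY, sha TEXT NOT NULL)"
        )

    def read_ref(self, name):
        row = self.db.execute(
            "SELECT sha FROM refs WHERE name = ?", (name,)
        ).fetchone()
        return row[0] if row else None

    def update_refs(self, updates):
        """Apply [(name, expected_old, new)] all-or-nothing.

        expected_old=None means "must not exist yet"; new=None deletes the ref.
        """
        with self.db:  # commits on success, rolls back if an exception escapes
            for name, expected_old, new in updates:
                if self.read_ref(name) != expected_old:
                    raise RefUpdateError(name)
                if new is None:
                    self.db.execute("DELETE FROM refs WHERE name = ?", (name,))
                else:
                    self.db.execute(
                        "INSERT OR REPLACE INTO refs (name, sha) VALUES (?, ?)",
                        (name, new),
                    )


backend = SqliteRefBackend()
backend.update_refs([
    ("refs/heads/foo", None, "a" * 40),
    ("refs/heads/foo/bar", None, "b" * 40),  # names are just keys: no D/F conflict
])

try:
    backend.update_refs([
        ("refs/heads/foo", "a" * 40, "c" * 40),
        ("refs/heads/foo/bar", "stale", "d" * 40),  # wrong old value: batch fails
    ])
    rolled_back = False
except RefUpdateError:
    rolled_back = True
```

After the failed batch, "refs/heads/foo" still holds its old value, because the first update was rolled back together with the second. A files-based backend exposing the same two methods could be swapped in without callers noticing, which is the point of making the code modular.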

-Peff

Re: [RFC/WIP] Pluggable reference backends

2014-03-10 Thread David Kastrup

Jeff King p...@peff.net writes:

 On Mon, Mar 10, 2014 at 05:14:02PM +0100, David Kastrup wrote:

 [storing refs in sqlite]

 Of course, the basic premise for this feature is "let's assume that our
 file and/or operating system suck at providing file system functionality
 at file name granularity".  There have been two historical approaches
 to that problem that are not independent: a) use Linux b) kick Linus.

 You didn't define "suck" here, but there are a number of issues with the
 current ref storage system. Here is a sampling:

   1. The filesystem does not present an atomic view of the data (e.g.,
      you read "a", then while you are reading "b", somebody else updates
      "a"; your view is one that never existed at any point in time).

If there are no system calls suitable for addressing this problem that
fundamentally concerns the use of the file system as a file-name
addressed data store, I don't see why "kick Linus" would not apply here.

   2. Using the filesystem creates D/F conflicts between branches "foo"
      and "foo/bar". Because this name is a primary key even for the
      reflogs, we cannot easily persist reflogs after the ref is
      removed.

That actually sounds more like "kick Junio" territory (the wonderful
times when "kick Linus" could achieve almost anything are over).  To
wit: this sounds like a design shortcoming in Git's use of filesystems,
not something that is actually inherent in the use of files.
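
For readers who have not run into a D/F (directory/file) conflict, it can be reproduced with plain files and no git at all. The Python sketch below assumes the simplest possible mapping, one file per ref with the ref name used directly as the path; write_loose_ref() is a made-up helper, not a git function.

```python
import os
import tempfile


def write_loose_ref(git_dir, refname, sha):
    """Store a ref as one file whose path is the ref name itself."""
    path = os.path.join(git_dir, *refname.split("/"))
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(sha + "\n")


demo_dir = tempfile.mkdtemp()
write_loose_ref(demo_dir, "refs/heads/foo", "a" * 40)

try:
    # "foo" already exists as a file, so it cannot also become a directory.
    write_loose_ref(demo_dir, "refs/heads/foo/bar", "b" * 40)
    conflict = False
except OSError:
    conflict = True
```

The failure comes from the name-to-path mapping rather than from files as such: a mapping that never uses a ref name as both a file and a directory would not hit it.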

   3. We use packed-refs in conjunction with loose ones to achieve
      reasonable performance when there are a large number of refs. The
      scheme for determining the current value of a ref is complicated
      and error-prone (we had several race conditions that caused real
      data loss).

Again, that sounds like we are talking about a scenario that is not a
problem of files inherently but rather of Git's ways of managing them.

 Those things can be solved through better support from the filesystem.
 But they were also solved decades ago by relational databases.

Relational databases that are not implemented on raw storage managed by
database servers will still map their operations to file operations.

 But they are also a proven technology for solving exactly the sorts of
 problems that some people are having with git. I do not see a reason
 not to consider them as an option for a pluggable refs system.

But I think it would be wrong to try solving 2. above at the database
level when its actual problem lies with the reference-filename mapping
scheme.

-- 
David Kastrup

Re: [RFC/WIP] Pluggable reference backends

2014-03-10 Thread Michael Haggerty

On 03/10/2014 04:52 PM, Jeff King wrote:
 On Mon, Mar 10, 2014 at 07:30:45AM -0700, Shawn Pearce wrote:
 
 * Store references in a SQLite database, to get correct transaction
   handling.

 No to SQLite in git-core. Using it from JGit requires building
 SQLite and a JNI wrapper, which makes JGit significantly less
 portable. I know SQLite is pretty amazing, but implementing
 compatibility with it from JGit will be a big nightmare for us.
 
 That seems like a poor reason not to implement a pluggable feature for
 git-core. If we implement it, then a site using only git-core can take
 advantage of it. Sites with JGit cannot, and would use a different
 pluggable storage mechanism that's supported by both. But if we don't
 implement it, it hurts people using only git-core, and it does not help
 sites using JGit at all.

I think it's important to distinguish between two types of backend:

* Exotic backends, optimized for servers, or embedded systems, or other
controlled environments where the person deploying Git can decide about
the whole technology stack.  Here I say "let a thousand flowers bloom".
If user A wants to try an Oracle backend and only uses JGit, there's no
need for him to implement the equivalent backend for git-core or libgit2.

* Mainstream backends, intended for use by end-users on their
workstations and notebooks.  Such backends will be pretty worthless if
they are not supported more or less universally, because one user will
want to use the command line and Eclipse, another Visual Studio and
TortoiseGit, a third will use GitHub for Mac plus a bunch of shell
scripts written by his IT department.  A backend that is not supported
by the big three Git implementations (git-core, libgit2, and JGit) will
probably be rejected by users.  Realistically there will be at most a
couple of mainstream backends--in fact probably usually a single
established one and occasionally a single next-generation one waiting
for people to migrate slowly to it.  For mainstream backends I think it
is important for the implementations to plan and coordinate ahead of
time to make sure everybody's concerns are addressed.

It sounds to me like Shawn is saying "please don't make a SQLite-based
backend the new default git-core backend" and Peff is saying "there is
no reason that a Git hosting service shouldn't experiment with a
SQLite-based backend".  I see no contradiction there [1].

Also, please remember that I'm not advocating a SQLite backend or any
other at this time.  I'm only refactoring code to open the way for
*future* flamefests :-)

Michael

[1] There might of course be a technical argument about whether a
SQLite-based backend would be SO AWESOME for end-users that switching to
it would be worth the extra inconvenience for the JGit folks.
Personally I'm skeptical.

-- 
Michael Haggerty
mhag...@alum.mit.edu
http://softwareswirl.blogspot.com/

Re: [RFC/WIP] Pluggable reference backends

2014-03-10 Thread Shawn Pearce

On Mon, Mar 10, 2014 at 2:07 PM, Michael Haggerty mhag...@alum.mit.edu wrote:
 On 03/10/2014 04:52 PM, Jeff King wrote:
 On Mon, Mar 10, 2014 at 07:30:45AM -0700, Shawn Pearce wrote:

 * Store references in a SQLite database, to get correct transaction
   handling.

 No to SQLite in git-core. Using it from JGit requires building
 SQLite and a JNI wrapper, which makes JGit significantly less
 portable. I know SQLite is pretty amazing, but implementing
 compatibility with it from JGit will be a big nightmare for us.

 That seems like a poor reason not to implement a pluggable feature for
 git-core. If we implement it, then a site using only git-core can take
 advantage of it. Sites with JGit cannot, and would use a different
 pluggable storage mechanism that's supported by both. But if we don't
 implement it, it hurts people using only git-core, and it does not help
 sites using JGit at all.

 I think it's important to distinguish between two types of backend:

 * Exotic backends, optimized for servers, or embedded systems, or other
 controlled environments where the person deploying Git can decide about
 the whole technology stack.  Here I say "let a thousand flowers bloom".
 If user A wants to try an Oracle backend and only uses JGit, there's no
 need for him to implement the equivalent backend for git-core or libgit2.

FWIW I have been running JGit derived servers using Google Bigtable
for reference storage for years. So yes in this sort of environment
let people do what they think is best for them.

 * Mainstream backends, intended for use by end-users on their
 workstations and notebooks.  Such backends will be pretty worthless if
 they are not supported more or less universally, because one user will
 want to use the command line and Eclipse, another Visual Studio and
 TortoiseGit, a third will use GitHub for Mac plus a bunch of shell
 scripts written by his IT department.  A backend that is not supported
 by the big three Git implementations (git-core, libgit2, and JGit) will
 probably be rejected by users.  Realistically there will be at most a
 couple of mainstream backends--in fact probably usually a single
 established one and occasionally a single next-generation one waiting
 for people to migrate slowly to it.  For mainstream backends I think it
 is important for the implementations to plan and coordinate ahead of
 time to make sure everybody's concerns are addressed.

Yes, this was my real concern. Eclipse users using EGit expect EGit to
be compatible with git-core at the filesystem level so they can do
something in EGit then switch to a shell and bang out a command, or
run a script provided by their project or co-worker. Build systems
often integrate with Git to e.g. embed `git describe` output into the
binary. In mainstream use cross compatibility of the tools within a
single working directory is something that I think users have come to
expect.

 It sounds to me like Shawn is saying "please don't make a SQLite-based
 backend the new default git-core backend" and Peff is saying "there is
 no reason that a Git hosting service shouldn't experiment with a
 SQLite-based backend".  I see no contradiction there [1].

Yes. :-)

 Also, please remember that I'm not advocating a SQLite backend or any
 other at this time.  I'm only refactoring code to open the way for
 *future* flamefests :-)

 Michael

 [1] There might of course be a technical argument about whether a
 SQLite-based backend would be SO AWESOME for end-users that switching to
 it would be worth the extra inconvenience for the JGit folks.
 Personally I'm skeptical.

If it was really that amazing, yes, we would probably support it in
JGit for those that need that amazing.

But I tend to think we can (usually) find a simpler format that would
provide many of the same benefits with fewer of the drawbacks of
locking the data up into SQLite's file format. I'm with Peff, I kind
of like the fact that most of the Git data is easy to inspect by hand,
or with some simple tools written in Git's source tree. Starting with
"go get this other SQLite tool first, then write this code" is a lot
less fun.