Re: Pluggable backends for refs,wip

2014-08-08 Ronnie Sahlberg
On Thu, Aug 7, 2014 at 5:57 AM, Michael Haggerty mhag...@alum.mit.edu wrote:
 On 08/05/2014 02:40 PM, Ronnie Sahlberg wrote:
 Please see
 https://github.com/rsahlberg/git/tree/backend-struct-db-2
 for an example of a pluggable backend for refs storage.

 This series contains changes to make it possible to add new backends
 for handling/storage of refs and implements one new backend:
 refs-be-db.c.

 This new backend offloads the actual refs handling to a small database
 daemon with which it talks via a very simple rpc protocol. That
 daemon in turn then connects to the datastore and reads/writes the
 values to it.
 [...]

 Ronnie,

 This is awesome!  Congratulations on your progress.

 I'm still on vacation and haven't yet looked at the code.  I will be
 back next week and hope to find time to check it out, and also to do
 some more review of the code that you have already submitted to git core.

Thanks!



 Have you thought about how to test alternate reference backends?  This
 will be very important to getting one or more of them accepted into git
 core (not to mention giving people confidence to actually *use* them!)

I have thought about it and have also done some experiments.
For the initial git support, I think we should first try to get the
pluggable backend support into git, along with the work to turn the
current files backend into a built-in pluggable backend.

I.e. get everything in the
https://github.com/rsahlberg/git/tree/backend-struct-db-2
branch except the last three patches.
That brings us to a stage where we have pluggable backend support and
we have one backend, the files backend, that works just like today.

The last three patches in that series are then just a confirmation that
the pluggable backend approach works, and we can add them a little
later once we have finished the tests and other things.
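To make "pluggable backend support" concrete: a minimal C sketch of the usual shape, a table of function pointers per backend plus a lookup by name that falls back to a default. All names here (struct ref_be, read_ref, update_ref, find_ref_be) and the toy in-memory backend are hypothetical illustrations, not the actual API of the backend-struct-db-2 branch; in that branch the default entry would be the existing files backend.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-backend method table (illustrative names only). */
struct ref_be {
	const char *name;
	int (*read_ref)(const char *refname, char *out, size_t outlen);
	int (*update_ref)(const char *refname, const char *value);
};

/* Toy in-memory backend standing in for a real one (files, db, ...). */
#define MEM_SLOTS 8
static struct {
	char name[64];
	char value[41]; /* 40 hex digits + NUL */
} mem_refs[MEM_SLOTS];

static int mem_read_ref(const char *refname, char *out, size_t outlen)
{
	for (int i = 0; i < MEM_SLOTS; i++) {
		if (mem_refs[i].name[0] && !strcmp(mem_refs[i].name, refname)) {
			if (strlen(mem_refs[i].value) + 1 > outlen)
				return -1; /* caller's buffer too small */
			strcpy(out, mem_refs[i].value);
			return 0;
		}
	}
	return -1; /* no such ref */
}

static int mem_update_ref(const char *refname, const char *value)
{
	int free_slot = -1;

	if (!*refname || strlen(refname) >= sizeof(mem_refs[0].name) ||
	    strlen(value) >= sizeof(mem_refs[0].value))
		return -1;
	for (int i = 0; i < MEM_SLOTS; i++) {
		if (mem_refs[i].name[0] && !strcmp(mem_refs[i].name, refname)) {
			strcpy(mem_refs[i].value, value); /* overwrite existing ref */
			return 0;
		}
		if (!mem_refs[i].name[0] && free_slot < 0)
			free_slot = i;
	}
	if (free_slot < 0)
		return -1; /* table full */
	strcpy(mem_refs[free_slot].name, refname);
	strcpy(mem_refs[free_slot].value, value);
	return 0;
}

static const struct ref_be ref_be_memory = {
	"memory", mem_read_ref, mem_update_ref
};

/* Every compiled-in backend; the first entry is the default. */
static const struct ref_be *const ref_backends[] = { &ref_be_memory };

/* Look up a backend by name; NULL selects the default. */
const struct ref_be *find_ref_be(const char *name)
{
	size_t n = sizeof(ref_backends) / sizeof(ref_backends[0]);

	if (!name)
		return ref_backends[0];
	for (size_t i = 0; i < n; i++)
		if (!strcmp(ref_backends[i]->name, name))
			return ref_backends[i];
	return NULL;
}
```

With this shape, callers go through be->update_ref() and never touch the storage directly, so a second backend such as refs-be-db.c would just be one more entry in ref_backends[].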



For tests, there is the issue that git-clone and git-init require two
additional arguments in order to set up and initialize a repository
that uses the database daemon backend.
I imagine other future backends would have similar needs.
The way I handled this in my experiments was to use two new
environment variables, GIT_INIT and GIT_CLONE, that default to
git-init and git-clone respectively, and then just override them with
GIT_INIT="git-init --db-repo-name=ROCKy --db-socket=/tmp/refsd.socket"
when I wanted the tests to initialize a database backend repository.
This required some updates to test-lib.sh and test-lib-functions.sh, as
well as to the tests themselves, to use ${GIT_INIT} instead of git-init
directly.
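The GIT_INIT override works because the variable holds a whole command line that the shell word-splits when ${GIT_INIT} is expanded unquoted (in test-lib.sh the default itself could be written as ${GIT_INIT:-git-init}). A sketch of the same default-with-override logic in C, assuming plain blank-separated words with no quoting; init_command_argv and MAX_INIT_ARGS are hypothetical names, not part of git's test library:

```c
#define _POSIX_C_SOURCE 200809L /* for strdup() */
#include <stdlib.h>
#include <string.h>

#define MAX_INIT_ARGS 16

/*
 * Hypothetical helper: build argv[] from the command line stored in
 * environment variable `var`, or from `fallback` if it is unset or
 * empty. Words are split on blanks only (no quoting), much like an
 * unquoted ${GIT_INIT} in a shell script. Returns argc (argv[argc] is
 * NULL) or -1 on allocation failure; the caller must free(*buf).
 */
int init_command_argv(const char *var, const char *fallback,
		      char *argv[MAX_INIT_ARGS + 1], char **buf)
{
	const char *value = getenv(var);
	int argc = 0;

	if (!value || !*value)
		value = fallback;
	*buf = strdup(value);
	if (!*buf)
		return -1;
	for (char *tok = strtok(*buf, " \t"); tok && argc < MAX_INIT_ARGS;
	     tok = strtok(NULL, " \t"))
		argv[argc++] = tok;
	argv[argc] = NULL;
	return argc;
}
```

With GIT_INIT unset this yields just git-init; with the database override above it yields git-init plus the two --db-* arguments.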

I am not sure what the best approach is here and would love it if you
could help out with this once we get the basic pluggable backend stuff
in.




 It seems to me that a few steps are needed:

 * Each backend would need a suite of backend-aware tests that verify
 proper operation *within* the backend.  These tests would mostly use
 low-level plumbing commands like update-ref to create/modify/delete
 references, and would be allowed to grub around in the filesystem, talk
 directly with the database, etc. to make sure that the commands have the
 correct effects.  For example, for the traditional filesystem backend,
 these tests would be the ones to check that creating a reference causes
 a file to spring into existence under $GIT_DIR/refs.

Yes.
Quite a few tests do muck around with the files directly. Some do so for
good reasons, but I think there are a lot of cases where the tests do
it just out of convenience.

For this we will need to convert the tests that don't strictly need to
muck around with the files to use a backend-agnostic method to do the
same checks.
For the tests that are truly testing the backend itself, such as a
hypothetical test to check that a symbolic link to a ref behaves as it
should, we will need a mechanism to conditionalize the tests based on
which backend is in use.
So, lots of "if backend == database then skip this test".
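The "skip this test unless..." rule boils down to a small predicate. A hedged C sketch with a hypothetical test descriptor, where requires_backend == NULL marks a backend-agnostic test; in git's shell test suite the same idea would more likely be expressed as a test prerequisite (test_set_prereq plus the optional prerequisite argument of test_expect_success) than as an explicit if:

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical test descriptor: requires_backend names the only backend
 * the test is valid for, or is NULL for a backend-agnostic test.
 */
struct ref_test {
	const char *name;
	const char *requires_backend;
};

/* Run a test unless it is tied to a backend other than the current one. */
int should_run(const struct ref_test *t, const char *current_backend)
{
	return !t->requires_backend ||
	       !strcmp(t->requires_backend, current_backend);
}

/* How many tests would be skipped when the suite runs on current_backend. */
size_t count_skipped(const struct ref_test *tests, size_t n,
		     const char *current_backend)
{
	size_t skipped = 0;

	for (size_t i = 0; i < n; i++)
		if (!should_run(&tests[i], current_backend))
			skipped++;
	return skipped;
}
```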



 The tests for pack-refs, and all tests that care about the distinction
 between packed and loose refs, would become part of the backend-aware
 tests for the filesystem backend.

 All of the backend-aware tests should be run every time the test suite
 is run (provided, of course, that the correct prerequisites are
 available, and subject to being turned off manually).

 * The rest of the test suite has to be made backend-agnostic.  For
 example, such tests should *not* be allowed to look under $GIT_DIR for
 the existence/absence of loose reference files [1] but would rather have
 to inquire about references via git commands.

 * It should be possible for the developer to choose easily which
 reference backend to use when running the agnostic part of the test
 suite.  The chosen backend should be used to run *all* backend-agnostic
 tests.


Agree.
It would be great if we could work on this together.


 A database-backed backend might even want to be testable in two modes:
 one with the DB daemon running constantly, and one where the daemon is
 stopped and started between each pair of Git commands.

Re: Pluggable backends for refs,wip

2014-08-07 Michael Haggerty
On 08/05/2014 02:40 PM, Ronnie Sahlberg wrote:
 Please see
 https://github.com/rsahlberg/git/tree/backend-struct-db-2
 for an example of a pluggable backend for refs storage.
 
 This series contains changes to make it possible to add new backends
 for handling/storage of refs and implements one new backend:
 refs-be-db.c.
 
 This new backend offloads the actual refs handling to a small database
 daemon with which it talks via a very simple rpc protocol. That
 daemon in turn then connects to the datastore and reads/writes the
 values to it.
 [...]

Ronnie,

This is awesome!  Congratulations on your progress.

I'm still on vacation and haven't yet looked at the code.  I will be
back next week and hope to find time to check it out, and also to do
some more review of the code that you have already submitted to git core.


Have you thought about how to test alternate reference backends?  This
will be very important to getting one or more of them accepted into git
core (not to mention giving people confidence to actually *use* them!)

It seems to me that a few steps are needed:

* Each backend would need a suite of backend-aware tests that verify
proper operation *within* the backend.  These tests would mostly use
low-level plumbing commands like update-ref to create/modify/delete
references, and would be allowed to grub around in the filesystem, talk
directly with the database, etc. to make sure that the commands have the
correct effects.  For example, for the traditional filesystem backend,
these tests would be the ones to check that creating a reference causes
a file to spring into existence under $GIT_DIR/refs.

The tests for pack-refs, and all tests that care about the distinction
between packed and loose refs, would become part of the backend-aware
tests for the filesystem backend.

All of the backend-aware tests should be run every time the test suite
is run (provided, of course, that the correct prerequisites are
available, and subject to being turned off manually).

* The rest of the test suite has to be made backend-agnostic.  For
example, such tests should *not* be allowed to look under $GIT_DIR for
the existence/absence of loose reference files [1] but would rather have
to inquire about references via git commands.

* It should be possible for the developer to choose easily which
reference backend to use when running the agnostic part of the test
suite.  The chosen backend should be used to run *all* backend-agnostic
tests.
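As an example of a check that belongs only in the backend-aware tests for the filesystem backend, here is a sketch of "does a loose ref file exist under $GIT_DIR?" in C. loose_ref_exists is a hypothetical helper built on POSIX stat(). Note that a 0 result does not mean the reference is absent, since it may live in packed-refs; that is exactly why the backend-agnostic tests should ask git (for example via git show-ref) instead of looking at files.

```c
#define _POSIX_C_SOURCE 200809L /* for stat() and S_ISREG */
#include <stdio.h>
#include <sys/stat.h>

/*
 * Hypothetical files-backend-only check: return 1 if $GIT_DIR/<refname>
 * exists as a regular file (a loose ref), 0 otherwise. A 0 does NOT
 * prove the ref is absent: it may be stored in packed-refs instead.
 */
int loose_ref_exists(const char *git_dir, const char *refname)
{
	char path[4096];
	struct stat st;
	int n = snprintf(path, sizeof(path), "%s/%s", git_dir, refname);

	if (n < 0 || (size_t)n >= sizeof(path))
		return 0; /* path too long: treat as not found */
	return stat(path, &st) == 0 && S_ISREG(st.st_mode);
}
```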

A database-backed backend might even want to be testable in two modes:
one with the DB daemon running constantly, and one where the daemon is
stopped and started between each pair of Git commands.

So after the changes, a single run of the test suite should run the
backend-aware tests for *all* known backends followed by the
backend-agnostic tests for a single selected backend.
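The run plan just described can be sketched as two passes over the test list. Everything here (struct ref_test, run_suite, the backend names) is a hypothetical illustration, and "the backend is compiled in" stands in for the "correct prerequisites are available" condition: a backend-aware test for an unknown backend is simply not run.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor: backend == NULL means backend-agnostic. */
struct ref_test {
	const char *name;
	const char *backend;
};

typedef void (*run_fn)(const struct ref_test *t, const char *backend);

static int is_known(const char *name, const char *const *known, size_t nknown)
{
	for (size_t i = 0; i < nknown; i++)
		if (!strcmp(known[i], name))
			return 1;
	return 0;
}

/*
 * One suite invocation: backend-aware tests for every known backend,
 * then all backend-agnostic tests under `selected`. Calls run() for
 * each (test, backend) pair when run is non-NULL; returns the count.
 */
size_t run_suite(const struct ref_test *tests, size_t ntests,
		 const char *const *known, size_t nknown,
		 const char *selected, run_fn run)
{
	size_t runs = 0;

	/* Pass 1: each backend-aware test against the backend it targets. */
	for (size_t i = 0; i < ntests; i++) {
		const char *be = tests[i].backend;

		if (be && is_known(be, known, nknown)) {
			if (run)
				run(&tests[i], be);
			runs++;
		}
	}
	/* Pass 2: every backend-agnostic test against the selected backend. */
	for (size_t i = 0; i < ntests; i++) {
		if (!tests[i].backend) {
			if (run)
				run(&tests[i], selected);
			runs++;
		}
	}
	return runs;
}
```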

Michael

[1] When I was working on my quagga-reference spike [2] I found that a
lot of the test suite uses knowledge about how references and reflogs
are stored by the filesystem backend and just grabs at the files rather
than accessing the references using git commands.  It will take some
work to clean this up.

[2] http://thread.gmane.org/gmane.comp.version-control.git/243726

-- 
Michael Haggerty
mhag...@alum.mit.edu

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Pluggable backends for refs,wip

2014-08-05 Nico Williams
Personally (a user of, not a maintainer of, git) I really want some
alternative backends.  In particular I'm after something like Fossil's
use of SQLite3; I want a SQLite3 backend for several reasons, not the
least of which is the power of SQL for looking at history.

I'm not sure that I necessarily want a daemon/background process.  I
get the appeal (add inotify and bingo, very fast git status, always),
but it seems likely to add obnoxious failure modes.

As to a SQLite3-type backend, I am of two minds: either add it as a
bolt-on to the builtin backend, or add it as a first-class backend
that replaces the builtin one.  The former is nice because the SQLite3
DB becomes more of a cache/index and query engine than a store, and
can be used without migrating any repos, but the latter is also nice
because SQLite3 provides strong ACID transactional semantics on local
filesystems.

Nico


Re: Pluggable backends for refs,wip

2014-08-05 Ronnie Sahlberg
On Tue, Aug 5, 2014 at 2:56 PM, Nico Williams n...@cryptonector.com wrote:
 Personally (a user of, not a maintainer of, git) I really want some
 alternative backends.  In particular I'm after something like Fossil's
 use of SQLite3; I want a SQLite3 backend for several reasons, not the
 least of which is the power of SQL for looking at history.

 I'm not sure that I necessarily want a daemon/background process.  I
 get the appeal (add inotify and bingo, very fast git status, always),
 but it seems likely to add obnoxious failure modes.

 As to a SQLite3-type backend, I am of two minds: either add it as a
 bolt-on to the builtin backend, or add it as a first-class backend
 that replaces the builtin one.  The former is nice because the SQLite3
 DB becomes more of a cache/index and query engine than a store, and
 can be used without migrating any repos, but the latter is also nice
 because SQLite3 provides strong ACID transactional semantics on local
 filesystems.


This will allow you to do either or both, depending on what you want.

I am adding one new first-class backend, refs-be-db.c, which talks to
a separate daemon, refsd-tdb.c.

refsd-tdb.c is a naive, standalone daemon implementation: 7 RPCs and
~500 lines of code.
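The thread only describes the daemon protocol as "a very simple rpc protocol" with 7 RPCs, so the following is purely an assumed wire format, not the one refsd-tdb.c actually uses: each request is a 4-byte opcode and a 4-byte payload length, both big-endian, followed by the payload. The two opcodes and all function names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical opcodes; the real RPCs are not named in the thread. */
enum refsd_op { REFSD_OP_READ_REF = 1, REFSD_OP_WRITE_REF = 2 };

#define REFSD_HDR_LEN 8 /* opcode + payload length */

static void put_be32(unsigned char *p, uint32_t v)
{
	p[0] = (unsigned char)(v >> 24);
	p[1] = (unsigned char)(v >> 16);
	p[2] = (unsigned char)(v >> 8);
	p[3] = (unsigned char)v;
}

static uint32_t get_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Encode one frame into buf; returns bytes written, or 0 if it won't fit. */
size_t refsd_encode(uint32_t op, const void *payload, uint32_t len,
		    unsigned char *buf, size_t buflen)
{
	if (buflen < REFSD_HDR_LEN || len > buflen - REFSD_HDR_LEN)
		return 0;
	put_be32(buf, op);
	put_be32(buf + 4, len);
	if (len)
		memcpy(buf + REFSD_HDR_LEN, payload, len);
	return REFSD_HDR_LEN + (size_t)len;
}

/* Parse one frame; returns 0 on success, -1 if buf holds an incomplete frame. */
int refsd_decode(const unsigned char *buf, size_t buflen, uint32_t *op,
		 const unsigned char **payload, uint32_t *len)
{
	uint32_t n;

	if (buflen < REFSD_HDR_LEN)
		return -1;
	n = get_be32(buf + 4);
	if (n > buflen - REFSD_HDR_LEN)
		return -1;
	*op = get_be32(buf);
	*len = n;
	*payload = buf + REFSD_HDR_LEN;
	return 0;
}
```

Length-prefixed framing of this kind lets the side reading the socket tell when a complete request has arrived: refsd_decode keeps returning -1 until all of the payload is buffered.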


If you would rather have a new first-class backend built into git
itself instead of a separate daemon, that will be possible too.
It just means that you will have to base the work on refs-be-db.c,
which is a much larger and more complex code base than refsd-tdb.c.

But yeah, once this work is finished, you will be able to build new
first-class ref backends if you so wish.
Please see refs-be-db.c: that file shows the methods you will need to
implement in order to have a first-class SQL* backend.


regards
ronnie sahlberg


Re: Pluggable backends for refs,wip

2014-08-05 Nico Williams
Excellent.  Thanks!