Re: [PATCH 00/26] Moving global state into the repository object (part 1)

2018-02-15 Thread Stefan Beller
On Thu, Feb 15, 2018 at 12:42 PM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> This is a real take on the first part of the recent RFC[1].
>
> For the patches remaining in this series, The scope is about right
> and the size is more manageable.

ok, thanks.

>  With topics already on 'master',
> they have some interactions:
>
>  - ot/mru-on-list & gs/retire-mru
>
>The packed_git MRU has been greatly simplified by using list API
>directly.
>
>  - cc/sha1-file-name
>
>The function signature of sha1_file_name() has been updated.
>
> I could certainly carry evil merge resolutions going forward (these
> changes need to be made to object-store.h that has no mechanical
> conflicts), but it may be less distracting for later readers of the
> series if we rebase it on top of a more recent master.

I was debating when to reroll this series as I want to make the
change that Duy proposed, moving the 'ignore_env' into the
object store as well.

I'll rebase on top of the latest master or these topicc while at it.

Thanks for the heads up.
Stefan


Re: [PATCH 00/26] Moving global state into the repository object (part 1)

2018-02-15 Thread Junio C Hamano
Stefan Beller  writes:

> This is a real take on the first part of the recent RFC[1].

For the patches remaining in this series, The scope is about right
and the size is more manageable.  With topics already on 'master',
they have some interactions:

 - ot/mru-on-list & gs/retire-mru

   The packed_git MRU has been greatly simplified by using list API
   directly.

 - cc/sha1-file-name

   The function signature of sha1_file_name() has been updated.

I could certainly carry evil merge resolutions going forward (these
changes need to be made to object-store.h that has no mechanical
conflicts), but it may be less distracting for later readers of the
series if we rebase it on top of a more recent master.


Re: [PATCH 00/26] Moving global state into the repository object (part 1)

2018-02-13 Thread Duy Nguyen
On Wed, Feb 14, 2018 at 2:26 AM, Jonathan Nieder  wrote:
>> Duy suggested that we shall not use the repository blindly, but should 
>> carefully
>> examine whether to pass on an object store or the refstore or such[4], which
>> I agree with if it makes sense. This series unfortunately has an issue with 
>> that
>> as I would not want to pass down the `ignore_env` flag separately from the 
>> object
>> store, so I made all functions that only take the object store to have the 
>> raw
>> object store as the first parameter, and others using the full repository.
>
> I think I want to push back on this a little.
>
> The advantage of a function taking e.g. an object_store as an argument
> instead of a repository is that it increases its flexibility, since it
> allows callers that do not have access to a repository to call it.  The
> disadvantage is also that it increases the flexibility without any
> callers benefitting from that:
>
>  1. It ties us to assumptions from today.  If e.g. an object access in
> the future starts relying on some other information from the
> repository (e.g. its config) then we'd have to either add a
> back-pointer from the object store to its repository or add
> additional arguments for that additional data at that point.
>
> If all callers already have a repository, it is simpler to pass
> that repository as context so that we have the flexibility to make
> more use of it later.

It's essentially putting all global variables in the same place again.
Only this time it's not global namespace, but "struct repository".
It's no worse than the current state though.

>
>  2. It complicates the caller.  Instead of consistently passing the
> same repository argument as context to functions that access that
> repository, the caller would have to pull out relevant fields like
> the object store from it.

Well, I see that as a good point :)

>
>  3. It prevents us from making opportunistic use of other information
> from the repository, such as its name for use in error messages.

It does not exactly prevent us. It's just more effort to pass this
around. The repository name for example, there's no reason we can't
have object store name (which could be initialized the same as repo
name).

> In lower-level funcitons that need to be usable by callers without a
> repository (e.g. to find packfiles in an alternate) it makes sense to
> not pass a repository, but without such a use case in mind

I do agree with your benefit argument. But I'd like to point out
though that having all input to object store visible from something
like "struct raw_object_store" makes it easier to reason about the
code. I know how object store works, but occasionally I'm still
surprised when it getenv (or read $GIT_DIR/index, but not in object
store code) behind the scene. Imagine how hard it is for newcomers.

I would count that as benefit, even though it's not a use case per se.
Another potential benefit is writing unit tests will be much easier
(you can configure object store through struct repository too, but
setting one piece here, one piece there to control object store
behavior is not a nice experience). It's a nice thing to have, but not
a deciding factor.

> I don't think it needs to be a general goal.

My stand is a bit more aggressive. We should try to achieve better
abstraction if possible. But if it makes Stefan's life hell, it's not
worth doing. Converting to 'struct repository' is already a step
forward. Actually if it discourages him from finishing this work, it's
already not worth doing.
-- 
Duy


Re: [PATCH 00/26] Moving global state into the repository object (part 1)

2018-02-13 Thread Junio C Hamano
Stefan Beller  writes:

> Which is why I'd strongly consider having it only in the repository
> object as that is the largest-scoped thing we'd want. e.g. submodules
> should care about environment variables differently:
>
> GIT_WORK_TREE=~/mysuperproject git checkout \
> --recurse-submodules master
>
>  So with such a command in mind, the environment variable would
> only apply to the superproject and the nested submodules should
> ignore the env, but compute their paths off the superproject, i.e.
> the superproject repository, not its object store?

In the longer term endgame state, what is in the repository object
cannot just be a simple "do we honor environment variable" bit
anymore.  

It will be more like "we may (or may not) look at environment when
we create a repository object", i.e. a bit passed to repo_init()
constructor, and from then on, the repository object knows where the
object store and its alternates are, where the top of the working
tree is, where the repository (i.e. the directory that has "refs/"
in it) is, what "worktree" of the repository we are talking about by
itself.  There is no need for a bit "do we or do we not look at
environment?" that needs to be consulted at runtime, which is quite
a round-about thing.

In your example, the repository object that represents the
superproject will pay attention to GIT_WORK_TREE environment when it
is consturcted, and repository objects dynamically constructed while
the command "recurse-submodules" through them will be constructed
with their "where is the top of my working tree?" set appropriately.
They won't be storing "when figuring out where my working tree is,
do not pay attention to GIT_WORK_TREE environment variable" bit.



Re: [PATCH 00/26] Moving global state into the repository object (part 1)

2018-02-13 Thread Stefan Beller
On Tue, Feb 13, 2018 at 11:35 AM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> For now the ignore_env bit lives in the repository, as that helps
>> when working with submodules, when reading its comments.
>> Unfortunately 359efeffc1 (repository: introduce the repository
>> object, 2017-06-22) did not reason about the existence of the ignore_env
>> flag in its commit message.
>>
>>> So from that point of view, it may not matter where the "bit" lives,
>>> among repository, object-store, or ref-store.
>>
>> It matters on the scale of confusing the developer?
>
> What I meant is that from the point of view, having the bit in the
> data (not on the invocation) is confusing no matter which data
> structure holds it--they are equally bad.

Right.

Which is why I'd strongly consider having it only in the repository
object as that is the largest-scoped thing we'd want. e.g. submodules
should care about environment variables differently:

GIT_WORK_TREE=~/mysuperproject git checkout \
--recurse-submodules master

 So with such a command in mind, the environment variable would
only apply to the superproject and the nested submodules should
ignore the env, but compute their paths off the superproject, i.e.
the superproject repository, not its object store?


Re: [PATCH 00/26] Moving global state into the repository object (part 1)

2018-02-13 Thread Junio C Hamano
Stefan Beller  writes:

> For now the ignore_env bit lives in the repository, as that helps
> when working with submodules, when reading its comments.
> Unfortunately 359efeffc1 (repository: introduce the repository
> object, 2017-06-22) did not reason about the existence of the ignore_env
> flag in its commit message.
>
>> So from that point of view, it may not matter where the "bit" lives,
>> among repository, object-store, or ref-store.
>
> It matters on the scale of confusing the developer?

What I meant is that from the point of view, having the bit in the
data (not on the invocation) is confusing no matter which data
structure holds it--they are equally bad.


Re: [PATCH 00/26] Moving global state into the repository object (part 1)

2018-02-13 Thread Brandon Williams
On 02/12, Stefan Beller wrote:
> This is a real take on the first part of the recent RFC[1].
> 
> Jonathan Tan suggested[2] that "sha1_loose_object_info to handle arbitrary 
> repositories"
> might be a good breaking point for a first part at that RFC at patch 38.
> This series is smaller and contains only 26 patches as the patches in the big
> RFC were slightly out of order.
> 
> I developed this series partly by writing patches, but mostly by cherrypicking
> from that RFC on top of current master. I noticed no external conflicts apart
> from one addition to the repositories _INIT macro, which was easy to resolve.
> 
> Comments in the early range of that RFC were on 003 where Junio pointed out
> that the coccinelle patch ought to be not in contrib/coccinelle, so I put it
> in a sub directory there, as 'make coccicheck' doesn't traverse subdirs.
> 
> brian had a questoin on patch 25 in the RFC, but that seemed to resolve itself
> without any suggestion to include into this series[3].
> 
> Duy suggested that we shall not use the repository blindly, but should 
> carefully
> examine whether to pass on an object store or the refstore or such[4], which 
> I agree with if it makes sense. This series unfortunately has an issue with 
> that
> as I would not want to pass down the `ignore_env` flag separately from the 
> object
> store, so I made all functions that only take the object store to have the raw
> object store as the first parameter, and others using the full repository.
> 
> Eric Sunshine brought up memory leaks with the RFC, and I would think to
> have plugged all holes.

I've looked through the patches and I think they look good.  At the end
of the series all the #define tricks have been eliminated so we don't
have to worry about them possibly being left and forgotten :)


Thanks for getting the ball rolling on this.

> 
> [1] https://public-inbox.org/git/20180205235508.216277-1-sbel...@google.com/
> [2] 
> https://public-inbox.org/git/20180207143300.ce1c39ca07f6a0d64fe0e...@google.com/
> [3] 
> https://public-inbox.org/git/20180206011940.gd7...@genre.crustytoothpaste.net/
> [4] 
> https://public-inbox.org/git/cacsjy8cggekpx4czkyytsprj87uqvkzsol7fyt__p2dh_1l...@mail.gmail.com/
> 
> Thanks,
> Stefan
> 
> Jonathan Nieder (8):
>   pack: move prepare_packed_git_run_once to object store
>   pack: move approximate object count to object store
>   sha1_file: add repository argument to sha1_file_name
>   sha1_file: add repository argument to map_sha1_file
>   sha1_file: allow stat_sha1_file to handle arbitrary repositories
>   sha1_file: allow open_sha1_file to handle arbitrary repositories
>   sha1_file: allow map_sha1_file_1 to handle arbitrary repositories
>   sha1_file: allow sha1_loose_object_info to handle arbitrary
> repositories
> 
> Stefan Beller (18):
>   repository: introduce raw object store field
>   object-store: move alt_odb_list and alt_odb_tail to object store
>   object-store: free alt_odb_list
>   object-store: move packed_git and packed_git_mru to object store
>   object-store: close all packs upon clearing the object store
>   sha1_file: add raw_object_store argument to alt_odb_usable
>   sha1_file: add repository argument to link_alt_odb_entry
>   sha1_file: add repository argument to read_info_alternates
>   sha1_file: add repository argument to link_alt_odb_entries
>   sha1_file: add repository argument to prepare_alt_odb
>   sha1_file: allow link_alt_odb_entries to handle arbitrary repositories
>   sha1_file: allow prepare_alt_odb to handle arbitrary repositories
>   sha1_file: add repository argument to stat_sha1_file
>   sha1_file: add repository argument to open_sha1_file
>   sha1_file: add repository argument to map_sha1_file_1
>   sha1_file: add repository argument to sha1_loose_object_info
>   sha1_file: allow sha1_file_name to handle arbitrary repositories
>   sha1_file: allow map_sha1_file to handle arbitrary repositories
> 
>  builtin/am.c|   2 +-
>  builtin/clone.c |   2 +-
>  builtin/count-objects.c |   6 +-
>  builtin/fetch.c |   2 +-
>  builtin/fsck.c  |  13 ++-
>  builtin/gc.c|   4 +-
>  builtin/grep.c  |   2 +-
>  builtin/index-pack.c|   1 +
>  builtin/merge.c |   2 +-
>  builtin/pack-objects.c  |  21 ++--
>  builtin/pack-redundant.c|   6 +-
>  builtin/receive-pack.c  |   3 +-
>  cache.h |  46 ++--
>  contrib/coccinelle/refactoring/packed_git.cocci |   7 ++
>  environment.c   |   5 +-
>  fast-import.c   |   6 +-
>  http-backend.c  |   6 +-
>  http-push.c 

Re: [PATCH 00/26] Moving global state into the repository object (part 1)

2018-02-13 Thread Jonathan Nieder
Hi,

Stefan Beller wrote:

> This is a real take on the first part of the recent RFC[1].
>
> Jonathan Tan suggested[2] that "sha1_loose_object_info to handle arbitrary 
> repositories"
> might be a good breaking point for a first part at that RFC at patch 38.
> This series is smaller and contains only 26 patches as the patches in the big
> RFC were slightly out of order.

Thanks.  This looks like a nice reviewable series, so I'm happy to see
it broken out.

[...]
> Comments in the early range of that RFC were on 003 where Junio pointed out
> that the coccinelle patch ought to be not in contrib/coccinelle, so I put it
> in a sub directory there, as 'make coccicheck' doesn't traverse subdirs.

Can you say a little more about this?  Was the problem that the
semantic patch wasn't idempotent, that it was too slow to run, or
something else?

If we're including the semantic patch for reference but never running
it, I think I'd prefer it to go in the commit message.  But if it's
useful to run then we should make it idempotent so it can go in
contrib/coccinelle.

[...]
> Duy suggested that we shall not use the repository blindly, but should 
> carefully
> examine whether to pass on an object store or the refstore or such[4], which
> I agree with if it makes sense. This series unfortunately has an issue with 
> that
> as I would not want to pass down the `ignore_env` flag separately from the 
> object
> store, so I made all functions that only take the object store to have the raw
> object store as the first parameter, and others using the full repository.

I think I want to push back on this a little.

The advantage of a function taking e.g. an object_store as an argument
instead of a repository is that it increases its flexibility, since it
allows callers that do not have access to a repository to call it.  The
disadvantage is also that it increases the flexibility without any
callers benefitting from that:

 1. It ties us to assumptions from today.  If e.g. an object access in
the future starts relying on some other information from the
repository (e.g. its config) then we'd have to either add a
back-pointer from the object store to its repository or add
additional arguments for that additional data at that point.

If all callers already have a repository, it is simpler to pass
that repository as context so that we have the flexibility to make
more use of it later.

 2. It complicates the caller.  Instead of consistently passing the
same repository argument as context to functions that access that
repository, the caller would have to pull out relevant fields like
the object store from it.

 3. It prevents us from making opportunistic use of other information
from the repository, such as its name for use in error messages.

In lower-level funcitons that need to be usable by callers without a
repository (e.g. to find packfiles in an alternate) it makes sense to
not pass a repository, but without such a use case in mind I don't
think it needs to be a general goal.

To put it another way, most callers do not *care* whether they are
working with a repository's object store, ref database, or some other
aspect of the repository.  They just know they want to e.g. read an
object from this repository.

It's similar to how FILE * works: some operations rely on the buffer
the FILE * manages and some other operations only rely on the
underlying file descriptor, but using the FILE * consistently provides
a clean abstraction that generally makes life easier.

> Eric Sunshine brought up memory leaks with the RFC, and I would think to
> have plugged all holes.

Yay, thank you!

I'll try to find time to look at the patches in detail soon, but no
promises (i.e. if someone else reviews them first, then even better
;-)).

Sincerely,
Jonathan


Re: [PATCH 00/26] Moving global state into the repository object (part 1)

2018-02-13 Thread Stefan Beller
On Tue, Feb 13, 2018 at 10:57 AM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> Oh, that is an interesting perspective. Here is how I arrived at the opposite
>> conclusion initially: Searching for 'ignore_env' shows that we care about it
>> as well for the index and graft paths, which are not the object store, hence
>> it would be better kept in the repository. (The alternative would be to
>> duplicate it into the raw object store, but I do not like data duplication)
>>
>> But maybe it is better to duplicate this one bit instead of passing through
>> a larger scoped object.
>
> If a larger scoped repository refers to a smaller scoped
> object-store, is there still a need to duplicate that bit, instead
> of referring to the bit the smaller scoped one has when asking about
> the bit in the larger scoped one?

No (in theory). But in practice it may be worthwhile:

"What's the value of this ref?"

"Oh let me check the ignore_env bit that happens
to live in the object store first."

would be super confusing to me.

> I am not sure if these "do not look at environment variables" is an
> attribute of these entities---it sounds more like an attribute for
> each invocation of an operation, i.e. "I want to learn where the
> index is but would ignore GIT_INDEX environment for this particular
> query." and "What's the value of this ref?  Please honor the
> common-dir environment during this query".

That sounds like we want to have a configset struct eventually.

For now the ignore_env bit lives in the repository, as that helps
when working with submodules, when reading its comments.
Unfortunately 359efeffc1 (repository: introduce the repository
object, 2017-06-22) did not reason about the existence of the ignore_env
flag in its commit message.

> So from that point of view, it may not matter where the "bit" lives,
> among repository, object-store, or ref-store.

It matters on the scale of confusing the developer?

Stefan


Re: [PATCH 00/26] Moving global state into the repository object (part 1)

2018-02-13 Thread Junio C Hamano
Stefan Beller  writes:

> Oh, that is an interesting perspective. Here is how I arrived at the opposite
> conclusion initially: Searching for 'ignore_env' shows that we care about it
> as well for the index and graft paths, which are not the object store, hence
> it would be better kept in the repository. (The alternative would be to
> duplicate it into the raw object store, but I do not like data duplication)
>
> But maybe it is better to duplicate this one bit instead of passing through
> a larger scoped object.

If a larger scoped repository refers to a smaller scoped
object-store, is there still a need to duplicate that bit, instead
of referring to the bit the smaller scoped one has when asking about
the bit in the larger scoped one?

I am not sure if these "do not look at environment variables" is an
attribute of these entities---it sounds more like an attribute for
each invocation of an operation, i.e. "I want to learn where the
index is but would ignore GIT_INDEX environment for this particular
query." and "What's the value of this ref?  Please honor the
common-dir environment during this query".

So from that point of view, it may not matter where the "bit" lives,
among repository, object-store, or ref-store.



Re: [PATCH 00/26] Moving global state into the repository object (part 1)

2018-02-13 Thread Stefan Beller
On Tue, Feb 13, 2018 at 4:13 AM, Duy Nguyen  wrote:
> On Tue, Feb 13, 2018 at 6:49 PM, Duy Nguyen  wrote:
>> On Mon, Feb 12, 2018 at 05:22:15PM -0800, Stefan Beller wrote:
>>> This is a real take on the first part of the recent RFC[1].
>>>
>>> ...
>>>
>>> Duy suggested that we shall not use the repository blindly, but
>>> should carefully examine whether to pass on an object store or the
>>> refstore or such[4], which I agree with if it makes sense. This
>>> series unfortunately has an issue with that as I would not want to
>>> pass down the `ignore_env` flag separately from the object store, so
>>> I made all functions that only take the object store to have the raw
>>> object store as the first parameter, and others using the full
>>> repository.
>>
>> Second proposal :) How about you store ignore_env in raw_object_store?
>> This would not be the first time an object has some configuration
>> passed in at construction time. And it has a "constructor" now,
>> raw_object_store_init() (I probably should merge _setup in it too)
>
> A bit more on this configuration parameters. Down the road I think we
> need something like this anyway to delete global config vars like
> packed_git_window_size, delta_base_cache_limit...  Either all these
> end up in raw_object_store, or raw_object_store holds a link to
> "struct config_set".

That makes sense long term.

> The ignore_env specifically though looks to me like a stop gap
> solution until everything goes through repo_init() first. At that
> point we don't have to delay getenv() anymore. We can getenv() all at
> repo_init() then pass them in raw_object_store and ignore_env should
> be gone. So sticking it inside raw_object_store _temporarily_ does not
> sound so bad.

Oh, that is an interesting perspective. Here is how I arrived at the opposite
conclusion initially: Searching for 'ignore_env' shows that we care about it
as well for the index and graft paths, which are not the object store, hence
it would be better kept in the repository. (The alternative would be to
duplicate it into the raw object store, but I do not like data duplication)

But maybe it is better to duplicate this one bit instead of passing through
a larger scoped object.

I'll rework the patches.

Thanks!
Stefan


Re: [PATCH 00/26] Moving global state into the repository object (part 1)

2018-02-13 Thread Brandon Williams
On 02/13, Duy Nguyen wrote:
> On Tue, Feb 13, 2018 at 6:49 PM, Duy Nguyen  wrote:
> > On Mon, Feb 12, 2018 at 05:22:15PM -0800, Stefan Beller wrote:
> >> This is a real take on the first part of the recent RFC[1].
> >>
> >> ...
> >>
> >> Duy suggested that we shall not use the repository blindly, but
> >> should carefully examine whether to pass on an object store or the
> >> refstore or such[4], which I agree with if it makes sense. This
> >> series unfortunately has an issue with that as I would not want to
> >> pass down the `ignore_env` flag separately from the object store, so
> >> I made all functions that only take the object store to have the raw
> >> object store as the first parameter, and others using the full
> >> repository.
> >
> > Second proposal :) How about you store ignore_env in raw_object_store?
> > This would not be the first time an object has some configuration
> > passed in at construction time. And it has a "constructor" now,
> > raw_object_store_init() (I probably should merge _setup in it too)
> 
> A bit more on this configuration parameters. Down the road I think we
> need something like this anyway to delete global config vars like
> packed_git_window_size, delta_base_cache_limit...  Either all these
> end up in raw_object_store, or raw_object_store holds a link to
> "struct config_set".
> 
> The ignore_env specifically though looks to me like a stop gap
> solution until everything goes through repo_init() first. At that
> point we don't have to delay getenv() anymore. We can getenv() all at
> repo_init() then pass them in raw_object_store and ignore_env should
> be gone. So sticking it inside raw_object_store _temporarily_ does not
> sound so bad.

I like this approach, I mean at the moment we are replicating a single
bit of data but that allows us to be able to limit the scope of where a
repository struct is passed, giving us a better abstraction layer.

> -- 
> Duy

-- 
Brandon Williams


Re: [PATCH 00/26] Moving global state into the repository object (part 1)

2018-02-13 Thread Duy Nguyen
On Tue, Feb 13, 2018 at 6:49 PM, Duy Nguyen  wrote:
> On Mon, Feb 12, 2018 at 05:22:15PM -0800, Stefan Beller wrote:
>> This is a real take on the first part of the recent RFC[1].
>>
>> ...
>>
>> Duy suggested that we shall not use the repository blindly, but
>> should carefully examine whether to pass on an object store or the
>> refstore or such[4], which I agree with if it makes sense. This
>> series unfortunately has an issue with that as I would not want to
>> pass down the `ignore_env` flag separately from the object store, so
>> I made all functions that only take the object store to have the raw
>> object store as the first parameter, and others using the full
>> repository.
>
> Second proposal :) How about you store ignore_env in raw_object_store?
> This would not be the first time an object has some configuration
> passed in at construction time. And it has a "constructor" now,
> raw_object_store_init() (I probably should merge _setup in it too)

A bit more on this configuration parameters. Down the road I think we
need something like this anyway to delete global config vars like
packed_git_window_size, delta_base_cache_limit...  Either all these
end up in raw_object_store, or raw_object_store holds a link to
"struct config_set".

The ignore_env specifically though looks to me like a stop gap
solution until everything goes through repo_init() first. At that
point we don't have to delay getenv() anymore. We can getenv() all at
repo_init() then pass them in raw_object_store and ignore_env should
be gone. So sticking it inside raw_object_store _temporarily_ does not
sound so bad.
-- 
Duy


Re: [PATCH 00/26] Moving global state into the repository object (part 1)

2018-02-13 Thread Duy Nguyen
On Mon, Feb 12, 2018 at 05:22:15PM -0800, Stefan Beller wrote:
> This is a real take on the first part of the recent RFC[1].
>
> ...
> 
> Duy suggested that we shall not use the repository blindly, but
> should carefully examine whether to pass on an object store or the
> refstore or such[4], which I agree with if it makes sense. This
> series unfortunately has an issue with that as I would not want to
> pass down the `ignore_env` flag separately from the object store, so
> I made all functions that only take the object store to have the raw
> object store as the first parameter, and others using the full
> repository.

Second proposal :) How about you store ignore_env in raw_object_store?
This would not be the first time an object has some configuration
passed in at construction time. And it has a "constructor" now,
raw_object_store_init() (I probably should merge _setup in it too)

The core changes look like this. I have a full commit on top of your
series [1] that keeps sha1_file.c functions take 'struct
raw_object_store' instead.

[1] https://github.com/pclouds/git/tree/object-store-part1

-- 8< --
diff --git a/object-store.h b/object-store.h
index f164c4e5c9..a8bf1738f2 100644
--- a/object-store.h
+++ b/object-store.h
@@ -34,9 +34,13 @@ struct raw_object_store {
 * packs.
 */
unsigned packed_git_initialized : 1;
+
+   unsigned ignore_env : 1;
 };
 #define RAW_OBJECT_STORE_INIT { NULL, NULL, MRU_INIT, NULL, NULL, 0, 0, 0 }
 
+void raw_object_store_init(struct raw_object_store *ros, struct repository *r);
+void raw_object_store_setup(struct raw_object_store *ros, char *);
 void raw_object_store_clear(struct raw_object_store *o);
 
 struct packed_git {
diff --git a/object.c b/object.c
index 79c2c447bc..a4534bf4c4 100644
--- a/object.c
+++ b/object.c
@@ -461,6 +461,20 @@ static void free_alt_odbs(struct raw_object_store *o)
}
 }
 
+void raw_object_store_init(struct raw_object_store *o,
+  struct repository *r)
+{
+   /* FIXME: memset? */
+   o->ignore_env = r->ignore_env;
+}
+
+void raw_object_store_setup(struct raw_object_store *o,
+   char *objectdir)
+{
+   free(o->objectdir);
+   o->objectdir = objectdir;
+}
+
 void raw_object_store_clear(struct raw_object_store *o)
 {
free(o->objectdir);
diff --git a/repository.c b/repository.c
index bd2ad578de..ac5863d7e1 100644
--- a/repository.c
+++ b/repository.c
@@ -49,9 +49,9 @@ static void repo_setup_env(struct repository *repo)
!repo->ignore_env);
free(repo->commondir);
repo->commondir = strbuf_detach(, NULL);
-   free(repo->objects.objectdir);
-   repo->objects.objectdir = git_path_from_env(DB_ENVIRONMENT, 
repo->commondir,
-   "objects", !repo->ignore_env);
+   raw_object_store_setup(>objects,
+  git_path_from_env(DB_ENVIRONMENT, 
repo->commondir,
+"objects", !repo->ignore_env));
free(repo->graft_file);
repo->graft_file = git_path_from_env(GRAFT_ENVIRONMENT, repo->commondir,
 "info/grafts", !repo->ignore_env);
@@ -142,6 +142,8 @@ int repo_init(struct repository *repo, const char *gitdir, 
const char *worktree)
 
repo->ignore_env = 1;
 
+   raw_object_store_init(>objects, repo);
+
if (repo_init_gitdir(repo, gitdir))
goto error;
 
diff --git a/sha1_file.c b/sha1_file.c
index c3f35914ce..3be58651a1 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -677,14 +677,14 @@ int foreach_alt_odb(alt_odb_fn fn, void *cb)
return r;
 }
 
-void prepare_alt_odb(struct repository *r)
+void prepare_alt_odb(struct raw_object_store *ros)
 {
if (r->objects.alt_odb_tail)
return;
 
r->objects.alt_odb_tail = >objects.alt_odb_list;
 
-   if (!r->ignore_env) {
+   if (!ros->ignore_env) {
const char *alt = getenv(ALTERNATE_DB_ENVIRONMENT);
if (!alt)
alt = "";
-- 8< --


--
Duy


Re: [PATCH 00/26] Moving global state into the repository object (part 1)

2018-02-12 Thread Stefan Beller
On Mon, Feb 12, 2018 at 5:22 PM, Stefan Beller  wrote:

> I developed this series [...] on top of current master.

also available at https://github.com/stefanbeller/git/tree/object-store-part1


[PATCH 00/26] Moving global state into the repository object (part 1)

2018-02-12 Thread Stefan Beller
This is a real take on the first part of the recent RFC[1].

Jonathan Tan suggested[2] that "sha1_loose_object_info to handle arbitrary 
repositories"
might be a good breaking point for a first part at that RFC at patch 38.
This series is smaller and contains only 26 patches as the patches in the big
RFC were slightly out of order.

I developed this series partly by writing patches, but mostly by cherrypicking
from that RFC on top of current master. I noticed no external conflicts apart
from one addition to the repositories _INIT macro, which was easy to resolve.

Comments in the early range of that RFC were on 003 where Junio pointed out
that the coccinelle patch ought to be not in contrib/coccinelle, so I put it
in a sub directory there, as 'make coccicheck' doesn't traverse subdirs.

brian had a questoin on patch 25 in the RFC, but that seemed to resolve itself
without any suggestion to include into this series[3].

Duy suggested that we shall not use the repository blindly, but should carefully
examine whether to pass on an object store or the refstore or such[4], which 
I agree with if it makes sense. This series unfortunately has an issue with that
as I would not want to pass down the `ignore_env` flag separately from the 
object
store, so I made all functions that only take the object store to have the raw
object store as the first parameter, and others using the full repository.

Eric Sunshine brought up memory leaks with the RFC, and I would think to
have plugged all holes.

[1] https://public-inbox.org/git/20180205235508.216277-1-sbel...@google.com/
[2] 
https://public-inbox.org/git/20180207143300.ce1c39ca07f6a0d64fe0e...@google.com/
[3] 
https://public-inbox.org/git/20180206011940.gd7...@genre.crustytoothpaste.net/
[4] 
https://public-inbox.org/git/cacsjy8cggekpx4czkyytsprj87uqvkzsol7fyt__p2dh_1l...@mail.gmail.com/

Thanks,
Stefan

Jonathan Nieder (8):
  pack: move prepare_packed_git_run_once to object store
  pack: move approximate object count to object store
  sha1_file: add repository argument to sha1_file_name
  sha1_file: add repository argument to map_sha1_file
  sha1_file: allow stat_sha1_file to handle arbitrary repositories
  sha1_file: allow open_sha1_file to handle arbitrary repositories
  sha1_file: allow map_sha1_file_1 to handle arbitrary repositories
  sha1_file: allow sha1_loose_object_info to handle arbitrary
repositories

Stefan Beller (18):
  repository: introduce raw object store field
  object-store: move alt_odb_list and alt_odb_tail to object store
  object-store: free alt_odb_list
  object-store: move packed_git and packed_git_mru to object store
  object-store: close all packs upon clearing the object store
  sha1_file: add raw_object_store argument to alt_odb_usable
  sha1_file: add repository argument to link_alt_odb_entry
  sha1_file: add repository argument to read_info_alternates
  sha1_file: add repository argument to link_alt_odb_entries
  sha1_file: add repository argument to prepare_alt_odb
  sha1_file: allow link_alt_odb_entries to handle arbitrary repositories
  sha1_file: allow prepare_alt_odb to handle arbitrary repositories
  sha1_file: add repository argument to stat_sha1_file
  sha1_file: add repository argument to open_sha1_file
  sha1_file: add repository argument to map_sha1_file_1
  sha1_file: add repository argument to sha1_loose_object_info
  sha1_file: allow sha1_file_name to handle arbitrary repositories
  sha1_file: allow map_sha1_file to handle arbitrary repositories

 builtin/am.c|   2 +-
 builtin/clone.c |   2 +-
 builtin/count-objects.c |   6 +-
 builtin/fetch.c |   2 +-
 builtin/fsck.c  |  13 ++-
 builtin/gc.c|   4 +-
 builtin/grep.c  |   2 +-
 builtin/index-pack.c|   1 +
 builtin/merge.c |   2 +-
 builtin/pack-objects.c  |  21 ++--
 builtin/pack-redundant.c|   6 +-
 builtin/receive-pack.c  |   3 +-
 cache.h |  46 ++--
 contrib/coccinelle/refactoring/packed_git.cocci |   7 ++
 environment.c   |   5 +-
 fast-import.c   |   6 +-
 http-backend.c  |   6 +-
 http-push.c |   1 +
 http-walker.c   |   4 +-
 http.c  |   6 +-
 mru.h   |   1 +
 object-store.h  |  75 +
 object.c|  26 +
 pack-bitmap.c   |   4 +-
 pack-check.c|   1 +
 pack-revindex.c