Re: [PATCH 5/9] handle alternates paths the same as the main object dir

2018-11-12 Thread Derrick Stolee




On 11/12/2018 10:46 AM, Jeff King wrote:

On Mon, Nov 12, 2018 at 10:38:28AM -0500, Derrick Stolee wrote:


We could go either direction here, but this patch moves the alternates
struct over to the main directory style (rather than vice-versa).
Technically the alternates style is more efficient, as it avoids
rewriting the object directory name on each call. But this is unlikely
to matter in practice, as we avoid reallocations either way (and nobody
has ever noticed or complained that the main object directory is copying
a few extra bytes before making a much more expensive system call).

Hm. I've complained in the past [1] about a simple method like strbuf_addf()
over loose objects, but that was during abbreviation checks so we were
adding the string for every loose object but not actually reading the
objects.

[1]
https://public-inbox.org/git/20171201174956.143245-1-dsto...@microsoft.com/

I suspect that had more to do with the cost of snprintf() than the extra
bytes being copied. And here we'd still be using addstr and addch
exclusively. I'm open to numeric arguments to the contrary, though. :)


I agree. I don't think it is worth investigating now, as the performance 
difference should be moot. I am making a mental note to take a look here 
if I notice a performance regression later. ;)



There's actually a lot of low-hanging fruit there for pre-sizing, too.
E.g., fill_sha1_path() calls strbuf_addch() in a loop, but it could
quite easily grow the 41 bytes it needs ahead of time. I wouldn't want
to change that without finding a measurable improvement, though. It
might not be a big deal due to fec501dae8 (strbuf_addch: avoid calling
strbuf_grow, 2015-04-16).

-Peff




Re: [PATCH 5/9] handle alternates paths the same as the main object dir

2018-11-12 Thread Jeff King
On Mon, Nov 12, 2018 at 10:38:28AM -0500, Derrick Stolee wrote:

> > We could go either direction here, but this patch moves the alternates
> > struct over to the main directory style (rather than vice-versa).
> > Technically the alternates style is more efficient, as it avoids
> > rewriting the object directory name on each call. But this is unlikely
> > to matter in practice, as we avoid reallocations either way (and nobody
> > has ever noticed or complained that the main object directory is copying
> > a few extra bytes before making a much more expensive system call).
> 
> Hm. I've complained in the past [1] about a simple method like strbuf_addf()
> over loose objects, but that was during abbreviation checks so we were
> adding the string for every loose object but not actually reading the
> objects.
> 
> [1]
> https://public-inbox.org/git/20171201174956.143245-1-dsto...@microsoft.com/

I suspect that had more to do with the cost of snprintf() than the extra
bytes being copied. And here we'd still be using addstr and addch
exclusively. I'm open to numeric arguments to the contrary, though. :)

There's actually a lot of low-hanging fruit there for pre-sizing, too.
E.g., fill_sha1_path() calls strbuf_addch() in a loop, but it could
quite easily grow the 41 bytes it needs ahead of time. I wouldn't want
to change that without finding a measurable improvement, though. It
might not be a big deal due to fec501dae8 (strbuf_addch: avoid calling
strbuf_grow, 2015-04-16).

-Peff


Re: [PATCH 5/9] handle alternates paths the same as the main object dir

2018-11-12 Thread Derrick Stolee

On 11/12/2018 9:49 AM, Jeff King wrote:

When we generate loose file paths for the main object directory, the
caller provides a buffer to loose_object_path (formerly sha1_file_name).
The callers generally keep their own static buffer to avoid excessive
reallocations.

But for alternate directories, each struct carries its own scratch
buffer. This is needlessly different; let's unify them.

We could go either direction here, but this patch moves the alternates
struct over to the main directory style (rather than vice-versa).
Technically the alternates style is more efficient, as it avoids
rewriting the object directory name on each call. But this is unlikely
to matter in practice, as we avoid reallocations either way (and nobody
has ever noticed or complained that the main object directory is copying
a few extra bytes before making a much more expensive system call).


Hm. I've complained in the past [1] about a simple method like 
strbuf_addf() over loose objects, but that was during abbreviation 
checks so we were adding the string for every loose object but not 
actually reading the objects.


[1] 
https://public-inbox.org/git/20171201174956.143245-1-dsto...@microsoft.com/


The other concern I have is for alternates that may have long-ish paths 
to their object directories.


So, this is worth keeping an eye on, but is likely to be fine.


And this has the advantage that the reusable buffers are tied to
particular calls, which makes the invalidation rules simpler (for
example, the return value from stat_sha1_file() used to be invalidated
by basically any other object call, but now it is affected only by other
calls to stat_sha1_file()).

We do steal the trick from alt_sha1_path() of returning a pointer to the
filled buffer, which makes a few conversions more convenient.

Signed-off-by: Jeff King 
---
  object-store.h | 14 +-
  object.c   |  1 -
  sha1-file.c| 44 
  sha1-name.c|  8 ++--
  4 files changed, 23 insertions(+), 44 deletions(-)

diff --git a/object-store.h b/object-store.h
index fefa17e380..b2fa0d0df0 100644
--- a/object-store.h
+++ b/object-store.h
@@ -10,10 +10,6 @@
  struct object_directory {
struct object_directory *next;
  
-	/* see alt_scratch_buf() */

-   struct strbuf scratch;
-   size_t base_len;
-
/*
 * Used to store the results of readdir(3) calls when searching
 * for unique abbreviated hashes.  This cache is never
@@ -54,14 +50,6 @@ void add_to_alternates_file(const char *dir);
   */
  void add_to_alternates_memory(const char *dir);
  
-/*

- * Returns a scratch strbuf pre-filled with the alternate object directory,
- * including a trailing slash, which can be used to access paths in the
- * alternate. Always use this over direct access to alt->scratch, as it
- * cleans up any previous use of the scratch buffer.
- */
-struct strbuf *alt_scratch_buf(struct object_directory *odb);
-
  struct packed_git {
struct packed_git *next;
struct list_head mru;
@@ -157,7 +145,7 @@ void raw_object_store_clear(struct raw_object_store *o);
   * Put in `buf` the name of the file in the local object database that
   * would be used to store a loose object with the specified sha1.
   */
-void loose_object_path(struct repository *r, struct strbuf *buf, const 
unsigned char *sha1);
+const char *loose_object_path(struct repository *r, struct strbuf *buf, const 
unsigned char *sha1);
  
  void *map_sha1_file(struct repository *r, const unsigned char *sha1, unsigned long *size);
  
diff --git a/object.c b/object.c

index 6af8e908bb..dd485ac629 100644
--- a/object.c
+++ b/object.c
@@ -484,7 +484,6 @@ struct raw_object_store *raw_object_store_new(void)
  
  static void free_alt_odb(struct object_directory *odb)

  {
-   strbuf_release(>scratch);
oid_array_clear(>loose_objects_cache);
free(odb);
  }
diff --git a/sha1-file.c b/sha1-file.c
index 478eac326b..15db6b61a9 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -346,27 +346,20 @@ static void fill_sha1_path(struct strbuf *buf, const 
unsigned char *sha1)
}
  }
  
-void loose_object_path(struct repository *r, struct strbuf *buf,

-  const unsigned char *sha1)
+static const char *odb_loose_path(const char *path, struct strbuf *buf,
+ const unsigned char *sha1)
  {
strbuf_reset(buf);
-   strbuf_addstr(buf, r->objects->objectdir);
+   strbuf_addstr(buf, path);
strbuf_addch(buf, '/');
fill_sha1_path(buf, sha1);
+   return buf->buf;
  }
  
-struct strbuf *alt_scratch_buf(struct object_directory *odb)

+const char *loose_object_path(struct repository *r, struct strbuf *buf,
+ const unsigned char *sha1)
  {
-   strbuf_setlen(>scratch, odb->base_len);
-   return >scratch;
-}
-
-static const char *alt_sha1_path(struct object_directory *odb,
-const unsigned