Re: [PATCH v1 1/4] fastindex: speed up index load through parallelization

2017-11-20 Thread Ben Peart



On 11/20/2017 6:51 PM, Ramsay Jones wrote:



On 20/11/17 14:01, Ben Peart wrote:

Further testing has revealed that switching from the regular heap to a 
refactored version of the mem_pool in fast-import.c produces similar gains as 
parallelizing do_index_load().  This appears to be a much simpler patch for 
similar gains so we will be pursuing that path.

Combining the two patches resulted in a further 25% reduction of 
do_index_load() but with index load times getting smaller, that only resulted 
in a 3% improvement in total git command time.  Given the greater complexity of 
this patch, we've decided to put it on hold for now.  We'll keep it in our back 
pocket in case we need it in the future.


Since you are withdrawing the 'bp/fastindex' branch, I won't send the
patch to fix up the sparse warnings (unless you would like to see it
before you put it in your back pocket). ;-)

ATB,
Ramsay Jones




I'm happy to take your patches as I am keeping a copy just in case we 
need to revisit it.


I'm actually interested to see how Peff's suggestion to use tcmalloc 
helps because nearly 90% of the remaining execution time was spent 
waiting for a lock in malloc due to all contention from the multi-threading.


Re: [PATCH v1 1/4] fastindex: speed up index load through parallelization

2017-11-20 Thread Ramsay Jones


On 20/11/17 14:01, Ben Peart wrote:
> Further testing has revealed that switching from the regular heap to a 
> refactored version of the mem_pool in fast-import.c produces similar gains as 
> parallelizing do_index_load().  This appears to be a much simpler patch for 
> similar gains so we will be pursuing that path.
> 
> Combining the two patches resulted in a further 25% reduction of 
> do_index_load() but with index load times getting smaller, that only resulted 
> in a 3% improvement in total git command time.  Given the greater complexity 
> of this patch, we've decided to put it on hold for now.  We'll keep it in our 
> back pocket in case we need it in the future.

Since you are withdrawing the 'bp/fastindex' branch, I won't send the
patch to fix up the sparse warnings (unless you would like to see it
before you put it in your back pocket). ;-)

ATB,
Ramsay Jones




Re: [PATCH v1 1/4] fastindex: speed up index load through parallelization

2017-11-20 Thread Jeff King
On Mon, Nov 20, 2017 at 09:20:35AM -0500, Jeff King wrote:

> Out of curiosity, have you tried experimenting with any high-performance
> 3rd-party allocator libraries? I've often wondered if we could get a
> performance improvement from dropping in a new allocator, but was never
> able to measure any real benefit over glibc's ptmalloc2. The situation
> might be different on Windows, though (i.e., if the libc allocator isn't
> that great).

Just linking with tcmalloc, like:

diff --git a/Makefile b/Makefile
index ee9d5eb11e..4f299cd914 100644
--- a/Makefile
+++ b/Makefile
@@ -1018,7 +1018,7 @@ BUILTIN_OBJS += builtin/worktree.o
 BUILTIN_OBJS += builtin/write-tree.o
 
 GITLIBS = common-main.o $(LIB_FILE) $(XDIFF_LIB)
-EXTLIBS =
+EXTLIBS = -ltcmalloc
 
 GIT_USER_AGENT = git/$(GIT_VERSION)
 

seems to consistently show about 30% speedup on p0002:

Test  HEAD^ HEAD
  
--
0002.1: read_cache/discard_cache 1000 times   0.19(0.18+0.01)   0.13(0.12+0.01) 
-31.6%

I don't think we really even need a patch for it. You should be able to
just build with:

  make EXT_LIBS=-ltcmalloc

I can't think of any real advantage to a Makefile knob except
discoverability.

-Peff


Re: [PATCH v1 1/4] fastindex: speed up index load through parallelization

2017-11-20 Thread Jeff King
On Mon, Nov 20, 2017 at 09:01:45AM -0500, Ben Peart wrote:

> Further testing has revealed that switching from the regular heap to a
> refactored version of the mem_pool in fast-import.c produces similar gains
> as parallelizing do_index_load().  This appears to be a much simpler patch
> for similar gains so we will be pursuing that path.

That sounds like a pretty easy win for index entries, which tend to
stick around in big clumps.

Out of curiosity, have you tried experimenting with any high-performance
3rd-party allocator libraries? I've often wondered if we could get a
performance improvement from dropping in a new allocator, but was never
able to measure any real benefit over glibc's ptmalloc2. The situation
might be different on Windows, though (i.e., if the libc allocator isn't
that great).

Most of the high-performance allocators are focused on concurrency,
which usually isn't a big deal for git. But tcmalloc, at least, claims
to be about 6x faster than glibc.

The reason I ask is that we could possibly get the same wins without
writing a single line of code. And it could apply across the whole
code-base, not just the index code. I don't know how close a general
purpose allocator could come to a pooled implementation, though. You're
inherently making a tradeoff with a pool in not being able to free
individual entries.

-Peff


Re: [PATCH v1 1/4] fastindex: speed up index load through parallelization

2017-11-20 Thread Ben Peart
Further testing has revealed that switching from the regular heap to a 
refactored version of the mem_pool in fast-import.c produces similar 
gains as parallelizing do_index_load().  This appears to be a much 
simpler patch for similar gains so we will be pursuing that path.


Combining the two patches resulted in a further 25% reduction of 
do_index_load() but with index load times getting smaller, that only 
resulted in a 3% improvement in total git command time.  Given the 
greater complexity of this patch, we've decided to put it on hold for 
now.  We'll keep it in our back pocket in case we need it in the future.




Re: [PATCH v1 1/4] fastindex: speed up index load through parallelization

2017-11-14 Thread Junio C Hamano
Ben Peart  writes:

> OK.  I'll call this new extension "EOIE" ("end of index
> entries"). Other than the standard header/footer, it will only contain
> a 32 bit offset to the beginning of the extension entries.  I'll
> always write out that extension unless you would prefer it be behind a
> setting (if so, what do you want it called and what should the default
> be)?  I won't add support in update-index for this extension.

To make it robust, if I were doing this, I would at least add a checksum
of some sort.  

As each extension section consists of 4-byte extension type, 4-byte
size, followed by that many bytes of the "meat" of the section, what
I had in mind when I suggested this backpointer was something like:

"EOIE" <32-bit size> <32-bit offset> <20-byte hash>

where the size of the extension section is obviously 24-byte to
cover the offset plus hash, and the hash is computed over extension
types and their sizes (but not their contents---this is not about
protecting against file corruption and not worth wasting the cycles
for hashing) for all the extension sections this index file has
(except for "EOIE" at the end, for obvious reasons).  E.g. if we
have "TREE" extension that is N-bytes long, "REUC" extension that is
M-bytes long, followed by "EOIE", then the hash would be

SHA-1("TREE" +  +
  "REUC" + )

Then the reader would

 - Seek back 32-byte from the trailer to ensure it sees "EOIE"
   followed by a correct size (24?)

 - Jump to the offset and find 4-bytes that presumably is the type
   of the first extension, followed by its size.  

 - Feed these 8-bytes to the hash, skip that section based on its
   size (while making sure we won't run off the end of the file,
   which is a sign that we thought EOIE exists when there wasn't).
   Repeat this until we hit where we found "EOIE" (or we notice our
   mistake by overrunning it).

 - Check the hash to make sure we got it right.

> Since the goal was to find a way to load the IEOT extension before the
> cache entries, I'll also refactor the extension reading loop into a
> function that takes a function pointer and add a
> preread_index_extension() function that can be passed in when that
> loop is run before the cache entries are loaded.  When the loop is run
> again after the cache entries are loaded, it will pass in the existing
> read_index_extension() function.  Extensions can then choose which
> function they want to be loaded in.
>
> The code to read/write/use the IEOT to parallelize the cache entry
> loading will stay behind a config setting that defaults to false (at
> least for now).  I'll stick with "core.fastindex" until someone can
> (please) propose a better name.

Sounds good.


Re: [PATCH v1 1/4] fastindex: speed up index load through parallelization

2017-11-14 Thread Ben Peart



On 11/14/2017 8:12 PM, Junio C Hamano wrote:

Ben Peart  writes:


I have no thoughts or plans for changes in the future of IEOT (which I
plan to rename ITOC).  At this point in time, I can't even imagine
what else we'd want as the index only contains cache entries, ...


Yeah, but the thing is that this is open source, and the imagination
of the originator of the initial idea does not limit how the code
and data structure evolves.

Back when I added the index extensions to the system, I didn't have
perfect foresight, and I didn't have specific plans to add anything
beyond "TREE" to optimize "write-tree" and "diff-index --cached".

In hindsight, I got one thing right and one thing wrong.

Even though I didn't have any plan to add a mandatory extension, I
made the code to ignore optional ones and error out on mandatory
ones if an index extension section is not understood.  It turns out
that later (in fact much later---the "TREE" extension dates back to
April 2006, while "link" came in June 2014) we could add the
split-index mode without having to worry about older versions of Git
doing random wrong things when they see this new extension, thanks
to that design decision.  That went well.

On the other hand, I didn't think things through to realize that
some operations may want to peek only the extensions without ever
using the main table, that some other operations may want to read
some extensions first before reading the main table, or more
importantly, that these limitations would start mattering once Git
becomes popular enough and starts housing a project tree with very
many paths in the main table.

I really wish somebody had brought it up as a potential issue during
the review---I would have defined the extended index format to have
the simple extension at the end that points back to the tail end of
the main table back then, and we wouldn't be having this back and
forth now.  But I was just too happy and excited that I have found a
room to squeeze extension sections into the index file format
without breaking existing implementations of Git (which checked the
trailer checksum matches to the result of hashing the whole thing,
and read the recorded number of entries from the main table, without
even noticing that there is a gap in between), and that excitement
blinded me.


I understand the risk but the list of offsets into the cache entries
is pretty simple as well. I prefer the simplicity of a single TOC
extension that gives us random access to the entire index rather than
having to piece one together using multiple extensions.  That model
has its own set of risks and tradeoffs.


I thought that you are not using the "where does the series of
extensions begin" information in the first place, no?  That piece of
information is useful independent of the usefulness of "index into
the main table to list entries where the prefix-compression is
reset".  So if anything, I'd prefer the simplicity of introducing
that "where does the series of extensions begin" that does not say
anything else, and build on other things like ITOC as mere users of
the mechanism.



OK.  I'll call this new extension "EOIE" ("end of index entries"). Other 
than the standard header/footer, it will only contain a 32 bit offset to 
the beginning of the extension entries.  I'll always write out that 
extension unless you would prefer it be behind a setting (if so, what do 
you want it called and what should the default be)?  I won't add support 
in update-index for this extension.


Since the goal was to find a way to load the IEOT extension before the 
cache entries, I'll also refactor the extension reading loop into a 
function that takes a function pointer and add a 
preread_index_extension() function that can be passed in when that loop 
is run before the cache entries are loaded.  When the loop is run again 
after the cache entries are loaded, it will pass in the existing 
read_index_extension() function.  Extensions can then choose which 
function they want to be loaded in.


The code to read/write/use the IEOT to parallelize the cache entry 
loading will stay behind a config setting that defaults to false (at 
least for now).  I'll stick with "core.fastindex" until someone can 
(please) propose a better name.


Re: [PATCH v1 1/4] fastindex: speed up index load through parallelization

2017-11-14 Thread Ben Peart



On 11/14/2017 10:04 AM, Junio C Hamano wrote:

Ben Peart  writes:


How about I add the logic to write out a special extension right
before the SHA1 that contains an offset to the beginning of the
extensions section.  I will also add the logic in do_read_index() to
search for and load this special extension if it exists.

This will provide a common framework for any future extension to take
advantage of if it wants to be loaded/processed before or in parallel
with the cache entries or other extensions.

For all existing extensions that assume they are loaded _after_ the
cache entries, in do_read_index() I'll add the logic to use the offset
(if it exists) to adjust the src_offset and then load them normally.

Given the IEOT extension is just another list of offsets into the
index to enable out of order processing, I'll add those offsets into
the same extension so that it is a more generic "table of contents"
for the entire index.  This enables us to have common/reusable way to
have random access to _all_ sections in the index while maintaining
backwards comparability with the existing index formats and code.

These additional offsets will initially only be used to parallelize
the loading of cache entries and only if the user explicitly enables
that option but I can think of other interesting uses for them in the
future.


If we freeze the format of IEOT extension so that we can guarantee
that the very first version of Git that understands IEOT can always
find the beginning of extension section in an index file that was
written by future versions of Git, then I'm all for that plan, but
my impression was that you are planning to make incompatible change
in the future to IEOT, judging from the way that IEOT records its own
version number in the section and the reader uses it to reject an
unknown one.


I have no thoughts or plans for changes in the future of IEOT (which I 
plan to rename ITOC).  At this point in time, I can't even imagine what 
else we'd want as the index only contains cache entries, extensions and 
the trailing SHA1 and with the TOC we have random access to each of 
those.  If nothing else, it gives another signature to verify to ensure 
we actually have a valid trailing extension. :)


I only added the version because I've learned my ability to predict the 
future is pretty bad and it leaves us that option _if_ something ever 
comes up that we need it _and_ are willing to live with the significant 
negative trade-offs you correctly point out.




With that plan, what I suspect would happen is that a version of Git
that understands another optional extension section that wants to be
findable without scanning the main table and the then-current
version of IEOT would not be able to use an index file written by a
new version of Git that enhances the format of the IEOT extension
bumps its extension version.

And if that is the case I would have to say that I strongly suspect
that you would regret the design decision to mix it into IEOT.  That
is why I keep suggesting that the back pointer extension should be
on its own, minimizing what it does and minimizing the need to be
updated across versions of Git.



I understand the risk but the list of offsets into the cache entries is 
pretty simple as well. I prefer the simplicity of a single TOC extension 
that gives us random access to the entire index rather than having to 
piece one together using multiple extensions.  That model has its own 
set of risks and tradeoffs.


Re: [PATCH v1 1/4] fastindex: speed up index load through parallelization

2017-11-14 Thread Junio C Hamano
Ben Peart  writes:

> How about I add the logic to write out a special extension right
> before the SHA1 that contains an offset to the beginning of the
> extensions section.  I will also add the logic in do_read_index() to
> search for and load this special extension if it exists.
>
> This will provide a common framework for any future extension to take
> advantage of if it wants to be loaded/processed before or in parallel
> with the cache entries or other extensions.
>
> For all existing extensions that assume they are loaded _after_ the
> cache entries, in do_read_index() I'll add the logic to use the offset
> (if it exists) to adjust the src_offset and then load them normally.
>
> Given the IEOT extension is just another list of offsets into the
> index to enable out of order processing, I'll add those offsets into
> the same extension so that it is a more generic "table of contents"
> for the entire index.  This enables us to have common/reusable way to
> have random access to _all_ sections in the index while maintaining
> backwards comparability with the existing index formats and code.
>
> These additional offsets will initially only be used to parallelize
> the loading of cache entries and only if the user explicitly enables
> that option but I can think of other interesting uses for them in the
> future.

If we freeze the format of IEOT extension so that we can guarantee
that the very first version of Git that understands IEOT can always
find the beginning of extension section in an index file that was
written by future versions of Git, then I'm all for that plan, but
my impression was that you are planning to make incompatible change
in the future to IEOT, judging from the waythat IEOT records its own
version number in the section and the reader uses it to reject an
unknown one.

With that plan, what I suspect would happen is that a version of Git
that understands another optional extension section that wants to be
findable without scanning the main table and the then-current
version of IEOT would not be able to use an index file written by a
new version of Git that enhances the format of the IEOT extension
bumps its extension version.

And if that is the case I would have to say that I strongly suspect
that you would regret the design decision to mix it into IEOT.  That
is why I keep suggesting that the back pointer extension should be
on its own, minimizing what it does and minimizing the need to be
updated across versions of Git.




Re: [PATCH v1 1/4] fastindex: speed up index load through parallelization

2017-11-14 Thread Ben Peart



On 11/13/2017 8:10 PM, Junio C Hamano wrote:

Ben Peart  writes:


The proposed format for extensions (ie having both a header and a
footer with name and size) in the current patch already enables having
multiple extensions that can be parsed forward or backward.  Any
extensions that would want to be parse-able in reverse would just all
need to be written one after the other after right before the trailing
SHA1 (and of course, include the proper footer).


In other words, anything that wants to be scannable from the tail is
welcome to reimplement the same validation structure used by IEOT to
check the section specific sanity constraint, and this series is not
interested in introducing a more general infrastructure to make it
easy for code that want to find where each extension section in the
file begins without pasing the body of the index.

I find it somewhat unsatisfactory in that it is a fundamental change
to allow jumping to the start of an extension section from the tail
that can benefit any future codepath, and have expected a feature
neutral extension whose sole purpose is to do so [*1*].

That way, extension sections whose names are all-caps can stay to be
optional, even if they allow locating from the tail of the file.  If
you require them to implement the same validation struture as IEOT
to perform section specific sanity constraint and also require them
to be placed consecutively at the end, the reader MUST know about
all such extensions---otherwise they cannot scan backwards and find
ones that appear before an unknown but optional one.  If you keep an
extension section at the end whose sole purpose is to point at the
beginning of extension sections, the reader can then scan forward as
usual, skipping over unknown but optional ones, and reading your
IEOT can merely be an user (and the first user) of that more generic
feature that is futureproof, no?




How about I add the logic to write out a special extension right before 
the SHA1 that contains an offset to the beginning of the extensions 
section.  I will also add the logic in do_read_index() to search for and 
load this special extension if it exists.


This will provide a common framework for any future extension to take 
advantage of if it wants to be loaded/processed before or in parallel 
with the cache entries or other extensions.


For all existing extensions that assume they are loaded _after_ the 
cache entries, in do_read_index() I'll add the logic to use the offset 
(if it exists) to adjust the src_offset and then load them normally.


Given the IEOT extension is just another list of offsets into the index 
to enable out of order processing, I'll add those offsets into the same 
extension so that it is a more generic "table of contents" for the 
entire index.  This enables us to have common/reusable way to have 
random access to _all_ sections in the index while maintaining backwards 
comparability with the existing index formats and code.


These additional offsets will initially only be used to parallelize the 
loading of cache entries and only if the user explicitly enables that 
option but I can think of other interesting uses for them in the future.


Re: [PATCH v1 1/4] fastindex: speed up index load through parallelization

2017-11-13 Thread Junio C Hamano
Ben Peart  writes:

> The proposed format for extensions (ie having both a header and a
> footer with name and size) in the current patch already enables having
> multiple extensions that can be parsed forward or backward.  Any
> extensions that would want to be parse-able in reverse would just all
> need to be written one after the other after right before the trailing
> SHA1 (and of course, include the proper footer).

In other words, anything that wants to be scannable from the tail is
welcome to reimplement the same validation structure used by IEOT to
check the section specific sanity constraint, and this series is not
interested in introducing a more general infrastructure to make it
easy for code that want to find where each extension section in the
file begins without pasing the body of the index.

I find it somewhat unsatisfactory in that it is a fundamental change
to allow jumping to the start of an extension section from the tail
that can benefit any future codepath, and have expected a feature
neutral extension whose sole purpose is to do so [*1*].

That way, extension sections whose names are all-caps can stay to be
optional, even if they allow locating from the tail of the file.  If
you require them to implement the same validation struture as IEOT
to perform section specific sanity constraint and also require them
to be placed consecutively at the end, the reader MUST know about
all such extensions---otherwise they cannot scan backwards and find
ones that appear before an unknown but optional one.  If you keep an
extension section at the end whose sole purpose is to point at the
beginning of extension sections, the reader can then scan forward as
usual, skipping over unknown but optional ones, and reading your
IEOT can merely be an user (and the first user) of that more generic
feature that is futureproof, no?




Re: [PATCH v1 1/4] fastindex: speed up index load through parallelization

2017-11-13 Thread Ben Peart



On 11/9/2017 11:46 PM, Junio C Hamano wrote:

Ben Peart  writes:


To make this work for V4 indexes, when writing the index, it periodically
"resets" the prefix-compression by encoding the current entry as if the
path name for the previous entry is completely different and saves the
offset of that entry in the IEOT.  Basically, with V4 indexes, it
generates offsets into blocks of prefix-compressed entries.


You specifically mentioned about this change in your earlier
correspondence on this series, and it reminds me of Shawn's reftable
proposal that also is heavily depended on prefix compression with
occasional reset to allow scanning from an entry in the middle.  I
find it a totally sensible design choice.



Great! I didn't realize there was already precedent for this in other 
patches.  I guess good minds think alike... :)



To enable reading the IEOT extension before reading all the variable
length cache entries and other extensions, the IEOT is written last,
right before the trailing SHA1.


OK, we have already closed the door for further enhancements to
place something other than the index extensions after this block by
mistakenly made it the rule that the end of the series of extended
sections must coincide with the beginning of the trailing checksum,
so it does not sound all that bad to insist on this particular one
to be the last, I guess.  But see below...


The format of that extension has the signature bytes and size at the
beginning (like a normal extension) as well as at the end in reverse
order to enable parsing the extension by seeking back from the end of
the file.


I think this is a reasonable workaround to allow a single extension
that needs to be read before the main index is loaded.

But I'd suggest taking this approach one step further.  Instead,
what if we introduce a new extension EOIE ("end of index entries")
whose sole purpose is to sit at the end of the series of extensions
and point at the beginning of the index extension section of the
file, to tell you where to seek in order to skip the main index?

That way, you can

  - seek to the end of the index file;

  - go backward skiping the trailing file checksum;

  - now you might be at the end of the EOIE extension.  seek back
necessary number of bytes and verify EOIE header, pick up the
recorded file offset of the beginning of the extensions section.

  - The 4-byte sequence you found may happen to match EOIE but that
is not enough to be sure that you indeed have such an extension.
So the following must be done carefully, allowing the possibility
that there wasn't any EOIE extension at the end.
Seek back to that offset, and repeat these three steps to skip
over all extensions:

- read the (possible) 4-byte header
- read the (possible) extsize (validate that this is a sane value)
- skip that many bytes

until you come back to the location you assumed that you found
your EOIE header, to make sure you did not get fooled by bytes
that happened to look like one.  Some "extsize" you picked up
during that process may lead you beyond the end of the index
file, which would be a sure sign that what you found at the end
of the index file was *not* the EOIE extension but a part of
some other extension who happened to have these four bytes at the
right place.



I'm doing this careful examination and verification in the patch 
already.  Please look closely at read_ieot_extension() to make sure 
there isn't a test I should be doing but missed.



which would be a lot more robust way to allow any extensions to be
read before the main body of the index.

And a lot more importantly, we will leave the door open to allow
more than one index extensions that we would prefer to read before
reading the main body if we do it this way, because we can easily
skip things over without spending cycles once we have a robust way
to find where the end of the main index is.  After all, the reason
you placed IEOT at the end is not because you wanted to have it at
the very end.  You only wanted to be able to find where it is
without having to parse the variable-length records of the main
index.  And there may be other index extensions that wants to be
readable without reading the main part just like IEOT, and we do not
want to close the door for them.



The proposed format for extensions (ie having both a header and a footer 
with name and size) in the current patch already enables having multiple 
extensions that can be parsed forward or backward.  Any extensions that 
would want to be parse-able in reverse would just all need to be written 
one after the other after right before the trailing SHA1 (and of course, 
include the proper footer).  The same logic that parses the IEOT 
extension could be extended to continue looking backwards through the 
file parsing extensions until it encounters an unknown signature, 
invalid size, etc.


That said, the IEOT is 

Re: [PATCH v1 1/4] fastindex: speed up index load through parallelization

2017-11-09 Thread Junio C Hamano
Ben Peart  writes:

> To make this work for V4 indexes, when writing the index, it periodically
> "resets" the prefix-compression by encoding the current entry as if the
> path name for the previous entry is completely different and saves the
> offset of that entry in the IEOT.  Basically, with V4 indexes, it
> generates offsets into blocks of prefix-compressed entries.

You specifically mentioned about this change in your earlier
correspondence on this series, and it reminds me of Shawn's reftable
proposal that also is heavily depended on prefix compression with
occasional reset to allow scanning from an entry in the middle.  I
find it a totally sensible design choice.

> To enable reading the IEOT extension before reading all the variable
> length cache entries and other extensions, the IEOT is written last,
> right before the trailing SHA1.

OK, we have already closed the door for further enhancements to
place something other than the index extensions after this block by
mistakenly made it the rule that the end of the series of extended
sections must coincide with the beginning of the trailing checksum,
so it does not sound all that bad to insist on this particular one
to be the last, I guess.  But see below...

> The format of that extension has the signature bytes and size at the
> beginning (like a normal extension) as well as at the end in reverse
> order to enable parsing the extension by seeking back from the end of
> the file.

I think this is a reasonable workaround to allow a single extension
that needs to be read before the main index is loaded.  

But I'd suggest taking this approach one step further.  Instead,
what if we introduce a new extension EOIE ("end of index entries")
whose sole purpose is to sit at the end of the series of extensions
and point at the beginning of the index extension section of the
file, to tell you where to seek in order to skip the main index?

That way, you can

 - seek to the end of the index file;

 - go backward skiping the trailing file checksum;

 - now you might be at the end of the EOIE extension.  seek back
   necessary number of bytes and verify EOIE header, pick up the
   recorded file offset of the beginning of the extensions section.

 - The 4-byte sequence you found may happen to match EOIE but that
   is not enough to be sure that you indeed have such an extension.
   So the following must be done carefully, allowing the possibility
   that there wasn't any EOIE extension at the end.
   Seek back to that offset, and repeat these three steps to skip
   over all extensions:

   - read the (possible) 4-byte header
   - read the (possible) extsize (validate that this is a sane value)
   - skip that many bytes

   until you come back to the location you assumed that you found
   your EOIE header, to make sure you did not get fooled by bytes
   that happened to look like one.  Some "extsize" you picked up
   during that process may lead you beyond the end of the index
   file, which would be a sure sign that what you found at the end
   of the index file was *not* the EOIE extension but a part of
   some other extension who happened to have these four bytes at the
   right place.

which would be a lot more robust way to allow any extensions to be
read before the main body of the index.

And a lot more importantly, we will leave the door open to allow
more than one index extensions that we would prefer to read before
reading the main body if we do it this way, because we can easily
skip things over without spending cycles once we have a robust way
to find where the end of the main index is.  After all, the reason
you placed IEOT at the end is not because you wanted to have it at
the very end.  You only wanted to be able to find where it is
without having to parse the variable-length records of the main
index.  And there may be other index extensions that wants to be
readable without reading the main part just like IEOT, and we do not
want to close the door for them.

Hmm?