Re: [pkg-discuss] [REVIEW] repository on-disk format changes

Shawn Walker Thu, 12 Aug 2010 14:02:58 -0700

On 08/11/10 07:00 PM, Brock Pytlik wrote:

On 08/ 9/10 01:03 AM, Shawn Walker wrote:

Greetings,


The following webrev contains changes or fixes for the following items:

16744 repository multi-publisher on-disk format should be formalized
and implemented
[snip]

webrev:
http://cr.opensolaris.org/~swalker/pkg-repo-format/
[snip]

...

I reviewed this mostly looking at search issues and potential collisions
with the manifest signing wad that's out now.

Currently the manifest signing wad adds/removes certs to/from the repo,
I assume that after both wads land, the right answer is to have the
certs added to particular publishers and as part of adding the cert, a
publisher property will be set?
So instead of:

/usr/bin/pkgrepo add-signing-ca-cert path ...
it will be:
/usr/bin/pkgrepo add-signing-ca-cert publisher path ...

Is that correct? I don't think the -p option is the right approach here
because by default, publishers should not be sharing certificates so
making the publisher argument optional seems like it would cause problems.

You could certainly do it that way, although I'd ask Dan or Danek aboutthe user interface aspect of it.

As an aside, it'd be nice to find a less wordy name for that subcommand.Is there a reason it needs to be 'add-signing-ca-cert' specifically?

modules/indexer.py:
With the locking going in, is there any reason to retain the consistent
opening logic? Are you leaving it around purely for backward
compatibility? If so, can you file a RFE to remove that functionality
once we've decided enough time has passed?

I thought the consistent opening logic was needed because migrate wasn'tan atomic action, so despite the locking, it would still be needed forread-only consumers.

I think the locking here is fairly complex because they keep getting
picked up and dropped. I think it would be easier to understand and
maintain if we did either what we've done in other places w/ wrappers
which are responsible for taking and dropping the locks, or keep the
existing approach but make each function take the lock only if
self.__locked is False, and then dropping the lock only if it was taken
in that function's invocation.

I don't understand the view of the locking here being complex. Thelock() method is only called in two places. The only bit of logic herethat I could possibly understand that being applied to is the unlock()that happens during _generic_update_index() ifrebuild_index_from_scratch has to be called.

I could try to do what you describe, but there are some additionalsafeguards I'd have to put in place that would actually make this morecomplicated. In the case of rebuild_index_from_scratch, since the indexdirectory is being removed anyway, it's necessary to drop the lock andobtain it again because of the file lock that's needed.

Why not have the indexer use the same makedirs as misc provides? The
reason for having it raise search_errors was that it was a module shared
by both client and server, while api_errors were client only. Since this
no longer the case (server modules import misc which imports api_errors)
we might as well consolidate both the exception space and the makedirs
functions.


It was only to preserve the existing behaviour.

modules/server/depot.py:
What's the timeout of .5 seconds on line 2026 buy?

I'll add a comment, but that's basically necessary to keep the depotserver from burning through a large amount of CPU and so that the depotserver doesn't hang on shutdown (it waits at least timeout secondsbefore exiting, which if you don't specify a timeout means forever).

I could specify a longer timeout, but that directly affects shutdowntime and the amount of time before the depot might start a queued task.

Why can't more than one background task be queued?

Because I don't believe more than one should be allowed. It's a fareasier to understand (and debug) model where only one major operation ata time can be queued.

Do the background processes happen in the same process or are they
forked out by cherrypy? If it's that same process, then that concerns me
as indexing previously happened in a separate process from the depot,
which meant the memory it used was returned to the system when the
indexing was done, instead of being attached to the depot for as long as
it was running.

If the memory usage of the depot server after indexing is a concern forthe user, they can restart it, or alternatively use pkgrepo on thefilesystem path of the repository the depot's serving. I'm happy todocument that in the man pages.

It was intentional that they happen in the same process. Ultimately, Ididn't feel that the complexity or problems involved with forking wereworth it to solve a resource issue.

There's no way to abort a task running in the background?

No, and that's intentional. I don't want to implement an overly complexmechanism here. There was no way to stop a background indexing eitherin the past, so this isn't any worse than what we already had in this way.

Is the user told that the operation they just instructed the depot to do
is being done in the background and that they'll need to check the logs
for errors? Suppose a user wanted to refresh-packages and wait to do the
next step until that was finished, how would they accomplish that? Would
they loop polling stauts/0 until the operation they cared about was
marked as finished? It seems that, based on what I understand about how
indexing locking currently works, this wouldn't necessarily work because
the user could have their status/0 request processed in one of the times
between indexing dropping and picking up the lock, meaning the status
would be reported as online instead of indexing.

The same problem existed previously with indexing. Long term, we needto figure out a better way to provide more detailed status informationto the user but that's not this wad. I'll open an RFE for it.

modules/server/repositoy.py:
Why doesn't open need to lock the repository anymore?

Because the underlying _RepoStore() object will lock itself. Thisallows synchronous publication of packages to disparate publishers.

tests/api/t_api_search:
line 1257: Why the change from True to False?


Leftover debugging.

Thanks,
-Shawn
_______________________________________________
pkg-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/pkg-discuss

Re: [pkg-discuss] [REVIEW] repository on-disk format changes

Reply via email to