[Rpm-maint] [rpm-software-management/rpm] RFE: eliminate all remnants of rpm network access in man pages and code (#521)

2018-08-17 Thread Jeff Johnson
Again, this RFE is a result of assessing how to send a patch to simplify 
lib/rpminstall.c.

It is very clear that years of refactoring have managed to remove almost all 
ability for rpm to access the network (the one exception I can find is in 
lib/rpminstall.c).

Meanwhile the rpm.8 man page still documents --ftpport proxy overrides and 
there are utterly useless macros for hkp:// pubkey retrieval and defaults that 
cannot possibly have worked for years.

It's your trash now, not mine. I'd suggest biting the bullet and hauling out 
gobs and gobs of uselessness in rpm code that likely has not worked for almost 
a decade.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/rpm-software-management/rpm/issues/521
___
Rpm-maint mailing list
Rpm-maint@lists.rpm.org
http://lists.rpm.org/mailman/listinfo/rpm-maint


[Rpm-maint] [rpm-software-management/rpm] RFE: eliminate Berkeley DB by switching to either LMDB or ... (#520)

2018-08-17 Thread Jeff Johnson
This RFE is an attempt to document two important remaining known issues that 
need to be solved to use LMDB in production.

The issues are:

* file path indices may exceed limits imposed on the size of keys by LMDB

The solution was already suggested by Howard Chu in Fedora bugzilla: use a hash 
on the directory name instead of the actual directory name. The hash 
computation of the directory name can be handled in the LMDB backend without 
more pervasive changes (though there are space and I/O and consistency benefits 
to using a directory name hash everywhere)
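A minimal sketch of the directory-name hashing idea, confined to the backend as described above. The names `dirname_key` and `MAX_KEY_SIZE` are hypothetical, SHA-256 is an illustrative choice that the thread does not specify, and 511 bytes is assumed to be LMDB's default key limit (a real backend would query the library rather than hard-code it).

```python
import hashlib

# Assumption: 511 bytes is the default LMDB key size limit. A real backend
# would ask the library for the limit instead of hard-coding it.
MAX_KEY_SIZE = 511


def dirname_key(dirname: str) -> bytes:
    """Return a fixed-length index key derived from a directory name."""
    # SHA-256 is only an example; any stable hash with a short digest works.
    return hashlib.sha256(dirname.encode("utf-8")).digest()


# A directory name that is too long to be stored as a key directly.
long_dir = "/opt/" + "very-long-component/" * 40
assert len(long_dir.encode("utf-8")) > MAX_KEY_SIZE

key = dirname_key(long_dir)
print(len(key), "byte key for a", len(long_dir), "byte directory name")
```

Because distinct directory names can in principle hash to the same key, the stored value would still need to carry enough information (for example the full name) to tell them apart on lookup.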

* header instances used as keys into Packages need to become big endian for 
btree functionality

A balanced tree expects big endian integer keys so that keys will be equally 
distributed. Using little endian keys will cause key pile ups that void 
algorithmic guarantees.
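The effect of byte order on key ordering can be illustrated with a short sketch. It assumes keys are compared byte by byte (memcmp-style), and the helper names `be_key` and `le_key` are hypothetical. It only shows the ordering consequence, not the rpm or LMDB code itself.

```python
import struct


def be_key(instance: int) -> bytes:
    """Encode a header instance as a 4-byte big endian key."""
    return struct.pack(">I", instance)


def le_key(instance: int) -> bytes:
    """Encode a header instance as a 4-byte little endian key."""
    return struct.pack("<I", instance)


instances = [1, 2, 255, 256, 257, 65536]

# Python compares bytes objects lexicographically, like a byte-wise comparator.
by_be = sorted(instances, key=be_key)
by_le = sorted(instances, key=le_key)

print("big endian order:   ", by_be)
print("little endian order:", by_le)
```

With big endian encoding the byte-wise order matches the numeric order of the header instances; with little endian encoding numerically adjacent instances end up scattered through the key space.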

If there are other known LMDB implementation issues, please add.

There are of course further issues converting user databases everywhere, but 
those issues cannot be blamed on the backend chosen to replace Berkeley DB.

Reply to this email directly or view it on GitHub:
https://github.com/rpm-software-management/rpm/issues/520


Re: [Rpm-maint] [rpm-software-management/rpm] RFE: run rpm scripts on a different thread using MQTT pub/sub message queues (#519)

2018-08-17 Thread Jeff Johnson
To forestall the obvious objections to using MQTT, please note that I also have 
working code for SysV messages, POSIX message queues, AMQP, ZeroMQ, etc.

I also am quite capable of an implementation of asynchronous RPC similar to 
above in XMLRPC, jabber, or UNIX domain sockets, or protocol du jour.

The type of transport used is not the issue.

The example provided provides an outline of a solution to a known problem in 
order to ask an important question.


Reply to this email directly or view it on GitHub:
https://github.com/rpm-software-management/rpm/issues/519#issuecomment-413930567


[Rpm-maint] [rpm-software-management/rpm] RFE: run rpm scripts on a different thread using MQTT pub/sub message queues (#519)

2018-08-17 Thread Jeff Johnson
This RFE tries to provide a concrete example to known bottlenecks with rpm 
install/update mentioned in issue #517.

MQTT is Yet Another Message Queue (YAMQ), like ZeroMQ, AMQP, or M$ MQ. It was 
developed by IBM, with a client implementation maintained in the Apache 
Foundation.

Message queues are typically (ZeroMQ is an exception) implemented with a broker 
to ensure reliable delivery of messages in sequential order, using 
publish/subscribe methods for producers/consumers.

With MQTT the subscribe consumer is delivered a message through a callback on a 
different thread.

One approach to running scripts on a different thread would involve creating 
both a publish/subscribe mailbox within RPM, and sending the script to be 
executed through MQTT to be run asynchronously.

The subscribe code would take the message body (i.e. the rpm script to run) and 
either invoke /bin/sh or run embedded lua on a different thread. The return 
code would then be sent back to the original publisher to be collected.

Using MQTT in this fashion is just an obvious implementation of asynchronous 
RPC. The benefit comes from the simplicity by which the scriptlet runs on a 
different thread: rpm execution can proceed without blocking on waitpid, 
implementing thread pools, using locks, or worrying much about thread 
safety of rpmlib, since MQTT messages hide all the gory details.
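The publish/subscribe flow described above can be sketched roughly as follows. This is not MQTT and not the working code mentioned in this RFE: two in-process `queue.Queue` objects stand in for the broker topics, and the names `subscriber`, `run_scriptlets`, and the scriptlet tags are hypothetical.

```python
import queue
import subprocess
import threading

# Stand-ins for two MQTT topics. A real implementation would use a broker
# and an MQTT client library instead of in-process queues.
requests = queue.Queue()
replies = queue.Queue()


def subscriber() -> None:
    """Consume published scriptlets, run them, and publish the exit codes."""
    while True:
        msg = requests.get()
        if msg is None:  # sentinel: no more scriptlets
            break
        tag, script = msg
        rc = subprocess.run(["/bin/sh", "-c", script]).returncode
        replies.put((tag, rc))


def run_scriptlets(scripts: dict) -> dict:
    """Publish every scriptlet, then collect the return codes."""
    worker = threading.Thread(target=subscriber)
    worker.start()
    for tag, script in scripts.items():
        requests.put((tag, script))  # publish without waiting for completion
    # The publisher would be free to continue with other work at this point.
    results = dict(replies.get() for _ in scripts)
    requests.put(None)
    worker.join()
    return results


results = run_scriptlets({"pkgA:%post": "exit 0", "pkgB:%post": "exit 3"})
print(results)
```

The publisher never calls waitpid itself; it only exchanges messages, which is the property the RFE is pointing at.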

I have working code for MQTT, the refactoring to achieve asynchronous execution 
is obvious, and I can provide an implementation, with measurements, if there is 
interest. Yadda yadda.

The real point of this RFE is to supply a concrete illustration of an 
alternative path to parallelism for issue #517 and to re-ask the question:

What implementations are deemed acceptable to RPM?

Reply to this email directly or view it on GitHub:
https://github.com/rpm-software-management/rpm/issues/519


Re: [Rpm-maint] [rpm-software-management/rpm] RFC: What approach to improving performance through threads or non-blocking I/O is acceptable in RPM? (#517)

2018-08-17 Thread Jeff Johnson
I was referring to using fsync wrapped in an event loop (and on rotating 
media), and was referring to rpm, not system, performance. Avoiding blocking 
system calls by using non-blocking alternatives in an event loop is what Node.js 
does.
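As a rough illustration of wrapping fsync in an event loop, the sketch below hands the blocking call to a worker thread so the loop can keep servicing other files. It uses Python's asyncio rather than anything in rpm, and the function names and file names are hypothetical.

```python
import asyncio
import os
import tempfile


async def write_and_sync(path: str, data: bytes) -> None:
    """Write a file, then fsync it without blocking the event loop."""
    loop = asyncio.get_running_loop()
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, data)
        # fsync itself blocks, so it runs in the loop's default thread pool.
        await loop.run_in_executor(None, os.fsync, fd)
    finally:
        os.close(fd)


async def main(directory: str) -> list:
    paths = [os.path.join(directory, f"file{i}") for i in range(3)]
    # All three write-and-sync operations are in flight concurrently.
    await asyncio.gather(*(write_and_sync(p, b"payload") for p in paths))
    return paths


with tempfile.TemporaryDirectory() as d:
    written = [os.path.getsize(p) for p in asyncio.run(main(d))]

print(written)
```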

Rpmbuild and rpm both do loops over package operations and some of the 
operations are costly. I'm not sure why you consider that comparison "weird", 
although certainly rpmbuild and rpm are performing very different tasks.

Running scripts within an event loop to avoid rpm blocking on waitpid, or using 
a thread pool so that scripts will run in parallel on multiple CPUs would seem 
to be one approach to solving the bottleneck you mention, but I have no numbers 
either.
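A minimal sketch of the first option, running scripts within an event loop so that nothing blocks on waitpid. It uses Python's asyncio as a stand-in, the function names are hypothetical, and it makes no claim about how much time this would actually save.

```python
import asyncio


async def run_script(script: str) -> int:
    """Start /bin/sh for one script and wait for it without blocking the loop."""
    proc = await asyncio.create_subprocess_exec("/bin/sh", "-c", script)
    return await proc.wait()


async def run_all(scripts: list) -> list:
    # Every script is started before any of them is waited on.
    return await asyncio.gather(*(run_script(s) for s in scripts))


codes = asyncio.run(run_all(["exit 0", "exit 2", "exit 1"]))
print(codes)
```

The return codes come back in the order the scripts were submitted, regardless of which child exits first.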

Post transaction file triggers involve loops that might benefit from 
parallelism. Comparing existing script execution to a proposed file trigger 
alternative is rather irrelevant to the topic asked here.

Thread safety can always be achieved with a "big lock" that guarantees thread 
safety by permitting only a single thread to use rpmlib at any point in time. 
The added complexity is minimal for a "big lock" implementation, and a "big 
lock" does guarantee thread safety.
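The "big lock" idea can be sketched as below. The `Library` class is a hypothetical stand-in for a library with unsynchronized shared state, not rpmlib, and `locked_install` is an invented wrapper name.

```python
import threading

# The single "big lock": only one thread may be inside the library at a time.
_big_lock = threading.Lock()


class Library:
    """Hypothetical stand-in for a library with unsynchronized shared state."""

    def __init__(self) -> None:
        self.installed = []

    def install(self, name: str) -> None:
        self.installed.append(name)


lib = Library()


def locked_install(name: str) -> None:
    """Every entry point into the library takes the same lock."""
    with _big_lock:
        lib.install(name)


threads = [
    threading.Thread(target=locked_install, args=(f"pkg{i}",)) for i in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(lib.installed))
```

The wrapper is small, which is the point being made about complexity; the trade-off is that calls into the library are serialized.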

The point of this RFC was to ask whether, say, OpenMP or POSIX threads 
implementations were preferred if/when threading is attempted to solve some 
perceived bottleneck. I suspect that we might agree that having multiple, 
organically grown, thread paradigms adds a large (and mostly unnecessary) 
complexity.

Since you are not aware of any existing bottlenecks with rpm install/upgrade, 
I'm not sure further discussion is useful.

Reply to this email directly or view it on GitHub:
https://github.com/rpm-software-management/rpm/issues/517#issuecomment-413899939


Re: [Rpm-maint] [rpm-software-management/rpm] RFC: What approach to improving performance through threads or non-blocking I/O is acceptable in RPM? (#517)

2018-08-17 Thread Florian Festi
Mixing rpmbuild and rpm installation in the introduction is kinda weird as both 
are very different things. The added fsync actually lowers the performance of 
rpm to leave more air to the rest of the system. So it is not a "performance 
tweak" either.

Focussing on rpm installation/update here:
So far I am not aware of any obvious bottleneck in installation that made it onto 
my "this needs fixing" list. It is known that scriptlets have been using a 
significant amount of installation time. This may have improved with switching 
to (posttrans) file triggers. But I don't have any numbers on that.
It also is not obvious to me that parallel execution will yield a performance 
gain that will justify the added complexity. Nevertheless, thread safety should 
be improved to allow librpm to be used in threaded environments.
I am open to discussing specific parts of the code and whether they are 
bottlenecks. But that'd require some numbers.

Reply to this email directly or view it on GitHub:
https://github.com/rpm-software-management/rpm/issues/517#issuecomment-413847900