Codename: apt-fetcher
Mentors: Michael Vogt, David Kalnischkies
Project proposal page: [0]
Project design page: [1]

General summary (3 - 5 lines):

The most significant benefit that I gained from working as a Debian Developer during GSoC this year was the ability to handle a very complex codebase. I managed to read and comprehend dozens of source code files, propose design ideas and entry points for them in the current architecture, update the build process and makefile hierarchy, build test cases for the new features, and keep the project backwards compatible while leaving it open for further improvements. I have improved my C++ and Bash scripting skills, along with my ability to communicate with the Debian specialists, implement desired features, and detect and correct bugs. The final result is an extensible module for a fundamental and intensively used Debian application.

Last 3 weeks summary:

What I've done:
- replaced the legacy code with the new one
- fixed integration tests
- implemented the possibility to choose which code (legacy vs. new) will be used in the binaries at compile time

What problems I've run into:
- error object bug
- difficulty in understanding the testing scripts framework
- circular header inclusions

In the last 3 weeks of GSoC, after discussing with the mentors, the project focused mainly on integration rather than on developing new features. Until now, the apt-get update functionality was tested only by calling the framework's methods from libapt. The next step was to make the necessary changes in apt-get (the end application) so that it uses the new code.

The mentors and I lost quite a lot of time fixing a bug in error reporting. In APT, the error messages are kept in a global queue, represented by the _error object. The implemented parser uses this object when parsing sources.list. It first tries to parse a line as a standard line, as specified in the standard format. If this fails, it tries to parse it as a comment, and if this fails too, it reads it as garbage.
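
To make the fallback order concrete, here is a minimal standalone sketch. It is not the actual apt-fetcher code: ErrorQueue, LineKind, classifyLine and the simplified "deb URI SUITE" check are names and rules I am assuming for illustration only, and treating blank lines as comments is an assumption as well.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a global error queue; NOT APT's real _error class.
struct ErrorQueue {
    std::vector<std::string> messages;
    void push(const std::string &msg) { messages.push_back(msg); }
};

enum class LineKind { Standard, Comment, Garbage };

// Simplified assumption: a standard line is "deb|deb-src URI SUITE ...".
// Records an error in the queue when the line does not match.
static bool parseStandard(const std::string &line, ErrorQueue &errors) {
    std::istringstream in(line);
    std::string type, uri, suite;
    if (!(in >> type >> uri >> suite) || (type != "deb" && type != "deb-src")) {
        errors.push("not a standard line: " + line);
        return false;
    }
    return true;
}

// Assumption: blank lines and lines starting with '#' count as comments.
static bool parseComment(const std::string &line, ErrorQueue &errors) {
    const std::string::size_type pos = line.find_first_not_of(" \t");
    if (pos == std::string::npos || line[pos] == '#')
        return true;
    errors.push("not a comment: " + line);
    return false;
}

// Fallback chain: standard line first, then comment, otherwise garbage.
LineKind classifyLine(const std::string &line, ErrorQueue &errors) {
    if (parseStandard(line, errors)) return LineKind::Standard;
    if (parseComment(line, errors)) return LineKind::Comment;
    return LineKind::Garbage;
}
```

Note that in this sketch a perfectly valid comment line still leaves one message in the queue, from the failed attempt to read it as a standard line.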
All these errors are registered in the _error object, and the calling function takes care to pop one message at a time in case of failure. The _error object keeps both errors and warnings in the same place. In one test case, the _error object already held some warnings before the parsing, so after the parsing the parsing errors remained in the queue. The fix was to save the whole context before the parsing and to restore it afterwards if the parsing succeeds.

The new code was supposed to replace the old one in apt-get, not just work alongside it. A long time was spent removing all the code using apt-pkg/sourcelist and apt-pkg/deb/debmetaindex. All this code was replaced with references to the framework and the default plugins.

Once the project compiled, it had to pass test/integration. Some tests were failing because of the changes, and they had to be fixed. Others were failing for reasons unrelated to apt-fetcher, but it took a lot of time to figure this out. Also, the test scripts use a fairly complex framework, which must be understood before individual tests can be debugged. Once the changes no longer produced any failures in test/integration, I added a test for downloading Contents files with apt-get update - test-contents-basic.

After this step was finished, the mentors thought it would be a good idea to keep the legacy code as well, since we don't want to break compatibility. I managed to do that with a couple of header files and some #ifdefs. Fixing all the references to use the right wrapper functions, so that the switch is made from a single place, was also part of the final touches to the project.

Parts of the initial plan that I didn't get to implement:

* "an user interface for the parser" - right now, all the changes to the parser can only be made through code. In the future, the parser can be customized to make use of the APT configuration for its settings.
  The parser is a component which exceeded its initially estimated complexity - predicate based iteration, ability to implement plugins with parsing methods for other formats (xml). The parser's user interface is not crucial for APT itself, but rather for other applications that might be using it via libapt.

* "optimizations in the acquire logic" - APT already implements transfer methods - e.g. http, ftp, copy, gzip, gpgv, pdiffs - as independent binaries that are launched when a specific metadata file needs to be processed. The framework plugin model makes it possible to define acquire algorithms for metadata file types, using these methods. So rather than improving the acquire logic, apt-fetcher gives the possibility of defining a new one from scratch for the new files. As for the present acquire logic, it was completely integrated.

* "a plugin for debtags" - the information about package tags is located in the Packages index files in the Debian archive, not in separate files. These files are already downloaded by default by apt-get update, so we may consider that tags are supported by the framework. Most of the focus was put on developing a plugin for the Contents files, since these were, until now, unsupported by apt-get.

To sum up, the apt-fetcher module is a viable alternative to the current apt-get update backend. The parser for sources.list is pluggable and can support other formats. It can be used from libapt in other projects as well. The framework is pluggable both for metadata file types and for Release files - in the future, plugins can be defined for archive formats other than the standard Debian archive. It implements the standard flow from a list of Source file objects provided by the parser to a list of Items to be downloaded by the present acquire module. Extending the Item object is part of the plugin, so the developer may define specific operations when downloading a metadata file.
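
The flow from sources to Items can be pictured with the following rough sketch. It is only an illustration under my own assumptions, not the framework's real interface: Item, ContentsItem, Source, PluginRegistry, registerPlugin, buildItems and onDone are hypothetical names, and the URI layout is simplified.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical base class for one file the acquire module should download.
struct Item {
    explicit Item(std::string u) : uri(std::move(u)) {}
    virtual ~Item() = default;
    // Hook a plugin may override to define what happens after the download.
    virtual std::string onDone() const { return "stored " + uri; }
    std::string uri;
};

// A plugin-specific Item, here for Contents files (illustrative behaviour).
struct ContentsItem : Item {
    using Item::Item;
    std::string onDone() const override { return "indexed " + uri; }
};

// One sources.list entry as the parser might hand it over (simplified).
struct Source {
    std::string baseUri;
    std::string suite;
};

// Maps a metadata file type to a factory that builds the matching Item.
class PluginRegistry {
public:
    using Factory = std::function<std::unique_ptr<Item>(const Source &)>;

    void registerPlugin(const std::string &fileType, Factory factory) {
        assert(static_cast<bool>(factory) && "factory must be callable");
        factories_[fileType] = std::move(factory);
    }

    // For every source, ask every registered plugin for an Item to download.
    std::vector<std::unique_ptr<Item>> buildItems(const std::vector<Source> &sources) const {
        std::vector<std::unique_ptr<Item>> items;
        for (const auto &src : sources)
            for (const auto &entry : factories_)
                items.push_back(entry.second(src));
        return items;
    }

private:
    std::map<std::string, Factory> factories_;
};
```

In this sketch a Contents plugin would register a factory returning a ContentsItem, and the resulting list of Items is what would be handed over to the existing acquire module.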
The plugin can also access the APT configuration object - _config - and make custom settings when registering a plugin, so they can be used in the metadata acquire process.

The APT package is proof that a complex project cannot be fully understood in the timespan of preparing an application. The proposed estimations and timeline didn't have the expected accuracy. On one hand, there was no need to implement a backend acquire logic, since the present one could easily be integrated; on the other, researching the state of the art lasted the whole community bonding period and more than half of the coding period. And that was only enough to develop what I've developed, since it was difficult to wrap my mind around the whole picture of the APT code. I changed the design and the code very often, because I was always missing details which would later prove to be relevant. The next most difficult thing after understanding was designing: building an architecture on top of a construction you don't understand completely.

To me, the project was a success. APT is one of the fundamental components of Debian and its successors as Linux distributions. It has passed through a long series of revisions, during which it was impossible to predict how its architecture would evolve. This resulted in very stable and complex code. Refactoring, exposing and extending the metadata acquire backend was very important in the context of integrating APT with other package management applications, in Debian and other distributions (the AppStream project). This component is a small part of the APT suite, but to be capable of making the changes, one must first have the big picture. I've organized its current code and made it extensible for future development. I've integrated it in APT and tested it with the regression suite. I've implemented a base on which future developers may build plugins for their specific formats and files.

My contributions to the APT package can be found in the repo [2].
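
Going back to the error reporting bug from the last 3 weeks summary, the "save the context, restore it on success" idea can be sketched as below. This is a standalone illustration, not the code in the repo: MessageQueue, pushContext, revertContext, mergeContext and parseLine are hypothetical names, the line checks are deliberately simplistic, and keeping the new errors when parsing fails is my assumption, since the summary only describes the success case.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical queue holding errors and warnings together; NOT APT's real class.
class MessageQueue {
public:
    void warning(const std::string &text) { messages_.push_back({false, text}); }
    void error(const std::string &text) { messages_.push_back({true, text}); }

    bool pendingError() const {
        for (const auto &m : messages_)
            if (m.isError) return true;
        return false;
    }
    std::size_t size() const { return messages_.size(); }

    // Set the current messages aside and continue with an empty queue.
    void pushContext() {
        saved_.push_back(std::move(messages_));
        messages_.clear();
    }
    // Drop everything logged since pushContext() and restore the saved messages.
    void revertContext() {
        messages_ = std::move(saved_.back());
        saved_.pop_back();
    }
    // Restore the saved messages and append the ones logged since pushContext().
    void mergeContext() {
        std::vector<Message> recent = std::move(messages_);
        messages_ = std::move(saved_.back());
        saved_.pop_back();
        messages_.insert(messages_.end(), recent.begin(), recent.end());
    }

private:
    struct Message { bool isError; std::string text; };
    std::vector<Message> messages_;
    std::vector<std::vector<Message>> saved_;
};

// Simplistic checks, for illustration only; each failed attempt logs an error.
static bool tryStandard(const std::string &line, MessageQueue &q) {
    if (line.compare(0, 4, "deb ") == 0) return true;
    q.error("not a standard line");
    return false;
}
static bool tryComment(const std::string &line, MessageQueue &q) {
    if (!line.empty() && line[0] == '#') return true;
    q.error("not a comment");
    return false;
}

// Parse one line without leaking errors from failed attempts on success.
bool parseLine(const std::string &line, MessageQueue &q) {
    q.pushContext();
    const bool ok = tryStandard(line, q) || tryComment(line, q);
    if (ok)
        q.revertContext();  // success: earlier warnings come back untouched
    else
        q.mergeContext();   // failure (assumption): keep the new errors too
    return ok;
}
```

With a warning already queued, parsing a comment line leaves the queue exactly as it was, while a garbage line adds its parsing errors after the earlier warning.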

Bogdan Purcareata

[0] http://wiki.debian.org/SummerOfCode2012/Projects#Pluggable_acquire-system_for_APT
[1] http://wiki.debian.org/BogdanPurcareata/PluggableAptBackend
[2] https://launchpad.net/apt-fetcher

_______________________________________________
Soc-coordination mailing list
[email protected]
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/soc-coordination
