Hi, this is the third bi-weekly report on my Summer of Code project 'Semantic Package Review Interface for mentors.debian.net'.
My project aims to extract metadata from packages submitted to mentors.d.n[1] and use this data to match a package with a potential sponsor. Since a lot of packages get stuck in the mentoring process because their maintainers have difficulty finding a sponsor, this should ease their entry into the Debian process.

For the last two weeks, I have been working on improving the debexpo importer's plugin API[2] so that I can import data into the DB when a package is uploaded, instead of computing it on the fly when a package page is visited. Importer plugins are run when a package is uploaded to mentors.d.n and add package metadata to the database, for example the lintian or Debian QA status, the bugs closed by the package, etc. Currently, this information is stored in a non-standardized way: data is serialized into JSON objects and stored as is in the database, which prevents us from easily accessing it outside the plugins' HTML templates. With Nicolas' help, I have worked out a new database model[3] for this information and started updating the current plugins to this API, improving/simplifying the current model and removing the need for JSON data.

I think the API is (almost) complete, but I haven't been able to test it, because plugins need to access objects (for example, PackageVersion) that are not currently accessible to the importer at the time it calls them. Understanding and editing the importer logic is not easy, because it consists of a single Python class, with most of the work done in two 150-line methods that mix DB access, checks and local repository management. Since I had to move stuff around and my project is closely related to the importer, I have started refactoring the Importer class to make it easier to maintain.

My progress has been way slower than I (and probably my mentors, although they haven't said anything yet) expected.
The plugin API redesign and the update of existing plugins should have been ready for use in metadata extraction last week. I think the main reason for the delay is that, even though I started by writing up a plan, I unnecessarily changed it while writing code. At least twice, I threw away work when I thought I had to redesign the model, instead of quickly implementing my first plan and improving it progressively. Also, because I changed too many things at the same time, I did not get regular feedback in the form of tests in debexpo, which has not helped with motivation and productivity.

In this sense, my work on the importer refactoring has gone better: I started with a dummy class for managing a new upload's data[4], with only docstrings in its methods, and small comments in the importer code where I thought refactoring was needed. This way, I can make small changes and test along the way each time I commit, which helps me focus on small things and thus write code faster.

I now feel confident I can finish this refactoring tomorrow, including the new plugin system changes. I might be wrong though, especially since Hofstadter's Law[5] has held at every step of my GSoC until now. I have freed my evenings/nights for this week so I can make up for the time I lost on rewriting plugins; I'd like to get real results for the actual 'metadata extraction and sponsor recommendation' part before we hit the mid-term evaluation deadline. I'm also considering writing nose tests when I write code that can't be immediately integrated into debexpo's codebase, which would provide me with fast feedback and small tasks to complete one after the other.
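To make the plugin storage problem from earlier in this report concrete, here is a minimal sketch. It is not debexpo code: the table names, columns and sample values are all made up for illustration, and it uses sqlite3 from the Python standard library rather than debexpo's actual database layer. It only shows why one row per piece of metadata is easier to query than a JSON string stored as is.

```python
import json
import sqlite3

# Hypothetical schema and sample data, for illustration only.
# The real model is in debexpo/model/plugin_results.py and may differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blob_results (package TEXT, plugin TEXT, data TEXT)")
conn.execute(
    "CREATE TABLE plugin_results "
    "(package TEXT, plugin TEXT, key TEXT, value TEXT)"
)

results = {
    "hello": {"lintian": {"severity": "error", "tags": "2"}},
    "world": {"lintian": {"severity": "clean", "tags": "0"}},
}

for package, plugins in results.items():
    for plugin, data in plugins.items():
        # JSON style: one opaque string per plugin run.
        conn.execute(
            "INSERT INTO blob_results VALUES (?, ?, ?)",
            (package, plugin, json.dumps(data)),
        )
        # Structured style: one row per piece of metadata.
        for key, value in data.items():
            conn.execute(
                "INSERT INTO plugin_results VALUES (?, ?, ?, ?)",
                (package, plugin, key, value),
            )


def packages_with_errors_blob(db):
    """Each JSON string must be fetched and decoded before filtering."""
    rows = db.execute(
        "SELECT package, data FROM blob_results WHERE plugin = 'lintian'"
    )
    return sorted(
        pkg for pkg, data in rows if json.loads(data)["severity"] == "error"
    )


def packages_with_errors_rows(db):
    """The database can filter on the metadata directly."""
    rows = db.execute(
        "SELECT package FROM plugin_results "
        "WHERE plugin = 'lintian' AND key = 'severity' AND value = 'error' "
        "ORDER BY package"
    )
    return [pkg for (pkg,) in rows]
```

Both functions return the same packages, but only the second one lets code outside the plugin (for example, a sponsor-matching query) select on the metadata without knowing how each plugin laid out its JSON.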

[1] http://mentors.debian.net/
[2] http://anonscm.debian.org/gitweb/?p=debexpo/debexpo.git;a=blob;f=debexpo/plugins/__init__.py;h=d1d2dfba124f889637db3cf9696858bc87edd800;hb=devel
[3] http://anonscm.debian.org/gitweb/?p=debexpo/debexpo.git;a=blob;f=debexpo/model/plugin_results.py;h=6ee3f044f69b841f7c8d46a981bd256427ddb6e9;hb=plugin-api
[4] http://anonscm.debian.org/gitweb/?p=debexpo/debexpo.git;a=blob;f=debexpo/importer/upload_data.py;h=6e5a3e8832f5536abcf21d8c80ebb3f5910dab02;hb=new-importer
[5] "Hofstadter's Law: It always takes longer than you expect, even when you take into account Hofstadter's Law." -- Douglas Hofstadter, Gödel, Escher, Bach: An Eternal Golden Braid
