Bug#698526: [Piuparts-devel] Bug#698526: Sort known issues by reverse dependency count
Hi,

On Tuesday, 26 February 2013, Andreas Beckmann wrote:
> I'm primarily concerned about reimplementing a bad piece of code (the second half of dwke that creates the .tpl files) in order to build a new feature on top of it. The perfectionist in me would like to fix things properly first.

yes, but... the imperfect way was used quite successfully with piuparts for a long time ;-)

> I really do like the approach of reviewing patches before inclusion as more eyes may spot more problems

me too, absolutely. Yet I can also only imagine Dave's frustration trying to get his work in and recognized; so far this hasn't happened for this feature, and for quite a long time. And I'd like Dave to stay motivated and contributing, and I like the new feature also.

So I'm a bit torn, (currently) leaning towards releasing 0.50 soon (now?) and then starting 0.51 with the merge of dave/sort-issues-by-rdep - or do you think that's premature?

cheers,
Holger

--
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Bug#698526: [Piuparts-devel] Bug#698526: Sort known issues by reverse dependency count
On Mon, Feb 25, 2013 at 8:45 AM, Andreas Beckmann a...@debian.org wrote:
> In general I think we should allow the flexibility to have a per-section known-problems-directory setting, so each report Section should generate its own problem list and not get a global one passed

OK, but out of scope of the patch set under consideration, which replaces the existing detect_well_known_errors with one that sorts by rdep.

> I tried to create a reduced version of Dave's sort-issues-by-rdep branch that only does the .tpl generation, as that is the part I want to look at right now: preview/dave-dwke-only-create-tpl
>
> David Steele (9):
>   01 Add skeleton for python replacement of detect-well-known-errors
>   02 detect_well_known_errors.py - Clean obsolete kpr and bug files.
>   03 detect_well_known_errors.py - Add class for handling known problems.
>   05 detect_well_known_errors.py - Create missing kpr files.
>   06 detect_well_known_errors.py - Create Failure Mgr class to hold kpr fails.
>   07 detect_well_known_errors - Create html tpl files.
>   16 detect_well_known_errors - Sort known errors/issues by rdep count.
>   17 detect_well_known_errors - Display the reverse dependency count.
>   20 detect_well_known_errors - Add PTS link to issue/error entries.
>
> reordered, merged, dropped .kpr creation, cleanup of obsolete files, ... but not tested at all

Take a look at skip_kpr. It gives you your tpl-only capability with about a dozen lines of code. This is part of the piuparts-report work I originally submitted, which is out of scope for the patch set under consideration.

> The problems I see right now:
> * many functions from piuparts-report are either copied (e.g. pts_subdir( source ))
> * or reimplemented differently, e.g. the variable substitution in the templates. I don't know which variant is better, but I don't really want *two* implementations of the same thing

This is not a change from what it replaces. Elimination of the redundancies can be added to the scope of a piuparts-report integration task.
> The internal representation of a set of logs is very different which makes integration into -report difficult

That depends on what you mean by integration. There is validity to the claim that it has been integrated, in existing patches outside the current scope. As you are saying, if this was designed from scratch for integration with piuparts-report, it would lean much more heavily on packagesdb. What is on the table is not an integrated solution. It is a replacement for the bash script, with issue rdep sorting.

> The assumption that there is only $pkgspec.log in (at most) one subdir is nothing I would rely on (although it usually is)

It should be a valid assumption. The only requirement along these lines should be to avoid crashing in the presence of this error condition.

> BTS and PTS URLs should not be embedded in the templates, probably best to have a function that generates a certain url for a package name to allow for future extensions, e.g. Ubuntu support.

That is a change that is in scope with the future extensions.

I understand that you don't like the way that I solved the known_problem .conf issues in the patches that come after this submission, and that you believe they aren't the right way to add issues to piuparts-report. I am OK with you taking whatever pieces of this you might feel to be useful and crafting a more elegant integration. But I ask that you consider what's on the table within the scope of the problem it solves. Please make your changes for piuparts-report after this is in.
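The idea of generating BTS and PTS URLs from a function rather than embedding them in templates could look roughly like the sketch below. Everything in it is an assumption made for illustration: the URL templates, the `URL_TEMPLATES` table, the `package_url` helper, and the pool-style rule in this stand-in `pts_subdir` (the thread names a `pts_subdir( source )` function but does not show its behavior).

```python
# Hypothetical sketch: the URL templates and helper names are assumptions,
# not piuparts code. A second distro entry could be added for other trackers.
URL_TEMPLATES = {
    "debian": {
        "pts": "https://packages.qa.debian.org/{subdir}/{source}.html",
        "bts": "https://bugs.debian.org/{package}",
    },
}


def pts_subdir(source):
    """Assumed pool-style subdirectory: 'libf' for 'libfoo', else first letter."""
    if source.startswith("lib") and len(source) > 3:
        return source[:4]
    return source[:1]


def package_url(kind, package, source=None, distro="debian"):
    """Build a tracker URL of the given kind ('pts' or 'bts') for a package."""
    source = source or package
    template = URL_TEMPLATES[distro][kind]
    # Unused keyword arguments are simply ignored by str.format().
    return template.format(
        package=package, source=source, subdir=pts_subdir(source)
    )
```

With this shape, a template would only carry a placeholder for the link, and supporting another distribution would mean adding one entry to the table rather than editing every template.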
Bug#698526: [Piuparts-devel] Bug#698526: Sort known issues by reverse dependency count
Hi Dave,

On Saturday, 23 February 2013, Dave Steele wrote:
> I've reworked based on Andreas' issues related to detect_well_known_errors and rdeps.

thanks! (extra bonus points if you could tell how many commits there are in each branch; due to the rebase it's rather easy for me to find out, but being told this would be even better ;)

> Comments related to piupartslib and piuparts-reports I've deferred as currently out of scope. The problems and failures classes in the python script are available for future rework.

nice!

I've seen two typos:

a.) unkownsasfailures.sort - I believe you mean unknownasfailures.sort :)

b.) Packages with failures not yet well known detected in $SECTION - this wording might even be from me, today I'd say: Packages with unknown failures detected in $SECTION

Regarding merging into develop: yes, I want to. But first I want to finish merging Andreas' current bits, then merge that develop into piatti (and run it there) and then merge these two branches of yours.

cheers,
Holger
Bug#698526: [Piuparts-devel] Bug#698526: Sort known issues by reverse dependency count
On Thu, Feb 21, 2013 at 4:24 AM, Andreas Beckmann a...@debian.org wrote:
> this work looks really promising and I'm curious to try it some day on my instance. But as I wrote before there is no need to reimplement the .tpl generation in python. Instead these intermediate files should go away and the html generation should be moved directly into piuparts-report. There will be a package db available. I think this requirement to generate .tpl externally dates back to the time when all logfiles were grepped daily, i.e. before we remembered the results in .kpr.

I took the least invasive path from mimicking detect_well_known_errors to sorting by rdep to eliminating linktarget_by_template (where rdep sorting was the single original goal). I agree that .tpl's are obsolete, but that wasn't an overriding goal for me, and not necessary to get issue logic out of piuparts-report. There's no significant performance issue.

> Even if .kpr generation can be sped up significantly, I don't think I want to run this from inside piuparts-report. Just like piuparts-analyze (that takes 30-60 minutes for my instance) this is something that will continue to be run from the generate-piuparts-report driver script ... and having it sped up by an order of magnitude will decrease my hesitation to run it with --recheck-all.

OK. A minimally invasive fix would be to add a 'skip kpr creation' option, used inside piuparts-report, and re-introduce detect_well_known_errors, which imports known_problems. Interested?

> Also if the .tpl files are gone, we can actually run piuparts-report without running piuparts-analyze or detect_well_known_errors directly before it.

The above would have the same net effect.

> And about speeding up the grepping - wouldn't it be even faster if we can run multiple regexes at the same time on the input - either by 'ORing' them together or passing a list to re or ... then we would just need to figure out which one has matched ...
>
> (No, I haven't tried anything like this, but I'm considering testing this with the multiple grep calls in detect_piuparts_issues.
>
>   grep -lE '(foo)|(bar)|(f[o0]{2}bar|baz)'
>
> should be significantly faster than
>
>   grep -l foo
>   grep -l bar
>   grep -lE 'f[o0]{2}bar|baz'
>
> And there we only care about 'any match' disregarding which matched. Or am I mistaken here?)

Interesting idea. I'll give it a try.
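The "ORing" idea can be tried in Python by wrapping each pattern in a named group and joining the groups with `|`. The sketch below is only an illustration under stated assumptions: the problem names are hypothetical, the patterns are the ones from the grep example, and `build_combined`, `any_match` and `first_match_name` are made-up helpers, not piuparts functions.

```python
import re

# Hypothetical problem names paired with the patterns from the grep example.
PATTERNS = [
    ("foo_issue", r"foo"),
    ("bar_issue", r"bar"),
    ("foobar_or_baz_issue", r"f[o0]{2}bar|baz"),
]


def build_combined(patterns):
    """OR all patterns together, one named group per pattern."""
    parts = ["(?P<%s>%s)" % (name, pattern) for name, pattern in patterns]
    return re.compile("|".join(parts))


def any_match(combined, text):
    """Cheap pre-filter: does any of the patterns match anywhere?"""
    return combined.search(text) is not None


def first_match_name(combined, text):
    """Name of the pattern behind the first match, or None.

    Only the leftmost match is reported, and at a given position the
    earliest alternative wins, so this does not list every pattern
    that could match the text.
    """
    match = combined.search(text)
    return match.lastgroup if match else None
```

Because an alternation reports only one alternative per match, this is best suited to the "any match" case described above: run the combined pattern first, and fall back to the individual patterns only for the few logs that hit.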
Bug#698526: [Piuparts-devel] Bug#698526: Sort known issues by reverse dependency count
On Thu, Feb 21, 2013 at 5:02 AM, Andreas Beckmann a...@debian.org wrote:
> +if self.inc_re.search( logbody, re.MULTILINE ):
> +    for line in logbody.splitlines():
> +        if self.inc_re.search( line ):
> +            if self.exc_re == None \
> +               or not self.exc_re.search(line):
> +                return( True )
>
> That looks inefficient. Why do we have to grep twice to identify matching lines even if we have no exclusion pattern?

More than 99% of the tests will return no failure. If the MULTILINE search is 1% faster than the loop, this is a net win.

> Is it for 'foo.*bar' matching on 'The food shop\n\nSetting up libbar (08-15) ...' ? Hmm, no, DOTALL is off by default.

The MULTILINE search is pure optimization - it can be removed with no change to the results. DOTALL is off to match grep.

> Anyway, once you have a match, it shouldn't be too difficult to find the position and identify the matching line without needing to rematch on each line individually. Maybe even extend the pattern internally to ^.*($PATTERN) to match at BeginOfLine, then add a search for '$' starting from the BoL to find the corresponding EoL ... and apply the exclusion pattern on the range found that way.

Maybe, but to get bang for the buck, focus on the 99%. Your idea to 'look for any problem' in the other thread looks like the right path to try. There simply aren't enough failure cases (even in 62 sections :-) ) to worry too much about the rest.

> PS: for reviewing a series of patches I don't really care about the author's development history but prefer rebased, rewritten and reordered history to produce an easily readable patch series with small and self contained patches. (Hint: please fold 'Template HTML format fix' into the commit it fixes.)

There is a point to that commit. I wrote the python replacement to produce identical output to the shell script, before adding fixes and features (actually there are caveats, listed in the commit). You can check out that version to verify. Fixing the HTML format and merging the templates earlier interferes with that capability.

> Of course rewriting is off limits once something has been merged into mainline. But I see no gain in merging a lot of fixup commits into mainline if the development branch could have been rewritten before the merge.

I wrote of another fixup branch, containing fixes to errors in the well_known branch I had previously announced. I'm not sure if I should have gone ahead and rolled the fixes into the announced branch or not.
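The suggestion of locating the matching line from the match position, instead of re-matching every line, might be sketched as follows. This is an illustration, not the code under review: `has_failure` and its parameters are hypothetical names. One detail worth noting is that flags such as `re.MULTILINE` belong in `re.compile()`; the `search()` method of a compiled pattern takes `(string, pos, endpos)`, so a flag passed there is read as a start position.

```python
import re


def has_failure(logbody, inc_pattern, exc_pattern=None):
    """True if inc_pattern matches on a line that exc_pattern does not match.

    Hypothetical helper: searches the whole log once per candidate and
    expands each hit to its enclosing line before applying the exclusion.
    """
    inc_re = re.compile(inc_pattern, re.MULTILINE)
    exc_re = re.compile(exc_pattern) if exc_pattern else None

    pos = 0
    while True:
        match = inc_re.search(logbody, pos)
        if match is None:
            return False  # the common case: nothing matched at all
        if exc_re is None:
            return True  # no exclusion pattern, any match is a failure

        # Expand the start of the match to the line that contains it.
        bol = logbody.rfind("\n", 0, match.start()) + 1
        eol = logbody.find("\n", match.start())
        if eol == -1:
            eol = len(logbody)

        if not exc_re.search(logbody[bol:eol]):
            return True
        if eol >= len(logbody):
            return False  # the excluded line was the last one
        pos = eol + 1  # skip the excluded line and keep searching
```

For logs without any match this performs a single search over the body, which keeps the fast path for the large majority of logs that report no failure; the line bookkeeping only runs when something matched and an exclusion pattern exists.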
Bug#698526: [Piuparts-devel] Bug#698526: Sort known issues by reverse dependency count
On Mon, Feb 18, 2013 at 5:44 AM, Holger Levsen hol...@layer-acht.org wrote:
> ... these are quite some different changes, can you please isolate the commits for Sort known issues by reverse dependency count and rebase them onto current develop?!

The new serial branches sort-issues-by-rdep and sort-issues-by-rdep-fast are separated from the rest of the work, and rebased to develop.
Bug#698526: [Piuparts-devel] Bug#698526: Sort known issues by reverse dependency count
The rest of my proposed changes for known problem handling are pushed, for review. A rebase is needed before merging. I will do this at your request.

The following serial branch heads are involved:

well-known - I've added tolerance for missing files and packages, and added PTS links

fast-problems - replaced grep shell calls with python re. Per the commit: Run with full .kpr replacement is 2 1/2 minutes vs. 28 minutes for grep, per section, with stale file buffers, and idle slaves. Subsequent runs are 15 seconds vs. 60 seconds. Replacing the packagesdb rdep sort with an alpha sort reduces that to 5 seconds.

fast-report - detect_well_known_errors is morphed into the piupartslib module 'known_problems', and is called from piuparts-report. Report runs always include issues and error summaries now.

report_problem_integration - replace linktarget_by_template with known_problem module support. All problem definition information is encoded in the conf file.

piatti-problems - known_problems uses the packaged dir for the problem files. A new known-problem-directory config parameter lets piatti set it back to under /org.

Commits:

piatti-problems
  1b61655 piuparts-report - Add known-problem-directory config for piatti.

report_problem_integration
  a8360ec piuparts-report - Add a special Problem case for unknown failures.
  dc39b89 piuparts-report - replace linktarget_by_template with Problem class.
  4db2254 piuparts-report - add known Problems class list to Section.
  108bbfd Add piuparts-report linktarget_by_template information to known_problems.

fast-report
  bdc0939 Mv detect_well_known_errors to piupartslib - call from piuparts-report.

fast-problems
  4394b8f detect_well_known_errors - Changelog entry for re speedup.
  c398289 Remove COMMAND parameter from known_problems.
  4e4e011 detect_well_known_errors - Generate 'grep' help command from INCLUDE.
  87696e9 detect_well_known_errors - Use python re for fast kpr generation.
  2991e15 known_problems - Add INCLUDE parameters for re-based searching.

well-known
  955a6a2 Close the 698526 python detect_well_known_errors wishlist bug.
  5b61a03 detect_well_known_errors - Add PTS link to issue/error entries.
  fe4e400 detect_well_known_errors - handle having the pkgsdb entry disappear.
  895e035 detect_well_known_errors - Tolerate missing .kpr files.
  967e27d detect_well_known_errors - Tolerate deleted log files.
  427aa41 detect_well_known_errors - restore recheck and recheck-failed options.
  a4553bc Bump the required python version to 2.7.
  a12f676 detect_well_known_errors - display the reverse dependency count.
  500e97f detect_well_known_errors - sort known errors/issues by rdep count.
  7de4eb9 detect_well_known_errors - integrate the package templates.
  d066bb3 detect_well_known_errors - Template HTML format fix.
  76b8ce2 detect_well_known_errors - Copyright notice.
  b8af3e4 detect_well_known_errors.py - move to detect_well_known_errors.
  25b9351 Remove bash detect_well_known_errors.
  ece5e4e detect_well_known_errors.py - change ext's to create kpr and tpl files.
  39837e9 detect_well_known_errors.py - print failures to match bash script
  198c65e detect_well_known_errors - Create html tpl files
  4049338 detect_well_known_errors.py - Create Failure Mgr class to hold kpr fails.
  d04c1bb detect_well_known_errors.py - Create missing kpr files
  1880598 detect_well_known_errors.py - add class for handling known problems
  9b25943 detect_well_known_errors.py - establish the problem file location.
  53df049 detect_well_known_errors.py - clean obsolete kpr and bug files
  cdd8803 Add skeleton for python replacement of detect-well-known-errors
  601e6a7 start with 0.50
  df94975 release as 0.49

In addition, there is a fixup branch that contains changes that need to be rebased into well-known.

fixup-well
  8fc8df2 fixup - fix method arguments for recheck* parameters
  b897262 fixup - fix filtered()
  a1b381d fixup - delete the comment that .kprn is temporary

Andreas, per your wishlist:

On Sun, Jan 20, 2013 at 7:56 AM, Dave Steele dste...@gmail.com wrote:
> On Sun, Jan 20, 2013 at 6:56 AM, Andreas Beckmann deb...@abeckmann.de wrote:
>> ... What I'd like to see is (in probable order of implementation)
>> * piuparts-report discovering all existing known problem descriptions instead of hardcoding them

Done, by pulling in the detect_well_known_errors code as a module, and using its Problem class.

>>   - need to add ordering information somehow, perhaps by adding a number prefix: 42_foo_not_found_issue.conf or by adding a variable with a sort key inside (there should be a bug or some todo entries about this)

Done, using a PRIORITY key in the problem files, seeded with the order of linktarget_by_template in piuparts-report.

>>   - needs to move title information from piuparts-report to .conf

Done, using a new EXPLAIN field
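Ordering problem descriptions by a PRIORITY key could work along the lines of the sketch below. It is only an illustration: the thread names the PRIORITY and EXPLAIN keys but does not show the file format, so the one-assignment-per-line `KEY=value` syntax, the file contents, the fallback value, and the helpers `parse_problem` and `sorted_problem_names` are all assumptions.

```python
import re

# Assumed syntax: KEY=value, KEY='value' or KEY="value", one per line.
ASSIGNMENT = re.compile(
    r"""^(?P<key>[A-Z_]+)=(?P<quote>['"]?)(?P<value>.*)(?P=quote)$"""
)

# Hypothetical fallback so files without a PRIORITY key sort last.
DEFAULT_PRIORITY = 1000


def parse_problem(text):
    """Collect the KEY=value assignments of one problem description."""
    fields = {}
    for line in text.splitlines():
        match = ASSIGNMENT.match(line.strip())
        if match:  # comments and blank lines simply do not match
            fields[match.group("key")] = match.group("value")
    return fields


def sorted_problem_names(conf_texts):
    """Order problem file names by PRIORITY, then by name."""
    def sort_key(name):
        fields = parse_problem(conf_texts[name])
        return (int(fields.get("PRIORITY", DEFAULT_PRIORITY)), name)

    return sorted(conf_texts, key=sort_key)
```

Keeping the sort key inside the file, rather than in a numeric filename prefix, means a problem can be reordered without renaming its file.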
Bug#698526: [Piuparts-devel] Bug#698526: Sort known issues by reverse dependency count
On Sun, Jan 20, 2013 at 6:56 AM, Andreas Beckmann deb...@abeckmann.de wrote:
> ... What I'd like to see is (in probable order of implementation)
> * piuparts-report discovering all existing known problem descriptions instead of hardcoding them
>   - need to add ordering information somehow, perhaps by adding a number prefix: 42_foo_not_found_issue.conf or by adding a variable with a sort key inside (there should be a bug or some todo entries about this)
>   - needs to move title information from piuparts-report to .conf
> * piuparts-report generating the known problem reports, allowing access to packagedb etc. for better reports, making .tpl files obsolete
> * getting rid of error/issue redundancies
> * computing the .kpr with python re instead of grep
> * adjusting the .conf and .kpr formats to what is actually needed

I would prioritize python re. The results could affect the strategy for the rest.
Bug#698526: [Piuparts-devel] Bug#698526: Sort known issues by reverse dependency count
On 2013-01-19 22:06, Dave Steele wrote:
> The well-known git branch implements a version of detect_well_known_errors to accomplish this. The script is ported from bash to python, to take advantage of the rdep capability of piupartsdb. It was developed alongside the bash script to support side-by-side testing.

Without having looked at the code yet, I like the idea :-)

Now that you have access to the package DB, can you add a PTS link for each failing package? These need to be src based ...

Andreas
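Sorting failing packages by reverse dependency count, the feature this bug asks for, can be illustrated with the small sketch below. It makes several assumptions: the dependency data is a plain dictionary invented for the example rather than the piuparts package database, only direct reverse dependencies are counted, and `rdep_counts` and `sort_by_rdeps` are hypothetical helper names.

```python
from collections import defaultdict


def rdep_counts(depends):
    """Count, for each package, how many packages depend on it directly.

    `depends` maps a package name to the list of packages it depends on.
    """
    rdeps = defaultdict(set)
    for package, deps in depends.items():
        for dep in deps:
            rdeps[dep].add(package)
    return {package: len(users) for package, users in rdeps.items()}


def sort_by_rdeps(failed, depends):
    """Order failing packages by rdep count (highest first), then by name."""
    counts = rdep_counts(depends)
    return sorted(failed, key=lambda package: (-counts.get(package, 0), package))
```

Putting the most depended-upon failures first means the entries at the top of a report are the ones whose fix would unblock the most other packages; the name is used as a tie-breaker so the order stays stable between runs.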