On 08/28/2014 02:07 PM, Joe Gordon wrote:
> On Thu, Aug 28, 2014 at 10:17 AM, Sean Dague <s...@dague.net> wrote:
>
> On 08/28/2014 12:48 PM, Doug Hellmann wrote:
> >
> > On Aug 27, 2014, at 5:56 PM, Sean Dague <s...@dague.net> wrote:
> >
> >> On 08/27/2014 05:27 PM, Doug Hellmann wrote:
> >>>
> >>> On Aug 27, 2014, at 2:54 PM, Sean Dague <s...@dague.net> wrote:
> >>>
> >>>> Note: thread intentionally broken, this is really a different topic.
> >>>>
> >>>> On 08/27/2014 02:30 PM, Doug Hellmann wrote:
> >>>>> On Aug 27, 2014, at 1:30 PM, Chris Dent <chd...@redhat.com> wrote:
> >>>>>
> >>>>>> On Wed, 27 Aug 2014, Doug Hellmann wrote:
> >>>>>>
> >>>>>>> I have found it immensely helpful, for example, to have a written
> >>>>>>> set of the steps involved in creating a new library, from
> >>>>>>> importing the git repo all the way through to making it available
> >>>>>>> to other projects. Without those instructions, it would have been
> >>>>>>> much harder to split up the work. The team would have had to train
> >>>>>>> each other by word of mouth, and we would have had constant issues
> >>>>>>> with inconsistent approaches triggering different failures. The
> >>>>>>> time we spent building and verifying the instructions has paid off
> >>>>>>> to the extent that we even had one developer not on the core team
> >>>>>>> handle a graduation for us.
> >>>>>>
> >>>>>> +many more for the relatively simple act of just writing stuff down
> >>>>>
> >>>>> “Write it down.” is my theme for Kilo.
> >>>>
> >>>> I definitely get the sentiment. "Write it down" is also hard when you
> >>>> are talking about things that do change around quite a bit. OpenStack
> >>>> as a whole sees 250 - 500 changes a week, so the interaction pattern
> >>>> moves around enough that it's really easy to have *very* stale
> >>>> information written down.
> >>>> Stale information is even more dangerous than no information
> >>>> sometimes, as it takes people down very wrong paths.
> >>>>
> >>>> I think we break down on communication when we get into a
> >>>> conversation of "I want to learn gate debugging" because I don't
> >>>> quite know what that means, or where the starting point of
> >>>> understanding is. So those intentions are well meaning, but tend to
> >>>> stall. The reality was there was no road map for those of us that
> >>>> dove in; it's just understanding how OpenStack holds together as a
> >>>> whole and where some of the high risk parts are. And a lot of that
> >>>> comes with days staring at code and logs until patterns emerge.
> >>>>
> >>>> Maybe if we can get smaller, more targeted questions, we can help
> >>>> folks better? I'm personally a big fan of answering the targeted
> >>>> questions because then I also know that the time spent exposing that
> >>>> information was directly useful.
> >>>>
> >>>> I'm more than happy to mentor folks. But I just end up finding the
> >>>> "I want to learn" at the generic level something that's hard to
> >>>> grasp onto or figure out how we turn it into action. I'd love to
> >>>> hear more ideas from folks about ways we might do that better.
> >>>
> >>> You and a few others have developed an expertise in this important
> >>> skill. I am so far away from that level of expertise that I don’t
> >>> know the questions to ask. More often than not I start with the
> >>> console log, find something that looks significant, spend an hour or
> >>> so tracking it down, and then have someone tell me that it is a red
> >>> herring and the issue is really some other thing that they figured
> >>> out very quickly by looking at a file I never got to.
> >>>
> >>> I guess what I’m looking for is some help with the patterns. What
> >>> made you think to look in one log file versus another? Some of these
> >>> jobs save a zillion little files; which ones are actually useful?
> >>> What tools are you using to correlate log entries across all of
> >>> those files? Are you doing it by hand? Is logstash useful for that,
> >>> or is that more useful for finding multiple occurrences of the same
> >>> issue?
> >>>
> >>> I realize there’s not a way to write a how-to that will live forever.
> >>> Maybe one way to deal with that is to write up the research done on
> >>> bugs soon after they are solved, and publish that to the mailing
> >>> list. Even the retrospective view is useful because we can all learn
> >>> from it without having to live through it. The mailing list is a
> >>> fairly ephemeral medium, and something very old in the archives is
> >>> understood to have a good chance of being out of date, so we don’t
> >>> have to keep adding disclaimers.
> >>
> >> Sure. Matt's actually working up a blog post describing the thing he
> >> nailed earlier in the week.
> >
> > Yes, I appreciate that both of you are responding to my questions. :-)
> >
> > I have some more specific questions/comments below. Please take all of
> > this in the spirit of trying to make this process easier by pointing
> > out where I’ve found it hard, and not just me complaining. I’d like to
> > work on fixing any of these things that can be fixed, by writing or
> > reviewing patches early in Kilo.
> >
> >> Here is my off-the-cuff set of guidelines:
> >>
> >> #1 - is it a test failure or a setup failure?
> >>
> >> This should be pretty easy to figure out. Test failures come at the
> >> end of the console log and say that tests failed (after you see a
> >> bunch of passing tempest tests).
> >>
> >> Always start at *the end* of files and work backwards.
> >
> > That’s interesting, because in my case I saw a lot of failures after
> > the initial “real” problem. So I usually read the logs like C compiler
> > output: assume the first error is real, and the others might have been
> > caused by that one. Do you work from the bottom up to a point where you
> > don’t see any more errors instead of reading top down?
> Bottom up to get to the problems, then figure out whether it happened in
> a subprocess, in which case the real problem could have existed for a
> while before it surfaced. That being said, not all tools do useful
> things like actually error when they fail (I'm looking at you, yum...)
> so there are always edge cases here.
>
> >> #2 - if it's a test failure, what API call was unsuccessful?
> >>
> >> Start with looking at the API logs for the service at the top level,
> >> and see if there is a simple traceback at the right timestamp. If
> >> not, figure out what that API call was calling out to; again look at
> >> the simple cases, assuming failures will create ERROR or TRACE lines
> >> (though they often don't).
> >
> > In my case, a neutron call failed. Most of the other services seem to
> > have a *-api.log file, but neutron doesn’t. It took a little while to
> > find the API-related messages in screen-q-svc.txt (I’m glad I’ve been
> > around long enough to know it used to be called “quantum”). I get
> > that screen-n-*.txt would collide with nova. Is it necessary to
> > abbreviate those filenames at all?
>
> Yeah... service naming could definitely be better, especially with
> neutron. There are implications for long names in screen, but maybe we
> just get over it, as we already have too many tabs to fit in one page
> in the console anymore anyway.
>
> >> Hints on the service log order you should go after are in the footer
> >> of every log page -
> >> http://logs.openstack.org/76/79776/15/gate/gate-tempest-dsvm-full/700ee7e/logs/
> >> (it's included as an Apache footer) for some services. It's been
> >> there for about 18 months; I think people are fully blind to it at
> >> this point.
> >
> > Where would I go to edit that footer to add information about the
> > neutron log files? Is that Apache footer defined in an infra repo?
> Note the following at the end of the footer output:
>
>     About this Help
>
>     This help file is part of the openstack-infra/config project, and
>     can be found at
>     modules/openstack_project/files/logs/help/tempest_logs.html.
>     The file can be updated via the standard OpenStack Gerrit review
>     process.
>
> I took a first whack at trying to add some more information to the
> footer here: https://review.openstack.org/#/c/117390/
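As an aside, the #1/#2 recipe above can be sketched in a few lines of Python. This is purely a toy illustration, not anything the gate actually runs; the log line format, the timestamp layout, and the idea that one regex covers every service log are all simplifying assumptions:

```python
import re
from datetime import datetime, timedelta

# Assumed (made-up) log timestamp layout: "YYYY-MM-DD HH:MM:SS ..."
TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")


def first_failure_from_end(console_lines):
    """Step #1: read the console log from the end and walk backwards to
    the earliest line of the failure block, since later errors are often
    just fallout from the first real one."""
    failure = None
    for line in reversed(console_lines):
        if "FAIL" in line or "ERROR" in line:
            failure = line  # keep walking up to the earliest hit
        elif failure is not None:
            break  # ran out of contiguous failure lines
    return failure


def errors_near(service_lines, when, window=timedelta(seconds=5)):
    """Step #2: in a service log (e.g. the API log), collect ERROR or
    TRACE entries close to the timestamp of the failed API call."""
    hits = []
    for line in service_lines:
        m = TS.match(line)
        if not m:
            continue
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        if abs(ts - when) <= window and ("ERROR" in line or "TRACE" in line):
            hits.append(line)
    return hits
```

The point it encodes is the same as the prose above: walk the console log from the bottom up to the first real failure, then jump to the relevant service log and look near that timestamp instead of reading top down.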
\o/ - you rock joe!

	-Sean

-- 
Sean Dague
http://dague.net

_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev