Re: [openstack-dev] [all] gate debugging

2014-08-28 Thread Thierry Carrez
David Kranz wrote:
 On 08/27/2014 03:43 PM, Sean Dague wrote:
 On 08/27/2014 03:33 PM, David Kranz wrote:
 Race conditions are what makes debugging very hard. I think we are in
 the process of experimenting with such an idea: asymmetric gating by
 moving functional tests to projects, making them deeper and more
 extensive, and gating against their own projects. The result should be
 that when a code change is made, we will spend much more time running
 tests of code that is most likely to be growing a race bug from the
 change. Of course there is a risk that we will impair integration
 testing and we will have to be vigilant about that. One mitigating
 factor is that if cross-project interaction uses apis (official or not)
 that are well tested by the functional tests, there is less risk that a
 bug will only show up when those apis are used by another project.

 So, sorry, this is really not about systemic changes (we're running
 those in parallel), but more about skills transfer in people getting
 engaged. Because we need both. I guess that's the danger of breaking the
 thread: apparently I lost part of the context.

 I agree we need both. I made the comment because if we can make gate
 debugging less daunting
 then less skill will be needed and I think that is key. Honestly, I am
 not sure the full skill you have can be transferred. It was gained
 partly through learning in simpler times.

I think we could develop tools and visualizations that would help the
debugging tasks. We could make those tasks more visible, and therefore
more appealing to the brave souls that step up to tackle them. Sean and
Joe did a ton of work improving the raw data, deriving graphs from it,
highlighting log syntax or adding helpful Apache footers. But these days
they spend so much time fixing the issues themselves that they can't
continue improving those tools.

And that's part of where the gate burnout comes from: spending so much
time on the issues themselves that you can no longer work on preventing
them from happening, or making the job of handling the issues easier, or
documenting/mentoring other people so that they can do it in your place.

-- 
Thierry Carrez (ttx)



Re: [openstack-dev] [all] gate debugging

2014-08-28 Thread Doug Hellmann

On Aug 27, 2014, at 5:56 PM, Sean Dague s...@dague.net wrote:

 On 08/27/2014 05:27 PM, Doug Hellmann wrote:
 
 On Aug 27, 2014, at 2:54 PM, Sean Dague s...@dague.net wrote:
 
 Note: thread intentionally broken, this is really a different topic.
 
 On 08/27/2014 02:30 PM, Doug Hellmann wrote:
 On Aug 27, 2014, at 1:30 PM, Chris Dent chd...@redhat.com wrote:
 
 On Wed, 27 Aug 2014, Doug Hellmann wrote:
 
 I have found it immensely helpful, for example, to have a written set
 of the steps involved in creating a new library, from importing the
 git repo all the way through to making it available to other projects.
 Without those instructions, it would have been much harder to split up
 the work. The team would have had to train each other by word of
 mouth, and we would have had constant issues with inconsistent
 approaches triggering different failures. The time we spent building
 and verifying the instructions has paid off to the extent that we even
 had one developer not on the core team handle a graduation for us.
 
 +many more for the relatively simple act of just writing stuff down
 
 “Write it down.” is my theme for Kilo.
 
 I definitely get the sentiment. "Write it down" is also hard when you
 are talking about things that do change around quite a bit. OpenStack as
 a whole sees 250 - 500 changes a week, so the interaction pattern moves
 around enough that it's really easy to have *very* stale information
 written down. Stale information is sometimes even more dangerous than no
 information, as it takes people down very wrong paths.
 
 I think we break down on communication when we get into a conversation
 of "I want to learn gate debugging," because I don't quite know what that
 means, or where the starting point of understanding is. So those
 intentions are well-meaning, but tend to stall. The reality was there
 was no road map for those of us that dive in; it's just understanding
 how OpenStack holds together as a whole and where some of the high-risk
 parts are. And a lot of that comes with days staring at code and logs
 until patterns emerge.
 
 Maybe if we can get smaller, more targeted questions, we can help folks
 better? I'm personally a big fan of answering the targeted questions
 because then I also know that the time spent exposing that information
 was directly useful.
 
 I'm more than happy to mentor folks. But I just end up finding the "I
 want to learn" at the generic level something that's hard to grasp onto
 or figure out how we turn it into action. I'd love to hear more ideas
 from folks about ways we might do that better.
 
 You and a few others have developed an expertise in this important skill. I 
 am so far away from that level of expertise that I don’t know the questions 
 to ask. More often than not I start with the console log, find something 
 that looks significant, spend an hour or so tracking it down, and then have 
 someone tell me that it is a red herring and the issue is really some other 
 thing that they figured out very quickly by looking at a file I never got to.
 
 I guess what I’m looking for is some help with the patterns. What made you 
 think to look in one log file versus another? Some of these jobs save a 
 zillion little files, which ones are actually useful? What tools are you 
 using to correlate log entries across all of those files? Are you doing it 
 by hand? Is logstash useful for that, or is that more useful for finding 
 multiple occurrences of the same issue?
 
 I realize there’s not a way to write a how-to that will live forever. Maybe 
 one way to deal with that is to write up the research done on bugs soon 
 after they are solved, and publish that to the mailing list. Even the 
 retrospective view is useful because we can all learn from it without having 
 to live through it. The mailing list is a fairly ephemeral medium, and 
 something very old in the archives is understood to have a good chance of 
 being out of date so we don’t have to keep adding disclaimers.
 
 Sure. Matt's actually working up a blog post describing the thing he
 nailed earlier in the week.

Yes, I appreciate that both of you are responding to my questions. :-)

I have some more specific questions/comments below. Please take all of this in 
the spirit of trying to make this process easier by pointing out where I’ve 
found it hard, and not just me complaining. I’d like to work on fixing any of 
these things that can be fixed, by writing or reviewing patches early in 
Kilo.

 
 Here is my off the cuff set of guidelines:
 
 #1 - is it a test failure or a setup failure
 
 This should be pretty easy to figure out. Test failures come at the end
 of the console log and say that tests failed (after you see a bunch of
 passing tempest tests).
 
 Always start at *the end* of files and work backwards.

That’s interesting because in my case I saw a lot of failures after the initial 
“real” problem. So I usually read the logs like C compiler output: assume the 
first error is real, and the failures that follow are fallout from it.
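
A minimal sketch of guideline #1 for a console log downloaded locally (the
failure markers below are guesses for illustration, not anything the gate
itself guarantees):

    #!/bin/bash
    # Sketch only: classify a gate run from its console log, reading from the end.
    CONSOLE=${1:-console.log}

    # Did tempest run and report failing tests, or did setup die before that?
    if tail -n 500 "$CONSOLE" | grep -qE 'FAILED \(.*failures=[0-9]+'; then
        echo "Test failure: note the failing test names, then open the matching service logs."
    elif tail -n 500 "$CONSOLE" | grep -qiE 'stack\.sh.*fail|ERROR'; then
        echo "Setup failure: no test summary at the end; read upward from the last ERROR."
    else
        echo "No obvious marker near the end; read the tail by hand:"
        tail -n 50 "$CONSOLE"
    fi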

Re: [openstack-dev] [all] gate debugging

2014-08-28 Thread Dean Troyer
On Thu, Aug 28, 2014 at 11:48 AM, Doug Hellmann d...@doughellmann.com
wrote:

 In my case, a neutron call failed. Most of the other services seem to have
 a *-api.log file, but neutron doesn’t. It took a little while to find the
 API-related messages in screen-q-svc.txt (I’m glad I’ve been around long
 enough to know it used to be called “quantum”). I get that screen-n-*.txt
 would collide with nova. Is it necessary to abbreviate those filenames at
 all?


Cleaning up the service names has been a background conversation for some
time and came up again last night in IRC.  I abbreviated them in the first
place to try to get them all in my screen status bar, so that was a while
ago...

I don't think the current ENABLED_SERVICES is scaling well, and using full
names (nova-api, glance-registry, etc.) will make it even harder to read.
Maybe that is a misplaced concern? I do think, though, that making the logfile
names and locations more obvious in the gate results will be helpful.
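
In the meantime, the abbreviations map roughly like this when scanning a job's
captured logs (a rough guide based on today's service names, not a spec):

    # list the captured screen logs for a job
    ls logs/screen-*.txt
    #   n-*  -> nova      (n-api, n-cpu, n-sch, n-cond)
    #   q-*  -> neutron   (q-svc is neutron-server; q-agt, q-dhcp, q-l3, q-meta)
    #   g-*  -> glance    (g-api, g-reg)
    #   c-*  -> cinder    (c-api, c-sch, c-vol)
    #   key  -> keystone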

I've started scratching out a plan to migrate to full names and will get it
into an Etherpad soon.  Also simplifying the log file configuration vars
and locations.

dt

-- 

Dean Troyer
dtro...@gmail.com


Re: [openstack-dev] [all] gate debugging

2014-08-28 Thread Sean Dague
On 08/28/2014 12:48 PM, Doug Hellmann wrote:
 
 [quoted text repeated from earlier in the thread snipped; the rest of this message is cut off in the archive]

Re: [openstack-dev] [all] gate debugging

2014-08-28 Thread Doug Hellmann

On Aug 28, 2014, at 1:00 PM, Dean Troyer dtro...@gmail.com wrote:

 On Thu, Aug 28, 2014 at 11:48 AM, Doug Hellmann d...@doughellmann.com wrote:
 In my case, a neutron call failed. Most of the other services seem to have a 
 *-api.log file, but neutron doesn’t. It took a little while to find the 
 API-related messages in screen-q-svc.txt (I’m glad I’ve been around long 
 enough to know it used to be called “quantum”). I get that screen-n-*.txt 
 would collide with nova. Is it necessary to abbreviate those filenames at all?
 
 Cleaning up the service names has been a background conversation for some 
 time and came up again last night in IRC.  I abbreviated them in the first 
 place to try to get them all in my screen status bar, so that was a while 
 ago...
 
 I don't think the current ENABLED_SERVICES is scaling well, and using full 
 names (nova-api, glance-registry, etc.) will make it even harder to read. Maybe 
 that is a misplaced concern? I do think, though, that making the logfile names 
 and locations more obvious in the gate results will be helpful.

I usually use the functions for editing ENABLED_SERVICES. Is it still common to 
edit the variable directly?
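
For reference, the helper-function style I mean looks roughly like this in
local.conf (a sketch; exact service names vary):

    [[local|localrc]]
    # use the helper functions rather than editing ENABLED_SERVICES by hand
    disable_service n-net                    # drop nova-network
    enable_service q-svc q-agt q-dhcp q-l3   # enable the neutron services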

 
 I've started scratching out a plan to migrate to full names and will get it 
 into an Etherpad soon.  Also simplifying the log file configuration vars and 
 locations.

Cool. Let us know if we can make any changes in oslo.log to simplify that work.

Doug

 


Re: [openstack-dev] [all] gate debugging

2014-08-28 Thread Doug Hellmann

On Aug 28, 2014, at 1:17 PM, Sean Dague s...@dague.net wrote:

 [quoted text repeated from earlier in the thread snipped; the rest of this message is cut off in the archive]

Re: [openstack-dev] [all] gate debugging

2014-08-28 Thread Joe Gordon
On Thu, Aug 28, 2014 at 10:17 AM, Sean Dague s...@dague.net wrote:

 [quoted text repeated from earlier in the thread snipped; the rest of this message is cut off in the archive]

Re: [openstack-dev] [all] gate debugging

2014-08-28 Thread Dean Troyer
On Thu, Aug 28, 2014 at 12:44 PM, Doug Hellmann d...@doughellmann.com
wrote:

 I usually use the functions for editing ENABLED_SERVICES. Is it still
 common to edit the variable directly?

 Not generally.  It was looking at it in log files to see what was/was not
enabled that got me thinking about this.  The default is already pretty
long; however, having full words might make the scan easier than x- does.

 I've started scratching out a plan to migrate to full names and will get
 it into an Etherpad soon.  Also simplifying the log file configuration vars
 and locations.

 https://etherpad.openstack.org/p/devstack-logging


 Cool. Let us know if we can make any changes in oslo.log to simplify that
 work.


I don't think oslo.log is involved; this is all of the log files that
DevStack generates or captures: screen windows and the stack.sh run itself.
There might be room to optimize if we're capturing something that is also
being logged elsewhere, but when using screen people seem to want it all in
a window (see horizon and recent keystone windows) anyway.
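
For context, the log configuration vars in question boil down to a few
local.conf settings, roughly (a sketch of how they look today; the etherpad
above is about simplifying them):

    [[local|localrc]]
    LOGFILE=/opt/stack/logs/stack.sh.log   # capture the stack.sh run itself
    SCREEN_LOGDIR=/opt/stack/logs          # write each screen window (service) to a file
    LOGDAYS=2                              # how many days of old logs to keep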

dt

-- 

Dean Troyer
dtro...@gmail.com


Re: [openstack-dev] [all] gate debugging

2014-08-28 Thread Sean Dague
On 08/28/2014 01:48 PM, Doug Hellmann wrote:
 
 [quoted text repeated from earlier in the thread snipped; the rest of this message is cut off in the archive]

Re: [openstack-dev] [all] gate debugging

2014-08-28 Thread Sean Dague
On 08/28/2014 02:07 PM, Joe Gordon wrote:
 
 
 
 [quoted text repeated from earlier in the thread snipped; the rest of this message is cut off in the archive]

Re: [openstack-dev] [all] gate debugging

2014-08-28 Thread Joe Gordon
On Thu, Aug 28, 2014 at 5:17 AM, Thierry Carrez thie...@openstack.org
wrote:

 [quoted text repeated from earlier in the thread snipped]

 I think we could develop tools and visualizations that would help the
 debugging tasks. We could make those tasks more visible, and therefore
 more appealing to the brave souls that step up to tackle them. Sean and
 Joe did a ton of work improving the raw data, deriving graphs from it,
 highlighting log syntax or adding helpful Apache footers. But these days
 they spend so much time fixing the issues themselves that they can't
 continue improving those tools.


Some tooling improvements I would like to do but probably don't have the
time for:

* Add the ability to filter http://status.openstack.org/elastic-recheck/ by
project, so a neutron dev can see the list of bugs that are neutron-related
* Make the list of open reviews on
http://status.openstack.org/elastic-recheck/ easier to find
* Create an up-to-date diagram of what OpenStack looks like when running,
how services interact, etc.
http://docs.openstack.org/training-guides/content/figures/5/figures/image31.jpg
 and
http://docs.openstack.org/admin-guide-cloud/content/figures/2/figures/openstack-arch-havana-logical-v1.jpg
are out of date
* Make http://jogo.github.io/gate easier to understand. This is what I
check to see the health of the gate.
* Build a request-id tracker for logs. Make it easier to find the logs for
a given request-id across multiple services (nova-api, nova-scheduler, etc.);
a rough sketch of the idea is below.
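
A rough sketch of that last idea, assuming the job's logs have been downloaded
into one directory (OpenStack services tag requests as req-<uuid> in their logs):

    #!/bin/bash
    # Sketch: show every log line mentioning a given request id, across all of
    # the downloaded service logs, grouped per file.
    REQ_ID=${1:?"usage: $0 REQUEST_ID [LOGDIR]"}
    LOGDIR=${2:-.}

    # -H prefixes each match with its file name, so you can see which services
    # touched the request and which files to keep digging in.
    grep -H "$REQ_ID" "$LOGDIR"/screen-*.txt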





Re: [openstack-dev] [all] gate debugging

2014-08-28 Thread Doug Hellmann

On Aug 28, 2014, at 2:16 PM, Sean Dague s...@dague.net wrote:

 [quoted text repeated from earlier in the thread snipped; the rest of this message is cut off in the archive]

Re: [openstack-dev] [all] gate debugging

2014-08-28 Thread Doug Hellmann

On Aug 28, 2014, at 2:15 PM, Sean Dague s...@dague.net wrote:

 [quoted text repeated from earlier in the thread snipped; the rest of this message is cut off in the archive]

Re: [openstack-dev] [all] gate debugging

2014-08-27 Thread Jeremy Stanley
On 2014-08-27 14:54:55 -0400 (-0400), Sean Dague wrote:
[...]
 I think we break down on communication when we get into a
 conversation of I want to learn gate debugging because I don't
 quite know what that means, or where the starting point of
 understanding is. So those intentions are well meaning, but tend
 to stall. The reality was there was no road map for those of us
 that dive in, it's just understanding how OpenStack holds together
 as a whole and where some of the high risk parts are. And a lot of
 that comes with days staring at code and logs until patterns
 emerge.
[...]

One way to put this in perspective, I think, is to talk about
devstack-gate integration test jobs (which are only one of a variety
of kinds of jobs we gate on, but it's possibly the most nebulous
case).

Since devstack-gate mostly just sets up an OpenStack (for a variety
of definitions thereof) and then runs some defined suite of
transformations and tests against it, a failure really is quite
often "this cloud broke". You are really looking, post-mortem, at
what would in production probably be considered a catastrophic
cascade failure involving multiple moving parts, where all you have
left is (hopefully enough, sometimes not) logs of what the services
were doing when all hell broke loose. However, you're an ops team of
one trying to get to the bottom of why your environment went toes
up... and then you're a developer trying to work out what to patch
where to make it not happen again (if you're lucky).

That is "gate debugging" and, to support your point, is something
which can at best be only vaguely documented.
-- 
Jeremy Stanley



Re: [openstack-dev] [all] gate debugging

2014-08-27 Thread David Kranz

On 08/27/2014 02:54 PM, Sean Dague wrote:

[quoted text repeated from earlier in the thread snipped]

Race conditions are what makes debugging very hard. I think we are in 
the process of experimenting with such an idea: asymmetric gating by 
moving functional tests to projects, making them deeper and more 
extensive, and gating against their own projects. The result should be 
that when a code change is made, we will spend much more time running 
tests of code that is most likely to be growing a race bug from the 
change. Of course there is a risk that we will impair integration 
testing and we will have to be vigilant about that. One mitigating 
factor is that if cross-project interaction uses apis (official or not) 
that are well tested by the functional tests, there is less risk that a 
bug will only show up when those apis are used by another project.


 -David



Re: [openstack-dev] [all] gate debugging

2014-08-27 Thread Sean Dague
On 08/27/2014 03:33 PM, David Kranz wrote:
 On 08/27/2014 02:54 PM, Sean Dague wrote:
 Note: thread intentionally broken, this is really a different topic.

 On 08/27/2014 02:30 PM, Doug Hellmann wrote:
 On Aug 27, 2014, at 1:30 PM, Chris Dent chd...@redhat.com wrote:

 On Wed, 27 Aug 2014, Doug Hellmann wrote:

 I have found it immensely helpful, for example, to have a written set
 of the steps involved in creating a new library, from importing the
 git repo all the way through to making it available to other projects.
 Without those instructions, it would have been much harder to split up
 the work. The team would have had to train each other by word of
 mouth, and we would have had constant issues with inconsistent
 approaches triggering different failures. The time we spent building
 and verifying the instructions has paid off to the extent that we even
 had one developer not on the core team handle a graduation for us.
 +many more for the relatively simple act of just writing stuff down
 Write it down.” is my theme for Kilo.
 I definitely get the sentiment. Write it down is also hard when you
 are talking about things that do change around quite a bit. OpenStack as
 a whole sees 250 - 500 changes a week, so the interaction pattern moves
 around enough that it's really easy to have *very* stale information
 written down. Stale information is even more dangerous than no
 information sometimes, as it takes people down very wrong paths.

 I think we break down on communication when we get into a conversation
 of I want to learn gate debugging because I don't quite know what that
 means, or where the starting point of understanding is. So those
 intentions are well meaning, but tend to stall. The reality was there
 was no road map for those of us that dive in, it's just understanding
 how OpenStack holds together as a whole and where some of the high risk
 parts are. And a lot of that comes with days staring at code and logs
 until patterns emerge.

 Maybe if we can get smaller more targeted questions, we can help folks
 better? I'm personally a big fan of answering the targeted questions
 because then I also know that the time spent exposing that information
 was directly useful.

 I'm more than happy to mentor folks. But I just end up finding the I
 want to learn at the generic level something that's hard to grasp onto
 or figure out how we turn it into action. I'd love to hear more ideas
 from folks about ways we might do that better.

 -Sean

 Race conditions are what makes debugging very hard. I think we are in
 the process of experimenting with such an idea: asymmetric gating by
 moving functional tests to projects, making them deeper and more
 extensive, and gating against their own projects. The result should be
 that when a code change is made, we will spend much more time running
 tests of code that is most likely to be growing a race bug from the
 change. Of course there is a risk that we will impair integration
 testing and we will have to be vigilant about that. One mitigating
 factor is that if cross-project interaction uses apis (official or not)
 that are well tested by the functional tests, there is less risk that a
 bug will only show up when those apis are used by another project.

So, sorry, this is really not about systemic changes (we're running
those in parallel), but more about skills transfer in people getting
engaged. Because we need both. I guess that's the danger of breaking the
thread is apparently I lost part of the context.

-Sean

-- 
Sean Dague
http://dague.net

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all] gate debugging

2014-08-27 Thread David Kranz

On 08/27/2014 03:43 PM, Sean Dague wrote:

On 08/27/2014 03:33 PM, David Kranz wrote:

On 08/27/2014 02:54 PM, Sean Dague wrote:

Note: thread intentionally broken, this is really a different topic.

On 08/27/2014 02:30 PM, Doug Hellmann wrote:

On Aug 27, 2014, at 1:30 PM, Chris Dent chd...@redhat.com wrote:


On Wed, 27 Aug 2014, Doug Hellmann wrote:


I have found it immensely helpful, for example, to have a written set
of the steps involved in creating a new library, from importing the
git repo all the way through to making it available to other projects.
Without those instructions, it would have been much harder to split up
the work. The team would have had to train each other by word of
mouth, and we would have had constant issues with inconsistent
approaches triggering different failures. The time we spent building
and verifying the instructions has paid off to the extent that we even
had one developer not on the core team handle a graduation for us.

+many more for the relatively simple act of just writing stuff down

“Write it down.” is my theme for Kilo.

I definitely get the sentiment. Write it down is also hard when you
are talking about things that do change around quite a bit. OpenStack as
a whole sees 250 - 500 changes a week, so the interaction pattern moves
around enough that it's really easy to have *very* stale information
written down. Stale information is even more dangerous than no
information sometimes, as it takes people down very wrong paths.

I think we break down on communication when we get into a conversation
of I want to learn gate debugging because I don't quite know what that
means, or where the starting point of understanding is. So those
intentions are well meaning, but tend to stall. The reality was there
was no road map for those of us that dive in, it's just understanding
how OpenStack holds together as a whole and where some of the high risk
parts are. And a lot of that comes with days staring at code and logs
until patterns emerge.

Maybe if we can get smaller more targeted questions, we can help folks
better? I'm personally a big fan of answering the targeted questions
because then I also know that the time spent exposing that information
was directly useful.

I'm more than happy to mentor folks. But I just end up finding the I
want to learn at the generic level something that's hard to grasp onto
or figure out how we turn it into action. I'd love to hear more ideas
from folks about ways we might do that better.

 -Sean


Race conditions are what makes debugging very hard. I think we are in
the process of experimenting with such an idea: asymmetric gating by
moving functional tests to projects, making them deeper and more
extensive, and gating against their own projects. The result should be
that when a code change is made, we will spend much more time running
tests of code that is most likely to be growing a race bug from the
change. Of course there is a risk that we will impair integration
testing and we will have to be vigilant about that. One mitigating
factor is that if cross-project interaction uses apis (official or not)
that are well tested by the functional tests, there is less risk that a
bug will only show up when those apis are used by another project.

So, sorry, this is really not about systemic changes (we're running
those in parallel), but more about skills transfer in people getting
engaged. Because we need both. I guess that's the danger of breaking the
thread is apparently I lost part of the context.

-Sean

I agree we need both. I made the comment because if we can make gate 
debugging less daunting
then less skill will be needed and I think that is key. Honestly, I am 
not sure the full skill you have can be transferred. It was gained 
partly through learning in simpler times.

 -David

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all] gate debugging

2014-08-27 Thread Anita Kuno
On 08/27/2014 03:43 PM, Sean Dague wrote:
 On 08/27/2014 03:33 PM, David Kranz wrote:
 On 08/27/2014 02:54 PM, Sean Dague wrote:
 Note: thread intentionally broken, this is really a different topic.

 On 08/27/2014 02:30 PM, Doug Hellmann wrote:
 On Aug 27, 2014, at 1:30 PM, Chris Dent chd...@redhat.com wrote:

 On Wed, 27 Aug 2014, Doug Hellmann wrote:

 I have found it immensely helpful, for example, to have a written set
 of the steps involved in creating a new library, from importing the
 git repo all the way through to making it available to other projects.
 Without those instructions, it would have been much harder to split up
 the work. The team would have had to train each other by word of
 mouth, and we would have had constant issues with inconsistent
 approaches triggering different failures. The time we spent building
 and verifying the instructions has paid off to the extent that we even
 had one developer not on the core team handle a graduation for us.
 +many more for the relatively simple act of just writing stuff down
 “Write it down.” is my theme for Kilo.
 I definitely get the sentiment. Write it down is also hard when you
 are talking about things that do change around quite a bit. OpenStack as
 a whole sees 250 - 500 changes a week, so the interaction pattern moves
 around enough that it's really easy to have *very* stale information
 written down. Stale information is even more dangerous than no
 information sometimes, as it takes people down very wrong paths.

 I think we break down on communication when we get into a conversation
 of I want to learn gate debugging because I don't quite know what that
 means, or where the starting point of understanding is. So those
 intentions are well meaning, but tend to stall. The reality was there
 was no road map for those of us that dive in, it's just understanding
 how OpenStack holds together as a whole and where some of the high risk
 parts are. And a lot of that comes with days staring at code and logs
 until patterns emerge.

 Maybe if we can get smaller more targeted questions, we can help folks
 better? I'm personally a big fan of answering the targeted questions
 because then I also know that the time spent exposing that information
 was directly useful.

 I'm more than happy to mentor folks. But I just end up finding the I
 want to learn at the generic level something that's hard to grasp onto
 or figure out how we turn it into action. I'd love to hear more ideas
 from folks about ways we might do that better.

 -Sean

 Race conditions are what makes debugging very hard. I think we are in
 the process of experimenting with such an idea: asymmetric gating by
 moving functional tests to projects, making them deeper and more
 extensive, and gating against their own projects. The result should be
 that when a code change is made, we will spend much more time running
 tests of code that is most likely to be growing a race bug from the
 change. Of course there is a risk that we will impair integration
 testing and we will have to be vigilant about that. One mitigating
 factor is that if cross-project interaction uses apis (official or not)
 that are well tested by the functional tests, there is less risk that a
 bug will only show up when those apis are used by another project.
 
 So, sorry, this is really not about systemic changes (we're running
 those in parallel), but more about skills transfer in people getting
 engaged. Because we need both. I guess that's the danger of breaking the
 thread is apparently I lost part of the context.
 
   -Sean
 
I love mentoring; it is my favourite skills transfer pattern.

The optimal pattern is I agree to mentor someone and they are focused on
what I task them with, I evaluate it and we review it and not only do
they learn a skill but they have their own personal experience as a
foundation for having that skill.

Here is the part that breaks down in OpenStack - then the person I
mentor agrees to mentor someone else. Now I am mentoring one person plus
another by proxy, which is great because now in addition to technical
skills like searching and finding and offering patches and reviewing,
the person I'm mentoring learns the mentoring skills to be able to pass
on what they learn. For some reason I don't seem to make much headway (a
little but not much) in getting any traction in the second layer of
mentoring. For whatever reason, it just doesn't work and I am having to
teach everything all over from scratch one at a time to people. This is
not what I am used to and is really exhausting.

I wish I had answers but I don't. I don't know why this structure
doesn't pick up and scale out, but it doesn't.

Perhaps you might figure it out, Sean. I don't know.

Thanks,
Anita.

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all] gate debugging

2014-08-27 Thread Doug Hellmann

On Aug 27, 2014, at 5:27 PM, Doug Hellmann d...@doughellmann.com wrote:

 
 On Aug 27, 2014, at 2:54 PM, Sean Dague s...@dague.net wrote:
 
 Note: thread intentionally broken, this is really a different topic.
 
 On 08/27/2014 02:30 PM, Doug Hellmann wrote:
 On Aug 27, 2014, at 1:30 PM, Chris Dent chd...@redhat.com wrote:
 
 On Wed, 27 Aug 2014, Doug Hellmann wrote:
 
 I have found it immensely helpful, for example, to have a written set
 of the steps involved in creating a new library, from importing the
 git repo all the way through to making it available to other projects.
 Without those instructions, it would have been much harder to split up
 the work. The team would have had to train each other by word of
 mouth, and we would have had constant issues with inconsistent
 approaches triggering different failures. The time we spent building
 and verifying the instructions has paid off to the extent that we even
 had one developer not on the core team handle a graduation for us.
 
 +many more for the relatively simple act of just writing stuff down
 
 “Write it down.” is my theme for Kilo.
 
 I definitely get the sentiment. Write it down is also hard when you
 are talking about things that do change around quite a bit. OpenStack as
 a whole sees 250 - 500 changes a week, so the interaction pattern moves
 around enough that it's really easy to have *very* stale information
 written down. Stale information is even more dangerous than no
 information sometimes, as it takes people down very wrong paths.
 
 I think we break down on communication when we get into a conversation
 of I want to learn gate debugging because I don't quite know what that
 means, or where the starting point of understanding is. So those
 intentions are well meaning, but tend to stall. The reality was there
 was no road map for those of us that dive in, it's just understanding
 how OpenStack holds together as a whole and where some of the high risk
 parts are. And a lot of that comes with days staring at code and logs
 until patterns emerge.
 
 Maybe if we can get smaller more targeted questions, we can help folks
 better? I'm personally a big fan of answering the targeted questions
 because then I also know that the time spent exposing that information
 was directly useful.
 
 I'm more than happy to mentor folks. But I just end up finding the I
 want to learn at the generic level something that's hard to grasp onto
 or figure out how we turn it into action. I'd love to hear more ideas
 from folks about ways we might do that better.
 
 You and a few others have developed an expertise in this important skill. I 
 am so far away from that level of expertise that I don’t know the questions 
 to ask. More often than not I start with the console log, find something that 
 looks significant, spend an hour or so tracking it down, and then have 
 someone tell me that it is a red herring and the issue is really some other 
 thing that they figured out very quickly by looking at a file I never got to.
 
 I guess what I’m looking for is some help with the patterns. What made you 
 think to look in one log file versus another? Some of these jobs save a 
 zillion little files, which ones are actually useful? What tools are you 
 using to correlate log entries across all of those files? Are you doing it by 
 hand? Is logstash useful for that, or is that more useful for finding 
 multiple occurrences of the same issue?
 
 I realize there’s not a way to write a how-to that will live forever. Maybe 
 one way to deal with that is to write up the research done on bugs soon after 
 they are solved, and publish that to the mailing list. Even the retrospective 
 view is useful because we can all learn from it without having to live 
 through it. The mailing list is a fairly ephemeral medium, and something very 
 old in the archives is understood to have a good chance of being out of date 
 so we don’t have to keep adding disclaimers.

Matt’s blog post [1] is an example of the sort of thing I think would be 
helpful. Obviously one post isn’t going to make the reader an expert, but over 
time a few of these will impart some useful knowledge.

Doug

[1] 
http://blog.kortar.org/?p=52draftsforfriends=cTT3WsXqsH66eEt6uoi9rQaL2vGc8Vde

 
 Doug
 
 
  -Sean
 
 -- 
 Sean Dague
 http://dague.net
 
 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
 
 
 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all] gate debugging

2014-08-27 Thread Sean Dague
On 08/27/2014 05:27 PM, Doug Hellmann wrote:
 
 On Aug 27, 2014, at 2:54 PM, Sean Dague s...@dague.net wrote:
 
 Note: thread intentionally broken, this is really a different topic.

 On 08/27/2014 02:30 PM, Doug Hellmann wrote:
 On Aug 27, 2014, at 1:30 PM, Chris Dent chd...@redhat.com wrote:

 On Wed, 27 Aug 2014, Doug Hellmann wrote:

 I have found it immensely helpful, for example, to have a written set
 of the steps involved in creating a new library, from importing the
 git repo all the way through to making it available to other projects.
 Without those instructions, it would have been much harder to split up
 the work. The team would have had to train each other by word of
 mouth, and we would have had constant issues with inconsistent
 approaches triggering different failures. The time we spent building
 and verifying the instructions has paid off to the extent that we even
 had one developer not on the core team handle a graduation for us.

 +many more for the relatively simple act of just writing stuff down

 “Write it down.” is my theme for Kilo.

 I definitely get the sentiment. Write it down is also hard when you
 are talking about things that do change around quite a bit. OpenStack as
 a whole sees 250 - 500 changes a week, so the interaction pattern moves
 around enough that it's really easy to have *very* stale information
 written down. Stale information is even more dangerous than no
 information sometimes, as it takes people down very wrong paths.

 I think we break down on communication when we get into a conversation
 of I want to learn gate debugging because I don't quite know what that
 means, or where the starting point of understanding is. So those
 intentions are well meaning, but tend to stall. The reality was there
 was no road map for those of us that dive in, it's just understanding
 how OpenStack holds together as a whole and where some of the high risk
 parts are. And a lot of that comes with days staring at code and logs
 until patterns emerge.

 Maybe if we can get smaller more targeted questions, we can help folks
 better? I'm personally a big fan of answering the targeted questions
 because then I also know that the time spent exposing that information
 was directly useful.

 I'm more than happy to mentor folks. But I just end up finding the I
 want to learn at the generic level something that's hard to grasp onto
 or figure out how we turn it into action. I'd love to hear more ideas
 from folks about ways we might do that better.
 
 You and a few others have developed an expertise in this important skill. I 
 am so far away from that level of expertise that I don’t know the questions 
 to ask. More often than not I start with the console log, find something that 
 looks significant, spend an hour or so tracking it down, and then have 
 someone tell me that it is a red herring and the issue is really some other 
 thing that they figured out very quickly by looking at a file I never got to.
 
 I guess what I’m looking for is some help with the patterns. What made you 
 think to look in one log file versus another? Some of these jobs save a 
 zillion little files, which ones are actually useful? What tools are you 
 using to correlate log entries across all of those files? Are you doing it by 
 hand? Is logstash useful for that, or is that more useful for finding 
 multiple occurrences of the same issue?
 
 I realize there’s not a way to write a how-to that will live forever. Maybe 
 one way to deal with that is to write up the research done on bugs soon after 
 they are solved, and publish that to the mailing list. Even the retrospective 
 view is useful because we can all learn from it without having to live 
 through it. The mailing list is a fairly ephemeral medium, and something very 
 old in the archives is understood to have a good chance of being out of date 
 so we don’t have to keep adding disclaimers.

Sure. Matt's actually working up a blog post describing the thing he
nailed earlier in the week.

Here is my off the cuff set of guidelines:

#1 - is it a test failure or a setup failure

This should be pretty easy to figure out. Test failures come at the end
of the console log and say that tests failed (after you see a bunch of
passing tempest tests).

Always start at *the end* of files and work backwards.
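
As a rough sketch of that first pass (the file name and the strings
matched here are assumptions about a typical devstack-gate console log,
not a stable interface):

# Rough triage sketch: look at the tail of a downloaded console log and
# guess whether the job died in setup or actually ran tests that failed.
def classify_console(path="console.html", tail_lines=300):
    with open(path, errors="replace") as f:
        tail = "".join(f.readlines()[-tail_lines:])
    if "FAILED" in tail and "tempest" in tail:
        return "test failure - find the unsuccessful API call next"
    if "ERROR" in tail or "Traceback" in tail:
        return "setup failure - devstack or the job broke before the tests"
    return "unclear - keep reading backwards from the end by hand"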

#2 - if it's a test failure, what API call was unsuccessful.

Start by looking at the API logs for the service at the top level, and
see if there is a simple traceback at the right timestamp. If not,
figure out what that API call was calling out to, again look at the
simple cases assuming failures will create ERRORS or TRACES (though they
often don't).
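
A crude way to make that pass over the service logs, assuming the
screen-*.txt files have been downloaded locally and that the interesting
lines start with the usual timestamp prefix (both assumptions, not a
stable format):

import glob

# Crude sketch: list ERROR / Traceback lines in the service logs that share
# a timestamp prefix with the failing API call, e.g. "2014-08-27 21:14".
def errors_near(timestamp_prefix, pattern="logs/screen-*.txt"):
    hits = []
    for name in sorted(glob.glob(pattern)):
        with open(name, errors="replace") as f:
            for line in f:
                if line.startswith(timestamp_prefix) and (
                        "ERROR" in line or "Traceback" in line):
                    hits.append((name, line.rstrip()))
    return hits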

Hints on the service log order you should go after are in the footer of
every log page -
http://logs.openstack.org/76/79776/15/gate/gate-tempest-dsvm-full/700ee7e/logs/
(it's included as an Apache footer) for some services. It's been there
for about 18 months; I think people are fully blind to it at this point.
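
One thing that helps when following a single API call from one service
into another is the request id (req-<uuid>) that most services put in
their log lines; a minimal sketch, again assuming locally downloaded
logs:

import glob

# Minimal sketch: print every line mentioning a given request id across all
# downloaded service logs, to follow one call end to end.
def trace_request(req_id, pattern="logs/screen-*.txt"):
    for name in sorted(glob.glob(pattern)):
        with open(name, errors="replace") as f:
            for line in f:
                if req_id in line:
                    print("%s: %s" % (name, line.rstrip()))

# e.g. trace_request("req-...")  # id copied from the failing API log line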

Re: [openstack-dev] [all] gate debugging

2014-08-27 Thread Matthew Treinish
On Wed, Aug 27, 2014 at 05:47:09PM -0400, Doug Hellmann wrote:
 
 On Aug 27, 2014, at 5:27 PM, Doug Hellmann d...@doughellmann.com wrote:
 
  
  On Aug 27, 2014, at 2:54 PM, Sean Dague s...@dague.net wrote:
  
  Note: thread intentionally broken, this is really a different topic.
  
  On 08/27/2014 02:30 PM, Doug Hellmann wrote:
  On Aug 27, 2014, at 1:30 PM, Chris Dent chd...@redhat.com wrote:
  
  On Wed, 27 Aug 2014, Doug Hellmann wrote:
  
  I have found it immensely helpful, for example, to have a written set
  of the steps involved in creating a new library, from importing the
  git repo all the way through to making it available to other projects.
  Without those instructions, it would have been much harder to split up
  the work. The team would have had to train each other by word of
  mouth, and we would have had constant issues with inconsistent
  approaches triggering different failures. The time we spent building
  and verifying the instructions has paid off to the extent that we even
  had one developer not on the core team handle a graduation for us.
  
  +many more for the relatively simple act of just writing stuff down
  
  “Write it down.” is my theme for Kilo.
  
  I definitely get the sentiment. Write it down is also hard when you
  are talking about things that do change around quite a bit. OpenStack as
  a whole sees 250 - 500 changes a week, so the interaction pattern moves
  around enough that it's really easy to have *very* stale information
  written down. Stale information is even more dangerous than no
  information sometimes, as it takes people down very wrong paths.
  
  I think we break down on communication when we get into a conversation
  of I want to learn gate debugging because I don't quite know what that
  means, or where the starting point of understanding is. So those
  intentions are well meaning, but tend to stall. The reality was there
  was no road map for those of us that dive in, it's just understanding
  how OpenStack holds together as a whole and where some of the high risk
  parts are. And a lot of that comes with days staring at code and logs
  until patterns emerge.
  
  Maybe if we can get smaller more targeted questions, we can help folks
  better? I'm personally a big fan of answering the targeted questions
  because then I also know that the time spent exposing that information
  was directly useful.
  
  I'm more than happy to mentor folks. But I just end up finding the I
  want to learn at the generic level something that's hard to grasp onto
  or figure out how we turn it into action. I'd love to hear more ideas
  from folks about ways we might do that better.
  
  You and a few others have developed an expertise in this important skill. I 
  am so far away from that level of expertise that I don’t know the questions 
  to ask. More often than not I start with the console log, find something 
  that looks significant, spend an hour or so tracking it down, and then have 
  someone tell me that it is a red herring and the issue is really some other 
  thing that they figured out very quickly by looking at a file I never got 
  to.
  
  I guess what I’m looking for is some help with the patterns. What made you 
  think to look in one log file versus another? Some of these jobs save a 
  zillion little files, which ones are actually useful? What tools are you 
  using to correlate log entries across all of those files? Are you doing it 
  by hand? Is logstash useful for that, or is that more useful for finding 
  multiple occurrences of the same issue?
  
  I realize there’s not a way to write a how-to that will live forever. Maybe 
  one way to deal with that is to write up the research done on bugs soon 
  after they are solved, and publish that to the mailing list. Even the 
  retrospective view is useful because we can all learn from it without 
  having to live through it. The mailing list is a fairly ephemeral medium, 
  and something very old in the archives is understood to have a good chance 
  of being out of date so we don’t have to keep adding disclaimers.
 
 Matt’s blog post [1] is an example of the sort of thing I think would be 
 helpful. Obviously one post isn’t going to make the reader an expert, but 
 over time a few of these will impart some useful knowledge.
 
 Doug
 
 [1] 
 http://blog.kortar.org/?p=52draftsforfriends=cTT3WsXqsH66eEt6uoi9rQaL2vGc8Vde

So that was just an expiring link (which shouldn't be valid anymore) to the
draft which I generated to get some initial feedback before I posted it. The
permanent link to the post is here:

http://blog.kortar.org/?p=52


-Matt Treinish


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all] gate debugging

2014-08-27 Thread Salvatore Orlando
As has been pointed out previously in this thread, debugging gate
failures is mostly about chasing race conditions, which in some cases
involve the most disparate interactions between OpenStack services [1].

Finding the root cause of these races is a mix of knowledge, pragmatism,
and luck. Having more people looking at gate failures can only be a good
thing.
While little can be done to transfer luck, good things can be written
regarding pragmatism and knowledge.

Knowledge is about knowing the tools, the infrastructure, and ultimately
the dynamics of the stuff that's being tested. This involves understanding
the zuul layout, devstack-gate, tempest, and most importantly logstash (in
my opinion). Unfortunately it's difficult to do this without being
sufficiently expert in the matter being tested.
For instance debugging a SSH failure with neutron involves knowledge of the
internals of neutron's l3 agent, ovs agent, metadata agent, the
nova/neutron interface, the nova/neutron notification system, nova's
network info instance cache and so on.
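
For the logstash part, most of the leverage is in knowing a few of the
indexed fields; a sketch of the kind of query string people paste into
logstash.openstack.org (message, tags and build_status are the commonly
used fields, but treat the exact field names and the error text here as
assumptions):

# Sketch: compose a Lucene-style query string to see how often a given
# error signature shows up in failed jobs, and in which service log.
signature = "Timed out waiting for thing to become ACTIVE"  # illustrative text
query = (
    'message:"%s" '
    'AND tags:"screen-n-cpu.txt" '
    'AND build_status:"FAILURE"' % signature
)
print(query)  # paste into the logstash search box, or reuse in an elastic-recheck query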

Pragmatism is about writing down and sharing the process followed for
triaging gate failures, especially when it comes to analysing OpenStack's
logs. Different people might be using different processes, and sharing them
can only be good.

To this aim the Neutron community has tried to put these things in writing
in this unfinished effort [2]. Hopefully there can be a wiki page (or a
set of pages) like this, not limited to neutron only but covering the whole
set of projects tested in the integrated gate.

This effort can also constitute a basis for improving the process. As an
example, event correlation in logs, and the ability to validate hypotheses
by correlating traces of a failure's manifestation with potential traces of
its root cause, are two areas with room for improvement.

Salvatore

[1] https://bugs.launchpad.net/neutron/+bug/1273386
[2] https://wiki.openstack.org/wiki/NeutronGateFailureTriage


On 28 August 2014 00:32, Matthew Treinish mtrein...@kortar.org wrote:

 On Wed, Aug 27, 2014 at 05:47:09PM -0400, Doug Hellmann wrote:
 
  On Aug 27, 2014, at 5:27 PM, Doug Hellmann d...@doughellmann.com
 wrote:
 
  
   On Aug 27, 2014, at 2:54 PM, Sean Dague s...@dague.net wrote:
  
   Note: thread intentionally broken, this is really a different topic.
  
   On 08/27/2014 02:30 PM, Doug Hellmann wrote:
   On Aug 27, 2014, at 1:30 PM, Chris Dent chd...@redhat.com wrote:
  
   On Wed, 27 Aug 2014, Doug Hellmann wrote:
  
   I have found it immensely helpful, for example, to have a written
 set
   of the steps involved in creating a new library, from importing the
   git repo all the way through to making it available to other
 projects.
   Without those instructions, it would have been much harder to
 split up
   the work. The team would have had to train each other by word of
   mouth, and we would have had constant issues with inconsistent
   approaches triggering different failures. The time we spent
 building
   and verifying the instructions has paid off to the extent that we
 even
   had one developer not on the core team handle a graduation for us.
  
   +many more for the relatively simple act of just writing stuff down
  
   “Write it down.” is my theme for Kilo.
  
   I definitely get the sentiment. Write it down is also hard when you
   are talking about things that do change around quite a bit. OpenStack
 as
   a whole sees 250 - 500 changes a week, so the interaction pattern
 moves
   around enough that it's really easy to have *very* stale information
   written down. Stale information is even more dangerous than no
   information sometimes, as it takes people down very wrong paths.
  
   I think we break down on communication when we get into a conversation
   of I want to learn gate debugging because I don't quite know what
 that
   means, or where the starting point of understanding is. So those
   intentions are well meaning, but tend to stall. The reality was there
   was no road map for those of us that dive in, it's just understanding
   how OpenStack holds together as a whole and where some of the high
 risk
   parts are. And a lot of that comes with days staring at code and logs
   until patterns emerge.
  
   Maybe if we can get smaller more targeted questions, we can help folks
   better? I'm personally a big fan of answering the targeted questions
   because then I also know that the time spent exposing that information
   was directly useful.
  
   I'm more than happy to mentor folks. But I just end up finding the I
   want to learn at the generic level something that's hard to grasp
 onto
   or figure out how we turn it into action. I'd love to hear more ideas
   from folks about ways we might do that better.
  
   You and a few others have developed an expertise in this important
 skill. I am so far away from that level of expertise that I don’t know the
 questions to ask. More often than not I start with the console log, find
 something that looks