Re: [vdsm] flowID schema

2012-02-14 Thread Dan Kenigsberg
On Mon, Feb 13, 2012 at 12:06:46PM -0500, Ayal Baron wrote:
 
 
 - Original Message -
  On 02/13/2012 02:28 PM, Ayal Baron wrote:
  
  ...
   is that it (ab)uses an http header for carrying FlowID,
   Yes, it certainly does appear to overload it.  It would be nice to
   have
   something formal given to it by engine, but I can appreciate the
   difficulty implementing such a scheme.
  
   Technically I disagree, this is a cross cutting concern which has
   nothing to do with any specific call hence it should be passed as
   a header, that is actually rather elegant.
  
   To the specific matter at hand, though: what would really be nice is
   solving the real problem properly, and not contaminating the API
   and the log with things which have marginal benefit if at all.
  
  going back to the 'grep' issue.
  vdsm logs are verbose. they are multi-threaded as well.
  I think this should be more than just about finding the entry point
  of
  the flow, then identifying for this specific log format how to trace
  it,
  which would require writing a log analyzer with plugins for each
  component.
  having all lines which are relevant to a flow with a flowid logged in
  them would make it much easier to get all (or most) of relevant parts
  of
  the flow (most, since something orthogonal to the flow may have
  happened
  affecting it, like loss of network)
  
 I'm sorry but what you're proposing is to make the log even more difficult to 
 read for absolutely NO reason.
 I haven't seen 1 good reason to add more to the log.
 What we should be focusing on is:
 1. adding the relevant data that is needed to the engine log so that most of 
 the time users wouldn't need to go to the host
 2. reducing the verbosity of the vdsm log and increasing readability (the 
 flow ID does exactly the opposite).
 
 As opposed to most people here who are thinking that this sounds like a good 
 idea, I actually have debugged at least dozens of issues in engine and vdsm 
 and can assure you that not once would this have been beneficial to me.

When I debug an Engine-related issue, I tend to find a silly API call in
Vdsm. Then I have to start correlating this to Engine logs. This step
can be made quicker and less error-prone by logging FlowID in both Engine
and Vdsm. To me, this is the 1 good reason for logging FlowID on API
entry in Vdsm.

However, I find adding FlowID to each and every log line a bit
excessive. Our log is too cluttered as it is. Logging FlowID whenever a
new thread is spawned makes more sense to me.
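
To make the suggestion above concrete, here is a minimal sketch of logging
a FlowID once on API entry and once whenever a worker thread is spawned,
rather than on every log line. Every name in it (set_flow_id, get_flow_id,
api_entry, spawn, the "api" logger) is hypothetical and chosen for
illustration; none of it is taken from the Vdsm code base, and how the
FlowID actually reaches the host is left out.

```python
import logging
import threading

# Hypothetical helpers for illustration only; not part of Vdsm.
_flow = threading.local()
log = logging.getLogger("api")


def set_flow_id(flow_id):
    """Remember the flow ID for the current thread."""
    _flow.flow_id = flow_id


def get_flow_id():
    """Return the current thread's flow ID, or None if none was set."""
    return getattr(_flow, "flow_id", None)


def api_entry(verb, flow_id):
    """Log the flow ID once, at the point where the call enters the host."""
    set_flow_id(flow_id)
    log.info("call %s flowID=%s", verb, flow_id)


def spawn(target, *args):
    """Start a worker thread and log which flow it belongs to."""
    flow_id = get_flow_id()  # captured in the parent thread

    def run():
        # Thread-local storage is empty in the new thread, so copy it over.
        set_flow_id(flow_id)
        log.info("thread %s started for flowID=%s",
                 threading.current_thread().name, flow_id)
        target(*args)

    worker = threading.Thread(target=run)
    worker.start()
    return worker
```

With this shape, the existing thread name on each log line is enough to
tie later lines back to the flow, so the log format itself does not grow.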

 What was mostly missing in the engine logs was understanding what thread in 
 engine called what operation in vdsm and what vdsm's response was.  In 3.0 my 
 understanding is that engine fixed this so this entire feature will be 
 counter productive (will make logs less readable and harder to decipher, adds 
 complexity to the API and adds complexity to users of the rest API).
 All cross hosts issues stem from *different* flows, so this would not help in 
 this case and single host issues are easily traceable today (and you *never* 
 need to follow an entire flow, it's entirely redundant and inefficient).
 
 I'm more than willing to show this on any set of logs by the way and would be 
 happy to be proven wrong.
 
 More often than not by the way, the issue is that inside a specific call 
 (i.e. 1 verb, not a flow) people are not proficient in finding the offending 
 line (which is why I wrote the 'how to read the vdsm log' wiki).
___
vdsm-devel mailing list
vdsm-devel@lists.fedorahosted.org
https://fedorahosted.org/mailman/listinfo/vdsm-devel


Re: [vdsm] vdsm hangs in SamplingMethod after reinstall

2012-02-14 Thread Adam Litke
On Sun, Feb 12, 2012 at 06:46:25PM -0500, Ayal Baron wrote:
 
 
 - Original Message -
  On Thu, Feb 09, 2012 at 07:15:48PM -0500, Ayal Baron wrote:
   
   
   - Original Message -
Hi.  I am running into a very annoying problem when working on
vdsm
lately.  My
development process involves stopping vdsm, replacing files, and
restarting it.
I do this pretty frequently.  Sometimes, after restarting vdsm
the
XMLRPC call
getStorageDomainsList() hangs.  The following line is the last to
 
 Can you post the exact flow you're running?

Still working on this.  It isn't reproducing reliably -- only when I really need
to get some work done :)

 
print in the
log:

Thread-18::DEBUG::2012-02-09
17:11:46,793::misc::1017::SamplingMethod::(__call__) Trying to
enter
sampling method (storage.sdc.refreshStorage)

The only solution I've been able to come up with is restarting my
machine.  When
stopping vdsm I search for any stale threads but I am unable to
find
them.  Do
you know what else might be causing DynamicBarrier.enter() to
hang
for a long
period of time?  Do the threading primitives use some sort of
temporary disk
storage that needs to be cleaned up?  Thanks for the help!
   
   Try to add some logging in sdc.py:
   def refreshStorage(self):
ADD LOG HERE
  
  Yep have done this and I am not even getting into the refreshStorage
  function.
  We actually hang in DynamicBarrier.enter().  I am going to add some
  debugging to
  determine which locking operation gets stuck.
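
One generic way to see where a thread is stuck, independent of how
DynamicBarrier is written, is to dump the stack of every live thread. The
sketch below uses only the standard library; the function name
format_thread_stacks is made up for this example, and sys._current_frames()
is a CPython facility, so this assumes the stock interpreter.

```python
import sys
import threading
import traceback


def format_thread_stacks():
    """Return a text dump of the current stack of every live thread."""
    # Map thread identifiers to names so the dump is easier to read.
    names = dict((t.ident, t.name) for t in threading.enumerate())
    lines = []
    for ident, frame in sys._current_frames().items():
        lines.append("Thread %s (%s):\n" % (names.get(ident, "?"), ident))
        lines.extend(traceback.format_stack(frame))
    return "".join(lines)
```

Calling this from a watchdog thread or a signal handler and writing the
result to the log would show the exact line each thread is blocked on.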
 
 On the face of it it sounds like a python bug.
 Is supervdsm running? did you try killing it as well?
 Are you sure there is no 'Got in to sampling method' line in the log?
 Have you tried adding logging in 'enter' to see at what stage exactly you get 
 stuck?
 
 (side note - code should probably be updated with 'with' as it was originally 
 written for use with python 2.4)
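
As an illustration of both suggestions above (logging each stage of
'enter', and using 'with'), here is a sketch of a lock wrapper that logs
before and after acquiring and always releases on exit. LoggedLock and the
"barrier" logger are hypothetical names; this is not the actual
DynamicBarrier code, which is not shown in this thread.

```python
import logging
import threading

log = logging.getLogger("barrier")


class LoggedLock(object):
    """Hypothetical wrapper that logs each stage of taking a lock."""

    def __init__(self, name):
        self._name = name
        self._lock = threading.Lock()

    def __enter__(self):
        log.debug("waiting for %s", self._name)
        self._lock.acquire()
        log.debug("acquired %s", self._name)
        return self

    def __exit__(self, exc_type, exc_value, tb):
        # Runs even if the body raised, so the lock is never left held.
        self._lock.release()
        log.debug("released %s", self._name)
        return False  # do not swallow exceptions

    def locked(self):
        return self._lock.locked()
```

If the last line in the log is "waiting for X" with no matching
"acquired X", the hang is in that acquire.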
 
 
  
   multipath.rescan()
   
   I have a feeling that your issue is not with SamplingMethod
   

 

-- 
Adam Litke a...@us.ibm.com
IBM Linux Technology Center



Re: [vdsm] vdsm hangs in SamplingMethod after reinstall

2012-02-14 Thread Ayal Baron


- Original Message -
 On Sun, Feb 12, 2012 at 06:46:25PM -0500, Ayal Baron wrote:
  
  
  - Original Message -
   On Thu, Feb 09, 2012 at 07:15:48PM -0500, Ayal Baron wrote:


- Original Message -
 Hi.  I am running into a very annoying problem when working
 on
 vdsm
 lately.  My
 development process involves stopping vdsm, replacing files,
 and
 restarting it.
 I do this pretty frequently.  Sometimes, after restarting
 vdsm
 the
 XMLRPC call
 getStorageDomainsList() hangs.  The following line is the
 last to
  
  Can you post the exact flow you're running?
 
 Still working on this.  It isn't reproducing reliably -- only when I
 really need
 to get some work done :)

So try finalizing those MOM patches and you should see this in no time ;)

 
  
 print in the
 log:
 
 Thread-18::DEBUG::2012-02-09
 17:11:46,793::misc::1017::SamplingMethod::(__call__) Trying
 to
 enter
 sampling method (storage.sdc.refreshStorage)
 
 The only solution I've been able to come up with is
 restarting my
 machine.  When
 stopping vdsm I search for any stale threads but I am unable
 to
 find
 them.  Do
 you know what else might be causing DynamicBarrier.enter() to
 hang
 for a long
 period of time?  Do the threading primitives use some sort of
 temporary disk
 storage that needs to be cleaned up?  Thanks for the help!

Try to add some logging in sdc.py:
def refreshStorage(self):
 ADD LOG HERE
   
   Yep have done this and I am not even getting into the
   refreshStorage
   function.
   We actually hang in DynamicBarrier.enter().  I am going to add
   some
   debugging to
   determine which locking operation gets stuck.
  
  On the face of it it sounds like a python bug.
  Is supervdsm running? did you try killing it as well?
  Are you sure there is no 'Got in to sampling method' line in the
  log?
  Have you tried adding logging in 'enter' to see at what stage
  exactly you get stuck?
  
  (side note - code should probably be updated with 'with' as it was
  originally written for use with python 2.4)
  
  
   
multipath.rescan()

I have a feeling that your issue is not with SamplingMethod

 