Re: [lustre-devel] [PATCH 1/6] staging: lustre: move stack-check macros to libcfs_debug.h

2018-04-17 Thread NeilBrown
On Mon, Apr 16 2018, James Simmons wrote:

>> James,
>> 
>> If I understand correctly, you're saying you want to be able to build 
>> without debug support...?  I'm not convinced that building a client without 
>> debug support is interesting or useful.  In fact, I think it would be 
>> harmful, and we shouldn't open up the possibility - this is switchable debug 
>> with very low overhead when not actually "on".  It would be really awful to 
>> get a problem on a running system and discover there's no debug support - 
>> that you can't even enable debug without a reinstall.
>> 
>> If I've understood you correctly, then I would want to see proof of a 
>> significant performance cost when debug is built but *off* before agreeing 
>> to even exposing this option.  (I know it's a choice they'd have to make, 
>> but if it's not really useful with a side order of potentially harmful, we 
>> shouldn't even give people the choice.)
>
> I'm not saying add the option today but this is more for the long game.
> While the Intel lustre developers deeply love lustre's debugging 
> infrastructure I see a future where something better will come along to
> replace it. When that day comes we will have a period where both
> debugging infrastructurs will exist and some deployers of lustre will
> want to turn off the old debugging infrastructure and just use the new.
> That is what I have in mind. A switch to flip between options.

My position on this is that lustre's debugging infrastructure (in
mainline) *will* be changed to use something that the rest of the kernel
can and does use.  Quite possibly that "something" will first be
enhanced so that it is as powerful and useful as what lustre has.
I suspect this will partly be pr_debug(), partly WARN_ON(), partly trace
points.  But I'm not very familiar with tracepoints or with lustre
debugging yet so this is far from certain.
pr_debug() and tracepoints can be compiled out, but only kernel-wide.
There is no reason for lustre to be special there.  WARN_ON() and
BUG_ON() cannot be compiled out, but BUG_ON() must only be used when
proceeding is unarguably worse than crashing the machine.  In recent
years a lot of BUG_ON()s have been removed or changed to warnings.  We
need to maintain that attitude.

I don't like the idea of have two parallel debuging infrastructures that
you can choose between - it encourages confusion and brings no benefits.

Thanks,
NeilBrown


signature.asc
Description: PGP signature


Re: [lustre-devel] [PATCH 1/6] staging: lustre: move stack-check macros to libcfs_debug.h

2018-04-17 Thread NeilBrown
On Mon, Apr 16 2018, James Simmons wrote:

>> James,
>> 
>> If I understand correctly, you're saying you want to be able to build 
>> without debug support...?  I'm not convinced that building a client without 
>> debug support is interesting or useful.  In fact, I think it would be 
>> harmful, and we shouldn't open up the possibility - this is switchable debug 
>> with very low overhead when not actually "on".  It would be really awful to 
>> get a problem on a running system and discover there's no debug support - 
>> that you can't even enable debug without a reinstall.
>> 
>> If I've understood you correctly, then I would want to see proof of a 
>> significant performance cost when debug is built but *off* before agreeing 
>> to even exposing this option.  (I know it's a choice they'd have to make, 
>> but if it's not really useful with a side order of potentially harmful, we 
>> shouldn't even give people the choice.)
>
> I'm not saying add the option today but this is more for the long game.
> While the Intel lustre developers deeply love lustre's debugging 
> infrastructure I see a future where something better will come along to
> replace it. When that day comes we will have a period where both
> debugging infrastructurs will exist and some deployers of lustre will
> want to turn off the old debugging infrastructure and just use the new.
> That is what I have in mind. A switch to flip between options.

My position on this is that lustre's debugging infrastructure (in
mainline) *will* be changed to use something that the rest of the kernel
can and does use.  Quite possibly that "something" will first be
enhanced so that it is as powerful and useful as what lustre has.
I suspect this will partly be pr_debug(), partly WARN_ON(), partly trace
points.  But I'm not very familiar with tracepoints or with lustre
debugging yet so this is far from certain.
pr_debug() and tracepoints can be compiled out, but only kernel-wide.
There is no reason for lustre to be special there.  WARN_ON() and
BUG_ON() cannot be compiled out, but BUG_ON() must only be used when
proceeding is unarguably worse than crashing the machine.  In recent
years a lot of BUG_ON()s have been removed or changed to warnings.  We
need to maintain that attitude.

I don't like the idea of have two parallel debuging infrastructures that
you can choose between - it encourages confusion and brings no benefits.

Thanks,
NeilBrown


signature.asc
Description: PGP signature


Re: [lustre-devel] [PATCH 1/6] staging: lustre: move stack-check macros to libcfs_debug.h

2018-04-16 Thread Dilger, Andreas
On Apr 16, 2018, at 16:48, Doug Oucharek  wrote:
> 
>> 
>> On Apr 16, 2018, at 3:42 PM, James Simmons  wrote:
>> 
>> 
>>> James,
>>> 
>>> If I understand correctly, you're saying you want to be able to build 
>>> without debug support...?  I'm not convinced that building a client without 
>>> debug support is interesting or useful.  In fact, I think it would be 
>>> harmful, and we shouldn't open up the possibility - this is switchable 
>>> debug with very low overhead when not actually "on".  It would be really 
>>> awful to get a problem on a running system and discover there's no debug 
>>> support - that you can't even enable debug without a reinstall.
>>> 
>>> If I've understood you correctly, then I would want to see proof of a 
>>> significant performance cost when debug is built but *off* before agreeing 
>>> to even exposing this option.  (I know it's a choice they'd have to make, 
>>> but if it's not really useful with a side order of potentially harmful, we 
>>> shouldn't even give people the choice.)
>> 
>> I'm not saying add the option today but this is more for the long game.
>> While the Intel lustre developers deeply love lustre's debugging 
>> infrastructure I see a future where something better will come along to
>> replace it. When that day comes we will have a period where both
>> debugging infrastructurs will exist and some deployers of lustre will
>> want to turn off the old debugging infrastructure and just use the new.
>> That is what I have in mind. A switch to flip between options.
> 
> Yes please!!  An option for users which says “no, you do not have the right 
> to panic my system via LASSERT whenever you like” would be a blessing.

Note that LASSERT() itself does not panic the system, unless you configure it
with panic_on_lbug=1.  Otherwise, it just blocks that thread (though this can
also have an impact on other threads if you are holding locks at that time).

That said, the LASSERT() should not be hit unless there is bad code, data
corruption, or the LASSERT() itself is incorrect (essentially bad code also).

So "whenever you like" is "whenever the system is about to corrupt your data",
and people are not very forgiving if a filesystem corrupts their data...

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation









Re: [lustre-devel] [PATCH 1/6] staging: lustre: move stack-check macros to libcfs_debug.h

2018-04-16 Thread Dilger, Andreas
On Apr 16, 2018, at 16:48, Doug Oucharek  wrote:
> 
>> 
>> On Apr 16, 2018, at 3:42 PM, James Simmons  wrote:
>> 
>> 
>>> James,
>>> 
>>> If I understand correctly, you're saying you want to be able to build 
>>> without debug support...?  I'm not convinced that building a client without 
>>> debug support is interesting or useful.  In fact, I think it would be 
>>> harmful, and we shouldn't open up the possibility - this is switchable 
>>> debug with very low overhead when not actually "on".  It would be really 
>>> awful to get a problem on a running system and discover there's no debug 
>>> support - that you can't even enable debug without a reinstall.
>>> 
>>> If I've understood you correctly, then I would want to see proof of a 
>>> significant performance cost when debug is built but *off* before agreeing 
>>> to even exposing this option.  (I know it's a choice they'd have to make, 
>>> but if it's not really useful with a side order of potentially harmful, we 
>>> shouldn't even give people the choice.)
>> 
>> I'm not saying add the option today but this is more for the long game.
>> While the Intel lustre developers deeply love lustre's debugging 
>> infrastructure I see a future where something better will come along to
>> replace it. When that day comes we will have a period where both
>> debugging infrastructurs will exist and some deployers of lustre will
>> want to turn off the old debugging infrastructure and just use the new.
>> That is what I have in mind. A switch to flip between options.
> 
> Yes please!!  An option for users which says “no, you do not have the right 
> to panic my system via LASSERT whenever you like” would be a blessing.

Note that LASSERT() itself does not panic the system, unless you configure it
with panic_on_lbug=1.  Otherwise, it just blocks that thread (though this can
also have an impact on other threads if you are holding locks at that time).

That said, the LASSERT() should not be hit unless there is bad code, data
corruption, or the LASSERT() itself is incorrect (essentially bad code also).

So "whenever you like" is "whenever the system is about to corrupt your data",
and people are not very forgiving if a filesystem corrupts their data...

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation









Re: [lustre-devel] [PATCH 1/6] staging: lustre: move stack-check macros to libcfs_debug.h

2018-04-16 Thread Doug Oucharek

> On Apr 16, 2018, at 3:42 PM, James Simmons  wrote:
> 
> 
>> James,
>> 
>> If I understand correctly, you're saying you want to be able to build 
>> without debug support...?  I'm not convinced that building a client without 
>> debug support is interesting or useful.  In fact, I think it would be 
>> harmful, and we shouldn't open up the possibility - this is switchable debug 
>> with very low overhead when not actually "on".  It would be really awful to 
>> get a problem on a running system and discover there's no debug support - 
>> that you can't even enable debug without a reinstall.
>> 
>> If I've understood you correctly, then I would want to see proof of a 
>> significant performance cost when debug is built but *off* before agreeing 
>> to even exposing this option.  (I know it's a choice they'd have to make, 
>> but if it's not really useful with a side order of potentially harmful, we 
>> shouldn't even give people the choice.)
> 
> I'm not saying add the option today but this is more for the long game.
> While the Intel lustre developers deeply love lustre's debugging 
> infrastructure I see a future where something better will come along to
> replace it. When that day comes we will have a period where both
> debugging infrastructurs will exist and some deployers of lustre will
> want to turn off the old debugging infrastructure and just use the new.
> That is what I have in mind. A switch to flip between options.

Yes please!!  An option for users which says “no, you do not have the right to 
panic my system via LASSERT whenever you like” would be a blessing.

Doug

> 
>> - Patrick
>> 
>> On 4/15/18, 10:49 PM, "lustre-devel on behalf of James Simmons" 
>>  
>> wrote:
>> 
>> 
>>> CDEBUG_STACK() and CHECK_STACK() are macros to help with
>>> debugging, so move them from
>>>   drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h
>>> to
>>>   drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h
>>> 
>>> This seems a more fitting location, and is a step towards
>>> removing linux/libcfs.h and simplifying the include file structure.
>> 
>>Nak. Currently the lustre client always enables debugging but that
>>shouldn't be the case. What we do need is the able to turn off the 
>>crazy debugging stuff. In the development branch of lustre it is
>>done with CDEBUG_ENABLED. We need something like that in Kconfig
>>much like we have CONFIG_LUSTRE_DEBUG_EXPENSIVE_CHECK. Since we like
>>to be able to turn that off this should be moved to just after
>>LIBCFS_DEBUG_MSG_DATA_DECL. Then from CHECK_STACK down to CWARN()
>>it can be build out. When CDEBUG_ENABLED is disabled CDEBUG_LIMIT
>>would be empty.
>> 
>>> Signed-off-by: NeilBrown 
>>> ---
>>> .../lustre/include/linux/libcfs/libcfs_debug.h |   32 
>>> 
>>> .../lustre/include/linux/libcfs/linux/libcfs.h |   31 
>>> ---
>>> 2 files changed, 32 insertions(+), 31 deletions(-)
>>> 
>>> diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h 
>>> b/drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h
>>> index 9290a19429e7..0dc7b91efe7c 100644
>>> --- a/drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h
>>> +++ b/drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h
>>> @@ -62,6 +62,38 @@ int libcfs_debug_str2mask(int *mask, const char *str, 
>>> int is_subsys);
>>> extern unsigned int libcfs_catastrophe;
>>> extern unsigned int libcfs_panic_on_lbug;
>>> 
>>> +/* Enable debug-checks on stack size - except on x86_64 */
>>> +#if !defined(__x86_64__)
>>> +# ifdef __ia64__
>>> +#  define CDEBUG_STACK() (THREAD_SIZE - \
>>> + ((unsigned long)__builtin_dwarf_cfa() &   \
>>> +  (THREAD_SIZE - 1)))
>>> +# else
>>> +#  define CDEBUG_STACK() (THREAD_SIZE - \
>>> + ((unsigned long)__builtin_frame_address(0) &  \
>>> +  (THREAD_SIZE - 1)))
>>> +# endif /* __ia64__ */
>>> +
>>> +#define __CHECK_STACK(msgdata, mask, cdls)   \
>>> +do {   \
>>> +   if (unlikely(CDEBUG_STACK() > libcfs_stack)) {\
>>> +   LIBCFS_DEBUG_MSG_DATA_INIT(msgdata, D_WARNING, NULL);   \
>>> +   libcfs_stack = CDEBUG_STACK();\
>>> +   libcfs_debug_msg(msgdata,  \
>>> +"maximum lustre stack %lu\n",\
>>> +CDEBUG_STACK());  \
>>> +   (msgdata)->msg_mask = mask;  \
>>> +   (msgdata)->msg_cdls = cdls;  \
>>> +   dump_stack();  \
>>> + /*panic("LBUG");*/  

Re: [lustre-devel] [PATCH 1/6] staging: lustre: move stack-check macros to libcfs_debug.h

2018-04-16 Thread Doug Oucharek

> On Apr 16, 2018, at 3:42 PM, James Simmons  wrote:
> 
> 
>> James,
>> 
>> If I understand correctly, you're saying you want to be able to build 
>> without debug support...?  I'm not convinced that building a client without 
>> debug support is interesting or useful.  In fact, I think it would be 
>> harmful, and we shouldn't open up the possibility - this is switchable debug 
>> with very low overhead when not actually "on".  It would be really awful to 
>> get a problem on a running system and discover there's no debug support - 
>> that you can't even enable debug without a reinstall.
>> 
>> If I've understood you correctly, then I would want to see proof of a 
>> significant performance cost when debug is built but *off* before agreeing 
>> to even exposing this option.  (I know it's a choice they'd have to make, 
>> but if it's not really useful with a side order of potentially harmful, we 
>> shouldn't even give people the choice.)
> 
> I'm not saying add the option today but this is more for the long game.
> While the Intel lustre developers deeply love lustre's debugging 
> infrastructure I see a future where something better will come along to
> replace it. When that day comes we will have a period where both
> debugging infrastructurs will exist and some deployers of lustre will
> want to turn off the old debugging infrastructure and just use the new.
> That is what I have in mind. A switch to flip between options.

Yes please!!  An option for users which says “no, you do not have the right to 
panic my system via LASSERT whenever you like” would be a blessing.

Doug

> 
>> - Patrick
>> 
>> On 4/15/18, 10:49 PM, "lustre-devel on behalf of James Simmons" 
>>  
>> wrote:
>> 
>> 
>>> CDEBUG_STACK() and CHECK_STACK() are macros to help with
>>> debugging, so move them from
>>>   drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h
>>> to
>>>   drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h
>>> 
>>> This seems a more fitting location, and is a step towards
>>> removing linux/libcfs.h and simplifying the include file structure.
>> 
>>Nak. Currently the lustre client always enables debugging but that
>>shouldn't be the case. What we do need is the able to turn off the 
>>crazy debugging stuff. In the development branch of lustre it is
>>done with CDEBUG_ENABLED. We need something like that in Kconfig
>>much like we have CONFIG_LUSTRE_DEBUG_EXPENSIVE_CHECK. Since we like
>>to be able to turn that off this should be moved to just after
>>LIBCFS_DEBUG_MSG_DATA_DECL. Then from CHECK_STACK down to CWARN()
>>it can be build out. When CDEBUG_ENABLED is disabled CDEBUG_LIMIT
>>would be empty.
>> 
>>> Signed-off-by: NeilBrown 
>>> ---
>>> .../lustre/include/linux/libcfs/libcfs_debug.h |   32 
>>> 
>>> .../lustre/include/linux/libcfs/linux/libcfs.h |   31 
>>> ---
>>> 2 files changed, 32 insertions(+), 31 deletions(-)
>>> 
>>> diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h 
>>> b/drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h
>>> index 9290a19429e7..0dc7b91efe7c 100644
>>> --- a/drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h
>>> +++ b/drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h
>>> @@ -62,6 +62,38 @@ int libcfs_debug_str2mask(int *mask, const char *str, 
>>> int is_subsys);
>>> extern unsigned int libcfs_catastrophe;
>>> extern unsigned int libcfs_panic_on_lbug;
>>> 
>>> +/* Enable debug-checks on stack size - except on x86_64 */
>>> +#if !defined(__x86_64__)
>>> +# ifdef __ia64__
>>> +#  define CDEBUG_STACK() (THREAD_SIZE - \
>>> + ((unsigned long)__builtin_dwarf_cfa() &   \
>>> +  (THREAD_SIZE - 1)))
>>> +# else
>>> +#  define CDEBUG_STACK() (THREAD_SIZE - \
>>> + ((unsigned long)__builtin_frame_address(0) &  \
>>> +  (THREAD_SIZE - 1)))
>>> +# endif /* __ia64__ */
>>> +
>>> +#define __CHECK_STACK(msgdata, mask, cdls)   \
>>> +do {   \
>>> +   if (unlikely(CDEBUG_STACK() > libcfs_stack)) {\
>>> +   LIBCFS_DEBUG_MSG_DATA_INIT(msgdata, D_WARNING, NULL);   \
>>> +   libcfs_stack = CDEBUG_STACK();\
>>> +   libcfs_debug_msg(msgdata,  \
>>> +"maximum lustre stack %lu\n",\
>>> +CDEBUG_STACK());  \
>>> +   (msgdata)->msg_mask = mask;  \
>>> +   (msgdata)->msg_cdls = cdls;  \
>>> +   dump_stack();  \
>>> + /*panic("LBUG");*/\
>>> +   }  \
>>> +} while (0)
>>> 

Re: [lustre-devel] [PATCH 1/6] staging: lustre: move stack-check macros to libcfs_debug.h

2018-04-16 Thread James Simmons

> James,
> 
> If I understand correctly, you're saying you want to be able to build without 
> debug support...?  I'm not convinced that building a client without debug 
> support is interesting or useful.  In fact, I think it would be harmful, and 
> we shouldn't open up the possibility - this is switchable debug with very low 
> overhead when not actually "on".  It would be really awful to get a problem 
> on a running system and discover there's no debug support - that you can't 
> even enable debug without a reinstall.
> 
> If I've understood you correctly, then I would want to see proof of a 
> significant performance cost when debug is built but *off* before agreeing to 
> even exposing this option.  (I know it's a choice they'd have to make, but if 
> it's not really useful with a side order of potentially harmful, we shouldn't 
> even give people the choice.)

I'm not saying add the option today but this is more for the long game.
While the Intel lustre developers deeply love lustre's debugging 
infrastructure I see a future where something better will come along to
replace it. When that day comes we will have a period where both
debugging infrastructurs will exist and some deployers of lustre will
want to turn off the old debugging infrastructure and just use the new.
That is what I have in mind. A switch to flip between options.

> - Patrick
> 
> On 4/15/18, 10:49 PM, "lustre-devel on behalf of James Simmons" 
>  
> wrote:
> 
> 
> > CDEBUG_STACK() and CHECK_STACK() are macros to help with
> > debugging, so move them from
> >drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h
> > to
> >drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h
> > 
> > This seems a more fitting location, and is a step towards
> > removing linux/libcfs.h and simplifying the include file structure.
> 
> Nak. Currently the lustre client always enables debugging but that
> shouldn't be the case. What we do need is the able to turn off the 
> crazy debugging stuff. In the development branch of lustre it is
> done with CDEBUG_ENABLED. We need something like that in Kconfig
> much like we have CONFIG_LUSTRE_DEBUG_EXPENSIVE_CHECK. Since we like
> to be able to turn that off this should be moved to just after
> LIBCFS_DEBUG_MSG_DATA_DECL. Then from CHECK_STACK down to CWARN()
> it can be build out. When CDEBUG_ENABLED is disabled CDEBUG_LIMIT
> would be empty.
>  
> > Signed-off-by: NeilBrown 
> > ---
> >  .../lustre/include/linux/libcfs/libcfs_debug.h |   32 
> 
> >  .../lustre/include/linux/libcfs/linux/libcfs.h |   31 
> ---
> >  2 files changed, 32 insertions(+), 31 deletions(-)
> > 
> > diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h 
> b/drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h
> > index 9290a19429e7..0dc7b91efe7c 100644
> > --- a/drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h
> > +++ b/drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h
> > @@ -62,6 +62,38 @@ int libcfs_debug_str2mask(int *mask, const char 
> *str, int is_subsys);
> >  extern unsigned int libcfs_catastrophe;
> >  extern unsigned int libcfs_panic_on_lbug;
> >  
> > +/* Enable debug-checks on stack size - except on x86_64 */
> > +#if !defined(__x86_64__)
> > +# ifdef __ia64__
> > +#  define CDEBUG_STACK() (THREAD_SIZE -
>  \
> > + ((unsigned long)__builtin_dwarf_cfa() &   
> \
> > +  (THREAD_SIZE - 1)))
> > +# else
> > +#  define CDEBUG_STACK() (THREAD_SIZE -
>  \
> > + ((unsigned long)__builtin_frame_address(0) &  
> \
> > +  (THREAD_SIZE - 1)))
> > +# endif /* __ia64__ */
> > +
> > +#define __CHECK_STACK(msgdata, mask, cdls)   \
> > +do {   \
> > +   if (unlikely(CDEBUG_STACK() > libcfs_stack)) {\
> > +   LIBCFS_DEBUG_MSG_DATA_INIT(msgdata, D_WARNING, NULL);   
> \
> > +   libcfs_stack = CDEBUG_STACK();\
> > +   libcfs_debug_msg(msgdata,  \
> > +"maximum lustre stack %lu\n",\
> > +CDEBUG_STACK());  \
> > +   (msgdata)->msg_mask = mask;  \
> > +   (msgdata)->msg_cdls = cdls;  \
> > +   dump_stack();  \
> > + /*panic("LBUG");*/   

Re: [lustre-devel] [PATCH 1/6] staging: lustre: move stack-check macros to libcfs_debug.h

2018-04-16 Thread James Simmons

> James,
> 
> If I understand correctly, you're saying you want to be able to build without 
> debug support...?  I'm not convinced that building a client without debug 
> support is interesting or useful.  In fact, I think it would be harmful, and 
> we shouldn't open up the possibility - this is switchable debug with very low 
> overhead when not actually "on".  It would be really awful to get a problem 
> on a running system and discover there's no debug support - that you can't 
> even enable debug without a reinstall.
> 
> If I've understood you correctly, then I would want to see proof of a 
> significant performance cost when debug is built but *off* before agreeing to 
> even exposing this option.  (I know it's a choice they'd have to make, but if 
> it's not really useful with a side order of potentially harmful, we shouldn't 
> even give people the choice.)

I'm not saying add the option today but this is more for the long game.
While the Intel lustre developers deeply love lustre's debugging 
infrastructure I see a future where something better will come along to
replace it. When that day comes we will have a period where both
debugging infrastructurs will exist and some deployers of lustre will
want to turn off the old debugging infrastructure and just use the new.
That is what I have in mind. A switch to flip between options.

> - Patrick
> 
> On 4/15/18, 10:49 PM, "lustre-devel on behalf of James Simmons" 
>  
> wrote:
> 
> 
> > CDEBUG_STACK() and CHECK_STACK() are macros to help with
> > debugging, so move them from
> >drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h
> > to
> >drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h
> > 
> > This seems a more fitting location, and is a step towards
> > removing linux/libcfs.h and simplifying the include file structure.
> 
> Nak. Currently the lustre client always enables debugging but that
> shouldn't be the case. What we do need is the able to turn off the 
> crazy debugging stuff. In the development branch of lustre it is
> done with CDEBUG_ENABLED. We need something like that in Kconfig
> much like we have CONFIG_LUSTRE_DEBUG_EXPENSIVE_CHECK. Since we like
> to be able to turn that off this should be moved to just after
> LIBCFS_DEBUG_MSG_DATA_DECL. Then from CHECK_STACK down to CWARN()
> it can be build out. When CDEBUG_ENABLED is disabled CDEBUG_LIMIT
> would be empty.
>  
> > Signed-off-by: NeilBrown 
> > ---
> >  .../lustre/include/linux/libcfs/libcfs_debug.h |   32 
> 
> >  .../lustre/include/linux/libcfs/linux/libcfs.h |   31 
> ---
> >  2 files changed, 32 insertions(+), 31 deletions(-)
> > 
> > diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h 
> b/drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h
> > index 9290a19429e7..0dc7b91efe7c 100644
> > --- a/drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h
> > +++ b/drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h
> > @@ -62,6 +62,38 @@ int libcfs_debug_str2mask(int *mask, const char 
> *str, int is_subsys);
> >  extern unsigned int libcfs_catastrophe;
> >  extern unsigned int libcfs_panic_on_lbug;
> >  
> > +/* Enable debug-checks on stack size - except on x86_64 */
> > +#if !defined(__x86_64__)
> > +# ifdef __ia64__
> > +#  define CDEBUG_STACK() (THREAD_SIZE -
>  \
> > + ((unsigned long)__builtin_dwarf_cfa() &   
> \
> > +  (THREAD_SIZE - 1)))
> > +# else
> > +#  define CDEBUG_STACK() (THREAD_SIZE -
>  \
> > + ((unsigned long)__builtin_frame_address(0) &  
> \
> > +  (THREAD_SIZE - 1)))
> > +# endif /* __ia64__ */
> > +
> > +#define __CHECK_STACK(msgdata, mask, cdls)   \
> > +do {   \
> > +   if (unlikely(CDEBUG_STACK() > libcfs_stack)) {\
> > +   LIBCFS_DEBUG_MSG_DATA_INIT(msgdata, D_WARNING, NULL);   
> \
> > +   libcfs_stack = CDEBUG_STACK();\
> > +   libcfs_debug_msg(msgdata,  \
> > +"maximum lustre stack %lu\n",\
> > +CDEBUG_STACK());  \
> > +   (msgdata)->msg_mask = mask;  \
> > +   (msgdata)->msg_cdls = cdls;  \
> > +   dump_stack();  \
> > + /*panic("LBUG");*/
> \
> > +   }

Re: [lustre-devel] [PATCH 1/6] staging: lustre: move stack-check macros to libcfs_debug.h

2018-04-16 Thread Patrick Farrell
James,

If I understand correctly, you're saying you want to be able to build without 
debug support...?  I'm not convinced that building a client without debug 
support is interesting or useful.  In fact, I think it would be harmful, and we 
shouldn't open up the possibility - this is switchable debug with very low 
overhead when not actually "on".  It would be really awful to get a problem on 
a running system and discover there's no debug support - that you can't even 
enable debug without a reinstall.

If I've understood you correctly, then I would want to see proof of a 
significant performance cost when debug is built but *off* before agreeing to 
even exposing this option.  (I know it's a choice they'd have to make, but if 
it's not really useful with a side order of potentially harmful, we shouldn't 
even give people the choice.)

- Patrick

On 4/15/18, 10:49 PM, "lustre-devel on behalf of James Simmons" 
 
wrote:


> CDEBUG_STACK() and CHECK_STACK() are macros to help with
> debugging, so move them from
>drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h
> to
>drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h
> 
> This seems a more fitting location, and is a step towards
> removing linux/libcfs.h and simplifying the include file structure.

Nak. Currently the lustre client always enables debugging but that
shouldn't be the case. What we do need is the able to turn off the 
crazy debugging stuff. In the development branch of lustre it is
done with CDEBUG_ENABLED. We need something like that in Kconfig
much like we have CONFIG_LUSTRE_DEBUG_EXPENSIVE_CHECK. Since we like
to be able to turn that off this should be moved to just after
LIBCFS_DEBUG_MSG_DATA_DECL. Then from CHECK_STACK down to CWARN()
it can be build out. When CDEBUG_ENABLED is disabled CDEBUG_LIMIT
would be empty.
 
> Signed-off-by: NeilBrown 
> ---
>  .../lustre/include/linux/libcfs/libcfs_debug.h |   32 

>  .../lustre/include/linux/libcfs/linux/libcfs.h |   31 
---
>  2 files changed, 32 insertions(+), 31 deletions(-)
> 
> diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h 
b/drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h
> index 9290a19429e7..0dc7b91efe7c 100644
> --- a/drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h
> +++ b/drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h
> @@ -62,6 +62,38 @@ int libcfs_debug_str2mask(int *mask, const char *str, 
int is_subsys);
>  extern unsigned int libcfs_catastrophe;
>  extern unsigned int libcfs_panic_on_lbug;
>  
> +/* Enable debug-checks on stack size - except on x86_64 */
> +#if !defined(__x86_64__)
> +# ifdef __ia64__
> +#  define CDEBUG_STACK() (THREAD_SIZE -   \
> +   ((unsigned long)__builtin_dwarf_cfa() &   \
> +(THREAD_SIZE - 1)))
> +# else
> +#  define CDEBUG_STACK() (THREAD_SIZE -   \
> +   ((unsigned long)__builtin_frame_address(0) &  \
> +(THREAD_SIZE - 1)))
> +# endif /* __ia64__ */
> +
> +#define __CHECK_STACK(msgdata, mask, cdls) \
> +do { \
> + if (unlikely(CDEBUG_STACK() > libcfs_stack)) {\
> + LIBCFS_DEBUG_MSG_DATA_INIT(msgdata, D_WARNING, NULL);   \
> + libcfs_stack = CDEBUG_STACK();\
> + libcfs_debug_msg(msgdata,  \
> +  "maximum lustre stack %lu\n",\
> +  CDEBUG_STACK());  \
> + (msgdata)->msg_mask = mask;  \
> + (msgdata)->msg_cdls = cdls;  \
> + dump_stack();  \
> +   /*panic("LBUG");*/\
> + }  \
> +} while (0)
> +#define CFS_CHECK_STACK(msgdata, mask, cdls)  __CHECK_STACK(msgdata, 
mask, cdls)
> +#else /* __x86_64__ */
> +#define CFS_CHECK_STACK(msgdata, mask, cdls) do {} while (0)
> +#define CDEBUG_STACK() (0L)
> +#endif /* __x86_64__ */
> +
>  #ifndef DEBUG_SUBSYSTEM
>  # define DEBUG_SUBSYSTEM S_UNDEFINED
>  #endif
> diff --git a/drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h 
b/drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h
> index 07d3cb2217d1..83aec9c7698f 100644
> --- a/drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h
> +++ 

Re: [lustre-devel] [PATCH 1/6] staging: lustre: move stack-check macros to libcfs_debug.h

2018-04-16 Thread Patrick Farrell
James,

If I understand correctly, you're saying you want to be able to build without 
debug support...?  I'm not convinced that building a client without debug 
support is interesting or useful.  In fact, I think it would be harmful, and we 
shouldn't open up the possibility - this is switchable debug with very low 
overhead when not actually "on".  It would be really awful to get a problem on 
a running system and discover there's no debug support - that you can't even 
enable debug without a reinstall.

If I've understood you correctly, then I would want to see proof of a 
significant performance cost when debug is built but *off* before agreeing to 
even exposing this option.  (I know it's a choice they'd have to make, but if 
it's not really useful with a side order of potentially harmful, we shouldn't 
even give people the choice.)

- Patrick

On 4/15/18, 10:49 PM, "lustre-devel on behalf of James Simmons" 
 
wrote:


> CDEBUG_STACK() and CHECK_STACK() are macros to help with
> debugging, so move them from
>drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h
> to
>drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h
> 
> This seems a more fitting location, and is a step towards
> removing linux/libcfs.h and simplifying the include file structure.

Nak. Currently the lustre client always enables debugging but that
shouldn't be the case. What we do need is the able to turn off the 
crazy debugging stuff. In the development branch of lustre it is
done with CDEBUG_ENABLED. We need something like that in Kconfig
much like we have CONFIG_LUSTRE_DEBUG_EXPENSIVE_CHECK. Since we like
to be able to turn that off this should be moved to just after
LIBCFS_DEBUG_MSG_DATA_DECL. Then from CHECK_STACK down to CWARN()
it can be build out. When CDEBUG_ENABLED is disabled CDEBUG_LIMIT
would be empty.
 
> Signed-off-by: NeilBrown 
> ---
>  .../lustre/include/linux/libcfs/libcfs_debug.h |   32 

>  .../lustre/include/linux/libcfs/linux/libcfs.h |   31 
---
>  2 files changed, 32 insertions(+), 31 deletions(-)
> 
> diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h 
b/drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h
> index 9290a19429e7..0dc7b91efe7c 100644
> --- a/drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h
> +++ b/drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h
> @@ -62,6 +62,38 @@ int libcfs_debug_str2mask(int *mask, const char *str, 
int is_subsys);
>  extern unsigned int libcfs_catastrophe;
>  extern unsigned int libcfs_panic_on_lbug;
>  
> +/* Enable debug-checks on stack size - except on x86_64 */
> +#if !defined(__x86_64__)
> +# ifdef __ia64__
> +#  define CDEBUG_STACK() (THREAD_SIZE -   \
> +   ((unsigned long)__builtin_dwarf_cfa() &   \
> +(THREAD_SIZE - 1)))
> +# else
> +#  define CDEBUG_STACK() (THREAD_SIZE -   \
> +   ((unsigned long)__builtin_frame_address(0) &  \
> +(THREAD_SIZE - 1)))
> +# endif /* __ia64__ */
> +
> +#define __CHECK_STACK(msgdata, mask, cdls) \
> +do { \
> + if (unlikely(CDEBUG_STACK() > libcfs_stack)) {\
> + LIBCFS_DEBUG_MSG_DATA_INIT(msgdata, D_WARNING, NULL);   \
> + libcfs_stack = CDEBUG_STACK();\
> + libcfs_debug_msg(msgdata,  \
> +  "maximum lustre stack %lu\n",\
> +  CDEBUG_STACK());  \
> + (msgdata)->msg_mask = mask;  \
> + (msgdata)->msg_cdls = cdls;  \
> + dump_stack();  \
> +   /*panic("LBUG");*/\
> + }  \
> +} while (0)
> +#define CFS_CHECK_STACK(msgdata, mask, cdls)  __CHECK_STACK(msgdata, 
mask, cdls)
> +#else /* __x86_64__ */
> +#define CFS_CHECK_STACK(msgdata, mask, cdls) do {} while (0)
> +#define CDEBUG_STACK() (0L)
> +#endif /* __x86_64__ */
> +
>  #ifndef DEBUG_SUBSYSTEM
>  # define DEBUG_SUBSYSTEM S_UNDEFINED
>  #endif
> diff --git a/drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h 
b/drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h
> index 07d3cb2217d1..83aec9c7698f 100644
> --- a/drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h
> +++ b/drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h
> @@ -80,35 +80,4 @@
>  #include 
>  #include