Re: [openstack-dev] [Product] [all][log] Openstack HTTP error codes

2015-02-03 Thread Kuvaja, Erno
Now, in my understanding, our services do not log to the user. The user gets 
whatever error message/exception happens to be thrown at them. This is exactly 
why we need some common identifier between the two (and for whoever offers request ID 
as being that: I can get some of my friends with well-broken English to call you 
and try to read it to you over the phone ;) ).
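As an illustration of why a common identifier helps, here is a minimal sketch of correlating a user-visible request ID with service log lines. It assumes the `x-openstack-request-id` response header that OpenStack services commonly return; the log-line format and helper names are made up for the example.

```python
# Hypothetical sketch: tie the ID a user reports back to service logs.
# The "x-openstack-request-id" header is the common OpenStack response
# header; the log format below is illustrative, not any project's actual one.

def request_id_from_headers(headers):
    """Pull the request ID out of a header mapping, case-insensitively."""
    for name, value in headers.items():
        if name.lower() == "x-openstack-request-id":
            return value
    return None

def grep_logs(log_lines, request_id):
    """Return only the log lines that mention the given request ID."""
    return [line for line in log_lines if request_id in line]

headers = {"Content-Type": "application/json",
           "X-Openstack-Request-Id": "req-1234-5678-abcd-def0"}
logs = [
    "2015-02-03 INFO glance.api [req-1234-5678-abcd-def0] GET /v2/images",
    "2015-02-03 ERROR glance.store [req-1234-5678-abcd-def0] backend down",
    "2015-02-03 INFO glance.api [req-ffff-0000] unrelated request",
]
rid = request_id_from_headers(headers)
matching = grep_logs(logs, rid)  # the two lines for this request
```

The point is only that one opaque token, quoted verbatim by the user, is enough for an operator to find everything relevant.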

More inline.

> -Original Message-
> From: Rochelle Grober [mailto:rochelle.gro...@huawei.com]
> Sent: 02 February 2015 21:34
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [Product] [all][log] Openstack HTTP error codes
> 
> What I see in this conversation is that we are talking about multiple 
> different
> user classes.
> 
> Infra-operator needs as much info as possible, so if it is a vendor driver 
> that is
> erring out, the dev-ops can see it in the log.

NO! Absolutely not. This is where we need to be careful about what we classify as 
DEBUG and what as INFO+, as the ops definitely do not need, nor want, all of it. 
> 
> Tenant-operator is a totally different class of user.  These guys need VM
> based logs and virtual network based logs, etc., but should never see as far
> under the covers as the infra-ops *has* to see.

They see pretty much just the error messages raised to them, not the cloud 
infra logs, anyway. What we need to do is be more helpful towards them about what 
they should, and can, help themselves with, and where they would need ops help.
> 
> So, sounds like a security policy issue of what makes it to tenant logs and
> what stays "in the data center" thing.

Logs should never contain sensitive information (URIs, credentials, etc.), 
regardless of where they are stored. Then again, obscurity is not security either.
> 
> There are *lots* of logs that are being generated.  It sounds like we need
> standards on what goes into which logs along with error codes,
> logging/reporting levels, criticality, etc.

We need guidelines. It's really hard to come up with tight rules for how things 
need to be logged, as a backend failure can be critical for some services while 
others might not care too much about it. (For example, if swift has a disk down, 
it's not a catastrophic failure; it just moves on to the next copy. But if the back-end 
store is down for glance, we can do pretty much nothing. Now, should these two 
back-end store failures be logged the same way? No, they should not.)

We need to keep the decision in the projects, as they are mostly the only ones 
who know how a specific error condition affects the service. Also, if the rules 
do not fit, they are really difficult to enforce, so let's not pick 
that fight.
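The swift/glance contrast above can be sketched as a per-project severity decision. The mapping and the service/condition names below are invented for illustration; nothing here is a proposed standard.

```python
import logging

# Illustrative only: the same event class ("backend store unavailable")
# maps to different severities depending on how the service can react.
# A swift disk failure is survivable (move on to the next replica);
# a glance backend-store outage is not.
BACKEND_FAILURE_SEVERITY = {
    ("swift", "disk_down"): logging.WARNING,     # failover to next copy
    ("glance", "store_down"): logging.CRITICAL,  # nothing glance can do
}

def severity_for(service, condition, default=logging.ERROR):
    """Let each project decide how serious its own backend failure is."""
    return BACKEND_FAILURE_SEVERITY.get((service, condition), default)
```

This keeps the decision with the project while a shared guideline only constrains the vocabulary (DEBUG vs. INFO+), not the per-failure choice.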

- Erno
> 
> --Rocky
> 
> (bcc'ing the ops list so they can join this discussion, here)
> 
> -Original Message-
> From: Sean Dague [mailto:s...@dague.net]
> Sent: Monday, February 02, 2015 8:19 AM
> To: openstack-dev@lists.openstack.org
> Subject: Re: [openstack-dev] [Product] [all][log] Openstack HTTP error codes
> 
> On 02/01/2015 06:20 PM, Morgan Fainberg wrote:
> > Putting on my "sorry-but-it-is-my-job-to-get-in-your-way" hat (aka
> security), let's be careful how generous we are with the user and data we
> hand back. It should give enough information to be useful but no more. I
> don't want to see us opened to weird attack vectors because we're exposing
> internal state too generously.
> >
> > In short let's aim for a slow roll of extra info in, and evaluate each data 
> > point
> we expose (about a failure) before we do so. Knowing more about a failure is
> important for our users. Allowing easy access to information that could be
> used to attack / increase impact of a DOS could be bad.
> >
> > I think we can do it but it is important to not swing the pendulum too far
> the other direction too fast (give too much info all of a sudden).
> 
> Security by cloud obscurity?
> 
> I agree we should evaluate information sharing with security in mind.
> However, the black boxing level we have today is bad for OpenStack. At a
> certain point once you've added so many belts and suspenders, you can no
> longer walk normally any more.
> 
> Anyway, let's stop having this discussion in the abstract and actually just 
> evaluate
> the cases in question that come up.
> 
>   -Sean
> 
> --
> Sean Dague
> http://dague.net
> 
> __
> 
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: OpenStack-dev-
> requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [Product] [all][log] Openstack HTTP error codes

2015-02-03 Thread Kuvaja, Erno
> -Original Message-
> From: Sean Dague [mailto:s...@dague.net]
> Sent: 02 February 2015 16:19
> To: openstack-dev@lists.openstack.org
> Subject: Re: [openstack-dev] [Product] [all][log] Openstack HTTP error codes
> 
> On 02/01/2015 06:20 PM, Morgan Fainberg wrote:
> > Putting on my "sorry-but-it-is-my-job-to-get-in-your-way" hat (aka
> security), let's be careful how generous we are with the user and data we
> hand back. It should give enough information to be useful but no more. I
> don't want to see us opened to weird attack vectors because we're exposing
> internal state too generously.
> >
> > In short let's aim for a slow roll of extra info in, and evaluate each data 
> > point
> we expose (about a failure) before we do so. Knowing more about a failure is
> important for our users. Allowing easy access to information that could be
> used to attack / increase impact of a DOS could be bad.
> >
> > I think we can do it but it is important to not swing the pendulum too far
> the other direction too fast (give too much info all of a sudden).
> 
> Security by cloud obscurity?
> 
> I agree we should evaluate information sharing with security in mind.
> However, the black boxing level we have today is bad for OpenStack. At a
> certain point once you've added so many belts and suspenders, you can no
> longer walk normally any more.

++
> 
> Anyway, let's stop having this discussion in the abstract and actually just 
> evaluate
> the cases in question that come up.

++

- Erno
> 
>   -Sean
> 
> --
> Sean Dague
> http://dague.net
> 
> __
> 
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: OpenStack-dev-
> requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Product] [all][log] Openstack HTTP error codes

2015-02-02 Thread Rochelle Grober
What I see in this conversation is that we are talking about multiple different 
user classes.

Infra-operator needs as much info as possible, so if it is a vendor driver that 
is erring out, the dev-ops can see it in the log.

Tenant-operator is a totally different class of user.  These guys need VM based 
logs and virtual network based logs, etc., but should never see as far under 
the covers as the infra-ops *has* to see.

So, sounds like a security policy issue of what makes it to tenant logs and 
what stays "in the data center" thing.  

There are *lots* of logs that are being generated.  It sounds like we need 
standards on what goes into which logs along with error codes, 
logging/reporting levels, criticality, etc.

--Rocky

(bcc'ing the ops list so they can join this discussion, here)

-Original Message-
From: Sean Dague [mailto:s...@dague.net] 
Sent: Monday, February 02, 2015 8:19 AM
To: openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [Product] [all][log] Openstack HTTP error codes

On 02/01/2015 06:20 PM, Morgan Fainberg wrote:
> Putting on my "sorry-but-it-is-my-job-to-get-in-your-way" hat (aka security), 
> let's be careful how generous we are with the user and data we hand back. It 
> should give enough information to be useful but no more. I don't want to see 
> us opened to weird attack vectors because we're exposing internal state too 
> generously. 
> 
> In short let's aim for a slow roll of extra info in, and evaluate each data 
> point we expose (about a failure) before we do so. Knowing more about a 
> failure is important for our users. Allowing easy access to information that 
> could be used to attack / increase impact of a DOS could be bad. 
> 
> I think we can do it but it is important to not swing the pendulum too far 
> the other direction too fast (give too much info all of a sudden). 

Security by cloud obscurity?

I agree we should evaluate information sharing with security in mind.
However, the black boxing level we have today is bad for OpenStack. At a
certain point once you've added so many belts and suspenders, you can no
longer walk normally any more.

Anyway, let's stop having this discussion in the abstract and actually just
evaluate the cases in question that come up.

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Product] [all][log] Openstack HTTP error codes

2015-02-02 Thread Sean Dague
On 02/01/2015 06:20 PM, Morgan Fainberg wrote:
> Putting on my "sorry-but-it-is-my-job-to-get-in-your-way" hat (aka security), 
> let's be careful how generous we are with the user and data we hand back. It 
> should give enough information to be useful but no more. I don't want to see 
> us opened to weird attack vectors because we're exposing internal state too 
> generously. 
> 
> In short let's aim for a slow roll of extra info in, and evaluate each data 
> point we expose (about a failure) before we do so. Knowing more about a 
> failure is important for our users. Allowing easy access to information that 
> could be used to attack / increase impact of a DOS could be bad. 
> 
> I think we can do it but it is important to not swing the pendulum too far 
> the other direction too fast (give too much info all of a sudden). 

Security by cloud obscurity?

I agree we should evaluate information sharing with security in mind.
However, the black boxing level we have today is bad for OpenStack. At a
certain point once you've added so many belts and suspenders, you can no
longer walk normally any more.

Anyway, let's stop having this discussion in the abstract and actually just
evaluate the cases in question that come up.

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Product] [all][log] Openstack HTTP error codes

2015-02-02 Thread Sean Dague
On 02/02/2015 12:54 AM, Christopher Yeoh wrote:
> 
> 
> On Sun, Feb 1, 2015 at 2:57 AM, Sean Dague  > wrote:
> 
> On 01/31/2015 05:24 AM, Duncan Thomas wrote:
> > Hi
> >
> > This discussion came up at the cinder mid-cycle last week too,
> > specifically in the context of 'Can we change the details text in an
> > existing error, or is that an unacceptable API change'.
> >
> > I have to second security / operational concerns about exposing
> too much
> > granularity of failure in these error codes.
> >
> > For cases where there is something wrong with the request (item out of
> > range, invalid names, feature not supported, etc) I totally agree that
> > we should have good, clear, parsable response, and standardisation
> would
> > be good. Having some fixed part of the response (whether a numeric
> code
> > or, as I tend to prefer, a CamelCaseDescription so that I don't
> have to
> > go look it up) and a human readable description section that is
> subject
> > to change seems sensible.
> >
> > What I would rather not see is leakage of information when something
> > internal to the cloud goes wrong, that the tenant can do nothing
> > against. We certainly shouldn't be leaking internal implementation
> > details like vendor details - that is what request IDs and logs
> are for.
> > The whole point of the cloud, to me, is that separation between the
> > things a tenant controls (what they want done) and what the cloud
> > provider controls (the details of how the work is done).
> >
> > For example, if a create volume request fails because cinder-scheduler
> > has crashed, all the tenant should get back is 'Things are broken, try
> > again later or pass request id 1234-5678-abcd-def0 to the cloud
> admin'.
> They should not need to, nor even be allowed to, care about the details
> of the
> failure; it is not their domain.
> 
> Sure, the value really is in determining things that are under the
> client's control to do differently. A concrete one is a multi hypervisor
> cloud with 2 hypervisors (say kvm and docker). The volume attach
> operation to a docker instance (which presumably is a separate set of
> instance types) can't work. The user should be told that that can't work
> with this instance_type if they try it.
> 
> That's actually user correctable information. And doesn't require a
> ticket to move forward.
> 
> I also think we could have a detail level knob, because I expect the
> level of information exposure might be considered different in public
> cloud use case vs. a private cloud at an org level or a private cloud at
> a dept level.
> 
> 
> That could turn into a major compatibility issue if what we returned
> could (and probably would, between public/private) change between clouds.
> If we want to encourage people to parse this sort of thing, I think we
> need to settle on whether we send the information back or not for everyone. 

Sure, it's a theoretical concern. We're not going to get anywhere rat-holing
on theoretical concerns, though; let's get some concrete instances out there
to discuss.

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Product] [all][log] Openstack HTTP error codes

2015-02-01 Thread Christopher Yeoh
On Sun, Feb 1, 2015 at 2:57 AM, Sean Dague  wrote:

> On 01/31/2015 05:24 AM, Duncan Thomas wrote:
> > Hi
> >
> > This discussion came up at the cinder mid-cycle last week too,
> > specifically in the context of 'Can we change the details text in an
> > existing error, or is that an unacceptable API change'.
> >
> > I have to second security / operational concerns about exposing too much
> > granularity of failure in these error codes.
> >
> > For cases where there is something wrong with the request (item out of
> > range, invalid names, feature not supported, etc) I totally agree that
> > we should have good, clear, parsable response, and standardisation would
> > be good. Having some fixed part of the response (whether a numeric code
> > or, as I tend to prefer, a CamelCaseDescription so that I don't have to
> > go look it up) and a human readable description section that is subject
> > to change seems sensible.
> >
> > What I would rather not see is leakage of information when something
> > internal to the cloud goes wrong, that the tenant can do nothing
> > against. We certainly shouldn't be leaking internal implementation
> > details like vendor details - that is what request IDs and logs are for.
> > The whole point of the cloud, to me, is that separation between the
> > things a tenant controls (what they want done) and what the cloud
> > provider controls (the details of how the work is done).
> >
> > For example, if a create volume request fails because cinder-scheduler
> > has crashed, all the tenant should get back is 'Things are broken, try
> > again later or pass request id 1234-5678-abcd-def0 to the cloud admin'.
> > They should not need to, nor even be allowed to, care about the details of
> > the failure; it is not their domain.
>
> Sure, the value really is in determining things that are under the
> client's control to do differently. A concrete one is a multi hypervisor
> cloud with 2 hypervisors (say kvm and docker). The volume attach
> operation to a docker instance (which presumably is a separate set of
> instance types) can't work. The user should be told that that can't work
> with this instance_type if they try it.
>
> That's actually user correctable information. And doesn't require a
> ticket to move forward.
>
> I also think we could have a detail level knob, because I expect the
> level of information exposure might be considered different in public
> cloud use case vs. a private cloud at an org level or a private cloud at
> a dept level.
>
>
That could turn into a major compatibility issue if what we returned could
(and probably would, between public/private) change between clouds. If we want
to encourage people to parse this sort of thing, I think we need to settle on
whether we send the information back or not for everyone.


> -Sean
>
> >
> >
> >
> > On 30 January 2015 at 02:34, Rochelle Grober  > > wrote:
> >
> > Hi folks!
> >
> > Changed the tags a bit because this is a discussion for all projects
> > and dovetails with logging rationalization/standards/
> >
> > At the Paris summit, we had a number of session on logging that kept
> > circling back to Error Codes.  But, these codes would not be http
> > codes, rather, as others have pointed out, codes related to the
> > calling entities and referring entities and the actions that
> > happened or didn’t.  Format suggestions were gathered from the
> > Operators and from some senior developers.  The Logging Working
> > Group is planning to put forth a spec for discussion on formats and
> > standards before the Ops mid-cycle meetup.
> >
> > Working from a Glance proposal on error codes:
> > https://review.openstack.org/#/c/127482/ and discussions with
> > operators and devs, we have a strawman to propose.  We also have a
> > number of requirements from Ops and some Devs.
> >
> > Here is the basic idea:
> >
> > Code for logs would have four segments:
> >
> >           Project           Vendor/Component     Error catalog no.      Criticality
> >     Def   [A-Z][A-Z][A-Z] - [{0-9}|{A-Z}][A-Z] - [0-9][0-9][0-9][0-9] - [0-9]
> >     Ex.   CIN (Cinder)    - NA (NetApp driver) - 0001 (error no.)     - 2 (criticality)
> >     Ex.   GLA (Glance)    - 0A (API)           - 0051 (error no.)     - 3 (criticality)
> > Three letters for project,  Either a two letter vendor code or a
> > number and letter for 0+letter for internal component of project
> > (like API=0A, Controller =0C, etc),  four d

Re: [openstack-dev] [Product] [all][log] Openstack HTTP error codes

2015-02-01 Thread Morgan Fainberg
Putting on my "sorry-but-it-is-my-job-to-get-in-your-way" hat (aka security), 
let's be careful how generous we are with the user and data we hand back. It 
should give enough information to be useful but no more. I don't want to see us 
opened to weird attack vectors because we're exposing internal state too 
generously. 

In short let's aim for a slow roll of extra info in, and evaluate each data 
point we expose (about a failure) before we do so. Knowing more about a failure 
is important for our users. Allowing easy access to information that could be 
used to attack / increase impact of a DOS could be bad. 

I think we can do it but it is important to not swing the pendulum too far the 
other direction too fast (give too much info all of a sudden). 

--Morgan

Sent via mobile

> On Jan 31, 2015, at 08:57, James E. Blair  wrote:
> 
> Sean Dague  writes:
> 
>>> On 01/31/2015 05:24 AM, Duncan Thomas wrote:
>>> What I would rather not see is leakage of information when something
>>> internal to the cloud goes wrong, that the tenant can do nothing
>>> against. We certainly shouldn't be leaking internal implementation
>>> details like vendor details - that is what request IDs and logs are for.
>>> The whole point of the cloud, to me, is that separation between the
>>> things a tenant controls (what they want done) and what the cloud
>>> provider controls (the details of how the work is done).
>> 
>> Sure, the value really is in determining things that are under the
>> client's control to do differently. A concrete one is a multi hypervisor
>> cloud with 2 hypervisors (say kvm and docker). The volume attach
>> operation to a docker instance (which presumably is a separate set of
>> instance types) can't work. The user should be told that that can't work
>> with this instance_type if they try it.
> 
> I agree that we should find the right balance.  Some anecdata from
> infra-as-a-user: we have seen OpenStack sometimes unable to allocate a
> public IP address for our servers when we cycle them too quickly with
> nodepool.  That shows up as an opaque error for us, and it's only by
> chatting with the operators that we know what's going on, yet, there
> might be things we can do to reduce the occurrence (like rebuild nodes
> instead of destroying them; delay before creating again; etc.).
> 
> So I would suggest that when we search for the sweet spot of how much
> detail to include, we be somewhat generous with the user, who after all,
> is likely to be technically competent and frustrated if they are
> replacing systems that they can control and diagnose with a black box
> that has a habit of saying "no" at random times for no discernible
> reason.
> 
> -Jim
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Product] [all][log] Openstack HTTP error codes

2015-01-31 Thread James E. Blair
Sean Dague  writes:

> On 01/31/2015 05:24 AM, Duncan Thomas wrote:
>> What I would rather not see is leakage of information when something
>> internal to the cloud goes wrong, that the tenant can do nothing
>> against. We certainly shouldn't be leaking internal implementation
>> details like vendor details - that is what request IDs and logs are for.
>> The whole point of the cloud, to me, is that separation between the
>> things a tenant controls (what they want done) and what the cloud
>> provider controls (the details of how the work is done).
>
> Sure, the value really is in determining things that are under the
> client's control to do differently. A concrete one is a multi hypervisor
> cloud with 2 hypervisors (say kvm and docker). The volume attach
> operation to a docker instance (which presumably is a separate set of
> instance types) can't work. The user should be told that that can't work
> with this instance_type if they try it.

I agree that we should find the right balance.  Some anecdata from
infra-as-a-user: we have seen OpenStack sometimes unable to allocate a
public IP address for our servers when we cycle them too quickly with
nodepool.  That shows up as an opaque error for us, and it's only by
chatting with the operators that we know what's going on, yet, there
might be things we can do to reduce the occurrence (like rebuild nodes
instead of destroying them; delay before creating again; etc.).

So I would suggest that when we search for the sweet spot of how much
detail to include, we be somewhat generous with the user, who after all,
is likely to be technically competent and frustrated if they are
replacing systems that they can control and diagnose with a black box
that has a habit of saying "no" at random times for no discernible
reason.

-Jim

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Product] [all][log] Openstack HTTP error codes

2015-01-31 Thread Sean Dague
On 01/31/2015 05:24 AM, Duncan Thomas wrote:
> Hi
> 
> This discussion came up at the cinder mid-cycle last week too,
> specifically in the context of 'Can we change the details text in an
> existing error, or is that an unacceptable API change'.
> 
> I have to second security / operational concerns about exposing too much
> granularity of failure in these error codes.
> 
> For cases where there is something wrong with the request (item out of
> range, invalid names, feature not supported, etc) I totally agree that
> we should have good, clear, parsable response, and standardisation would
> be good. Having some fixed part of the response (whether a numeric code
> or, as I tend to prefer, a CamelCaseDescription so that I don't have to
> go look it up) and a human readable description section that is subject
> to change seems sensible.
> 
> What I would rather not see is leakage of information when something
> internal to the cloud goes wrong, that the tenant can do nothing
> against. We certainly shouldn't be leaking internal implementation
> details like vendor details - that is what request IDs and logs are for.
> The whole point of the cloud, to me, is that separation between the
> things a tenant controls (what they want done) and what the cloud
> provider controls (the details of how the work is done).
> 
> For example, if a create volume request fails because cinder-scheduler
> has crashed, all the tenant should get back is 'Things are broken, try
> again later or pass request id 1234-5678-abcd-def0 to the cloud admin'.
> They should not need to, nor even be allowed to, care about the details of the
> failure; it is not their domain.

Sure, the value really is in determining things that are under the
client's control to do differently. A concrete one is a multi hypervisor
cloud with 2 hypervisors (say kvm and docker). The volume attach
operation to a docker instance (which presumably is a separate set of
instance types) can't work. The user should be told that that can't work
with this instance_type if they try it.

That's actually user correctable information. And doesn't require a
ticket to move forward.

I also think we could have a detail level knob, because I expect the
level of information exposure might be considered different in public
cloud use case vs. a private cloud at an org level or a private cloud at
a dept level.
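A "detail level knob" of the sort described above might look something like this. The level names and the three-tier split are invented for illustration; nothing like this is standardized.

```python
# Hypothetical sketch of a deployer-configured "detail level knob": the
# operator picks how much of a failure the API echoes back to tenants.
# Level names and behavior are invented for illustration.
DETAIL_PUBLIC, DETAIL_ORG, DETAIL_DEPT = 0, 1, 2

def user_facing_error(configured_level, summary, internal_detail):
    """Return the error text a tenant sees at the configured exposure level."""
    if configured_level >= DETAIL_DEPT:
        # Most open (e.g. a department-level private cloud): everything.
        return "%s (%s)" % (summary, internal_detail)
    if configured_level >= DETAIL_ORG:
        # Org-level private cloud: the failure class, no internals.
        return summary
    # Public cloud default: opaque, point at the admin.
    return "Request failed; contact your cloud admin."
```

Christopher's compatibility objection below is exactly that clients would then see different shapes of error on different clouds, which is the trade-off such a knob buys.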

-Sean

> 
> 
> 
> On 30 January 2015 at 02:34, Rochelle Grober  > wrote:
> 
> Hi folks!
> 
> Changed the tags a bit because this is a discussion for all projects
> and dovetails with logging rationalization/standards/
> 
> At the Paris summit, we had a number of session on logging that kept
> circling back to Error Codes.  But, these codes would not be http
> codes, rather, as others have pointed out, codes related to the
> calling entities and referring entities and the actions that
> happened or didn’t.  Format suggestions were gathered from the
> Operators and from some senior developers.  The Logging Working
> Group is planning to put forth a spec for discussion on formats and
> standards before the Ops mid-cycle meetup.
> 
> Working from a Glance proposal on error codes: 
> https://review.openstack.org/#/c/127482/ and discussions with
> operators and devs, we have a strawman to propose.  We also have a
> number of requirements from Ops and some Devs.
> 
> Here is the basic idea:
> 
> Code for logs would have four segments:
>
>           Project           Vendor/Component     Error catalog no.      Criticality
>     Def   [A-Z][A-Z][A-Z] - [{0-9}|{A-Z}][A-Z] - [0-9][0-9][0-9][0-9] - [0-9]
>     Ex.   CIN (Cinder)    - NA (NetApp driver) - 0001 (error no.)     - 2 (criticality)
>     Ex.   GLA (Glance)    - 0A (API)           - 0051 (error no.)     - 3 (criticality)
>
> Three letters for the project; either a two-letter vendor code or 0 plus a
> letter for an internal component of the project (like API=0A, Controller=0C,
> etc.); a four-digit error number, which could be subsetted for even finer
> granularity; and a criticality number.
> 
> This is for logging purposes and tracking down root cause faster for
> operators, but if an error is generated, why can the same codes be
> used internally for the code as externally for the logs?  This also
> allows for a unique message to 

Re: [openstack-dev] [Product] [all][log] Openstack HTTP error codes

2015-01-31 Thread Duncan Thomas
Hi

This discussion came up at the cinder mid-cycle last week too, specifically
in the context of 'Can we change the details text in an existing error, or
is that an unacceptable API change'.

I have to second security / operational concerns about exposing too much
granularity of failure in these error codes.

For cases where there is something wrong with the request (item out of
range, invalid names, feature not supported, etc) I totally agree that we
should have good, clear, parsable response, and standardisation would be
good. Having some fixed part of the response (whether a numeric code or, as
I tend to prefer, a CamelCaseDescription so that I don't have to go look it
up) and a human readable description section that is subject to change
seems sensible.
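The fixed-part/free-part split described above could look something like this. The field names and the CamelCase identifier are hypothetical, not an agreed standard.

```python
import json

def make_error(code, message, request_id):
    """Build an error body with a stable machine-readable part ("code")
    and a mutable human-readable part ("message")."""
    return {
        "code": code,              # stable contract: safe to parse/match on
        "message": message,        # free text: may change between releases
        "request_id": request_id,  # handle for the cloud admin, not details
    }

body = make_error(
    "VolumeBackendUnavailable",
    "Things are broken, try again later or pass the request id "
    "to your cloud admin.",
    "req-1234-5678-abcd-def0")
payload = json.dumps(body, indent=2)
```

Clients key off `code` alone, so rewording `message` is no longer an API change.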

What I would rather not see is leakage of information when something
internal to the cloud goes wrong, that the tenant can do nothing against.
We certainly shouldn't be leaking internal implementation details like
vendor details - that is what request IDs and logs are for. The whole point
of the cloud, to me, is that separation between the things a tenant
controls (what they want done) and what the cloud provider controls (the
details of how the work is done).

For example, if a create volume request fails because cinder-scheduler has
crashed, all the tenant should get back is 'Things are broken, try again
later or pass request id 1234-5678-abcd-def0 to the cloud admin'. They
should not need to, nor even be allowed to, care about the details of the
failure; it is not their domain.



On 30 January 2015 at 02:34, Rochelle Grober 
wrote:

> Hi folks!
>
> Changed the tags a bit because this is a discussion for all projects and
> dovetails with logging rationalization/standards/
>
> At the Paris summit, we had a number of session on logging that kept
> circling back to Error Codes.  But, these codes would not be http codes,
> rather, as others have pointed out, codes related to the calling entities
> and referring entities and the actions that happened or didn’t.  Format
> suggestions were gathered from the Operators and from some senior
> developers.  The Logging Working Group is planning to put forth a spec for
> discussion on formats and standards before the Ops mid-cycle meetup.
>
> Working from a Glance proposal on error codes:
> https://review.openstack.org/#/c/127482/ and discussions with operators
> and devs, we have a strawman to propose.  We also have a number of
> requirements from Ops and some Devs.
>
> Here is the basic idea:
>
> Code for logs would have four segments:
>
>           Project           Vendor/Component     Error catalog no.      Criticality
>     Def   [A-Z][A-Z][A-Z] - [{0-9}|{A-Z}][A-Z] - [0-9][0-9][0-9][0-9] - [0-9]
>     Ex.   CIN (Cinder)    - NA (NetApp driver) - 0001 (error no.)     - 2 (criticality)
>     Ex.   GLA (Glance)    - 0A (API)           - 0051 (error no.)     - 3 (criticality)
>
> Three letters for the project; either a two-letter vendor code or 0 plus a
> letter for an internal component of the project (like API=0A, Controller=0C,
> etc.); a four-digit error number, which could be subsetted for even finer
> granularity; and a criticality number.
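As a sketch, the strawman format quoted above could be composed and parsed like this. The regex follows the "Def" segments; the helper names are ours, not part of the proposal.

```python
import re

# Sketch of the strawman code format PROJECT-COMPONENT-ERRNO-CRITICALITY,
# e.g. CIN-NA-0001-2 or GLA-0A-0051-3. Helper names are illustrative.
CODE_RE = re.compile(
    r"^(?P<project>[A-Z]{3})-"        # three-letter project, e.g. CIN, GLA
    r"(?P<component>[0-9A-Z][A-Z])-"  # vendor (NA) or internal (0A) component
    r"(?P<errno>[0-9]{4})-"           # four-digit error catalog number
    r"(?P<criticality>[0-9])$"        # criticality digit
)

def format_code(project, component, errno, criticality):
    """Compose the four segments into one log-ready code string."""
    return "%s-%s-%04d-%d" % (project, component, errno, criticality)

def parse_code(code):
    """Split a code back into its named segments, or raise on garbage."""
    m = CODE_RE.match(code)
    if not m:
        raise ValueError("not a valid error code: %r" % code)
    return m.groupdict()
```

A fixed, regex-friendly shape is what would let operators grep and pre-translate messages by code rather than by free text.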
>
> This is for logging purposes and tracking down root cause faster for
> operators, but if an error is generated, why can the same codes be used
> internally for the code as externally for the logs?  This also allows for a
> unique message to be associated with the error code that is more
> descriptive and that can be pre translated.  Again, for logging purposes,
> the error code would not be part of the message payload, but part of the
> headers.  Referrer IDs and other info would still be expected in the
> payload of the message and could include instance ids/names, NICs or VIFs,
> etc.  The message headers is code in Oslo.log and when using the Oslo.log
> library, will be easy to use.
>
> Since this discussion came up, I thought I needed to get this info out to
> folks and advertise that anyone will be able to comment on the spec to
> drive it to agreement.  I will be  advertising it here and on Ops and
> Product-WG mailing lists.  I’d also like to invite anyone who want to
> participate in discussions to join them.  We’ll be starting a bi-weekly or
> weekly IRC meeting (also announced in the stated MLs) in February.
>
> And please realize that other than Oslo.log, the changes to make the
> errors more useable will be almost entirely community created standards
> with community created tools to help enforce them.  None of which exist
> yet, FYI.
>
> --RockyG
>