Re: [openstack-dev] [gnocchi] per-sack vs per-metric locking tradeoffs

2017-05-01 Thread gordon chung


On 29/04/17 08:14 AM, Julien Danjou wrote:
> 1. get 'deleted' metrics
> 2. delete all things in storage
>   -> if it fails, whatever, ignore, maybe a janitor is doing the same
>   thing?
> 3. expunge from indexer

possibly? i was thinking it might partially delete and then fail to delete 
the rest on a second go, but i guess i'll need to look at that and see if 
we can do it.

-- 
gord



Re: [openstack-dev] [gnocchi] per-sack vs per-metric locking tradeoffs

2017-04-29 Thread Julien Danjou
On Fri, Apr 28 2017, gordon chung wrote:

>>> refresh i believe is always disabled by default regardless of what
>>> interface you're using.
>>
>> You gotta show me where it is 'cause I can't see that and I don't
>> recall any option for that. :/
>
> https://github.com/openstack/gnocchi/commit/72a2091727431555eba65c6ef8ff89448f3432f0
>  
>
>
> although now that i check, i see it's blocking by default... which means 
> we did guarantee all measures would be returned.

I guess we misunderstood each other. What I meant originally is that we
did not have a flag to "disable its usage completely", because from an
operator's point of view refresh could be an easy way to provoke a DoS.

> hmmm. true. i'm still hoping we don't have to lock an entire sack for 
> one metric, and that we don't return an error status just because we 
> can't get the lock. doesn't seem like a good experience.

If you wait long enough, you can always return. :)

-- 
Julien Danjou
/* Free Software hacker
   https://julien.danjou.info */




Re: [openstack-dev] [gnocchi] per-sack vs per-metric locking tradeoffs

2017-04-29 Thread Julien Danjou
On Fri, Apr 28 2017, gordon chung wrote:

>>> if the sack is unlocked, then it means a processing worker isn't looking
>>> at the sack, and when it does lock the sack, it first has to check the
>>> sack for existing measures to process and then check the indexer to
>>> validate that they are still active. because it checks the indexer later,
>>> even if both the janitor and the processing worker check the lock at the
>>> same time, we can guarantee the indexer state the processing worker sees
>>> is > 00:00:00, since the janitor has state from before getting the lock,
>>> while the processing worker has state from sometime after getting the lock.
>>
>> My brain hurts but that sounds perfect. That even means we potentially
>> did not have to lock currently, sack or no sack.
>>
>
> oh darn, i didn't consider multiple janitors... so this only works if we 
> make janitor completely separate and only allow one janitor ever. back 
> to square 1

Not sure it's a big deal if you have multiple janitors. Do you have a
scenario in mind where we'd have a problem?
Since it's mainly:
1. get 'deleted' metrics
2. delete all things in storage
  -> if it fails, whatever, ignore, maybe a janitor is doing the same
  thing?
3. expunge from indexer

Am I missing something, hm?
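
Roughly, a sketch of that flow, with hypothetical helper names
(list_deleted_metrics, expunge_metric, etc.) rather than the real
Gnocchi API:

    def janitor_pass(indexer, incoming, storage):
        # 1. get the metrics the indexer has marked as 'deleted'
        for metric in indexer.list_deleted_metrics():
            try:
                # 2. best-effort delete of everything in storage; if another
                #    janitor is doing the same thing, just ignore the failure
                incoming.delete_unprocessed_measures(metric)
                storage.delete_metric(metric)
            except Exception:
                continue  # retry on the next pass
            # 3. expunge from the indexer only once storage is clean
            indexer.expunge_metric(metric)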

-- 
Julien Danjou
;; Free Software hacker
;; https://julien.danjou.info




Re: [openstack-dev] [gnocchi] per-sack vs per-metric locking tradeoffs

2017-04-28 Thread gordon chung


On 28/04/17 10:50 AM, Julien Danjou wrote:
> On Fri, Apr 28 2017, gordon chung wrote:
>
>> if the sack is unlocked, then it means a processing worker isn't looking
>> at the sack, and when it does lock the sack, it first has to check the
>> sack for existing measures to process and then check the indexer to
>> validate that they are still active. because it checks the indexer later,
>> even if both the janitor and the processing worker check the lock at the
>> same time, we can guarantee the indexer state the processing worker sees
>> is > 00:00:00, since the janitor has state from before getting the lock,
>> while the processing worker has state from sometime after getting the lock.
>
> My brain hurts but that sounds perfect. That even means we potentially
> did not have to lock currently, sack or no sack.
>

oh darn, i didn't consider multiple janitors... so this only works if we 
make the janitor completely separate and only ever allow one janitor. back 
to square one.

-- 
gord



Re: [openstack-dev] [gnocchi] per-sack vs per-metric locking tradeoffs

2017-04-28 Thread gordon chung


On 28/04/17 10:50 AM, Julien Danjou wrote:
> On Fri, Apr 28 2017, gordon chung wrote:
>
>> if the sack is unlocked, then it means a processing worker isn't looking
>> at the sack, and when it does lock the sack, it first has to check the
>> sack for existing measures to process and then check the indexer to
>> validate that they are still active. because it checks the indexer later,
>> even if both the janitor and the processing worker check the lock at the
>> same time, we can guarantee the indexer state the processing worker sees
>> is > 00:00:00, since the janitor has state from before getting the lock,
>> while the processing worker has state from sometime after getting the lock.
>
> My brain hurts but that sounds perfect. That even means we potentially
> did not have to lock currently, sack or no sack.
>

yeah, i think you're right.

success! confused you with my jumble of random words.

-- 
gord



Re: [openstack-dev] [gnocchi] per-sack vs per-metric locking tradeoffs

2017-04-28 Thread gordon chung


On 28/04/17 10:11 AM, Julien Danjou wrote:
> On Fri, Apr 28 2017, gordon chung wrote:
>
>> refresh i believe is always disabled by default regardless of what
>> interface you're using.
>
> You gotta show me where it is 'cause I can't see that and I don't
> recall any option for that. :/

https://github.com/openstack/gnocchi/commit/72a2091727431555eba65c6ef8ff89448f3432f0
 


although now that i check, i see it's blocking by default... which means 
we did guarantee all measures would be returned.

>
>> in the case of cross-metric aggregations, this is a timeout for entire
>> request or per metric timeout? i think it's going to get quite chaotic
>> in the multiple metric (multiple sacks) refresh. :(
>
> Right I did not think about multiple refresh. Well it'll be a nice
> slalom of lock acquiring. :-)
>
>> i'm hoping not to have a timeout because i imagine there will be
>> scenarios where we block trying to lock the sack, and when we finally get
>> the sack lock, we find there is no work for us. this means we just added
>> x seconds to the response for no reason.
>
> Right, I see your point. Though we _could_ enhance refresh to first
> check if there's any job to do. It's lock-free. Just checking. :)
>

hmmm. true. i'm still hoping we don't have to lock an entire sack for 
one metric, and that we don't return an error status just because we can't 
get the lock. doesn't seem like a good experience.

-- 
gord



Re: [openstack-dev] [gnocchi] per-sack vs per-metric locking tradeoffs

2017-04-28 Thread Julien Danjou
On Fri, Apr 28 2017, gordon chung wrote:

> if the sack is unlocked, then it means a processing worker isn't looking 
> at the sack, and when it does lock the sack, it first has to check the 
> sack for existing measures to process and then check the indexer to 
> validate that they are still active. because it checks the indexer later, 
> even if both the janitor and the processing worker check the lock at the 
> same time, we can guarantee the indexer state the processing worker sees 
> is > 00:00:00, since the janitor has state from before getting the lock, 
> while the processing worker has state from sometime after getting the lock.

My brain hurts but that sounds perfect. That even means we potentially
do not have to lock at all currently, sack or no sack.

-- 
Julien Danjou
// Free Software hacker
// https://julien.danjou.info




Re: [openstack-dev] [gnocchi] per-sack vs per-metric locking tradeoffs

2017-04-28 Thread Julien Danjou
On Fri, Apr 28 2017, gordon chung wrote:

> refresh i believe is always disabled by default regardless of what 
> interface you're using.

You gotta show me where it is 'cause I can't see that and I don't
recall any option for that. :/

> in the case of cross-metric aggregations, this is a timeout for entire 
> request or per metric timeout? i think it's going to get quite chaotic 
> in the multiple metric (multiple sacks) refresh. :(

Right, I did not think about multiple refreshes. Well, it'll be a nice
slalom of lock acquisition. :-)

> i'm hoping not to have a timeout because i imagine there will be 
> scenarios where we block trying to lock the sack, and when we finally get 
> the sack lock, we find there is no work for us. this means we just added 
> x seconds to the response for no reason.

Right, I see your point. Though we _could_ enhance refresh to first
check if there's any job to do. It's lock-free. Just checking. :)
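
Something like this, where has_unprocessed() and process_measures_for()
are hypothetical stand-ins, not the actual incoming/storage calls:

    def refresh_metric(incoming, sack_lock, metric, process_measures_for):
        # lock-free peek: if there is nothing to aggregate, skip the sack
        # lock entirely and serve whatever is already computed
        if not incoming.has_unprocessed(metric):
            return
        # there is actual work, so contend for the per-sack lock
        with sack_lock:
            process_measures_for(metric)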

-- 
Julien Danjou
# Free Software hacker
# https://julien.danjou.info




Re: [openstack-dev] [gnocchi] per-sack vs per-metric locking tradeoffs

2017-04-28 Thread gordon chung


On 28/04/17 09:23 AM, Julien Danjou wrote:
> On Fri, Apr 28 2017, gordon chung wrote:
>
>> what if we just never lock on delete, but still check the lock to see if
>> the sack is being processed? at that point, the janitor knows that the
>> metric(s) are deleted, and since no one else is working on the sack, any
>> metricd that follows will also know that the metric(s) are deleted and
>> therefore won't work on them.
>
> I did not understand your whole idea, can you detail a bit more?
> Though the "check lock" approach usually does not work and is a source
> of race condition, but again, I did not get the whole picture. :)
>

my thinking is that, when the janitor goes to process a sack, it means 
it has indexer state from time 00:00:00.

if the sack is locked, then it means a processing worker is looking at the 
sack and has indexer state from time <= 00:00:00.

if the sack is unlocked, then it means a processing worker isn't looking 
at the sack, and when it does lock the sack, it first has to check the sack 
for existing measures to process and then check the indexer to validate 
that they are still active. because it checks the indexer later, even if 
both the janitor and the processing worker check the lock at the same time, 
we can guarantee the indexer state the processing worker sees is > 00:00:00, 
since the janitor has state from before getting the lock, while the 
processing worker has state from sometime after getting the lock.
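
in pseudo-code it's just an ordering argument: the worker re-reads the 
indexer only after it holds the sack lock, so its view is at least as new 
as the janitor's (helper names below are hypothetical, not the actual 
gnocchi code):

    def process_sack(sack, sack_lock, incoming, indexer, storage):
        with sack_lock:  # acquired at some time strictly after 00:00:00
            for metric_id in incoming.list_metrics_with_measures(sack):
                # the indexer is consulted only after the lock is held, so
                # anything the janitor marked deleted before we locked is
                # filtered out here (assuming a status flag on the metric)
                metric = indexer.get_metric(metric_id)
                if metric is None or metric.status == 'delete':
                    continue  # no longer active: leave it for the janitor
                storage.process_new_measures(metric)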

-- 
gord



Re: [openstack-dev] [gnocchi] per-sack vs per-metric locking tradeoffs

2017-04-28 Thread gordon chung


On 28/04/17 09:19 AM, Julien Danjou wrote:
> That's not what I meant. You can have the same mechanism as currently,
> but then you compute the sacks of all metrics and you
> itertools.groupby() per sack on them before locking the sack and
> expunging them.

yeah, we should do that. i'll add that as a patch.

>
>> > refresh is currently disabled by default so i think we're ok.
> Well you mean it's disabled by default _in the CLI_, not in the API
> right?

refresh i believe is always disabled by default regardless of what 
interface you're using.

>
>> > what's the timeout for? timeout api's attempt to aggregate metric? i
>> > think it's a bad experience if we add any timeout since i assume it will
>> > still return what it can return and then the results become somewhat
>> > ambiguous.
> No, I meant timeout for grabbing the sack's lock. You wouldn't return a
> 2xx but a 5xx stating the API is unable to compute stuff right now, so
> try again without refresh or something.

in the case of cross-metric aggregations, is this a timeout for the entire 
request or a per-metric timeout? i think it's going to get quite chaotic 
in the multiple-metric (multiple-sack) refresh. :(

i'm hoping not to have a timeout because i imagine there will be 
scenarios where we block trying to lock the sack, and when we finally get 
the sack lock, we find there is no work for us. this means we just added x 
seconds to the response for no reason.

-- 
gord



Re: [openstack-dev] [gnocchi] per-sack vs per-metric locking tradeoffs

2017-04-28 Thread Julien Danjou
On Fri, Apr 28 2017, gordon chung wrote:

> what if we just never lock on delete, but still check the lock to see if 
> the sack is being processed? at that point, the janitor knows that the 
> metric(s) are deleted, and since no one else is working on the sack, any 
> metricd that follows will also know that the metric(s) are deleted and 
> therefore won't work on them.

I did not understand your whole idea, can you detail a bit more?
The "check lock" approach usually does not work and is a source of race
conditions, but again, I did not get the whole picture. :)

-- 
Julien Danjou
-- Free Software hacker
-- https://julien.danjou.info




Re: [openstack-dev] [gnocchi] per-sack vs per-metric locking tradeoffs

2017-04-28 Thread Julien Danjou
On Fri, Apr 28 2017, gordon chung wrote:

> so the tradeoff here is that now we're doing a lot more calls to the 
> indexer. additionally, we're pulling a lot more unused results from the db.
> a single janitor currently just grabs all deleted metrics and starts 
> attempting to clean them up one at a time. if we merge, we will have n 
> calls to the indexer, where n is the number of workers, each pulling in all 
> the deleted metrics and then checking to see whether each metric is in its 
> sack, and if not, moving on. that's a lot of extra, wasted work. we could 
> reduce that work by adding sack information to the indexer ;) but that will 
> still add significantly more calls to the indexer (which we could reduce by 
> not triggering cleanup every job interval)

That's not what I meant. You can have the same mechanism as currently,
but then you compute the sacks of all the metrics and itertools.groupby()
them per sack before locking each sack and expunging them, as in the
sketch below.
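
A rough sketch of that grouping (sack_of(), lock_sack() and the delete
helpers are stand-ins for the real internals, and the sack computation
here is only an assumption):

    import itertools

    NUM_SACKS = 128  # assumption for the example

    def sack_of(metric):
        # hypothetical: derive a sack number from the metric id
        return metric.id.int % NUM_SACKS

    def expunge_deleted_metrics(indexer, incoming, storage):
        # one indexer call, as today, then group the result per sack;
        # groupby() needs its input sorted by the grouping key
        deleted = sorted(indexer.list_deleted_metrics(), key=sack_of)
        for sack, metrics in itertools.groupby(deleted, key=sack_of):
            with incoming.lock_sack(sack):  # one lock+expunge pass per sack
                for metric in metrics:
                    storage.delete_metric(metric)
                    indexer.expunge_metric(metric)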

> refresh is currently disabled by default so i think we're ok.

Well, you mean it's disabled by default _in the CLI_, not in the API,
right?

> what's the timeout for? a timeout on the api's attempt to aggregate the 
> metric? i think it's a bad experience if we add any timeout, since i 
> assume it will still return what it can and then the results become 
> somewhat ambiguous.

No, I meant a timeout for grabbing the sack's lock. You wouldn't return a
2xx but a 5xx stating the API is unable to compute stuff right now, so the
client can try again without refresh or something.
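
As a sketch, with the lock modelled on a tooz-style acquire() taking a
timeout, and ServiceUnavailable standing in for however the API layer
maps an error to HTTP 503 (names are illustrative, not Gnocchi's real
code):

    SACK_LOCK_TIMEOUT = 10.0  # seconds; would be operator-configurable

    class ServiceUnavailable(Exception):
        """Stand-in for the API layer's HTTP 503 response."""

    def refresh_or_fail(sack_lock, process_unaggregated_measures):
        # bound how long a ?refresh=true request may wait for the sack lock
        if not sack_lock.acquire(blocking=SACK_LOCK_TIMEOUT):
            raise ServiceUnavailable(
                "cannot refresh: sack is busy, retry without refresh")
        try:
            process_unaggregated_measures()
        finally:
            sack_lock.release()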

-- 
Julien Danjou
// Free Software hacker
// https://julien.danjou.info




Re: [openstack-dev] [gnocchi] per-sack vs per-metric locking tradeoffs

2017-04-28 Thread gordon chung


On 28/04/17 03:48 AM, Julien Danjou wrote:
> Yes, I wrote that in a review somewhere. We need to rework 1. so
> deletion happens at the same time we lock the sack to process metrics
> basically. We might want to merge the janitor into the worker I imagine.
> Currently a janitor can grab metrics and do dumb things like:
> - metric1 from sackA
> - metric2 from sackB
> - metric3 from sackA
>
> and do 3 different lock+delete -_-

what if we just never lock on delete, but still check the lock to see if 
the sack is being processed? at that point, the janitor knows that the 
metric(s) are deleted, and since no one else is working on the sack, any 
metricd that follows will also know that the metric(s) are deleted and 
therefore won't work on them.

-- 
gord



Re: [openstack-dev] [gnocchi] per-sack vs per-metric locking tradeoffs

2017-04-28 Thread gordon chung


On 28/04/17 03:48 AM, Julien Danjou wrote:
>
> Yes, I wrote that in a review somewhere. We need to rework 1. so
> deletion happens at the same time we lock the sack to process metrics
> basically. We might want to merge the janitor into the worker I imagine.
> Currently a janitor can grab metrics and do dumb things like:
> - metric1 from sackA
> - metric2 from sackB
> - metric3 from sackA
>
> and do 3 different lock+delete -_-

so the tradeoff here is that now we're doing a lot more calls to the 
indexer. additionally, we're pulling a lot more unused results from the db.
a single janitor currently just grabs all deleted metrics and starts 
attempting to clean them up one at a time. if we merge, we will have n 
calls to the indexer, where n is the number of workers, each pulling in all 
the deleted metrics and then checking to see whether each metric is in its 
sack, and if not, moving on. that's a lot of extra, wasted work. we could 
reduce that work by adding sack information to the indexer ;) but that will 
still add significantly more calls to the indexer (which we could reduce by 
not triggering cleanup every job interval)
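
to make the wasted work concrete, each merged worker would end up doing 
something like this (sack_of() is a hypothetical stand-in; the real sack 
computation lives in the incoming driver):

    NUM_SACKS = 128  # assumption for the example

    def sack_of(metric):
        # hypothetical: derive a sack number from the metric id
        return metric.id.int % NUM_SACKS

    def deleted_metrics_for_worker(indexer, my_sacks):
        # every one of the n workers pulls the *full* deleted-metric list...
        deleted = indexer.list_deleted_metrics()
        # ...and then throws most of it away, keeping only its own sacks
        return [m for m in deleted if sack_of(m) in my_sacks]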


>>
>> alternatively, this could be solved by keeping per-metric locks in
>> addition to per-sack locks. this would effectively double the number of
>> active locks we have so instead of each metricd worker having a single
>> per-sack lock, it will also have a per-metric lock for whatever metric
>> it may be publishing at the time.
>
> If we got a timeout set for scenario 3, I'm not that worried. I guess
> worst thing is that people would be unhappy with the API spending time
> doing computation anyway so we'd need to rework how refresh work or add
> an ability to disable it.
>

refresh is currently disabled by default so i think we're ok.

what's the timeout for? a timeout on the api's attempt to aggregate the 
metric? i think it's a bad experience if we add any timeout, since i assume 
it will still return what it can and then the results become somewhat 
ambiguous.

now that i think about it more, this issue still exists in the per-metric 
scenario (but to a lesser extent). 'refresh' can still be blocked by 
metricd, but there's a significantly smaller chance of it and the window 
for missed unprocessed measures is smaller.

-- 
gord



Re: [openstack-dev] [gnocchi] per-sack vs per-metric locking tradeoffs

2017-04-28 Thread Julien Danjou
On Thu, Apr 27 2017, gordon chung wrote:

> so as we transition to the bucket/shard/sack framework for incoming 
> writes, we've set up locks on the sacks so we only have one process 
> handling any given sack. this allows us to remove the per-metric locking 
> we had previously.

yay!

> the issue i've noticed now is that if we only have per-sack locking, 
> metric-based actions can affect sack-level processing. for example:
>
> scenario 1:
> 1. delete metric, locks sack to delete single metric,
> 2. metricd processor attempts to process entire sack but can't, so skips.

Yes, I wrote that in a review somewhere. We need to rework 1. so that
deletion happens at the same time we lock the sack to process metrics,
basically. We might want to merge the janitor into the worker, I imagine.
Currently a janitor can grab metrics and do dumb things like:
- metric1 from sackA
- metric2 from sackB
- metric3 from sackA

and do 3 different lock+delete -_-

> scenario 2:
> 1. API request passes in 'refresh' param so they want all unaggregated 
> metrics to be processed on demand and returned.
> 2. API locks 1 or more sacks to process 1 or more metrics
> 3. metricd processor attempts to process entire sack but can't, so 
> skips. potentially multiple sacks unprocessed in the current cycle.
>
> scenario 3
> same as scenario 2 but metricd processor locks first, and either blocks
> API process OR  doesn't allow API to guarantee 'all measures processed'.

Yes, I'm even more worried about scenario 3; we should probably add a
safeguard timeout parameter set by the admin there.

> i imagine these scenarios are not critical unless a very large 
> processing interval is defined or if for some unfortunate reason, the 
> metric-based actions are perfectly timed to lock out background processing.
>
> alternatively, this could be solved by keeping per-metric locks in 
> addition to per-sack locks. this would effectively double the number of 
> active locks we have so instead of each metricd worker having a single 
> per-sack lock, it will also have a per-metric lock for whatever metric 
> it may be publishing at the time.

If we get a timeout set for scenario 3, I'm not that worried. I guess the
worst thing is that people would be unhappy with the API spending time
doing computation anyway, so we'd need to rework how refresh works or add
an ability to disable it.

-- 
Julien Danjou
// Free Software hacker
// https://julien.danjou.info




[openstack-dev] [gnocchi] per-sack vs per-metric locking tradeoffs

2017-04-27 Thread gordon chung
hey,

so as we transition to the bucket/shard/sack framework for incoming 
writes, we've set up locks on the sacks so we only have one process 
handling any given sack. this allows us to remove the per-metric locking 
we had previously.

the issue i've noticed now is that if we only have per-sack locking, 
metric-based actions can affect sack-level processing. for example:

scenario 1:
1. a metric delete locks the sack to delete a single metric,
2. the metricd processor attempts to process the entire sack but can't, so 
it skips.

OR

scenario 2:
1. an API request passes in the 'refresh' param, so the caller wants all 
unaggregated metrics to be processed on demand and returned.
2. the API locks 1 or more sacks to process 1 or more metrics.
3. the metricd processor attempts to process the entire sack but can't, so 
it skips. potentially multiple sacks go unprocessed in the current cycle.

scenario 3:
same as scenario 2, but the metricd processor locks first and either blocks 
the API process OR doesn't allow the API to guarantee 'all measures 
processed'.

i imagine these scenarios are not critical unless a very large 
processing interval is defined or if, for some unfortunate reason, the 
metric-based actions are perfectly timed to lock out background processing.

alternatively, this could be solved by keeping per-metric locks in 
addition to per-sack locks. this would effectively double the number of 
active locks we have: instead of each metricd worker holding just a single 
per-sack lock, it would also hold a per-metric lock for whatever metric 
it may be publishing at the time (see the sketch below).
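
roughly, something like this sketch (hypothetical lock helpers, not real 
gnocchi code; the real locks would be distributed, e.g. via tooz):

    def process_sack(sack_lock, metrics, get_metric_lock, process):
        # metricd keeps the coarse per-sack lock for the whole batch...
        with sack_lock:
            for metric in metrics:
                # ...and additionally takes a short-lived per-metric lock,
                # so a delete or an api 'refresh' can contend on just that
                # metric instead of the whole sack
                with get_metric_lock(metric.id):
                    process(metric)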


cheers,
-- 
gord