Re: [openstack-dev] [gnocchi] typical length of timeseries data

2016-08-02 Thread Julien Danjou
On Tue, Aug 02 2016, gordon chung wrote:

> so from very rough testing, we can choose to lower it to 3600 points, which
> offers better split opportunities with negligible improvement/degradation, or
> even further to 900 points with a potentially small write degradation (under
> massive batching).

3600 points sounds nice. :)

-- 
Julien Danjou
-- Free Software hacker
-- https://julien.danjou.info




Re: [openstack-dev] [gnocchi] typical length of timeseries data

2016-08-02 Thread gordon chung


On 29/07/16 03:29 PM, gordon chung wrote:

i'm using Ceph. but i should mention i also only have 1 thread enabled
because python+threading is... yeah.

i'll give it a try again with threads enabled.


I tried this again with 16 threads. as expected, python (2.7.x) threads made
essentially no difference.

i also tried lowering the points per object to 900 (~8KB max). this performed
~4% worse for reads/writes. i should add the disclaimer that i'm batching 75K
points/metric at once, which is probably not a typical workload.

so from very rough testing, we can choose to lower it to 3600 points, which
offers better split opportunities with negligible improvement/degradation, or
even further to 900 points with a potentially small write degradation (under
massive batching).
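
to put the batching caveat in numbers, here's a trivial sketch (plain
arithmetic, not gnocchi code; it only gives a lower bound, since the actual
count also depends on how the splits align with the series' timestamps) of how
many split objects a single contiguous 75K-point batch has to touch at each
candidate size:

import math

BATCH_POINTS = 75000  # points per metric written at once, per the test above

for points_per_split in (14400, 3600, 900):
    touched = math.ceil(BATCH_POINTS / points_per_split)
    print("%5d points/split -> at least %2d objects touched per batch"
          % (points_per_split, touched))
# 14400 -> 6, 3600 -> 21, 900 -> 84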


--
gord


Re: [openstack-dev] [gnocchi] typical length of timeseries data

2016-07-29 Thread Rochelle Grober
Just an FYI that might be the reason for the 14400:

1440 is the number of minutes in a day, and 14400 is the number of 6-second
chunks (i.e. tenths of a minute) in a day: 86400 s / 6 s = 14400.

So the number was probably picked to divide files into human-logical, not
computer-logical, chunks.

--Rocky

-Original Message-
From: gordon chung [mailto:g...@live.ca] 
Sent: Thursday, July 28, 2016 3:05 PM
To: openstack-dev@lists.openstack.org
Subject: [openstack-dev] [gnocchi] typical length of timeseries data

hi folks,

this is probably something to discuss on the ops list as well eventually, but
what do you think about shrinking the max size of timeseries chunks from
14400 to something smaller? i'm curious to understand what the length of a
typical timeseries is. my main reason for bringing this up is that even our
default 'high' policy doesn't reach the 14400 limit, so it will at most split
into two partially filled objects. as we look to make a more efficient storage
format for v3(?), this seems like an opportunity to change the size as well
(if necessary).

14400 points roughly equals a 128KB object, which is cool, but maybe we
should target something smaller? 7200 points aka 64KB? 3600 points aka 32KB?
just for reference, our biggest default series is 10080 points (1 min
granularity over a week).

that said, 128KB (at most) might not be that bad from a read/write pov, and
maybe it's ok to keep it at 14400? i know from the test i did earlier that
the time required to read/write increases linearly (a 7200 point object takes
roughly half the time of a 14400 point object)[1]. i think the main thing is
we don't want it so small that we're updating multiple objects at a time.

[1] http://www.slideshare.net/GordonChung/gnocchi-profiling-v2/25

cheers,

-- 
gord


Re: [openstack-dev] [gnocchi] typical length of timeseries data

2016-07-29 Thread gordon chung


On 29/07/2016 12:20 PM, Julien Danjou wrote:
> On Fri, Jul 29 2016, gordon chung wrote:
>
>> so at first glance, it doesn't really seem to affect performance much
>> whether it's one 'larger' file or many smaller files.
>
> I guess it's because your storage system latency (file?) does not make a
> difference. I imagine that over Swift or Ceph, it might change things a
> bit.
>
> If you add time.sleep(1) in _get_measures(), you'd see a difference. ;)
>

i'm using Ceph. but i should mention i also only have 1 thread enabled 
because python+threading is... yeah.

i'll give it a try again with threads enabled.

cheers,
-- 
gord



Re: [openstack-dev] [gnocchi] typical length of timeseries data

2016-07-29 Thread Julien Danjou
On Fri, Jul 29 2016, gordon chung wrote:

> so at first glance, it doesn't really seem to affect performance much 
> whether it's one 'larger' file or many smaller files.

I guess it's because your storage system latency (file?) does not make a
difference. I imagine that over Swift or Ceph, it might change things a
bit.

If you add time.sleep(1) in _get_measures(), you'd see a difference. ;)
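
To make that concrete, here is a toy simulation (not gnocchi's actual
_get_measures(); the 0.05 s latency and 8 workers are made-up numbers) of
reading N split objects serially vs. with a thread pool:

import time
from concurrent import futures

PER_OBJECT_LATENCY = 0.05  # hypothetical seconds per backend GET

def get_measures(split_key):
    # stand-in for a backend object fetch; only models latency
    time.sleep(PER_OBJECT_LATENCY)
    return split_key

def read_serial(n_splits):
    start = time.perf_counter()
    for key in range(n_splits):
        get_measures(key)
    return time.perf_counter() - start

def read_parallel(n_splits, workers=8):
    start = time.perf_counter()
    with futures.ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(get_measures, range(n_splits)))
    return time.perf_counter() - start

# e.g. 6 splits (14400 points) vs 21 splits (3600 points) for the same series
for n in (6, 21):
    print("%2d splits: serial %.2fs, parallel %.2fs"
          % (n, read_serial(n), read_parallel(n)))

Because the per-object cost here is pure latency (the sleep releases the GIL),
threads overlap it nicely; if the per-object cost were CPU-bound
(de)serialisation instead, CPython threads wouldn't help, which may be part of
what the 16-thread test elsewhere in the thread ran into.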

-- 
Julien Danjou
;; Free Software hacker
;; https://julien.danjou.info




Re: [openstack-dev] [gnocchi] typical length of timeseries data

2016-07-29 Thread gordon chung


On 29/07/2016 5:00 AM, Julien Danjou wrote:
> The best way is probably to do some benchmarking… but I think it really
> depends on the use cases here. The benefit of having many small splits is
> that you can parallelize the reads.
>
> Considering the compression ratio we have, I think we should split into
> smaller files. I'd pick 3600 and give it a try.

i gave this a quick try with a series of ~68k points

with object size of 14400 points (uncompressed), i got:

[gchung@gchung-dev ~(keystone_admin)]$ time gnocchi measures show 
dc51c402-67e6-4b28-aba0-9d46b35b5397 --granularity 60 &> /tmp/blah

real    0m6.398s
user    0m5.003s
sys     0m0.071s

it took ~39.45s to process into 24 different aggregated series and 
created 6 split objects.

with object size of 3600 points (uncompressed), i got:

[gchung@gchung-dev ~(keystone_admin)]$ time gnocchi measures show 
301947fd-97ee-428a-b445-41a67ee62c38 --granularity 60 &> /tmp/blah

real    0m6.495s
user    0m4.970s
sys     0m0.073s

it took ~39.89s to process into 24 different aggregated series and
created 21 split objects.

so at first glance, it doesn't really seem to affect performance much
whether it's one 'larger' file or many smaller files. that said, with the
proposed v3 serialisation format, a larger file requires more padding,
which is not a good thing.
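
(if anyone wants to repeat this and smooth out run-to-run noise, here's a
small python 3 sketch; the metric id is just the one from the run above and
would need to be substituted:)

import subprocess
import time

METRIC = "301947fd-97ee-428a-b445-41a67ee62c38"  # substitute your own metric id
CMD = ["gnocchi", "measures", "show", METRIC, "--granularity", "60"]

timings = []
for _ in range(5):
    start = time.time()
    subprocess.check_call(CMD, stdout=subprocess.DEVNULL)
    timings.append(time.time() - start)

print("min %.2fs, mean %.2fs" % (min(timings), sum(timings) / len(timings)))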

cheers,

-- 
gord


Re: [openstack-dev] [gnocchi] typical length of timeseries data

2016-07-29 Thread Julien Danjou
On Thu, Jul 28 2016, gordon chung wrote:

> this is probably something to discuss on the ops list as well eventually, but
> what do you think about shrinking the max size of timeseries chunks from
> 14400 to something smaller? i'm curious to understand what the length of a
> typical timeseries is. my main reason for bringing this up is that even our
> default 'high' policy doesn't reach the 14400 limit, so it will at most split
> into two partially filled objects. as we look to make a more efficient storage
> format for v3(?), this seems like an opportunity to change the size as well
> (if necessary).

1 minute granularity over a year: 525600 points, which is 37 splits at 14400
points each (525600 / 14400 = 36.5). Even in that case, which is pretty
precise, that's not a lot of splits, I'd say.

> 14400 points roughly equals a 128KB object, which is cool, but maybe we
> should target something smaller? 7200 points aka 64KB? 3600 points aka 32KB?
> just for reference, our biggest default series is 10080 points (1 min
> granularity over a week).

It's 128 KB if you don't compress, but if you do, it's usually way less.

> that said, 128KB (at most) might not be that bad from a read/write pov, and
> maybe it's ok to keep it at 14400? i know from the test i did earlier that
> the time required to read/write increases linearly (a 7200 point object takes
> roughly half the time of a 14400 point object)[1]. i think the main thing is
> we don't want it so small that we're updating multiple objects at a time.

The best way is probably to do some benchmarking… but I think it really
depends on the use cases here. The benefit of having many small splits is
that you can parallelize the reads.

Considering the compression ratio we have, I think we should split into
smaller files. I'd pick 3600 and give it a try.

-- 
Julien Danjou
// Free Software hacker
// https://julien.danjou.info




[openstack-dev] [gnocchi] typical length of timeseries data

2016-07-28 Thread gordon chung
hi folks,

this is probably something to discuss on the ops list as well eventually, but
what do you think about shrinking the max size of timeseries chunks from
14400 to something smaller? i'm curious to understand what the length of a
typical timeseries is. my main reason for bringing this up is that even our
default 'high' policy doesn't reach the 14400 limit, so it will at most split
into two partially filled objects. as we look to make a more efficient storage
format for v3(?), this seems like an opportunity to change the size as well
(if necessary).

14400 points roughly equals a 128KB object, which is cool, but maybe we
should target something smaller? 7200 points aka 64KB? 3600 points aka 32KB?
just for reference, our biggest default series is 10080 points (1 min
granularity over a week).
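
as a rough sanity check of those figures (plain arithmetic only: it assumes
the ~9 bytes per uncompressed point implied by 14400 points ~= 128KB, and
ignores compression and serialisation overhead):

import math

bytes_per_point = 128 * 1024 / 14400.0   # ~9.1, implied by 14400 points ~= 128KB
biggest_default_series = 10080           # 1 min granularity over a week

for split_size in (14400, 7200, 3600):
    size_kb = split_size * bytes_per_point / 1024
    splits = math.ceil(biggest_default_series / split_size)
    print("%5d points -> ~%3d KB per object, %d split(s) for the 10080-point series"
          % (split_size, size_kb, splits))
# 14400 -> ~128 KB, 1 split; 7200 -> ~64 KB, 2 splits; 3600 -> ~32 KB, 3 splits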

that said, 128KB (at most) might not be that bad from a read/write pov, and
maybe it's ok to keep it at 14400? i know from the test i did earlier that
the time required to read/write increases linearly (a 7200 point object takes
roughly half the time of a 14400 point object)[1]. i think the main thing is
we don't want it so small that we're updating multiple objects at a time.

[1] http://www.slideshare.net/GordonChung/gnocchi-profiling-v2/25

cheers,

-- 
gord