Re: [openstack-dev] [gnocchi] typical length of timeseries data
On Tue, Aug 02 2016, gordon chung wrote:

> so from very rough testing, we can choose to lower it to 3600 points, which
> offers better split opportunities with negligible improvement/degradation,
> or even further to 900 points with potentially small write degradation
> (massive batching).

3600 points sounds nice. :)

--
Julien Danjou
-- Free Software hacker
-- https://julien.danjou.info
Re: [openstack-dev] [gnocchi] typical length of timeseries data
On 29/07/16 03:29 PM, gordon chung wrote:

> i'm using Ceph. but i should mention i also only have 1 thread enabled
> because python+threading is... yeah. i'll give it a try again with
> threads enabled.

I tried this again with 16 threads. as expected, python (2.7.x) threads do
jack all. i also tried lowering the points per object to 900 (~8KB max);
this performed ~4% worse for reads/writes. i should probably add a
disclaimer that i'm batching 75K points/metric at once, which is probably
not normal.

so from very rough testing, we can choose to lower it to 3600 points, which
offers better split opportunities with negligible improvement/degradation,
or even further to 900 points with potentially small write degradation
(massive batching).

--
gord
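The "threads do jack all" result is what CPython's GIL predicts for
CPU-bound work like aggregation and serialization: only one thread executes
bytecode at a time, so extra threads add no parallelism. A minimal sketch
illustrating this (my own micro-benchmark, not gnocchi code):

    import threading
    import time

    def cpu_work(n=2000000):
        # stand-in for CPU-bound aggregation/serialization work
        total = 0
        for i in range(n):
            total += i * i
        return total

    def run_threads(num_threads):
        threads = [threading.Thread(target=cpu_work)
                   for _ in range(num_threads)]
        start = time.time()
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return time.time() - start

    if __name__ == "__main__":
        # total work scales with thread count, but so does wall time:
        # the GIL keeps the threads from actually running concurrently
        for n in (1, 2, 4):
            print("%d thread(s): %.2fs" % (n, run_threads(n)))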
Re: [openstack-dev] [gnocchi] typical length of timeseries data
Just an FYI that might be the reason for the 14400: 1440 is the number of
minutes in a day, so 14400 would be tenths of minutes in a day, i.e. the
number of 6-second chunks in a day (86400 / 6 = 14400). So the number was
picked to divide files into human-logical, not computer-logical, chunks.

--Rocky

-----Original Message-----
From: gordon chung [mailto:g...@live.ca]
Sent: Thursday, July 28, 2016 3:05 PM
To: openstack-dev@lists.openstack.org
Subject: [openstack-dev] [gnocchi] typical length of timeseries data

hi folks,

this is probably something to discuss on ops list as well eventually but
what do you think about shrinking the max size of timeseries chunks from
14400 to something smaller? i'm curious to understand what the length of
the typical timeseries is. my main reason for bringing this up is that
even our default 'high' policy doesn't reach the 14400 limit, so at most
it will only split into two partially filled objects. as we look to make a
more efficient storage format for v3(?), this seems like an opportunity to
change the size as well (if necessary).

14400 points roughly equals a 128KB object, which is cool, but maybe we
should target something smaller? 7200 points aka 64KB? 3600 points aka
32KB? just for reference, our biggest default series is 10080 points
(1min granularity over a week).

that said, 128KB (at most) might not be that bad from a read/write pov and
maybe it's ok to keep it at 14400? i know from the test i did earlier, the
time required to read/write increases linearly (a 7200-point object takes
roughly half the time of a 14400-point object)[1]. i think the main
concern is we don't want it so small that we're updating multiple objects
at a time.

[1] http://www.slideshare.net/GordonChung/gnocchi-profiling-v2/25

cheers,

--
gord
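A quick sanity check of that day-division arithmetic (my own, in Python):

    # 1440 and 14400 both fall out of dividing a day into "human" units
    minutes_per_day = 24 * 60                        # 1440
    tenths_of_minute_per_day = minutes_per_day * 10  # 14400
    six_second_chunks_per_day = 24 * 60 * 60 // 6    # 86400 / 6 = 14400

    assert tenths_of_minute_per_day == 14400
    assert six_second_chunks_per_day == 14400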
Re: [openstack-dev] [gnocchi] typical length of timeseries data
On 29/07/2016 12:20 PM, Julien Danjou wrote:

> On Fri, Jul 29 2016, gordon chung wrote:
>
>> so at first glance, it doesn't really seem to affect performance much
>> whether it's one 'larger' file or many smaller files.
>
> I guess it's because your storage system latency (file?) does not make a
> difference. I imagine that over Swift or Ceph, it might change things a
> bit.
>
> If you add time.sleep(1) in _get_measures(), you'd see a difference. ;)

i'm using Ceph. but i should mention i also only have 1 thread enabled
because python+threading is... yeah. i'll give it a try again with threads
enabled.

cheers,

--
gord
Re: [openstack-dev] [gnocchi] typical length of timeseries data
On Fri, Jul 29 2016, gordon chung wrote:

> so at first glance, it doesn't really seem to affect performance much
> whether it's one 'larger' file or many smaller files.

I guess it's because your storage system latency (file?) does not make a
difference. I imagine that over Swift or Ceph, it might change things a
bit.

If you add time.sleep(1) in _get_measures(), you'd see a difference. ;)

--
Julien Danjou
;; Free Software hacker
;; https://julien.danjou.info
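A minimal sketch of that point (my own illustration, not gnocchi's driver
code): once each split fetch carries real storage latency, sequential read
time grows with the number of splits, while parallel reads hide it. The
fetch_split helper and its sleep stand in for a hypothetical
_get_measures() round trip; unlike CPU-bound work, sleeping releases the
GIL, so threads genuinely help here.

    import time
    from concurrent.futures import ThreadPoolExecutor

    LATENCY = 0.1  # stand-in for one storage round trip (Julien used 1s)

    def fetch_split(key):
        time.sleep(LATENCY)  # I/O wait releases the GIL
        return key           # payload would be the split's serialized points

    def read_sequential(keys):
        start = time.time()
        for k in keys:
            fetch_split(k)
        return time.time() - start

    def read_parallel(keys, workers=8):
        start = time.time()
        with ThreadPoolExecutor(max_workers=workers) as pool:
            list(pool.map(fetch_split, keys))
        return time.time() - start

    if __name__ == "__main__":
        # 6 vs 21 splits roughly mirrors 14400- vs 3600-point objects
        # for the ~68k-point series benchmarked earlier in the thread
        for nsplits in (6, 21):
            keys = list(range(nsplits))
            print("%2d splits: sequential %.2fs, parallel %.2fs"
                  % (nsplits, read_sequential(keys), read_parallel(keys)))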
Re: [openstack-dev] [gnocchi] typical length of timeseries data
On 29/07/2016 5:00 AM, Julien Danjou wrote:

> Best way is probably to do some bench… but I think it really depends on
> the use cases here. The interest of having many small splits is that you
> can parallelize the read.
>
> Considering the compression ratio we have, I think we should split into
> smaller files. I'd pick 3600 and give it a try.

i gave this a quick try with a series of ~68k points.

with an object size of 14400 points (uncompressed), i got:

[gchung@gchung-dev ~(keystone_admin)]$ time gnocchi measures show dc51c402-67e6-4b28-aba0-9d46b35b5397 --granularity 60 &> /tmp/blah

real    0m6.398s
user    0m5.003s
sys     0m0.071s

it took ~39.45s to process into 24 different aggregated series and created
6 split objects.

with an object size of 3600 points (uncompressed), i got:

[gchung@gchung-dev ~(keystone_admin)]$ time gnocchi measures show 301947fd-97ee-428a-b445-41a67ee62c38 --granularity 60 &> /tmp/blah

real    0m6.495s
user    0m4.970s
sys     0m0.073s

it took ~39.89s to process into 24 different aggregated series and created
21 split objects.

so at first glance, it doesn't really seem to affect performance much
whether it's one 'larger' file or many smaller files. that said, with the
new proposed v3 serialisation format, a larger file has a greater
requirement for additional padding, which is not a good thing.

cheers,

--
gord
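As a rough cross-check of those split counts (my own arithmetic, assuming
a split is simply a ceiling division of points by split size; gnocchi's
actual time-boundary-aligned splitting can add an extra object or two,
which would account for the 6 and 21 observed above):

    import math

    def estimated_splits(num_points, points_per_split):
        # naive estimate: ignores time-boundary alignment of real splits
        return int(math.ceil(num_points / float(points_per_split)))

    for split_size in (14400, 3600, 900):
        print("%5d points/split -> ~%d splits for 68k points"
              % (split_size, estimated_splits(68000, split_size)))
    # ~5, ~19, ~76: same ballpark as the 6 and 21 reported above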
Re: [openstack-dev] [gnocchi] typical length of timeseries data
On Thu, Jul 28 2016, gordon chung wrote:

> this is probably something to discuss on ops list as well eventually but
> what do you think about shrinking the max size of timeseries chunks from
> 14400 to something smaller? i'm curious to understand what the length of
> the typical timeseries is. my main reason for bringing this up is that
> even our default 'high' policy doesn't reach the 14400 limit, so at most
> it will only split into two, partially filled objects. as we look to
> make a more efficient storage format for v3(?), this seems like an
> opportunity to change the size as well (if necessary).

1 minute granularity over a year: 525600 points, i.e. 37 splits
(525600 / 14400 = 36.5). Even in that case, which is pretty precise,
that's not a lot of splits I'd say.

> 14400 points roughly equals a 128KB object, which is cool, but maybe we
> should target something smaller? 7200 points aka 64KB? 3600 points aka
> 32KB? just for reference, our biggest default series is 10080 points
> (1min granularity over a week).

It's 128KB if you don't compress, but if you do, it's usually way less.

> that said, 128KB (at most) might not be that bad from a read/write pov
> and maybe it's ok to keep it at 14400? i know from the test i did
> earlier, the time required to read/write increases linearly (a
> 7200-point object takes roughly half the time of a 14400-point
> object)[1]. i think the main concern is we don't want it so small that
> we're updating multiple objects at a time.

Best way is probably to do some bench… but I think it really depends on
the use cases here. The interest of having many small splits is that you
can parallelize the read.

Considering the compression ratio we have, I think we should split into
smaller files. I'd pick 3600 and give it a try.

--
Julien Danjou
// Free Software hacker
// https://julien.danjou.info
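A minimal sketch of the compression point (my own illustration: stdlib
zlib stands in for the LZ4-style compression gnocchi applies, and the
generic 16-bytes-per-point packing here is an assumption, not gnocchi's
actual serialization): a regular timeseries packs timestamps and values
very repetitively, so the stored split ends up far below the raw estimate.

    import struct
    import zlib

    POINTS = 14400
    # 60s-spaced timestamps with a slowly cycling value, packed as
    # (uint64 timestamp, double value) pairs
    raw = b"".join(
        struct.pack("<Qd", 1469664000 + i * 60, 20.0 + (i % 100) * 0.5)
        for i in range(POINTS))
    compressed = zlib.compress(raw, 9)
    print("raw: %d KB, compressed: %d KB (%.1f%% of raw)"
          % (len(raw) // 1024, len(compressed) // 1024,
             100.0 * len(compressed) / len(raw)))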
[openstack-dev] [gnocchi] typical length of timeseries data
hi folks,

this is probably something to discuss on ops list as well eventually but
what do you think about shrinking the max size of timeseries chunks from
14400 to something smaller? i'm curious to understand what the length of
the typical timeseries is. my main reason for bringing this up is that
even our default 'high' policy doesn't reach the 14400 limit, so at most
it will only split into two, partially filled objects. as we look to make
a more efficient storage format for v3(?), this seems like an opportunity
to change the size as well (if necessary).

14400 points roughly equals a 128KB object, which is cool, but maybe we
should target something smaller? 7200 points aka 64KB? 3600 points aka
32KB? just for reference, our biggest default series is 10080 points
(1min granularity over a week).

that said, 128KB (at most) might not be that bad from a read/write pov and
maybe it's ok to keep it at 14400? i know from the test i did earlier, the
time required to read/write increases linearly (a 7200-point object takes
roughly half the time of a 14400-point object)[1]. i think the main
concern is we don't want it so small that we're updating multiple objects
at a time.

[1] http://www.slideshare.net/GordonChung/gnocchi-profiling-v2/25

cheers,

--
gord
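The size arithmetic behind those candidates, as a quick sketch (my own
estimate, derived only from the thread's own "14400 points roughly equals
128KB", i.e. about 9 bytes per point uncompressed):

    # implied bytes/point from "14400 points ~= 128KB" (~9.1)
    BYTES_PER_POINT = 128 * 1024 / 14400.0

    for points in (14400, 7200, 3600, 900):
        print("%5d points -> ~%3.0f KB uncompressed"
              % (points, points * BYTES_PER_POINT / 1024))
    # 14400 -> ~128 KB, 7200 -> ~64 KB, 3600 -> ~32 KB, 900 -> ~8 KB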