Re: [Gluster-devel] I/O performance
Here are the results of the last run: https://docs.google.com/spreadsheets/d/19JqvuFKZxKifgrhLF-5-bgemYj8XKldUox1QwsmGj2k/edit?usp=sharing

Each test has been run with a rough approximation of the best configuration I've found (in number of client and brick threads), but I haven't done an exhaustive search for the best configuration in each case.

The "fio rand write" test seems to have a big regression. An initial check of the data shows that 2 of the 5 runs took >50% more time. I'll try to find out why. Many of the tests also show very high disk utilization, so comparisons may not be accurate.

In any case, it's clear that we need a way to automatically adjust the number of worker threads to the current load to make this useful. Without that, it's virtually impossible to find a fixed number of threads that works well in all cases. I'm currently working on this.

Xavi

___ Gluster-devel mailing list Gluster-devel@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-devel
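The load-based auto-tuning of the worker count described above could look roughly like this minimal Python sketch. It is illustrative only, not Gluster's io-threads implementation; the `AutoScalingPool` name and its thresholds are hypothetical. The pool spawns extra workers when the request queue backs up and retires idle workers down to a floor.

```python
import threading
import queue

class AutoScalingPool:
    """Toy worker pool that grows when the request queue backs up
    and shrinks when workers sit idle (hypothetical sketch)."""

    def __init__(self, min_threads=2, max_threads=16, idle_timeout=0.1):
        self.tasks = queue.Queue()
        self.min_threads = min_threads
        self.max_threads = max_threads
        self.idle_timeout = idle_timeout
        self.lock = threading.Lock()
        self.nthreads = 0
        for _ in range(min_threads):
            self._spawn()

    def _spawn(self):
        with self.lock:
            if self.nthreads >= self.max_threads:
                return
            self.nthreads += 1
        threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self):
        while True:
            try:
                fn = self.tasks.get(timeout=self.idle_timeout)
            except queue.Empty:
                # Idle: retire this worker unless we're at the minimum.
                with self.lock:
                    if self.nthreads > self.min_threads:
                        self.nthreads -= 1
                        return
                continue
            fn()
            self.tasks.task_done()

    def submit(self, fn):
        self.tasks.put(fn)
        # Backlog building up -> add a worker, up to the cap.
        if self.tasks.qsize() > 1:
            self._spawn()

pool = AutoScalingPool(min_threads=2, max_threads=8)
results = []
done = threading.Event()
counter = threading.Lock()

def work(i):
    with counter:
        results.append(i)
        if len(results) == 100:
            done.set()

for i in range(100):
    pool.submit(lambda i=i: work(i))
done.wait(timeout=5)
print(len(results))  # 100 tasks completed
```

A real implementation would use a smoother signal than raw queue size (e.g. a moving average of wait time), but the grow/shrink mechanics are the same idea.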
Re: [Gluster-devel] I/O performance
On Tue, Feb 12, 2019 at 1:30 AM Vijay Bellur wrote:

> Are these performance results available somewhere? I am quite curious to
> understand the performance gains on NVMe!

I'm updating test results with the latest build. I'll report it here once it's complete.

Xavi
Re: [Gluster-devel] I/O performance
On Tue, Feb 5, 2019 at 10:57 PM Xavi Hernandez wrote:

> I run all these tests for a replica 3 volume and a disperse 4+2 volume.

Are these performance results available somewhere? I am quite curious to understand the performance gains on NVMe!

Thanks,
Vijay
Re: [Gluster-devel] I/O performance
Thank you for all the detailed explanation. If it's the disk saturating, then running some of the above-mentioned tests (with multiple threads) on plain XFS should also hit the saturation, right? I will try out some tests; this is interesting.

Thanks,
Poornima
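The plain-XFS saturation check Poornima suggests could be approximated with a small script like this (a rough sketch; a real test should use O_DIRECT via fio rather than buffered Python I/O, and the file sizes here are scaled down). It measures aggregate write throughput as parallelism grows; on a saturated disk the curve flattens or drops instead of scaling.

```python
import os
import shutil
import tempfile
import threading
import time

def writer(path, total_mb, block_kb=128):
    """Sequentially write total_mb MiB in block_kb KiB blocks and
    fsync at the end (buffered I/O; real tests should use O_DIRECT)."""
    block = b"\0" * (block_kb * 1024)
    with open(path, "wb") as f:
        for _ in range(total_mb * 1024 // block_kb):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())

def run(nthreads, total_mb=8):
    """Return aggregate MiB/s for nthreads parallel sequential writers."""
    tmpdir = tempfile.mkdtemp()
    threads = [
        threading.Thread(target=writer,
                         args=(os.path.join(tmpdir, f"f{i}"), total_mb))
        for i in range(nthreads)
    ]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.time() - start
    shutil.rmtree(tmpdir)
    return nthreads * total_mb / elapsed

# A saturated disk stops scaling as the thread count grows.
for n in (1, 4, 16):
    print(f"{n} threads: {run(n):.0f} MiB/s")
```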
Re: [Gluster-devel] I/O performance
On Wed, Feb 6, 2019 at 7:00 AM Poornima Gurusiddaiah wrote:

> Thank you. That is strange! I had a few questions: what tests are you
> running for measuring the io-threads performance (not particularly AIO)?
> Is it dd from multiple clients?

Yes, it's a bit strange. What I see is that many threads from the thread pool are active but using very little CPU. I also see an AIO thread for each brick, but its CPU usage is not big either. Wait time is always 0 (I think this is a side effect of AIO activity). However, system load grows very high: I've seen around 50, while on the normal test without AIO it stays around 20-25.

Right now I'm running the tests on a single machine (no real network communication) using an NVMe disk as storage. I use a single mount point. The tests I'm running are these:

- Single dd, 128 GiB, blocks of 1 MiB
- 16 parallel dd, 8 GiB per dd, blocks of 1 MiB
- fio in sequential write mode, direct I/O, blocks of 128k, 16 threads, 8 GiB per file
- fio in sequential read mode, direct I/O, blocks of 128k, 16 threads, 8 GiB per file
- fio in random write mode, direct I/O, blocks of 128k, 16 threads, 8 GiB per file
- fio in random read mode, direct I/O, blocks of 128k, 16 threads, 8 GiB per file
- smallfile create, 16 threads, 256 files per thread, 32 MiB per file (with one brick down, for the following test)
- self-heal of an entire brick (from the previous smallfile test)
- pgbench init phase with scale 100

I run all these tests for a replica 3 volume and a disperse 4+2 volume.

Xavi
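For reference, the fio tests in the list above map onto standard fio job files; the following sketch renders one. The option names (`rw`, `bs`, `direct`, `numjobs`, `size`) are fio's usual job-file options, and the mount-point path is a hypothetical placeholder.

```python
def fio_job(name, rw, bs="128k", numjobs=16, size="8g",
            directory="/mnt/gluster"):
    """Render a fio job file matching the listed tests: direct I/O,
    128k blocks, 16 jobs, 8 GiB per file. `directory` is a
    placeholder for the Gluster mount point."""
    return "\n".join([
        f"[{name}]",
        f"rw={rw}",            # read / write / randread / randwrite
        f"bs={bs}",
        "direct=1",            # bypass the page cache
        "ioengine=psync",
        f"numjobs={numjobs}",
        f"size={size}",
        f"directory={directory}",
        "group_reporting=1",   # aggregate the per-job results
    ])

print(fio_job("rand-write", "randwrite"))
```

Writing the rendered text to a file and running `fio <file>` would reproduce the corresponding test.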
Re: [Gluster-devel] I/O performance
On Tue, Feb 5, 2019, 10:53 PM Xavi Hernandez wrote:

> I've run some simple tests with AIO enabled and results are not good. A
> simple dd takes >25% more time. Multiple parallel dd take 35% more time
> to complete.

Thank you. That is strange! I had a few questions: what tests are you running for measuring the io-threads performance (not particularly AIO)? Is it dd from multiple clients?

Regards,
Poornima
Re: [Gluster-devel] I/O performance
On Fri, Feb 1, 2019 at 1:51 PM Xavi Hernandez wrote:

> If we don't block the thread but we don't prevent more requests to go to
> the disk, then we'll probably have the same problem. Anyway, I'll try to
> run some tests with AIO to see if anything changes.

I've run some simple tests with AIO enabled and results are not good. A simple dd takes >25% more time. Multiple parallel dd take 35% more time to complete.

Xavi
Re: [Gluster-devel] I/O performance
On Fri, Feb 1, 2019 at 1:25 PM Poornima Gurusiddaiah wrote:

> Can the threads be categorised to do certain kinds of fops?

Could be, but creating multiple thread groups for different tasks is generally bad because many times you end up with lots of idle threads which waste resources and could increase contention. I think we should only differentiate threads if it's absolutely necessary.

> Read/write affinitise to certain set of threads, the other metadata fops
> to other set of threads. So we limit the read/write threads and not the
> metadata threads? Also if aio is enabled in the backend the threads will
> not be blocked on disk IO right?

If we don't block the thread but we don't prevent more requests from going to the disk, then we'll probably have the same problem. Anyway, I'll try to run some tests with AIO to see if anything changes.

> All this is based on the assumption that large number of parallel read
> writes make the disk perf bad but not the large number of dentry and
> metadata ops. Is that true?

It depends. If metadata is not cached, it's as bad as a read or write, since it requires a disk access (a clear example of this is the bad performance of 'ls' in cold cache, which is basically metadata reads). In fact, cached data reads are also very fast, and data writes could go to the cache and be updated later in the background, so I think the important point is whether things are cached or not, instead of whether they are data or metadata. Since we don't have this information from the user side, it's hard to tell what's better. My opinion is that we shouldn't differentiate requests of data/metadata. If metadata requests happen to be faster, then that thread will be able to handle other requests immediately, which seems good enough.

However, there's one thing that I would do: I would differentiate reads (data or metadata) from writes. Normally writes come from cached information that is flushed to disk at some point, so this normally happens in the background. But reads tend to be in the foreground, meaning that someone (a user or application) is waiting for them. So I would give preference to reads over writes. To do so effectively, we need to not saturate the backend; otherwise, when we need to send a read, it will still have to wait for all pending requests to complete. If disks are not saturated, we can have the answer to the read quite fast, and then continue processing the remaining writes.

Anyway, I may be wrong, since all these things depend on too many factors. I haven't done any specific tests about this. It's more like brainstorming. As soon as I can, I would like to experiment with this and get some empirical data.

Xavi
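The read-over-write preference with a deliberately unsaturated backend could be sketched as a small priority queue with a bounded in-flight count. This is illustrative Python only, not Gluster code; `PrioritizedIOQueue` and its parameters are made up for the example. A newly arrived read jumps ahead of queued writes instead of waiting behind a deep pile of already-submitted ones.

```python
import heapq
import itertools
import threading

READ, WRITE = 0, 1  # lower value = higher priority

class PrioritizedIOQueue:
    """Dispatch reads ahead of writes while capping the number of
    requests in flight at the backend (hypothetical sketch)."""

    def __init__(self, max_in_flight=4):
        self.cv = threading.Condition()
        self.heap = []
        self.seq = itertools.count()  # FIFO tie-break within a class
        self.in_flight = 0
        self.max_in_flight = max_in_flight

    def submit(self, kind, op):
        with self.cv:
            heapq.heappush(self.heap, (kind, next(self.seq), op))
            self.cv.notify()

    def next_op(self):
        """Called by a dispatcher; blocks until an op may be issued."""
        with self.cv:
            while not self.heap or self.in_flight >= self.max_in_flight:
                self.cv.wait()
            _, _, op = heapq.heappop(self.heap)
            self.in_flight += 1
            return op

    def done(self):
        with self.cv:
            self.in_flight -= 1
            self.cv.notify()

q = PrioritizedIOQueue(max_in_flight=2)
for i in range(3):
    q.submit(WRITE, f"write-{i}")
q.submit(READ, "read-0")  # arrives last but jumps the write backlog
order = []
for _ in range(4):
    op = q.next_op()
    order.append(op)
    q.done()
print(order)  # → ['read-0', 'write-0', 'write-1', 'write-2']
```

Because `max_in_flight` stays below disk saturation, the read in the example is issued almost immediately even though three writes were queued before it.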
Re: [Gluster-devel] I/O performance
Can the threads be categorised to do certain kinds of fops? Read/write could affinitise to a certain set of threads, and the other metadata fops to another set of threads, so we limit the read/write threads and not the metadata threads. Also, if AIO is enabled in the backend, the threads will not be blocked on disk I/O, right?

All this is based on the assumption that a large number of parallel reads and writes makes disk performance bad, but a large number of dentry and metadata ops does not. Is that true?

Thanks,
Poornima
Re: [Gluster-devel] I/O performance
On Thu, Jan 31, 2019 at 10:53:48PM -0800, Vijay Bellur wrote:
> Perhaps we could throttle both aspects - number of I/O requests per disk

While there, it would be nice to detect and report a disk with lower-than-peer performance: that happens sometimes when a disk is dying, and the last time I was hit by that performance problem, I had a hard time finding the culprit.

--
Emmanuel Dreyfus
m...@netbsd.org
Re: [Gluster-devel] I/O performance
On Thu, Jan 31, 2019 at 11:12 PM Xavi Hernandez wrote:

> No, in fact the global thread pool does have a limit for the number of threads. I'm not saying to replace the thread limit with I/O depth control; I think we need both. I think we need to clearly identify which threads are doing I/O and limit them, even if more threads are available. The reason is easy: suppose we have a fixed number of threads. If heavy load is sent in parallel, it's quite possible that all threads get blocked doing some I/O. This has two consequences:
>
>    1. There are no more threads to execute other things, like sending answers to the client or starting to process new incoming requests, so CPU is underutilized.
>    2. Massive parallel access to a FS actually decreases performance.
>
> This means that we can do less work and that work takes more time, which is bad.
>
> If we limit the number of threads that can actually be doing FS I/O, it's easy to keep the FS responsive and we'll still have more threads to do other work.

Got it, thx.

> The only thing to consider here is that if we limit the rate on servers but clients can generate requests without limit, we may require lots of memory to track all ongoing requests. Anyway, I think this is not the most important thing now, so if we solve the server-side problem we can then check whether this is really needed (it could happen that client applications limit themselves automatically because they will be waiting for answers from the server before sending more requests, unless the number of applications running concurrently is really huge).

We could enable throttling in the rpc layer to handle a client performing aggressive I/O. RPC throttling should be able to handle the scenario described above.

-Vijay
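The mechanism Xavi describes (identify the threads doing FS I/O and gate only those, so the rest of the pool stays free for CPU work) maps naturally onto a counting semaphore used as the I/O depth limit. The sketch below is a hypothetical illustration with invented names and a stand-in for the actual filesystem call, not the global thread pool implementation.

```python
import threading

# Sketch of the idea above: all worker threads remain available for CPU
# work (decoding requests, sending answers), but at most MAX_IO_DEPTH of
# them may be inside a filesystem call at any moment.
MAX_IO_DEPTH = 4
io_depth = threading.BoundedSemaphore(MAX_IO_DEPTH)

results = []
lock = threading.Lock()

def do_fop(fop_id, touches_disk):
    if touches_disk:
        with io_depth:             # gates only the threads doing FS I/O
            data = f"io:{fop_id}"  # stand-in for read()/write() on the brick
    else:
        data = f"meta:{fop_id}"    # non-I/O work is never gated
    with lock:
        results.append(data)

threads = [threading.Thread(target=do_fop, args=(i, i % 2 == 0))
           for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(results))   # all 8 fops complete; the disk saw at most 4 at once
```

The key property is that a burst of disk-bound fops queues up at the semaphore rather than occupying every thread, so incoming requests and responses keep flowing.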
Re: [Gluster-devel] I/O performance
On Fri, Feb 1, 2019 at 7:54 AM Vijay Bellur wrote:

> Perhaps we could throttle both aspects - the number of I/O requests per disk and the number of threads too? That way we will have the ability to behave well when there is bursty I/O to the same disk and when there are multiple concurrent requests to different disks. Do you have a reason not to limit the number of threads?

No, in fact the global thread pool does have a limit for the number of threads. I'm not saying to replace the thread limit with I/O depth control; I think we need both. I think we need to clearly identify which threads are doing I/O and limit them, even if more threads are available. The reason is easy: suppose we have a fixed number of threads. If heavy load is sent in parallel, it's quite possible that all threads get blocked doing some I/O. This has two consequences:

   1. There are no more threads to execute other things, like sending answers to the client or starting to process new incoming requests, so CPU is underutilized.
   2. Massive parallel access to a FS actually decreases performance.

This means that we can do less work and that work takes more time, which is bad. If we limit the number of threads that can actually be doing FS I/O, it's easy to keep the FS responsive and we'll still have more threads to do other work.

> Agree, rate limiting on the server side would be more appropriate.

The only thing to consider here is that if we limit the rate on servers but clients can generate requests without limit, we may require lots of memory to track all ongoing requests. Anyway, I think this is not the most important thing now, so if we solve the server-side problem we can then check whether this is really needed (it could happen that client applications limit themselves automatically because they will be waiting for answers from the server before sending more requests, unless the number of applications running concurrently is really huge).

Xavi
Re: [Gluster-devel] I/O performance
On Thu, Jan 31, 2019 at 10:01 AM Xavi Hernandez wrote:

> Hi,
>
> I've been doing some tests with the global thread pool [1], and I've observed one important thing:
>
> Since this new thread pool has very low contention (apparently), it exposes other problems when the number of threads grows. What I've seen is that some workloads use all available threads on bricks to do I/O, causing avgload to grow rapidly and saturating the machine (or so it seems), which really makes everything slower. Reducing the maximum number of threads actually improves performance. Other workloads, though, do little I/O (probably most is locking or smallfile operations). In this case, limiting the number of threads to a small value causes a performance reduction. To increase performance we need more threads.
>
> So this is making me think that maybe we should implement some sort of I/O queue with a maximum I/O depth for each brick (or disk, if bricks share the same disk). This way we can limit the number of requests physically accessing the underlying FS concurrently, without actually limiting the number of threads that can be doing other things on each brick. I think this could improve performance.

Perhaps we could throttle both aspects - the number of I/O requests per disk and the number of threads too? That way we will have the ability to behave well when there is bursty I/O to the same disk and when there are multiple concurrent requests to different disks. Do you have a reason not to limit the number of threads?

> Maybe this approach could also be useful on the client side, but I think it's not so critical there.

Agree, rate limiting on the server side would be more appropriate.

Thanks,
Vijay
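Vijay's "throttle both aspects" could look roughly like the sketch below: a bounded worker pool (the thread cap the global thread pool already has) combined with a per-disk semaphore (the I/O depth cap). All names and numbers here are illustrative assumptions, not the real brick code.

```python
from concurrent.futures import ThreadPoolExecutor
import threading

MAX_WORKERS = 4          # first throttle: cap on worker threads
MAX_DEPTH_PER_DISK = 2   # second throttle: cap on in-flight I/O per disk

disk_gate = {d: threading.BoundedSemaphore(MAX_DEPTH_PER_DISK)
             for d in ("disk0", "disk1")}

def brick_io(disk, req):
    with disk_gate[disk]:        # per-disk I/O depth limit
        return f"{disk}:{req}"   # stand-in for the actual FS operation

# The bounded pool handles bursty load to one disk; the per-disk gates
# keep concurrent requests to different disks from drowning any one disk.
with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
    futures = [pool.submit(brick_io, f"disk{i % 2}", i) for i in range(6)]
    done = [f.result() for f in futures]
print(done)
```

With both limits in place, a burst to one disk queues at that disk's gate without consuming every worker, while the thread cap still bounds total CPU and avgload.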
[Gluster-devel] I/O performance
Hi,

I've been doing some tests with the global thread pool [1], and I've observed one important thing:

Since this new thread pool has very low contention (apparently), it exposes other problems when the number of threads grows. What I've seen is that some workloads use all available threads on bricks to do I/O, causing avgload to grow rapidly and saturating the machine (or so it seems), which really makes everything slower. Reducing the maximum number of threads actually improves performance. Other workloads, though, do little I/O (probably most is locking or smallfile operations). In this case, limiting the number of threads to a small value causes a performance reduction. To increase performance we need more threads.

So this is making me think that maybe we should implement some sort of I/O queue with a maximum I/O depth for each brick (or disk, if bricks share the same disk). This way we can limit the number of requests physically accessing the underlying FS concurrently, without actually limiting the number of threads that can be doing other things on each brick. I think this could improve performance.

Maybe this approach could also be useful on the client side, but I think it's not so critical there.

What do you think?

Xavi

[1] https://review.gluster.org/c/glusterfs/+/20636