Re: repospanner and our Ansible repo
On Wed, Sep 18, 2019 at 9:58 AM Stephen John Smoogen wrote:
>
> On Wed, 18 Sep 2019 at 09:44, Randy Barlow wrote:
> >
> > On Tue, 2019-09-17 at 19:01 -0400, Neal Gompa wrote:
> > > Out of curiosity, do we know where the bottlenecks are in
> > > repoSpanner?
> > > In theory, the architecture of repoSpanner isn't supposed to be too
> > > different from gitaly, so I'm curious where we're falling down.
> >
> > I believe it needs a more efficient way to store the git objects. As I
> > understand it, it currently stores each one in its own file, resulting
> > in a large number of small files.
>
> So my "hot-take probably wrong" look at things seems to indicate that
> the reason it stores everything as a separate file is to make certain
> git actions faster. When you pack the files, searches, diffs and other
> checks become slower or memory intensive because you have to calculate
> new deltas and other things 'lost' in the packing.
>
> Looking at the gitaly documents, I think that is the reason they have
> multiple different types of in-memory caches at different layers. It
> allows for both faster accesses but probably blows up the size of what
> is needed for hardware. We have to be careful here because we don't
> have a hardware reserve to dive into for more memory/cpu.
>
> I think that for gitlab.org (versus running a local gitlab) they also
> use a lot of backend 'eventual' consistency caching. You push and it
> begins to spread that out through the multiple regions it is housed.
> The 'user' doesn't see this because the front end level just directs
> you to the known hot caches for that particular pull/push request..
> but if you somehow were hardcoded to a region you might not see the
> update/change for a while because it hasn't mirrored out completely.
> That also would speed up push/pull/changes greatly and not something
> we could 'duplicate'.
>

That definitely explains the performance consistency between repoSpanner and gitaly for my local deployment. So it's most likely related to how they simulate better performance as the backend catches up.

That said, the most recent change to gitaly is that it now does hashed storage of git objects and does "fast forking" using alternates instead of storing as bare git repos and duplicating repos on disk. None of that changes the initial push for a unique repo.

--
真実はいつも一つ!/ Always, there's only one truth!
___
infrastructure mailing list -- infrastructure@lists.fedoraproject.org
To unsubscribe send an email to infrastructure-le...@lists.fedoraproject.org
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/infrastructure@lists.fedoraproject.org
Re: repospanner and our Ansible repo
On Wed, 18 Sep 2019 at 09:44, Randy Barlow wrote:
>
> On Tue, 2019-09-17 at 19:01 -0400, Neal Gompa wrote:
> > Out of curiosity, do we know where the bottlenecks are in
> > repoSpanner?
> > In theory, the architecture of repoSpanner isn't supposed to be too
> > different from gitaly, so I'm curious where we're falling down.
>
> I believe it needs a more efficient way to store the git objects. As I
> understand it, it currently stores each one in its own file, resulting
> in a large number of small files.

So my "hot-take probably wrong" look at things seems to indicate that the reason it stores everything as a separate file is to make certain git actions faster. When you pack the files, searches, diffs and other checks become slower or memory intensive because you have to calculate new deltas and other things 'lost' in the packing.

Looking at the gitaly documents, I think that is the reason they have multiple different types of in-memory caches at different layers. It allows for both faster accesses but probably blows up the size of what is needed for hardware. We have to be careful here because we don't have a hardware reserve to dive into for more memory/cpu.

I think that for gitlab.org (versus running a local gitlab) they also use a lot of backend 'eventual' consistency caching. You push and it begins to spread that out through the multiple regions it is housed. The 'user' doesn't see this because the front end level just directs you to the known hot caches for that particular pull/push request.. but if you somehow were hardcoded to a region you might not see the update/change for a while because it hasn't mirrored out completely. That also would speed up push/pull/changes greatly and not something we could 'duplicate'.

--
Stephen J Smoogen.
Re: repospanner and our Ansible repo
On Tue, 2019-09-17 at 19:01 -0400, Neal Gompa wrote:
> Out of curiosity, do we know where the bottlenecks are in
> repoSpanner?
> In theory, the architecture of repoSpanner isn't supposed to be too
> different from gitaly, so I'm curious where we're falling down.

I believe it needs a more efficient way to store the git objects. As I understand it, it currently stores each one in its own file, resulting in a large number of small files.
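For comparison, the loose-versus-packed split of an ordinary git repository can be inspected with `git count-objects -v`. The sketch below parses that command's output; it says nothing about repospanner's own storage layout, which the thread does not describe. The helper names and the sample numbers are made up for illustration.

```python
import subprocess


def parse_count_objects(output: str) -> dict:
    """Parse the 'key: value' lines printed by `git count-objects -v`."""
    stats = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        value = value.strip()
        # Keep only numeric fields; lines such as 'alternate: <path>' are skipped.
        if sep and value.isdigit():
            stats[key.strip()] = int(value)
    return stats


def loose_vs_packed(repo_path: str) -> dict:
    """Run `git count-objects -v` in repo_path (hypothetical helper)."""
    result = subprocess.run(
        ["git", "-C", repo_path, "count-objects", "-v"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_count_objects(result.stdout)


# Illustrative output only; these numbers are not taken from any real repo.
SAMPLE = """count: 3
size: 12
in-pack: 120
packs: 1
size-pack: 64
prune-packable: 0
garbage: 0
size-garbage: 0
"""

if __name__ == "__main__":
    stats = parse_count_objects(SAMPLE)
    print(f"loose objects: {stats['count']}, packed objects: {stats['in-pack']}")
```

In stock git, `count` is the number of loose objects (one file each) and `in-pack` is the number stored inside packfiles, so a large `count` corresponds to the "many small files" situation described above.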
Re: repospanner and our Ansible repo
On Tue, 17 Sep 2019 at 19:02, Neal Gompa wrote:
>
> On Tue, Sep 17, 2019 at 6:47 PM Randy Barlow wrote:
> >
> > I don't expect it would be useful to perform this test with GitHub
> > since I'd expect essentially the same results (bottlenecked on my home
> > internet connection).
>
> Out of curiosity, do we know where the bottlenecks are in repoSpanner?
> In theory, the architecture of repoSpanner isn't supposed to be too
> different from gitaly, so I'm curious where we're falling down.
>

Looking at the architecture of gitaly, there seems to be a Redis(?) cache in front of gitaly and a file cache behind it. If I read that correctly, then those are things that would make everything seem much faster, since they hold data in faster memory that gitaly interfaces with. However, that is just a rough reading of what is written up rather than domain knowledge.

--
Stephen J Smoogen.
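To illustrate why a cache in front of a slow store makes repeated reads look fast, here is a minimal read-through cache sketch. It is not gitaly's implementation; `ReadThroughCache` and `slow_backend` are hypothetical names, and the in-memory dict merely stands in for something like Redis.

```python
class ReadThroughCache:
    """Serve repeated reads from memory; fall back to the backend on a miss."""

    def __init__(self, backend_fetch):
        self._fetch = backend_fetch  # callable that does the slow lookup
        self._store = {}             # in-memory stand-in for a real cache
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._store:
            self.hits += 1
            return self._store[key]
        # Miss: pay the backend cost once, then remember the result.
        self.misses += 1
        value = self._fetch(key)
        self._store[key] = value
        return value


def slow_backend(object_id):
    # Stand-in for an expensive disk or network read.
    return f"contents of {object_id}"


if __name__ == "__main__":
    cache = ReadThroughCache(slow_backend)
    cache.get("abc123")
    cache.get("abc123")
    print(f"hits={cache.hits} misses={cache.misses}")
```

Only the first read of a key reaches the backend. The trade-off raised elsewhere in the thread still applies: the cached data has to live in memory somewhere, which costs hardware.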
Re: repospanner and our Ansible repo
stickster asked me today how these numbers would compare to Git{Hub,Lab}. I did a bit of testing with GitLab just now. Note that this isn't a particularly apples to apples test, because my repospanner nodes were on the same virtual host, and my git client was on a 1 Gbps LAN with them. My GitLab test results are from my house, where I only have a 60 Mbps down / 6 Mbps up connection to the Internet, and of course, higher latency.

I considered testing from batcave01 to get higher bandwidth, but I didn't want to try to figure out a safe way to use my GitLab credentials on a shared server and I didn't want to make a throwaway account just to test this.

On Mon, 2019-09-16 at 18:51 -0400, Randy Barlow wrote:
> I pushed the Ansible repository into it. This took a very long time:
> 298m2.157s!

This took 6m44.705s to get to GitLab. However, since I only have 6 Mbps outbound and the repository is 268.43 MiB, I calculate that almost all of this time was just due to waiting on my outbound pipe.

> The next test was to see how long it takes to clone our repo. I did
> this on another machine on the same LAN (so again, ideal network
> latency) and it took 2m27.433s.

This took 0m40.359s, and again, almost all of the time was just due to how long it would take to send that much data over a 60 Mbps link.

> Next, I made a small commit (just added/deleted some lines) and pushed
> it into the cluster. This went reasonably quick at 0.366s, which I
> think we would be OK with.

This took 1.443s to GitLab, and I bet most of it was just latency/round trip crypto setup time.

> The last test I performed was to see how quickly another checkout could
> pull that commit, and this was again a speed I might consider to be a
> bit slow at 4.931s, especially considering that the commit was small
> and was only one.

This took 0m1.523s to GitLab, and I bet most of it was just latency/round trip crypto setup time.

I don't expect it would be useful to perform this test with GitHub since I'd expect essentially the same results (bottlenecked on my home internet connection).
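The bandwidth estimate in the message above can be checked with a few lines of arithmetic. This is a rough sketch that assumes the full 268.43 MiB crosses the wire, that the link runs at its nominal rate, and that protocol overhead is ignored; `transfer_seconds` is a name invented here.

```python
MIB = 1024 * 1024  # bytes per MiB


def transfer_seconds(size_mib: float, link_mbps: float) -> float:
    """Ideal time to move size_mib over a link of link_mbps (decimal megabits/s)."""
    bits = size_mib * MIB * 8
    return bits / (link_mbps * 1_000_000)


REPO_MIB = 268.43                # repository size quoted in the message
push_observed = 6 * 60 + 44.705  # 6m44.705s push to GitLab
clone_observed = 40.359          # 0m40.359s clone from GitLab

push_ideal = transfer_seconds(REPO_MIB, 6)    # 6 Mbps outbound
clone_ideal = transfer_seconds(REPO_MIB, 60)  # 60 Mbps inbound

if __name__ == "__main__":
    print(f"push:  ideal {push_ideal:.1f}s of {push_observed:.1f}s observed")
    print(f"clone: ideal {clone_ideal:.1f}s of {clone_observed:.1f}s observed")
```

Under those assumptions the raw transfer accounts for a little over 90% of both observed times, which is consistent with the statement that the home connection, not GitLab, was the bottleneck.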
Re: repospanner and our Ansible repo
I don't contribute much to the infra repo, although I do pull from time to time. Sending PRs is definitely cool, but I think waiting tens of seconds to pull a few commits is not very good.

On Tue, Sep 17, 2019 at 1:01 AM Randy Barlow wrote:
>
> Greetings!
>
> Kevin asked me last week whether we are ready to move our
> infrastructure Ansible repository into repospanner. The benefit of
> moving it into repospanner is that it is one way to enable us to allow
> pull requests into the repository, which I think would be nice.
>
> repospanner seems to work correctly as a git server, but it does need
> improvements in its performance, so I offered to do a little
> benchmarking with our Ansible repo and repospanner to see what kind of
> performance we might see.
>
> I deployed a 3-node repospanner cluster today on fairly high
> performance hardware (SSD storage). It was three VMs on the same
> physical machine. Note that due to my test setup, network latency was
> about as good as it could get, and so was storage iops. I believe the
> performance bottlenecks will depend heavily on storage iops. Thus, this
> hardware is not really a great way to predict how the performance might
> be if we deployed into our infra, but it was easy for me to do and get
> a "best case" performance benchmark. I am willing to attempt to
> replicate this test on more realistic hardware in our infra if we want
> more realistic data for our own use case.
>
> I pushed the Ansible repository into it. This took a very long time:
> 298m2.157s! If we were to deploy nodes in different geos and use NAS
> storage, I believe this would take longer. The good thing is that we'd
> only need to do this operation once, if we were to decide to proceed.
>
> The next test was to see how long it takes to clone our repo. I did
> this on another machine on the same LAN (so again, ideal network
> latency) and it took 2m27.433s. That's a pretty long time too I'd say,
> but maybe liveable? This would impact every contributor who wanted to
> clone us, so I'll let the list debate whether that is acceptable.
>
> Next, I made a small commit (just added/deleted some lines) and pushed
> it into the cluster. This went reasonably quick at 0.366s, which I
> think we would be OK with.
>
> The last test I performed was to see how quickly another checkout could
> pull that commit, and this was again a speed I might consider to be a
> bit slow at 4.931s, especially considering that the commit was small
> and was only one. I would expect this to be somewhat proportional to
> the amount of change that has happened since the user last fetched, and
> this repo does see a lot of activity. So I might expect git pull to
> take 10's of seconds for contributors who are fairly active and pull
> once every few days or so, and maybe longer for users who pull less
> frequently.
>
> The repo copy I tested with has 199717 objects and 132918 deltas in it.
> repospanner performance seems to be fairly proportionally correlated
> with these numbers, as the bodhi repo pushed into it in about an hour
> and has 50kish objects, iirc (didn't write it down, so from memory).
>
> I personally am on the fence about whether we should proceed at this
> time. I am certain that people will notice the speed issues, and I also
> expect that it will be slower than the numbers I listed above since my
> tests were done on consumer hardware. But it would also be pretty sweet
> if we had pull requests on the repo.
>
> Improving repospanner's performance is a goal I am focusing on, so if
> we deployed it now I would hopefully be able to get it into better
> shape soon. Alternatively, we hopefully wouldn't have to wait that long
> if we wanted to wait for performance fixes before proceeding. I could
> see either decision being reasonable.
>
> To reiterate, I'd be willing to replicate the tests I did above on
> infra hardware if we are on the fence about the numbers I've reported
> here and want to see more realistic numbers to make a final decision. I
> think that would give us more realistic numbers since the tests I did
> here were on a much more ideal situation, performance wise.
>
> What do others think?
repospanner and our Ansible repo
Greetings!

Kevin asked me last week whether we are ready to move our infrastructure Ansible repository into repospanner. The benefit of moving it into repospanner is that it is one way to enable us to allow pull requests into the repository, which I think would be nice.

repospanner seems to work correctly as a git server, but it does need improvements in its performance, so I offered to do a little benchmarking with our Ansible repo and repospanner to see what kind of performance we might see.

I deployed a 3-node repospanner cluster today on fairly high performance hardware (SSD storage). It was three VMs on the same physical machine. Note that due to my test setup, network latency was about as good as it could get, and so was storage iops. I believe the performance bottlenecks will depend heavily on storage iops. Thus, this hardware is not really a great way to predict how the performance might be if we deployed into our infra, but it was easy for me to do and get a "best case" performance benchmark. I am willing to attempt to replicate this test on more realistic hardware in our infra if we want more realistic data for our own use case.

I pushed the Ansible repository into it. This took a very long time: 298m2.157s! If we were to deploy nodes in different geos and use NAS storage, I believe this would take longer. The good thing is that we'd only need to do this operation once, if we were to decide to proceed.

The next test was to see how long it takes to clone our repo. I did this on another machine on the same LAN (so again, ideal network latency) and it took 2m27.433s. That's a pretty long time too I'd say, but maybe liveable? This would impact every contributor who wanted to clone us, so I'll let the list debate whether that is acceptable.

Next, I made a small commit (just added/deleted some lines) and pushed it into the cluster. This went reasonably quick at 0.366s, which I think we would be OK with.

The last test I performed was to see how quickly another checkout could pull that commit, and this was again a speed I might consider to be a bit slow at 4.931s, especially considering that the commit was small and was only one. I would expect this to be somewhat proportional to the amount of change that has happened since the user last fetched, and this repo does see a lot of activity. So I might expect git pull to take 10's of seconds for contributors who are fairly active and pull once every few days or so, and maybe longer for users who pull less frequently.

The repo copy I tested with has 199717 objects and 132918 deltas in it. repospanner performance seems to be fairly proportionally correlated with these numbers, as the bodhi repo pushed into it in about an hour and has 50kish objects, iirc (didn't write it down, so from memory).

I personally am on the fence about whether we should proceed at this time. I am certain that people will notice the speed issues, and I also expect that it will be slower than the numbers I listed above since my tests were done on consumer hardware. But it would also be pretty sweet if we had pull requests on the repo.

Improving repospanner's performance is a goal I am focusing on, so if we deployed it now I would hopefully be able to get it into better shape soon. Alternatively, we hopefully wouldn't have to wait that long if we wanted to wait for performance fixes before proceeding. I could see either decision being reasonable.

To reiterate, I'd be willing to replicate the tests I did above on infra hardware if we are on the fence about the numbers I've reported here and want to see more realistic numbers to make a final decision. I think that would give us more realistic numbers since the tests I did here were on a much more ideal situation, performance wise.

What do others think?
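The "fairly proportional" observation can be sanity-checked by comparing push time per object for the two repositories. This sketch uses the figures from the message; the bodhi numbers ("about an hour", "50kish objects") were quoted from memory, so they are treated here as rough round values, and `seconds_per_object` is a name invented for illustration.

```python
def seconds_per_object(total_seconds: float, objects: int) -> float:
    """Average push time attributed to each git object."""
    return total_seconds / objects


# Ansible repo: 298m2.157s for 199717 objects (measured).
ansible_rate = seconds_per_object(298 * 60 + 2.157, 199_717)

# bodhi repo: roughly one hour for roughly 50k objects (from memory).
bodhi_rate = seconds_per_object(60 * 60, 50_000)

if __name__ == "__main__":
    print(f"ansible: {ansible_rate * 1000:.1f} ms/object")
    print(f"bodhi:   {bodhi_rate * 1000:.1f} ms/object")
    print(f"ratio:   {ansible_rate / bodhi_rate:.2f}")
```

Both work out to somewhere around 70 to 90 ms per object, within about 25% of each other. Given how approximate the bodhi figures are, that does not contradict a roughly linear relationship between object count and push time, though two data points cannot confirm one either.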