Re: [rdo-users] Poor Ceph Performance
Hi Donny,

Thank you for the reply.

> What kind of images are you using?

The image used is a CentOS 7 cloud image in RAW format (approx. 8GB in size).

> Also how are you uploading the images?

I was uploading the image file from the undercloud node.

Thank you very much.

Best regards,
Cody

On Mon, Nov 26, 2018 at 10:57 AM Donny Davis wrote:
>
> Also how are you uploading the images?
>
> On Mon, Nov 26, 2018 at 10:54 AM Donny Davis wrote:
>>
>> What kind of images are you using?

___
users mailing list
users@lists.rdoproject.org
http://lists.rdoproject.org/mailman/listinfo/users

To unsubscribe: users-unsubscr...@lists.rdoproject.org
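As a back-of-the-envelope check (my own arithmetic, not from the thread): the figures reported here are mutually consistent. At the ~650 KiB/s write speed seen in `ceph -s` (midpoint of the observed 600–700 KiB/s), an 8GB RAW image really would take several hours to upload, whereas a saturated 1 Gbps link could move it in roughly a minute:

```python
# Rough sanity check of the throughput figures reported in this thread.
# 650 KiB/s is the midpoint of the observed 600-700 KiB/s range.

GIB = 1024 ** 3

image_bytes = 8 * GIB            # approx. 8GB RAW CentOS 7 cloud image
observed_bps = 650 * 1024        # ~650 KiB/s observed during upload
line_rate_bps = 125_000_000      # 1 Gbps = 125 MB/s theoretical line rate

observed_hours = image_bytes / observed_bps / 3600
ideal_seconds = image_bytes / line_rate_bps

print(f"at ~650 KiB/s: {observed_hours:.1f} hours")       # consistent with "several hours"
print(f"at 1 Gbps line rate: {ideal_seconds:.0f} seconds")
```

In other words, the upload is running at roughly 0.5% of what even this single shared 1G port could deliver, so the bottleneck is very unlikely to be the MTU alone.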
Re: [rdo-users] Poor Ceph Performance
Hi John,

Thank you so much for the reply. I will make a post to the Ceph ML and add
the link back to this thread. Here I am attaching the cluster specs below
for reference.

It uses 9 baremetal nodes (1 Undercloud, 3 Controllers (HA), 3 Ceph,
2 Compute) with the following details:

Undercloud & Compute nodes:
  CPU: E3-1230V2 @3.7GHz
  RAM: 16GB
  Ports: 1Gbps for provisioning; 1Gbps for external/VLANs

Controller nodes (with Ceph mon & mgr):
  CPU: 2 x E5-2603 @1.8GHz
  RAM: 16GB
  Ports: 1Gbps for provisioning; 1Gbps for VLANs

Ceph nodes:
  CPU: 2 x E5-2603 @1.8GHz
  RAM: 16GB
  Ports: 1Gbps for provisioning; 1Gbps for VLANs
  Journaling: 1 SSD (SATA3, consumer grade)
  OSDs: 2 x 2TB @ 7200rpm (SATA3, consumer grade)

Switch: HUAWEI S1700 Series (24 x 1Gbps ports, 56Gbps switching capacity)

The gear is old and under-configured, especially in RAM capacity, but this
is just for a PoC with minimal usage, and there was no sign of CPU/RAM
starvation during the test.

On the software side, it is running the Queens release. The ceph-ansible
version is 3.1.6, using filestore with a non-collocated setup.

Best regards,
Cody
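As an aside (my own estimate, not from the thread): even with Ceph's default 3x replication and the storage and storage-management VLANs sharing one 1 Gbps port per Ceph node, the network ceiling is two orders of magnitude above the observed ~650 KiB/s. That points toward the consumer-grade SSD journal (filestore writes every object to the journal before the data disk) or another host-level issue, rather than the NIC or MTU alone:

```python
# Hypothetical estimate of the client write ceiling imposed by one shared
# 1 Gbps NIC per Ceph node under 3x replication (a common Ceph default).
# The primary OSD's NIC carries 1x in (from the client) plus 2x out
# (replica copies); outbound is the tighter direction on full duplex.

LINE_RATE_MIBS = 1_000_000_000 / 8 / 2**20   # 1 Gbps in MiB/s (~119)
REPLICA_COPIES_OUT = 2                        # primary forwards two copies

network_ceiling_mibs = LINE_RATE_MIBS / REPLICA_COPIES_OUT

observed_mibs = 650 / 1024                    # ~650 KiB/s observed write speed

print(f"network ceiling: ~{network_ceiling_mibs:.0f} MiB/s")
print(f"observed: ~{observed_mibs:.2f} MiB/s "
      f"({observed_mibs / network_ceiling_mibs:.1%} of the ceiling)")
```

Since the observed speed is around 1% of even this pessimistic ceiling, benchmarking the journal SSD and the raw cluster (e.g. with `rados bench`) would likely be more revealing than jumbo frames.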
Re: [rdo-users] Poor Ceph Performance
Also how are you uploading the images?

On Mon, Nov 26, 2018 at 10:54 AM Donny Davis wrote:

> What kind of images are you using?
Re: [rdo-users] Poor Ceph Performance
What kind of images are you using?
[rdo-users] [Fedocal] Reminder meeting : RDO meeting
Dear all,

You are kindly invited to the meeting:

  RDO meeting on 2018-11-28 from 15:00:00 to 16:00:00 UTC
  At r...@irc.freenode.net

The meeting will be about:
RDO IRC meeting

[Agenda at https://etherpad.openstack.org/p/RDO-Meeting](https://etherpad.openstack.org/p/RDO-Meeting)

Every Wednesday on #rdo on Freenode IRC

Source: https://apps.fedoraproject.org/calendar/meeting/8759/
Re: [rdo-users] Poor Ceph Performance
On Sun, Nov 25, 2018 at 11:29 PM Cody wrote:
>
> Hello,
>
> My TripleO cluster is deployed with Ceph. Both Cinder and Nova use RBD
> as a backend. While all essential functions work, services involving
> Ceph are getting very poor performance. E.g., it takes several hours
> to upload an 8GB image into Cinder and about 20 minutes to completely
> boot up an instance (from launch to ssh ready).
>
> Running 'ceph -s' shows a top write speed of 600~700 KiB/s during image
> upload and a read speed of 2 MiB/s during instance launch.
>
> I used the default scheme for network isolation and a single 1G port
> for all VLAN traffic on each overcloud node. I haven't set jumbo
> frames on the storage network VLAN yet, but I think the performance
> should not be this bad with MTU 1500. Something must be wrong. Any
> suggestions for debugging?

Hi Cody,

If you're using Queens or Rocky, then Ceph Luminous was deployed in
containers. Though TripleO did the overall deployment, ceph-ansible
would have done the actual Ceph deployment and configuration; you can
determine the ceph-ansible version via 'rpm -q ceph-ansible' on your
undercloud. It probably makes sense for you to pass along what you
mentioned above, in addition to some other info noted below, to the
ceph-users list (http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com),
which is focused on Ceph itself. When you contact them (I'm on the list
too), also let them know the following:

1. How many OSD servers you have and how many OSDs per server
2. What type of disks you're using per OSD and how you set up journaling
3. Specs of the servers themselves (OpenStack controller servers with
   CPU X and RAM Y for the Ceph monitors, and RAM/CPU info for the Ceph
   storage servers)
4. Did you override the RAM/CPU for the Mon, Mgr, and OSD containers?
   If so, what did you override them to?

TripleO can pass any parameter you would normally pass to ceph-ansible,
as described here:

https://docs.openstack.org/tripleo-docs/latest/install/advanced_deployment/ceph_config.html#customizing-ceph-conf-with-ceph-ansible

So if you describe things to them in terms of a containerized
ceph-ansible Luminous deployment and your ceph.conf, and they have
suggestions, then you can apply those suggestions back to ceph-ansible
through TripleO as described above. If you start troubleshooting the
cluster as per this troubleshooting guide [2] and share the results,
that would also help.

I've gotten better performance than you describe on a completely
virtualized deployment on my PC [1], using quickstart with the defaults
that TripleO passes on Queens and Rocky (TripleO tends to favor the
defaults which ceph-ansible uses). However, with a single 1G port for
all network traffic I don't expect great performance.

Feel free to CC me when you email ceph-users, and feel free to share a
link to that thread on rdo-users in case anyone else on this list is
interested.

John

[1] http://blog.johnlikesopenstack.com/2018/08/pc-for-tripleo-quickstart.html
[2] https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/pdf/troubleshooting_guide/Red_Hat_Ceph_Storage-3-Troubleshooting_Guide-en-US.pdf
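To make the parameter-passing point concrete, a minimal TripleO environment file of the kind the linked guide describes might look like the sketch below. The parameter names `CephAnsibleExtraConfig` and `CephConfigOverrides` are my reading of that guide, and the values shown are illustrative placeholders only, not tuning recommendations:

```yaml
# Hypothetical sketch of forwarding ceph-ansible variables and ceph.conf
# settings through TripleO, per the ceph_config guide linked above.
# Values are placeholders, not recommendations.
parameter_defaults:
  # Variables handed directly to ceph-ansible
  CephAnsibleExtraConfig:
    journal_size: 5120
  # Extra keys merged into the generated ceph.conf
  CephConfigOverrides:
    osd_journal_size: 5120
```

Such a file would then be included in the deployment with an extra `-e ceph-tuning.yaml` on the `openstack overcloud deploy` command line, so any advice from ceph-users can be applied without editing ceph.conf by hand on the nodes.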
[rdo-users] [Fedocal] Reminder meeting : RDO Office Hours
Dear all,

You are kindly invited to the meeting:

  RDO Office Hours on 2018-11-27 from 13:30:00 to 14:30:00 UTC

The meeting will be about:
RDO Office Hour.

Aim: To keep up with increasing participation, we'll host office hours
to add more easy fixes and provide mentoring to newcomers.

[Agenda at RDO Office Hour easyfixes](https://review.rdoproject.org/etherpad/p/rdo-office-hour-easyfixes)

Source: https://apps.fedoraproject.org/calendar/meeting/6374/