Re: [ceph-users] Best method to limit snapshot/clone space overhead
On 07/23/2015 06:31 AM, Jan Schermer wrote:

Hi all, I have been looking for a way to alleviate the overhead of RBD snapshots/clones for some time. In our scenario there are a few “master” volumes that contain production data, and are frequently snapshotted and cloned for dev/qa use. Those snapshots/clones live for a few days to a few weeks before they get dropped, and they sometimes grow very fast (databases, etc.). With the default 4MB object size there seems to be huge overhead involved with this; could someone give me some hints on how to solve that? I have some hope in:

1) FIEMAP
I’ve calculated that files on my OSDs are approx. 30% filled with NULLs - I suppose this is what it could save (best-case scenario) and it should also make COW operations much faster. But there are lots of bugs in FIEMAP in kernels (I saw some reference to the CentOS 6.5 kernel being buggy - which is what we use) and filesystems (like XFS). No idea about ext4, which we’d like to use in the future. Is enabling FIEMAP a good idea at all? I saw some mention of it being replaced with SEEK_DATA and SEEK_HOLE.

fiemap (and ceph's use of it) has been buggy on all fses in the past. SEEK_DATA and SEEK_HOLE are the proper interfaces to use for these purposes. That said, it's not incredibly well tested since it's off by default, so I wouldn't recommend using it without careful testing on the fs you're using. I wouldn't expect it to make much of a difference if you use small objects.

2) object size < 4MB for clones
I did some quick performance testing and setting this lower for production is probably not a good idea. My sweet spot is 8MB object size, however this would make the overhead for clones even worse than it already is. But I could make the cloned images with a different block size from the snapshot (at least according to the docs). Does someone use it like that? Any caveats? That way I could have the production data with 8MB block size but make the development snapshots with, for example, 64KiB granularity, probably at the expense of some performance, but most of the data would remain in the (faster) master snapshot anyway. This should drop overhead tremendously, maybe even more than enabling FIEMAP. (Even better when working in tandem, I suppose?)

Since these clones are relatively short-lived this seems like a better way to go in the short term. 64k may be extreme, but if there aren't too many of these clones it's not a big deal. There is more overhead for recovery and scrub with smaller objects, so I wouldn't recommend using tiny objects in general. It'll be interesting to see your results. I'm not sure many folks have looked at optimizing this use case.

Josh
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
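As a concrete illustration of the approach Jan describes, a clone can be created with a smaller object size than its parent. The pool/image/snapshot names below are made up, and the exact flag (--order here; newer releases also accept --object-size) should be checked against the rbd man page for your version:

    # hypothetical names; order 23 = 8 MiB objects for the master image
    rbd create rbd/master --size 102400 --order 23
    # a snapshot must be protected before it can be cloned
    rbd snap create rbd/master@golden
    rbd snap protect rbd/master@golden
    # clone for dev/qa with order 16 = 64 KiB objects, so COW writes to the
    # clone allocate 64 KiB chunks instead of 8 MB
    rbd clone rbd/master@golden rbd/dev-clone --order 16
    rbd info rbd/dev-clone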
Re: [ceph-users] Ceph-deploy won't write journal if partition exists and using --dmcrypt
Sorry for the broken post previously. I have looked into this more and it looks like ceph-deploy is not seeing that it is a partition and is attempting to create an additional partition in the journal's place. I read in the documentation that if I set osd journal size = 0, it will assume that the target is a block device and use the entire block. I tried this and it still doesn't work. I have since zapped the journals and specified a 20G journal size. Now in my ceph-deploy line I just specify:

ceph-deploy osd --dmcrypt --fs-type ${fs} create ${host}:${disk}:/dev/${journal_disk}

i.e.:

ceph-deploy osd --dmcrypt --fs-type btrfs create kh28-1:sde:/dev/sdab
ceph-deploy osd --dmcrypt --fs-type btrfs create kh28-1:sdf:/dev/sdab

and ceph-deploy seems to try to create a new partition every time. I have now run into a new issue though. After ceph-deploy creates the partitions and seems to bootstrap the disks successfully, it does not mount them properly to create the journal.

[ceph_deploy.osd][DEBUG ] Calling partprobe on zapped device /dev/sdr
[kh28-3.osdc.io][INFO ] Running command: sudo partprobe /dev/sdr
[ceph_deploy.conf][DEBUG ] found configuration file at: /home/lacadmin/.cephdeploy.conf
[ceph_deploy.cli][INFO ] Invoked (1.5.25): /usr/local/bin/ceph-deploy osd --dmcrypt --fs-type btrfs create kh28-3.osdc.io:sdr:/dev/sdp2
[ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks kh28-3.osdc.io:/dev/sdr:/dev/sdp2
[kh28-3.osdc.io][DEBUG ] connection detected need for sudo
[kh28-3.osdc.io][DEBUG ] connected to host: kh28-3.osdc.io
[kh28-3.osdc.io][DEBUG ] detect platform information from remote host
[kh28-3.osdc.io][DEBUG ] detect machine type
[ceph_deploy.osd][INFO ] Distro info: Ubuntu 14.04 trusty
[ceph_deploy.osd][DEBUG ] Deploying osd to kh28-3.osdc.io
[kh28-3.osdc.io][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[kh28-3.osdc.io][INFO ] Running command: sudo udevadm trigger --subsystem-match=block --action=add
[ceph_deploy.osd][DEBUG ] Preparing host kh28-3.osdc.io disk /dev/sdr journal /dev/sdp2 activate True
[kh28-3.osdc.io][INFO ] Running command: sudo ceph-disk -v prepare --fs-type btrfs --dmcrypt --dmcrypt-key-dir /etc/ceph/dmcrypt-keys --cluster ceph -- /dev/sdr /dev/sdp2
[kh28-3.osdc.io][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=fsid
[kh28-3.osdc.io][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mkfs_options_btrfs
[kh28-3.osdc.io][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mkfs_options_btrfs
[kh28-3.osdc.io][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mount_options_btrfs
[kh28-3.osdc.io][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=osd_journal_size
[kh28-3.osdc.io][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_cryptsetup_parameters
[kh28-3.osdc.io][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_dmcrypt_key_size
[kh28-3.osdc.io][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_dmcrypt_type
[kh28-3.osdc.io][WARNIN] DEBUG:ceph-disk:Creating journal file /dev/sdp2 with size 0 (ceph-osd will resize and allocate)
[kh28-3.osdc.io][WARNIN] DEBUG:ceph-disk:Journal is file /dev/sdp2
[kh28-3.osdc.io][WARNIN] WARNING:ceph-disk:OSD will not be hot-swappable if journal is not the same device as the osd data
[kh28-3.osdc.io][WARNIN] DEBUG:ceph-disk:Creating osd partition on /dev/sdr
[kh28-3.osdc.io][WARNIN] INFO:ceph-disk:Running command: /sbin/sgdisk --largest-new=1 --change-name=1:ceph data --partition-guid=1:c1879421-bcd0-4419-bc96-63d2d51176db --typecode=1:89c57f98-2fe5-4dc0-89c1-5ec00ceff2be -- /dev/sdr
[kh28-3.osdc.io][DEBUG ] The operation has completed successfully.
[kh28-3.osdc.io][WARNIN] DEBUG:ceph-disk:Calling partprobe on created device /dev/sdr
[kh28-3.osdc.io][WARNIN] INFO:ceph-disk:Running command: /sbin/partprobe /dev/sdr
[kh28-3.osdc.io][WARNIN] INFO:ceph-disk:Running command: /sbin/udevadm settle
[kh28-3.osdc.io][WARNIN] INFO:ceph-disk:Running command: /sbin/cryptsetup --batch-mode --key-file /etc/ceph/dmcrypt-keys/c1879421-bcd0-4419-bc96-63d2d51176db.luks.key luksFormat /dev/sdr1
[kh28-3.osdc.io][WARNIN] INFO:ceph-disk:Running command: /sbin/cryptsetup --key-file /etc/ceph/dmcrypt-keys/c1879421-bcd0-4419-bc96-63d2d51176db.luks.key luksOpen /dev/sdr1 c1879421-bcd0-4419-bc96-63d2d51176db
[kh28-3.osdc.io][WARNIN] DEBUG:ceph-disk:Creating btrfs fs on /dev/mapper/c1879421-bcd0-4419-bc96-63d2d51176db
[kh28-3.osdc.io][WARNIN] INFO:ceph-disk:Running command: /sbin/mkfs -t btrfs -m single -l 32768 -n 32768 -- /dev/mapper/c1879421-bcd0-4419-bc96-63d2d51176db
[kh28-3.osdc.io][WARNIN] Turning ON
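One workaround that may be worth trying, sketched only, using the device names from above: let ceph-disk own the whole journal device instead of pre-creating partitions on it, so it creates and GPT-tags one journal partition per OSD itself:

    # wipe the journal SSD and hand the whole device (not a partition) to ceph-deploy
    ceph-deploy disk zap kh28-1:sdab
    ceph-deploy osd --dmcrypt --fs-type btrfs create kh28-1:sde:sdab
    ceph-deploy osd --dmcrypt --fs-type btrfs create kh28-1:sdf:sdab
    # ceph-disk should then carve out and tag one journal partition per prepared OSD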
Re: [ceph-users] Issue in communication of swift client and radosgw
Hi, please respond. Regards, Bindu

On Fri, Jul 3, 2015 at 11:52 AM, Bindu Kharb bindu21in...@gmail.com wrote:
[...]
[ceph-users] rbd image-meta
Hello, I am trying to use rbd image-meta set. I get an error from rbd that this command is not recognized, yet it is documented in the rbd documentation: http://ceph.com/docs/next/man/8/rbd/ I am using the Hammer release deployed using ceph-deploy on Ubuntu 14.04. Is image-meta set supported in rbd in the Hammer release? Any help much appreciated. /Maged ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Issue in communication of swift client and RADOSGW
Hi, I am trying to use swift as frontend with ceph storage. I have a small cluster(1MON, 2OSD). My cluster is working fine. I have installed radosgw on one of my machine and radosgw(gateway1) is also up and communicating with cluster. Now I have installed swift client and created user and subuser. But I am unable to get bucket for the user. Below is my config file : /etc/ceph/ceph.conf [global] public_network = 172.18.59.0/24 osd_pool_default_size = 2 fsid = 17848c62-d69e-4991-a4dd-298358bb19ea mon_initial_members = ceph4-Standard-PC-i440FX-PIIX-1996 mon_host = 172.18.59.205 auth_cluster_required = cephx auth_service_required = cephx auth_client_required = cephx filestore_xattr_use_omap = true debug ms = 1 debug rgw = 20 [client.radosgw.gateway1] host = ceph-Veriton-Series #rgw_dns_name = 172.18.59.201 rgw_url = http://172.18.59.201:7481; #rgw_admin=admin keyring = /etc/ceph/keyring.radosgw.gateway1 rgw socket path = /var/run/ceph/ceph-client.radosgw.gateway1.asok #rgw frontends=civetweb port=7481 log file = /var/log/radosgw/ceph-client.radosgw.gateway1.log rgw print continue = false The file at location /etc/apache2/conf-available/gateway1.conf: VirtualHost *:80 ServerName 172.18.59.201.ceph-Veriton-Series ServerAdmin ceph@172.18.59.201 DocumentRoot /var/www # rewrting rules only need for amazon s3 RewriteEngine On RewriteRule ^/([a-zA-Z0-9-_.]*)([/]?.*) /s3gw.fcgi?page=$1params=$2%{QUERY_STRING} [E=HTTP_AUTHORIZATION:%{HTTP:Authorization},L] FastCgiExternalServer /var/www/s3gw.fcgi -socket /var/run/ceph/ceph-client.radosgw.gateway1.asok IfModule mod_fastcgi.c Directory /var/www Options +ExecCGI AllowOverride All SetHandler fastcgi-script Order allow,deny Allow from all AuthBasicAuthoritative Off /Directory /IfModule AllowEncodedSlashes On ErrorLog /var/log/apache2/error.log CustomLog /var/log/apache2/access.log combined ServerSignature Off /VirtualHost My cluster state is: ceph@ceph-Veriton-Series:/etc/apache2/conf-available$ ceph -s cluster 17848c62-d69e-4991-a4dd-298358bb19ea health HEALTH_OK monmap e1: 1 mons at {ceph4-Standard-PC-i440FX-PIIX-1996= 172.18.59.205:6789/0}, election epoch 1, quorum 0 ceph4-Standard-PC-i440FX-PIIX-1996 osdmap e1071: 2 osds: 2 up, 2 in pgmap v3493: 264 pgs, 12 pools, 1145 kB data, 59 objects 82106 MB used, 79394 MB / 166 GB avail 264 active+clean ceph@ceph-Veriton-Series:/etc/apache2/conf-available$ user info: sudo radosgw-admin user info --uid=testuser { user_id: testuser, display_name: First User, email: , suspended: 0, max_buckets: 1000, auid: 0, subusers: [ { id: testuser:swift, permissions: none}], keys: [ { user: testuser, access_key: NC4E8QHUSNDWDX18M6GB, secret_key: kRnFVL\/Z5oUur15E+CNbGPqCLDpBV1AgvLHTos7T}, { user: testuser:swift, access_key: R8UYRI7HXNW05BTJE2N7, secret_key: }], swift_keys: [ { user: testuser:swift, secret_key: eSWgLkDXTBPxOKf2cMWDdHwZPuFHAnDwQ3aUYXRF}], caps: [], op_mask: read, write, delete, default_placement: , placement_tags: [], bucket_quota: { enabled: false, max_size_kb: -1, max_objects: -1}, user_quota: { enabled: false, max_size_kb: -1, max_objects: -1}, temp_url_keys: []} I am using below script to get bucket: import boto import boto.s3.connection access_key = 'NC4E8QHUSNDWDX18M6GB' secret_key = 'kRnFVL\/Z5oUur15E+CNbGPqCLDpBV1AgvLHTos7T' conn = boto.connect_s3( aws_access_key_id = access_key, aws_secret_access_key = secret_key, host = 'ceph-Veriton-Series', is_secure=False, calling_format = boto.s3.connection.OrdinaryCallingFormat(), ) bucket = conn.create_bucket('my-new-bucket') for bucket in conn.get_all_buckets(): 
print {name}\t{created}.format( name = bucket.name, created = bucket.creation_date, ) *When I run the script below is the error:* *ceph@ceph-Veriton-Series:~$ python s3test.py Traceback (most recent call last): File s3test.py, line 12, in modulebucket = conn.create_bucket('my-new-bucket') File /usr/lib/python2.7/dist-packages/boto/s3/connection.py, line 504, in create_bucketresponse.status, response.reason, body)boto.exception.S3ResponseError: S3ResponseError: 405 Method Not AllowedNone* Below are the logs from log file: 015-07-03 11:46:38.940247 af9f7b40 1 -- 172.18.59.201:0/1001130 -- 172.18.59.204:6800/3675 -- osd_op(client.5314.0:126 gc.24 [call lock.unlock] 14.8bdc9d ondisk+write e1071) v4 -- ?+0 0xb3c08ae8 con 0xb7f6f1e8 2015-07-03 11:46:39.039568 b3bffb40 1 -- 172.18.59.201:0/1001130 == osd.0 172.18.59.204:6800/3675 109 osd_op_reply(126 gc.24 [call] v1071'856
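One quick sanity check, stated as an assumption about where the 405 might come from rather than a definitive diagnosis: make sure the boto request is actually reaching radosgw and not Apache's default vhost, for example by querying the endpoints from the config above directly:

    # radosgw should answer with an XML ListAllMyBucketsResult (or an S3 error document);
    # a plain Apache error/index page instead means the vhost/fastcgi wiring isn't being hit
    curl -v http://172.18.59.201/
    curl -v http://172.18.59.201:7481/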
Re: [ceph-users] Issue in communication of swift client and radosgw
Hi ceph users, please respond to my query. Regards, Bindu

On Fri, Jul 3, 2015 at 11:52 AM, Bindu Kharb bindu21in...@gmail.com wrote:
[...]
[ceph-users] [ANN] ceph-deploy 1.5.26 released
Hi everyone, This is announcing a new release of ceph-deploy that focuses on usability improvements.

- Most of the help menus for ceph-deploy subcommands (e.g. “ceph-deploy mon” and “ceph-deploy osd”) have been improved to be more context aware, such that help for “ceph-deploy osd create --help” and “ceph-deploy osd zap --help” returns different output specific to the command. Previously it would show generic help for “ceph-deploy osd”. Additionally, the list of optional arguments shown for the command is always correct for the subcommand in question. Previously the options shown were the aggregate of all options.
- ceph-deploy now points to git.ceph.com for downloading GPG keys
- ceph-deploy will now work on the Mint Linux distribution (by pointing to Ubuntu packages)
- SUSE distro users will now be pointed to SUSE packages by default, as there have not been updated SUSE packages on ceph.com in quite some time.

Full changelog is available at: http://ceph.com/ceph-deploy/docs/changelog.html#id1

New packages are available in the usual places of ceph.com hosted repos and PyPI. Cheers, - Travis ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] rbd image-meta
Image metadata isn't supported by Hammer; Infernalis supports it. On Mon, Jul 13, 2015 at 11:29 PM, Maged Mokhtar magedsmokh...@gmail.com wrote: Hello, I am trying to use rbd image-meta set. I get an error from rbd that this command is not recognized, yet it is documented in the rbd documentation: http://ceph.com/docs/next/man/8/rbd/ I am using the Hammer release deployed using ceph-deploy on Ubuntu 14.04. Is image-meta set supported in rbd in the Hammer release? Any help much appreciated. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Best Regards, Wheat ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
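For reference, on a release that does ship it (Infernalis and later), the usage looks roughly like this; the pool/image name and the key are placeholders:

    rbd image-meta set rbd/myimage conf_rbd_cache true
    rbd image-meta get rbd/myimage conf_rbd_cache
    rbd image-meta list rbd/myimage
    rbd image-meta remove rbd/myimage conf_rbd_cache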
[ceph-users] debugging ceph-deploy warning: could not open file descriptor -1
The docker/distribution project runs a continuous integration VM using CircleCI, and part of the VM setup installs Ceph packages using ceph-deploy. This has been working well for quite a while, but we are seeing a failure running `ceph-deploy install --release hammer`. The snippet below is where it looks like the first problem shows up.

...
[box156][DEBUG ] Get:24 http://ceph.com/debian-hammer/ precise/main ceph-mds amd64 0.94.2-1precise [10.5 MB]
[box156][DEBUG ] Get:25 http://ceph.com/debian-hammer/ precise/main radosgw amd64 0.94.2-1precise [3,619 kB]
[box156][WARNIN] E: Could not open file descriptor -1
[box156][WARNIN] E: Prior errors apply to /var/cache/apt/archives/parted_2.3-19ubuntu1_amd64.deb
...

On the surface it seems that the problem is coming from apt-get under the hood. Any pointers here? It doesn't seem like anything has changed configuration-wise. The full build log can be found here, which starts off with the ceph-deploy command that is failing: https://circleci.com/gh/docker/distribution/1848 Thanks, -Noah ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] different omap format in one cluster (.sst + .ldb) - new installed OSD-node don't start any OSD
Hi, I use ceph 0.94 from the wheezy repo (deb http://eu.ceph.com/debian-hammer wheezy main) inside jessie. 0.94.1 is installable without trouble, but an upgrade to 0.94.2 doesn't work correctly:

dpkg -l | grep ceph
ii ceph 0.94.1-1~bpo70+1 amd64 distributed storage and file system
ii ceph-common 0.94.2-1~bpo70+1 amd64 common utilities to mount and interact with a ceph storage cluster
ii ceph-fs-common 0.94.2-1~bpo70+1 amd64 common utilities to mount and interact with a ceph file system
ii ceph-fuse 0.94.2-1~bpo70+1 amd64 FUSE-based client for the Ceph distributed file system
ii ceph-mds 0.94.2-1~bpo70+1 amd64 metadata server for the ceph distributed file system
ii libcephfs1 0.94.2-1~bpo70+1 amd64 Ceph distributed file system client library
ii python-cephfs 0.94.2-1~bpo70+1 amd64 Python libraries for the Ceph libcephfs library

This is the reason why I switched back to wheezy (and a clean 0.94.2), but then all OSDs on that node failed to start. Switching back to the jessie system disk didn't solve this problem, because only 3 OSDs started again... My conclusion is: if one of my (partly broken) jessie OSD nodes dies now (e.g. a failed system SSD), I need less than an hour for a new system (wheezy), around two hours to reinitialize all OSDs (format new, install ceph) and around two days to refill the whole node. Udo

On 23.07.2015 13:21, Haomai Wang wrote: Did you use an upstream ceph version previously? Or did you shut down the running ceph-osd when upgrading the osd? How many osds hit this problem? This assert failure means that the osd detects an upgraded pg meta object but failed to read the meta keys (or is missing one key) from the object.

On Thu, Jul 23, 2015 at 7:03 PM, Udo Lembke ulem...@polarzone.de wrote: On 21.07.2015 12:06, Udo Lembke wrote: Hi all, ... Normally I would say, if one OSD node dies, I simply reinstall the OS and ceph and I'm back again... but this looks bad for me. Unfortunately the system also didn't start 9 OSDs when I switched back to the old system disk... (only three of the big OSDs are running well) What is the best solution for that? Empty one node (crush weight 0), fresh reinstall OS/ceph, reinitialise all OSDs? This will take a long, long time, because we use 173TB in this cluster...

Hi, answering myself in case anybody has similar issues and finds this posting. Emptying the whole node takes too long. I used the puppet wheezy system and had to recreate all OSDs (in this case I needed to empty the first blocks of the journal before creating the OSD again). Udo ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
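For completeness, the "empty one node first" route Udo mentions (and rejected as too slow for 173TB) would look roughly like this; the OSD ids are placeholders:

    # drain the OSDs on the node to be reinstalled, then wait for all PGs to be active+clean
    ceph osd crush reweight osd.4 0
    ceph osd crush reweight osd.5 0
    ceph -w          # watch recovery until the cluster is back to HEALTH_OK
    ceph osd out 4
    ceph osd out 5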
Re: [ceph-users] RADOS + deep scrubbing performance issues in production environment
All IO drops to ZERO IOPS for 1-15 minutes during deep-scrub on my cluster. There is clearly a locking bug! I have VMs - every day, several times, disk IO on (sometimes all of) them _completely_ stops. The disk queue grows, 0 IOPS are performed, services die with timeouts... At the same time the Ceph cluster (where the VM images are stored) is doing a deep scrub. No fiddling with priorities and the number of different threads helps. Actually, making the scrub slower makes those delays longer - so there is clearly a bug with locking. I have been experiencing this for two years already; since then we have tried everything and upgraded our cluster several times! Nothing helps! ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
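For anyone who wants to experiment anyway, these are the knobs usually tried for softening deep-scrub impact; the values are only illustrative and, per the report above, tuning of this kind did not cure the stalls here:

    # inject on running OSDs, or set the equivalents under [osd] in ceph.conf
    ceph tell osd.* injectargs '--osd_scrub_sleep 0.1'
    ceph tell osd.* injectargs '--osd_scrub_chunk_max 5'
    ceph tell osd.* injectargs '--osd_disk_thread_ioprio_class idle --osd_disk_thread_ioprio_priority 7'
    # note: the ioprio settings only take effect with the CFQ disk scheduler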
[ceph-users] Enclosure power failure pausing client IO till all connected hosts up
Hi all, Setup details: Two storage enclosures, each connected to 4 OSD nodes (shared storage). The failure domain is at the chassis (enclosure) level. Replication count is 2. Each host is allotted 4 drives. I have active client IO running on the cluster (random write profile with 4M block size, 64 queue depth). One of the enclosures had a power loss, so all OSDs from the hosts connected to this enclosure went down as expected. But client IO got paused. After some time the enclosure and the hosts connected to it came up, and all OSDs on those hosts came up. Until then, the cluster was not serving IO. Once all OSDs on the hosts pertaining to that enclosure came up, client IO resumed. Can anybody help me understand why the cluster was not serving IO during the enclosure failure? Or is it a bug? -Thanks & regards, Mallikarjun Biradar ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Multi-DC Ceph replication
Hi, We are trying to implement Ceph and have a really huge issue with replication between DCs. The issue we have is related to the replication setup in our infrastructure: a single region, 2 zones in different datacenters. While trying to configure replication we receive the message below. We wonder if this replication is really working, and whether you know anyone who has managed to configure it and run it in production?

... 'User-Agent': 'Boto/2.20.1 Python/2.7.6 Linux/3.13.0-45-generic', 'x-amz-copy-source': 'testowy_bucket/obiekt1.png', 'Date': 'Wed, 01 Jul 2015 12:57:03 GMT', 'Content-Type': 'application/json; charset=UTF-8', 'Authorization': 'AWS B:nD/AJX8ezov3qOCASK6Irz7yq30='}
2015-07-01 14:57:03,538 4782 [boto][DEBUG ] Path: /testowy_bucket/obiekt1.png?rgwx-op-id=rgw0%3A4775%3A3&rgwx-source-zone=pl-kra&rgwx-client-id=radosgw-agent
2015-07-01 14:57:03,538 4782 [boto][DEBUG ] Headers: {'Content-Type': 'application/json; charset=UTF-8', 'x-amz-copy-source': 'testowy_bucket/obiekt1.png'}x-amz-copy-source:testowy_bucket/obiekt1.png/testowy_bucket/obiekt1.png
2015-07-01 14:57:03,675 4782 [radosgw_agent.worker][DEBUG ] object testowy_bucket/obiekt1.png not found on master, deleting from secondary
2015-07-01 14:57:03,724 4782 [boto][DEBUG ] path=/testowy_bucket/obiekt1.png
2015-07-01 14:57:03,724 4782 [boto][DEBUG ] auth_path=/testowy_bucket/obiekt1.png
2015-07-01 14:57:03,725 4782 [boto][DEBUG ] Path: /testowy_bucket/obiekt1.png/testowy_bucket/obiekt1.png
2015-07-01 14:57:03,760 4782 [radosgw_agent.worker][DEBUG ] syncing object testowy_bucket/obiekt1.pngx-amz-copy-source:testowy_bucket/obiekt1.png/testowy_bucket/obiekt1.png
2015-07-01 14:57:03,761 4782 [boto][DEBUG ] url = 'http://s3.5stor-dc5-test.local/testowy_bucket/obiekt1.png' headers={'Content-Length': '0', 'User-Agent': 'Boto/2.20.1 Python/2.7.6 Linux/3.13.0-45-generic', 'x-amz-copy-source': 'testowy_bucket/obiekt1.png', 'Date': 'Wed, 01 Jul 2015 12:57:03 GMT', 'Content-Type': 'application/json; charset=UTF-8', 'Authorization': 'AWS B:nD/AJX8ezov3qOCASK6Irz7yq30='}
2015-07-01 14:57:03,761 4782 [boto][DEBUG ] Path: /testowy_bucket/obiekt1.png?rgwx-op-id=rgw0%3A4775%3A4&rgwx-source-zone=pl-kra&rgwx-client-id=radosgw-agent
2015-07-01 14:57:03,761 4782 [boto][DEBUG ] Headers: {'Content-Type': 'application/json; charset=UTF-8', 'x-amz-copy-source': 'testowy_bucket/obiekt1.png'}x-amz-copy-source:testowy_bucket/obiekt1.png/testowy_bucket/obiekt1.png
2015-07-01 14:57:03,854 4782 [radosgw_agent.worker][DEBUG ] object testowy_bucket/obiekt1.png not found on master, deleting from secondary
2015-07-01 14:57:03,899 4782 [boto][DEBUG ] path=/testowy_bucket/obiekt1.png
2015-07-01 14:57:03,899 4782 [boto][DEBUG ] auth_path=/testowy_bucket/obiekt1.png
2015-07-01 14:57:03,899 4782 [boto][DEBUG ] Path: /testowy_bucket/obiekt1.png/testowy_bucket/obiekt1.png

Pawel Komorowski
Product Owner
M: +48 664 434 518
Grunwaldzka 182
60-166 Poznan, Poland
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Issue in communication of swift client and radosgw
You should add the required capabilities to your user:

# radosgw-admin caps add --uid=testuser --caps=users=*
# radosgw-admin caps add --uid=testuser --caps=buckets=*
# radosgw-admin caps add --uid=testuser --caps=metadata=*
# radosgw-admin caps add --uid=testuser --caps=zone=*

On 3 July 2015 at 08:22, Bindu Kharb bindu21in...@gmail.com wrote:
[...]
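Independent of the caps, it may also be worth verifying the swift subuser end to end with the standard swift client. The auth endpoint below is an assumption (adjust it to wherever radosgw actually listens); the user and secret key are the ones from the post:

    swift -A http://172.18.59.201/auth/1.0 -U testuser:swift -K 'eSWgLkDXTBPxOKf2cMWDdHwZPuFHAnDwQ3aUYXRF' stat
    swift -A http://172.18.59.201/auth/1.0 -U testuser:swift -K 'eSWgLkDXTBPxOKf2cMWDdHwZPuFHAnDwQ3aUYXRF' post my-new-container
    swift -A http://172.18.59.201/auth/1.0 -U testuser:swift -K 'eSWgLkDXTBPxOKf2cMWDdHwZPuFHAnDwQ3aUYXRF' list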
Re: [ceph-users] el6 repo problem?
The packages were probably rebuilt without changing their name/version (bad idea btw) and the metadata either wasn't regenerated because of that or because of some other problem. You can mirror it and generate your own metadata, or install the packages by hand until it gets fixed. Jan

P.S. In my experience it's best to always put a build number in the filename to avoid stuff like this, unless you can make sure you generate the same binary package every time (and that's pretty hard usually).

On 23 Jul 2015, at 15:14, Samuel Taylor Liston sam.lis...@utah.edu wrote:
[...]
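Jan's mirror-it-yourself suggestion can be done roughly like this; the repo id and paths are examples, and it assumes the yum-utils and createrepo packages are installed:

    # pull the packages from the existing Ceph repo definition and build fresh metadata locally
    reposync --repoid=Ceph --download_path=/srv/mirror
    createrepo /srv/mirror/Ceph
    # then point a .repo file at baseurl=file:///srv/mirror/Ceph and install from that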
[ceph-users] Cephfs and ERESTARTSYS on writes
Hello, I'm having an issue with nginx writing to cephfs. Often I'm getting:

writev() /home/ceph/temp/44/94/1/119444 failed (4: Interrupted system call) while reading upstream

Looking with strace, this happens:

... write(65, e\314\366\36\302..., 65536) = ? ERESTARTSYS (To be restarted)

It happens after the first 4 MB (exactly) are written; the subsequent write gets ERESTARTSYS (sometimes, but more rarely, it fails after the first 32 or 64 MB, etc. are written). Apparently nginx doesn't expect this and doesn't handle it, so it cancels the writes and deletes the partial file. Is it possible Ceph cannot find the destination PG fast enough and returns ERESTARTSYS? Is there any way to fix this behavior or reduce it? Regards, Vedran ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Help with radosgw admin ops
Hi, I'm trying to use curl for rados admin ops requests. I have problems with the keys; you use this authorization: Authorization: AWS {access-key}:{hash-of-header-and-secret}. Where can I get the hash-of-header-and-secret? Info of the user: radosgw-admin user info --uid=usuario1 { user_id: usuario1, display_name: usuario1, email: , suspended: 0, max_buckets: 1000, auid: 0, subusers: [], keys: [ { user: usuario1, access_key: claveacceso, secret_key: temporal } ], swift_keys: [], caps: [ { type: usage, perm: write }, { type: user, perm: write } ], op_mask: read, write, delete, default_placement: , placement_tags: [], bucket_quota: { enabled: false, max_size_kb: -1, max_objects: -1 }, user_quota: { enabled: false, max_size_kb: -1, max_objects: -1 }, temp_url_keys: [] }

I made this script:

#!/bin/bash
token=claveacceso
secret=temporal
query=$1
date=`date -Ru`
header="PUT\n${content_md5}\n${content_type}\n${date}\n${query}"
sig=`echo -en ${header} | openssl sha1 -hmac ${secret} -binary | base64`
curl -i -X GET "http://10.0.2.10/admin/usage?format=json" -H "Date: ${date}" \
 -H "Authorization: AWS ${token}:${sig}" -H 'Host: adminnode'

The result is:

sh prueba3.sh
HTTP/1.1 403 Forbidden
Date: Fri, 03 Jul 2015 11:41:17 GMT
Server: Apache/2.4.6 (CentOS) OpenSSL/1.0.1e-fips
Accept-Ranges: bytes
Content-Length: 32
Content-Type: application/json

Version of ceph is: ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3). Version CentOS Linux release 7.1.1503 (Core). Could you give me documentation on how to use this? Thanks, Oscar. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
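Two things stand out in the script, stated as assumptions about the admin ops API rather than a verified fix: the StringToSign uses PUT while the request is a GET, and the signed resource has to be the canonicalized resource of the request, i.e. /admin/usage without the query string; reading usage also needs a usage=read cap, while the user above only has usage=write. A corrected sketch:

    #!/bin/bash
    token="claveacceso"
    secret="temporal"
    resource="/admin/usage"
    date=$(date -Ru)
    # StringToSign = VERB \n Content-MD5 \n Content-Type \n Date \n CanonicalizedResource
    string_to_sign="GET\n\n\n${date}\n${resource}"
    sig=$(echo -en "${string_to_sign}" | openssl sha1 -hmac "${secret}" -binary | base64)
    curl -i "http://10.0.2.10/admin/usage?format=json" \
      -H "Date: ${date}" \
      -H "Authorization: AWS ${token}:${sig}"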
Re: [ceph-users] Cephfs and ERESTARTSYS on writes
On Thu, Jul 23, 2015 at 1:17 PM, Vedran Furač vedran.fu...@gmail.com wrote: Hello, I'm having an issue with nginx writing to cephfs. Often I'm getting: writev() /home/ceph/temp/44/94/1/119444 failed (4: Interrupted system call) while reading upstream looking with strace, this happens: ... write(65, e\314\366\36\302..., 65536) = ? ERESTARTSYS (To be restarted) It happens after first 4MBs (exactly) are written, subsequent write gets ERESTARTSYS (sometimes, but more rarely, it fails after first 32 or 64MBs, etc are written). Apparently nginx doesn't expect this and doesn't handle it so it cancels writes and deletes this partial file. Is it possible Ceph cannot find the destination PG fast enough and returns ERESTARTSYS? Is there any way to fix this behavior or reduce it? That's...odd. Are you using the kernel client or ceph-fuse, and on which version? -Greg ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] any recommendation of using EnhanceIO?
Hi, I asked the same question a week or so ago (just search the mailing list archives for EnhanceIO :) and got some interesting answers. It looks like the project is pretty much dead since it was bought out by HGST; even their website has some broken links in regard to EnhanceIO. I'm keen to try flashcache or bcache (it's been in the mainline kernel for some time). Dominik On 1 Jul 2015, at 21:13, German Anders gand...@despegar.com wrote: Hi cephers, Is there anyone out there who has implemented EnhanceIO in a production environment? Any recommendations? Any perf output to share with the diff between using it and not? Thanks in advance, German ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
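In case it helps, a minimal bcache setup (the in-kernel option mentioned above) looks roughly like this; the device names are placeholders and the commands are destructive, so treat this as a sketch only:

    # format the slow backing device and the SSD caching device
    make-bcache -B /dev/sdb
    make-bcache -C /dev/nvme0n1
    # register them (udev normally does this on its own)
    echo /dev/sdb > /sys/fs/bcache/register
    echo /dev/nvme0n1 > /sys/fs/bcache/register
    # attach the cache set to the backing device (UUID from 'bcache-super-show /dev/nvme0n1')
    echo <cache-set-uuid> > /sys/block/bcache0/bcache/attach
    # optional: writeback caching for better write latency
    echo writeback > /sys/block/bcache0/bcache/cache_mode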
Re: [ceph-users] debugging ceps-deploy warning: could not open file descriptor -1
Nevermind. I see that `ceph-deploy mon create-initial` has stopped accepting the trailing hostname which was causing the failure. I don't know if those problems above I showed are actually anything to worry about :) On Tue, Jul 21, 2015 at 3:17 PM, Noah Watkins noahwatk...@gmail.com wrote: The docker/distribution project runs a continuous integration VM using CircleCI, and part of the VM setup installs Ceph packages using ceph-deploy. This has been working well for quite a while, but we are seeing a failure running `ceph-deploy install --release hammer`. The snippet is here where it looks the first problem shows up. ... [box156][DEBUG ] Get:24 http://ceph.com/debian-hammer/ precise/main ceph-mds amd64 0.94.2-1precise [10.5 MB] [box156][DEBUG ] Get:25 http://ceph.com/debian-hammer/ precise/main radosgw amd64 0.94.2-1precise [3,619 kB] [box156][WARNIN] E: Could not open file descriptor -1 [box156][WARNIN] E: Prior errors apply to /var/cache/apt/archives/parted_2.3-19ubuntu1_amd64.deb ... On the surface it seems that the problem is coming from apt-get under the hood. Any pointers here? It doesn't seem like anything has changed configuration wise. The full build log can be found here which starts off with the ceph-deploy command that is failing: https://circleci.com/gh/docker/distribution/1848 Thanks, -Noah ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
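For anyone hitting the same thing, the sequence that works with current ceph-deploy (create-initial takes no hostname; it uses the monitors declared by "ceph-deploy new") is roughly:

    ceph-deploy new box156
    ceph-deploy install --release hammer box156
    ceph-deploy mon create-initial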
Re: [ceph-users] el6 repo problem?
I am having the same issue and haven't figured out a resolution yet. The repo is pointing to a valid URL, and I can wget the packages from that URL, but yum complains about them. My initial thought is that something is screwy with the md5sum either on package versions in the repo, or in my rpm db, but I have not confirmed that. Samuel T. Liston Ctr. for High Perf. Computing Univ. of Utah 801.232.6932 On Jul 23, 2015, at 7:05 AM, Wayne Betts wbe...@bnl.gov wrote: I'm trying to use the ceph el6 yum repo. Yesterday afternoon, I found yum complaining about 8 packages when trying to install or update ceph, such as this: (4/46): ceph-0.94.2-0.el6.x86_64.rpm | 21 MB 00:01 http://ceph.com/rpm-hammer/el6/x86_64/ceph-0.94.2-0.el6.x86_64.rpm: [Errno -1] Package does not match intended download. Suggestion: run yum --enablerepo=Ceph clean metadata The other packages with the same fault are libcephfs1-0.94.2-0.el6.x86_64 librbd1-0.94.2-0.el6.x86_64 python-rados-0.94.2-0.el6.x86_64 python-cephfs-0.94.2-0.el6.x86_64 librados2-0.94.2-0.el6.x86_64 python-rbd-0.94.2-0.el6.x86_64 ceph-common-0.94.2-0.el6.x86_64 This is happening on all three machines I've tried it on. I've tried cleaning the metadata on my hosts as per the suggestion, without any change. There was no trouble pulling these packages with yum on Tuesday, July 14, and I can still use wget to pull individual packages seemingly without any problem. Is anyone else experiencing the same problem? Can the repo maintainers look into this? (Rebuild the metadata, or flush their reverse proxy server(s) if any?) Any suggestions for me to try on the client side? -- -Wayne Betts STAR Computing Support at BNL Physics Dept. PO Box 5000 Upton, NY 11973 wbe...@bnl.gov 631-344-3285 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Cephfs and ERESTARTSYS on writes
On 07/23/2015 03:20 PM, Gregory Farnum wrote: On Thu, Jul 23, 2015 at 1:17 PM, Vedran Furač vedran.fu...@gmail.com wrote: Is it possible Ceph cannot find the destination PG fast enough and returns ERESTARTSYS? Is there any way to fix this behavior or reduce it? That's...odd. Are you using the kernel client or ceph-fuse, and on which version? Not seeing write errors with ceph-fuse, but it's slow. Regards, Vedran ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ruby bindings for Librados
- Original Message - From: Ken Dreyer kdre...@redhat.com To: ceph-users@lists.ceph.com Sent: Tuesday, July 14, 2015 9:06:01 PM Subject: Re: [ceph-users] Ruby bindings for Librados On 07/13/2015 02:11 PM, Wido den Hollander wrote: On 07/13/2015 09:43 PM, Corin Langosch wrote: Hi Wido, I'm the dev of https://github.com/netskin/ceph-ruby and still use it in production on some systems. It has everything I need so I didn't develop any further. If you find any bugs or need new features, just open an issue and I'm happy to have a look. Ah, that's great! We should look into making a Ruby binding official and moving it to Ceph's Github project. That would make it more clear for end-users. I see that RADOS namespaces are currently not implemented in the Ruby bindings. Not many bindings have them though. Might be worth looking at. I'll give the current bindings a try btw! I'd like to see this happen too. Corin, would you be amenable to moving this under the ceph GitHub org? You'd still have control over it, similar to the way Wido manages https://github.com/ceph/phprados After some off-list email with Wido and Corin, I've set up https://github.com/ceph/ceph-ruby and a ceph-ruby GitHub team with Corin as the admin (similar to Wido's admin rights to phprados). Have fun! - Ken ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Enclosure power failure pausing client IO till all connected hosts up
(Adding devel list to the CC) Hi Eric, To add more context to the problem: Min_size was set to 1 and replication size is 2. There was a flaky power connection to one of the enclosures. With min_size 1, we were able to continue the IO's, and recovery was active once the power comes back. But if there is a power failure again when recovery is in progress, some of the PGs are going to down+peering state. Extract from pg query. $ ceph pg 1.143 query { state: down+peering, snap_trimq: [], epoch: 3918, up: [ 17], acting: [ 17], info: { pgid: 1.143, last_update: 3166'40424, last_complete: 3166'40424, log_tail: 2577'36847, last_user_version: 40424, last_backfill: MAX, purged_snaps: [], .. recovery_state: [ { name: Started\/Primary\/Peering\/GetInfo, enter_time: 2015-07-15 12:48:51.372676, requested_info_from: []}, { name: Started\/Primary\/Peering, enter_time: 2015-07-15 12:48:51.372675, past_intervals: [ { first: 3147, last: 3166, maybe_went_rw: 1, up: [ 17, 4], acting: [ 17, 4], primary: 17, up_primary: 17}, { first: 3167, last: 3167, maybe_went_rw: 0, up: [ 10, 20], acting: [ 10, 20], primary: 10, up_primary: 10}, { first: 3168, last: 3181, maybe_went_rw: 1, up: [ 10, 20], acting: [ 10, 4], primary: 10, up_primary: 10}, { first: 3182, last: 3184, maybe_went_rw: 0, up: [ 20], acting: [ 4], primary: 4, up_primary: 20}, { first: 3185, last: 3188, maybe_went_rw: 1, up: [ 20], acting: [ 20], primary: 20, up_primary: 20}], probing_osds: [ 17, 20], blocked: peering is blocked due to down osds, down_osds_we_would_probe: [ 4, 10], peering_blocked_by: [ { osd: 4, current_lost_at: 0, comment: starting or marking this osd lost may let us proceed}, { osd: 10, current_lost_at: 0, comment: starting or marking this osd lost may let us proceed}]}, { name: Started, enter_time: 2015-07-15 12:48:51.372671}], agent_state: {}} And Pgs are not coming to active+clean till power is resumed again. During this period no IOs are allowed to the cluster. Not able to follow why the PGs are ending up in peering state? Each Pg has two copies in both the enclosures. If one of enclosure is down for some time, should be able to serve IO's from the second one. That was true, if no recovery IO is involved. In case of any recovery, we are ending up some Pg's in down and peering state. Thanks, Varada -Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Eric Eastman Sent: Thursday, July 23, 2015 8:37 PM To: Mallikarjun Biradar mallikarjuna.bira...@gmail.com Cc: ceph-users@lists.ceph.com Subject: Re: [ceph-users] Enclosure power failure pausing client IO till all connected hosts up You may want to check your min_size value for your pools. If it is set to the pool size value, then the cluster will not do I/O if you loose a chassis. On Sun, Jul 5, 2015 at 11:04 PM, Mallikarjun Biradar mallikarjuna.bira...@gmail.com wrote: Hi all, Setup details: Two storage enclosures each connected to 4 OSD nodes (Shared storage). Failure domain is Chassis (enclosure) level. Replication count is 2. Each host has allotted with 4 drives. I have active client IO running on cluster. (Random write profile with 4M block size 64 Queue depth). One of enclosure had power loss. So all OSD's from hosts that are connected to this enclosure went down as expected. But client IO got paused. After some time enclosure hosts connected to it came up. And all OSD's on that hosts came up. Till this time, cluster was not serving IO. Once all hosts OSD's pertaining to that enclosure came up, client IO resumed. 
Can anybody explain why the cluster was not serving IO during the enclosure failure, or is it a bug? -Thanks & regards, Mallikarjun Biradar
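[Editor's note: a hedged sketch of how the blocked peering shown in the pg query above can be inspected and, as a last resort, unblocked. The PG id and OSD ids are taken from the quoted output; "ceph osd lost" discards the data on that OSD, so with size=2 this can mean real data loss and should only be used if osd.4 and osd.10 truly cannot be brought back.]
$ ceph health detail | grep down+peering
$ ceph pg 1.143 query | grep -A 8 peering_blocked_by
$ ceph osd lost 4 --yes-i-really-mean-it      # last resort only
$ ceph osd lost 10 --yes-i-really-mean-it     # last resort only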
Re: [ceph-users] Cephfs and ERESTARTSYS on writes
On 07/23/2015 03:20 PM, Gregory Farnum wrote: On Thu, Jul 23, 2015 at 1:17 PM, Vedran Furač vedran.fu...@gmail.com wrote: Hello, I'm having an issue with nginx writing to cephfs. Often I'm getting: writev() /home/ceph/temp/44/94/1/119444 failed (4: Interrupted system call) while reading upstream looking with strace, this happens: ... write(65, e\314\366\36\302..., 65536) = ? ERESTARTSYS (To be restarted) It happens after first 4MBs (exactly) are written, subsequent write gets ERESTARTSYS (sometimes, but more rarely, it fails after first 32 or 64MBs, etc are written). Apparently nginx doesn't expect this and doesn't handle it so it cancels writes and deletes this partial file. Is it possible Ceph cannot find the destination PG fast enough and returns ERESTARTSYS? Is there any way to fix this behavior or reduce it? That's...odd. Are you using the kernel client or ceph-fuse, and on which version? Sorry, forgot to mention, it's kernel client, tried both 3.10 and 4.1, but it's the same. Ceph is firefly. I'll also try fuse. Regards, Vedran ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Cephfs and ERESTARTSYS on writes
On Thu, Jul 23, 2015 at 5:37 PM, Vedran Furač vedran.fu...@gmail.com wrote: On 07/23/2015 04:19 PM, Ilya Dryomov wrote: On Thu, Jul 23, 2015 at 4:23 PM, Vedran Furač vedran.fu...@gmail.com wrote: On 07/23/2015 03:20 PM, Gregory Farnum wrote: That's...odd. Are you using the kernel client or ceph-fuse, and on which version? Sorry, forgot to mention, it's kernel client, tried both 3.10 and 4.1, but it's the same. Ceph is firefly. That's probably a wait_*() return value, meaning it timed out, so userspace logs might help understand what's going on. A separate issue is that we leak ERESTARTSYS to userspace - this needs to be fixed. Hmm, what's the timeout value? This happens even when ceph is nearly idle. When you mention logs, do you mean Ceph server logs? MON logs don't have anything special, OSD logs are full of: 2015-07-23 16:31:35.535622 7ff3fe020700 0 -- x.x.x.x:6849/27688 x.x.x.x:6841/27679 pipe(0x241e58c0 sd=183 :6849 s=2 pgs=1240 cs=127 l=0 c=0x19855de0).fault with nothing to send, going to standby 2015-07-23 16:31:42.492520 7ff401a53700 0 -- x.x.x.x:6849/27688 x.x.x.x:6841/27679 pipe(0x241e5080 sd=226 :6849 s=0 pgs=0 cs=0 l=0 c=0x21b31860).accept connect_seq 128 vs existing 127 state standby 2015-07-23 16:32:02.989102 7ff401851700 0 -- x.x.x.x:6849/27688 x.x.x.x:6854/27690 pipe(0x1916a680 sd=33 :43507 s=2 pgs=1366 cs=131 l=0 c=0x177e8680).fault with nothing to send, going to standby 2015-07-23 16:32:12.339357 7ff40144d700 0 -- x.x.x.x:6849/27688 x.x.x.x:6823/27279 pipe(0x241e7c80 sd=249 :6849 s=2 pgs=1246 cs=155 l=0 c=0x16ea46e0).fault with nothing to send, going to standby 2015-07-23 16:32:13.279426 7ff3fe828700 0 -- x.x.x.x:6849/27688 185.75.253.10:6810/9746 pipe(0x1c75e840 sd=72 :57221 s=2 pgs=1352 cs=149 l=0 c=0x147cbde0).fault with nothing to send, going to standby 2015-07-23 16:32:17.916440 7ff3fb3f4700 0 -- x.x.x.x:6849/27688 185.75.253.10:6810/9746 pipe(0x241e4000 sd=34 :6849 s=0 pgs=0 cs=0 l=0 c=0x21b2e160).accept connect_seq 150 vs existing 149 state standby 2015-07-23 16:32:22.922462 7ff40154e700 0 -- x.x.x.x:6849/27688 x.x.x.x:6823/27279 pipe(0x241e5e40 sd=216 :6849 s=0 pgs=0 cs=0 l=0 c=0x10089b80).accept connect_seq 156 vs existing 155 state standby ... Can you provide the full strace output? Thanks, Ilya ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
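[Editor's note: one way to capture the full strace being asked for here, assuming a single nginx worker is easy to identify; the pid lookup is only an example.]
$ strace -f -tt -s 64 -o /tmp/nginx-worker.strace -p $(pgrep -f 'nginx: worker' | head -1)
$ less /tmp/nginx-worker.strace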
Re: [ceph-users] any recommendation of using EnhanceIO?
I did some (non-ceph) work on these, and concluded that bcache was the best supported, most stable, and fastest. This was ~1 year ago, so take it with a grain of salt, but that's what I would recommend. Daniel - Original Message - From: Dominik Zalewski dzalew...@optlink.net To: German Anders gand...@despegar.com Cc: ceph-users ceph-users@lists.ceph.com Sent: Wednesday, July 1, 2015 5:28:10 PM Subject: Re: [ceph-users] any recommendation of using EnhanceIO? Hi, I asked the same question a week or so ago (just search the mailing list archives for EnhanceIO :) and got some interesting answers. Looks like the project is pretty much dead since it was bought out by HGST. Even their website has some broken links in regards to EnhanceIO. I’m keen to try flashcache or bcache (it's been in the mainline kernel for some time). Dominik On 1 Jul 2015, at 21:13, German Anders gand...@despegar.com wrote: Hi cephers, Is anyone out there running EnhanceIO in a production environment? Any recommendation? Any perf output to share showing the difference between using it and not? Thanks in advance, German ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
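[Editor's note: a minimal bcache setup sketch for readers following this recommendation, assuming bcache-tools and a bcache-enabled kernel are installed; /dev/sdb (backing HDD) and /dev/nvme0n1p1 (SSD cache partition) are placeholders.]
$ make-bcache -B /dev/sdb -C /dev/nvme0n1p1
$ echo /dev/sdb > /sys/fs/bcache/register      # only needed if udev did not register the device
$ mkfs.xfs /dev/bcache0
The resulting /dev/bcache0 can then be used like any other block device, e.g. as an OSD data disk.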
Re: [ceph-users] Cephfs and ERESTARTSYS on writes
On Thu, Jul 23, 2015 at 4:23 PM, Vedran Furač vedran.fu...@gmail.com wrote: On 07/23/2015 03:20 PM, Gregory Farnum wrote: On Thu, Jul 23, 2015 at 1:17 PM, Vedran Furač vedran.fu...@gmail.com wrote: Hello, I'm having an issue with nginx writing to cephfs. Often I'm getting: writev() /home/ceph/temp/44/94/1/119444 failed (4: Interrupted system call) while reading upstream looking with strace, this happens: ... write(65, e\314\366\36\302..., 65536) = ? ERESTARTSYS (To be restarted) It happens after first 4MBs (exactly) are written, subsequent write gets ERESTARTSYS (sometimes, but more rarely, it fails after first 32 or 64MBs, etc are written). Apparently nginx doesn't expect this and doesn't handle it so it cancels writes and deletes this partial file. Is it possible Ceph cannot find the destination PG fast enough and returns ERESTARTSYS? Is there any way to fix this behavior or reduce it? That's...odd. Are you using the kernel client or ceph-fuse, and on which version? Sorry, forgot to mention, it's kernel client, tried both 3.10 and 4.1, but it's the same. Ceph is firefly. That's probably a wait_*() return value, meaning it timed out, so userspace logs might help understand what's going on. A separate issue is that we leak ERESTARTSYS to userspace - this needs to be fixed. Thanks, Ilya ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Cephfs and ERESTARTSYS on writes
On 07/23/2015 04:19 PM, Ilya Dryomov wrote: On Thu, Jul 23, 2015 at 4:23 PM, Vedran Furač vedran.fu...@gmail.com wrote: On 07/23/2015 03:20 PM, Gregory Farnum wrote: That's...odd. Are you using the kernel client or ceph-fuse, and on which version? Sorry, forgot to mention, it's kernel client, tried both 3.10 and 4.1, but it's the same. Ceph is firefly. That's probably a wait_*() return value, meaning it timed out, so userspace logs might help understand what's going on. A separate issue is that we leak ERESTARTSYS to userspace - this needs to be fixed. Hmm, what's the timeout value? This happens even when ceph is nearly idle. When you mention logs, do you mean Ceph server logs? MON logs don't have anything special, OSD logs are full of: 2015-07-23 16:31:35.535622 7ff3fe020700 0 -- x.x.x.x:6849/27688 x.x.x.x:6841/27679 pipe(0x241e58c0 sd=183 :6849 s=2 pgs=1240 cs=127 l=0 c=0x19855de0).fault with nothing to send, going to standby 2015-07-23 16:31:42.492520 7ff401a53700 0 -- x.x.x.x:6849/27688 x.x.x.x:6841/27679 pipe(0x241e5080 sd=226 :6849 s=0 pgs=0 cs=0 l=0 c=0x21b31860).accept connect_seq 128 vs existing 127 state standby 2015-07-23 16:32:02.989102 7ff401851700 0 -- x.x.x.x:6849/27688 x.x.x.x:6854/27690 pipe(0x1916a680 sd=33 :43507 s=2 pgs=1366 cs=131 l=0 c=0x177e8680).fault with nothing to send, going to standby 2015-07-23 16:32:12.339357 7ff40144d700 0 -- x.x.x.x:6849/27688 x.x.x.x:6823/27279 pipe(0x241e7c80 sd=249 :6849 s=2 pgs=1246 cs=155 l=0 c=0x16ea46e0).fault with nothing to send, going to standby 2015-07-23 16:32:13.279426 7ff3fe828700 0 -- x.x.x.x:6849/27688 185.75.253.10:6810/9746 pipe(0x1c75e840 sd=72 :57221 s=2 pgs=1352 cs=149 l=0 c=0x147cbde0).fault with nothing to send, going to standby 2015-07-23 16:32:17.916440 7ff3fb3f4700 0 -- x.x.x.x:6849/27688 185.75.253.10:6810/9746 pipe(0x241e4000 sd=34 :6849 s=0 pgs=0 cs=0 l=0 c=0x21b2e160).accept connect_seq 150 vs existing 149 state standby 2015-07-23 16:32:22.922462 7ff40154e700 0 -- x.x.x.x:6849/27688 x.x.x.x:6823/27279 pipe(0x241e5e40 sd=216 :6849 s=0 pgs=0 cs=0 l=0 c=0x10089b80).accept connect_seq 156 vs existing 155 state standby ... Regards, Vedran ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Enclosure power failure pausing client IO till all connected hosts up
You may want to check your min_size value for your pools. If it is set to the pool size value, then the cluster will not do I/O if you loose a chassis. On Sun, Jul 5, 2015 at 11:04 PM, Mallikarjun Biradar mallikarjuna.bira...@gmail.com wrote: Hi all, Setup details: Two storage enclosures each connected to 4 OSD nodes (Shared storage). Failure domain is Chassis (enclosure) level. Replication count is 2. Each host has allotted with 4 drives. I have active client IO running on cluster. (Random write profile with 4M block size 64 Queue depth). One of enclosure had power loss. So all OSD's from hosts that are connected to this enclosure went down as expected. But client IO got paused. After some time enclosure hosts connected to it came up. And all OSD's on that hosts came up. Till this time, cluster was not serving IO. Once all hosts OSD's pertaining to that enclosure came up, client IO resumed. Can anybody help me why cluster not serving IO during enclosure failure. OR its a bug? -Thanks regards, Mallikarjun Biradar ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
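[Editor's note: a quick way to check the point being made here; "rbd" is just an example pool name. Dropping min_size to 1 lets IO continue with a single surviving replica, at the cost of durability.]
$ ceph osd dump | grep 'replicated size'
$ ceph osd pool get rbd min_size
$ ceph osd pool set rbd min_size 1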
Re: [ceph-users] Ceph Tech Talk next week
correct. Best Regards, Patrick McGarry Director Ceph Community || Red Hat http://ceph.com || http://community.redhat.com @scuttlemonkey || @ceph On Tue, Jul 21, 2015 at 6:03 PM, Gregory Farnum g...@gregs42.com wrote: On Tue, Jul 21, 2015 at 6:09 PM, Patrick McGarry pmcga...@redhat.com wrote: Hey cephers, Just a reminder that the Ceph Tech Talk on CephFS that was scheduled for last month (and cancelled due to technical difficulties) has been rescheduled for this month's talk. It will be happening next Thurs at 17:00 UTC (1p EST) So that's July 30, according to the website, right? :) -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Cephfs and ERESTARTSYS on writes
On 07/23/2015 04:45 PM, Ilya Dryomov wrote: Can you provide the full strace output? This is pretty much the all the relevant part: 4118 open(/home/ceph/temp/45/45/5/154545, O_RDWR|O_CREAT|O_EXCL, 0600) = 377 4118 writev(377, [{\3\0\0\0\0..., 4096}, {\247\0\0\3\23..., 4096}, {\225\0\0\4\334..., 4096}, {\204\0\0\t\n..., 4096}, {9\0\0\v\322..., 4096}, ...], 33) = 135168 4118 readv(1206, [{\334\1\210C\315..., 4096}, {X\1\266\343\320..., 4096}, {\304\1\345k\226..., 4096}, {}\2\17\27\371..., 4096}, {\203\2:\0e..., 4096}, ...], 33) = 135168 4118 writev(377, [{\334\1\210C\315..., 4096}, {X\1\266\343\320..., 4096}, {\304\1\345k\226..., 4096}, {}\2\17\27\371..., 4096}, {\203\2:\0e..., 4096}, ...], 33) = 135168 4118 readv(1206, [{\206\0\0\1c..., 4096}, {\336\0\0\1\351..., 4096}, {\265\0\0\0\313..., 4096}, {K\0\0\1A..., 4096}, {\217\0\0\1l..., 4096}, ...], 33) = 135168 4118 writev(377, [{\206\0\0\1c..., 4096}, {\336\0\0\1\351..., 4096}, {\265\0\0\0\313..., 4096}, {K\0\0\1A..., 4096}, {\217\0\0\1l..., 4096}, ...], 33) = 135168 4118 readv(1206, [{\2\0\366\371\273..., 4096}, {\256\1\22\3015..., 4096}, {\252\1-\361\225..., 4096}, {{\1I\335\4..., 4096}, {V\1`{\303..., 4096}, ...], 33) = 135168 4118 writev(377, [{\2\0\366\371\273..., 4096}, {\256\1\22\3015..., 4096}, {\252\1-\361\225..., 4096}, {{\1I\335\4..., 4096}, {V\1`{\303..., 4096}, ...], 33) = 135168 4118 readv(1206, [{O\\U\377\210..., 4096}, {\354 Gww..., 4096}, {\356\357|\317\250..., 4096}, {\272J\231\222E..., 4096}, {w\35W\213\277..., 4096}, ...], 33) = 135168 4118 writev(377, [{O\\U\377\210..., 4096}, {\354 Gww..., 4096}, {\356\357|\317\250..., 4096}, {\272J\231\222E..., 4096}, {w\35W\213\277..., 4096}, ...], 33) = 135168 4118 readv(1206, [{O\30\256|\350..., 4096}, {\316f\21|..., 4096}, {\346\330\354YU..., 4096}, {\257{R\5\16..., 4096}, {_C\n\21w..., 4096}, ...], 33) = 135168 4118 writev(377, [{O\30\256|\350..., 4096}, {\316f\21|..., 4096}, {\346\330\354YU..., 4096}, {\257{R\5\16..., 4096}, {_C\n\21w..., 4096}, ...], 33) = 135168 4118 readv(1206, [{\233p\217\356[..., 4096}, {m\264\323F\7..., 4096}, {q\5\362/\21..., 4096}, {\262\353z(\251..., 4096}, {of\365\245U..., 4096}, ...], 33) = 135168 4118 writev(377, [{\233p\217\356[..., 4096}, {m\264\323F\7..., 4096}, {q\5\362/\21..., 4096}, {\262\353z(\251..., 4096}, {of\365\245U..., 4096}, ...], 33) = 135168 4118 readv(1206, [{\257\3335X\300..., 4096}, {\207\37BW\252..., 4096}, {U\331a)..., 4096}, {\323\33i\256`..., 4096}, {\271m\356\]..., 4096}, ...], 33) = 135168 4118 writev(377, [{\257\3335X\300..., 4096}, {\207\37BW\252..., 4096}, {U\331a)..., 4096}, {\323\33i\256`..., 4096}, {\271m\356\]..., 4096}, ...], 33) = 135168 4118 readv(1206, [{b\\\337Y\240..., 4096}, {\233\r\326o\372..., 4096}, {\346(.\32\252..., 4096}, {\252FpJW..., 4096}, {\3648\237\220\352..., 4096}, ...], 33) = 135168 4118 writev(377, [{b\\\337Y\240..., 4096}, {\233\r\326o\372..., 4096}, {\346(.\32\252..., 4096}, {\252FpJW..., 4096}, {\3648\237\220\352..., 4096}, ...], 33) = 135168 4118 readv(1206, [{\376\375\257'\310..., 4096}, {\352\256R\342..., 4096}, {\361\340\342Rq..., 4096}, {|7 \3017..., 4096}, {\224\256\356\353\312..., 4096}, ...], 33) = 135168 4118 writev(377, [{\376\375\257'\310..., 4096}, {\352\256R\342..., 4096}, {\361\340\342Rq..., 4096}, {|7 \3017..., 4096}, {\224\256\356\353\312..., 4096}, ...], 33) = 135168 4118 readv(1206, [{}y\\vJ..., 4096}, {0$\v\6\2..., 4096}, {\2135\357zy..., 4096}, {{\343N\352\215..., 4096}, {\347\321x\352\272..., 4096}, ...], 33) = 135168 4118 writev(377, [{}y\\vJ..., 4096}, {0$\v\6\2..., 4096}, 
{\2135\357zy..., 4096}, {{\343N\352\215..., 4096}, {\347\321x\352\272..., 4096}, ...], 33) = 135168 4118 readv(1206, [{\v\6\2\301\200..., 4096}, {C\276\232\207\210..., 4096}, {\21\0006\262\255..., 4096}, {\224\222\n\276{..., 4096}, {Ys\337w\357..., 4096}, ...], 33) = 135168 4118 writev(377, [{\v\6\2\301\200..., 4096}, {C\276\232\207\210..., 4096}, {\21\0006\262\255..., 4096}, {\224\222\n\276{..., 4096}, {Ys\337w\357..., 4096}, ...], 33) = 135168 4118 readv(1206, [{6Y\236W\345..., 4096}, {Q\207uu\252..., 4096}, {\32\346]\313i..., 4096}, {n\356\\-\336..., 4096}, {{y~]\247..., 4096}, ...], 33) = 135168 4118 writev(377, [{6Y\236W\345..., 4096}, {Q\207uu\252..., 4096}, {\32\346]\313i..., 4096}, {n\356\\-\336..., 4096}, {{y~]\247..., 4096}, ...], 33) = 135168 4118 readv(1206, [{H0\337\275\302..., 4096}, {g\177\225\316\333..., 4096}, {\364\212\374X\360..., 4096}, {\337\260\226XL..., 4096}, {Y\356\360\301r..., 4096}, ...], 33) = 135168 4118 writev(377, [{H0\337\275\302..., 4096}, {g\177\225\316\333..., 4096}, {\364\212\374X\360..., 4096}, {\337\260\226XL..., 4096}, {Y\356\360\301r..., 4096}, ...], 33) = 135168 4118 readv(1206, [{_'\255\374v..., 4096}, {\271\231/II..., 4096}, {\277]\274\200\253..., 4096}, {'\3Qe\244..., 4096}, {\341\361\210h\363..., 4096}, ...], 33) = 135168 4118 writev(377, [{_'\255\374v..., 4096}, {\271\231/II..., 4096}, {\277]\274\200\253..., 4096}, {'\3Qe\244..., 4096},
[ceph-users] Fw: Ceph problem
From: Aaron fjw6...@163.com Sent: Jul 23, 2015 6:39 AM To: dan.m...@inktank.com Subject: Ceph problem hello, I am a user of ceph, I'm from china I have two problem on ceph, I need your help import boto import boto.s3.connection access_key = '2EOCDA99UCZQFA1CQRCM' secret_key = 'avxcywxBPMtiDriwBTOk+cO1zrBikHqoSB0GUtqV' conn = boto.connect_s3( ... aws_access_key_id = access_key, ... aws_secret_access_key = secret_key, ... host = 'localhost', ... calling_format = boto.s3.connection.OrdinaryCallingFormat(),) b=conn.list_all_buckets()[0] list(b.list()) [Key: my-new-bucket,1/123.txt, Key: my-new-bucket,1234.txt, Key: my-new-bucket,2.txt, Key: my-new-bucket,3.txt, Key: my-new-bucket,N01/hello.txt, Key: my-new-bucket,aaa, Key: my-new-bucket,hello] problem 1 : some error show after I run this command b.get_website_configuration() Traceback (most recent call last): File stdin, line 1, in module File /usr/lib/python2.7/site-packages/boto/s3/bucket.py, line 1480, in get_website_configuration return self.get_website_configuration_with_xml(headers)[0] File /usr/lib/python2.7/site-packages/boto/s3/bucket.py, line 1519, in get_website_configuration_with_xml body = self.get_website_configuration_xml(headers=headers) File /usr/lib/python2.7/site-packages/boto/s3/bucket.py, line 1534, in get_website_configuration_xml response.status, response.reason, body) boto.exception.S3ResponseError: S3ResponseError: 403 Forbidden ?xml version=1.0 encoding=UTF-8?ErrorCodeSignatureDoesNotMatch/Code/Error problem 2 : I need an url start with N07 , but possible Bucket name can't start with upper-case , is there some method let me use a name with N07 ,or can I use name n07 and URL start with N07 ? means URL different with name conn.create_bucket(N07) Traceback (most recent call last): File stdin, line 1, in module File /usr/lib/python2.7/site-packages/boto/s3/connection.py, line 599, in create_bucket check_lowercase_bucketname(bucket_name) File /usr/lib/python2.7/site-packages/boto/s3/connection.py, line 59, in check_lowercase_bucketname raise BotoClientError(Bucket names cannot contain upper-case \ boto.exception.BotoClientError: BotoClientError: Bucket names cannot contain upper-case characters when using either the sub-domain or virtual hosting calling format. Thank you very much.___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Cephfs and ERESTARTSYS on writes
On Thu, Jul 23, 2015 at 6:02 PM, Vedran Furač vedran.fu...@gmail.com wrote: 4118 writev(377, [{\5\356\307l\361..., 4096}, {\337\261\17\257..., 4096}, {\211;s\310..., 4096}, {\370N\372:\252..., 4096}, {\202\311/\347\260..., 4096}, ...], 33) = ? ERESTARTSYS (To be restarted) 4118 --- SIGALRM (Alarm clock) @ 0 (0) --- 4118 rt_sigreturn(0xe) = -1 EINTR (Interrupted system call) 4118 gettid() = 4118 4118 write(4, 2015/..., 520) = 520 4118 close(1206) = 0 4118 unlink(/home/ceph/temp/45/45/5/154545) = 0 Sorry, I misread your original email and missed the nginx part entirely. Looks like Zheng, who commented on IRC, was right: the ERESTARTSYS is likely caused by some timeout mechanism in nginx signal handler for SIGALARM does not want to restart the write syscall Thanks, Ilya ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Cephfs and ERESTARTSYS on writes
On 07/23/2015 05:25 PM, Ilya Dryomov wrote: On Thu, Jul 23, 2015 at 6:02 PM, Vedran Furač vedran.fu...@gmail.com wrote: 4118 writev(377, [{\5\356\307l\361..., 4096}, {\337\261\17\257..., 4096}, {\211;s\310..., 4096}, {\370N\372:\252..., 4096}, {\202\311/\347\260..., 4096}, ...], 33) = ? ERESTARTSYS (To be restarted) 4118 --- SIGALRM (Alarm clock) @ 0 (0) --- 4118 rt_sigreturn(0xe) = -1 EINTR (Interrupted system call) 4118 gettid() = 4118 4118 write(4, 2015/..., 520) = 520 4118 close(1206) = 0 4118 unlink(/home/ceph/temp/45/45/5/154545) = 0 Sorry, I misread your original email and missed the nginx part entirely. Looks like Zheng, who commented on IRC, was right: the ERESTARTSYS is likely caused by some timeout mechanism in nginx signal handler for SIGALARM does not want to restart the write syscall Knowing that this might be an nginx issues as well, I've asked the same thing on their mailing list in parallel, their response was: It more looks like a bug in cephfs. writev() should never return ERESTARTSYS. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] PGs going inconsistent after stopping the primary
Oh, if you were running dev releases, it's not super surprising that the stat tracking was at some point buggy. -Sam - Original Message - From: Dan van der Ster d...@vanderster.com To: Samuel Just sj...@redhat.com Cc: ceph-users@lists.ceph.com Sent: Thursday, July 23, 2015 8:21:07 AM Subject: Re: [ceph-users] PGs going inconsistent after stopping the primary Those pools were a few things: rgw.buckets plus a couple pools we use for developing new librados clients. But the source of this issue is likely related to the few pre-hammer development releases (and crashes) we upgraded through whilst running a large scale test. Anyway, now I'll know how to better debug this in future so we'll let you know if it reoccurs. Cheers, Dan On Wed, Jul 22, 2015 at 9:42 PM, Samuel Just sj...@redhat.com wrote: Annoying that we don't know what caused the replica's stat structure to get out of sync. Let us know if you see it recur. What were those pools used for? -Sam - Original Message - From: Dan van der Ster d...@vanderster.com To: Samuel Just sj...@redhat.com Cc: ceph-users@lists.ceph.com Sent: Wednesday, July 22, 2015 12:36:53 PM Subject: Re: [ceph-users] PGs going inconsistent after stopping the primary Cool, writing some objects to the affected PGs has stopped the consistent/inconsistent cycle. I'll keep an eye on them but this seems to have fixed the problem. Thanks!! Dan On Wed, Jul 22, 2015 at 6:07 PM, Samuel Just sj...@redhat.com wrote: Looks like it's just a stat error. The primary appears to have the correct stats, but the replica for some reason doesn't (thinks there's an object for some reason). I bet it clears itself it you perform a write on the pg since the primary will send over its stats. We'd need information from when the stat error originally occurred to debug further. -Sam - Original Message - From: Dan van der Ster d...@vanderster.com To: ceph-users@lists.ceph.com Sent: Wednesday, July 22, 2015 7:49:00 AM Subject: [ceph-users] PGs going inconsistent after stopping the primary Hi Ceph community, Env: hammer 0.94.2, Scientific Linux 6.6, kernel 2.6.32-431.5.1.el6.x86_64 We wanted to post here before the tracker to see if someone else has had this problem. We have a few PGs (different pools) which get marked inconsistent when we stop the primary OSD. The problem is strange because once we restart the primary, then scrub the PG, the PG is marked active+clean. But inevitably next time we stop the primary OSD, the same PG is marked inconsistent again. There is no user activity on this PG, and nothing interesting is logged in any of the 2nd/3rd OSDs (with debug_osd=20, the first line mentioning the PG already says inactive+inconsistent). We suspect this is related to garbage files left in the PG folder. One of our PGs is acting basically like above, except it goes through this cycle: active+clean - (deep-scrub) - active+clean+inconsistent - (repair) - active+clean - (restart primary OSD) - (deep-scrub) - active+clean+inconsistent. This one at least logs: 2015-07-22 16:42:41.821326 osd.303 [INF] 55.10d deep-scrub starts 2015-07-22 16:42:41.823834 osd.303 [ERR] 55.10d deep-scrub stat mismatch, got 0/1 objects, 0/0 clones, 0/1 dirty, 0/0 omap, 0/0 hit_set_archive, 0/0 whiteouts, 0/0 bytes,0/0 hit_set_archive bytes. 2015-07-22 16:42:41.823842 osd.303 [ERR] 55.10d deep-scrub 1 errors and this should be debuggable because there is only one object in the pool: tapetest 55 0 073575G 1 even though rados ls returns no objects: # rados ls -p tapetest # Any ideas? 
Cheers, Dan ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
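[Editor's note: a hedged sketch of the "write something to the PG" workaround described above; the pool and PG id come from the thread, the object name is made up.]
$ rados -p tapetest put stat-refresh-obj /etc/hosts
$ ceph pg deep-scrub 55.10d
$ ceph pg repair 55.10d                 # only if the deep-scrub still reports a stat mismatch
$ rados -p tapetest rm stat-refresh-obj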
Re: [ceph-users] Cephfs and ERESTARTSYS on writes
Hi, That looks like a bug, ERESTARTSYS is not a valid error condition for write(). http://pubs.opengroup.org/onlinepubs/9699919799/ -- Eino Tuominen Vedran Furač vedran.fu...@gmail.com kirjoitti 23.7.2015 kello 15.18: Hello, I'm having an issue with nginx writing to cephfs. Often I'm getting: writev() /home/ceph/temp/44/94/1/119444 failed (4: Interrupted system call) while reading upstream looking with strace, this happens: ... write(65, e\314\366\36\302..., 65536) = ? ERESTARTSYS (To be restarted) It happens after first 4MBs (exactly) are written, subsequent write gets ERESTARTSYS (sometimes, but more rarely, it fails after first 32 or 64MBs, etc are written). Apparently nginx doesn't expect this and doesn't handle it so it cancels writes and deletes this partial file. Is it possible Ceph cannot find the destination PG fast enough and returns ERESTARTSYS? Is there any way to fix this behavior or reduce it? Regards, Vedran ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] PGs going inconsistent after stopping the primary
Those pools were a few things: rgw.buckets plus a couple pools we use for developing new librados clients. But the source of this issue is likely related to the few pre-hammer development releases (and crashes) we upgraded through whilst running a large scale test. Anyway, now I'll know how to better debug this in future so we'll let you know if it reoccurs. Cheers, Dan On Wed, Jul 22, 2015 at 9:42 PM, Samuel Just sj...@redhat.com wrote: Annoying that we don't know what caused the replica's stat structure to get out of sync. Let us know if you see it recur. What were those pools used for? -Sam - Original Message - From: Dan van der Ster d...@vanderster.com To: Samuel Just sj...@redhat.com Cc: ceph-users@lists.ceph.com Sent: Wednesday, July 22, 2015 12:36:53 PM Subject: Re: [ceph-users] PGs going inconsistent after stopping the primary Cool, writing some objects to the affected PGs has stopped the consistent/inconsistent cycle. I'll keep an eye on them but this seems to have fixed the problem. Thanks!! Dan On Wed, Jul 22, 2015 at 6:07 PM, Samuel Just sj...@redhat.com wrote: Looks like it's just a stat error. The primary appears to have the correct stats, but the replica for some reason doesn't (thinks there's an object for some reason). I bet it clears itself it you perform a write on the pg since the primary will send over its stats. We'd need information from when the stat error originally occurred to debug further. -Sam - Original Message - From: Dan van der Ster d...@vanderster.com To: ceph-users@lists.ceph.com Sent: Wednesday, July 22, 2015 7:49:00 AM Subject: [ceph-users] PGs going inconsistent after stopping the primary Hi Ceph community, Env: hammer 0.94.2, Scientific Linux 6.6, kernel 2.6.32-431.5.1.el6.x86_64 We wanted to post here before the tracker to see if someone else has had this problem. We have a few PGs (different pools) which get marked inconsistent when we stop the primary OSD. The problem is strange because once we restart the primary, then scrub the PG, the PG is marked active+clean. But inevitably next time we stop the primary OSD, the same PG is marked inconsistent again. There is no user activity on this PG, and nothing interesting is logged in any of the 2nd/3rd OSDs (with debug_osd=20, the first line mentioning the PG already says inactive+inconsistent). We suspect this is related to garbage files left in the PG folder. One of our PGs is acting basically like above, except it goes through this cycle: active+clean - (deep-scrub) - active+clean+inconsistent - (repair) - active+clean - (restart primary OSD) - (deep-scrub) - active+clean+inconsistent. This one at least logs: 2015-07-22 16:42:41.821326 osd.303 [INF] 55.10d deep-scrub starts 2015-07-22 16:42:41.823834 osd.303 [ERR] 55.10d deep-scrub stat mismatch, got 0/1 objects, 0/0 clones, 0/1 dirty, 0/0 omap, 0/0 hit_set_archive, 0/0 whiteouts, 0/0 bytes,0/0 hit_set_archive bytes. 2015-07-22 16:42:41.823842 osd.303 [ERR] 55.10d deep-scrub 1 errors and this should be debuggable because there is only one object in the pool: tapetest 55 0 073575G 1 even though rados ls returns no objects: # rados ls -p tapetest # Any ideas? Cheers, Dan ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] different omap format in one cluster (.sst + .ldb) - new installed OSD-node don't start any OSD
Did you use an upstream ceph version previously? Or did you shut down the running ceph-osd when upgrading the osd? How many osds hit this problem? This assert failure means that the osd detected an upgraded pg meta object but failed to read the meta keys from the object (or one key is missing). On Thu, Jul 23, 2015 at 7:03 PM, Udo Lembke ulem...@polarzone.de wrote: On 21.07.2015 12:06, Udo Lembke wrote: Hi all, ... Normally I would say that if one OSD node dies, I simply reinstall the OS and ceph and I'm back again... but this looks bad for me. Unfortunately the system also doesn't start 9 of the OSDs after I switched back to the old system disk... (only three of the big OSDs are running well) What is the best solution for that? Empty one node (crush weight 0), freshly reinstall OS/ceph, and reinitialise all OSDs? This will take a long, long time, because we use 173TB in this cluster... Hi, answering myself in case anybody has similar issues and finds this posting. Emptying the whole node takes too long. I used the puppet wheezy system and had to recreate all OSDs (in this case I needed to empty the first blocks of the journal before creating the OSDs again). Udo ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Best Regards, Wheat ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
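[Editor's note: a possible shape of "empty the first blocks of the journal before creating the OSD again"; the journal partition and OSD id are placeholders, and dd against the wrong device is destructive, so double-check before running anything like this.]
$ dd if=/dev/zero of=/dev/sdX1 bs=1M count=100 oflag=direct    # /dev/sdX1 = example journal partition
$ ceph-osd -i 12 --mkjournal                                   # 12 = example OSD id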
Re: [ceph-users] Ceph 0.94 (and lower) performance on 1 hosts ??
On Thu, 23 Jul 2015 11:14:22 +0100 Gregory Farnum wrote: Your note that dd can do 2GB/s without networking makes me think that you should explore that. As you say, network interrupts can be problematic in some systems. The only thing I can think of that's been really bad in the past is that some systems process all network interrupts on cpu 0, and you probably want to make sure that it's splitting them across CPUs. An IRQ overload would be very visible with atop. Splitting the IRQs will help, but it is likely to need some smarts. As in, irqbalance may spread things across NUMA nodes. A card with just one IRQ line will need RPS (Receive Packet Steering), irqbalance can't help it. For example, I have a compute node with such a single line card and Quad Opterons (64 cores, 8 NUMA nodes). The default is all interrupt handling on CPU0 and that is very little, except for eth2. So this gets a special treatment: --- echo 4 > /proc/irq/106/smp_affinity_list --- Pinning the IRQ for eth2 to CPU 4 by default --- echo f0 > /sys/class/net/eth2/queues/rx-0/rps_cpus --- giving RPS CPUs 4-7 to work with. At peak times it needs more than 2 cores, otherwise with this architecture just using 4 and 5 (same L2 cache) would be better. Regards, Christian -- Christian Balzer Network/Systems Engineer ch...@gol.com Global OnLine Japan/Fusion Communications http://www.gol.com/ ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
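[Editor's note: before pinning anything, it may help to see how interrupts and RPS are currently distributed; the interface name and IRQ number are the ones from the example above.]
$ grep eth2 /proc/interrupts
$ cat /proc/irq/106/smp_affinity_list
$ cat /sys/class/net/eth2/queues/rx-0/rps_cpus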
[ceph-users] el6 repo problem?
I'm trying to use the ceph el6 yum repo. Yesterday afternoon, I found yum complain about 8 packages when trying to install or update ceph, such as this: (4/46): ceph-0.94.2-0.el6.x86_64.rpm | 21 MB 00:01 http://ceph.com/rpm-hammer/el6/x86_64/ceph-0.94.2-0.el6.x86_64.rpm: [Errno -1] Package does not match intended download. Suggestion: run yum --enablerepo=Ceph clean metadata The other packages with the same fault are libcephfs1-0.94.2-0.el6.x86_64 librbd1-0.94.2-0.el6.x86_64 python-rados-0.94.2-0.el6.x86_64 python-cephfs-0.94.2-0.el6.x86_64 librados2-0.94.2-0.el6.x86_64 python-rbd-0.94.2-0.el6.x86_64 ceph-common-0.94.2-0.el6.x86_64 This is happening on all three machines I've tried it on. I've tried cleaning the metadata on my hosts as per the suggestion, without any change. There was no trouble pulling these packages with yum on Tuesday, July 14, and I can still use wget to pull individual packages seemingly without any problem. Is anyone else experiencing the same problem? Can the repo maintainers look into this? (Rebuild the metadata, or flush their reverse proxy server(s) if any?) Any suggestions for me to try on the client side? -- -Wayne Betts STAR Computing Support at BNL Physics Dept. PO Box 5000 Upton, NY 11973 wbe...@bnl.gov 631-344-3285 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Enclosure power failure pausing client IO till all connected hosts up
Hi all, Setup details: Two storage enclosures, each connected to 4 OSD nodes (shared storage). The failure domain is chassis (enclosure) level and the replication count is 2. Each host is allotted 4 drives. I have active client IO running on the cluster (random write profile with 4M block size, 64 queue depth). One of the enclosures had a power loss, so all OSDs on the hosts connected to that enclosure went down, as expected. But client IO got paused. After some time the enclosure and the hosts connected to it came up, and all OSDs on those hosts came up. Until then, the cluster was not serving IO; only once all OSDs pertaining to that enclosure came up did client IO resume. Can anybody explain why the cluster was not serving IO during the enclosure failure, or is it a bug? -Thanks & regards, Mallikarjun Biradar ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Issue in communication of swift client and radosgw
Hi, Please reply... Regards, Bindu ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Cephfs and ERESTARTSYS on writes
On 07/23/2015 03:20 PM, Gregory Farnum wrote: On Thu, Jul 23, 2015 at 1:17 PM, Vedran Furač vedran.fu...@gmail.com wrote: Hello, I'm having an issue with nginx writing to cephfs. Often I'm getting: writev() /home/ceph/temp/44/94/1/119444 failed (4: Interrupted system call) while reading upstream looking with strace, this happens: ... write(65, e\314\366\36\302..., 65536) = ? ERESTARTSYS (To be restarted) It happens after first 4MBs (exactly) are written, subsequent write gets ERESTARTSYS (sometimes, but more rarely, it fails after first 32 or 64MBs, etc are written). Apparently nginx doesn't expect this and doesn't handle it so it cancels writes and deletes this partial file. Is it possible Ceph cannot find the destination PG fast enough and returns ERESTARTSYS? Is there any way to fix this behavior or reduce it? That's...odd. Are you using the kernel client or ceph-fuse, and on which version? Sorry, forgot to mention, it's kernel client, tried both 3.10 and 4.1, but it's the same. Ceph is firefly. I'll also try fuse. Regards, Vedran ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] When setting up cache tiering, can i set a quota on the cache pool?
Hi, guys. These days we are testing ceph cache tiering, and it seems that the cache tiering agent does not honor the quota setting on the cache pool. This means that if we set a smaller quota on the cache pool than target_max_bytes * cache_target_dirty_ratio or so, the cache tiering agent won't flush or evict objects even when the cached bytes reach the quota size. Eventually this results in write operations failing with a 'no space' error. Here comes the question: is this behavior by design? Or should we never set a quota on the cache pool? Thank you very much :) runsisi AT hust.edu.cn ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
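[Editor's note: a hedged example of letting the tiering agent, rather than a pool quota, bound the cache pool; "cachepool" and the byte value are placeholders, and set-quota max_bytes 0 removes any existing quota.]
$ ceph osd pool set cachepool target_max_bytes 1099511627776     # 1 TiB, example value
$ ceph osd pool set cachepool cache_target_dirty_ratio 0.4
$ ceph osd pool set cachepool cache_target_full_ratio 0.8
$ ceph osd pool set-quota cachepool max_bytes 0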
[ceph-users] Ceph Day Speakers (Chicago, Raleigh)
Hey cephers, Since Ceph Days for both Chicago and Raleigh are fast approaching, I wanted to put another call out on the mailing lists for anyone who might be interested in sharing their Ceph experiences with the community at either location. If you have something to share (integration, use case, performance, hardware tuning, etc) please let me know ASAP. Thanks! http://ceph.com/cephdays -- Best Regards, Patrick McGarry Director Ceph Community || Red Hat http://ceph.com || http://community.redhat.com @scuttlemonkey || @ceph ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Best method to limit snapshot/clone space overhead
Hi all, I am looking for a way to alleviate the overhead of RBD snapshots/clones for some time. In our scenario there are a few “master” volumes that contain production data, and are frequently snapshotted and cloned for dev/qa use. Those snapshots/clones live for a few days to a few weeks before they get dropped, and they sometimes grow very fast (databases, etc.). With the default 4MB object size there seems to be huge overhead involved with this, could someone give me some hints on how to solve that? I have some hope in 1) FIEMAP I’ve calculated that files on my OSDs are approx. 30% filled with NULLs - I suppose this is what it could save (best-scenario) and it should also make COW operations much faster. But there are lots of bugs in FIEMAP in kernels (i saw some reference to CentOS 6.5 kernel being buggy - which is what we use) and filesystems (like XFS). No idea about ext4 which we’d like to use in the future. Is enabling FIEMAP a good idea at all? I saw some mention of it being replaced with SEEK_DATA and SEEK_HOLE. 2) object size 4MB for clones I did some quick performance testing and setting this lower for production is probably not a good idea. My sweet spot is 8MB object size, however this would make the overhead for clones even worse than it already is. But I could make the cloned images with a different block size from the snapshot (at least according to docs). Does someone use it like that? Any caveats? That way I could have the production data with 8MB block size but make the development snapshots with for example 64KiB granularity, probably at expense of some performance, but most of the data would remain in the (faster) master snapshot anyway. This should drop overhead tremendously, maybe even more than neabling FIEMAP. (Even better when working in tandem I suppose?) Your thoughts? Thanks Jan ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
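[Editor's note: a sketch of the "clone with a different object size" idea, assuming the parent is a format 2 image; pool, image and snapshot names are examples. On this era of rbd the object size is given as --order (object size = 2^order bytes), so order 16 means 64 KiB objects.]
$ rbd snap create rbd/master@qa-snap
$ rbd snap protect rbd/master@qa-snap
$ rbd clone --order 16 rbd/master@qa-snap rbd/qa-clone
$ rbd info rbd/qa-clone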
Re: [ceph-users] Cephfs and ERESTARTSYS on writes
On 07/23/2015 06:47 PM, Ilya Dryomov wrote: To me this looks like a writev() interrupted by a SIGALRM. I think nginx guys read your original email the same way I did, which is write syscall *returned* ERESTARTSYS, but I'm pretty sure that is not the case here. ERESTARTSYS shows up in strace output but it is handled by the kernel, userspace doesn't see it (but strace has to be able to see it, otherwise you wouldn't know if your system call has been restarted or not). You cut the output short - I asked for the entire output for a reason, please paste it somewhere. Might be, however I don't know why nginx would be interrupting it, all writes are done pretty fast and timeouts are set to 10 minutes. Here are 2 examples on 2 servers with slightly different configs (timestamps included): http://pastebin.com/wUAAcdT7 http://pastebin.com/wHyWc9U5 Thanks, Vedran ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Cephfs and ERESTARTSYS on writes
Ah, I made the same mistake... Sorry for the noise. -- Eino Tuominen Ilya Dryomov idryo...@gmail.com kirjoitti 23.7.2015 kello 19.47: On Thu, Jul 23, 2015 at 6:28 PM, Vedran Furač vedran.fu...@gmail.com wrote: On 07/23/2015 05:25 PM, Ilya Dryomov wrote: On Thu, Jul 23, 2015 at 6:02 PM, Vedran Furač vedran.fu...@gmail.com wrote: 4118 writev(377, [{\5\356\307l\361..., 4096}, {\337\261\17\257..., 4096}, {\211;s\310..., 4096}, {\370N\372:\252..., 4096}, {\202\311/\347\260..., 4096}, ...], 33) = ? ERESTARTSYS (To be restarted) 4118 --- SIGALRM (Alarm clock) @ 0 (0) --- 4118 rt_sigreturn(0xe) = -1 EINTR (Interrupted system call) 4118 gettid() = 4118 4118 write(4, 2015/..., 520) = 520 4118 close(1206) = 0 4118 unlink(/home/ceph/temp/45/45/5/154545) = 0 Sorry, I misread your original email and missed the nginx part entirely. Looks like Zheng, who commented on IRC, was right: the ERESTARTSYS is likely caused by some timeout mechanism in nginx signal handler for SIGALARM does not want to restart the write syscall Knowing that this might be an nginx issues as well, I've asked the same thing on their mailing list in parallel, their response was: It more looks like a bug in cephfs. writev() should never return ERESTARTSYS. To me this looks like a writev() interrupted by a SIGALRM. I think nginx guys read your original email the same way I did, which is write syscall *returned* ERESTARTSYS, but I'm pretty sure that is not the case here. ERESTARTSYS shows up in strace output but it is handled by the kernel, userpace doesn't see it (but strace has to be able to see it, otherwise you wouldn't know if your system call has been restarted or not). You cut the output short - I asked for the entire output for a reason, please paste it somewhere. Thanks, Ilya ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] OSD Connections with Public and Cluster Networks
Greetings, I am working on standing up a fresh Ceph object storage cluster and have some questions about what I should be seeing as far as inter-OSD connectivity. I have spun up my monitor and radosgw nodes as VMs, all running on a 192.168.10.0/24 network (all IP ranges have been changed to protect the innocent). I also have four physical servers to serve as my storage nodes. These physical servers have two bonded interfaces, one IP'd on the public network (192.168.10.0/24) and one IP'd on a cluster/private network (172.22.20.0/24). I have added the following lines to my ceph.conf: public network = 192.168.10.0/24 cluster network = 172.22.20.0/24 So far, so good. Everything in the cluster starts up, all OSDs are up and in, and 'ceph -s' shows a happy, healthy cluster. I can create users and otherwise do all the normal things that one would expect. So why am I writing? When looking at my network connections on my storage servers, I was expecting to see a few OSD - Mon/RGW connections on the public network, then a much larger number of OSD - OSD connections on the cluster network. What I actually see is an equal number of connections between OSDs on both the public and cluster networks (in addition to OSD - Mon/RGW connections). My question - is this normal? If so, can someone explain what traffic is moving between OSDs on the public network? Based on some additional testing (read: bringing down the cluster interface), this causes all OSDs on that node to be marked down, so there's evidence to support all heartbeat traffic moving over the interface. I just want to ensure that what I'm seeing is normal and that I haven't otherwise botched the configuration. Many thanks, Brian Felton ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
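[Editor's note: a hedged way to double-check which addresses each OSD has bound and where its TCP sessions go; osd.3 is an example id, and the ceph daemon command must be run on the host where that OSD lives.]
$ ceph osd find 3
$ ceph daemon osd.3 config show | grep -E 'public_addr|cluster_addr'
$ ss -tnp | grep ceph-osd | awk '{print $5}' | cut -d: -f1 | sort | uniq -c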
Re: [ceph-users] ceph-mon cpu usage
Hi Greg, I've been looking at the tcmalloc issues, but did seem to affect osd's, and I do notice it in heavy read workloads (even after the patch and increasing TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728). This is affecting the mon process though. looking at perf top I'm getting most of the CPU usage in mutex lock/unlock 5.02% libpthread-2.19.so[.] pthread_mutex_unlock 3.82% libsoftokn3.so[.] 0x0001e7cb 3.46% libpthread-2.19.so[.] pthread_mutex_lock I could try to use jemalloc, are you aware of any built binaries? Can I mix a cluster with different malloc binaries? On Thu, Jul 23, 2015 at 10:50 AM, Gregory Farnum g...@gregs42.com wrote: On Thu, Jul 23, 2015 at 8:39 AM, Luis Periquito periqu...@gmail.com wrote: The ceph-mon is already taking a lot of memory, and I ran a heap stats MALLOC: 32391696 ( 30.9 MiB) Bytes in use by application MALLOC: + 27597135872 (26318.7 MiB) Bytes in page heap freelist MALLOC: + 16598552 ( 15.8 MiB) Bytes in central cache freelist MALLOC: + 14693536 ( 14.0 MiB) Bytes in transfer cache freelist MALLOC: + 17441592 ( 16.6 MiB) Bytes in thread cache freelists MALLOC: +116387992 ( 111.0 MiB) Bytes in malloc metadata MALLOC: MALLOC: = 27794649240 (26507.0 MiB) Actual memory used (physical + swap) MALLOC: + 26116096 ( 24.9 MiB) Bytes released to OS (aka unmapped) MALLOC: MALLOC: = 27820765336 (26531.9 MiB) Virtual address space used MALLOC: MALLOC: 5683 Spans in use MALLOC: 21 Thread heaps in use MALLOC: 8192 Tcmalloc page size after that I ran the heap release and it went back to normal. MALLOC: 22919616 ( 21.9 MiB) Bytes in use by application MALLOC: + 4792320 (4.6 MiB) Bytes in page heap freelist MALLOC: + 18743448 ( 17.9 MiB) Bytes in central cache freelist MALLOC: + 20645776 ( 19.7 MiB) Bytes in transfer cache freelist MALLOC: + 18456088 ( 17.6 MiB) Bytes in thread cache freelists MALLOC: +116387992 ( 111.0 MiB) Bytes in malloc metadata MALLOC: MALLOC: =201945240 ( 192.6 MiB) Actual memory used (physical + swap) MALLOC: + 27618820096 (26339.4 MiB) Bytes released to OS (aka unmapped) MALLOC: MALLOC: = 27820765336 (26531.9 MiB) Virtual address space used MALLOC: MALLOC: 5639 Spans in use MALLOC: 29 Thread heaps in use MALLOC: 8192 Tcmalloc page size So it just seems the monitor is not returning unused memory into the OS or reusing already allocated memory it deems as free... Yep. This is a bug (best we can tell) in some versions of tcmalloc combined with certain distribution stacks, although I don't think we've seen it reported on Trusty (nor on a tcmalloc distribution that new) before. Alternatively some folks are seeing tcmalloc use up lots of CPU in other scenarios involving memory return and it may manifest like this, but I'm not sure. You could look through the mailing list for information on it. -Greg ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
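[Editor's note: a hedged sketch of the two workarounds discussed in this thread: releasing the monitor's freelist via the admin socket, and raising the tcmalloc thread cache for daemons at start-up. "mon.a" is an example monitor id, and /etc/default/ceph is an Ubuntu-style path, not necessarily present on every distro.]
$ ceph tell mon.a heap stats
$ ceph tell mon.a heap release
$ echo 'TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728' >> /etc/default/ceph   # then restart the daemons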
Re: [ceph-users] Cephfs and ERESTARTSYS on writes
On Thu, Jul 23, 2015 at 6:28 PM, Vedran Furač vedran.fu...@gmail.com wrote: On 07/23/2015 05:25 PM, Ilya Dryomov wrote: On Thu, Jul 23, 2015 at 6:02 PM, Vedran Furač vedran.fu...@gmail.com wrote: 4118 writev(377, [{\5\356\307l\361..., 4096}, {\337\261\17\257..., 4096}, {\211;s\310..., 4096}, {\370N\372:\252..., 4096}, {\202\311/\347\260..., 4096}, ...], 33) = ? ERESTARTSYS (To be restarted) 4118 --- SIGALRM (Alarm clock) @ 0 (0) --- 4118 rt_sigreturn(0xe) = -1 EINTR (Interrupted system call) 4118 gettid() = 4118 4118 write(4, 2015/..., 520) = 520 4118 close(1206) = 0 4118 unlink(/home/ceph/temp/45/45/5/154545) = 0 Sorry, I misread your original email and missed the nginx part entirely. Looks like Zheng, who commented on IRC, was right: the ERESTARTSYS is likely caused by some timeout mechanism in nginx signal handler for SIGALARM does not want to restart the write syscall Knowing that this might be an nginx issues as well, I've asked the same thing on their mailing list in parallel, their response was: It more looks like a bug in cephfs. writev() should never return ERESTARTSYS. To me this looks like a writev() interrupted by a SIGALRM. I think nginx guys read your original email the same way I did, which is write syscall *returned* ERESTARTSYS, but I'm pretty sure that is not the case here. ERESTARTSYS shows up in strace output but it is handled by the kernel, userpace doesn't see it (but strace has to be able to see it, otherwise you wouldn't know if your system call has been restarted or not). You cut the output short - I asked for the entire output for a reason, please paste it somewhere. Thanks, Ilya ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ceph-mon cpu usage
The ceph-mon is already taking a lot of memory, and I ran a heap stats:

MALLOC:        32391696 (   30.9 MiB) Bytes in use by application
MALLOC: +   27597135872 (26318.7 MiB) Bytes in page heap freelist
MALLOC: +      16598552 (   15.8 MiB) Bytes in central cache freelist
MALLOC: +      14693536 (   14.0 MiB) Bytes in transfer cache freelist
MALLOC: +      17441592 (   16.6 MiB) Bytes in thread cache freelists
MALLOC: +     116387992 (  111.0 MiB) Bytes in malloc metadata
MALLOC:
MALLOC: =   27794649240 (26507.0 MiB) Actual memory used (physical + swap)
MALLOC: +      26116096 (   24.9 MiB) Bytes released to OS (aka unmapped)
MALLOC:
MALLOC: =   27820765336 (26531.9 MiB) Virtual address space used
MALLOC:
MALLOC:            5683 Spans in use
MALLOC:              21 Thread heaps in use
MALLOC:            8192 Tcmalloc page size

after that I ran the heap release and it went back to normal:

MALLOC:        22919616 (   21.9 MiB) Bytes in use by application
MALLOC: +       4792320 (    4.6 MiB) Bytes in page heap freelist
MALLOC: +      18743448 (   17.9 MiB) Bytes in central cache freelist
MALLOC: +      20645776 (   19.7 MiB) Bytes in transfer cache freelist
MALLOC: +      18456088 (   17.6 MiB) Bytes in thread cache freelists
MALLOC: +     116387992 (  111.0 MiB) Bytes in malloc metadata
MALLOC:
MALLOC: =     201945240 (  192.6 MiB) Actual memory used (physical + swap)
MALLOC: +   27618820096 (26339.4 MiB) Bytes released to OS (aka unmapped)
MALLOC:
MALLOC: =   27820765336 (26531.9 MiB) Virtual address space used
MALLOC:
MALLOC:            5639 Spans in use
MALLOC:              29 Thread heaps in use
MALLOC:            8192 Tcmalloc page size

So it just seems the monitor is not returning unused memory to the OS, or reusing already allocated memory it deems as free...

On Wed, Jul 22, 2015 at 4:29 PM, Luis Periquito periqu...@gmail.com wrote:

This cluster is serving RBD storage for OpenStack, and today all I/O just stopped. After looking in the boxes, ceph-mon was using 17G of RAM - and this was on *all* the mons. Restarting the main one made it work again (I restarted the other ones because they were also using a lot of RAM). This has happened twice now (the first time was last Monday). As this is considered a prod cluster there is no logging enabled, and I can't reproduce it - our test/dev clusters have been working fine and show no such symptoms, but they were upgraded from Firefly. What can we do to help debug the issue? Any ideas on how to identify the underlying issue? thanks,

On Mon, Jul 20, 2015 at 1:59 PM, Luis Periquito periqu...@gmail.com wrote:

Hi all, I have a cluster with 28 nodes (all physical, 4 cores, 32GB RAM); each node has 4 OSDs for a total of 112 OSDs. Each OSD has 106 PGs (counted including replication). There are 3 MONs on this cluster. I'm running on Ubuntu trusty with kernel 3.13.0-52-generic, with Hammer (0.94.2). This cluster was installed with Hammer (0.94.1) and has only been upgraded to the latest available version. Of the three mons one is mostly idle, one is using ~170% CPU, and one is using ~270% CPU. They change as I restart the process (usually the idle one is the one with the lowest uptime). Running a perf top against the ceph-mon PID on the non-idle boxes yields something like this:

  4.62%  libpthread-2.19.so    [.] pthread_mutex_unlock
  3.95%  libpthread-2.19.so    [.] pthread_mutex_lock
  3.91%  libsoftokn3.so        [.] 0x0001db26
  2.38%  [kernel]              [k] _raw_spin_lock
  2.09%  libtcmalloc.so.4.1.2  [.] operator new(unsigned long)
  1.79%  ceph-mon              [.] DispatchQueue::enqueue(Message*, int, unsigned long)
  1.62%  ceph-mon              [.] RefCountedObject::get()
  1.58%  libpthread-2.19.so    [.] pthread_mutex_trylock
  1.32%  libtcmalloc.so.4.1.2  [.] operator delete(void*)
  1.24%  libc-2.19.so          [.] 0x00097fd0
  1.20%  ceph-mon              [.] ceph::buffer::ptr::release()
  1.18%  ceph-mon              [.] RefCountedObject::put()
  1.15%  libfreebl3.so         [.] 0x000542a8
  1.05%  [kernel]              [k] update_cfs_shares
  1.00%  [kernel]              [k] tcp_sendmsg

The cluster is mostly idle, and it's healthy. The store is 69MB big, and the MONs are consuming around 700MB of RAM. Any ideas on this situation? Is it safe to ignore?
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
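For reference, a hedged sketch of how the heap stats/release above are typically triggered (the exact command form varies between releases, so treat these as assumptions to verify against your version; "mon.a" is just a placeholder id):

# dump tcmalloc heap statistics from a monitor
ceph tell mon.a heap stats
# ask the monitor to return freed pages to the OS
ceph tell mon.a heap release
# the same should also work through the admin socket on the mon host
ceph daemon mon.a heap stats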
Re: [ceph-users] Clients' connection for concurrent access to ceph
On Wed, Jul 22, 2015 at 8:39 PM, Shneur Zalman Mattern shz...@eimsys.co.il wrote:

Workaround... We're now building a huge computing cluster: 140 diskless compute nodes that write a lot of computing data to storage concurrently. The user who submits a job to the cluster also needs access to the same storage location (to watch progress and results). We've built a Ceph cluster:
3 mon nodes (one of them combined with the mds)
3 osd nodes (each one has 10 OSDs + SSD for journaling)
switch with 24 x 10G ports
10 gigabit - for the public network
20 gigabit bonding - between the OSDs
Ubuntu 12.04.5, Ceph 0.87.2 - Giant
Clients have: 10 gigabit for the Ceph connection, CentOS 6.6 with upgraded kernel 3.19.8 (already running the computing cluster). Naturally, all nodes, switches and clients were configured for jumbo frames.
First test: I thought to make one big shared RBD, but RBD supports multiple clients mapping/mounting the same image, not parallel writes...
Second test: NFS over RBD - it works pretty well, but: 1. The NFS gateway is a single point of failure. 2. There is no performance scaling from the scale-out storage; throughput is bottlenecked by the bandwidth of the NFS gateway.
Third test: We wanted to try CephFS, because our client is familiar with Lustre, which is very close to CephFS in capabilities: 1. I used my Ceph nodes in the client's role: I mounted CephFS on one of the nodes and ran dd with bs=1M... I got wonderful write performance, ~1.1 GBytes/s (really close to 10Gbit network throughput). 2. I connected a CentOS client to the 10gig public network and mounted CephFS, but... it was just ~250 MBytes/s. 3. I connected an Ubuntu client (not a Ceph member) to the 10gig public network and mounted CephFS, and... it was also ~260 MBytes/s. Now I have to know: perhaps Ceph member nodes have privileged access???

There's nothing in the Ceph system that would do this directly. My first guess is that you're seeing the impact of write latencies (as opposed to bandwidth) on your system. What is the network latency from each node you've used as a client to the Ceph system? Exactly what dd command are you using? How are you mounting CephFS? Are you sure your network is functioning as expected? Run iperf (preferably on all your nodes simultaneously) and verify the results. Separately, be aware that CephFS is generally not a supported technology right now. -Greg
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
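A hedged sketch of the checks Greg is asking about (hostnames, mount points and sizes below are placeholders, not taken from the original posts):

# raw TCP throughput between a client and a storage node (start the server side first)
iperf -s                            # on an osd/mon node
iperf -c osd-node-1 -t 30 -P 4      # on the client
# round-trip latency from the client to a monitor
ping -c 20 mon-node-1
# a dd that bypasses the page cache, so the result reflects the storage path
dd if=/dev/zero of=/mnt/cephfs/ddtest bs=1M count=4096 oflag=direct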
Re: [ceph-users] ceph-mon cpu usage
On Thu, Jul 23, 2015 at 8:39 AM, Luis Periquito periqu...@gmail.com wrote:

The ceph-mon is already taking a lot of memory, and I ran a heap stats:

MALLOC:        32391696 (   30.9 MiB) Bytes in use by application
MALLOC: +   27597135872 (26318.7 MiB) Bytes in page heap freelist
MALLOC: +      16598552 (   15.8 MiB) Bytes in central cache freelist
MALLOC: +      14693536 (   14.0 MiB) Bytes in transfer cache freelist
MALLOC: +      17441592 (   16.6 MiB) Bytes in thread cache freelists
MALLOC: +     116387992 (  111.0 MiB) Bytes in malloc metadata
MALLOC:
MALLOC: =   27794649240 (26507.0 MiB) Actual memory used (physical + swap)
MALLOC: +      26116096 (   24.9 MiB) Bytes released to OS (aka unmapped)
MALLOC:
MALLOC: =   27820765336 (26531.9 MiB) Virtual address space used
MALLOC:
MALLOC:            5683 Spans in use
MALLOC:              21 Thread heaps in use
MALLOC:            8192 Tcmalloc page size

after that I ran the heap release and it went back to normal:

MALLOC:        22919616 (   21.9 MiB) Bytes in use by application
MALLOC: +       4792320 (    4.6 MiB) Bytes in page heap freelist
MALLOC: +      18743448 (   17.9 MiB) Bytes in central cache freelist
MALLOC: +      20645776 (   19.7 MiB) Bytes in transfer cache freelist
MALLOC: +      18456088 (   17.6 MiB) Bytes in thread cache freelists
MALLOC: +     116387992 (  111.0 MiB) Bytes in malloc metadata
MALLOC:
MALLOC: =     201945240 (  192.6 MiB) Actual memory used (physical + swap)
MALLOC: +   27618820096 (26339.4 MiB) Bytes released to OS (aka unmapped)
MALLOC:
MALLOC: =   27820765336 (26531.9 MiB) Virtual address space used
MALLOC:
MALLOC:            5639 Spans in use
MALLOC:              29 Thread heaps in use
MALLOC:            8192 Tcmalloc page size

So it just seems the monitor is not returning unused memory to the OS, or reusing already allocated memory it deems as free...

Yep. This is a bug (best we can tell) in some versions of tcmalloc combined with certain distribution stacks, although I don't think we've seen it reported on Trusty (nor on a tcmalloc distribution that new) before. Alternatively, some folks are seeing tcmalloc use up lots of CPU in other scenarios involving memory return, and it may manifest like this, but I'm not sure. You could look through the mailing list for information on it. -Greg
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
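Until the tcmalloc behaviour is understood, one possible stopgap (my suggestion only, not something proposed in the thread, and it assumes your mon ids match the short hostname) is to release the heap periodically from cron:

# /etc/cron.d/ceph-mon-heap-release  (hypothetical file name)
# every 30 minutes, hand tcmalloc's free pages back to the OS
*/30 * * * * root ceph tell mon.$(hostname -s) heap release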
Re: [ceph-users] Clients' connection for concurrent access to ceph
On 22/07/15 20:39, Shneur Zalman Mattern wrote: Third test: We wanted to try CephFS, because our client is familiar with Lustre, which is very close to CephFS in capabilities: 1. I used my Ceph nodes in the client's role: I mounted CephFS on one of the nodes and ran dd with bs=1M... I got wonderful write performance, ~1.1 GBytes/s (really close to 10Gbit network throughput). 2. I connected a CentOS client to the 10gig public network and mounted CephFS, but... it was just ~250 MBytes/s. 3. I connected an Ubuntu client (not a Ceph member) to the 10gig public network and mounted CephFS, and... it was also ~260 MBytes/s. Now I have to know: perhaps Ceph member nodes have privileged access???

While you're benchmarking, it's a good idea to try both the kernel client and the FUSE client. You may find one works better than the other, and we'll find the numbers interesting too. You're using Giant; the latest LTS release is Hammer -- if you're interested in CephFS you'll be better off with Hammer (lots of new stuff going in all the time). Aside from that, it's kind of surprising that your servers are performing better as clients than your actual clients. Do the clients definitely have the same kernel and Ceph versions as the servers? Is the link between the clients and the servers definitely 10G all the way across your network? Is a pure network benchmark seeing the full 10 Gbps between a client node and a server node? Cheers, John
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
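For comparing the two clients John mentions, a rough sketch (the monitor address, mount points and secret file path are placeholders):

# kernel client
mount -t ceph 10.0.0.1:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret
# FUSE client
ceph-fuse -m 10.0.0.1:6789 /mnt/cephfs-fuse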
Re: [ceph-users] osd_agent_max_ops relating to number of OSDs in the cache pool
-----Original Message-----
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Gregory Farnum
Sent: 22 July 2015 15:05
To: Nick Fisk n...@fisk.me.uk
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] osd_agent_max_ops relating to number of OSDs in the cache pool

On Sat, Jul 18, 2015 at 10:25 PM, Nick Fisk n...@fisk.me.uk wrote:

Hi All, I'm doing some testing on the new high/low speed cache tiering flushing and I'm trying to get my head round the effect that changing these two settings has on the flushing speed. When setting osd_agent_max_ops to 1, I can get up to a 20% improvement before the osd_agent_max_high_ops value kicks in for high speed flushing, which is great for bursty workloads. As I understand it, these settings loosely affect the number of concurrent operations the cache pool OSDs will flush down to the base pool. I may have got completely the wrong idea in my head, but I can't understand how a static default setting will work with different cache/base ratios. For example, if I had a relatively small number of very fast cache tier OSDs (PCI-E SSD perhaps) and a much larger number of base tier OSDs, would the value need to be increased to ensure sufficient utilisation of the base tier and make sure that the cache tier doesn't fill up too fast? Alternatively, where the cache tier is based on spinning disks or the base tier is not comparatively large, this value may need to be reduced to stop it saturating the disks. Any thoughts?

I'm not terribly familiar with these exact values, but I think you've got it right. We can't make decisions at the level of the entire cache pool (because sharing that information isn't feasible), so we let you specify it on a per-OSD basis according to what setup you have. I've no idea if anybody has gathered up a matrix of baseline good settings or not.

Thanks for your response. I will run a couple of tests to see if I can work out a rough rule of thumb for the settings. I'm guessing you don't want to do more than 1 or 2 concurrent ops per spinning disk to avoid overloading them. Maybe something like:

(# base tier disks / # copies) / # cache tier disks = optimum number of concurrent flush operations
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
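If it helps anyone experimenting with this, a hedged sketch of where these options live (the values shown are arbitrary examples rather than recommendations, and the exact option behaviour should be checked against your release):

# ceph.conf on the cache tier OSD hosts
[osd]
osd_agent_max_ops = 2
osd_agent_max_high_ops = 8

# or injected at runtime for a quick test
ceph tell osd.* injectargs '--osd_agent_max_ops 2 --osd_agent_max_high_ops 8'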
Re: [ceph-users] Ceph 0.94 (and lower) performance on 1 hosts ??
I'm not sure. It looks like Ceph and your disk controllers are doing basically the right thing, since you're going from 1GB/s to 420MB/s when moving from dd to Ceph (the full data journaling cuts it in half), but just FYI that dd task is not doing nearly the same thing as Ceph does; you'd need to use direct I/O or similar, and the conv=fsync flag means it will fsync the written data at the end of the run but not at any intermediate point. The change from 1 node to 2 cutting your performance so much is a bit odd. I do note that:
1 node:  420 MB/s each
2 nodes: 320 MB/s each
5 nodes: 275 MB/s each
so you appear to be reaching some kind of bound. Your note that dd can do 2GB/s without networking makes me think that you should explore that. As you say, network interrupts can be problematic in some systems. The only thing I can think of that's been really bad in the past is that some systems process all network interrupts on CPU 0, and you probably want to make sure that it's splitting them across CPUs. -Greg
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
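To make the dd comparison closer to what the OSDs actually do, something along these lines could be run (the target path is a placeholder; this is a sketch, not the exact commands used in the thread):

# one fsync at the very end; the page cache absorbs most of the writes
dd if=/dev/zero of=/var/lib/ceph/ddtest bs=1M count=4096 conv=fsync
# bypass the page cache so every write hits the device
dd if=/dev/zero of=/var/lib/ceph/ddtest bs=1M count=4096 oflag=direct
# or force each write to be synced as it happens
dd if=/dev/zero of=/var/lib/ceph/ddtest bs=1M count=4096 oflag=sync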
Re: [ceph-users] Ceph 0.94 (and lower) performance on 1 hosts ??
Hi, Well, I think the journaling would still appear in the dstat output, as those are still I/Os: even if the user-side bandwidth is indeed cut in half, that should not be the case for the disk I/O. For instance, I just tried a replicated pool for the test and got around 1300MiB/s in dstat for about 600MiB/s in the rados bench - I take it that with replication/size=2 there are 2 replicas in total, so that's 1 user I/O turning into 2 * [1 replica write + 1 journal write] spread over the hosts = 600*2*2/2 = 1200MiB/s of I/O per host (+/- the approximations)... Using the dd flag oflag=sync indeed lowers the dstat values, down to 1100-1300MiB/s. Still above what Ceph reaches with EC pools. I have tried to identify/watch interrupt issues (using the watch command), but I have to say I have failed until now. The Broadcom card is indeed spreading the load over the CPUs:

# egrep 'CPU|p2p' /proc/interrupts
      CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 CPU8 CPU9 CPU10 CPU11 CPU12 CPU13 CPU14 CPU15
 80:  881646372 1508 30 97328 0 10459270 2715 8753 0 12765 5100 9148 9420 0   PCI-MSI-edge  p2p1
 82:  179710 165107 94684 334842 210219 47403 270330 166877 3516 229043 709844660 16512 5088 2456312 12302   PCI-MSI-edge  p2p1-fp-0
 83:  12454 14073 5571 15196 5282 22301 11522 21299 4092581302069 1303 79810 705953243 1836 15190 883683   PCI-MSI-edge  p2p1-fp-1
 84:  6463 13994 57006 16200 16778 374815 558398 11902 695554360 94228 1252 18649 825684 7555 731875 190402   PCI-MSI-edge  p2p1-fp-2
 85:  163228 259899 143625 121326 107509 798435 168027 144088 75321 89962 55297 715175665 784356 53961 92153 92959   PCI-MSI-edge  p2p1-fp-3
 86:  233267453226792070827220797122540051748938 39492831684674 65008514098872704778 140711 160954 5910372981286 672487805   PCI-MSI-edge  p2p1-fp-4
 87:  33772 233318 136341 58163 506773 183451 18269706 52425 226509 22150 17026 176203 5942 681346619 270341 87435   PCI-MSI-edge  p2p1-fp-5
 88:  65103573 105514146 51193688 51330824 41771147 61202946 41053735 49301547 181380 73028922 39525 172439 155778 108065 154750931 26348797   PCI-MSI-edge  p2p1-fp-6
 89:  59287698 120778879 43446789 47063897 39634087 39463210 46582805 48786230 342778 82670325 135397 438041 318995 3642955 179107495 833932   PCI-MSI-edge  p2p1-fp-7
 90:  1804 4453 2434 19885 11527 9771 12724 2392840 12721439 1166 3354 560 69386 9233   PCI-MSI-edge  p2p2
 92:  6455149433007258203245273513 115645711838476 22200494039978 977482 15351931 9494511685983 772531 271810175312351954224   PCI-MSI-edge  p2p2-fp-0

I don't know yet how to check if there are memory bandwidth/latency/whatever issues... Regards
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
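On the "how do I check memory bandwidth / interrupt placement" question, a hedged sketch of things one could run (tool availability and exact option names are assumptions on my part, not from the original post):

# watch interrupt count deltas to confirm which CPUs the NIC queues fire on
watch -d -n1 "egrep 'CPU|p2p' /proc/interrupts"
# check which CPUs a given IRQ is allowed to run on (82 is just an example IRQ number)
cat /proc/irq/82/smp_affinity_list
# rough memory bandwidth check with sysbench, if it is installed
sysbench --test=memory --memory-block-size=1M --memory-total-size=10G run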
Re: [ceph-users] different omap format in one cluster (.sst + .ldb) - new installed OSD-node don't start any OSD
On 21.07.2015 12:06, Udo Lembke wrote:

Hi all, ... Normally I would say: if one OSD node dies, I simply reinstall the OS and Ceph and I'm back again... but this looks bad to me. Unfortunately the system also doesn't start 9 of the OSDs after I switched back to the old system disk... (only three of the big OSDs are running well). What is the best solution for that? Empty one node (crush weight 0), do a fresh reinstall of OS/Ceph, and reinitialise all OSDs? That would take a long, long time, because we have 173TB in use in this cluster...

Hi, answering myself in case anybody has similar issues and finds this posting. Emptying whole nodes takes too long. I used the puppet-managed wheezy system and had to recreate all OSDs (in this case I needed to zero the first blocks of the journal before creating each OSD again). Udo
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
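For anyone in the same spot, a rough sketch of the "zero the first blocks of the journal" step Udo describes (device names are placeholders; make very sure you are pointing at the right partition before running anything like this):

# wipe the start of the old journal partition so it is treated as fresh
dd if=/dev/zero of=/dev/sdb1 bs=1M count=100 oflag=direct
# then recreate the OSD with its journal on the wiped partition
ceph-disk -v prepare --fs-type xfs --cluster ceph -- /dev/sdc /dev/sdb1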