Re: [ceph-users] Best method to limit snapshot/clone space overhead
On 07/23/2015 06:31 AM, Jan Schermer wrote:

Hi all, I have been looking for a way to alleviate the overhead of RBD snapshots/clones for some time. In our scenario there are a few “master” volumes that contain production data, and are frequently snapshotted and cloned for dev/qa use. Those snapshots/clones live for a few days to a few weeks before they get dropped, and they sometimes grow very fast (databases, etc.). With the default 4MB object size there seems to be huge overhead involved with this; could someone give me some hints on how to solve that? I have some hope in:

1) FIEMAP
I’ve calculated that files on my OSDs are approx. 30% filled with NULLs - I suppose this is what it could save (best-case scenario) and it should also make COW operations much faster. But there are lots of bugs in FIEMAP in kernels (I saw some reference to the CentOS 6.5 kernel being buggy - which is what we use) and filesystems (like XFS). No idea about ext4, which we’d like to use in the future. Is enabling FIEMAP a good idea at all? I saw some mention of it being replaced with SEEK_DATA and SEEK_HOLE.

fiemap (and ceph's use of it) has been buggy on all fses in the past. SEEK_DATA and SEEK_HOLE are the proper interfaces to use for these purposes. That said, it's not incredibly well tested since it's off by default, so I wouldn't recommend using it without careful testing on the fs you're using. I wouldn't expect it to make much of a difference if you use small objects.

2) object size < 4MB for clones
I did some quick performance testing and setting this lower for production is probably not a good idea. My sweet spot is 8MB object size, however this would make the overhead for clones even worse than it already is. But I could make the cloned images with a different block size from the snapshot (at least according to the docs). Does someone use it like that? Any caveats? That way I could have the production data with 8MB block size but make the development snapshots with, for example, 64KiB granularity, probably at the expense of some performance, but most of the data would remain in the (faster) master snapshot anyway. This should drop overhead tremendously, maybe even more than enabling FIEMAP. (Even better when working in tandem, I suppose?)

Since these clones are relatively short-lived this seems like a better way to go in the short term. 64k may be extreme, but if there aren't too many of these clones it's not a big deal. There is more overhead for recovery and scrub with smaller objects, so I wouldn't recommend using tiny objects in general. It'll be interesting to see your results. I'm not sure many folks have looked at optimizing this use case.

Josh
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
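As a concrete illustration of the approach Jan describes, a clone can be created with a smaller object size than its parent. The pool/image/snapshot names below are made up, and the exact flag (--order here; newer releases also accept --object-size) should be checked against the rbd man page for your version:

    # hypothetical names; order 23 = 8 MiB objects for the master image
    rbd create rbd/master --size 102400 --order 23
    # a snapshot must be protected before it can be cloned
    rbd snap create rbd/master@golden
    rbd snap protect rbd/master@golden
    # clone for dev/qa with order 16 = 64 KiB objects, so COW writes to the
    # clone allocate 64 KiB chunks instead of 8 MB
    rbd clone rbd/master@golden rbd/dev-clone --order 16
    rbd info rbd/dev-clone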
Re: [ceph-users] Ceph-deploy won't write journal if partition exists and using --dmcrypt
Sorry for the broken post previously. I have looked into this more and it looks like ceph-deploy is not seeing that it is a partition and is attempting to create an additional partition in the journal's place. I read in the documentation that if I set osd journal size = 0, it will assume that the target is a block device and use the entire block. I tried this and it still doesn't work. I have since zapped the journals and specified a 20G journal size. Now in my ceph-deploy line I just specify:

ceph-deploy osd --dmcrypt --fs-type ${fs} create ${host}:${disk}:/dev/${journal_disk}

i.e.:

ceph-deploy osd --dmcrypt --fs-type btrfs create kh28-1:sde:/dev/sdab
ceph-deploy osd --dmcrypt --fs-type btrfs create kh28-1:sdf:/dev/sdab

and ceph-deploy seems to try to create a new partition every time. I have now run into a new issue though. After ceph-deploy creates the partitions and seems to bootstrap the disks successfully, it does not mount them properly to create the journal.

[ceph_deploy.osd][DEBUG ] Calling partprobe on zapped device /dev/sdr
[kh28-3.osdc.io][INFO ] Running command: sudo partprobe /dev/sdr
[ceph_deploy.conf][DEBUG ] found configuration file at: /home/lacadmin/.cephdeploy.conf
[ceph_deploy.cli][INFO ] Invoked (1.5.25): /usr/local/bin/ceph-deploy osd --dmcrypt --fs-type btrfs create kh28-3.osdc.io:sdr:/dev/sdp2
[ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks kh28-3.osdc.io:/dev/sdr:/dev/sdp2
[kh28-3.osdc.io][DEBUG ] connection detected need for sudo
[kh28-3.osdc.io][DEBUG ] connected to host: kh28-3.osdc.io
[kh28-3.osdc.io][DEBUG ] detect platform information from remote host
[kh28-3.osdc.io][DEBUG ] detect machine type
[ceph_deploy.osd][INFO ] Distro info: Ubuntu 14.04 trusty
[ceph_deploy.osd][DEBUG ] Deploying osd to kh28-3.osdc.io
[kh28-3.osdc.io][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[kh28-3.osdc.io][INFO ] Running command: sudo udevadm trigger --subsystem-match=block --action=add
[ceph_deploy.osd][DEBUG ] Preparing host kh28-3.osdc.io disk /dev/sdr journal /dev/sdp2 activate True
[kh28-3.osdc.io][INFO ] Running command: sudo ceph-disk -v prepare --fs-type btrfs --dmcrypt --dmcrypt-key-dir /etc/ceph/dmcrypt-keys --cluster ceph -- /dev/sdr /dev/sdp2
[kh28-3.osdc.io][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=fsid
[kh28-3.osdc.io][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mkfs_options_btrfs
[kh28-3.osdc.io][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mkfs_options_btrfs
[kh28-3.osdc.io][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mount_options_btrfs
[kh28-3.osdc.io][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=osd_journal_size
[kh28-3.osdc.io][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_cryptsetup_parameters
[kh28-3.osdc.io][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_dmcrypt_key_size
[kh28-3.osdc.io][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_dmcrypt_type
[kh28-3.osdc.io][WARNIN] DEBUG:ceph-disk:Creating journal file /dev/sdp2 with size 0 (ceph-osd will resize and allocate)
[kh28-3.osdc.io][WARNIN] DEBUG:ceph-disk:Journal is file /dev/sdp2
[kh28-3.osdc.io][WARNIN] WARNING:ceph-disk:OSD will not be hot-swappable if journal is not the same device as the osd data
[kh28-3.osdc.io][WARNIN] DEBUG:ceph-disk:Creating osd partition on /dev/sdr
[kh28-3.osdc.io][WARNIN] INFO:ceph-disk:Running command: /sbin/sgdisk --largest-new=1 --change-name=1:ceph data --partition-guid=1:c1879421-bcd0-4419-bc96-63d2d51176db --typecode=1:89c57f98-2fe5-4dc0-89c1-5ec00ceff2be -- /dev/sdr
[kh28-3.osdc.io][DEBUG ] The operation has completed successfully.
[kh28-3.osdc.io][WARNIN] DEBUG:ceph-disk:Calling partprobe on created device /dev/sdr
[kh28-3.osdc.io][WARNIN] INFO:ceph-disk:Running command: /sbin/partprobe /dev/sdr
[kh28-3.osdc.io][WARNIN] INFO:ceph-disk:Running command: /sbin/udevadm settle
[kh28-3.osdc.io][WARNIN] INFO:ceph-disk:Running command: /sbin/cryptsetup --batch-mode --key-file /etc/ceph/dmcrypt-keys/c1879421-bcd0-4419-bc96-63d2d51176db.luks.key luksFormat /dev/sdr1
[kh28-3.osdc.io][WARNIN] INFO:ceph-disk:Running command: /sbin/cryptsetup --key-file /etc/ceph/dmcrypt-keys/c1879421-bcd0-4419-bc96-63d2d51176db.luks.key luksOpen /dev/sdr1 c1879421-bcd0-4419-bc96-63d2d51176db
[kh28-3.osdc.io][WARNIN] DEBUG:ceph-disk:Creating btrfs fs on /dev/mapper/c1879421-bcd0-4419-bc96-63d2d51176db
[kh28-3.osdc.io][WARNIN] INFO:ceph-disk:Running command: /sbin/mkfs -t btrfs -m single -l 32768 -n 32768 -- /dev/mapper/c1879421-bcd0-4419-bc96-63d2d51176db
[kh28-3.osdc.io][WARNIN] Turning ON
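One workaround that may be worth trying, sketched only, using the device names from above: let ceph-disk own the whole journal device instead of pre-creating partitions on it, so it creates and GPT-tags one journal partition per OSD itself:

    # wipe the journal SSD and hand the whole device (not a partition) to ceph-deploy
    ceph-deploy disk zap kh28-1:sdab
    ceph-deploy osd --dmcrypt --fs-type btrfs create kh28-1:sde:sdab
    ceph-deploy osd --dmcrypt --fs-type btrfs create kh28-1:sdf:sdab
    # ceph-disk should then carve out and tag one journal partition per prepared OSD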
Re: [ceph-users] Issue in communication of swift client and radosgw
Hi, please respond. Regards, Bindu

On Fri, Jul 3, 2015 at 11:52 AM, Bindu Kharb bindu21in...@gmail.com wrote:
[...]
[ceph-users] rbd image-meta
Hello, I am trying to use rbd image-meta set. I get an error from rbd that this command is not recognized, yet it is documented in the rbd documentation: http://ceph.com/docs/next/man/8/rbd/ I am using the Hammer release deployed using ceph-deploy on Ubuntu 14.04. Is image-meta set supported in rbd in the Hammer release? Any help much appreciated. /Maged ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Issue in communication of swift client and RADOSGW
Hi, I am trying to use swift as frontend with ceph storage. I have a small cluster(1MON, 2OSD). My cluster is working fine. I have installed radosgw on one of my machine and radosgw(gateway1) is also up and communicating with cluster. Now I have installed swift client and created user and subuser. But I am unable to get bucket for the user. Below is my config file : /etc/ceph/ceph.conf [global] public_network = 172.18.59.0/24 osd_pool_default_size = 2 fsid = 17848c62-d69e-4991-a4dd-298358bb19ea mon_initial_members = ceph4-Standard-PC-i440FX-PIIX-1996 mon_host = 172.18.59.205 auth_cluster_required = cephx auth_service_required = cephx auth_client_required = cephx filestore_xattr_use_omap = true debug ms = 1 debug rgw = 20 [client.radosgw.gateway1] host = ceph-Veriton-Series #rgw_dns_name = 172.18.59.201 rgw_url = http://172.18.59.201:7481; #rgw_admin=admin keyring = /etc/ceph/keyring.radosgw.gateway1 rgw socket path = /var/run/ceph/ceph-client.radosgw.gateway1.asok #rgw frontends=civetweb port=7481 log file = /var/log/radosgw/ceph-client.radosgw.gateway1.log rgw print continue = false The file at location /etc/apache2/conf-available/gateway1.conf: VirtualHost *:80 ServerName 172.18.59.201.ceph-Veriton-Series ServerAdmin ceph@172.18.59.201 DocumentRoot /var/www # rewrting rules only need for amazon s3 RewriteEngine On RewriteRule ^/([a-zA-Z0-9-_.]*)([/]?.*) /s3gw.fcgi?page=$1params=$2%{QUERY_STRING} [E=HTTP_AUTHORIZATION:%{HTTP:Authorization},L] FastCgiExternalServer /var/www/s3gw.fcgi -socket /var/run/ceph/ceph-client.radosgw.gateway1.asok IfModule mod_fastcgi.c Directory /var/www Options +ExecCGI AllowOverride All SetHandler fastcgi-script Order allow,deny Allow from all AuthBasicAuthoritative Off /Directory /IfModule AllowEncodedSlashes On ErrorLog /var/log/apache2/error.log CustomLog /var/log/apache2/access.log combined ServerSignature Off /VirtualHost My cluster state is: ceph@ceph-Veriton-Series:/etc/apache2/conf-available$ ceph -s cluster 17848c62-d69e-4991-a4dd-298358bb19ea health HEALTH_OK monmap e1: 1 mons at {ceph4-Standard-PC-i440FX-PIIX-1996= 172.18.59.205:6789/0}, election epoch 1, quorum 0 ceph4-Standard-PC-i440FX-PIIX-1996 osdmap e1071: 2 osds: 2 up, 2 in pgmap v3493: 264 pgs, 12 pools, 1145 kB data, 59 objects 82106 MB used, 79394 MB / 166 GB avail 264 active+clean ceph@ceph-Veriton-Series:/etc/apache2/conf-available$ user info: sudo radosgw-admin user info --uid=testuser { user_id: testuser, display_name: First User, email: , suspended: 0, max_buckets: 1000, auid: 0, subusers: [ { id: testuser:swift, permissions: none}], keys: [ { user: testuser, access_key: NC4E8QHUSNDWDX18M6GB, secret_key: kRnFVL\/Z5oUur15E+CNbGPqCLDpBV1AgvLHTos7T}, { user: testuser:swift, access_key: R8UYRI7HXNW05BTJE2N7, secret_key: }], swift_keys: [ { user: testuser:swift, secret_key: eSWgLkDXTBPxOKf2cMWDdHwZPuFHAnDwQ3aUYXRF}], caps: [], op_mask: read, write, delete, default_placement: , placement_tags: [], bucket_quota: { enabled: false, max_size_kb: -1, max_objects: -1}, user_quota: { enabled: false, max_size_kb: -1, max_objects: -1}, temp_url_keys: []} I am using below script to get bucket: import boto import boto.s3.connection access_key = 'NC4E8QHUSNDWDX18M6GB' secret_key = 'kRnFVL\/Z5oUur15E+CNbGPqCLDpBV1AgvLHTos7T' conn = boto.connect_s3( aws_access_key_id = access_key, aws_secret_access_key = secret_key, host = 'ceph-Veriton-Series', is_secure=False, calling_format = boto.s3.connection.OrdinaryCallingFormat(), ) bucket = conn.create_bucket('my-new-bucket') for bucket in conn.get_all_buckets(): 
print {name}\t{created}.format( name = bucket.name, created = bucket.creation_date, ) *When I run the script below is the error:* *ceph@ceph-Veriton-Series:~$ python s3test.py Traceback (most recent call last): File s3test.py, line 12, in modulebucket = conn.create_bucket('my-new-bucket') File /usr/lib/python2.7/dist-packages/boto/s3/connection.py, line 504, in create_bucketresponse.status, response.reason, body)boto.exception.S3ResponseError: S3ResponseError: 405 Method Not AllowedNone* Below are the logs from log file: 015-07-03 11:46:38.940247 af9f7b40 1 -- 172.18.59.201:0/1001130 -- 172.18.59.204:6800/3675 -- osd_op(client.5314.0:126 gc.24 [call lock.unlock] 14.8bdc9d ondisk+write e1071) v4 -- ?+0 0xb3c08ae8 con 0xb7f6f1e8 2015-07-03 11:46:39.039568 b3bffb40 1 -- 172.18.59.201:0/1001130 == osd.0 172.18.59.204:6800/3675 109 osd_op_reply(126 gc.24 [call] v1071'856
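One quick sanity check, stated as an assumption about where the 405 might come from rather than a definitive diagnosis: make sure the boto request is actually reaching radosgw and not Apache's default vhost, for example by querying the endpoints from the config above directly:

    # radosgw should answer with an XML ListAllMyBucketsResult (or an S3 error document);
    # a plain Apache error/index page instead means the vhost/fastcgi wiring isn't being hit
    curl -v http://172.18.59.201/
    curl -v http://172.18.59.201:7481/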
Re: [ceph-users] Issue in communication of swift client and radosgw
Hi ceph users, please respond to my query. Regards, Bindu

On Fri, Jul 3, 2015 at 11:52 AM, Bindu Kharb bindu21in...@gmail.com wrote:
[...]
[ceph-users] [ANN] ceph-deploy 1.5.26 released
Hi everyone, This is announcing a new release of ceph-deploy that focuses on usability improvements.

- Most of the help menus for ceph-deploy subcommands (e.g. “ceph-deploy mon” and “ceph-deploy osd”) have been improved to be more context aware, such that help for “ceph-deploy osd create --help” and “ceph-deploy osd zap --help” returns different output specific to the command. Previously it would show generic help for “ceph-deploy osd”. Additionally, the list of optional arguments shown for the command is always correct for the subcommand in question. Previously the options shown were the aggregate of all options.
- ceph-deploy now points to git.ceph.com for downloading GPG keys
- ceph-deploy will now work on the Mint Linux distribution (by pointing to Ubuntu packages)
- SUSE distro users will now be pointed to SUSE packages by default, as there have not been updated SUSE packages on ceph.com in quite some time.

Full changelog is available at: http://ceph.com/ceph-deploy/docs/changelog.html#id1

New packages are available in the usual places of ceph.com hosted repos and PyPI. Cheers, - Travis ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] rbd image-meta
Image metadata isn't supported by Hammer; Infernalis supports it. On Mon, Jul 13, 2015 at 11:29 PM, Maged Mokhtar magedsmokh...@gmail.com wrote: Hello, I am trying to use rbd image-meta set. I get an error from rbd that this command is not recognized, yet it is documented in the rbd documentation: http://ceph.com/docs/next/man/8/rbd/ I am using the Hammer release deployed using ceph-deploy on Ubuntu 14.04. Is image-meta set supported in rbd in the Hammer release? Any help much appreciated. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Best Regards, Wheat ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
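For reference, on a release that does ship it (Infernalis and later), the usage looks roughly like this; the pool/image name and the key are placeholders:

    rbd image-meta set rbd/myimage conf_rbd_cache true
    rbd image-meta get rbd/myimage conf_rbd_cache
    rbd image-meta list rbd/myimage
    rbd image-meta remove rbd/myimage conf_rbd_cache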
[ceph-users] debugging ceph-deploy warning: could not open file descriptor -1
The docker/distribution project runs a continuous integration VM using CircleCI, and part of the VM setup installs Ceph packages using ceph-deploy. This has been working well for quite a while, but we are seeing a failure running `ceph-deploy install --release hammer`. The snippet below is where it looks like the first problem shows up.

...
[box156][DEBUG ] Get:24 http://ceph.com/debian-hammer/ precise/main ceph-mds amd64 0.94.2-1precise [10.5 MB]
[box156][DEBUG ] Get:25 http://ceph.com/debian-hammer/ precise/main radosgw amd64 0.94.2-1precise [3,619 kB]
[box156][WARNIN] E: Could not open file descriptor -1
[box156][WARNIN] E: Prior errors apply to /var/cache/apt/archives/parted_2.3-19ubuntu1_amd64.deb
...

On the surface it seems that the problem is coming from apt-get under the hood. Any pointers here? It doesn't seem like anything has changed configuration-wise. The full build log can be found here, which starts off with the ceph-deploy command that is failing: https://circleci.com/gh/docker/distribution/1848 Thanks, -Noah ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] different omap format in one cluster (.sst + .ldb) - new installed OSD-node don't start any OSD
Hi, I use ceph 0.94 from the wheezy repo (deb http://eu.ceph.com/debian-hammer wheezy main) inside jessie. 0.94.1 is installable without trouble, but an upgrade to 0.94.2 doesn't work correctly:

dpkg -l | grep ceph
ii ceph 0.94.1-1~bpo70+1 amd64 distributed storage and file system
ii ceph-common 0.94.2-1~bpo70+1 amd64 common utilities to mount and interact with a ceph storage cluster
ii ceph-fs-common 0.94.2-1~bpo70+1 amd64 common utilities to mount and interact with a ceph file system
ii ceph-fuse 0.94.2-1~bpo70+1 amd64 FUSE-based client for the Ceph distributed file system
ii ceph-mds 0.94.2-1~bpo70+1 amd64 metadata server for the ceph distributed file system
ii libcephfs1 0.94.2-1~bpo70+1 amd64 Ceph distributed file system client library
ii python-cephfs 0.94.2-1~bpo70+1 amd64 Python libraries for the Ceph libcephfs library

This is the reason why I switched back to wheezy (and a clean 0.94.2), but then all OSDs on that node failed to start. Switching back to the jessie system disk didn't solve this problem, because only 3 OSDs started again... My conclusion is: if one of my (partly broken) jessie OSD nodes dies now (e.g. a failed system SSD), I need less than an hour for a new system (wheezy), around two hours to reinitialize all OSDs (format new, install ceph) and around two days to refill the whole node. Udo

On 23.07.2015 13:21, Haomai Wang wrote: Did you use an upstream ceph version previously? Or did you shut down the running ceph-osd when upgrading the osd? How many osds hit this problem? This assert failure means that the osd detects an upgraded pg meta object but failed to read the meta keys (or is missing one key) from the object.

On Thu, Jul 23, 2015 at 7:03 PM, Udo Lembke ulem...@polarzone.de wrote: On 21.07.2015 12:06, Udo Lembke wrote: Hi all, ... Normally I would say, if one OSD node dies, I simply reinstall the OS and ceph and I'm back again... but this looks bad for me. Unfortunately the system also didn't start 9 OSDs when I switched back to the old system disk... (only three of the big OSDs are running well) What is the best solution for that? Empty one node (crush weight 0), fresh reinstall OS/ceph, reinitialise all OSDs? This will take a long, long time, because we use 173TB in this cluster...

Hi, answering myself in case anybody has similar issues and finds this posting. Emptying the whole node takes too long. I used the puppet wheezy system and had to recreate all OSDs (in this case I needed to empty the first blocks of the journal before creating the OSD again). Udo ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
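For completeness, the "empty one node first" route Udo mentions (and rejected as too slow for 173TB) would look roughly like this; the OSD ids are placeholders:

    # drain the OSDs on the node to be reinstalled, then wait for all PGs to be active+clean
    ceph osd crush reweight osd.4 0
    ceph osd crush reweight osd.5 0
    ceph -w          # watch recovery until the cluster is back to HEALTH_OK
    ceph osd out 4
    ceph osd out 5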
Re: [ceph-users] RADOS + deep scrubbing performance issues in production environment
All IO drops to ZERO IOPS for 1-15 minutes during deep-scrub on my cluster. There is clearly a locking bug! I have VMs - every day, several times, disk IO on (sometimes all of) them _completely_ stops. The disk queue grows, 0 IOPS are performed, services die with timeouts... At the same time the Ceph cluster (where the VM images are stored) is doing a deep scrub. No fiddling with priorities and the number of different threads helps. Actually, making the scrub slower makes those delays longer - so there is clearly a bug with locking. I have been experiencing this for two years already; since then we have tried everything and upgraded our cluster several times! Nothing helps! ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
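For anyone who wants to experiment anyway, these are the knobs usually tried for softening deep-scrub impact; the values are only illustrative and, per the report above, tuning of this kind did not cure the stalls here:

    # inject on running OSDs, or set the equivalents under [osd] in ceph.conf
    ceph tell osd.* injectargs '--osd_scrub_sleep 0.1'
    ceph tell osd.* injectargs '--osd_scrub_chunk_max 5'
    ceph tell osd.* injectargs '--osd_disk_thread_ioprio_class idle --osd_disk_thread_ioprio_priority 7'
    # note: the ioprio settings only take effect with the CFQ disk scheduler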
[ceph-users] Enclosure power failure pausing client IO till all connected hosts up
Hi all, Setup details: Two storage enclosures, each connected to 4 OSD nodes (shared storage). The failure domain is at the chassis (enclosure) level. Replication count is 2. Each host is allotted 4 drives. I have active client IO running on the cluster (random write profile with 4M block size, 64 queue depth). One of the enclosures had a power loss, so all OSDs from the hosts connected to this enclosure went down as expected. But client IO got paused. After some time the enclosure and the hosts connected to it came up, and all OSDs on those hosts came up. Until then, the cluster was not serving IO. Once all OSDs on the hosts pertaining to that enclosure came up, client IO resumed. Can anybody help me understand why the cluster was not serving IO during the enclosure failure? Or is it a bug? -Thanks & regards, Mallikarjun Biradar ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Multi-DC Ceph replication
Hi, We are trying to implement Ceph and have a really huge issue with replication between DCs. The issue we have is related to the replication setup in our infrastructure: a single region, 2 zones in different datacenters. While trying to configure replication we receive the message below. We wonder if this replication is really working, and whether you know anyone who has managed to configure it and run it in production?

... 'User-Agent': 'Boto/2.20.1 Python/2.7.6 Linux/3.13.0-45-generic', 'x-amz-copy-source': 'testowy_bucket/obiekt1.png', 'Date': 'Wed, 01 Jul 2015 12:57:03 GMT', 'Content-Type': 'application/json; charset=UTF-8', 'Authorization': 'AWS B:nD/AJX8ezov3qOCASK6Irz7yq30='}
2015-07-01 14:57:03,538 4782 [boto][DEBUG ] Path: /testowy_bucket/obiekt1.png?rgwx-op-id=rgw0%3A4775%3A3&rgwx-source-zone=pl-kra&rgwx-client-id=radosgw-agent
2015-07-01 14:57:03,538 4782 [boto][DEBUG ] Headers: {'Content-Type': 'application/json; charset=UTF-8', 'x-amz-copy-source': 'testowy_bucket/obiekt1.png'}x-amz-copy-source:testowy_bucket/obiekt1.png/testowy_bucket/obiekt1.png
2015-07-01 14:57:03,675 4782 [radosgw_agent.worker][DEBUG ] object testowy_bucket/obiekt1.png not found on master, deleting from secondary
2015-07-01 14:57:03,724 4782 [boto][DEBUG ] path=/testowy_bucket/obiekt1.png
2015-07-01 14:57:03,724 4782 [boto][DEBUG ] auth_path=/testowy_bucket/obiekt1.png
2015-07-01 14:57:03,725 4782 [boto][DEBUG ] Path: /testowy_bucket/obiekt1.png/testowy_bucket/obiekt1.png
2015-07-01 14:57:03,760 4782 [radosgw_agent.worker][DEBUG ] syncing object testowy_bucket/obiekt1.pngx-amz-copy-source:testowy_bucket/obiekt1.png/testowy_bucket/obiekt1.png
2015-07-01 14:57:03,761 4782 [boto][DEBUG ] url = 'http://s3.5stor-dc5-test.local/testowy_bucket/obiekt1.png' headers={'Content-Length': '0', 'User-Agent': 'Boto/2.20.1 Python/2.7.6 Linux/3.13.0-45-generic', 'x-amz-copy-source': 'testowy_bucket/obiekt1.png', 'Date': 'Wed, 01 Jul 2015 12:57:03 GMT', 'Content-Type': 'application/json; charset=UTF-8', 'Authorization': 'AWS B:nD/AJX8ezov3qOCASK6Irz7yq30='}
2015-07-01 14:57:03,761 4782 [boto][DEBUG ] Path: /testowy_bucket/obiekt1.png?rgwx-op-id=rgw0%3A4775%3A4&rgwx-source-zone=pl-kra&rgwx-client-id=radosgw-agent
2015-07-01 14:57:03,761 4782 [boto][DEBUG ] Headers: {'Content-Type': 'application/json; charset=UTF-8', 'x-amz-copy-source': 'testowy_bucket/obiekt1.png'}x-amz-copy-source:testowy_bucket/obiekt1.png/testowy_bucket/obiekt1.png
2015-07-01 14:57:03,854 4782 [radosgw_agent.worker][DEBUG ] object testowy_bucket/obiekt1.png not found on master, deleting from secondary
2015-07-01 14:57:03,899 4782 [boto][DEBUG ] path=/testowy_bucket/obiekt1.png
2015-07-01 14:57:03,899 4782 [boto][DEBUG ] auth_path=/testowy_bucket/obiekt1.png
2015-07-01 14:57:03,899 4782 [boto][DEBUG ] Path: /testowy_bucket/obiekt1.png/testowy_bucket/obiekt1.png

Pawel Komorowski
Product Owner
M: +48 664 434 518
Grunwaldzka 182
60-166 Poznan, Poland
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Issue in communication of swift client and radosgw
You should add the required capabilities to your user:

# radosgw-admin caps add --uid=testuser --caps=users=*
# radosgw-admin caps add --uid=testuser --caps=buckets=*
# radosgw-admin caps add --uid=testuser --caps=metadata=*
# radosgw-admin caps add --uid=testuser --caps=zone=*

On 3 July 2015 at 08:22, Bindu Kharb bindu21in...@gmail.com wrote:
[...]
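Independent of the caps, it may also be worth verifying the swift subuser end to end with the standard swift client. The auth endpoint below is an assumption (adjust it to wherever radosgw actually listens); the user and secret key are the ones from the post:

    swift -A http://172.18.59.201/auth/1.0 -U testuser:swift -K 'eSWgLkDXTBPxOKf2cMWDdHwZPuFHAnDwQ3aUYXRF' stat
    swift -A http://172.18.59.201/auth/1.0 -U testuser:swift -K 'eSWgLkDXTBPxOKf2cMWDdHwZPuFHAnDwQ3aUYXRF' post my-new-container
    swift -A http://172.18.59.201/auth/1.0 -U testuser:swift -K 'eSWgLkDXTBPxOKf2cMWDdHwZPuFHAnDwQ3aUYXRF' list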
Re: [ceph-users] el6 repo problem?
The packages were probably rebuilt without changing their name/version (bad idea btw) and the metadata either wasn't regenerated because of that or because of some other problem. You can mirror it and generate your own metadata, or install the packages by hand until it gets fixed. Jan

P.S. In my experience it's best to always put a build number in the filename to avoid stuff like this, unless you can make sure you generate the same binary package every time (and that's pretty hard usually).

On 23 Jul 2015, at 15:14, Samuel Taylor Liston sam.lis...@utah.edu wrote:
[...]
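Jan's mirror-it-yourself suggestion can be done roughly like this; the repo id and paths are examples, and it assumes the yum-utils and createrepo packages are installed:

    # pull the packages from the existing Ceph repo definition and build fresh metadata locally
    reposync --repoid=Ceph --download_path=/srv/mirror
    createrepo /srv/mirror/Ceph
    # then point a .repo file at baseurl=file:///srv/mirror/Ceph and install from that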
[ceph-users] Cephfs and ERESTARTSYS on writes
Hello, I'm having an issue with nginx writing to cephfs. Often I'm getting:

writev() /home/ceph/temp/44/94/1/119444 failed (4: Interrupted system call) while reading upstream

Looking with strace, this happens:

... write(65, e\314\366\36\302..., 65536) = ? ERESTARTSYS (To be restarted)

It happens after the first 4 MB (exactly) are written; the subsequent write gets ERESTARTSYS (sometimes, but more rarely, it fails after the first 32 or 64 MB, etc. are written). Apparently nginx doesn't expect this and doesn't handle it, so it cancels the writes and deletes the partial file. Is it possible Ceph cannot find the destination PG fast enough and returns ERESTARTSYS? Is there any way to fix this behavior or reduce it? Regards, Vedran ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Help with radosgw admin ops
Hi, I'm trying to use curl for rados admin ops requests. I have problems with the keys; you use this authorization: Authorization: AWS {access-key}:{hash-of-header-and-secret}. Where can I get the hash-of-header-and-secret? Info of the user: radosgw-admin user info --uid=usuario1 { user_id: usuario1, display_name: usuario1, email: , suspended: 0, max_buckets: 1000, auid: 0, subusers: [], keys: [ { user: usuario1, access_key: claveacceso, secret_key: temporal } ], swift_keys: [], caps: [ { type: usage, perm: write }, { type: user, perm: write } ], op_mask: read, write, delete, default_placement: , placement_tags: [], bucket_quota: { enabled: false, max_size_kb: -1, max_objects: -1 }, user_quota: { enabled: false, max_size_kb: -1, max_objects: -1 }, temp_url_keys: [] }

I made this script:

#!/bin/bash
token=claveacceso
secret=temporal
query=$1
date=`date -Ru`
header="PUT\n${content_md5}\n${content_type}\n${date}\n${query}"
sig=`echo -en ${header} | openssl sha1 -hmac ${secret} -binary | base64`
curl -i -X GET "http://10.0.2.10/admin/usage?format=json" -H "Date: ${date}" \
 -H "Authorization: AWS ${token}:${sig}" -H 'Host: adminnode'

The result is:

sh prueba3.sh
HTTP/1.1 403 Forbidden
Date: Fri, 03 Jul 2015 11:41:17 GMT
Server: Apache/2.4.6 (CentOS) OpenSSL/1.0.1e-fips
Accept-Ranges: bytes
Content-Length: 32
Content-Type: application/json

Version of ceph is: ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3). Version CentOS Linux release 7.1.1503 (Core). Could you give me documentation on how to use this? Thanks, Oscar. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
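Two things stand out in the script, stated as assumptions about the admin ops API rather than a verified fix: the StringToSign uses PUT while the request is a GET, and the signed resource has to be the canonicalized resource of the request, i.e. /admin/usage without the query string; reading usage also needs a usage=read cap, while the user above only has usage=write. A corrected sketch:

    #!/bin/bash
    token="claveacceso"
    secret="temporal"
    resource="/admin/usage"
    date=$(date -Ru)
    # StringToSign = VERB \n Content-MD5 \n Content-Type \n Date \n CanonicalizedResource
    string_to_sign="GET\n\n\n${date}\n${resource}"
    sig=$(echo -en "${string_to_sign}" | openssl sha1 -hmac "${secret}" -binary | base64)
    curl -i "http://10.0.2.10/admin/usage?format=json" \
      -H "Date: ${date}" \
      -H "Authorization: AWS ${token}:${sig}"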
Re: [ceph-users] Cephfs and ERESTARTSYS on writes
On Thu, Jul 23, 2015 at 1:17 PM, Vedran Furač vedran.fu...@gmail.com wrote: Hello, I'm having an issue with nginx writing to cephfs. Often I'm getting: writev() /home/ceph/temp/44/94/1/119444 failed (4: Interrupted system call) while reading upstream looking with strace, this happens: ... write(65, e\314\366\36\302..., 65536) = ? ERESTARTSYS (To be restarted) It happens after first 4MBs (exactly) are written, subsequent write gets ERESTARTSYS (sometimes, but more rarely, it fails after first 32 or 64MBs, etc are written). Apparently nginx doesn't expect this and doesn't handle it so it cancels writes and deletes this partial file. Is it possible Ceph cannot find the destination PG fast enough and returns ERESTARTSYS? Is there any way to fix this behavior or reduce it? That's...odd. Are you using the kernel client or ceph-fuse, and on which version? -Greg ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] any recommendation of using EnhanceIO?
Hi, I asked the same question a week or so ago (just search the mailing list archives for EnhanceIO :) and got some interesting answers. It looks like the project is pretty much dead since it was bought out by HGST; even their website has some broken links in regard to EnhanceIO. I'm keen to try flashcache or bcache (it's been in the mainline kernel for some time). Dominik On 1 Jul 2015, at 21:13, German Anders gand...@despegar.com wrote: Hi cephers, Is there anyone out there who has implemented EnhanceIO in a production environment? Any recommendations? Any perf output to share with the diff between using it and not? Thanks in advance, German ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
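In case it helps, a minimal bcache setup (the in-kernel option mentioned above) looks roughly like this; the device names are placeholders and the commands are destructive, so treat this as a sketch only:

    # format the slow backing device and the SSD caching device
    make-bcache -B /dev/sdb
    make-bcache -C /dev/nvme0n1
    # register them (udev normally does this on its own)
    echo /dev/sdb > /sys/fs/bcache/register
    echo /dev/nvme0n1 > /sys/fs/bcache/register
    # attach the cache set to the backing device (UUID from 'bcache-super-show /dev/nvme0n1')
    echo <cache-set-uuid> > /sys/block/bcache0/bcache/attach
    # optional: writeback caching for better write latency
    echo writeback > /sys/block/bcache0/bcache/cache_mode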
Re: [ceph-users] debugging ceps-deploy warning: could not open file descriptor -1
Nevermind. I see that `ceph-deploy mon create-initial` has stopped accepting the trailing hostname which was causing the failure. I don't know if those problems above I showed are actually anything to worry about :) On Tue, Jul 21, 2015 at 3:17 PM, Noah Watkins noahwatk...@gmail.com wrote: The docker/distribution project runs a continuous integration VM using CircleCI, and part of the VM setup installs Ceph packages using ceph-deploy. This has been working well for quite a while, but we are seeing a failure running `ceph-deploy install --release hammer`. The snippet is here where it looks the first problem shows up. ... [box156][DEBUG ] Get:24 http://ceph.com/debian-hammer/ precise/main ceph-mds amd64 0.94.2-1precise [10.5 MB] [box156][DEBUG ] Get:25 http://ceph.com/debian-hammer/ precise/main radosgw amd64 0.94.2-1precise [3,619 kB] [box156][WARNIN] E: Could not open file descriptor -1 [box156][WARNIN] E: Prior errors apply to /var/cache/apt/archives/parted_2.3-19ubuntu1_amd64.deb ... On the surface it seems that the problem is coming from apt-get under the hood. Any pointers here? It doesn't seem like anything has changed configuration wise. The full build log can be found here which starts off with the ceph-deploy command that is failing: https://circleci.com/gh/docker/distribution/1848 Thanks, -Noah ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
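For anyone hitting the same thing, the sequence that works with current ceph-deploy (create-initial takes no hostname; it uses the monitors declared by "ceph-deploy new") is roughly:

    ceph-deploy new box156
    ceph-deploy install --release hammer box156
    ceph-deploy mon create-initial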
Re: [ceph-users] el6 repo problem?
I am having the same issue and haven't figured out a resolution yet. The repo is pointing to a valid URL, and I can wget the packages from that URL, but yum complains about them. My initial thought is that something is screwy with the md5sum either on package versions in the repo, or in my rpm db, but I have not confirmed that. Samuel T. Liston Ctr. for High Perf. Computing Univ. of Utah 801.232.6932 On Jul 23, 2015, at 7:05 AM, Wayne Betts wbe...@bnl.gov wrote: I'm trying to use the ceph el6 yum repo. Yesterday afternoon, I found yum complaining about 8 packages when trying to install or update ceph, such as this: (4/46): ceph-0.94.2-0.el6.x86_64.rpm | 21 MB 00:01 http://ceph.com/rpm-hammer/el6/x86_64/ceph-0.94.2-0.el6.x86_64.rpm: [Errno -1] Package does not match intended download. Suggestion: run yum --enablerepo=Ceph clean metadata The other packages with the same fault are libcephfs1-0.94.2-0.el6.x86_64 librbd1-0.94.2-0.el6.x86_64 python-rados-0.94.2-0.el6.x86_64 python-cephfs-0.94.2-0.el6.x86_64 librados2-0.94.2-0.el6.x86_64 python-rbd-0.94.2-0.el6.x86_64 ceph-common-0.94.2-0.el6.x86_64 This is happening on all three machines I've tried it on. I've tried cleaning the metadata on my hosts as per the suggestion, without any change. There was no trouble pulling these packages with yum on Tuesday, July 14, and I can still use wget to pull individual packages seemingly without any problem. Is anyone else experiencing the same problem? Can the repo maintainers look into this? (Rebuild the metadata, or flush their reverse proxy server(s) if any?) Any suggestions for me to try on the client side? -- -Wayne Betts STAR Computing Support at BNL Physics Dept. PO Box 5000 Upton, NY 11973 wbe...@bnl.gov 631-344-3285 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Cephfs and ERESTARTSYS on writes
On 07/23/2015 03:20 PM, Gregory Farnum wrote: On Thu, Jul 23, 2015 at 1:17 PM, Vedran Furač vedran.fu...@gmail.com wrote: Is it possible Ceph cannot find the destination PG fast enough and returns ERESTARTSYS? Is there any way to fix this behavior or reduce it? That's...odd. Are you using the kernel client or ceph-fuse, and on which version? Not seeing write errors with ceph-fuse, but it's slow. Regards, Vedran ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ruby bindings for Librados
- Original Message - From: Ken Dreyer kdre...@redhat.com To: ceph-users@lists.ceph.com Sent: Tuesday, July 14, 2015 9:06:01 PM Subject: Re: [ceph-users] Ruby bindings for Librados On 07/13/2015 02:11 PM, Wido den Hollander wrote: On 07/13/2015 09:43 PM, Corin Langosch wrote: Hi Wido, I'm the dev of https://github.com/netskin/ceph-ruby and still use it in production on some systems. It has everything I need so I didn't develop any further. If you find any bugs or need new features, just open an issue and I'm happy to have a look. Ah, that's great! We should look into making a Ruby binding official and moving it to Ceph's Github project. That would make it more clear for end-users. I see that RADOS namespaces are currently not implemented in the Ruby bindings. Not many bindings have them though. Might be worth looking at. I'll give the current bindings a try btw! I'd like to see this happen too. Corin, would you be amenable to moving this under the ceph GitHub org? You'd still have control over it, similar to the way Wido manages https://github.com/ceph/phprados After some off-list email with Wido and Corin, I've set up https://github.com/ceph/ceph-ruby and a ceph-ruby GitHub team with Corin as the admin (similar to Wido's admin rights to phprados). Have fun! - Ken ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Enclosure power failure pausing client IO till all connected hosts up
(Adding devel list to the CC) Hi Eric, To add more context to the problem: Min_size was set to 1 and replication size is 2. There was a flaky power connection to one of the enclosures. With min_size 1, we were able to continue the IO's, and recovery was active once the power comes back. But if there is a power failure again when recovery is in progress, some of the PGs are going to down+peering state. Extract from pg query. $ ceph pg 1.143 query { state: down+peering, snap_trimq: [], epoch: 3918, up: [ 17], acting: [ 17], info: { pgid: 1.143, last_update: 3166'40424, last_complete: 3166'40424, log_tail: 2577'36847, last_user_version: 40424, last_backfill: MAX, purged_snaps: [], .. recovery_state: [ { name: Started\/Primary\/Peering\/GetInfo, enter_time: 2015-07-15 12:48:51.372676, requested_info_from: []}, { name: Started\/Primary\/Peering, enter_time: 2015-07-15 12:48:51.372675, past_intervals: [ { first: 3147, last: 3166, maybe_went_rw: 1, up: [ 17, 4], acting: [ 17, 4], primary: 17, up_primary: 17}, { first: 3167, last: 3167, maybe_went_rw: 0, up: [ 10, 20], acting: [ 10, 20], primary: 10, up_primary: 10}, { first: 3168, last: 3181, maybe_went_rw: 1, up: [ 10, 20], acting: [ 10, 4], primary: 10, up_primary: 10}, { first: 3182, last: 3184, maybe_went_rw: 0, up: [ 20], acting: [ 4], primary: 4, up_primary: 20}, { first: 3185, last: 3188, maybe_went_rw: 1, up: [ 20], acting: [ 20], primary: 20, up_primary: 20}], probing_osds: [ 17, 20], blocked: peering is blocked due to down osds, down_osds_we_would_probe: [ 4, 10], peering_blocked_by: [ { osd: 4, current_lost_at: 0, comment: starting or marking this osd lost may let us proceed}, { osd: 10, current_lost_at: 0, comment: starting or marking this osd lost may let us proceed}]}, { name: Started, enter_time: 2015-07-15 12:48:51.372671}], agent_state: {}} And Pgs are not coming to active+clean till power is resumed again. During this period no IOs are allowed to the cluster. Not able to follow why the PGs are ending up in peering state? Each Pg has two copies in both the enclosures. If one of enclosure is down for some time, should be able to serve IO's from the second one. That was true, if no recovery IO is involved. In case of any recovery, we are ending up some Pg's in down and peering state. Thanks, Varada -Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Eric Eastman Sent: Thursday, July 23, 2015 8:37 PM To: Mallikarjun Biradar mallikarjuna.bira...@gmail.com Cc: ceph-users@lists.ceph.com Subject: Re: [ceph-users] Enclosure power failure pausing client IO till all connected hosts up You may want to check your min_size value for your pools. If it is set to the pool size value, then the cluster will not do I/O if you loose a chassis. On Sun, Jul 5, 2015 at 11:04 PM, Mallikarjun Biradar mallikarjuna.bira...@gmail.com wrote: Hi all, Setup details: Two storage enclosures each connected to 4 OSD nodes (Shared storage). Failure domain is Chassis (enclosure) level. Replication count is 2. Each host has allotted with 4 drives. I have active client IO running on cluster. (Random write profile with 4M block size 64 Queue depth). One of enclosure had power loss. So all OSD's from hosts that are connected to this enclosure went down as expected. But client IO got paused. After some time enclosure hosts connected to it came up. And all OSD's on that hosts came up. Till this time, cluster was not serving IO. Once all hosts OSD's pertaining to that enclosure came up, client IO resumed. 
Can anybody explain why the cluster was not serving IO during the enclosure failure, or is it a bug? -Thanks & regards, Mallikarjun Biradar
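[Editor's note: a hedged sketch of how the blocked peering shown in the pg query above can be inspected and, as a last resort, unblocked. The PG id and OSD ids are taken from the quoted output; "ceph osd lost" discards the data on that OSD, so with size=2 this can mean real data loss and should only be used if osd.4 and osd.10 truly cannot be brought back.]
$ ceph health detail | grep down+peering
$ ceph pg 1.143 query | grep -A 8 peering_blocked_by
$ ceph osd lost 4 --yes-i-really-mean-it      # last resort only
$ ceph osd lost 10 --yes-i-really-mean-it     # last resort only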
Re: [ceph-users] Cephfs and ERESTARTSYS on writes
On 07/23/2015 03:20 PM, Gregory Farnum wrote: On Thu, Jul 23, 2015 at 1:17 PM, Vedran Furač vedran.fu...@gmail.com wrote: Hello, I'm having an issue with nginx writing to cephfs. Often I'm getting: writev() /home/ceph/temp/44/94/1/119444 failed (4: Interrupted system call) while reading upstream looking with strace, this happens: ... write(65, e\314\366\36\302..., 65536) = ? ERESTARTSYS (To be restarted) It happens after first 4MBs (exactly) are written, subsequent write gets ERESTARTSYS (sometimes, but more rarely, it fails after first 32 or 64MBs, etc are written). Apparently nginx doesn't expect this and doesn't handle it so it cancels writes and deletes this partial file. Is it possible Ceph cannot find the destination PG fast enough and returns ERESTARTSYS? Is there any way to fix this behavior or reduce it? That's...odd. Are you using the kernel client or ceph-fuse, and on which version? Sorry, forgot to mention, it's kernel client, tried both 3.10 and 4.1, but it's the same. Ceph is firefly. I'll also try fuse. Regards, Vedran ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Cephfs and ERESTARTSYS on writes
On Thu, Jul 23, 2015 at 5:37 PM, Vedran Furač vedran.fu...@gmail.com wrote: On 07/23/2015 04:19 PM, Ilya Dryomov wrote: On Thu, Jul 23, 2015 at 4:23 PM, Vedran Furač vedran.fu...@gmail.com wrote: On 07/23/2015 03:20 PM, Gregory Farnum wrote: That's...odd. Are you using the kernel client or ceph-fuse, and on which version? Sorry, forgot to mention, it's kernel client, tried both 3.10 and 4.1, but it's the same. Ceph is firefly. That's probably a wait_*() return value, meaning it timed out, so userspace logs might help understand what's going on. A separate issue is that we leak ERESTARTSYS to userspace - this needs to be fixed. Hmm, what's the timeout value? This happens even when ceph is nearly idle. When you mention logs, do you mean Ceph server logs? MON logs don't have anything special, OSD logs are full of: 2015-07-23 16:31:35.535622 7ff3fe020700 0 -- x.x.x.x:6849/27688 x.x.x.x:6841/27679 pipe(0x241e58c0 sd=183 :6849 s=2 pgs=1240 cs=127 l=0 c=0x19855de0).fault with nothing to send, going to standby 2015-07-23 16:31:42.492520 7ff401a53700 0 -- x.x.x.x:6849/27688 x.x.x.x:6841/27679 pipe(0x241e5080 sd=226 :6849 s=0 pgs=0 cs=0 l=0 c=0x21b31860).accept connect_seq 128 vs existing 127 state standby 2015-07-23 16:32:02.989102 7ff401851700 0 -- x.x.x.x:6849/27688 x.x.x.x:6854/27690 pipe(0x1916a680 sd=33 :43507 s=2 pgs=1366 cs=131 l=0 c=0x177e8680).fault with nothing to send, going to standby 2015-07-23 16:32:12.339357 7ff40144d700 0 -- x.x.x.x:6849/27688 x.x.x.x:6823/27279 pipe(0x241e7c80 sd=249 :6849 s=2 pgs=1246 cs=155 l=0 c=0x16ea46e0).fault with nothing to send, going to standby 2015-07-23 16:32:13.279426 7ff3fe828700 0 -- x.x.x.x:6849/27688 185.75.253.10:6810/9746 pipe(0x1c75e840 sd=72 :57221 s=2 pgs=1352 cs=149 l=0 c=0x147cbde0).fault with nothing to send, going to standby 2015-07-23 16:32:17.916440 7ff3fb3f4700 0 -- x.x.x.x:6849/27688 185.75.253.10:6810/9746 pipe(0x241e4000 sd=34 :6849 s=0 pgs=0 cs=0 l=0 c=0x21b2e160).accept connect_seq 150 vs existing 149 state standby 2015-07-23 16:32:22.922462 7ff40154e700 0 -- x.x.x.x:6849/27688 x.x.x.x:6823/27279 pipe(0x241e5e40 sd=216 :6849 s=0 pgs=0 cs=0 l=0 c=0x10089b80).accept connect_seq 156 vs existing 155 state standby ... Can you provide the full strace output? Thanks, Ilya ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
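[Editor's note: one way to capture the full strace being asked for here, assuming a single nginx worker is easy to identify; the pid lookup is only an example.]
$ strace -f -tt -s 64 -o /tmp/nginx-worker.strace -p $(pgrep -f 'nginx: worker' | head -1)
$ less /tmp/nginx-worker.strace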
Re: [ceph-users] any recommendation of using EnhanceIO?
I did some (non-ceph) work on these, and concluded that bcache was the best supported, most stable, and fastest. This was ~1 year ago, so take it with a grain of salt, but that's what I would recommend. Daniel - Original Message - From: Dominik Zalewski dzalew...@optlink.net To: German Anders gand...@despegar.com Cc: ceph-users ceph-users@lists.ceph.com Sent: Wednesday, July 1, 2015 5:28:10 PM Subject: Re: [ceph-users] any recommendation of using EnhanceIO? Hi, I asked the same question a week or so ago (just search the mailing list archives for EnhanceIO :) and got some interesting answers. Looks like the project is pretty much dead since it was bought out by HGST. Even their website has some broken links in regards to EnhanceIO. I’m keen to try flashcache or bcache (it's been in the mainline kernel for some time). Dominik On 1 Jul 2015, at 21:13, German Anders gand...@despegar.com wrote: Hi cephers, Is anyone out there running EnhanceIO in a production environment? Any recommendation? Any perf output to share showing the difference between using it and not? Thanks in advance, German ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
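[Editor's note: a minimal bcache setup sketch for readers following this recommendation, assuming bcache-tools and a bcache-enabled kernel are installed; /dev/sdb (backing HDD) and /dev/nvme0n1p1 (SSD cache partition) are placeholders.]
$ make-bcache -B /dev/sdb -C /dev/nvme0n1p1
$ echo /dev/sdb > /sys/fs/bcache/register      # only needed if udev did not register the device
$ mkfs.xfs /dev/bcache0
The resulting /dev/bcache0 can then be used like any other block device, e.g. as an OSD data disk.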
Re: [ceph-users] Cephfs and ERESTARTSYS on writes
On Thu, Jul 23, 2015 at 4:23 PM, Vedran Furač vedran.fu...@gmail.com wrote: On 07/23/2015 03:20 PM, Gregory Farnum wrote: On Thu, Jul 23, 2015 at 1:17 PM, Vedran Furač vedran.fu...@gmail.com wrote: Hello, I'm having an issue with nginx writing to cephfs. Often I'm getting: writev() /home/ceph/temp/44/94/1/119444 failed (4: Interrupted system call) while reading upstream looking with strace, this happens: ... write(65, e\314\366\36\302..., 65536) = ? ERESTARTSYS (To be restarted) It happens after first 4MBs (exactly) are written, subsequent write gets ERESTARTSYS (sometimes, but more rarely, it fails after first 32 or 64MBs, etc are written). Apparently nginx doesn't expect this and doesn't handle it so it cancels writes and deletes this partial file. Is it possible Ceph cannot find the destination PG fast enough and returns ERESTARTSYS? Is there any way to fix this behavior or reduce it? That's...odd. Are you using the kernel client or ceph-fuse, and on which version? Sorry, forgot to mention, it's kernel client, tried both 3.10 and 4.1, but it's the same. Ceph is firefly. That's probably a wait_*() return value, meaning it timed out, so userspace logs might help understand what's going on. A separate issue is that we leak ERESTARTSYS to userspace - this needs to be fixed. Thanks, Ilya ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Cephfs and ERESTARTSYS on writes
On 07/23/2015 04:19 PM, Ilya Dryomov wrote: On Thu, Jul 23, 2015 at 4:23 PM, Vedran Furač vedran.fu...@gmail.com wrote: On 07/23/2015 03:20 PM, Gregory Farnum wrote: That's...odd. Are you using the kernel client or ceph-fuse, and on which version? Sorry, forgot to mention, it's kernel client, tried both 3.10 and 4.1, but it's the same. Ceph is firefly. That's probably a wait_*() return value, meaning it timed out, so userspace logs might help understand what's going on. A separate issue is that we leak ERESTARTSYS to userspace - this needs to be fixed. Hmm, what's the timeout value? This happens even when ceph is nearly idle. When you mention logs, do you mean Ceph server logs? MON logs don't have anything special, OSD logs are full of: 2015-07-23 16:31:35.535622 7ff3fe020700 0 -- x.x.x.x:6849/27688 x.x.x.x:6841/27679 pipe(0x241e58c0 sd=183 :6849 s=2 pgs=1240 cs=127 l=0 c=0x19855de0).fault with nothing to send, going to standby 2015-07-23 16:31:42.492520 7ff401a53700 0 -- x.x.x.x:6849/27688 x.x.x.x:6841/27679 pipe(0x241e5080 sd=226 :6849 s=0 pgs=0 cs=0 l=0 c=0x21b31860).accept connect_seq 128 vs existing 127 state standby 2015-07-23 16:32:02.989102 7ff401851700 0 -- x.x.x.x:6849/27688 x.x.x.x:6854/27690 pipe(0x1916a680 sd=33 :43507 s=2 pgs=1366 cs=131 l=0 c=0x177e8680).fault with nothing to send, going to standby 2015-07-23 16:32:12.339357 7ff40144d700 0 -- x.x.x.x:6849/27688 x.x.x.x:6823/27279 pipe(0x241e7c80 sd=249 :6849 s=2 pgs=1246 cs=155 l=0 c=0x16ea46e0).fault with nothing to send, going to standby 2015-07-23 16:32:13.279426 7ff3fe828700 0 -- x.x.x.x:6849/27688 185.75.253.10:6810/9746 pipe(0x1c75e840 sd=72 :57221 s=2 pgs=1352 cs=149 l=0 c=0x147cbde0).fault with nothing to send, going to standby 2015-07-23 16:32:17.916440 7ff3fb3f4700 0 -- x.x.x.x:6849/27688 185.75.253.10:6810/9746 pipe(0x241e4000 sd=34 :6849 s=0 pgs=0 cs=0 l=0 c=0x21b2e160).accept connect_seq 150 vs existing 149 state standby 2015-07-23 16:32:22.922462 7ff40154e700 0 -- x.x.x.x:6849/27688 x.x.x.x:6823/27279 pipe(0x241e5e40 sd=216 :6849 s=0 pgs=0 cs=0 l=0 c=0x10089b80).accept connect_seq 156 vs existing 155 state standby ... Regards, Vedran ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Enclosure power failure pausing client IO till all connected hosts up
You may want to check your min_size value for your pools. If it is set to the pool size value, then the cluster will not do I/O if you loose a chassis. On Sun, Jul 5, 2015 at 11:04 PM, Mallikarjun Biradar mallikarjuna.bira...@gmail.com wrote: Hi all, Setup details: Two storage enclosures each connected to 4 OSD nodes (Shared storage). Failure domain is Chassis (enclosure) level. Replication count is 2. Each host has allotted with 4 drives. I have active client IO running on cluster. (Random write profile with 4M block size 64 Queue depth). One of enclosure had power loss. So all OSD's from hosts that are connected to this enclosure went down as expected. But client IO got paused. After some time enclosure hosts connected to it came up. And all OSD's on that hosts came up. Till this time, cluster was not serving IO. Once all hosts OSD's pertaining to that enclosure came up, client IO resumed. Can anybody help me why cluster not serving IO during enclosure failure. OR its a bug? -Thanks regards, Mallikarjun Biradar ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
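[Editor's note: a quick way to check the point being made here; "rbd" is just an example pool name. Dropping min_size to 1 lets IO continue with a single surviving replica, at the cost of durability.]
$ ceph osd dump | grep 'replicated size'
$ ceph osd pool get rbd min_size
$ ceph osd pool set rbd min_size 1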
Re: [ceph-users] Ceph Tech Talk next week
correct. Best Regards, Patrick McGarry Director Ceph Community || Red Hat http://ceph.com || http://community.redhat.com @scuttlemonkey || @ceph On Tue, Jul 21, 2015 at 6:03 PM, Gregory Farnum g...@gregs42.com wrote: On Tue, Jul 21, 2015 at 6:09 PM, Patrick McGarry pmcga...@redhat.com wrote: Hey cephers, Just a reminder that the Ceph Tech Talk on CephFS that was scheduled for last month (and cancelled due to technical difficulties) has been rescheduled for this month's talk. It will be happening next Thurs at 17:00 UTC (1p EST) So that's July 30, according to the website, right? :) -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Cephfs and ERESTARTSYS on writes
On 07/23/2015 04:45 PM, Ilya Dryomov wrote: Can you provide the full strace output? This is pretty much the all the relevant part: 4118 open(/home/ceph/temp/45/45/5/154545, O_RDWR|O_CREAT|O_EXCL, 0600) = 377 4118 writev(377, [{\3\0\0\0\0..., 4096}, {\247\0\0\3\23..., 4096}, {\225\0\0\4\334..., 4096}, {\204\0\0\t\n..., 4096}, {9\0\0\v\322..., 4096}, ...], 33) = 135168 4118 readv(1206, [{\334\1\210C\315..., 4096}, {X\1\266\343\320..., 4096}, {\304\1\345k\226..., 4096}, {}\2\17\27\371..., 4096}, {\203\2:\0e..., 4096}, ...], 33) = 135168 4118 writev(377, [{\334\1\210C\315..., 4096}, {X\1\266\343\320..., 4096}, {\304\1\345k\226..., 4096}, {}\2\17\27\371..., 4096}, {\203\2:\0e..., 4096}, ...], 33) = 135168 4118 readv(1206, [{\206\0\0\1c..., 4096}, {\336\0\0\1\351..., 4096}, {\265\0\0\0\313..., 4096}, {K\0\0\1A..., 4096}, {\217\0\0\1l..., 4096}, ...], 33) = 135168 4118 writev(377, [{\206\0\0\1c..., 4096}, {\336\0\0\1\351..., 4096}, {\265\0\0\0\313..., 4096}, {K\0\0\1A..., 4096}, {\217\0\0\1l..., 4096}, ...], 33) = 135168 4118 readv(1206, [{\2\0\366\371\273..., 4096}, {\256\1\22\3015..., 4096}, {\252\1-\361\225..., 4096}, {{\1I\335\4..., 4096}, {V\1`{\303..., 4096}, ...], 33) = 135168 4118 writev(377, [{\2\0\366\371\273..., 4096}, {\256\1\22\3015..., 4096}, {\252\1-\361\225..., 4096}, {{\1I\335\4..., 4096}, {V\1`{\303..., 4096}, ...], 33) = 135168 4118 readv(1206, [{O\\U\377\210..., 4096}, {\354 Gww..., 4096}, {\356\357|\317\250..., 4096}, {\272J\231\222E..., 4096}, {w\35W\213\277..., 4096}, ...], 33) = 135168 4118 writev(377, [{O\\U\377\210..., 4096}, {\354 Gww..., 4096}, {\356\357|\317\250..., 4096}, {\272J\231\222E..., 4096}, {w\35W\213\277..., 4096}, ...], 33) = 135168 4118 readv(1206, [{O\30\256|\350..., 4096}, {\316f\21|..., 4096}, {\346\330\354YU..., 4096}, {\257{R\5\16..., 4096}, {_C\n\21w..., 4096}, ...], 33) = 135168 4118 writev(377, [{O\30\256|\350..., 4096}, {\316f\21|..., 4096}, {\346\330\354YU..., 4096}, {\257{R\5\16..., 4096}, {_C\n\21w..., 4096}, ...], 33) = 135168 4118 readv(1206, [{\233p\217\356[..., 4096}, {m\264\323F\7..., 4096}, {q\5\362/\21..., 4096}, {\262\353z(\251..., 4096}, {of\365\245U..., 4096}, ...], 33) = 135168 4118 writev(377, [{\233p\217\356[..., 4096}, {m\264\323F\7..., 4096}, {q\5\362/\21..., 4096}, {\262\353z(\251..., 4096}, {of\365\245U..., 4096}, ...], 33) = 135168 4118 readv(1206, [{\257\3335X\300..., 4096}, {\207\37BW\252..., 4096}, {U\331a)..., 4096}, {\323\33i\256`..., 4096}, {\271m\356\]..., 4096}, ...], 33) = 135168 4118 writev(377, [{\257\3335X\300..., 4096}, {\207\37BW\252..., 4096}, {U\331a)..., 4096}, {\323\33i\256`..., 4096}, {\271m\356\]..., 4096}, ...], 33) = 135168 4118 readv(1206, [{b\\\337Y\240..., 4096}, {\233\r\326o\372..., 4096}, {\346(.\32\252..., 4096}, {\252FpJW..., 4096}, {\3648\237\220\352..., 4096}, ...], 33) = 135168 4118 writev(377, [{b\\\337Y\240..., 4096}, {\233\r\326o\372..., 4096}, {\346(.\32\252..., 4096}, {\252FpJW..., 4096}, {\3648\237\220\352..., 4096}, ...], 33) = 135168 4118 readv(1206, [{\376\375\257'\310..., 4096}, {\352\256R\342..., 4096}, {\361\340\342Rq..., 4096}, {|7 \3017..., 4096}, {\224\256\356\353\312..., 4096}, ...], 33) = 135168 4118 writev(377, [{\376\375\257'\310..., 4096}, {\352\256R\342..., 4096}, {\361\340\342Rq..., 4096}, {|7 \3017..., 4096}, {\224\256\356\353\312..., 4096}, ...], 33) = 135168 4118 readv(1206, [{}y\\vJ..., 4096}, {0$\v\6\2..., 4096}, {\2135\357zy..., 4096}, {{\343N\352\215..., 4096}, {\347\321x\352\272..., 4096}, ...], 33) = 135168 4118 writev(377, [{}y\\vJ..., 4096}, {0$\v\6\2..., 4096}, 
{\2135\357zy..., 4096}, {{\343N\352\215..., 4096}, {\347\321x\352\272..., 4096}, ...], 33) = 135168 4118 readv(1206, [{\v\6\2\301\200..., 4096}, {C\276\232\207\210..., 4096}, {\21\0006\262\255..., 4096}, {\224\222\n\276{..., 4096}, {Ys\337w\357..., 4096}, ...], 33) = 135168 4118 writev(377, [{\v\6\2\301\200..., 4096}, {C\276\232\207\210..., 4096}, {\21\0006\262\255..., 4096}, {\224\222\n\276{..., 4096}, {Ys\337w\357..., 4096}, ...], 33) = 135168 4118 readv(1206, [{6Y\236W\345..., 4096}, {Q\207uu\252..., 4096}, {\32\346]\313i..., 4096}, {n\356\\-\336..., 4096}, {{y~]\247..., 4096}, ...], 33) = 135168 4118 writev(377, [{6Y\236W\345..., 4096}, {Q\207uu\252..., 4096}, {\32\346]\313i..., 4096}, {n\356\\-\336..., 4096}, {{y~]\247..., 4096}, ...], 33) = 135168 4118 readv(1206, [{H0\337\275\302..., 4096}, {g\177\225\316\333..., 4096}, {\364\212\374X\360..., 4096}, {\337\260\226XL..., 4096}, {Y\356\360\301r..., 4096}, ...], 33) = 135168 4118 writev(377, [{H0\337\275\302..., 4096}, {g\177\225\316\333..., 4096}, {\364\212\374X\360..., 4096}, {\337\260\226XL..., 4096}, {Y\356\360\301r..., 4096}, ...], 33) = 135168 4118 readv(1206, [{_'\255\374v..., 4096}, {\271\231/II..., 4096}, {\277]\274\200\253..., 4096}, {'\3Qe\244..., 4096}, {\341\361\210h\363..., 4096}, ...], 33) = 135168 4118 writev(377, [{_'\255\374v..., 4096}, {\271\231/II..., 4096}, {\277]\274\200\253..., 4096}, {'\3Qe\244..., 4096},
[ceph-users] Fw: Ceph problem
From: Aaron fjw6...@163.com Sent: Jul 23, 2015 6:39 AM To: dan.m...@inktank.com Subject: Ceph problem hello, I am a user of ceph, I'm from china I have two problem on ceph, I need your help import boto import boto.s3.connection access_key = '2EOCDA99UCZQFA1CQRCM' secret_key = 'avxcywxBPMtiDriwBTOk+cO1zrBikHqoSB0GUtqV' conn = boto.connect_s3( ... aws_access_key_id = access_key, ... aws_secret_access_key = secret_key, ... host = 'localhost', ... calling_format = boto.s3.connection.OrdinaryCallingFormat(),) b=conn.list_all_buckets()[0] list(b.list()) [Key: my-new-bucket,1/123.txt, Key: my-new-bucket,1234.txt, Key: my-new-bucket,2.txt, Key: my-new-bucket,3.txt, Key: my-new-bucket,N01/hello.txt, Key: my-new-bucket,aaa, Key: my-new-bucket,hello] problem 1 : some error show after I run this command b.get_website_configuration() Traceback (most recent call last): File stdin, line 1, in module File /usr/lib/python2.7/site-packages/boto/s3/bucket.py, line 1480, in get_website_configuration return self.get_website_configuration_with_xml(headers)[0] File /usr/lib/python2.7/site-packages/boto/s3/bucket.py, line 1519, in get_website_configuration_with_xml body = self.get_website_configuration_xml(headers=headers) File /usr/lib/python2.7/site-packages/boto/s3/bucket.py, line 1534, in get_website_configuration_xml response.status, response.reason, body) boto.exception.S3ResponseError: S3ResponseError: 403 Forbidden ?xml version=1.0 encoding=UTF-8?ErrorCodeSignatureDoesNotMatch/Code/Error problem 2 : I need an url start with N07 , but possible Bucket name can't start with upper-case , is there some method let me use a name with N07 ,or can I use name n07 and URL start with N07 ? means URL different with name conn.create_bucket(N07) Traceback (most recent call last): File stdin, line 1, in module File /usr/lib/python2.7/site-packages/boto/s3/connection.py, line 599, in create_bucket check_lowercase_bucketname(bucket_name) File /usr/lib/python2.7/site-packages/boto/s3/connection.py, line 59, in check_lowercase_bucketname raise BotoClientError(Bucket names cannot contain upper-case \ boto.exception.BotoClientError: BotoClientError: Bucket names cannot contain upper-case characters when using either the sub-domain or virtual hosting calling format. Thank you very much.___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Cephfs and ERESTARTSYS on writes
On Thu, Jul 23, 2015 at 6:02 PM, Vedran Furač vedran.fu...@gmail.com wrote: 4118 writev(377, [{\5\356\307l\361..., 4096}, {\337\261\17\257..., 4096}, {\211;s\310..., 4096}, {\370N\372:\252..., 4096}, {\202\311/\347\260..., 4096}, ...], 33) = ? ERESTARTSYS (To be restarted) 4118 --- SIGALRM (Alarm clock) @ 0 (0) --- 4118 rt_sigreturn(0xe) = -1 EINTR (Interrupted system call) 4118 gettid() = 4118 4118 write(4, 2015/..., 520) = 520 4118 close(1206) = 0 4118 unlink(/home/ceph/temp/45/45/5/154545) = 0 Sorry, I misread your original email and missed the nginx part entirely. Looks like Zheng, who commented on IRC, was right: the ERESTARTSYS is likely caused by some timeout mechanism in nginx signal handler for SIGALARM does not want to restart the write syscall Thanks, Ilya ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Cephfs and ERESTARTSYS on writes
On 07/23/2015 05:25 PM, Ilya Dryomov wrote: On Thu, Jul 23, 2015 at 6:02 PM, Vedran Furač vedran.fu...@gmail.com wrote: 4118 writev(377, [{\5\356\307l\361..., 4096}, {\337\261\17\257..., 4096}, {\211;s\310..., 4096}, {\370N\372:\252..., 4096}, {\202\311/\347\260..., 4096}, ...], 33) = ? ERESTARTSYS (To be restarted) 4118 --- SIGALRM (Alarm clock) @ 0 (0) --- 4118 rt_sigreturn(0xe) = -1 EINTR (Interrupted system call) 4118 gettid() = 4118 4118 write(4, 2015/..., 520) = 520 4118 close(1206) = 0 4118 unlink(/home/ceph/temp/45/45/5/154545) = 0 Sorry, I misread your original email and missed the nginx part entirely. Looks like Zheng, who commented on IRC, was right: the ERESTARTSYS is likely caused by some timeout mechanism in nginx signal handler for SIGALARM does not want to restart the write syscall Knowing that this might be an nginx issues as well, I've asked the same thing on their mailing list in parallel, their response was: It more looks like a bug in cephfs. writev() should never return ERESTARTSYS. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] PGs going inconsistent after stopping the primary
Oh, if you were running dev releases, it's not super surprising that the stat tracking was at some point buggy. -Sam - Original Message - From: Dan van der Ster d...@vanderster.com To: Samuel Just sj...@redhat.com Cc: ceph-users@lists.ceph.com Sent: Thursday, July 23, 2015 8:21:07 AM Subject: Re: [ceph-users] PGs going inconsistent after stopping the primary Those pools were a few things: rgw.buckets plus a couple pools we use for developing new librados clients. But the source of this issue is likely related to the few pre-hammer development releases (and crashes) we upgraded through whilst running a large scale test. Anyway, now I'll know how to better debug this in future so we'll let you know if it reoccurs. Cheers, Dan On Wed, Jul 22, 2015 at 9:42 PM, Samuel Just sj...@redhat.com wrote: Annoying that we don't know what caused the replica's stat structure to get out of sync. Let us know if you see it recur. What were those pools used for? -Sam - Original Message - From: Dan van der Ster d...@vanderster.com To: Samuel Just sj...@redhat.com Cc: ceph-users@lists.ceph.com Sent: Wednesday, July 22, 2015 12:36:53 PM Subject: Re: [ceph-users] PGs going inconsistent after stopping the primary Cool, writing some objects to the affected PGs has stopped the consistent/inconsistent cycle. I'll keep an eye on them but this seems to have fixed the problem. Thanks!! Dan On Wed, Jul 22, 2015 at 6:07 PM, Samuel Just sj...@redhat.com wrote: Looks like it's just a stat error. The primary appears to have the correct stats, but the replica for some reason doesn't (thinks there's an object for some reason). I bet it clears itself it you perform a write on the pg since the primary will send over its stats. We'd need information from when the stat error originally occurred to debug further. -Sam - Original Message - From: Dan van der Ster d...@vanderster.com To: ceph-users@lists.ceph.com Sent: Wednesday, July 22, 2015 7:49:00 AM Subject: [ceph-users] PGs going inconsistent after stopping the primary Hi Ceph community, Env: hammer 0.94.2, Scientific Linux 6.6, kernel 2.6.32-431.5.1.el6.x86_64 We wanted to post here before the tracker to see if someone else has had this problem. We have a few PGs (different pools) which get marked inconsistent when we stop the primary OSD. The problem is strange because once we restart the primary, then scrub the PG, the PG is marked active+clean. But inevitably next time we stop the primary OSD, the same PG is marked inconsistent again. There is no user activity on this PG, and nothing interesting is logged in any of the 2nd/3rd OSDs (with debug_osd=20, the first line mentioning the PG already says inactive+inconsistent). We suspect this is related to garbage files left in the PG folder. One of our PGs is acting basically like above, except it goes through this cycle: active+clean - (deep-scrub) - active+clean+inconsistent - (repair) - active+clean - (restart primary OSD) - (deep-scrub) - active+clean+inconsistent. This one at least logs: 2015-07-22 16:42:41.821326 osd.303 [INF] 55.10d deep-scrub starts 2015-07-22 16:42:41.823834 osd.303 [ERR] 55.10d deep-scrub stat mismatch, got 0/1 objects, 0/0 clones, 0/1 dirty, 0/0 omap, 0/0 hit_set_archive, 0/0 whiteouts, 0/0 bytes,0/0 hit_set_archive bytes. 2015-07-22 16:42:41.823842 osd.303 [ERR] 55.10d deep-scrub 1 errors and this should be debuggable because there is only one object in the pool: tapetest 55 0 073575G 1 even though rados ls returns no objects: # rados ls -p tapetest # Any ideas? 
Cheers, Dan ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
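[Editor's note: a hedged sketch of the "write something to the PG" workaround described above; the pool and PG id come from the thread, the object name is made up.]
$ rados -p tapetest put stat-refresh-obj /etc/hosts
$ ceph pg deep-scrub 55.10d
$ ceph pg repair 55.10d                 # only if the deep-scrub still reports a stat mismatch
$ rados -p tapetest rm stat-refresh-obj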
Re: [ceph-users] Cephfs and ERESTARTSYS on writes
Hi, That looks like a bug, ERESTARTSYS is not a valid error condition for write(). http://pubs.opengroup.org/onlinepubs/9699919799/ -- Eino Tuominen Vedran Furač vedran.fu...@gmail.com kirjoitti 23.7.2015 kello 15.18: Hello, I'm having an issue with nginx writing to cephfs. Often I'm getting: writev() /home/ceph/temp/44/94/1/119444 failed (4: Interrupted system call) while reading upstream looking with strace, this happens: ... write(65, e\314\366\36\302..., 65536) = ? ERESTARTSYS (To be restarted) It happens after first 4MBs (exactly) are written, subsequent write gets ERESTARTSYS (sometimes, but more rarely, it fails after first 32 or 64MBs, etc are written). Apparently nginx doesn't expect this and doesn't handle it so it cancels writes and deletes this partial file. Is it possible Ceph cannot find the destination PG fast enough and returns ERESTARTSYS? Is there any way to fix this behavior or reduce it? Regards, Vedran ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] PGs going inconsistent after stopping the primary
Those pools were a few things: rgw.buckets plus a couple pools we use for developing new librados clients. But the source of this issue is likely related to the few pre-hammer development releases (and crashes) we upgraded through whilst running a large scale test. Anyway, now I'll know how to better debug this in future so we'll let you know if it reoccurs. Cheers, Dan On Wed, Jul 22, 2015 at 9:42 PM, Samuel Just sj...@redhat.com wrote: Annoying that we don't know what caused the replica's stat structure to get out of sync. Let us know if you see it recur. What were those pools used for? -Sam - Original Message - From: Dan van der Ster d...@vanderster.com To: Samuel Just sj...@redhat.com Cc: ceph-users@lists.ceph.com Sent: Wednesday, July 22, 2015 12:36:53 PM Subject: Re: [ceph-users] PGs going inconsistent after stopping the primary Cool, writing some objects to the affected PGs has stopped the consistent/inconsistent cycle. I'll keep an eye on them but this seems to have fixed the problem. Thanks!! Dan On Wed, Jul 22, 2015 at 6:07 PM, Samuel Just sj...@redhat.com wrote: Looks like it's just a stat error. The primary appears to have the correct stats, but the replica for some reason doesn't (thinks there's an object for some reason). I bet it clears itself it you perform a write on the pg since the primary will send over its stats. We'd need information from when the stat error originally occurred to debug further. -Sam - Original Message - From: Dan van der Ster d...@vanderster.com To: ceph-users@lists.ceph.com Sent: Wednesday, July 22, 2015 7:49:00 AM Subject: [ceph-users] PGs going inconsistent after stopping the primary Hi Ceph community, Env: hammer 0.94.2, Scientific Linux 6.6, kernel 2.6.32-431.5.1.el6.x86_64 We wanted to post here before the tracker to see if someone else has had this problem. We have a few PGs (different pools) which get marked inconsistent when we stop the primary OSD. The problem is strange because once we restart the primary, then scrub the PG, the PG is marked active+clean. But inevitably next time we stop the primary OSD, the same PG is marked inconsistent again. There is no user activity on this PG, and nothing interesting is logged in any of the 2nd/3rd OSDs (with debug_osd=20, the first line mentioning the PG already says inactive+inconsistent). We suspect this is related to garbage files left in the PG folder. One of our PGs is acting basically like above, except it goes through this cycle: active+clean - (deep-scrub) - active+clean+inconsistent - (repair) - active+clean - (restart primary OSD) - (deep-scrub) - active+clean+inconsistent. This one at least logs: 2015-07-22 16:42:41.821326 osd.303 [INF] 55.10d deep-scrub starts 2015-07-22 16:42:41.823834 osd.303 [ERR] 55.10d deep-scrub stat mismatch, got 0/1 objects, 0/0 clones, 0/1 dirty, 0/0 omap, 0/0 hit_set_archive, 0/0 whiteouts, 0/0 bytes,0/0 hit_set_archive bytes. 2015-07-22 16:42:41.823842 osd.303 [ERR] 55.10d deep-scrub 1 errors and this should be debuggable because there is only one object in the pool: tapetest 55 0 073575G 1 even though rados ls returns no objects: # rados ls -p tapetest # Any ideas? Cheers, Dan ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] different omap format in one cluster (.sst + .ldb) - new installed OSD-node don't start any OSD
Did you use an upstream ceph version previously? Or did you shut down the running ceph-osd when upgrading the osd? How many osds hit this problem? This assert failure means that the osd detected an upgraded pg meta object but failed to read the meta keys from the object (or one key is missing). On Thu, Jul 23, 2015 at 7:03 PM, Udo Lembke ulem...@polarzone.de wrote: On 21.07.2015 12:06, Udo Lembke wrote: Hi all, ... Normally I would say that if one OSD node dies, I simply reinstall the OS and ceph and I'm back again... but this looks bad for me. Unfortunately the system also doesn't start 9 of the OSDs after I switched back to the old system disk... (only three of the big OSDs are running well) What is the best solution for that? Empty one node (crush weight 0), freshly reinstall OS/ceph, and reinitialise all OSDs? This will take a long, long time, because we use 173TB in this cluster... Hi, answering myself in case anybody has similar issues and finds this posting. Emptying the whole node takes too long. I used the puppet wheezy system and had to recreate all OSDs (in this case I needed to empty the first blocks of the journal before creating the OSDs again). Udo ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Best Regards, Wheat ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
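[Editor's note: a possible shape of "empty the first blocks of the journal before creating the OSD again"; the journal partition and OSD id are placeholders, and dd against the wrong device is destructive, so double-check before running anything like this.]
$ dd if=/dev/zero of=/dev/sdX1 bs=1M count=100 oflag=direct    # /dev/sdX1 = example journal partition
$ ceph-osd -i 12 --mkjournal                                   # 12 = example OSD id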
Re: [ceph-users] Ceph 0.94 (and lower) performance on 1 hosts ??
On Thu, 23 Jul 2015 11:14:22 +0100 Gregory Farnum wrote: Your note that dd can do 2GB/s without networking makes me think that you should explore that. As you say, network interrupts can be problematic in some systems. The only thing I can think of that's been really bad in the past is that some systems process all network interrupts on cpu 0, and you probably want to make sure that it's splitting them across CPUs. An IRQ overload would be very visible with atop. Splitting the IRQs will help, but it is likely to need some smarts. As in, irqbalance may spread things across NUMA nodes. A card with just one IRQ line will need RPS (Receive Packet Steering), irqbalance can't help it. For example, I have a compute node with such a single line card and Quad Opterons (64 cores, 8 NUMA nodes). The default is all interrupt handling on CPU0 and that is very little, except for eth2. So this gets a special treatment: --- echo 4 > /proc/irq/106/smp_affinity_list --- Pinning the IRQ for eth2 to CPU 4 by default --- echo f0 > /sys/class/net/eth2/queues/rx-0/rps_cpus --- giving RPS CPUs 4-7 to work with. At peak times it needs more than 2 cores, otherwise with this architecture just using 4 and 5 (same L2 cache) would be better. Regards, Christian -- Christian Balzer Network/Systems Engineer ch...@gol.com Global OnLine Japan/Fusion Communications http://www.gol.com/ ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
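[Editor's note: before pinning anything, it may help to see how interrupts and RPS are currently distributed; the interface name and IRQ number are the ones from the example above.]
$ grep eth2 /proc/interrupts
$ cat /proc/irq/106/smp_affinity_list
$ cat /sys/class/net/eth2/queues/rx-0/rps_cpus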
[ceph-users] el6 repo problem?
I'm trying to use the ceph el6 yum repo. Yesterday afternoon, I found yum complain about 8 packages when trying to install or update ceph, such as this: (4/46): ceph-0.94.2-0.el6.x86_64.rpm | 21 MB 00:01 http://ceph.com/rpm-hammer/el6/x86_64/ceph-0.94.2-0.el6.x86_64.rpm: [Errno -1] Package does not match intended download. Suggestion: run yum --enablerepo=Ceph clean metadata The other packages with the same fault are libcephfs1-0.94.2-0.el6.x86_64 librbd1-0.94.2-0.el6.x86_64 python-rados-0.94.2-0.el6.x86_64 python-cephfs-0.94.2-0.el6.x86_64 librados2-0.94.2-0.el6.x86_64 python-rbd-0.94.2-0.el6.x86_64 ceph-common-0.94.2-0.el6.x86_64 This is happening on all three machines I've tried it on. I've tried cleaning the metadata on my hosts as per the suggestion, without any change. There was no trouble pulling these packages with yum on Tuesday, July 14, and I can still use wget to pull individual packages seemingly without any problem. Is anyone else experiencing the same problem? Can the repo maintainers look into this? (Rebuild the metadata, or flush their reverse proxy server(s) if any?) Any suggestions for me to try on the client side? -- -Wayne Betts STAR Computing Support at BNL Physics Dept. PO Box 5000 Upton, NY 11973 wbe...@bnl.gov 631-344-3285 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Enclosure power failure pausing client IO till all connected hosts up
Hi all, Setup details: Two storage enclosures, each connected to 4 OSD nodes (shared storage). The failure domain is chassis (enclosure) level and the replication count is 2. Each host is allotted 4 drives. I have active client IO running on the cluster (random write profile with 4M block size, 64 queue depth). One of the enclosures had a power loss, so all OSDs on the hosts connected to that enclosure went down, as expected. But client IO got paused. After some time the enclosure and the hosts connected to it came up, and all OSDs on those hosts came up. Until then, the cluster was not serving IO; only once all OSDs pertaining to that enclosure came up did client IO resume. Can anybody explain why the cluster was not serving IO during the enclosure failure, or is it a bug? -Thanks & regards, Mallikarjun Biradar ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Issue in communication of swift client and radosgw
Hi, Please reply... Regards, Bindu ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Cephfs and ERESTARTSYS on writes
On 07/23/2015 03:20 PM, Gregory Farnum wrote: On Thu, Jul 23, 2015 at 1:17 PM, Vedran Furač vedran.fu...@gmail.com wrote: Hello, I'm having an issue with nginx writing to cephfs. Often I'm getting: writev() /home/ceph/temp/44/94/1/119444 failed (4: Interrupted system call) while reading upstream looking with strace, this happens: ... write(65, e\314\366\36\302..., 65536) = ? ERESTARTSYS (To be restarted) It happens after first 4MBs (exactly) are written, subsequent write gets ERESTARTSYS (sometimes, but more rarely, it fails after first 32 or 64MBs, etc are written). Apparently nginx doesn't expect this and doesn't handle it so it cancels writes and deletes this partial file. Is it possible Ceph cannot find the destination PG fast enough and returns ERESTARTSYS? Is there any way to fix this behavior or reduce it? That's...odd. Are you using the kernel client or ceph-fuse, and on which version? Sorry, forgot to mention, it's kernel client, tried both 3.10 and 4.1, but it's the same. Ceph is firefly. I'll also try fuse. Regards, Vedran ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] When setting up cache tiering, can i set a quota on the cache pool?
Hi, guys. These days we are testing ceph cache tiering, and it seems that the cache tiering agent does not honor the quota setting on the cache pool. This means that if we set a smaller quota on the cache pool than target_max_bytes * cache_target_dirty_ratio or so, the cache tiering agent won't flush or evict objects even when the cached bytes reach the quota size. Eventually this results in write operations failing with a 'no space' error. Here comes the question: is this behavior by design? Or should we never set a quota on the cache pool? Thank you very much :) runsisi AT hust.edu.cn ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
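[Editor's note: a hedged example of letting the tiering agent, rather than a pool quota, bound the cache pool; "cachepool" and the byte value are placeholders, and set-quota max_bytes 0 removes any existing quota.]
$ ceph osd pool set cachepool target_max_bytes 1099511627776     # 1 TiB, example value
$ ceph osd pool set cachepool cache_target_dirty_ratio 0.4
$ ceph osd pool set cachepool cache_target_full_ratio 0.8
$ ceph osd pool set-quota cachepool max_bytes 0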
[ceph-users] Ceph Day Speakers (Chicago, Raleigh)
Hey cephers, Since Ceph Days for both Chicago and Raleigh are fast approaching, I wanted to put another call out on the mailing lists for anyone who might be interested in sharing their Ceph experiences with the community at either location. If you have something to share (integration, use case, performance, hardware tuning, etc) please let me know ASAP. Thanks! http://ceph.com/cephdays -- Best Regards, Patrick McGarry Director Ceph Community || Red Hat http://ceph.com || http://community.redhat.com @scuttlemonkey || @ceph ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Best method to limit snapshot/clone space overhead
Hi all, I am looking for a way to alleviate the overhead of RBD snapshots/clones for some time. In our scenario there are a few “master” volumes that contain production data, and are frequently snapshotted and cloned for dev/qa use. Those snapshots/clones live for a few days to a few weeks before they get dropped, and they sometimes grow very fast (databases, etc.). With the default 4MB object size there seems to be huge overhead involved with this, could someone give me some hints on how to solve that? I have some hope in 1) FIEMAP I’ve calculated that files on my OSDs are approx. 30% filled with NULLs - I suppose this is what it could save (best-scenario) and it should also make COW operations much faster. But there are lots of bugs in FIEMAP in kernels (i saw some reference to CentOS 6.5 kernel being buggy - which is what we use) and filesystems (like XFS). No idea about ext4 which we’d like to use in the future. Is enabling FIEMAP a good idea at all? I saw some mention of it being replaced with SEEK_DATA and SEEK_HOLE. 2) object size 4MB for clones I did some quick performance testing and setting this lower for production is probably not a good idea. My sweet spot is 8MB object size, however this would make the overhead for clones even worse than it already is. But I could make the cloned images with a different block size from the snapshot (at least according to docs). Does someone use it like that? Any caveats? That way I could have the production data with 8MB block size but make the development snapshots with for example 64KiB granularity, probably at expense of some performance, but most of the data would remain in the (faster) master snapshot anyway. This should drop overhead tremendously, maybe even more than neabling FIEMAP. (Even better when working in tandem I suppose?) Your thoughts? Thanks Jan ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
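[Editor's note: a sketch of the "clone with a different object size" idea, assuming the parent is a format 2 image; pool, image and snapshot names are examples. On this era of rbd the object size is given as --order (object size = 2^order bytes), so order 16 means 64 KiB objects.]
$ rbd snap create rbd/master@qa-snap
$ rbd snap protect rbd/master@qa-snap
$ rbd clone --order 16 rbd/master@qa-snap rbd/qa-clone
$ rbd info rbd/qa-clone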
Re: [ceph-users] Cephfs and ERESTARTSYS on writes
On 07/23/2015 06:47 PM, Ilya Dryomov wrote: To me this looks like a writev() interrupted by a SIGALRM. I think nginx guys read your original email the same way I did, which is write syscall *returned* ERESTARTSYS, but I'm pretty sure that is not the case here. ERESTARTSYS shows up in strace output but it is handled by the kernel, userspace doesn't see it (but strace has to be able to see it, otherwise you wouldn't know if your system call has been restarted or not). You cut the output short - I asked for the entire output for a reason, please paste it somewhere. Might be, however I don't know why nginx would be interrupting it, all writes are done pretty fast and timeouts are set to 10 minutes. Here are 2 examples on 2 servers with slightly different configs (timestamps included): http://pastebin.com/wUAAcdT7 http://pastebin.com/wHyWc9U5 Thanks, Vedran ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Cephfs and ERESTARTSYS on writes
Ah, I made the same mistake... Sorry for the noise. -- Eino Tuominen Ilya Dryomov idryo...@gmail.com kirjoitti 23.7.2015 kello 19.47: On Thu, Jul 23, 2015 at 6:28 PM, Vedran Furač vedran.fu...@gmail.com wrote: On 07/23/2015 05:25 PM, Ilya Dryomov wrote: On Thu, Jul 23, 2015 at 6:02 PM, Vedran Furač vedran.fu...@gmail.com wrote: 4118 writev(377, [{\5\356\307l\361..., 4096}, {\337\261\17\257..., 4096}, {\211;s\310..., 4096}, {\370N\372:\252..., 4096}, {\202\311/\347\260..., 4096}, ...], 33) = ? ERESTARTSYS (To be restarted) 4118 --- SIGALRM (Alarm clock) @ 0 (0) --- 4118 rt_sigreturn(0xe) = -1 EINTR (Interrupted system call) 4118 gettid() = 4118 4118 write(4, 2015/..., 520) = 520 4118 close(1206) = 0 4118 unlink(/home/ceph/temp/45/45/5/154545) = 0 Sorry, I misread your original email and missed the nginx part entirely. Looks like Zheng, who commented on IRC, was right: the ERESTARTSYS is likely caused by some timeout mechanism in nginx signal handler for SIGALARM does not want to restart the write syscall Knowing that this might be an nginx issues as well, I've asked the same thing on their mailing list in parallel, their response was: It more looks like a bug in cephfs. writev() should never return ERESTARTSYS. To me this looks like a writev() interrupted by a SIGALRM. I think nginx guys read your original email the same way I did, which is write syscall *returned* ERESTARTSYS, but I'm pretty sure that is not the case here. ERESTARTSYS shows up in strace output but it is handled by the kernel, userpace doesn't see it (but strace has to be able to see it, otherwise you wouldn't know if your system call has been restarted or not). You cut the output short - I asked for the entire output for a reason, please paste it somewhere. Thanks, Ilya ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] OSD Connections with Public and Cluster Networks
Greetings, I am working on standing up a fresh Ceph object storage cluster and have some questions about what I should be seeing as far as inter-OSD connectivity. I have spun up my monitor and radosgw nodes as VMs, all running on a 192.168.10.0/24 network (all IP ranges have been changed to protect the innocent). I also have four physical servers to serve as my storage nodes. These physical servers have two bonded interfaces, one IP'd on the public network (192.168.10.0/24) and one IP'd on a cluster/private network (172.22.20.0/24). I have added the following lines to my ceph.conf: public network = 192.168.10.0/24 cluster network = 172.22.20.0/24 So far, so good. Everything in the cluster starts up, all OSDs are up and in, and 'ceph -s' shows a happy, healthy cluster. I can create users and otherwise do all the normal things that one would expect. So why am I writing? When looking at my network connections on my storage servers, I was expecting to see a few OSD - Mon/RGW connections on the public network, then a much larger number of OSD - OSD connections on the cluster network. What I actually see is an equal number of connections between OSDs on both the public and cluster networks (in addition to OSD - Mon/RGW connections). My question - is this normal? If so, can someone explain what traffic is moving between OSDs on the public network? Based on some additional testing (read: bringing down the cluster interface), this causes all OSDs on that node to be marked down, so there's evidence to support all heartbeat traffic moving over the interface. I just want to ensure that what I'm seeing is normal and that I haven't otherwise botched the configuration. Many thanks, Brian Felton ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
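[Editor's note: a hedged way to double-check which addresses each OSD has bound and where its TCP sessions go; osd.3 is an example id, and the ceph daemon command must be run on the host where that OSD lives.]
$ ceph osd find 3
$ ceph daemon osd.3 config show | grep -E 'public_addr|cluster_addr'
$ ss -tnp | grep ceph-osd | awk '{print $5}' | cut -d: -f1 | sort | uniq -c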
Re: [ceph-users] ceph-mon cpu usage
Hi Greg, I've been looking at the tcmalloc issues, but did seem to affect osd's, and I do notice it in heavy read workloads (even after the patch and increasing TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728). This is affecting the mon process though. looking at perf top I'm getting most of the CPU usage in mutex lock/unlock 5.02% libpthread-2.19.so[.] pthread_mutex_unlock 3.82% libsoftokn3.so[.] 0x0001e7cb 3.46% libpthread-2.19.so[.] pthread_mutex_lock I could try to use jemalloc, are you aware of any built binaries? Can I mix a cluster with different malloc binaries? On Thu, Jul 23, 2015 at 10:50 AM, Gregory Farnum g...@gregs42.com wrote: On Thu, Jul 23, 2015 at 8:39 AM, Luis Periquito periqu...@gmail.com wrote: The ceph-mon is already taking a lot of memory, and I ran a heap stats MALLOC: 32391696 ( 30.9 MiB) Bytes in use by application MALLOC: + 27597135872 (26318.7 MiB) Bytes in page heap freelist MALLOC: + 16598552 ( 15.8 MiB) Bytes in central cache freelist MALLOC: + 14693536 ( 14.0 MiB) Bytes in transfer cache freelist MALLOC: + 17441592 ( 16.6 MiB) Bytes in thread cache freelists MALLOC: +116387992 ( 111.0 MiB) Bytes in malloc metadata MALLOC: MALLOC: = 27794649240 (26507.0 MiB) Actual memory used (physical + swap) MALLOC: + 26116096 ( 24.9 MiB) Bytes released to OS (aka unmapped) MALLOC: MALLOC: = 27820765336 (26531.9 MiB) Virtual address space used MALLOC: MALLOC: 5683 Spans in use MALLOC: 21 Thread heaps in use MALLOC: 8192 Tcmalloc page size after that I ran the heap release and it went back to normal. MALLOC: 22919616 ( 21.9 MiB) Bytes in use by application MALLOC: + 4792320 (4.6 MiB) Bytes in page heap freelist MALLOC: + 18743448 ( 17.9 MiB) Bytes in central cache freelist MALLOC: + 20645776 ( 19.7 MiB) Bytes in transfer cache freelist MALLOC: + 18456088 ( 17.6 MiB) Bytes in thread cache freelists MALLOC: +116387992 ( 111.0 MiB) Bytes in malloc metadata MALLOC: MALLOC: =201945240 ( 192.6 MiB) Actual memory used (physical + swap) MALLOC: + 27618820096 (26339.4 MiB) Bytes released to OS (aka unmapped) MALLOC: MALLOC: = 27820765336 (26531.9 MiB) Virtual address space used MALLOC: MALLOC: 5639 Spans in use MALLOC: 29 Thread heaps in use MALLOC: 8192 Tcmalloc page size So it just seems the monitor is not returning unused memory into the OS or reusing already allocated memory it deems as free... Yep. This is a bug (best we can tell) in some versions of tcmalloc combined with certain distribution stacks, although I don't think we've seen it reported on Trusty (nor on a tcmalloc distribution that new) before. Alternatively some folks are seeing tcmalloc use up lots of CPU in other scenarios involving memory return and it may manifest like this, but I'm not sure. You could look through the mailing list for information on it. -Greg ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
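[Editor's note: a hedged sketch of the two workarounds discussed in this thread: releasing the monitor's freelist via the admin socket, and raising the tcmalloc thread cache for daemons at start-up. "mon.a" is an example monitor id, and /etc/default/ceph is an Ubuntu-style path, not necessarily present on every distro.]
$ ceph tell mon.a heap stats
$ ceph tell mon.a heap release
$ echo 'TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728' >> /etc/default/ceph   # then restart the daemons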
Re: [ceph-users] Cephfs and ERESTARTSYS on writes
On Thu, Jul 23, 2015 at 6:28 PM, Vedran Furač vedran.fu...@gmail.com wrote: On 07/23/2015 05:25 PM, Ilya Dryomov wrote: On Thu, Jul 23, 2015 at 6:02 PM, Vedran Furač vedran.fu...@gmail.com wrote: 4118 writev(377, [{\5\356\307l\361..., 4096}, {\337\261\17\257..., 4096}, {\211;s\310..., 4096}, {\370N\372:\252..., 4096}, {\202\311/\347\260..., 4096}, ...], 33) = ? ERESTARTSYS (To be restarted) 4118 --- SIGALRM (Alarm clock) @ 0 (0) --- 4118 rt_sigreturn(0xe) = -1 EINTR (Interrupted system call) 4118 gettid() = 4118 4118 write(4, 2015/..., 520) = 520 4118 close(1206) = 0 4118 unlink(/home/ceph/temp/45/45/5/154545) = 0 Sorry, I misread your original email and missed the nginx part entirely. Looks like Zheng, who commented on IRC, was right: the ERESTARTSYS is likely caused by some timeout mechanism in nginx signal handler for SIGALARM does not want to restart the write syscall Knowing that this might be an nginx issues as well, I've asked the same thing on their mailing list in parallel, their response was: It more looks like a bug in cephfs. writev() should never return ERESTARTSYS. To me this looks like a writev() interrupted by a SIGALRM. I think nginx guys read your original email the same way I did, which is write syscall *returned* ERESTARTSYS, but I'm pretty sure that is not the case here. ERESTARTSYS shows up in strace output but it is handled by the kernel, userpace doesn't see it (but strace has to be able to see it, otherwise you wouldn't know if your system call has been restarted or not). You cut the output short - I asked for the entire output for a reason, please paste it somewhere. Thanks, Ilya ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ceph-mon cpu usage
The ceph-mon is already taking a lot of memory, and I ran a heap stats:

MALLOC:        32391696 (   30.9 MiB) Bytes in use by application
MALLOC: +   27597135872 (26318.7 MiB) Bytes in page heap freelist
MALLOC: +      16598552 (   15.8 MiB) Bytes in central cache freelist
MALLOC: +      14693536 (   14.0 MiB) Bytes in transfer cache freelist
MALLOC: +      17441592 (   16.6 MiB) Bytes in thread cache freelists
MALLOC: +     116387992 (  111.0 MiB) Bytes in malloc metadata
MALLOC:
MALLOC: =   27794649240 (26507.0 MiB) Actual memory used (physical + swap)
MALLOC: +      26116096 (   24.9 MiB) Bytes released to OS (aka unmapped)
MALLOC:
MALLOC: =   27820765336 (26531.9 MiB) Virtual address space used
MALLOC:
MALLOC:            5683 Spans in use
MALLOC:              21 Thread heaps in use
MALLOC:            8192 Tcmalloc page size

after that I ran the heap release and it went back to normal:

MALLOC:        22919616 (   21.9 MiB) Bytes in use by application
MALLOC: +       4792320 (    4.6 MiB) Bytes in page heap freelist
MALLOC: +      18743448 (   17.9 MiB) Bytes in central cache freelist
MALLOC: +      20645776 (   19.7 MiB) Bytes in transfer cache freelist
MALLOC: +      18456088 (   17.6 MiB) Bytes in thread cache freelists
MALLOC: +     116387992 (  111.0 MiB) Bytes in malloc metadata
MALLOC:
MALLOC: =     201945240 (  192.6 MiB) Actual memory used (physical + swap)
MALLOC: +   27618820096 (26339.4 MiB) Bytes released to OS (aka unmapped)
MALLOC:
MALLOC: =   27820765336 (26531.9 MiB) Virtual address space used
MALLOC:
MALLOC:            5639 Spans in use
MALLOC:              29 Thread heaps in use
MALLOC:            8192 Tcmalloc page size

So it just seems the monitor is not returning unused memory to the OS, or reusing already allocated memory it deems as free...

On Wed, Jul 22, 2015 at 4:29 PM, Luis Periquito periqu...@gmail.com wrote:

This cluster is serving RBD storage for OpenStack, and today all I/O just stopped. After looking in the boxes, ceph-mon was using 17G of RAM - and this was on *all* the mons. Restarting the main one made it work again (I restarted the other ones because they were also using a lot of RAM). This has happened twice now (the first time was last Monday). As this is considered a prod cluster there is no logging enabled, and I can't reproduce it - our test/dev clusters have been working fine and show no such symptoms, but they were upgraded from Firefly. What can we do to help debug the issue? Any ideas on how to identify the underlying issue? thanks,

On Mon, Jul 20, 2015 at 1:59 PM, Luis Periquito periqu...@gmail.com wrote:

Hi all, I have a cluster with 28 nodes (all physical, 4 cores, 32GB RAM); each node has 4 OSDs for a total of 112 OSDs. Each OSD has 106 PGs (counted including replication). There are 3 MONs on this cluster. I'm running on Ubuntu trusty with kernel 3.13.0-52-generic, with Hammer (0.94.2). This cluster was installed with Hammer (0.94.1) and has only been upgraded to the latest available version. Of the three mons one is mostly idle, one is using ~170% CPU, and one is using ~270% CPU. They change as I restart the process (usually the idle one is the one with the lowest uptime). Running a perf top against the ceph-mon PID on the non-idle boxes yields something like this:

  4.62%  libpthread-2.19.so    [.] pthread_mutex_unlock
  3.95%  libpthread-2.19.so    [.] pthread_mutex_lock
  3.91%  libsoftokn3.so        [.] 0x0001db26
  2.38%  [kernel]              [k] _raw_spin_lock
  2.09%  libtcmalloc.so.4.1.2  [.] operator new(unsigned long)
  1.79%  ceph-mon              [.] DispatchQueue::enqueue(Message*, int, unsigned long)
  1.62%  ceph-mon              [.] RefCountedObject::get()
  1.58%  libpthread-2.19.so    [.] pthread_mutex_trylock
  1.32%  libtcmalloc.so.4.1.2  [.] operator delete(void*)
  1.24%  libc-2.19.so          [.] 0x00097fd0
  1.20%  ceph-mon              [.] ceph::buffer::ptr::release()
  1.18%  ceph-mon              [.] RefCountedObject::put()
  1.15%  libfreebl3.so         [.] 0x000542a8
  1.05%  [kernel]              [k] update_cfs_shares
  1.00%  [kernel]              [k] tcp_sendmsg

The cluster is mostly idle, and it's healthy. The store is 69MB big, and the MONs are consuming around 700MB of RAM. Any ideas on this situation? Is it safe to ignore?
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
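For reference, a hedged sketch of how the heap stats/release above are typically triggered (the exact command form varies between releases, so treat these as assumptions to verify against your version; "mon.a" is just a placeholder id):

# dump tcmalloc heap statistics from a monitor
ceph tell mon.a heap stats
# ask the monitor to return freed pages to the OS
ceph tell mon.a heap release
# the same should also work through the admin socket on the mon host
ceph daemon mon.a heap stats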
Re: [ceph-users] Clients' connection for concurrent access to ceph
On Wed, Jul 22, 2015 at 8:39 PM, Shneur Zalman Mattern shz...@eimsys.co.il wrote:

Workaround... We're now building a huge computing cluster: 140 diskless compute nodes that write a lot of computing data to storage concurrently. The user who submits a job to the cluster also needs access to the same storage location (to watch progress and results). We've built a Ceph cluster:
3 mon nodes (one of them combined with the mds)
3 osd nodes (each one has 10 OSDs + SSD for journaling)
switch with 24 x 10G ports
10 gigabit - for the public network
20 gigabit bonding - between the OSDs
Ubuntu 12.04.5, Ceph 0.87.2 - Giant
Clients have: 10 gigabit for the Ceph connection, CentOS 6.6 with upgraded kernel 3.19.8 (already running the computing cluster). Naturally, all nodes, switches and clients were configured for jumbo frames.
First test: I thought to make one big shared RBD, but RBD supports multiple clients mapping/mounting the same image, not parallel writes...
Second test: NFS over RBD - it works pretty well, but: 1. The NFS gateway is a single point of failure. 2. There is no performance scaling from the scale-out storage; throughput is bottlenecked by the bandwidth of the NFS gateway.
Third test: We wanted to try CephFS, because our client is familiar with Lustre, which is very close to CephFS in capabilities: 1. I used my Ceph nodes in the client's role: I mounted CephFS on one of the nodes and ran dd with bs=1M... I got wonderful write performance, ~1.1 GBytes/s (really close to 10Gbit network throughput). 2. I connected a CentOS client to the 10gig public network and mounted CephFS, but... it was just ~250 MBytes/s. 3. I connected an Ubuntu client (not a Ceph member) to the 10gig public network and mounted CephFS, and... it was also ~260 MBytes/s. Now I have to know: perhaps Ceph member nodes have privileged access???

There's nothing in the Ceph system that would do this directly. My first guess is that you're seeing the impact of write latencies (as opposed to bandwidth) on your system. What is the network latency from each node you've used as a client to the Ceph system? Exactly what dd command are you using? How are you mounting CephFS? Are you sure your network is functioning as expected? Run iperf (preferably on all your nodes simultaneously) and verify the results. Separately, be aware that CephFS is generally not a supported technology right now. -Greg
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
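A hedged sketch of the checks Greg is asking about (hostnames, mount points and sizes below are placeholders, not taken from the original posts):

# raw TCP throughput between a client and a storage node (start the server side first)
iperf -s                            # on an osd/mon node
iperf -c osd-node-1 -t 30 -P 4      # on the client
# round-trip latency from the client to a monitor
ping -c 20 mon-node-1
# a dd that bypasses the page cache, so the result reflects the storage path
dd if=/dev/zero of=/mnt/cephfs/ddtest bs=1M count=4096 oflag=direct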
Re: [ceph-users] ceph-mon cpu usage
On Thu, Jul 23, 2015 at 8:39 AM, Luis Periquito periqu...@gmail.com wrote:

The ceph-mon is already taking a lot of memory, and I ran a heap stats:

MALLOC:        32391696 (   30.9 MiB) Bytes in use by application
MALLOC: +   27597135872 (26318.7 MiB) Bytes in page heap freelist
MALLOC: +      16598552 (   15.8 MiB) Bytes in central cache freelist
MALLOC: +      14693536 (   14.0 MiB) Bytes in transfer cache freelist
MALLOC: +      17441592 (   16.6 MiB) Bytes in thread cache freelists
MALLOC: +     116387992 (  111.0 MiB) Bytes in malloc metadata
MALLOC:
MALLOC: =   27794649240 (26507.0 MiB) Actual memory used (physical + swap)
MALLOC: +      26116096 (   24.9 MiB) Bytes released to OS (aka unmapped)
MALLOC:
MALLOC: =   27820765336 (26531.9 MiB) Virtual address space used
MALLOC:
MALLOC:            5683 Spans in use
MALLOC:              21 Thread heaps in use
MALLOC:            8192 Tcmalloc page size

after that I ran the heap release and it went back to normal:

MALLOC:        22919616 (   21.9 MiB) Bytes in use by application
MALLOC: +       4792320 (    4.6 MiB) Bytes in page heap freelist
MALLOC: +      18743448 (   17.9 MiB) Bytes in central cache freelist
MALLOC: +      20645776 (   19.7 MiB) Bytes in transfer cache freelist
MALLOC: +      18456088 (   17.6 MiB) Bytes in thread cache freelists
MALLOC: +     116387992 (  111.0 MiB) Bytes in malloc metadata
MALLOC:
MALLOC: =     201945240 (  192.6 MiB) Actual memory used (physical + swap)
MALLOC: +   27618820096 (26339.4 MiB) Bytes released to OS (aka unmapped)
MALLOC:
MALLOC: =   27820765336 (26531.9 MiB) Virtual address space used
MALLOC:
MALLOC:            5639 Spans in use
MALLOC:              29 Thread heaps in use
MALLOC:            8192 Tcmalloc page size

So it just seems the monitor is not returning unused memory to the OS, or reusing already allocated memory it deems as free...

Yep. This is a bug (best we can tell) in some versions of tcmalloc combined with certain distribution stacks, although I don't think we've seen it reported on Trusty (nor on a tcmalloc distribution that new) before. Alternatively, some folks are seeing tcmalloc use up lots of CPU in other scenarios involving memory return, and it may manifest like this, but I'm not sure. You could look through the mailing list for information on it. -Greg
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
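Until the tcmalloc behaviour is understood, one possible stopgap (my suggestion only, not something proposed in the thread, and it assumes your mon ids match the short hostname) is to release the heap periodically from cron:

# /etc/cron.d/ceph-mon-heap-release  (hypothetical file name)
# every 30 minutes, hand tcmalloc's free pages back to the OS
*/30 * * * * root ceph tell mon.$(hostname -s) heap release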
Re: [ceph-users] Clients' connection for concurrent access to ceph
On 22/07/15 20:39, Shneur Zalman Mattern wrote: Third test: We wanted to try CephFS, because our client is familiar with Lustre, which is very close to CephFS in capabilities: 1. I used my Ceph nodes in the client's role: I mounted CephFS on one of the nodes and ran dd with bs=1M... I got wonderful write performance, ~1.1 GBytes/s (really close to 10Gbit network throughput). 2. I connected a CentOS client to the 10gig public network and mounted CephFS, but... it was just ~250 MBytes/s. 3. I connected an Ubuntu client (not a Ceph member) to the 10gig public network and mounted CephFS, and... it was also ~260 MBytes/s. Now I have to know: perhaps Ceph member nodes have privileged access???

While you're benchmarking, it's a good idea to try both the kernel client and the FUSE client. You may find one works better than the other, and we'll find the numbers interesting too. You're using Giant; the latest LTS release is Hammer -- if you're interested in CephFS you'll be better off with Hammer (lots of new stuff going in all the time). Aside from that, it's kind of surprising that your servers are performing better as clients than your actual clients. Do the clients definitely have the same kernel and Ceph versions as the servers? Is the link between the clients and the servers definitely 10G all the way across your network? Is a pure network benchmark seeing the full 10 Gbps between a client node and a server node? Cheers, John
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
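For comparing the two clients John mentions, a rough sketch (the monitor address, mount points and secret file path are placeholders):

# kernel client
mount -t ceph 10.0.0.1:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret
# FUSE client
ceph-fuse -m 10.0.0.1:6789 /mnt/cephfs-fuse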
Re: [ceph-users] osd_agent_max_ops relating to number of OSDs in the cache pool
-----Original Message-----
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Gregory Farnum
Sent: 22 July 2015 15:05
To: Nick Fisk n...@fisk.me.uk
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] osd_agent_max_ops relating to number of OSDs in the cache pool

On Sat, Jul 18, 2015 at 10:25 PM, Nick Fisk n...@fisk.me.uk wrote:

Hi All, I'm doing some testing on the new high/low speed cache tiering flushing and I'm trying to get my head round the effect that changing these two settings has on the flushing speed. When setting osd_agent_max_ops to 1, I can get up to a 20% improvement before the osd_agent_max_high_ops value kicks in for high speed flushing, which is great for bursty workloads. As I understand it, these settings loosely affect the number of concurrent operations the cache pool OSDs will flush down to the base pool. I may have got completely the wrong idea in my head, but I can't understand how a static default setting will work with different cache/base ratios. For example, if I had a relatively small number of very fast cache tier OSDs (PCI-E SSD perhaps) and a much larger number of base tier OSDs, would the value need to be increased to ensure sufficient utilisation of the base tier and make sure that the cache tier doesn't fill up too fast? Alternatively, where the cache tier is based on spinning disks or the base tier is not comparatively large, this value may need to be reduced to stop it saturating the disks. Any thoughts?

I'm not terribly familiar with these exact values, but I think you've got it right. We can't make decisions at the level of the entire cache pool (because sharing that information isn't feasible), so we let you specify it on a per-OSD basis according to what setup you have. I've no idea if anybody has gathered up a matrix of baseline good settings or not.

Thanks for your response. I will run a couple of tests to see if I can work out a rough rule of thumb for the settings. I'm guessing you don't want to do more than 1 or 2 concurrent ops per spinning disk to avoid overloading them. Maybe something like:

(# base tier disks / # copies) / # cache tier disks = optimum number of concurrent flush operations
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
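If it helps anyone experimenting with this, a hedged sketch of where these options live (the values shown are arbitrary examples rather than recommendations, and the exact option behaviour should be checked against your release):

# ceph.conf on the cache tier OSD hosts
[osd]
osd_agent_max_ops = 2
osd_agent_max_high_ops = 8

# or injected at runtime for a quick test
ceph tell osd.* injectargs '--osd_agent_max_ops 2 --osd_agent_max_high_ops 8'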
Re: [ceph-users] Ceph 0.94 (and lower) performance on 1 hosts ??
I'm not sure. It looks like Ceph and your disk controllers are doing basically the right thing, since you're going from 1GB/s to 420MB/s when moving from dd to Ceph (the full data journaling cuts it in half), but just FYI that dd task is not doing nearly the same thing as Ceph does; you'd need to use direct I/O or similar, and the conv=fsync flag means it will fsync the written data at the end of the run but not at any intermediate point. The change from 1 node to 2 cutting your performance so much is a bit odd. I do note that:
1 node:  420 MB/s each
2 nodes: 320 MB/s each
5 nodes: 275 MB/s each
so you appear to be reaching some kind of bound. Your note that dd can do 2GB/s without networking makes me think that you should explore that. As you say, network interrupts can be problematic in some systems. The only thing I can think of that's been really bad in the past is that some systems process all network interrupts on CPU 0, and you probably want to make sure that it's splitting them across CPUs. -Greg
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
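To make the dd comparison closer to what the OSDs actually do, something along these lines could be run (the target path is a placeholder; this is a sketch, not the exact commands used in the thread):

# one fsync at the very end; the page cache absorbs most of the writes
dd if=/dev/zero of=/var/lib/ceph/ddtest bs=1M count=4096 conv=fsync
# bypass the page cache so every write hits the device
dd if=/dev/zero of=/var/lib/ceph/ddtest bs=1M count=4096 oflag=direct
# or force each write to be synced as it happens
dd if=/dev/zero of=/var/lib/ceph/ddtest bs=1M count=4096 oflag=sync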
Re: [ceph-users] Ceph 0.94 (and lower) performance on 1 hosts ??
Hi, Well, I think the journaling would still appear in the dstat output, as those are still I/Os: even if the user-side bandwidth is indeed cut in half, that should not be the case for the disk I/O. For instance, I just tried a replicated pool for the test and got around 1300MiB/s in dstat for about 600MiB/s in the rados bench - I take it that with replication/size=2 there are 2 replicas in total, so that's 1 user I/O turning into 2 * [1 replica write + 1 journal write] spread over the hosts = 600*2*2/2 = 1200MiB/s of I/O per host (+/- the approximations)... Using the dd flag oflag=sync indeed lowers the dstat values, down to 1100-1300MiB/s. Still above what Ceph reaches with EC pools. I have tried to identify/watch interrupt issues (using the watch command), but I have to say I have failed until now. The Broadcom card is indeed spreading the load over the CPUs:

# egrep 'CPU|p2p' /proc/interrupts
      CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 CPU8 CPU9 CPU10 CPU11 CPU12 CPU13 CPU14 CPU15
 80:  881646372 1508 30 97328 0 10459270 2715 8753 0 12765 5100 9148 9420 0   PCI-MSI-edge  p2p1
 82:  179710 165107 94684 334842 210219 47403 270330 166877 3516 229043 709844660 16512 5088 2456312 12302   PCI-MSI-edge  p2p1-fp-0
 83:  12454 14073 5571 15196 5282 22301 11522 21299 4092581302069 1303 79810 705953243 1836 15190 883683   PCI-MSI-edge  p2p1-fp-1
 84:  6463 13994 57006 16200 16778 374815 558398 11902 695554360 94228 1252 18649 825684 7555 731875 190402   PCI-MSI-edge  p2p1-fp-2
 85:  163228 259899 143625 121326 107509 798435 168027 144088 75321 89962 55297 715175665 784356 53961 92153 92959   PCI-MSI-edge  p2p1-fp-3
 86:  233267453226792070827220797122540051748938 39492831684674 65008514098872704778 140711 160954 5910372981286 672487805   PCI-MSI-edge  p2p1-fp-4
 87:  33772 233318 136341 58163 506773 183451 18269706 52425 226509 22150 17026 176203 5942 681346619 270341 87435   PCI-MSI-edge  p2p1-fp-5
 88:  65103573 105514146 51193688 51330824 41771147 61202946 41053735 49301547 181380 73028922 39525 172439 155778 108065 154750931 26348797   PCI-MSI-edge  p2p1-fp-6
 89:  59287698 120778879 43446789 47063897 39634087 39463210 46582805 48786230 342778 82670325 135397 438041 318995 3642955 179107495 833932   PCI-MSI-edge  p2p1-fp-7
 90:  1804 4453 2434 19885 11527 9771 12724 2392840 12721439 1166 3354 560 69386 9233   PCI-MSI-edge  p2p2
 92:  6455149433007258203245273513 115645711838476 22200494039978 977482 15351931 9494511685983 772531 271810175312351954224   PCI-MSI-edge  p2p2-fp-0

I don't know yet how to check if there are memory bandwidth/latency/whatever issues... Regards
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
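On the "how do I check memory bandwidth / interrupt placement" question, a hedged sketch of things one could run (tool availability and exact option names are assumptions on my part, not from the original post):

# watch interrupt count deltas to confirm which CPUs the NIC queues fire on
watch -d -n1 "egrep 'CPU|p2p' /proc/interrupts"
# check which CPUs a given IRQ is allowed to run on (82 is just an example IRQ number)
cat /proc/irq/82/smp_affinity_list
# rough memory bandwidth check with sysbench, if it is installed
sysbench --test=memory --memory-block-size=1M --memory-total-size=10G run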
Re: [ceph-users] different omap format in one cluster (.sst + .ldb) - new installed OSD-node don't start any OSD
On 21.07.2015 12:06, Udo Lembke wrote:

Hi all, ... Normally I would say: if one OSD node dies, I simply reinstall the OS and Ceph and I'm back again... but this looks bad to me. Unfortunately the system also doesn't start 9 of the OSDs after I switched back to the old system disk... (only three of the big OSDs are running well). What is the best solution for that? Empty one node (crush weight 0), do a fresh reinstall of OS/Ceph, and reinitialise all OSDs? That would take a long, long time, because we have 173TB in use in this cluster...

Hi, answering myself in case anybody has similar issues and finds this posting. Emptying whole nodes takes too long. I used the puppet-managed wheezy system and had to recreate all OSDs (in this case I needed to zero the first blocks of the journal before creating each OSD again). Udo
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
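For anyone in the same spot, a rough sketch of the "zero the first blocks of the journal" step Udo describes (device names are placeholders; make very sure you are pointing at the right partition before running anything like this):

# wipe the start of the old journal partition so it is treated as fresh
dd if=/dev/zero of=/dev/sdb1 bs=1M count=100 oflag=direct
# then recreate the OSD with its journal on the wiped partition
ceph-disk -v prepare --fs-type xfs --cluster ceph -- /dev/sdc /dev/sdb1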