[ceph-users] error mapping device in firefly

2014-07-04 Thread Xabier Elkano
Hi,

I am trying to map a rbd device in  Ubuntu 14.04 (kernel 3.13.0-30-generic):

# rbd -p mypool create test1 --size 500

# rbd -p mypool ls
test1

# rbd -p mypool map test1
rbd: add failed: (5) Input/output error

and in the syslog:
Jul  4 09:31:48 testceph kernel: [70503.356842] libceph: mon2
172.16.64.18:6789 feature set mismatch, my 4a042a42 < server's
2004a042a42, missing 20000000000
Jul  4 09:31:48 testceph kernel: [70503.356938] libceph: mon2
172.16.64.18:6789 socket error on read


my environment:

cluster version on all MONs and OSDs is 0.80.1
In the client machine:

ii  ceph-common 0.80.1-1trusty   
amd64common utilities to mount and interact with a ceph storage
cluster
ii  python-ceph 0.80.1-1trusty   
amd64Python libraries for the Ceph distributed filesystem
ii  librados2   0.80.1-1trusty   
amd64RADOS distributed object store client library

I think I started getting this error when I switched from tunables
legacy to optimal after upgrading from 0.72 to 0.80.
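
For reference, the switch was done with the crush tunables command, and it can
be reverted the same way if older kernel clients still need to map images (a
sketch, using the standard 0.80 CLI):

   # revert to legacy tunables so older kernel clients can connect again
   ceph osd crush tunables legacy
   # re-enable the newer profile later, once all clients support it
   ceph osd crush tunables optimal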

Thanks in advance!

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] emperor -> firefly : Significant increase in RAM usage

2014-07-04 Thread Sylvain Munaut
Hi,


Yesterday I finally updated our cluster to emperor (latest stable
commit) and what's fairly apparent is a much higher RAM usage on the
OSDs:

http://i.imgur.com/qw9iKSV.png

Has anyone noticed the same? I mean, a 25% sudden increase in idle
RAM usage is hard to ignore ...

Those OSD are pretty much entirely dedicated to RGW data pools FWIW.
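
For a before/after comparison, one way I can look at where the memory sits is
tcmalloc's heap stats on an OSD (a sketch; it assumes the OSDs are built
against tcmalloc and the admin interface is reachable):

   ceph tell osd.0 heap stats
   # and to check whether it is just freed-but-unreturned memory:
   ceph tell osd.0 heap release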


Cheers,

Sylvain
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] write performance per disk

2014-07-04 Thread Wido den Hollander

On 07/03/2014 04:32 PM, VELARTIS Philipp Dürhammer wrote:

HI,

Ceph.conf:
osd journal size = 15360
rbd cache = true
 rbd cache size = 2147483648
 rbd cache max dirty = 1073741824
 rbd cache max dirty age = 100
 osd recovery max active = 1
  osd max backfills = 1
  osd mkfs options xfs = -f -i size=2048
  osd mount options xfs = 
rw,noatime,nobarrier,logbsize=256k,logbufs=8,inode64,allocsize=4M
  osd op threads = 8

so it should be 8 threads?



How many threads are you using with rados bench? Don't touch the op 
threads from the start, usually the default is just fine.
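
For reference, the thread count is rados bench's -t option; a quick sketch
(the pool name is just a placeholder):

   rados -p testpool bench 60 write -t 32 --no-cleanup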



All 3 machines have more or less the same disk load at the same time.
also the disks:
sdb  35.56    87.10  6849.09  617310   48540806
sdc  26.75    72.62  5148.58  514701   36488992
sdd  35.15    53.48  6802.57  378993   48211141
sde  31.04    79.04  6208.48  560141   44000710
sdf  32.79    38.35  6238.28  271805   44211891
sdg  31.67    77.84  5987.45  551680   42434167
sdh  32.95    51.29  6315.76  363533   44761001
sdi  31.67    56.93  5956.29  403478   42213336
sdj  35.83    77.82  6929.31  551501   49109354
sdk  36.86    73.84  7291.00  523345   51672704
sdl  36.02   112.90  7040.47  800177   49897132
sdm  33.25    38.02  6455.05  269446   45748178
sdn  33.52    39.10  6645.19  277101   47095696
sdo  33.26    46.22  6388.20  327541   45274394
sdp  33.38    74.12  6480.62  525325   45929369


the question is: is it poor performance to get a maximum of 500 MB/s of writes with 45 
disks and replica 2, or should I expect this?



You should be able to get more as long as the I/O is done in parallel.

Wido



-----Original Message-----
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Wido 
den Hollander
Sent: Thursday, 03 July 2014 15:22
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] write performance per disk

On 07/03/2014 03:11 PM, VELARTIS Philipp Dürhammer wrote:

Hi,

I have a ceph cluster setup (with 45 sata disk journal on disks) and
get only 450mb/sec writes seq (maximum playing around with threads in
rados
bench) with replica of 2



How many threads?


Which is about ~20 MB/s of writes per disk (which is what I see in atop also).
Theoretically, with replica 2 and journals on the same disks, it should be 45 x
100 MB/s (SATA) / 2 (replica) / 2 (journal writes), which makes 1125 MB/s;
SATA disks in reality do 120 MB/s, so the theoretical output should be even more.

I would expect to have between 40-50 MB/s for each SATA disk.

Can somebody confirm that he can reach this speed with a setup with
journals on the SATAs (with journals on SSD the speed should be 100 MB/s per disk)?
Or does Ceph only give about ¼ of the speed of a disk (and not the ½
expected because of the journals)?
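
Spelled out, the back-of-the-envelope calculation (assuming roughly 100 MB/s of
sequential throughput per SATA drive) is:

   45 disks x 100 MB/s (raw sequential)                 = 4500 MB/s
   / 2  (replica 2: every write lands on two OSDs)      = 2250 MB/s
   / 2  (journal on the same disk: data written twice)  = 1125 MB/s expected client throughput
   observed with rados bench:                             ~450-500 MB/s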



Did you verify how much each machine is doing? It could be that the data is not 
distributed evenly and that on a certain machine the drives are doing 50MB/sec.


My setup is 3 servers with: 2 x 2.6ghz xeons, 128gb ram 15 satas for
ceph (and ssds for system) 1 x 10gig for external traffic, 1 x 10gig
for osd traffic with reads I can saturate the network but writes is
far away. And I would expect at least to saturate the 10gig with
sequential writes also



Should be possible, but with 3 servers the data distribution might not be 
optimal causing a lower write performance.

I've seen 10Gbit write performance on multiple clusters without any problems.


Thank you



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




--
Wido den Hollander
Ceph consultant and trainer
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




--
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Bad Write-Performance on Ceph/Possible bottlenecks?

2014-07-04 Thread Marco Allevato
Hello Ceph-Community,

I'm writing here because we have a bad write-performance on our Ceph-Cluster of 
about

As an overview the technical details of our Cluster:

3 x monitoring-Servers; each with 2 x 1 Gbit/s NIC configured as Bond (Link 
Aggregation-Mode)

5 x datastore-Servers; each with 10 x 4 TB HDDs serving as OSDs, as Journal we 
use a 15 GB LVM on a 256 GB SSD-Raid1; 2 x 10 Gbit/s NIC configured as Bond 
(Link Aggregation-Mode)

ceph.conf

[global]
auth_service_required = cephx
filestore_xattr_use_omap = true
auth_client_required = cephx
auth_cluster_required = cephx
mon_host = 172.30.30.8,172.30.30.9
mon_initial_members = monitoring1, monitoring2, monitoring3
fsid = 5f22ab94-8d96-48c2-88d3-cff7bad443a9
public network = 172.30.30.0/24

[mon.monitoring1]
host = monitoring1
addr = 172.30.30.8:6789

[mon.monitoring2]
host = monitoring2
addr = 172.30.30.9:6789

[mon.monitoring3]
host = monitoring3
addr = 172.30.30.10:6789

[filestore]
   filestore max sync interval = 10

[osd]
osd recovery max active = 1
osd journal size = 15360
osd op threads = 40
osd disk threads = 40

[osd.0]
host = datastore1

[osd.1]
host = datastore1

[osd.2]
host = datastore1

[osd.3]
host = datastore1

[osd.4]
host = datastore1

[osd.5]
host = datastore1

[osd.6]
host = datastore1

[osd.7]
host = datastore1

[osd.8]
host = datastore1

[osd.9]
host = datastore1

[osd.10]
host = datastore2

[osd.11]
host = datastore2

[osd.11]
host = datastore2

[osd.12]
host = datastore2

[osd.13]
host = datastore2

[osd.14]
host = datastore2

[osd.15]
host = datastore2

[osd.16]
host = datastore2

[osd.17]
host = datastore2

[osd.18]
host = datastore2

[osd.19]
host = datastore2

[osd.20]
host = datastore3

[osd.21]
host = datastore3

[osd.22]
host = datastore3

[osd.23]
host = datastore3

[osd.24]
host = datastore3

[osd.25]
host = datastore3

[osd.26]
host = datastore3

[osd.27]
host = datastore3

[osd.28]
host = datastore3

[osd.29]
host = datastore3

[osd.30]
host = datastore4

[osd.31]
host = datastore4

[osd.32]
host = datastore4

[osd.33]
host = datastore4

[osd.34]
host = datastore4

[osd.35]
host = datastore4

[osd.36]
host = datastore4

[osd.37]
host = datastore4

[osd.38]
host = datastore4

[osd.39]
host = datastore4

[osd.0]
host = datastore5

[osd.40]
host = datastore5

[osd.41]
host = datastore5

[osd.42]
host = datastore5

[osd.43]
host = datastore5

[osd.44]
host = datastore5

[osd.45]
host = datastore5

[osd.46]
host = datastore5

[osd.47]
host = datastore5

[osd.48]
host = datastore5


We have 3 pools:

- 2 x 1000 pgs with 2 Replicas distributing the data equally to two racks 
(Used for datastore 1-4)
- 1 x 100 pgs without replication; data only stored on datastore 5. This Pool 
is used to compare the performance on local disks without networking


Here are the performance values, which I get using fio-Bench on a 32GB rbd:


On 1000 pgs-Pool with distribution

fio --bs=1M --rw=randwrite --ioengine=libaio --direct=1 --iodepth=32 
--runtime=60 --name=/dev/rbd/pool1/bench1

fio-2.0.13
Starting 1 process
Jobs: 1 (f=1): [w] [100.0% done] [0K/312.0M/0K /s] [0 /312 /0  iops] [eta 
00m:00s]
/dev/rbd/pool1/bench1: (groupid=0, jobs=1): err= 0: pid=21675: Fri Jul  4 
11:03:52 2014
  write: io=21071MB, bw=358989KB/s, iops=350 , runt= 60104msec
slat (usec): min=127 , max=8040 , avg=511.49, stdev=216.27
clat (msec): min=5 , max=4018 , avg=90.74, stdev=215.83
 lat (msec): min=6 , max=4018 , avg=91.25, stdev=215.83
clat percentiles (msec):
 |  1.00th=[8],  5.00th=[9], 10.00th=[   11], 20.00th=[   15],
 | 30.00th=[   21], 40.00th=[   30], 50.00th=[   45], 60.00th=[   63],
 | 70.00th=[   83], 80.00th=[  105], 90.00th=[  129], 95.00th=[  190],
 | 99.00th=[ 1254], 99.50th=[ 1680], 99.90th=[ 2409], 99.95th=[ 2638],
 | 99.99th=[ 3556]
bw (KB/s)  : min=68210, max=479232, per=100.00%, avg=368399.55, 
stdev=84457.12
lat (msec) : 10=9.50%, 20=20.02%, 50=23.56%, 100=24.56%, 250=18.09%
lat (msec) : 500=1.39%, 750=0.81%, 1000=0.65%, 2000=1.13%, >=2000=0.29%
  cpu  : usr=11.17%, sys=7.46%, ctx=17772, majf=0, minf=24
  IO depths: 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=99.9%, >=64=0.0%
 submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
 issued: total=r=0/w=21071/d=0, short=r=0/w=0/d=0

Run status group 0 (all jobs):
  WRITE: io=21071MB, aggrb=358989KB/s, minb=358989KB/s, maxb=358989KB/s, 

Re: [ceph-users] Bad Write-Performance on Ceph/Possible bottlenecks?

2014-07-04 Thread Konrad Gutkowski

Hi,

I wouldn't put those SSDs in RAID; just use them separately as journals,  
each one serving half of your HDDs. This should make your write performance somewhat  
better.
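
As a sketch of what that could look like with ceph-deploy (host, disk and
journal-partition names here are made up; adjust to your layout), each OSD is
prepared with its journal pointed at an SSD partition:

   ceph-deploy osd prepare datastore1:sdb:/dev/sdk1
   ceph-deploy osd prepare datastore1:sdc:/dev/sdk2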


On 04.07.2014 at 11:13, Marco Allevato m.allev...@nwe.de wrote:



Hello Ceph-Community,


I’m writing here because we have a bad write-performance on our  
Ceph-Cluster of about


As an overview the technical details of our Cluster:


3 x monitoring-Servers; each with 2 x 1 Gbit/s NIC configured as Bond  
(Link Aggregation-Mode)



5 x datastore-Servers; each with 10 x 4 TB HDDs serving as OSDs, as  
Journal we use a 15 GB LVM on an 256 GB SSD-Raid1; 2 x 10 Gbit/s NIC  
configured as Bond (Link Aggregation-Mode)




Re: [ceph-users] write performance per disk

2014-07-04 Thread VELARTIS Philipp Dürhammer
I use between 1 and 128 threads in different steps...
But 500 MB/s of writes is the max, whatever I play around with.

Uff, it's so hard to tune Ceph... so many people have problems... ;-)

-----Original Message-----
From: Wido den Hollander [mailto:w...@42on.com] 
Sent: Friday, 04 July 2014 10:55
To: VELARTIS Philipp Dürhammer; ceph-users@lists.ceph.com
Subject: Re: AW: [ceph-users] write performance per disk

On 07/03/2014 04:32 PM, VELARTIS Philipp Dürhammer wrote:
 HI,

 Ceph.conf:
 osd journal size = 15360
 rbd cache = true
  rbd cache size = 2147483648
  rbd cache max dirty = 1073741824
  rbd cache max dirty age = 100
  osd recovery max active = 1
   osd max backfills = 1
   osd mkfs options xfs = -f -i size=2048
   osd mount options xfs = 
 rw,noatime,nobarrier,logbsize=256k,logbufs=8,inode64,allocsize=4M
   osd op threads = 8

 so it should be 8 threads?


How many threads are you using with rados bench? Don't touch the op threads 
from the start, usually the default is just fine.

 All 3 machines have more or less the same disk load at the same time.
 also the disks:
 sdb  35.56    87.10  6849.09  617310   48540806
 sdc  26.75    72.62  5148.58  514701   36488992
 sdd  35.15    53.48  6802.57  378993   48211141
 sde  31.04    79.04  6208.48  560141   44000710
 sdf  32.79    38.35  6238.28  271805   44211891
 sdg  31.67    77.84  5987.45  551680   42434167
 sdh  32.95    51.29  6315.76  363533   44761001
 sdi  31.67    56.93  5956.29  403478   42213336
 sdj  35.83    77.82  6929.31  551501   49109354
 sdk  36.86    73.84  7291.00  523345   51672704
 sdl  36.02   112.90  7040.47  800177   49897132
 sdm  33.25    38.02  6455.05  269446   45748178
 sdn  33.52    39.10  6645.19  277101   47095696
 sdo  33.26    46.22  6388.20  327541   45274394
 sdp  33.38    74.12  6480.62  525325   45929369


 the question is: is this a poor performance to get max 500mb/write with 45 
 disks and replica 2 or should I expect this?


You should be able to get more as long as the I/O is done in parallel.

Wido





--
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Bad Write-Performance on Ceph/Possible bottlenecks?

2014-07-04 Thread Wido den Hollander

On 07/04/2014 11:33 AM, Daniel Schwager wrote:

Hi,

I think, the problem is the rbd device. It's only ONE device.


I fully agree. Ceph excels in parallel performance. You should run 
multiple fio instances in parallel on different RBD devices and even 
better on different clients.


Then you will see a big difference.

Wido




fio --bs=1M --rw=randwrite --ioengine=libaio --direct=1 --iodepth=32

--runtime=60 --name=/dev/rbd/pool1/bench1

Try to create e.g. 20 (small) rbd devices, putting them all in an LVM VG and
creating a logical volume (RAID0) with

20 stripes and e.g. a stripe size of 1MB (better bandwidth) or 4kb (better IO)
- or use md-raid0 (it's maybe 10% faster - but not as flexible):

# create disks

for i in `seq -f %02.f 0 19` ; do rbd create --size 4
vmware/vol6-$i.dsk ; done

emacs -nw /etc/lvm/lvm.conf

types = [ "rbd", 16 ]

# rbd map 

# pvcreate

for i in `seq -f %02.f 0 19` ; do pvcreate /dev/rbd/vmware/vol6-$i.dsk
; done

# vcreate VG

vgcreate VG_RBD20x40_VOL6 /dev/rbd/vmware/vol6-00.dsk

for i in `seq -f %02.f 1 19` ; do vgextend VG_RBD20x40_VOL6
/dev/rbd/vmware/vol6-$i.dsk ; done

# lvcreate raid0

# -i, --stripes Stripes - This is equal to the number of physical
volumes to scatter the logical volume.

# -I, --stripesize StripeSize - Gives the number of kilobytes for the
granularity of the stripes, 2^n, (n = 2 to 9)

# 20 stripes and 4k StripeSize

lvcreate -i20 -I1024 -L70m -n VmProd06 VG_RBD20x40_VOL6

Now, try to run fio against /dev/mapper/VG_RBD20x40_VOL6-VmProd06

I think the performance will be about 10 GBit.
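
If you prefer the md-raid0 variant mentioned above, a rough equivalent could be
(device names assumed to be the /dev/rbd0 ... /dev/rbd19 that rbd map created):

# stripe the 20 mapped rbd devices into one md raid0 with a 1 MB chunk
mdadm --create /dev/md/rbd-stripe --level=0 --raid-devices=20 --chunk=1024 /dev/rbd[0-9] /dev/rbd1[0-9]
mkfs.xfs /dev/md/rbd-stripe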

regards

Danny

*From:*ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf
Of *Marco Allevato
*Sent:* Friday, July 04, 2014 11:13 AM
*To:* ceph-users@lists.ceph.com
*Subject:* [ceph-users] Bad Write-Performance on Ceph/Possible bottlenecks?

Hello Ceph-Community,

I’m writing here because we have a bad write-performance on our
Ceph-Cluster of about


Re: [ceph-users] write performance per disk

2014-07-04 Thread Wido den Hollander

On 07/04/2014 11:40 AM, VELARTIS Philipp Dürhammer wrote:

I use between 1 and 128 in different steps...
But 500mb write is the max playing around.



I just mentioned it in a different thread, make sure you do parallel 
I/O! That's where Ceph really makes the difference. Run rados bench from 
multiple clients.
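
For example (the pool name is a placeholder), start the same bench on two or
three client machines at the same time; each instance writes its own
benchmark_data_<hostname>_<pid> objects, so the runs don't collide:

   rados -p testpool bench 60 write -t 16 --no-cleanup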



Uff, it's so hard to tune Ceph... so many people have problems... ;-)


No, Ceph is simply different from any other storage. Distributed storage 
is a lot different in terms of performance from existing storage 
projects/products.


Wido




Re: [ceph-users] nginx (tengine) and radosgw

2014-07-04 Thread Andrei Mikhailovsky
Hi David, 

Do you mind sharing the howto/documentation with examples of configs, etc.? 

I am tempted to give it a go and replace the Apache reverse proxy that I am 
currently using. 

cheers 

Andrei 

- Original Message -

From: David Moreau Simard dmsim...@iweb.com 
To: ceph-users@lists.ceph.com 
Sent: Sunday, 22 June, 2014 2:37:00 AM 
Subject: Re: [ceph-users] nginx (tengine) and radosgw 

Hi, 

I just wanted to chime in and say that I didn’t notice any problems swapping 
nginx out in favor of tengine. 
tengine is used as a load balancer that also handles SSL termination. 

I found that disabling body buffering saves a lot on upload times as well. 

I took the time to do a post about it and linked this thread: 
http://dmsimard.com/2014/06/21/a-use-case-of-tengine-a-drop-in-replacement-and-fork-of-nginx/
 

- David 

On May 29, 2014, at 12:20 PM, Michael Lukzak  mis...@vp.pl  wrote: 



Re[2]: [ceph-users] nginx (tengine) and radosgw 
Hi, 

Oops, so I didn't read the doc carefully... 
I will try this solution. 

Thanks! 

Michael 



From the docs, you need this setting in ceph.conf (if you're using 
nginx/tengine): 

rgw print continue = false 

This will fix the 100-continue issues. 
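
For reference, a minimal sketch of where that goes in ceph.conf (the section 
name is an assumption - use whatever your radosgw instance is actually called): 

   [client.radosgw.gateway] 
       rgw print continue = false 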

On 5/29/2014 5:56 AM, Michael Lukzak wrote: 
Re[2]: [ceph-users] nginx (tengine) and radosgw Hi, 

I also use tengine; it works fine with SSL (I have a wildcard cert). 
But I have another issue, with HTTP 100-Continue. 
Clients like boto or Cyberduck hang if they can't do HTTP 100-Continue. 

IP_REMOVED - - [29/May/2014:11:27:53 +] PUT 
/temp/1b6f6a11d7aa188f06f8255fdf0345b4 HTTP/1.1 100 0 - Boto/2.27.0 
Python/2.7.6 Linux/3.13.0-24-generic 

Do you also have a problem with that? 
For testing I also used the original nginx, and it has the same problem with 100-Continue. 
Only Apache 2.x works fine. 

BR, 
Michael 




I haven't tried SSL yet. We currently don't have a wildcard certificate 
for this, so it hasn't been a concern (and our current use case, all the files 
are public anyway). 

On 5/20/2014 4:26 PM, Andrei Mikhailovsky wrote: 


That looks very interesting indeed. I've tried to use nginx, but from what I 
recall it had some ssl related issues. Have you tried to make the ssl work so 
that nginx acts as an ssl proxy in front of the radosgw? 

Cheers 

Andrei 


From: Brian Rak b...@gameservers.com 
To: ceph-users@lists.ceph.com 
Sent: Tuesday, 20 May, 2014 9:11:58 PM 
Subject: [ceph-users] nginx (tengine) and radosgw 

I've just finished converting from nginx/radosgw to tengine/radosgw, and 
it's fixed all the weird issues I was seeing (uploads failing, random 
clock skew errors, timeouts). 

The problem with nginx and radosgw is that nginx insists on buffering 
all the uploads to disk. This causes a significant performance hit, and 
prevents larger uploads from working. Supposedly, there is going to be 
an option in nginx to disable this, but it hasn't been released yet (nor 
do I see anything on the nginx devel list about it). 

tengine ( http://tengine.taobao.org/ ) is an nginx fork that implements 
unbuffered uploads to fastcgi. It's basically a drop in replacement for 
nginx. 

My configuration looks like this: 

server { 
listen 80; 

server_name *.rados.test rados.test; 

client_max_body_size 10g; 
# This is the important option that tengine has, but nginx does not 
fastcgi_request_buffering off; 

location / { 
fastcgi_pass_header Authorization; 
fastcgi_pass_request_headers on; 

if ($request_method = PUT ) { 
rewrite ^ /PUT$request_uri; 
} 
include fastcgi_params; 

fastcgi_pass unix:/path/to/ceph.radosgw.fastcgi.sock; 
} 

location /PUT/ { 
internal; 
fastcgi_pass_header Authorization; 
fastcgi_pass_request_headers on; 

include fastcgi_params; 
fastcgi_param CONTENT_LENGTH $content_length; 

fastcgi_pass unix:/path/to/ceph.radosgw.fastcgi.sock; 
} 
} 


if anyone else is looking to run radosgw without having to run apache, I 
would recommend you look into tengine :) 
___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 





___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 

___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 




___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Bad Write-Performance on Ceph/Possible bottlenecks?

2014-07-04 Thread Daniel Schwager
 Try to create e.g. 20 (small) rbd devices, putting them all in a lvm vg, 
 creating a logical volume (Raid0) with
 20 stripes and e.g. stripeSize 1MB (better bandwith) or 4kb (better io) - or 
 use md-raid0 (it's maybe 10% faster - but not that flexible):

BTW - we use this approach for VMware using
- one LVM LV (raid0: 20 stripes, 1MB stripe size ) LUN based on
- one VG containing 20 rbd's (each 40GB)  based on
- a ceph pool with 24osd, 3 replicates inside our
- ceph cluster, 3 nodes x 8 x 4TB OSD's, 2 x 10GBit
- published by scst  (Fibre channel, 4 GBit QLA) to vSphere ESX.

IOmeter (one worker, one disk) inside a w2k8r2 vm @esx tells me
iometer: 270/360 MB/sec write/read (1MByte block size, 4 
outstanding IOs)

And - important - other vm's share the bandwidth from 20 rbd volumes - so, now, 
our 4GBit fibrchannel is the bottle neck - not the (one) rbd volume anymore.

Also, we will add a flashcache in front of the raid0 LV to bust the 4k IO's - 
at the moment, 4k is terrible slow
iometer: 4/14 MB/sec write/read (4k block size, 8 outstanding 
IOs)

with a 10 GByte flashcache, it's about
iometer: 14/60 MB/sec write/read (4k block size, 8 outstanding 
IOs)
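
For completeness, the flashcache device we plan to test would be created
roughly like this (a sketch using flashcache_create; the SSD partition name is
a placeholder):

# writeback flashcache on top of the striped LV, backed by a small SSD partition
flashcache_create -p back rbd_cache /dev/ssd_part /dev/mapper/VG_RBD20x40_VOL6-VmProd06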

regards
Danny


smime.p7s
Description: S/MIME cryptographic signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] OSD recovery problem

2014-07-04 Thread Loic Dachary
Hi,

I extracted a disk with two partitions (journal and data) and copied its 
content in the hope to restart the OSD and recover its content.

   mount /dev/sdb1 /mnt
   rsync -avH --numeric-ids /mnt/ /var/lib/ceph/osd/ceph-$(cat /mnt/whoami)/
   rm /var/lib/ceph/osd/ceph-$(cat /mnt/whoami)/journal
   dd if=/dev/sdb2 of=/var/lib/ceph/osd/ceph-$(cat /mnt/whoami)/journal

and then

   start ceph-osd id=$(cat /mnt/whoami)

It crashes on https://github.com/ceph/ceph/blob/v0.72.2/src/osd/PG.cc#L2182 and 
before it happens there is

   load_pgs ignoring unrecognized meta

and the full debug osd = 20 logs are in http://paste.ubuntu.com/7746993/ and 
this is

root@bm4202:/etc/ceph# dpkg -l | grep ceph
ii  ceph0.72.2-1trusty amd64  
distributed storage and file system
ii  ceph-common 0.72.2-1trusty amd64  common 
utilities to mount and interact with a ceph storage
ii  python-ceph 0.72.2-1trusty amd64  Python 
libraries for the Ceph distributed filesystem
root@bm4202:/etc/ceph# ceph --version
ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60)

Cheers
-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD recovery problem

2014-07-04 Thread Wido den Hollander

On 07/04/2014 03:18 PM, Loic Dachary wrote:

Hi,

I extracted a disk with two partitions (journal and data) and copied its 
content in the hope to restart the OSD and recover its content.

mount /dev/sdb1 /mnt
rsync -avH --numeric-ids /mnt/ /var/lib/ceph/osd/ceph-$(cat /mnt/whoami)/


I think you went wrong there, rsync man page:

-a, --archive   archive mode; equals -rlptgoD (no -H,-A,-X)
-X, --xattrs    preserve extended attributes

So you didn't copy over the xattrs, so basically the data is lost/unusable.
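
For anyone retrying this, a sketch of an rsync invocation that does carry the
xattrs (plus hard links and ACLs) across:

   rsync -aHAX --numeric-ids /mnt/ /var/lib/ceph/osd/ceph-$(cat /mnt/whoami)/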


rm /var/lib/ceph/osd/ceph-$(cat /mnt/whoami)/journal
dd if=/dev/sdb2 of=/var/lib/ceph/osd/ceph-$(cat /mnt/whoami)/journal

and then

start ceph-osd id=$(cat /mnt/whoami)

It crashes on https://github.com/ceph/ceph/blob/v0.72.2/src/osd/PG.cc#L2182 and 
before it happens there is

load_pgs ignoring unrecognized meta

and the full debug osd = 20 logs are in http://paste.ubuntu.com/7746993/ and 
this is

root@bm4202:/etc/ceph# dpkg -l | grep ceph
ii  ceph0.72.2-1trusty amd64  
distributed storage and file system
ii  ceph-common 0.72.2-1trusty amd64  common 
utilities to mount and interact with a ceph storage
ii  python-ceph 0.72.2-1trusty amd64  Python 
libraries for the Ceph distributed filesystem
root@bm4202:/etc/ceph# ceph --version
ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60)

Cheers



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




--
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD recovery problem

2014-07-04 Thread Loic Dachary


On 04/07/2014 15:25, Wido den Hollander wrote:
 On 07/04/2014 03:18 PM, Loic Dachary wrote:
 Hi,

 I extracted a disk with two partitions (journal and data) and copied its 
 content in the hope to restart the OSD and recover its content.

 mount /dev/sdb1 /mnt
 rsync -avH --numeric-ids /mnt/ /var/lib/ceph/osd/ceph-$(cat /mnt/whoami)/
 
 I think you went wrong there, rsync man page:
 
 -a, --archive   archive mode; equals -rlptgoD (no -H,-A,-X)
 -X, --xattrs    preserve extended attributes
 
 So you didn't copy over the xattrs, so basically the data is lost/unusable.

Thanks ! Fortunately the original disks are still available ;-)

 rm /var/lib/ceph/osd/ceph-$(cat /mnt/whoami)/journal
 dd if=/dev/sdb2 of=/var/lib/ceph/osd/ceph-$(cat /mnt/whoami)/journal

 and then

 start ceph-osd id=$(cat /mnt/whoami)

 It crashes on https://github.com/ceph/ceph/blob/v0.72.2/src/osd/PG.cc#L2182 
 and before it happens there is

 load_pgs ignoring unrecognized meta

 and the full debug osd = 20 logs are in http://paste.ubuntu.com/7746993/ 
 and this is

 root@bm4202:/etc/ceph# dpkg -l | grep ceph
 ii  ceph0.72.2-1trusty amd64  
 distributed storage and file system
 ii  ceph-common 0.72.2-1trusty amd64  common 
 utilities to mount and interact with a ceph storage
 ii  python-ceph 0.72.2-1trusty amd64  Python 
 libraries for the Ceph distributed filesystem
 root@bm4202:/etc/ceph# ceph --version
 ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60)

 Cheers



 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

 
 

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Multipart upload on ceph 0.8 doesn't work?

2014-07-04 Thread Patrycja Szabłowska
Thank you Luis for your response.

Quite unbelievable, but your solution worked!
Unfortunately, I'm stuck again when trying to upload parts of the file.

Apache's logs:


== apache.access.log ==
127.0.0.1 l - [04/Jul/2014:15:40:41 +0200] PUT /bucketbig/ HTTP/1.1 200
477 {Referer}i Boto/2.30.0 Python/2.7.6 Linux/3.13.0-30-generic
127.0.0.1 l - [04/Jul/2014:15:40:41 +0200] POST
/bucketbig/Bosphorus?uploads HTTP/1.1 200 249 {Referer}i Boto/2.30.0
Python/2.7.6 Linux/3.13.0-30-generic

== apache.error.log ==
[Fri Jul 04 15:40:41.868621 2014] [fastcgi:error] [pid 14199] [client
127.0.0.1:46571] FastCGI: incomplete headers (0 bytes) received from server
/home/pszablow/ceph/src/htdocs/rgw.fcgi

== apache.access.log ==
127.0.0.1 l - [04/Jul/2014:15:40:41 +0200] PUT
/bucketbig/Bosphorus?uploadId=2/fURJChPdpUqA3Z1oVLUjT7ROsnxIqZ9partNumber=1
HTTP/1.1 500 531 {Referer}i Boto/2.30.0 Python/2.7.6
Linux/3.13.0-30-generic

== apache.error.log ==
[Fri Jul 04 15:40:42.571543 2014] [fastcgi:error] [pid 14200]
(111)Connection refused: [client 127.0.0.1:46572] FastCGI: failed to
connect to server /home/pszablow/ceph/src/htdocs/rgw.fcgi: connect()
failed
[Fri Jul 04 15:40:42.571660 2014] [fastcgi:error] [pid 14200] [client
127.0.0.1:46572] FastCGI: incomplete headers (0 bytes) received from server
/home/pszablow/ceph/src/htdocs/rgw.fcgi



I'm using the default fastcgi module, not the one provided by Ceph. I've
tried installing it on my ubuntu 14.04, but unfortunately I keep getting
the error:
libapache2-mod-fastcgi : requires: apache2.2-common (>= 2.2.4)


Is the modified fastcgi module mandatory in order to use multi part upload?


Thanks,

Patrycja Szabłowska


2014-07-03 18:34 GMT+02:00 Luis Periquito luis.periqu...@ocado.com:

 I was at this issue this morning. It seems radosgw requires you to have a
 pool named '' to work with multipart. I just created a pool with that name
 rados mkpool ''

 either that or allow the pool be created by the radosgw...


 On 3 July 2014 16:27, Patrycja Szabłowska szablowska.patry...@gmail.com
 wrote:

 Hi,

 I'm trying to make multi part upload work. I'm using ceph
 0.80-702-g9bac31b (from the ceph's github).

 I've tried the code provided by Mark Kirkwood here:


 http://lists.ceph.com/pipermail/ceph-users-ceph.com/2013-October/034940.html


 But unfortunately, it gives me the error:

 (multitest)pszablow@pat-desktop:~/$ python boto_multi.py
   begin upload of abc.yuv
   size 746496, 7 parts
  Traceback (most recent call last):
    File "boto_multi.py", line 36, in <module>
      part = bucket.initiate_multipart_upload(objname)
    File
  "/home/pszablow/venvs/multitest/local/lib/python2.7/site-packages/boto/s3/bucket.py",
  line 1742, in initiate_multipart_upload
      response.status, response.reason, body)
  boto.exception.S3ResponseError: S3ResponseError: 403 Forbidden
  <?xml version="1.0"
  encoding="UTF-8"?><Error><Code>AccessDenied</Code></Error>


 The single part upload works for me. I am able to create buckets and
 objects.
 I've tried also other similar examples, but none of them works.


  Any ideas what's wrong? Does Ceph's multipart upload actually
  work for anybody?


 Thanks,

 Patrycja Szabłowska
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




 --

 Luis Periquito

 Unix Engineer

 Ocado.com http://www.ocado.com/

 Head Office, Titan Court, 3 Bishop Square, Hatfield Business Park,
 Hatfield, Herts AL10 9NE

 Notice:  This email is confidential and may contain copyright material of
 members of the Ocado Group. Opinions and views expressed in this message
 may not necessarily reflect the opinions and views of the members of the
 Ocado Group.

 If you are not the intended recipient, please notify us immediately and
 delete all copies of this message. Please note that it is your
 responsibility to scan this message for viruses.

 References to the “Ocado Group” are to Ocado Group plc (registered in
 England and Wales with number 7098618) and its subsidiary undertakings (as
 that expression is defined in the Companies Act 2006) from time to time.
 The registered office of Ocado Group plc is Titan Court, 3 Bishops Square,
 Hatfield Business Park, Hatfield, Herts. AL10 9NE.




-- 
Regards,
Patrycja Szabłowska
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] write performance per disk

2014-07-04 Thread Mark Nelson

On 07/03/2014 08:11 AM, VELARTIS Philipp Dürhammer wrote:

Hi,

I have a ceph cluster setup (with 45 sata disk journal on disks) and get
only 450mb/sec writes seq (maximum playing around with threads in rados
bench) with replica of 2

Which is about ~20 MB/s of writes per disk (which is what I see in atop also).
Theoretically, with replica 2 and journals on disk, it should be 45 X
100 MB/s (SATA) / 2 (replica) / 2 (journal writes), which makes 1125 MB/s;
SATA disks in reality do 120 MB/s, so the theoretical output should be more.

I would expect to have between 40-50mb/sec for each sata disk

Can somebody confirm that he can reach this speed with a setup with
journals on the satas (with journals on ssd speed should be 100mb per disk)?
or does ceph only give about ¼ of the speed for a disk? (and not the ½
as expected because of journals)

My setup is 3 servers with: 2 x 2.6ghz xeons, 128gb ram 15 satas for
ceph (and ssds for system) 1 x 10gig for external traffic, 1 x 10gig for
osd traffic
with reads I can saturate the network but writes is far away. And I
would expect at least to saturate the 10gig with sequential writes also


In addition to the advice Wido is providing (which I wholeheartedly 
agree with!), you might want to check your controller/disk 
configuration.  If you have journals on the same disks as the data, sometimes 
putting the disks into single-disk RAID0 LUNs with writeback cache 
enabled can help keep journal and data writes from causing seek 
contention.  This only works if you have a controller with cache and a 
battery though.




Thank you



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Multipart upload on ceph 0.8 doesn't work?

2014-07-04 Thread Patrycja Szabłowska
Still not sure do I need the ceph's modified fastcgi or not.
But I guess this explains my problem with the installation:
http://tracker.ceph.com/issues/8233


It would be nice to have at least a workaround for this...

Thanks,

Patrycja Szabłowska



2014-07-04 16:02 GMT+02:00 Patrycja Szabłowska 
szablowska.patry...@gmail.com:

 Thank you Luis for your response.

 Quite unbelievable, but your solution worked!
 Unfortunately, I'm stuck again when trying to upload parts of the file.


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] radosgw-agent failed to parse

2014-07-04 Thread Peter
I am having issues running radosgw-agent to sync data between two 
radosgw zones. As far as I can tell both zones are running correctly.


My issue is when i run the radosgw-agent command:


radosgw-agent -v --src-access-key access_key --src-secret-key 
secret_key --dest-access-key access_key --dest-secret-key 
secret_key --src-zone us-master http://us-secondary.example.com:80


I get the following error:

DEBUG:boto:Using access key provided by client.
DEBUG:boto:Using secret key provided by client.
DEBUG:boto:StringToSign:
GET

Fri, 04 Jul 2014 15:25:53 GMT
/admin/config
DEBUG:boto:Signature:
AWS EA20YO07DA8JJJX7ZIPJ:WbykwyXu5m5IlbEsBzo8bKEGIzg=
DEBUG:boto:url =
'http://us-secondary.example.comhttp://us-secondary.example.com/admin/config'
params={}
headers={'Date': 'Fri, 04 Jul 2014 15:25:53 GMT', 'Content-Length':
'0', 'Authorization': 'AWS
EA20YO07DA8JJJX7ZIPJ:WbykwyXu5m5IlbEsBzo8bKEGIzg=', 'User-Agent':
'Boto/2.20.1 Python/2.7.6 Linux/3.13.0-24-generic'}
data=None
ERROR:root:Could not retrieve region map from destination
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/radosgw_agent/cli.py", line 269, in main
    region_map = client.get_region_map(dest_conn)
  File "/usr/lib/python2.7/dist-packages/radosgw_agent/client.py", line 391, in get_region_map
    region_map = request(connection, 'get', 'admin/config')
  File "/usr/lib/python2.7/dist-packages/radosgw_agent/client.py", line 153, in request
    result = handler(url, params=params, headers=request.headers, data=data)
  File "/usr/lib/python2.7/dist-packages/requests/api.py", line 55, in get
    return request('get', url, **kwargs)
  File "/usr/lib/python2.7/dist-packages/requests/api.py", line 44, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/lib/python2.7/dist-packages/requests/sessions.py", line 349, in request
    prep = self.prepare_request(req)
  File "/usr/lib/python2.7/dist-packages/requests/sessions.py", line 287, in prepare_request
    hooks=merge_hooks(request.hooks, self.hooks),
  File "/usr/lib/python2.7/dist-packages/requests/models.py", line 287, in prepare
    self.prepare_url(url, params)
  File "/usr/lib/python2.7/dist-packages/requests/models.py", line 334, in prepare_url
    scheme, auth, host, port, path, query, fragment = parse_url(url)
  File "/usr/lib/python2.7/dist-packages/urllib3/util.py", line 390, in parse_url
    raise LocationParseError("Failed to parse: %s" % url)
LocationParseError: Failed to parse: Failed to parse:
us-secondary.example.comhttp:


Is this a bug, or is my setup wrong? I can navigate to
http://us-secondary.example.com/admin/config and it correctly outputs
zone details. In the output above,

DEBUG:boto:url =
'http://us-secondary.example.comhttp://us-secondary.example.com/admin/config'

should the url be repeated like that?

any help would be greatly appreciated

thanks
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] error mapping device in firefly

2014-07-04 Thread Ilya Dryomov
On Fri, Jul 4, 2014 at 11:48 AM, Xabier Elkano xelk...@hostinet.com wrote:
 Hi,

 I am trying to map a rbd device in  Ubuntu 14.04 (kernel 3.13.0-30-generic):

 # rbd -p mypool create test1 --size 500

 # rbd -p mypool ls
 test1

 # rbd -p mypool map test1
 rbd: add failed: (5) Input/output error

 and in the syslog:
 Jul  4 09:31:48 testceph kernel: [70503.356842] libceph: mon2
 172.16.64.18:6789 feature set mismatch, my 4a042a42 < server's
 2004a042a42, missing 20000000000
 Jul  4 09:31:48 testceph kernel: [70503.356938] libceph: mon2
 172.16.64.18:6789 socket error on read


 my environment:

 cluster version on all MONs and OSDs is 0.80.1
 In the client machine:

 ii  ceph-common 0.80.1-1trusty
 amd64common utilities to mount and interact with a ceph storage
 cluster
 ii  python-ceph 0.80.1-1trusty
 amd64Python libraries for the Ceph distributed filesystem
 ii  librados2   0.80.1-1trusty
 amd64RADOS distributed object store client library

 I think I started getting this error when I switched from tunables
 legacy to optimal after upgrading from 0.72 to 0.80.

Hi Xabier,

You need to do

ceph osd getcrushmap -o /tmp/crush
crushtool -i /tmp/crush --set-chooseleaf_vary_r 0 -o /tmp/crush.new
ceph osd setcrushmap -i /tmp/crush.new

or upgrade your kernel to 3.15.
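
To double-check what the map requires before and after, you can decompile it
and look at the tunables (the show-tunables subcommand may not be available on
every release):

crushtool -d /tmp/crush -o /tmp/crush.txt
grep tunable /tmp/crush.txt
# or, if your ceph version has it:
ceph osd crush show-tunables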

Thanks,

Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD recovery problem

2014-07-04 Thread Loic Dachary
For the record here is a summary of what happened : http://dachary.org/?p=3131

On 04/07/2014 15:35, Loic Dachary wrote:
 
 
 On 04/07/2014 15:25, Wido den Hollander wrote:
 On 07/04/2014 03:18 PM, Loic Dachary wrote:
 Hi,

 I extracted a disk with two partitions (journal and data) and copied its 
 content in the hope to restart the OSD and recover its content.

 mount /dev/sdb1 /mnt
 rsync -avH --numeric-ids /mnt/ /var/lib/ceph/osd/ceph-$(cat 
 /mnt/whoami)/

 I think you went wrong there, rsync man page:

 -a, --archive   archive mode; equals -rlptgoD (no -H,-A,-X)
 -X, --xattrs    preserve extended attributes

 So you didn't copy over the xattrs, so basically the data is lost/unusable.
 
 Thanks ! Fortunately the original disks are still available ;-)
 
 rm /var/lib/ceph/osd/ceph-$(cat /mnt/whoami)/journal
 dd if=/dev/sdb2 of=/var/lib/ceph/osd/ceph-$(cat /mnt/whoami)/journal

 and then

 start ceph-osd id=$(cat /mnt/whoami)

 It crashes on https://github.com/ceph/ceph/blob/v0.72.2/src/osd/PG.cc#L2182 
 and before it happens there is

 load_pgs ignoring unrecognized meta

 and the full debug osd = 20 logs are in http://paste.ubuntu.com/7746993/ 
 and this is

 root@bm4202:/etc/ceph# dpkg -l | grep ceph
 ii  ceph0.72.2-1trusty amd64  
 distributed storage and file system
 ii  ceph-common 0.72.2-1trusty amd64  
 common utilities to mount and interact with a ceph storage
 ii  python-ceph 0.72.2-1trusty amd64  
 Python libraries for the Ceph distributed filesystem
 root@bm4202:/etc/ceph# ceph --version
 ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60)

 Cheers



 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



 
 
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com