Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3, 2K IOPS

2014-09-23 Thread Kasper Dieter
On Thu, Sep 18, 2014 at 03:36:48PM +0200, Alexandre DERUMIER wrote:
> >>Has anyone ever tested multi-volume performance on a *FULL* SSD setup?
> 
> I know that Stefan Priebe runs full SSD clusters in production and has done 
> benchmarks. 
> (As far as I remember, he benched around 20K IOPS peak with Dumpling.)
> 
> >>We are able to get ~18K IOPS for 4K random reads on a single volume with fio 
> >>(with the rbd engine) on a 12x DC3700 setup, but only ~23K (peak) 
> >>IOPS even with multiple volumes.
> >>It seems the maximum random write performance we can get on the entire cluster 
> >>is quite close to single-volume performance.
> Firefly or Giant ?

It seems the maximum possible 4K seq-write IOPS you can get is around ~20K,
regardless of whether you have 2 or 400 OSDs, SAS or SSD, or 3 or 9
storage nodes.
The CPU is the limiting resource, because of the overhead in the code.
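
For reference, a minimal fio job using the rbd engine, as referenced above, is
enough to reproduce the per-volume numbers (pool, image and client names are
placeholders, not the setup described here):

    # 4K random writes against an existing RBD image via librbd
    fio --name=rbd-4k-randwrite --ioengine=rbd \
        --clientname=admin --pool=rbd --rbdname=testvol \
        --rw=randwrite --bs=4k --iodepth=32 --numjobs=1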

My I/O subsystem would be able to handle 2 million IOPS of 4K writes with replica=2:
. 9 storage nodes
. in total 18x Intel P3700 PCIe SSDs over NVMe (each 150K random-write IOPS at 4K)
. in total 357x SAS 2.5" drives via 18x LSI MegaRAID 2208
. 10 GbE to the 9 client nodes
. 56 Gb IB as cluster interconnect

There was an improvement between 0.80.x and 0.81,
but then the performance dropped again ...
(see attachment)

-Dieter


> 
> I'll do benchmarks with 6 DC3500 OSDs tomorrow to compare Firefly and Giant.
> 
> - Mail original -
> 
> De: "Jian Zhang" 
> À: "Sebastien Han" , "Alexandre DERUMIER" 
> 
> Cc: ceph-users@lists.ceph.com
> Envoyé: Jeudi 18 Septembre 2014 08:12:32
> Objet: RE: [ceph-users] [Single OSD performance on SSD] Can't go over 3, 2K 
> IOPS
> 
> Has anyone ever tested multi-volume performance on a *FULL* SSD setup?
> We are able to get ~18K IOPS for 4K random reads on a single volume with fio 
> (with the rbd engine) on a 12x DC3700 setup, but only ~23K (peak) 
> IOPS even with multiple volumes.
> It seems the maximum random write performance we can get on the entire cluster 
> is quite close to single-volume performance.
> 
> Thanks
> Jian
> 
> 
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
> Sebastien Han
> Sent: Tuesday, September 16, 2014 9:33 PM
> To: Alexandre DERUMIER
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3, 2K 
> IOPS
> 
> Hi,
> 
> Thanks for keeping us updated on this subject.
> dsync is definitely killing the ssd.
> 
> I don't have much to add; I'm just surprised that you're only getting 5299 
> with 0.85, since I've been able to get 6.4K. I was using the 200GB model, though, 
> which might explain this.
> 
> 
> On 12 Sep 2014, at 16:32, Alexandre DERUMIER  wrote:
> 
> > here the results for the intel s3500
> > 
> > max performance is with ceph 0.85 + optracker disabled.
> > the intel s3500 doesn't have the d_sync problem the crucial has
> >
> > %util show almost 100% for read and write, so maybe the ssd disk 
> > performance is the limit.
> >
> > I have some stec zeusram 8GB in stock (I used them for zfs zil), I'll try 
> > to bench them next week.
> >
> >
> >
> >
> >
> >
> > INTEL s3500
> > ---
> > raw disk
> > 
> >
> > randread: fio --filename=/dev/sdb --direct=1 --rw=randread --bs=4k
> > --iodepth=32 --group_reporting --invalidate=0 --name=abc
> > --ioengine=aio bw=288207KB/s, iops=72051
> >
> > Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await 
> > w_await svctm %util
> > sdb 0,00 0,00 73454,00 0,00 293816,00 0,00 8,00 30,96 0,42 0,42 0,00 0,01 
> > 99,90
> >
> > randwrite: fio --filename=/dev/sdb --direct=1 --rw=randwrite --bs=4k
> > --iodepth=32 --group_reporting --invalidate=0 --name=abc --ioengine=aio 
> > --sync=1 bw=48131KB/s, iops=12032
> > Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await 
> > w_await svctm %util
> > sdb 0,00 0,00 0,00 24120,00 0,00 48240,00 4,00 2,08 0,09 0,00 0,09 0,04 
> > 100,00
> >
> >
> > ceph 0.80
> > -
> > randread: no tuning: bw=24578KB/s, iops=6144
> >
> >
> > randwrite: bw=10358KB/s, iops=2589
> > Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await 
> > w_await svctm %util
> > sdb 0,00 373,00 0,00 8878,00 0,00 34012,50 7,66 1,63 0,18 0,00 0,18 0,06 
> > 50,90
> >
> >
> > ceph 0.85 :
> > -
> >
> > randread : bw=41406KB/s, iops=10351
> > Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await 
> > w_await svctm %util
> > sdb 2,00 0,00 10425,00 0,00 41816,00 0,00 8,02 1,36 0,13 0,13 0,00 0,07 
> > 75,90
> >
> > randwrite : bw=17204KB/s, iops=4301
> >
> > Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await 
> > w_await svctm %util
> > sdb 0,00 333,00 0,00 9788,00 0,00 57909,00 11,83 1,46 0,15 0,00 0,15 0,07 
> > 67,80
> >
> >
> > ceph 0.85 tuning op_tracker=false
> > 
> >
> > randread : bw=86537KB/s, iops=21634
> > Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-

Re: [ceph-users] [Ceph-community] Pgs are in stale+down+peering state

2014-09-23 Thread Sahana Lokeshappa
Hi All,

Can anyone help me out here?

Sahana Lokeshappa
Test Development Engineer I


From: Varada Kari
Sent: Monday, September 22, 2014 11:52 PM
To: Sage Weil; Sahana Lokeshappa; ceph-us...@ceph.com; 
ceph-commun...@lists.ceph.com
Subject: RE: [Ceph-community] Pgs are in stale+down+peering state

Hi Sage,

To give more context on this problem,

This cluster has two pools rbd and user-created.

osd.12 is the primary for some other PGs as well, but the problem happens only for these 
three PGs.

$ sudo ceph osd lspools
0 rbd,2 pool1,

$ sudo ceph -s
cluster 99ffc4a5-2811-4547-bd65-34c7d4c58758
 health HEALTH_WARN 3 pgs down; 3 pgs peering; 3 pgs stale; 3 pgs stuck 
inactive; 3 pgs stuck stale; 3 pgs stuck unclean; 1 requests are blocked > 32 
sec
monmap e1: 3 mons at 
{rack2-ram-1=10.242.42.180:6789/0,rack2-ram-2=10.242.42.184:6789/0,rack2-ram-3=10.242.42.188:6789/0},
 election epoch 2008, quorum 0,1,2 rack2-ram-1,rack2-ram-2,rack2-ram-3
 osdmap e17842: 64 osds: 64 up, 64 in
  pgmap v79729: 2148 pgs, 2 pools, 4135 GB data, 1033 kobjects
12504 GB used, 10971 GB / 23476 GB avail
2145 active+clean
   3 stale+down+peering

Snippet from pg dump:

2.a9518 0   0   0   0   2172649472  30013001
active+clean2014-09-22 17:49:35.357586  6826'35762  17842:72706 
[12,7,28]   12  [12,7,28]   12   6826'35762  2014-09-22 
11:33:55.985449  0'0 2014-09-16 20:11:32.693864
0.590   0   0   0   0   0   0   0   
active+clean2014-09-22 17:50:00.751218  0'0 17842:4472  
[12,41,2]   12  [12,41,2]   12  0'0 2014-09-22 16:47:09.315499  
 0'0 2014-09-16 12:20:48.618726
0.4d0   0   0   0   0   0   4   4   
stale+down+peering  2014-09-18 17:51:10.038247  186'4   11134:498   
[12,56,27]  12  [12,56,27]  12  186'42014-09-18 17:30:32.393188 
 0'0 2014-09-16 12:20:48.615322
0.490   0   0   0   0   0   0   0   
stale+down+peering  2014-09-18 17:44:52.681513  0'0 11134:498   
[12,6,25]   12  [12,6,25]   12  0'0  2014-09-18 17:16:12.986658 
 0'0 2014-09-16 12:20:48.614192
0.1c0   0   0   0   0   0   12  12  
stale+down+peering  2014-09-18 17:51:16.735549  186'12  11134:522   
[12,25,23]  12  [12,25,23]  12  186'12   2014-09-18 17:16:04.457863 
 186'10  2014-09-16 14:23:58.731465
2.17510 0   0   0   0   2139095040  30013001
active+clean2014-09-22 17:52:20.364754  6784'30742  17842:72033 
[12,27,23]  12  [12,27,23]  12   6784'30742  2014-09-22 
00:19:39.905291  0'0 2014-09-16 20:11:17.016299
2.7e8   508 0   0   0   0   2130706432  34333433
active+clean2014-09-22 17:52:20.365083  6702'21132  17842:64769 
[12,25,23]  12  [12,25,23]  12   6702'21132  2014-09-22 
17:01:20.546126  0'0 2014-09-16 14:42:32.079187
2.6a5   528 0   0   0   0   2214592512  28402840
active+clean2014-09-22 22:50:38.092084  6775'34416  17842:83221 
[12,58,0]   12  [12,58,0]   12   6775'34416  2014-09-22 
22:50:38.091989  0'0 2014-09-16 20:11:32.703368

And we couldn’t observe any peering events happening on the primary OSD.

$ sudo ceph pg 0.49 query
Error ENOENT: i don't have pgid 0.49
$ sudo ceph pg 0.4d query
Error ENOENT: i don't have pgid 0.4d
$ sudo ceph pg 0.1c query
Error ENOENT: i don't have pgid 0.1c

We are not able to explain why the peering is stuck. BTW, the rbd pool doesn’t contain 
any data.

Varada

From: Ceph-community [mailto:ceph-community-boun...@lists.ceph.com] On Behalf 
Of Sage Weil
Sent: Monday, September 22, 2014 10:44 PM
To: Sahana Lokeshappa; 
ceph-users@lists.ceph.com; 
ceph-us...@ceph.com; 
ceph-commun...@lists.ceph.com
Subject: Re: [Ceph-community] Pgs are in stale+down+peering state


Stale means that the primary OSD for the PG went down and the status hasn't been 
updated since.  They all seem to be from osd.12... It seems like something is 
preventing that OSD from reporting to the mon?

sage

On September 22, 2014 7:51:48 AM EDT, Sahana Lokeshappa 
mailto:sahana.lokesha...@sandisk.com>> wrote:
Hi all,


I used the ‘ceph osd thrash’ command, and after all OSDs are up and in, 3 
pgs are in the stale+down+peering state.


sudo ceph -s
cluster 99ffc4a5-2811-4547-bd65-34c7d4c58758
 health HEALTH_WARN 3 pgs down; 3 pgs peering; 3 pgs stale; 3 pgs stuck 
inactive; 3 pgs stuck stale; 3 pgs stuck unclean
 monmap e1: 3 mons at 
{rack2-ram-1=10.242.42.180:6789/0,rack2-ram-2=10.242.42.184:6789/0,rack2-ram-3=10.242.42.188:6789/0},
 election epoch 2

Re: [ceph-users] Resetting RGW Federated replication

2014-09-23 Thread Yehuda Sadeh
On Tue, Sep 23, 2014 at 4:54 PM, Craig Lewis  wrote:
> I've had some issues in my secondary cluster.  I'd like to restart
> replication from the beginning, without destroying the data in the secondary
> cluster.
>
> Reading the radosgw-agent and Admin REST API code, I believe I just need to
> stop replication, delete the secondary zone's log_pool, recreate the
> log_pool, and restart replication.
>
> Anybody have any thoughts?  I'm still setting up some VMs to test this,
> before I try it in production.
>
>
>
> Background:
> I'm on Emperor (yeah, still need to upgrade).  I believe I ran into
> http://tracker.ceph.com/issues/7595 .  My read of that patch is that it
> prevents the problem from occurring, but doesn't correct corrupt data.  I
> tried applying some of the suggested patches, but they only ignored the
> error, rather than correcting it.  I finally dropped the corrupt pool.  That
> allowed the stock Emperor binaries to run without crashing.  The pool I
> dropped was my secondary zone's log_pool.
>
> Before I dropped the pool, I copied all of the objects to local disk.  After
> re-creating the pool, I uploaded the objects.
>
> Now replication is kind of working, but not correctly.  I have a number of
> buckets that are being written to in the primary cluster, but no replication
> is occurring.  radosgw-agent says a number of shards have >= 1000 log
> entries, but then it never processes the buckets in those shards.
>
> Looking back at the pool's contents on local disk, all of the files are 0
> bytes.  So I'm assuming all of the important state was stored in the
> object's metadata.
>
> I'd like to completely zero out the replication state, then exploit a
> feature in radosgw-agent 1.1 that will only replicate the first 1000 objects
> in buckets, if the bucket isn't being actively written to.  Then I can
> restart radosgw-agent 1.2, and let it catch up the active buckets.  That'll
> save me many weeks and TB of replication.
>
> Obviously, I'll compare bucket listings between the two clusters when I'm
> done.  I'll probably try to catch up the read-only bucket's state at a later
> date.
>

I don't really understand what happened here. Maybe start with trying
to understand why the sync agent isn't replicating anymore? Maybe the
replicalog markers are off?


Yehuda
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Merging two active ceph clusters: suggestions needed

2014-09-23 Thread Yehuda Sadeh
On Tue, Sep 23, 2014 at 7:23 PM, Robin H. Johnson  wrote:
> On Tue, Sep 23, 2014 at 03:12:53PM -0600, John Nielsen wrote:
>> Keep Cluster A intact and migrate it to your new hardware. You can do
>> this with no downtime, assuming you have enough IOPS to support data
>> migration and normal usage simultaneously. Bring up the new OSDs and
>> let everything rebalance, then remove the old OSDs one at a time.
>> Replace the MONs one at a time. Since you will have the same data on
>> the same cluster (but different hardware), you don't need to worry
>> about mtimes or handling RBD or S3 data at all.
> The B side already has data however, and that's one of the merge
> problems (see below re S3).
>
>> Make sure you have top-level ceph credentials on the new cluster that
>> will work for current users of Cluster B.
>>
>> Use a librbd-aware tool to migrate the RBD volumes from Cluster B onto
>> the new Cluster A. qemu-img comes to mind. This would require downtime
>> for each volume, but not necessarily all at the same time.
> Thanks, qemu-img didn't come to mind as an RBD migration tool.
>
>> Migrate your S3 user accounts from Cluster B to the new Cluster A
>> (should be easily scriptable with e.g. JSON output from
>> radosgw-admin).
> It's fixed now, but didn't used to be possible to create all the various
> keys.
>
>> Check for and resolve S3 bucket name conflicts between Cluster A and
>> ClusterB.
> None.
>
>> Migrate your S3 data from Cluster B to the new Cluster A using an
>> S3-level tool. s3cmd comes to mind.
> s3cmd does not preserve mtimes, ACLs or CORS data; that's the largest
> part of the concern.

You need to setup a second rgw zone, and use the radosgw sync agent to
sync data to the secondary zone. That will preserve mtimes and ACLs.
Once that's complete you could then turn the secondary zone into your
primary.
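
A minimal sketch of that setup, assuming the standard federated-zone sync agent
configuration (endpoint URLs, keys and file names below are placeholders, and
the exact option set can differ by radosgw-agent version; check
radosgw-agent --help):

    # region-data-sync.conf for radosgw-agent (placeholder values)
    src_access_key: SRC_ACCESS_KEY
    src_secret_key: SRC_SECRET_KEY
    destination: http://secondary-rgw.example.com:80
    dest_access_key: DEST_ACCESS_KEY
    dest_secret_key: DEST_SECRET_KEY
    log_file: /var/log/radosgw/radosgw-sync.log

    # run the agent against that config
    radosgw-agent -c region-data-sync.conf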

Yehuda
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Can ceph-deploy be used with 'osd objectstore = keyvaluestore-dev' in config file ?

2014-09-23 Thread Mark Kirkwood

On 24/09/14 16:21, Aegeaner wrote:

I have got my ceph OSDs running with keyvalue store now!

Thanks, Mark! I have been confused for a whole week.



Pleased to hear it! Now you can actually start playing with the key/value 
store backend.


There are quite a few parameters, not fully documented yet - see 
src/common/config_opts.h for some hints.
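
For anyone following along, the option named in the thread subject is all that 
is needed in ceph.conf to select the experimental backend; the keyvaluestore_* 
tunables mentioned above are the ones to look up in config_opts.h:

    [osd]
    osd objectstore = keyvaluestore-dev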


regards

Mark

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Can ceph-deploy be used with 'osd objectstore = keyvaluestore-dev' in config file ?

2014-09-23 Thread Aegeaner

I have got my ceph OSDs running with keyvalue store now!

Thanks, Mark! I have been confused for a whole week.


Cheers
Aegeaner


在 2014-09-24 10:46, Mark Kirkwood 写道:

On 24/09/14 14:29, Aegeaner wrote:

I run ceph on Red Hat Enterprise Linux Server 6.4 (Santiago), and when I
run "service ceph start" I got:

# service ceph start

ERROR:ceph-disk:Failed to activate
ceph-disk: Does not look like a Ceph OSD, or incompatible version:
/var/lib/ceph/tmp/mnt.I71N5T
mount: /dev/hioa1 already mounted or /var/lib/ceph/tmp/mnt.02sVHj 
busy

ceph-disk: Mounting filesystem failed: Command '['/bin/mount', '-t',
'xfs', '-o', 'noatime', '--',
'/dev/disk/by-parttypeuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff05d.6d726c93-41f9-453d-858e-ab4132b5c8fd',
'/var/lib/ceph/tmp/mnt.02sVHj']' returned non-zero exit status 32
ceph-disk: Error: One or more partitions failed to activate

Someone told me "service ceph start" still tries to call ceph-disk, which
will create a filestore-type OSD and a journal partition; is that
true?

ls -l /dev/disk/by-parttypeuuid/

lrwxrwxrwx. 1 root root 11 9月  23 16:56
45b0969e-9b03-4f30-b4c6-b4b80ceff106.00dbee5e-fb68-47c4-aa58-924c904c4383
-> ../../hioa2
lrwxrwxrwx. 1 root root 10 9月  23 17:02
45b0969e-9b03-4f30-b4c6-b4b80ceff106.c30e5b97-b914-4eb8-8306-a9649e1c20ba
-> ../../sdb2
lrwxrwxrwx. 1 root root 11 9月  23 16:56
4fbd7e29-9d25-41b8-afd0-062c0ceff05d.6d726c93-41f9-453d-858e-ab4132b5c8fd
-> ../../hioa1
lrwxrwxrwx. 1 root root 10 9月  23 17:02
4fbd7e29-9d25-41b8-afd0-062c0ceff05d.b56ec699-e134-4b90-8f55-4952453e1b7e
-> ../../sdb1
lrwxrwxrwx. 1 root root 11 9月  23 16:52
89c57f98-2fe5-4dc0-89c1-f3ad0ceff2be.6d726c93-41f9-453d-858e-ab4132b5c8fd
-> ../../hioa1

There seem to be two hioa1 partitions there; maybe they remained from the last
time I created the OSD using ceph-deploy osd prepare?



Crap - it is fighting you, yes - looks like the startup script has 
tried to build an osd for you using ceph-disk (which will make two 
partitions by default). So that's toasted the setup that your script did.


Growl - that's made it more complicated for sure.

If you re-run your script you'll blast away the damage that 'service' 
did :-) , and take a look at /etc/init.d/ceph to see why it ignored 
your osd.0 arg (I'm not sure what it expects - maybe just 'osd'). 
Anyway experiment.


You can always start the osd with:

$ sudo ceph-osd -i 0

which bypasses the whole system startup confusion completely :-)

Cheers

Mark




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Can ceph-deploy be used with 'osd objectstore = keyvaluestore-dev' in config file ?

2014-09-23 Thread Mark Kirkwood

On 24/09/14 14:29, Aegeaner wrote:

I run ceph on Red Hat Enterprise Linux Server 6.4 Santiago, and when I
run "service ceph start" i got:

# service ceph start

ERROR:ceph-disk:Failed to activate
ceph-disk: Does not look like a Ceph OSD, or incompatible version:
/var/lib/ceph/tmp/mnt.I71N5T
mount: /dev/hioa1 already mounted or /var/lib/ceph/tmp/mnt.02sVHj busy
ceph-disk: Mounting filesystem failed: Command '['/bin/mount', '-t',
'xfs', '-o', 'noatime', '--',

'/dev/disk/by-parttypeuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff05d.6d726c93-41f9-453d-858e-ab4132b5c8fd',
'/var/lib/ceph/tmp/mnt.02sVHj']' returned non-zero exit status 32
ceph-disk: Error: One or more partitions failed to activate

Someone told me "service ceph start" still tries to call ceph-disk which
will create a filestore type OSD, and create a journal partition, is it
true?

ls -l /dev/disk/by-parttypeuuid/

lrwxrwxrwx. 1 root root 11 9月  23 16:56
45b0969e-9b03-4f30-b4c6-b4b80ceff106.00dbee5e-fb68-47c4-aa58-924c904c4383
-> ../../hioa2
lrwxrwxrwx. 1 root root 10 9月  23 17:02
45b0969e-9b03-4f30-b4c6-b4b80ceff106.c30e5b97-b914-4eb8-8306-a9649e1c20ba
-> ../../sdb2
lrwxrwxrwx. 1 root root 11 9月  23 16:56
4fbd7e29-9d25-41b8-afd0-062c0ceff05d.6d726c93-41f9-453d-858e-ab4132b5c8fd
-> ../../hioa1
lrwxrwxrwx. 1 root root 10 9月  23 17:02
4fbd7e29-9d25-41b8-afd0-062c0ceff05d.b56ec699-e134-4b90-8f55-4952453e1b7e
-> ../../sdb1
lrwxrwxrwx. 1 root root 11 9月  23 16:52
89c57f98-2fe5-4dc0-89c1-f3ad0ceff2be.6d726c93-41f9-453d-858e-ab4132b5c8fd
-> ../../hioa1

There seems to be two hioa1 partitions there, maybe remained from last
time I create the OSD using ceph-deploy osd prepare?



Crap - it is fighting you, yes - looks like the startup script has tried 
to build an osd for you using ceph-disk (which will make two partitions 
by default). So that's toasted the setup that your script did.


Growl - that's made it more complicated for sure.

If you re-run your script you'll blast away the damage that 'service' 
did :-) , and take a look at /etc/init.d/ceph to see why it ignored your 
osd.0 arg (I'm not sure what it expects - maybe just 'osd'). Anyway 
experiment.


You can always start the osd with:

$ sudo ceph-osd -i 0

which bypasses the whole system startup confusion completely :-)

Cheers

Mark


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Can ceph-deploy be used with 'osd objectstore = keyvaluestore-dev' in config file ?

2014-09-23 Thread Aegeaner
After a reboot all the redundant partitions were gone, but after running 
the script I still got:


ERROR:ceph-disk:Failed to activate
ceph-disk: Does not look like a Ceph OSD, or incompatible version: 
/var/lib/ceph/tmp/mnt.SFvU7O

ceph-disk: Error: One or more partitions failed to activate


在 2014-09-24 10:29, Aegeaner 写道:
I run ceph on Red Hat Enterprise Linux Server 6.4 Santiago, and when I 
run "service ceph start" i got:


# service ceph start

ERROR:ceph-disk:Failed to activate
ceph-disk: Does not look like a Ceph OSD, or incompatible version:
/var/lib/ceph/tmp/mnt.I71N5T
mount: /dev/hioa1 already mounted or /var/lib/ceph/tmp/mnt.02sVHj busy
ceph-disk: Mounting filesystem failed: Command '['/bin/mount',
'-t', 'xfs', '-o', 'noatime', '--',

'/dev/disk/by-parttypeuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff05d.6d726c93-41f9-453d-858e-ab4132b5c8fd',
'/var/lib/ceph/tmp/mnt.02sVHj']' returned non-zero exit status 32
ceph-disk: Error: One or more partitions failed to activate

Someone told me "service ceph start" still tries to call ceph-disk 
which will create a filestore type OSD, and create a journal 
partition, is it true?


ls -l /dev/disk/by-parttypeuuid/

lrwxrwxrwx. 1 root root 11 9月  23 16:56
45b0969e-9b03-4f30-b4c6-b4b80ceff106.00dbee5e-fb68-47c4-aa58-924c904c4383
-> ../../hioa2
lrwxrwxrwx. 1 root root 10 9月  23 17:02
45b0969e-9b03-4f30-b4c6-b4b80ceff106.c30e5b97-b914-4eb8-8306-a9649e1c20ba
-> ../../sdb2
lrwxrwxrwx. 1 root root 11 9月  23 16:56
4fbd7e29-9d25-41b8-afd0-062c0ceff05d.6d726c93-41f9-453d-858e-ab4132b5c8fd
-> ../../hioa1
lrwxrwxrwx. 1 root root 10 9月  23 17:02
4fbd7e29-9d25-41b8-afd0-062c0ceff05d.b56ec699-e134-4b90-8f55-4952453e1b7e
-> ../../sdb1
lrwxrwxrwx. 1 root root 11 9月  23 16:52
89c57f98-2fe5-4dc0-89c1-f3ad0ceff2be.6d726c93-41f9-453d-858e-ab4132b5c8fd
-> ../../hioa1

There seems to be two hioa1 partitions there, maybe remained from last 
time I create the OSD using ceph-deploy osd prepare?




在 2014-09-24 10:19, Mark Kirkwood 写道:

On 24/09/14 14:07, Aegeaner wrote:

I turned on the debug option, and this is what I got:

# ./kv.sh

removed osd.0
removed item id 0 name 'osd.0' from crush map
0
umount: /var/lib/ceph/osd/ceph-0: not found
updated
add item id 0 name 'osd.0' weight 1 at location
{host=CVM-0-11,root=default} to crush map
meta-data=/dev/hioa  isize=256agcount=4,
agsize=24506368 blks
  =   sectsz=512   attr=2, 
projid32bit=0

data =   bsize=4096 blocks=98025472,
imaxpct=25
  =   sunit=0  swidth=0 blks
naming   =version 2  bsize=4096   ascii-ci=0
log  =internal log   bsize=4096 blocks=47864, version=2
  =   sectsz=512   sunit=0 blks,
lazy-count=1
realtime =none   extsz=4096   blocks=0, rtextents=0
2014-09-24 10:02:21.049162 7fe4cf3aa7a0  0 ceph version 0.80.5
(38b73c67d375a2552d8ed67843c8a65c2c0feba6), process ceph-osd, 
pid 10252

2014-09-24 10:02:21.055433 7fe4cf3aa7a0  1 mkfs in
/var/lib/ceph/osd/ceph-0
2014-09-24 10:02:21.056359 7fe4cf3aa7a0  1 mkfs generated fsid
d613a61d-a1b4-4180-aea2-552944a2f0dc
2014-09-24 10:02:21.061349 7fe4cf3aa7a0  1 keyvaluestore backend
exists/created
2014-09-24 10:02:21.061377 7fe4cf3aa7a0  1 mkfs done in
/var/lib/ceph/osd/ceph-0
2014-09-24 10:02:21.065679 7fe4cf3aa7a0 -1 created object store
/var/lib/ceph/osd/ceph-0 journal /var/lib/ceph/osd/ceph-0/journal
for osd.0 fsid d90272ca-d8cc-41eb-b525-2cffe734aec0
2014-09-24 10:02:21.065776 7fe4cf3aa7a0 -1 auth: error reading 
file:

/var/lib/ceph/osd/ceph-0/keyring: can't open
/var/lib/ceph/osd/ceph-0/keyring: (2) No such file or directory
2014-09-24 10:02:21.065889 7fe4cf3aa7a0 -1 created new key in
keyring /var/lib/ceph/osd/ceph-0/keyring
added key for osd.0

# ceph osd tree

# idweighttype nameup/downreweight
-11root default
-21host CVM-0-11
01osd.0down0

Also I updated my simple script to create the OSD:

ceph osd rm 0
ceph osd crush rm osd.0
ceph osd create
umount /var/lib/ceph/osd/ceph-0
rm -rf /var/lib/ceph/osd/ceph-0
rm -rf /var/lib/ceph/osd/ceph-0
mkdir /var/lib/ceph/osd/ceph-0
ceph auth del osd.0
ceph osd crush add osd.0 1 root=default host=CVM-0-11
mkfs -t xfs -f /dev/hioa
mount  /dev/hioa /var/lib/ceph/osd/ceph-0
ceph-osd --id 0 -d --mkkey --mkfs --osd-data 
/var/lib/ceph/osd/ceph-0

ceph auth add osd.0 osd 'allow *' mon 'allow profile osd' -i
/var/lib/ceph/osd/ceph-0/keyring
/etc/init.d/ceph start osd.0



From where your log stops, it would appear that your system start 
script is not even trying to get osd.0 up at all.


Can we see an ls -l of

Re: [ceph-users] Can ceph-deploy be used with 'osd objectstore = keyvaluestore-dev' in config file ?

2014-09-23 Thread Aegeaner
I run ceph on Red Hat Enterprise Linux Server 6.4 (Santiago), and when I 
run "service ceph start" I got:


# service ceph start

   ERROR:ceph-disk:Failed to activate
   ceph-disk: Does not look like a Ceph OSD, or incompatible version:
   /var/lib/ceph/tmp/mnt.I71N5T
   mount: /dev/hioa1 already mounted or /var/lib/ceph/tmp/mnt.02sVHj busy
   ceph-disk: Mounting filesystem failed: Command '['/bin/mount', '-t',
   'xfs', '-o', 'noatime', '--',
   
'/dev/disk/by-parttypeuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff05d.6d726c93-41f9-453d-858e-ab4132b5c8fd',
   '/var/lib/ceph/tmp/mnt.02sVHj']' returned non-zero exit status 32
   ceph-disk: Error: One or more partitions failed to activate

Someone told me "service ceph start" still tries to call ceph-disk, which 
will create a filestore-type OSD and a journal partition; is that 
true?


ls -l /dev/disk/by-parttypeuuid/

   lrwxrwxrwx. 1 root root 11 9月  23 16:56
   45b0969e-9b03-4f30-b4c6-b4b80ceff106.00dbee5e-fb68-47c4-aa58-924c904c4383
   -> ../../hioa2
   lrwxrwxrwx. 1 root root 10 9月  23 17:02
   45b0969e-9b03-4f30-b4c6-b4b80ceff106.c30e5b97-b914-4eb8-8306-a9649e1c20ba
   -> ../../sdb2
   lrwxrwxrwx. 1 root root 11 9月  23 16:56
   4fbd7e29-9d25-41b8-afd0-062c0ceff05d.6d726c93-41f9-453d-858e-ab4132b5c8fd
   -> ../../hioa1
   lrwxrwxrwx. 1 root root 10 9月  23 17:02
   4fbd7e29-9d25-41b8-afd0-062c0ceff05d.b56ec699-e134-4b90-8f55-4952453e1b7e
   -> ../../sdb1
   lrwxrwxrwx. 1 root root 11 9月  23 16:52
   89c57f98-2fe5-4dc0-89c1-f3ad0ceff2be.6d726c93-41f9-453d-858e-ab4132b5c8fd
   -> ../../hioa1

There seem to be two hioa1 partitions there; maybe they remained from the last 
time I created the OSD using ceph-deploy osd prepare?




在 2014-09-24 10:19, Mark Kirkwood 写道:

On 24/09/14 14:07, Aegeaner wrote:

I turned on the debug option, and this is what I got:

# ./kv.sh

removed osd.0
removed item id 0 name 'osd.0' from crush map
0
umount: /var/lib/ceph/osd/ceph-0: not found
updated
add item id 0 name 'osd.0' weight 1 at location
{host=CVM-0-11,root=default} to crush map
meta-data=/dev/hioa  isize=256agcount=4,
agsize=24506368 blks
  =   sectsz=512   attr=2, projid32bit=0
data =   bsize=4096 blocks=98025472,
imaxpct=25
  =   sunit=0  swidth=0 blks
naming   =version 2  bsize=4096   ascii-ci=0
log  =internal log   bsize=4096   blocks=47864, 
version=2

  =   sectsz=512   sunit=0 blks,
lazy-count=1
realtime =none   extsz=4096   blocks=0, rtextents=0
2014-09-24 10:02:21.049162 7fe4cf3aa7a0  0 ceph version 0.80.5
(38b73c67d375a2552d8ed67843c8a65c2c0feba6), process ceph-osd, pid 
10252

2014-09-24 10:02:21.055433 7fe4cf3aa7a0  1 mkfs in
/var/lib/ceph/osd/ceph-0
2014-09-24 10:02:21.056359 7fe4cf3aa7a0  1 mkfs generated fsid
d613a61d-a1b4-4180-aea2-552944a2f0dc
2014-09-24 10:02:21.061349 7fe4cf3aa7a0  1 keyvaluestore backend
exists/created
2014-09-24 10:02:21.061377 7fe4cf3aa7a0  1 mkfs done in
/var/lib/ceph/osd/ceph-0
2014-09-24 10:02:21.065679 7fe4cf3aa7a0 -1 created object store
/var/lib/ceph/osd/ceph-0 journal /var/lib/ceph/osd/ceph-0/journal
for osd.0 fsid d90272ca-d8cc-41eb-b525-2cffe734aec0
2014-09-24 10:02:21.065776 7fe4cf3aa7a0 -1 auth: error reading file:
/var/lib/ceph/osd/ceph-0/keyring: can't open
/var/lib/ceph/osd/ceph-0/keyring: (2) No such file or directory
2014-09-24 10:02:21.065889 7fe4cf3aa7a0 -1 created new key in
keyring /var/lib/ceph/osd/ceph-0/keyring
added key for osd.0

# ceph osd tree

# idweighttype nameup/downreweight
-11root default
-21host CVM-0-11
01osd.0down0

Also I updated my simple script to create the OSD:

ceph osd rm 0
ceph osd crush rm osd.0
ceph osd create
umount /var/lib/ceph/osd/ceph-0
rm -rf /var/lib/ceph/osd/ceph-0
rm -rf /var/lib/ceph/osd/ceph-0
mkdir /var/lib/ceph/osd/ceph-0
ceph auth del osd.0
ceph osd crush add osd.0 1 root=default host=CVM-0-11
mkfs -t xfs -f /dev/hioa
mount  /dev/hioa /var/lib/ceph/osd/ceph-0
ceph-osd --id 0 -d --mkkey --mkfs --osd-data 
/var/lib/ceph/osd/ceph-0

ceph auth add osd.0 osd 'allow *' mon 'allow profile osd' -i
/var/lib/ceph/osd/ceph-0/keyring
/etc/init.d/ceph start osd.0



From where your log stops, it would appear that your system start 
script is not even trying to get osd.0 up at all.


Can we see an ls -l of /var/lib/ceph/osd/ceph-0?

Also what os are you on? You might need to invoke via:

$ service ceph start

or similar.

Cheers

Mark





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Merging two active ceph clusters: suggestions needed

2014-09-23 Thread Robin H. Johnson
On Tue, Sep 23, 2014 at 03:12:53PM -0600, John Nielsen wrote:
> Keep Cluster A intact and migrate it to your new hardware. You can do
> this with no downtime, assuming you have enough IOPS to support data
> migration and normal usage simultaneously. Bring up the new OSDs and
> let everything rebalance, then remove the old OSDs one at a time.
> Replace the MONs one at a time. Since you will have the same data on
> the same cluster (but different hardware), you don't need to worry
> about mtimes or handling RBD or S3 data at all.
The B side already has data however, and that's one of the merge
problems (see below re S3).

> Make sure you have top-level ceph credentials on the new cluster that
> will work for current users of Cluster B.
> 
> Use a librbd-aware tool to migrate the RBD volumes from Cluster B onto
> the new Cluster A. qemu-img comes to mind. This would require downtime
> for each volume, but not necessarily all at the same time.
Thanks, qemu-img didn't come to mind as an RBD migration tool.

> Migrate your S3 user accounts from Cluster B to the new Cluster A
> (should be easily scriptable with e.g. JSON output from
> radosgw-admin).
It's fixed now, but didn't used to be possible to create all the various
keys.

> Check for and resolve S3 bucket name conflicts between Cluster A and
> ClusterB.
None.

> Migrate your S3 data from Cluster B to the new Cluster A using an
> S3-level tool. s3cmd comes to mind.
s3cmd does not preserve mtimes, ACLs or CORS data; that's the largest
part of the concern.

-- 
Robin Hugh Johnson
Gentoo Linux: Developer, Infrastructure Lead
E-Mail : robb...@gentoo.org
GnuPG FP   : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Can ceph-deploy be used with 'osd objectstore = keyvaluestore-dev' in config file ?

2014-09-23 Thread Mark Kirkwood

On 24/09/14 14:07, Aegeaner wrote:

I turned on the debug option, and this is what I got:

# ./kv.sh

removed osd.0
removed item id 0 name 'osd.0' from crush map
0
umount: /var/lib/ceph/osd/ceph-0: not found
updated
add item id 0 name 'osd.0' weight 1 at location
{host=CVM-0-11,root=default} to crush map
meta-data=/dev/hioa  isize=256agcount=4,
agsize=24506368 blks
  =   sectsz=512   attr=2, projid32bit=0
data =   bsize=4096   blocks=98025472,
imaxpct=25
  =   sunit=0  swidth=0 blks
naming   =version 2  bsize=4096   ascii-ci=0
log  =internal log   bsize=4096   blocks=47864, version=2
  =   sectsz=512   sunit=0 blks,
lazy-count=1
realtime =none   extsz=4096   blocks=0, rtextents=0
2014-09-24 10:02:21.049162 7fe4cf3aa7a0  0 ceph version 0.80.5
(38b73c67d375a2552d8ed67843c8a65c2c0feba6), process ceph-osd, pid 10252
2014-09-24 10:02:21.055433 7fe4cf3aa7a0  1 mkfs in
/var/lib/ceph/osd/ceph-0
2014-09-24 10:02:21.056359 7fe4cf3aa7a0  1 mkfs generated fsid
d613a61d-a1b4-4180-aea2-552944a2f0dc
2014-09-24 10:02:21.061349 7fe4cf3aa7a0  1 keyvaluestore backend
exists/created
2014-09-24 10:02:21.061377 7fe4cf3aa7a0  1 mkfs done in
/var/lib/ceph/osd/ceph-0
2014-09-24 10:02:21.065679 7fe4cf3aa7a0 -1 created object store
/var/lib/ceph/osd/ceph-0 journal /var/lib/ceph/osd/ceph-0/journal
for osd.0 fsid d90272ca-d8cc-41eb-b525-2cffe734aec0
2014-09-24 10:02:21.065776 7fe4cf3aa7a0 -1 auth: error reading file:
/var/lib/ceph/osd/ceph-0/keyring: can't open
/var/lib/ceph/osd/ceph-0/keyring: (2) No such file or directory
2014-09-24 10:02:21.065889 7fe4cf3aa7a0 -1 created new key in
keyring /var/lib/ceph/osd/ceph-0/keyring
added key for osd.0

# ceph osd tree

# idweighttype nameup/downreweight
-11root default
-21host CVM-0-11
01osd.0down0

Also I updated my simple script to create the OSD:

ceph osd rm 0
ceph osd crush rm osd.0
ceph osd create
umount /var/lib/ceph/osd/ceph-0
rm -rf /var/lib/ceph/osd/ceph-0
rm -rf /var/lib/ceph/osd/ceph-0
mkdir /var/lib/ceph/osd/ceph-0
ceph auth del osd.0
ceph osd crush add osd.0 1 root=default host=CVM-0-11
mkfs -t xfs -f /dev/hioa
mount  /dev/hioa /var/lib/ceph/osd/ceph-0
ceph-osd --id 0 -d --mkkey --mkfs --osd-data /var/lib/ceph/osd/ceph-0
ceph auth add osd.0 osd 'allow *' mon 'allow profile osd' -i
/var/lib/ceph/osd/ceph-0/keyring
/etc/init.d/ceph start osd.0



From where your log stops, it would appear that your system start 
script is not even trying to get osd.0 up at all.


Can we see an ls -l of /var/lib/ceph/osd/ceph-0?

Also what os are you on? You might need to invoke via:

$ service ceph start

or similar.

Cheers

Mark



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Can ceph-deploy be used with 'osd objectstore = keyvaluestore-dev' in config file ?

2014-09-23 Thread Aegeaner

I turned on the debug option, and this is what I got:

# ./kv.sh

   removed osd.0
   removed item id 0 name 'osd.0' from crush map
   0
   umount: /var/lib/ceph/osd/ceph-0: not found
   updated
   add item id 0 name 'osd.0' weight 1 at location
   {host=CVM-0-11,root=default} to crush map
   meta-data=/dev/hioa  isize=256agcount=4,
   agsize=24506368 blks
 =   sectsz=512   attr=2, projid32bit=0
   data =   bsize=4096   blocks=98025472,
   imaxpct=25
 =   sunit=0  swidth=0 blks
   naming   =version 2  bsize=4096   ascii-ci=0
   log  =internal log   bsize=4096   blocks=47864, version=2
 =   sectsz=512   sunit=0 blks,
   lazy-count=1
   realtime =none   extsz=4096   blocks=0, rtextents=0
   2014-09-24 10:02:21.049162 7fe4cf3aa7a0  0 ceph version 0.80.5
   (38b73c67d375a2552d8ed67843c8a65c2c0feba6), process ceph-osd, pid 10252
   2014-09-24 10:02:21.055433 7fe4cf3aa7a0  1 mkfs in
   /var/lib/ceph/osd/ceph-0
   2014-09-24 10:02:21.056359 7fe4cf3aa7a0  1 mkfs generated fsid
   d613a61d-a1b4-4180-aea2-552944a2f0dc
   2014-09-24 10:02:21.061349 7fe4cf3aa7a0  1 keyvaluestore backend
   exists/created
   2014-09-24 10:02:21.061377 7fe4cf3aa7a0  1 mkfs done in
   /var/lib/ceph/osd/ceph-0
   2014-09-24 10:02:21.065679 7fe4cf3aa7a0 -1 created object store
   /var/lib/ceph/osd/ceph-0 journal /var/lib/ceph/osd/ceph-0/journal
   for osd.0 fsid d90272ca-d8cc-41eb-b525-2cffe734aec0
   2014-09-24 10:02:21.065776 7fe4cf3aa7a0 -1 auth: error reading file:
   /var/lib/ceph/osd/ceph-0/keyring: can't open
   /var/lib/ceph/osd/ceph-0/keyring: (2) No such file or directory
   2014-09-24 10:02:21.065889 7fe4cf3aa7a0 -1 created new key in
   keyring /var/lib/ceph/osd/ceph-0/keyring
   added key for osd.0

# ceph osd tree

   # idweighttype nameup/downreweight
   -11root default
   -21host CVM-0-11
   01osd.0down0

Also I updated my simple script to create the OSD:

   ceph osd rm 0
   ceph osd crush rm osd.0
   ceph osd create
   umount /var/lib/ceph/osd/ceph-0
   rm -rf /var/lib/ceph/osd/ceph-0
   rm -rf /var/lib/ceph/osd/ceph-0
   mkdir /var/lib/ceph/osd/ceph-0
   ceph auth del osd.0
   ceph osd crush add osd.0 1 root=default host=CVM-0-11
   mkfs -t xfs -f /dev/hioa
   mount  /dev/hioa /var/lib/ceph/osd/ceph-0
   ceph-osd --id 0 -d --mkkey --mkfs --osd-data /var/lib/ceph/osd/ceph-0
   ceph auth add osd.0 osd 'allow *' mon 'allow profile osd' -i
   /var/lib/ceph/osd/ceph-0/keyring
   /etc/init.d/ceph start osd.0

===
Best Wishes
Aegeaner


在 2014-09-23 16:24, Aegeaner 写道:

This is my /var/log/ceph/ceph-osd.0.log :

2014-09-23 15:38:14.040699 7fbaccb1e7a0  0 ceph version 0.80.5
(38b73c67d375a2552d8ed67843c8a65c2c0feba6), process ceph-osd, pid 9764
2014-09-23 15:38:14.045192 7fbaccb1e7a0  1 mkfs in
/var/lib/ceph/osd/ceph-0
2014-09-23 15:38:14.046127 7fbaccb1e7a0  1 mkfs generated fsid
f72f9a73-60cf-41f2-976b-7ad144bb8714
2014-09-23 15:38:14.050033 7fbaccb1e7a0  1 keyvaluestore backend
exists/created
2014-09-23 15:38:14.050089 7fbaccb1e7a0  1 mkfs done in
/var/lib/ceph/osd/ceph-0
2014-09-23 15:38:14.056912 7fbaccb1e7a0 -1 created object store
/var/lib/ceph/osd/ceph-0 journal /var/lib/ceph/osd/ceph-0/journal
for osd.0 fsid f55cbcc0-4a91-40cd-8ef8-a1b2c8a09650
2014-09-23 15:38:14.056983 7fbaccb1e7a0 -1 auth: error reading
file: /var/lib/ceph/osd/ceph-0/keyring: can't open
/var/lib/ceph/osd/ceph-0/keyring: (2) No such file or directory
2014-09-23 15:38:14.057113 7fbaccb1e7a0 -1 created new key in
keyring /var/lib/ceph/osd/ceph-0/keyring




在 2014-09-23 15:26, Mark Kirkwood 写道:

On 23/09/14 18:22, Aegeaner wrote:

Now I use the following script to create key/value backended OSD, but
the OSD is created down and never go up.

ceph osd create
umount /var/lib/ceph/osd/ceph-0
rm -rf /var/lib/ceph/osd/ceph-0
mkdir /var/lib/ceph/osd/ceph-0
ceph osd crush add osd.0 1 root=default host=CVM-0-11
mkfs -t xfs -f /dev/hioa
mount  /dev/hioa /var/lib/ceph/osd/ceph-0
ceph-osd --id 0 --mkkey --mkfs --osd-data /var/lib/ceph/osd/ceph-0
/etc/init.d/ceph start osd.0


Anything goes wrong?



Hmmm - trying out a variant of the above (attached) seems to work for 
me. I think we need to see your osd log (in /var/log/ceph).


Cheers

Mark




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [Ceph-community] Pgs are in stale+down+peering state

2014-09-23 Thread Craig Lewis
Is osd.12  doing anything strange?  Is it consuming lots of CPU or IO?  Is
it flapping?   Writing any interesting logs?  Have you tried restarting it?

If that doesn't help, try the other involved osds: 56, 27, 6, 25, 23.  I
doubt that it will help, but it won't hurt.
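
A minimal sketch of that check, assuming sysvinit on the OSD hosts as elsewhere
in this thread (adjust for your init system):

    # on the node hosting osd.12
    sudo /etc/init.d/ceph restart osd.12
    # then see whether the stuck PGs start peering again
    ceph pg dump_stuck stale
    ceph pg 0.49 query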



On Mon, Sep 22, 2014 at 11:21 AM, Varada Kari 
wrote:

>  Hi Sage,
>
>
>
> To give more context on this problem,
>
>
>
> This cluster has two pools rbd and user-created.
>
>
>
> Osd.12 is a primary for some other PG’s , but the problem happens for
> these three  PG’s.
>
>
>
> $ sudo ceph osd lspools
>
> 0 rbd,2 pool1,
>
>
>
> $ sudo ceph -s
>
> cluster 99ffc4a5-2811-4547-bd65-34c7d4c58758
>
>  health HEALTH_WARN 3 pgs down; 3 pgs peering; 3 pgs stale; 3 pgs
> stuck inactive; 3 pgs stuck stale; 3 pgs stuck unclean; 1 requests are
> blocked > 32 sec
>
> monmap e1: 3 mons at {rack2-ram-1=
> 10.242.42.180:6789/0,rack2-ram-2=10.242.42.184:6789/0,rack2-ram-3=10.242.42.188:6789/0},
> election epoch 2008, quorum 0,1,2 rack2-ram-1,rack2-ram-2,rack2-ram-3
>
>  osdmap e17842: 64 osds: 64 up, 64 in
>
>   pgmap v79729: 2148 pgs, 2 pools, 4135 GB data, 1033 kobjects
>
> 12504 GB used, 10971 GB / 23476 GB avail
>
> 2145 active+clean
>
>3 stale+down+peering
>
>
>
> Snippet from pg dump:
>
>
>
> 2.a9518 0   0   0   0   2172649472  3001
> 3001active+clean2014-09-22 17:49:35.357586  6826'35762
> 17842:72706 [12,7,28]   12  [12,7,28]   12
> 6826'35762  2014-09-22 11:33:55.985449  0'0 2014-09-16
> 20:11:32.693864
>
> 0.590   0   0   0   0   0   0   0
> active+clean2014-09-22 17:50:00.751218  0'0 17842:4472
> [12,41,2]   12  [12,41,2]   12  0'0 2014-09-22
> 16:47:09.315499   0'0 2014-09-16 12:20:48.618726
>
> 0.4d0   0   0   0   0   0   4   4
> stale+down+peering  2014-09-18 17:51:10.038247  186'4
> 11134:498   [12,56,27]  12  [12,56,27]  12  186'4
> 2014-09-18 17:30:32.393188  0'0 2014-09-16 12:20:48.615322
>
> 0.490   0   0   0   0   0   0   0
> stale+down+peering  2014-09-18 17:44:52.681513  0'0
> 11134:498   [12,6,25]   12  [12,6,25]   12  0'0
>  2014-09-18 17:16:12.986658  0'0 2014-09-16 12:20:48.614192
>
> 0.1c0   0   0   0   0   0   12  12
> stale+down+peering  2014-09-18 17:51:16.735549  186'12
> 11134:522   [12,25,23]  12  [12,25,23]  12  186'12
> 2014-09-18 17:16:04.457863  186'10  2014-09-16 14:23:58.731465
>
> 2.17510 0   0   0   0   2139095040  3001
> 3001active+clean2014-09-22 17:52:20.364754  6784'30742
> 17842:72033 [12,27,23]  12  [12,27,23]  12
> 6784'30742  2014-09-22 00:19:39.905291  0'0 2014-09-16
> 20:11:17.016299
>
> 2.7e8   508 0   0   0   0   2130706432  3433
> 3433active+clean2014-09-22 17:52:20.365083  6702'21132
> 17842:64769 [12,25,23]  12  [12,25,23]  12
> 6702'21132  2014-09-22 17:01:20.546126  0'0 2014-09-16
> 14:42:32.079187
>
> 2.6a5   528 0   0   0   0   2214592512  2840
> 2840active+clean2014-09-22 22:50:38.092084  6775'34416
> 17842:83221 [12,58,0]   12  [12,58,0]   12
> 6775'34416  2014-09-22 22:50:38.091989  0'0 2014-09-16
> 20:11:32.703368
>
>
>
> And we couldn’t observe and peering events happening on the primary osd.
>
>
>
> $ sudo ceph pg 0.49 query
>
> Error ENOENT: i don't have pgid 0.49
>
> $ sudo ceph pg 0.4d query
>
> Error ENOENT: i don't have pgid 0.4d
>
> $ sudo ceph pg 0.1c query
>
> Error ENOENT: i don't have pgid 0.1c
>
>
>
> Not able to explain why the peering was stuck. BTW, Rbd pool doesn’t
> contain any data.
>
>
>
> Varada
>
>
>
> *From:* Ceph-community [mailto:ceph-community-boun...@lists.ceph.com] *On
> Behalf Of *Sage Weil
> *Sent:* Monday, September 22, 2014 10:44 PM
> *To:* Sahana Lokeshappa; ceph-users@lists.ceph.com; ceph-us...@ceph.com;
> ceph-commun...@lists.ceph.com
> *Subject:* Re: [Ceph-community] Pgs are in stale+down+peering state
>
>
>
> Stale means that the primary OSD for the PG went down and the status is
> stale.  They all seem to be from OSD.12... Seems like something is
> preventing that OSD from reporting to the mon?
>
> sage
>
>
>
> On September 22, 2014 7:51:48 AM EDT, Sahana Lokeshappa <
> sahana.lokesha...@sandisk.com> wrote:
>
> Hi all,
>
>
>
> I used command  ‘ceph osd thrash ‘ command and after all osds are up and
> in, 3  pgs are in  stale+down+peering state
>
>
>
> sudo ceph -s
>
> cluster 99ffc4a5-2811-4547-bd65-34c7d4c58758
>
>  health HEALTH_WARN 3 pgs down; 3 pgs peering; 3 pgs stale; 3 pgs
> stuck inactive; 3 pgs stuck stale; 3 pgs stuck unclean
>
>  monmap e1: 3 mons at {rack2-ram-1=
> 10.2

[ceph-users] Resetting RGW Federated replication

2014-09-23 Thread Craig Lewis
I've had some issues in my secondary cluster.  I'd like to restart
replication from the beginning, without destroying the data in the
secondary cluster.

Reading the radosgw-agent and Admin REST API code, I believe I just need to
stop replication, delete the secondary zone's log_pool, recreate the
log_pool, and restart replication.
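
A minimal sketch of that sequence, with a placeholder pool name (substitute the
log_pool from your secondary zone's config; the pg count is illustrative only):

    # stop radosgw-agent first (however it is being run), then:
    ceph osd pool delete .zone-secondary.log .zone-secondary.log --yes-i-really-really-mean-it
    ceph osd pool create .zone-secondary.log 64
    # restart radosgw-agent afterwards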

Anybody have any thoughts?  I'm still setting up some VMs to test this,
before I try it in production.



Background:
I'm on Emperor (yeah, still need to upgrade).  I believe I ran into
http://tracker.ceph.com/issues/7595 .  My read of that patch is that it
prevents the problem from occurring, but doesn't correct corrupt data.  I
tried applying some of the suggested patches, but they only ignored the
error, rather than correcting it.  I finally dropped the corrupt pool.
That allowed the stock Emperor binaries to run without crashing.  The pool
I dropped was my secondary zone's log_pool.

Before I dropped the pool, I copied all of the objects to local disk.
After re-creating the pool, I uploaded the objects.

Now replication is kind of working, but not correctly.  I have a number of
buckets that are being written to in the primary cluster, but no
replication is occurring.  radosgw-agent says a number of shards have >=
1000 log entries, but then it never processes the buckets in those shards.

Looking back at the pool's contents on local disk, all of the files are 0
bytes.  So I'm assuming all of the important state was stored in the
object's metadata.

I'd like to completely zero out the replication state, then exploit a
feature in radosgw-agent 1.1 that will only replicate the first 1000
objects in buckets, if the bucket isn't being actively written to.  Then I
can restart radosgw-agent 1.2, and let it catch up the active buckets.
That'll save me many weeks and TB of replication.

Obviously, I'll compare bucket listings between the two clusters when I'm
done.  I'll probably try to catch up the read-only bucket's state at a
later date.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Any way to remove possible orphaned files in a federated gateway configuration

2014-09-23 Thread Yehuda Sadeh
On Tue, Sep 23, 2014 at 3:05 PM, Lyn Mitchell  wrote:
> Is anyone aware of a way to either reconcile or remove possible orphaned
> “shadow” files in a federated gateway configuration?  The issue we’re seeing
> is the number of chunks/shadow files on the slave has many more “shadow”
> files than the master, the breakdown is as follows:
>
> master zone:
>
> .region-1.zone-1.rgw.buckets = 1737 “shadow” files of which there are 10
> distinct sets of tags, an example of 1 distinct set is:
>
> alph-1.80907.1__shadow_.VTZYW5ubV53wCHAKcnGwrD_yGkyGDuG_1 through
> alph-1.80907.1__shadow_.VTZYW5ubV53wCHAKcnGwrD_yGkyGDuG_516
>
>
>
> slave zone:
>
> .region-1.zone-2.rgw.buckets = 331961 “shadow” files, of which there are 652
> distinct sets of  tags, examples:
>
> 1 set having 516 “shadow” files:
>
> alph-1.80907.1__shadow_.yPT037fjWhTi_UtHWSYPcRWBanaN9Oy_1 through
> alph-1.80907.1__shadow_.yPT037fjWhTi_UtHWSYPcRWBanaN9Oy_516
>
>
>
> 236 sets having 515 “shadow” files apiece:
>
> alph-1.80907.1__shadow_.RA9KCc_U5T9kBN_ggCUx8VLJk36RSiw_1 through
> alph-1.80907.1__shadow_.RA9KCc_U5T9kBN_ggCUx8VLJk36RSiw_515
>
> alph-1.80907.1__shadow_.aUWuanLbJD5vbBSD90NWwjkuCxQmvbQ_1 through
> alph-1.80907.1__shadow_.aUWuanLbJD5vbBSD90NWwjkuCxQmvbQ_515

These are all part of the same bucket (prefixed by alph-1.80907.1).

>
> ….
>
>
>
> The number of shadow files in zone-2 is taking quite a bit of space from the
> OSDs in the cluster.   Without being able to trace back to the original
> file name from an s3 or rados tag, I have no way of knowing which files
> these are.  Is it possible that the same file may have been replicated
> multiple times, due to network or connectivity issues?
>
>
>
> I can provide any logs or other information that may provide some help,
> however at this point we’re not seeing any real errors.
>
>
>
> Thanks in advance for any help that can be provided,

You can also run the following command on the existing objects within
that specific bucket:

$ radosgw-admin object stat --bucket=<bucket> --object=<object>

This will show the mapping from the rgw object to the rados objects
that construct it.
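
A minimal sketch of combining that with a raw listing of the slave zone's bucket
pool named earlier in this thread (the grep pattern just matches the
shadow-object naming shown above):

    # list the rados shadow objects in the slave zone's bucket pool
    rados -p .region-1.zone-2.rgw.buckets ls | grep '__shadow_' > slave-shadow.txt
    # stat an rgw object to see which rados objects it maps to
    radosgw-admin object stat --bucket=<bucket> --object=<object>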


Yehuda
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Any way to remove possible orphaned files in a federated gateway configuration

2014-09-23 Thread Lyn Mitchell
Is anyone aware of a way to either reconcile or remove possible orphaned
"shadow" files in a federated gateway configuration?  The issue we're seeing
is the number of chunks/shadow files on the slave has many more "shadow"
files than the master, the breakdown is as follows:

master zone:

.region-1.zone-1.rgw.buckets = 1737 "shadow" files of which there are 10
distinct sets of tags, an example of 1 distinct set is:

alph-1.80907.1__shadow_.VTZYW5ubV53wCHAKcnGwrD_yGkyGDuG_1 through
alph-1.80907.1__shadow_.VTZYW5ubV53wCHAKcnGwrD_yGkyGDuG_516

 

slave zone:

.region-1.zone-2.rgw.buckets = 331961 "shadow" files, of which there are 652
distinct sets of  tags, examples:

1 set having 516 "shadow" files:

alph-1.80907.1__shadow_.yPT037fjWhTi_UtHWSYPcRWBanaN9Oy_1 through
alph-1.80907.1__shadow_.yPT037fjWhTi_UtHWSYPcRWBanaN9Oy_516

 

236 sets having 515 "shadow" files apiece:

alph-1.80907.1__shadow_.RA9KCc_U5T9kBN_ggCUx8VLJk36RSiw_1 through
alph-1.80907.1__shadow_.RA9KCc_U5T9kBN_ggCUx8VLJk36RSiw_515

alph-1.80907.1__shadow_.aUWuanLbJD5vbBSD90NWwjkuCxQmvbQ_1 through
alph-1.80907.1__shadow_.aUWuanLbJD5vbBSD90NWwjkuCxQmvbQ_515

..

 

The number of shadow files in zone-2 is taking quite a bit of space from the
OSDs in the cluster.   Without being able to trace back to the original
file name from an s3 or rados tag, I have no way of knowing which files
these are.  Is it possible that the same file may have been replicated
multiple times, due to network or connectivity issues?

 

I can provide any logs or other information that may provide some help,
however at this point we're not seeing any real errors.

 

Thanks in advance for any help that can be provided,

MLM

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Merging two active ceph clusters: suggestions needed

2014-09-23 Thread John Nielsen
I would:

Keep Cluster A intact and migrate it to your new hardware. You can do this with 
no downtime, assuming you have enough IOPS to support data migration and normal 
usage simultaneously. Bring up the new OSDs and let everything rebalance, then 
remove the old OSDs one at a time. Replace the MONs one at a time. Since you 
will have the same data on the same cluster (but different hardware), you don't 
need to worry about mtimes or handling RBD or S3 data at all.

Make sure you have top-level ceph credentials on the new cluster that will work 
for current users of Cluster B.

Use a librbd-aware tool to migrate the RBD volumes from Cluster B onto the new 
Cluster A. qemu-img comes to mind. This would require downtime for each volume, 
but not necessarily all at the same time.
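
A minimal sketch of the qemu-img route (pool, image and conf paths are
placeholders; both clusters' conf and keyring files must be readable on the host
doing the copy):

    # copy one RBD volume from Cluster B into the new Cluster A
    qemu-img convert -p -f raw -O raw \
        rbd:volumes/vol1:conf=/etc/ceph/clusterB.conf \
        rbd:volumes/vol1:conf=/etc/ceph/clusterA.conf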

Migrate your S3 user accounts from Cluster B to the new Cluster A (should be 
easily scriptable with e.g. JSON output from radosgw-admin).

Check for and resolve S3 bucket name conflicts between Cluster A and ClusterB.

Migrate your S3 data from Cluster B to the new Cluster A using an S3-level 
tool. s3cmd comes to mind.
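
A minimal sketch of the s3cmd route, staged through local disk (two s3cmd config
files pointing at each cluster's radosgw endpoint are assumed; as noted
elsewhere in this thread, s3cmd will not preserve mtimes, ACLs or CORS):

    # pull a bucket out of Cluster B, then push it into the new Cluster A
    s3cmd --config=clusterB.s3cfg sync s3://mybucket/ ./mybucket/
    s3cmd --config=clusterA.s3cfg sync ./mybucket/ s3://mybucket/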

Fine-tuning and automating the above is left as an exercise for the reader, but 
it should all be possible with built-in and/or commodity tools.

On Sep 20, 2014, at 11:15 PM, Robin H. Johnson  wrote:

> For a variety of reasons, none good anymore, we have two separate Ceph
> clusters.
> 
> I would like to merge them onto the newer hardware, with as little
> downtime and data loss as possible; then discard the old hardware.
> 
> Cluster A (2 hosts):
> - 3TB of S3 content, >100k files, file mtimes important
> - <500GB of RBD volumes, exported via iscsi
> 
> Cluster B (4 hosts):
> - <50GiB of S3 content
> - 7TB of RBD volumes, exported via iscsi
> 
> Short of finding somewhere to dump all of the data from one side, and
> re-importing it after merging with that cluster as empty; are there any
> other alternatives available to me?
> 
> -- 
> Robin Hugh Johnson
> Gentoo Linux: Developer, Infrastructure Lead
> E-Mail : robb...@gentoo.org
> GnuPG FP   : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Merging two active ceph clusters: suggestions needed

2014-09-23 Thread Mikaël Cluseau

On 09/22/2014 05:17 AM, Robin H. Johnson wrote:

Can somebody else make comments about migrating S3 buckets with
preserved mtime data (and all of the ACLs & CORS) then?


I don't know how radosgw objects are stored, but have you considered a 
lower level rados export/import ?


IMPORT AND EXPORT
   import [options] <local-directory> <rados-pool>
       Upload <local-directory> to <rados-pool>
   export [options] <rados-pool> <local-directory>
       Download <rados-pool> to <local-directory>
   options:
       -f / --force         Copy everything, even if it hasn't changed.
       -d / --delete-after  After synchronizing, delete unreferenced
                            files or objects from the target bucket
                            or directory.
       --workers            Number of worker threads to spawn
                            (default 5)
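
A hypothetical invocation, with placeholder pool and path (whether a raw pool
copy keeps radosgw's metadata consistent is exactly the open question in this
thread):

    rados export .rgw.buckets /backup/rgw-buckets
    rados import /backup/rgw-buckets .rgw.buckets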

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Reassigning admin server

2014-09-23 Thread Gregory Farnum
On Mon, Sep 22, 2014 at 1:22 PM, LaBarre, James  (CTR)  A6IT
 wrote:
> If I have a machine/VM I am using as an Admin node for a ceph cluster, can I
> relocate that admin to another machine/VM after I’ve built a cluster?  I
> would expect as the Admin isn’t an actual operating part of the cluster
> itself (other than Calamari, if it happens to be running) the rest of the
> cluster should be adequately served with a –update-conf.

The admin node really just has the default ceph.conf and the keyrings
for admin access to your cluster. You just need to copy that data to
whatever other node(s) you want; there's no updating to do for the
rest of the cluster.
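
A minimal sketch of that copy, with a placeholder hostname (paths assume the
default ceph.conf and admin keyring locations):

    scp /etc/ceph/ceph.conf /etc/ceph/ceph.client.admin.keyring new-admin-node:/etc/ceph/
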
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3, 2K IOPS

2014-09-23 Thread Sebastien Han
What about writes with Giant?

On 18 Sep 2014, at 08:12, Zhang, Jian  wrote:

> Has anyone ever tested multi-volume performance on a *FULL* SSD setup?
> We are able to get ~18K IOPS for 4K random reads on a single volume with fio 
> (with the rbd engine) on a 12x DC3700 setup, but only ~23K (peak) 
> IOPS even with multiple volumes. 
> It seems the maximum random write performance we can get on the entire cluster 
> is quite close to single-volume performance. 
> 
> Thanks
> Jian
> 
> 
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
> Sebastien Han
> Sent: Tuesday, September 16, 2014 9:33 PM
> To: Alexandre DERUMIER
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3, 2K 
> IOPS
> 
> Hi,
> 
> Thanks for keeping us updated on this subject.
> dsync is definitely killing the ssd.
> 
> I don't have much to add; I'm just surprised that you're only getting 5299 
> with 0.85, since I've been able to get 6.4K. I was using the 200GB model, though, 
> which might explain this.
> 
> 
> On 12 Sep 2014, at 16:32, Alexandre DERUMIER  wrote:
> 
>> here the results for the intel s3500
>> 
>> max performance is with ceph 0.85 + optracker disabled.
>> the intel s3500 doesn't have the d_sync problem the crucial has
>> 
>> %util show almost 100% for read and write, so maybe the ssd disk performance 
>> is the limit.
>> 
>> I have some stec zeusram 8GB in stock (I used them for zfs zil), I'll try to 
>> bench them next week.
>> 
>> 
>> 
>> 
>> 
>> 
>> INTEL s3500
>> ---
>> raw disk
>> 
>> 
>> randread: fio --filename=/dev/sdb --direct=1 --rw=randread --bs=4k 
>> --iodepth=32 --group_reporting --invalidate=0 --name=abc 
>> --ioengine=aio bw=288207KB/s, iops=72051
>> 
>> Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s avgrq-sz 
>> avgqu-sz   await r_await w_await  svctm  %util
>> sdb   0,00 0,00 73454,000,00 293816,00 0,00 8,00 
>>30,960,420,420,00   0,01  99,90
>> 
>> randwrite: fio --filename=/dev/sdb --direct=1 --rw=randwrite --bs=4k 
>> --iodepth=32 --group_reporting --invalidate=0 --name=abc --ioengine=aio 
>> --sync=1 bw=48131KB/s, iops=12032
>> Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s avgrq-sz 
>> avgqu-sz   await r_await w_await  svctm  %util
>> sdb   0,00 0,000,00 24120,00 0,00 48240,00 4,00  
>>2,080,090,000,09   0,04 100,00
>> 
>> 
>> ceph 0.80
>> -
>> randread: no tuning:  bw=24578KB/s, iops=6144
>> 
>> 
>> randwrite: bw=10358KB/s, iops=2589
>> Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s avgrq-sz 
>> avgqu-sz   await r_await w_await  svctm  %util
>> sdb   0,00   373,000,00 8878,00 0,00 34012,50 7,66   
>>   1,630,180,000,18   0,06  50,90
>> 
>> 
>> ceph 0.85 :
>> -
>> 
>> randread :  bw=41406KB/s, iops=10351
>> Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s avgrq-sz 
>> avgqu-sz   await r_await w_await  svctm  %util
>> sdb   2,00 0,00 10425,000,00 41816,00 0,00 8,02  
>>1,360,130,130,00   0,07  75,90
>> 
>> randwrite : bw=17204KB/s, iops=4301
>> 
>> Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s avgrq-sz 
>> avgqu-sz   await r_await w_await  svctm  %util
>> sdb   0,00   333,000,00 9788,00 0,00 57909,0011,83   
>>   1,460,150,000,15   0,07  67,80
>> 
>> 
>> ceph 0.85 tuning op_tracker=false
>> 
>> 
>> randread :  bw=86537KB/s, iops=21634
>> Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s avgrq-sz 
>> avgqu-sz   await r_await w_await  svctm  %util
>> sdb  25,00 0,00 21428,000,00 86444,00 0,00 8,07  
>>3,130,150,150,00   0,05  98,00
>> 
>> randwrite:  bw=21199KB/s, iops=5299
>> Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s avgrq-sz 
>> avgqu-sz   await r_await w_await  svctm  %util
>> sdb   0,00  1563,000,00 9880,00 0,00 75223,5015,23   
>>   2,090,210,000,21   0,07  80,00
>> 
>> 
>> - Mail original -
>> 
>> De: "Alexandre DERUMIER" 
>> À: "Cedric Lemarchand" 
>> Cc: ceph-users@lists.ceph.com
>> Envoyé: Vendredi 12 Septembre 2014 08:15:08
>> Objet: Re: [ceph-users] [Single OSD performance on SSD] Can't go over 
>> 3, 2K IOPS
>> 
>> results of fio on rbd with kernel patch
>> 
>> 
>> 
>> fio rbd crucial m550 1 osd 0.85 (osd_enable_op_tracker true or false, same 
>> result):
>> ---
>> bw=12327KB/s, iops=3081
>> 
>> So not much better than before, but this time iostat shows only 15%
>> util, and latencies are lower
>> 
>> Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await 
>> r_await w_await svctm %util sdb 0,00 29,00 0,00 3075,00 0,00 36748,50 
>> 23,90 0,29 0,10 0,00 0,10 0,05

Re: [ceph-users] [Ceph-community] Can't Start-up MDS

2014-09-23 Thread Shun-Fa Yang
hi Gregory,

Thanks for your response.

I installed ceph v0.80.5 on a single node, and my mds status always stays
"creating".

The output of "ceph -s" is as follows:

root@ubuntu165:~# ceph -s
cluster 3cd658c3-34ca-43f3-93c7-786e5162e412
 health HEALTH_WARN 200 pgs incomplete; 200 pgs stuck inactive; 200 pgs
stuck unclean; 50 requests are blocked > 32 sec
 monmap e1: 1 mons at {ubuntu165=10.62.170.165:6789/0}, election epoch
1, quorum 0 ubuntu165
 mdsmap e19: 1/1/1 up {0=ubuntu165=up:creating}
 osdmap e32: 1 osds: 1 up, 1 in
  pgmap v64: 200 pgs, 4 pools, 0 bytes data, 0 objects
1059 MB used, 7448 GB / 7449 GB avail
 200 creating+incomplete
root@ubuntu165:~# ceph -v
ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6)


thanks.

2014-09-18 1:22 GMT+08:00 Gregory Farnum :

> That looks like the beginning of an mds creation to me. What's your
> problem in more detail, and what's the output of "ceph -s"?
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>
> On Mon, Sep 15, 2014 at 5:34 PM, Shun-Fa Yang  wrote:
> > Hi all,
> >
> > I installed ceph v0.80.5 on Ubuntu 14.04 server by using
> > apt-get...
> >
> > The mds log shows the following:
> >
> > 2014-09-15 17:24:58.291305 7fd6f6d47800  0 ceph version 0.80.5
> > (38b73c67d375a2552d8ed67843c8a65c2c0feba6), process ceph-mds, pid 10487
> >
> > 2014-09-15 17:24:58.302164 7fd6f6d47800 -1 mds.-1.0 *** no OSDs are up
> as of
> > epoch 8, waiting
> >
> > 2014-09-15 17:25:08.302930 7fd6f6d47800 -1 mds.-1.-1 *** no OSDs are up
> as
> > of epoch 8, waiting
> >
> > 2014-09-15 17:25:19.322092 7fd6f1938700  1 mds.-1.0 handle_mds_map
> standby
> >
> > 2014-09-15 17:25:19.325024 7fd6f1938700  1 mds.0.3 handle_mds_map i am
> now
> > mds.0.3
> >
> > 2014-09-15 17:25:19.325026 7fd6f1938700  1 mds.0.3 handle_mds_map state
> > change up:standby --> up:creating
> >
> > 2014-09-15 17:25:19.325196 7fd6f1938700  0 mds.0.cache creating system
> inode
> > with ino:1
> >
> > 2014-09-15 17:25:19.325377 7fd6f1938700  0 mds.0.cache creating system
> inode
> > with ino:100
> >
> > 2014-09-15 17:25:19.325381 7fd6f1938700  0 mds.0.cache creating system
> inode
> > with ino:600
> >
> > 2014-09-15 17:25:19.325449 7fd6f1938700  0 mds.0.cache creating system
> inode
> > with ino:601
> >
> > 2014-09-15 17:25:19.325489 7fd6f1938700  0 mds.0.cache creating system
> inode
> > with ino:602
> >
> > 2014-09-15 17:25:19.325538 7fd6f1938700  0 mds.0.cache creating system
> inode
> > with ino:603
> >
> > 2014-09-15 17:25:19.325564 7fd6f1938700  0 mds.0.cache creating system
> inode
> > with ino:604
> >
> > 2014-09-15 17:25:19.325603 7fd6f1938700  0 mds.0.cache creating system
> inode
> > with ino:605
> >
> > 2014-09-15 17:25:19.325627 7fd6f1938700  0 mds.0.cache creating system
> inode
> > with ino:606
> >
> > 2014-09-15 17:25:19.325655 7fd6f1938700  0 mds.0.cache creating system
> inode
> > with ino:607
> >
> > 2014-09-15 17:25:19.325682 7fd6f1938700  0 mds.0.cache creating system
> inode
> > with ino:608
> >
> > 2014-09-15 17:25:19.325714 7fd6f1938700  0 mds.0.cache creating system
> inode
> > with ino:609
> >
> > 2014-09-15 17:25:19.325738 7fd6f1938700  0 mds.0.cache creating system
> inode
> > with ino:200
> >
> > Could someone tell me how to solve it?
> >
> > Thanks.
> >
> > --
> > 楊順發(yang shun-fa)
> >
> > ___
> > Ceph-community mailing list
> > ceph-commun...@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-community-ceph.com
> >
>



-- 
楊順發(yang shun-fa)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Repetitive replication occuring in slave zone causing OSD's to fill

2014-09-23 Thread Lyn Mitchell
I'm currently running 2 ceph clusters (ver. 0.80.1) which are providing
secondary storage for CloudPlatform.  Each cluster resides in a different
datacenter and our federated gateway consists of a region (us-east-1) with 2
zones (zone-a [master], zone-b [slave]). Objects appear to be
replicating/syncing from zone-a to zone-b as expected, meaning the objects
appear in both zones with the same size and checksum using an s3 client to
view.  We've recently run into an issue where an object is replicated to
zone-b, the object appears to be complete, yet the
.us-east-1.zone-b.rgw.buckets pool continues to fill with shadow files for
this object. We noticed the OSDs were being consumed rather quickly, and
while troubleshooting we found 230+ unique TAGs for the object (i.e.
TAG_shadow_1 through TAG_shadow_515). Has anyone seen this behavior, or does
anyone have any idea what may have caused it?
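
For reference, this kind of check surfaces them (pool name as above; the
shadow-object naming pattern is an assumption on my part):

# list the slave zone's bucket pool and count the distinct tags behind the shadow objects
rados -p .us-east-1.zone-b.rgw.buckets ls | grep _shadow_ | sed 's/_shadow_.*//' | sort -u | wc -l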

 

Thanks in advance for any help that may be provided,

MLM

 

 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Frequent Crashes on rbd to nfs gateway Server

2014-09-23 Thread Ilya Dryomov
On Fri, Sep 19, 2014 at 11:22 AM, Micha Krause  wrote:
> Hi,
>
>> I have built an NFS server based on Sebastien's blog post here:
>>
>> http://www.sebastien-han.fr/blog/2012/07/06/nfs-over-rbd/
>>
>> I'm using kernel 3.14-0.bpo.1-amd64 on Debian wheezy; the host is a VM on
>> VMware.
>>
>> Using rsync, I'm writing data via NFS from one client to this server.
>>
>> The NFS server crashes multiple times per day; I can't even log in to the
>> server then.
>> After a reset there is no kernel log about the crash, so I guess
>> something is blocking all I/Os.
>
>
> Ok, it seems that I just can't get a shell, but I can run commands via ssh
> directly.

So does it actually crash, or is it just the blocked I/Os?  If it doesn't
crash, you should be able to get everything off dmesg.

>
> I was able to get the following information:
>
> dmesg:
>
> [18102.981064] INFO: task nfsd:2769 blocked for more than 120 seconds.
> [18102.981112]   Not tainted 3.14-0.bpo.1-amd64 #1
> [18102.981150] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [18102.981216] nfsdD 88003fc14340 0  2769  2
> 0x
> [18102.981218]  88003bac6e20 0046 
> 88003d47ada0
> [18102.981219]  00014340 88003ce31fd8 00014340
> 88003bac6e20
> [18102.981221]  88003ce31728 8800029539f0 7fff
> 7fff
> [18102.981223] Call Trace:
> [18102.981225]  [] ? schedule_timeout+0x1ed/0x250
> [18102.981231]  [] ? _xfs_buf_find+0xd2/0x280 [xfs]
> [18102.981234]  [] ? kmem_cache_alloc+0x1bc/0x1f0
> [18102.981236]  [] ? __down_common+0x97/0xea
> [18102.981241]  [] ? _xfs_buf_find+0xea/0x280 [xfs]
> [18102.981243]  [] ? down+0x37/0x40
> [18102.981247]  [] ? xfs_buf_lock+0x32/0xf0 [xfs]
> [18102.981252]  [] ? _xfs_buf_find+0xea/0x280 [xfs]
> [18102.981257]  [] ? xfs_buf_get_map+0x35/0x1a0 [xfs]
> [18102.981263]  [] ? xfs_buf_read_map+0x33/0x130 [xfs]
> [18102.981269]  [] ? xfs_trans_read_buf_map+0x34a/0x4f0
> [xfs]
> [18102.981275]  [] ? xfs_imap_to_bp+0x69/0xf0 [xfs]
> [18102.981281]  [] ? xfs_iread+0x7d/0x3f0 [xfs]
> [18102.981284]  [] ? make_kgid+0x9/0x10
> [18102.981286]  [] ? inode_init_always+0x10e/0x1d0
> [18102.981292]  [] ? xfs_iget+0x2ba/0x810 [xfs]
> [18102.981298]  [] ? xfs_ialloc+0xe6/0x740 [xfs]
> [18102.981305]  [] ? kmem_zone_alloc+0x6e/0xf0 [xfs]
> [18102.981311]  [] ? xfs_dir_ialloc+0x83/0x300 [xfs]
> [18102.981317]  [] ? xfs_trans_reserve+0x213/0x220 [xfs]
> [18102.981323]  [] ? xfs_create+0x4fe/0x720 [xfs]
> [18102.981329]  [] ? xfs_vn_mknod+0xd2/0x200 [xfs]
> [18102.981331]  [] ? vfs_create+0xe4/0x160
> [18102.981335]  [] ? do_nfsd_create+0x53e/0x610 [nfsd]
> [18102.981339]  [] ? nfsd3_proc_create+0x16d/0x250 [nfsd]
> [18102.981342]  [] ? nfsd_dispatch+0xe4/0x230 [nfsd]
> [18102.981347]  [] ? svc_process_common+0x354/0x690
> [sunrpc]
> [18102.981349]  [] ? try_to_wake_up+0x280/0x280
> [18102.981353]  [] ? svc_process+0x10b/0x160 [sunrpc]
> [18102.981359]  [] ? nfsd+0xb7/0x130 [nfsd]
> [18102.981363]  [] ? nfsd_destroy+0x70/0x70 [nfsd]
> [18102.981365]  [] ? kthread+0xbc/0xe0
> [18102.981367]  [] ? flush_kthread_worker+0xa0/0xa0
> [18102.981369]  [] ? ret_from_fork+0x7c/0xb0
> [18102.981371]  [] ? flush_kthread_worker+0xa0/0xa0

Is that the only hung task in dmesg?

>
> iostat:
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>0.000.001.00   99.000.000.00
>
> Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s avgrq-sz
> avgqu-sz   await r_await w_await  svctm  %util
> sda   0.00 0.000.000.00 0.00 0.00 0.00
> 0.000.000.000.00   0.00   0.00
> dm-0  0.00 0.000.000.00 0.00 0.00 0.00
> 0.000.000.000.00   0.00   0.00
> dm-1  0.00 0.000.000.00 0.00 0.00 0.00
> 0.000.000.000.00   0.00   0.00
> dm-2  0.00 0.000.000.00 0.00 0.00 0.00
> 0.000.000.000.00   0.00   0.00
> dm-3  0.00 0.000.000.00 0.00 0.00 0.00
> 0.000.000.000.00   0.00   0.00
> dm-4  0.00 0.000.000.00 0.00 0.00 0.00
> 0.000.000.000.00   0.00   0.00
> rbd0  0.00 0.000.000.00 0.00 0.00 0.00
> 46.000.000.000.00   0.00 100.00
> rbd1  0.00 0.000.000.00 0.00 0.00 0.00
> 12.000.000.000.00   0.00 100.00
> rbd2  0.00 0.000.000.00 0.00 0.00 0.00
> 136.000.000.000.00   0.00 100.00
> rbd3  0.00 0.000.000.00 0.00 0.00 0.00
> 0.000.000.000.00   0.00   0.00
> rbd4  0.00 0.000.000.00 0.00 0.00 0.00
> 11.000.000.000.00   0.00 100.00
> rbd5  0.00 0.000.000.00 0.00 0.00 0.

Re: [ceph-users] Frequent Crashes on rbd to nfs gateway Server

2014-09-23 Thread Micha Krause

bump

I have now observed this crash on Ubuntu with kernel 3.13 and on CentOS with 3.16
as well.
rbd hangs, and iostat shows something similar to the output below.


Micha Krause

Am 19.09.2014 um 09:22 schrieb Micha Krause:

Hi,

 > I have built an NFS server based on Sebastien's blog post here:

http://www.sebastien-han.fr/blog/2012/07/06/nfs-over-rbd/

I'm using kernel 3.14-0.bpo.1-amd64 on Debian wheezy; the host is a VM on VMware.

Using rsync, I'm writing data via NFS from one client to this server.

The NFS server crashes multiple times per day; I can't even log in to the server
then.
After a reset there is no kernel log about the crash, so I guess something is
blocking all I/Os.


Ok, it seems that I just can't get a shell, but I can run commands via ssh 
directly.

I was able to get the following information:

dmesg:

[18102.981064] INFO: task nfsd:2769 blocked for more than 120 seconds.
[18102.981112]   Not tainted 3.14-0.bpo.1-amd64 #1
[18102.981150] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[18102.981216] nfsdD 88003fc14340 0  2769  2 0x
[18102.981218]  88003bac6e20 0046  
88003d47ada0
[18102.981219]  00014340 88003ce31fd8 00014340 
88003bac6e20
[18102.981221]  88003ce31728 8800029539f0 7fff 
7fff
[18102.981223] Call Trace:
[18102.981225]  [] ? schedule_timeout+0x1ed/0x250
[18102.981231]  [] ? _xfs_buf_find+0xd2/0x280 [xfs]
[18102.981234]  [] ? kmem_cache_alloc+0x1bc/0x1f0
[18102.981236]  [] ? __down_common+0x97/0xea
[18102.981241]  [] ? _xfs_buf_find+0xea/0x280 [xfs]
[18102.981243]  [] ? down+0x37/0x40
[18102.981247]  [] ? xfs_buf_lock+0x32/0xf0 [xfs]
[18102.981252]  [] ? _xfs_buf_find+0xea/0x280 [xfs]
[18102.981257]  [] ? xfs_buf_get_map+0x35/0x1a0 [xfs]
[18102.981263]  [] ? xfs_buf_read_map+0x33/0x130 [xfs]
[18102.981269]  [] ? xfs_trans_read_buf_map+0x34a/0x4f0 [xfs]
[18102.981275]  [] ? xfs_imap_to_bp+0x69/0xf0 [xfs]
[18102.981281]  [] ? xfs_iread+0x7d/0x3f0 [xfs]
[18102.981284]  [] ? make_kgid+0x9/0x10
[18102.981286]  [] ? inode_init_always+0x10e/0x1d0
[18102.981292]  [] ? xfs_iget+0x2ba/0x810 [xfs]
[18102.981298]  [] ? xfs_ialloc+0xe6/0x740 [xfs]
[18102.981305]  [] ? kmem_zone_alloc+0x6e/0xf0 [xfs]
[18102.981311]  [] ? xfs_dir_ialloc+0x83/0x300 [xfs]
[18102.981317]  [] ? xfs_trans_reserve+0x213/0x220 [xfs]
[18102.981323]  [] ? xfs_create+0x4fe/0x720 [xfs]
[18102.981329]  [] ? xfs_vn_mknod+0xd2/0x200 [xfs]
[18102.981331]  [] ? vfs_create+0xe4/0x160
[18102.981335]  [] ? do_nfsd_create+0x53e/0x610 [nfsd]
[18102.981339]  [] ? nfsd3_proc_create+0x16d/0x250 [nfsd]
[18102.981342]  [] ? nfsd_dispatch+0xe4/0x230 [nfsd]
[18102.981347]  [] ? svc_process_common+0x354/0x690 [sunrpc]
[18102.981349]  [] ? try_to_wake_up+0x280/0x280
[18102.981353]  [] ? svc_process+0x10b/0x160 [sunrpc]
[18102.981359]  [] ? nfsd+0xb7/0x130 [nfsd]
[18102.981363]  [] ? nfsd_destroy+0x70/0x70 [nfsd]
[18102.981365]  [] ? kthread+0xbc/0xe0
[18102.981367]  [] ? flush_kthread_worker+0xa0/0xa0
[18102.981369]  [] ? ret_from_fork+0x7c/0xb0
[18102.981371]  [] ? flush_kthread_worker+0xa0/0xa0

iostat:
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
0.000.001.00   99.000.000.00

Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s avgrq-sz 
avgqu-sz   await r_await w_await  svctm  %util
sda   0.00 0.000.000.00 0.00 0.00 0.00 
0.000.000.000.00   0.00   0.00
dm-0  0.00 0.000.000.00 0.00 0.00 0.00 
0.000.000.000.00   0.00   0.00
dm-1  0.00 0.000.000.00 0.00 0.00 0.00 
0.000.000.000.00   0.00   0.00
dm-2  0.00 0.000.000.00 0.00 0.00 0.00 
0.000.000.000.00   0.00   0.00
dm-3  0.00 0.000.000.00 0.00 0.00 0.00 
0.000.000.000.00   0.00   0.00
dm-4  0.00 0.000.000.00 0.00 0.00 0.00 
0.000.000.000.00   0.00   0.00
rbd0  0.00 0.000.000.00 0.00 0.00 0.00
46.000.000.000.00   0.00 100.00
rbd1  0.00 0.000.000.00 0.00 0.00 0.00
12.000.000.000.00   0.00 100.00
rbd2  0.00 0.000.000.00 0.00 0.00 0.00   
136.000.000.000.00   0.00 100.00
rbd3  0.00 0.000.000.00 0.00 0.00 0.00 
0.000.000.000.00   0.00   0.00
rbd4  0.00 0.000.000.00 0.00 0.00 0.00
11.000.000.000.00   0.00 100.00
rbd5  0.00 0.000.000.00 0.00 0.00 0.00
57.000.000.000.00   0.00 100.00
emcpowerig0.00 0.000.000.00 0.00 0.00 0.00
32.000.000

Re: [ceph-users] OSDs are crashing with "Cannot fork" or "cannot create thread" but plenty of memory is left

2014-09-23 Thread Christian Eichelmann
Hi Nathan,

that was indeed the problem! I increased kernel.pid_max to 65535
and the problem is gone! Thank you!

It was a bit misleading that there is also a
/proc/sys/kernel/threads-max, which has a much higher number. And since
I was only seeing around 400 processes and wasn't aware that threads are
also consuming pids, it was hard to find the root cause of this issue.
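
For anyone else hitting this, a minimal sketch of the change (assuming a standard
sysctl setup; 65535 is simply the value I used):

# processes + threads currently in use vs. the configured limit
ps axms | wc -l
cat /proc/sys/kernel/pid_max

# raise the limit at runtime
sysctl -w kernel.pid_max=65535

# keep it across reboots (assuming /etc/sysctl.conf is read at boot)
echo "kernel.pid_max = 65535" >> /etc/sysctl.conf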

Now that this problem is solved, I'm wondering whether it is a good idea to run
about 40,000 threads (in an idle cluster) on one machine. The system has
a load of around 6-7 without any traffic, maybe just because of the
intense context switching.

Anyways, thats another topic. Thank you for your help!

Regards,
Christian

Am 23.09.2014 03:21, schrieb Nathan O'Sullivan:
> Hi Christian,
> 
> Your problem is probably that your kernel.pid_max (the maximum
> threads+processes across the entire system) needs to be increased - the
> default is 32768, which is too low for even a medium density
> deployment.  You can test this easily enough with
> 
> $ ps axms | wc -l
> 
> If you get a number around the 30,000 mark then you are going to be
> affected.
> 
> There's an issue here http://tracker.ceph.com/issues/6142 , although it
> doesn't seem to have gotten much traction in terms of informing users.
> 
> Regards
> Nathan
> 
> On 15/09/2014 7:13 PM, Christian Eichelmann wrote:
>> Hi all,
>>
>> I have no idea why running out of file handles should produce an "out of
>> memory" error, but well. I've increased the ulimit as you told me, and
>> nothing changed. I've noticed that the osd init script sets the max open
>> file handles explicitly, so I set the corresponding option in my
>> ceph.conf. Now the limits of an OSD process look like this:
>>
>> Limit                     Soft Limit  Hard Limit  Units
>> Max cpu time              unlimited   unlimited   seconds
>> Max file size             unlimited   unlimited   bytes
>> Max data size             unlimited   unlimited   bytes
>> Max stack size            8388608     unlimited   bytes
>> Max core file size        unlimited   unlimited   bytes
>> Max resident set          unlimited   unlimited   bytes
>> Max processes             2067478     2067478     processes
>> Max open files            65536       65536       files
>> Max locked memory         65536       65536       bytes
>> Max address space         unlimited   unlimited   bytes
>> Max file locks            unlimited   unlimited   locks
>> Max pending signals       2067478     2067478     signals
>> Max msgqueue size         819200      819200      bytes
>> Max nice priority         0           0
>> Max realtime priority     0           0
>> Max realtime timeout      unlimited   unlimited   us
>>
>> Anyways, the exact same behavior as before. I also found a mail
>> on this list from someone who had the exact same problem:
>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-May/040059.html
>>
>> Unfortunately, there was also no real solution for this problem.
>>
>> So again: this is *NOT* a ulimit issue. We were running emperor and
>> dumpling on the same hardware without any issues. They first started
>> after our upgrade to firefly.
>>
>> Regards,
>> Christian
>>
>>
>> Am 12.09.2014 18:26, schrieb Christian Balzer:
>>> On Fri, 12 Sep 2014 12:05:06 -0400 Brian Rak wrote:
>>>
 That's not how ulimit works.  Check the `ulimit -a` output.

>>> Indeed.
>>>
>>> And to forestall the next questions, see "man initscript", mine looks
>>> like
>>> this:
>>> ---
>>> ulimit -Hn 131072
>>> ulimit -Sn 65536
>>>
>>> # Execute the program.
>>> eval exec "$4"
>>> ---
>>>
>>> And also a /etc/security/limits.d/tuning.conf (debian) like this:
>>> ---
>>> rootsoftnofile  65536
>>> roothardnofile  131072
>>> *   softnofile  16384
>>> *   hardnofile  65536
>>> ---
>>>
>>> Adjusted to your actual needs. There might be other limits you're
>>> hitting,
>>> but that is the most likely one
>>>
>>> Also 45 OSDs with 12 (24 with HT, bleah) CPU cores is pretty ballsy.
>>> I personally would rather do 4 RAID6 (10 disks, with OSD SSD journals)
>>> with that kind of case and enjoy the fact that my OSDs never fail. ^o^
>>>
>>> Christian (another one)
>>>
>>>
 On 9/12/2014 10:15 AM, Christian Eichelmann wrote:
> Hi,
>
> I am running all commands as root, so there are no limits for the
> processes.
>
> Regards,
> Christian
> ___
> Von: Mariusz Gronczewski [mariusz.gronczew...@efigence.com]
> Gesendet: Freitag, 12. September 2014 15:33
> An: Christian Eichelmann
> Cc: ceph-users@lists.ceph.com
> Betreff: Re: [ceph-users] OSDs are crashing with "Cannot fork" or
> "cannot create thread" but plenty of memo

Re: [ceph-users] ceph backups

2014-09-23 Thread Andrei Mikhailovsky
Luis, 

you may want to take a look at the rbd export/import and export-diff/import-diff
functionality; this could be used to copy data to another cluster or offsite (see
the sketch below).
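
A rough sketch of the incremental flow (pool, image, snapshot and host names are
placeholders, and the 10240 MB size is just an example):

# one-off: create a matching destination image and seed it with a full diff
ssh backup-site rbd create rbd/vm-disk --size 10240
rbd snap create rbd/vm-disk@base
rbd export-diff rbd/vm-disk@base - | ssh backup-site rbd import-diff - rbd/vm-disk

# afterwards: ship only the changes made since the previous snapshot
rbd snap create rbd/vm-disk@2014-09-23
rbd export-diff --from-snap base rbd/vm-disk@2014-09-23 - | ssh backup-site rbd import-diff - rbd/vm-disk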

S3 has regions, which you could use for async replication. 

Not sure how cephfs works for backups. 

Andrei 
- Original Message -

> From: "Luis Periquito" 
> To: ceph-users@lists.ceph.com
> Sent: Tuesday, 23 September, 2014 11:28:39 AM
> Subject: [ceph-users] ceph backups

> Hi fellow cephers,

> I'm being asked questions around our backup of ceph, mainly due to
> data deletion.

> We are currently using ceph to store RBD, S3 and eventually cephFS;
> and we would like to be able to devise a plan to back up the
> information so as to avoid issues with data being deleted from the
> cluster.

> I know RBD has the snapshots, but how can they be automated? Can we
> rely on them to perform data recovery?

> And for S3/CephFS? Are there any backup methods? Other than copying
> all the information into another location?

> thanks,
> Luis

> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] delete performance

2014-09-23 Thread Luis Periquito
Any thoughts on how to improve the delete process performance?

thanks,

On Mon, Sep 8, 2014 at 9:17 AM, Luis Periquito  wrote:

> Hi,
>
> I've been trying to tweak and improve the performance of our ceph
> cluster.
>
> One of the operations that I can't seem to be able to improve much is the
> delete. From what I've gathered every time there is a delete it goes
> directly to the HDD, hitting its performance - the op may be recorded in
> the journal, but I notice almost no impact from that.
>
> From my tests (1M files with 512k), writing the data takes about 2x as long
> as the delete operation - should there be a bigger difference? And whilst
> the delete operation is running all the remaining operations will be slower
> - it does impact the whole cluster performance in a significant way.
>
> Is there any way to improve the delete performance on the cluster? I'm
> using S3 to do all the tests, and the .rgw.bucket.index is already running
> from SSDs as is the journal. I'm running firefly 0.80.5.
>
> thanks,
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph backups

2014-09-23 Thread Luis Periquito
Hi fellow cephers,

I'm being asked questions around our backup of ceph, mainly due to data
deletion.

We are currently using ceph to store RBD, S3 and eventually cephFS; and we
would like to be able to devise a plan to back up the information so as to
avoid issues with data being deleted from the cluster.

I know RBD has the snapshots, but how can they be automated? Can we rely on
them to perform data recovery?

And for S3/CephFS? Are there any backup methods? Other than copying all the
information into another location?

thanks,
Luis
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] question about client's cluster aware

2014-09-23 Thread yuelongguang
hi all,
 
My question comes from a test of mine.
Let's take an example:   object1 (4MB) --> pg 0.1 --> osd 1,2,3 (osd1 primary)
 
While the client is writing object1, osd1 goes down mid-write. Let's suppose
2MB has already been written.
1.
   When the connection to osd1 goes down, what does the client do? Does it ask
the monitor for a new osdmap, or only for the pg map?
 
2.
   The client then gets a newer map and continues the write; the primary OSD
should now be osd2, and the remaining 2MB is written out.
   Now what does Ceph do to reconcile the two partial writes, and to guarantee
that there are enough replicas?
 
3.
   Where is the code for this? Please point me to the relevant code.
 
It is a very difficult question.
 
Thanks so much
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] question about object replication theory

2014-09-23 Thread yuelongguang
hi all,
   Take a look at this link:
http://www.ceph.com/docs/master/architecture/#smart-daemons-enable-hyperscale
Could you explain points 2 and 3 in that picture?
 
1.
At points 2 and 3, before the primary writes the data to the next OSD, where is
the data? Is it in memory, or already on disk?
 
2.
Where is the code for points 2 and 3, where the primary distributes the data to
the others?
 
thanks
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Can ceph-deploy be used with 'osd objectstore = keyvaluestore-dev' in config file ?

2014-09-23 Thread Aegeaner

This is my /var/log/ceph/ceph-osd.0.log :

   2014-09-23 15:38:14.040699 7fbaccb1e7a0  0 ceph version 0.80.5
   (38b73c67d375a2552d8ed67843c8a65c2c0feba6), process ceph-osd, pid 9764
   2014-09-23 15:38:14.045192 7fbaccb1e7a0  1 mkfs in
   /var/lib/ceph/osd/ceph-0
   2014-09-23 15:38:14.046127 7fbaccb1e7a0  1 mkfs generated fsid
   f72f9a73-60cf-41f2-976b-7ad144bb8714
   2014-09-23 15:38:14.050033 7fbaccb1e7a0  1 keyvaluestore backend
   exists/created
   2014-09-23 15:38:14.050089 7fbaccb1e7a0  1 mkfs done in
   /var/lib/ceph/osd/ceph-0
   2014-09-23 15:38:14.056912 7fbaccb1e7a0 -1 created object store
   /var/lib/ceph/osd/ceph-0 journal /var/lib/ceph/osd/ceph-0/journal
   for osd.0 fsid f55cbcc0-4a91-40cd-8ef8-a1b2c8a09650
   2014-09-23 15:38:14.056983 7fbaccb1e7a0 -1 auth: error reading file:
   /var/lib/ceph/osd/ceph-0/keyring: can't open
   /var/lib/ceph/osd/ceph-0/keyring: (2) No such file or directory
   2014-09-23 15:38:14.057113 7fbaccb1e7a0 -1 created new key in
   keyring /var/lib/ceph/osd/ceph-0/keyring




On 2014-09-23 15:26, Mark Kirkwood wrote:

On 23/09/14 18:22, Aegeaner wrote:

Now I use the following script to create key/value backended OSD, but
the OSD is created down and never go up.

ceph osd create
umount /var/lib/ceph/osd/ceph-0
rm -rf /var/lib/ceph/osd/ceph-0
mkdir /var/lib/ceph/osd/ceph-0
ceph osd crush add osd.0 1 root=default host=CVM-0-11
mkfs -t xfs -f /dev/hioa
mount  /dev/hioa /var/lib/ceph/osd/ceph-0
ceph-osd --id 0 --mkkey --mkfs --osd-data /var/lib/ceph/osd/ceph-0
/etc/init.d/ceph start osd.0


Is anything wrong here?



Hmmm - trying out a variant of the above (attached) seems to work for 
me. I think we need to see your osd log (in /var/log/ceph).


Cheers

Mark


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Can ceph-deploy be used with 'osd objectstore = keyvaluestore-dev' in config file ?

2014-09-23 Thread Mark Kirkwood

On 23/09/14 18:22, Aegeaner wrote:

Now I use the following script to create key/value backended OSD, but
the OSD is created down and never go up.

ceph osd create
umount /var/lib/ceph/osd/ceph-0
rm -rf /var/lib/ceph/osd/ceph-0
mkdir /var/lib/ceph/osd/ceph-0
ceph osd crush add osd.0 1 root=default host=CVM-0-11
mkfs -t xfs -f /dev/hioa
mount  /dev/hioa /var/lib/ceph/osd/ceph-0
ceph-osd --id 0 --mkkey --mkfs --osd-data /var/lib/ceph/osd/ceph-0
/etc/init.d/ceph start osd.0


Is anything wrong here?




Ahh - looking closer - you are missing a step to register the osd key 
before trying to start it. E.g. I do:


$ ceph auth add osd.${OSD_ID} osd 'allow *' mon 'allow profile osd' \
 -i /var/lib/ceph/osd/ceph-${OSD_ID}/keyring

...which is why I save the OSD_ID when I do a create of the osd (it'll 
be zero in most cases where you are doing this, but in case you want to 
expand the script to create more OSDs, this is the right way)!
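
Putting the pieces together, the create sequence would become roughly the
following (device, weight and CRUSH location taken from your script; this is
just a sketch, not the attached file):

OSD_ID=$(ceph osd create)
mkdir -p /var/lib/ceph/osd/ceph-${OSD_ID}
ceph osd crush add osd.${OSD_ID} 1 root=default host=CVM-0-11
mkfs -t xfs -f /dev/hioa
mount /dev/hioa /var/lib/ceph/osd/ceph-${OSD_ID}
ceph-osd --id ${OSD_ID} --mkkey --mkfs --osd-data /var/lib/ceph/osd/ceph-${OSD_ID}
ceph auth add osd.${OSD_ID} osd 'allow *' mon 'allow profile osd' \
 -i /var/lib/ceph/osd/ceph-${OSD_ID}/keyring
/etc/init.d/ceph start osd.${OSD_ID}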


Cheers

Mark
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Can ceph-deploy be used with 'osd objectstore = keyvaluestore-dev' in config file ?

2014-09-23 Thread Mark Kirkwood

On 23/09/14 18:22, Aegeaner wrote:

Now I use the following script to create key/value backended OSD, but
the OSD is created down and never go up.

ceph osd create
umount /var/lib/ceph/osd/ceph-0
rm -rf /var/lib/ceph/osd/ceph-0
mkdir /var/lib/ceph/osd/ceph-0
ceph osd crush add osd.0 1 root=default host=CVM-0-11
mkfs -t xfs -f /dev/hioa
mount  /dev/hioa /var/lib/ceph/osd/ceph-0
ceph-osd --id 0 --mkkey --mkfs --osd-data /var/lib/ceph/osd/ceph-0
/etc/init.d/ceph start osd.0


Is anything wrong here?



Hmmm - trying out a variant of the above (attached) seems to work for 
me. I think we need to see your osd log (in /var/log/ceph).


Cheers

Mark


deploy-bug.sh
Description: application/shellscript
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com