Re: [ceph-users] About conf parameter mon_initial_members

2014-10-20 Thread Nicheal
Hi guys, I tried to bootstrap the monitor without setting the parameter mon_initial_members, or leaving it empty (""). But the mon can still be created and runs correctly, and so can the OSD. Actually, I find that the command-line tool and the OSD hunt for the mon based on the settings below, e.g.: [mon.b] host = ceph0 m

Re: [ceph-users] RADOS pool snaps and RBD

2014-10-20 Thread Sage Weil
On Mon, 20 Oct 2014, Xavier Trilla wrote: > Hi, > > It seems Ceph doesn't allow rados pool snapshots on RBD pools which have or > had RBD snapshots. They only work on RBD pools which never had a RBD > snapshot. > > So, basically this works: > > rados mkpool test-pool 1024 1024 replicated > rb

Re: [ceph-users] Ceph counters

2014-10-20 Thread Craig Lewis
> > >> I'm sure there are many more useful things to graph. One of the things I'm >> interested in (but haven't found time to research yet) is the journal >> usage, with maybe some alerts if the journal is more than 90% full. >> > > This is not likely to be an issue with the default journal config sin

Re: [ceph-users] Use case: one-way RADOS "replication" between two clusters by time period

2014-10-20 Thread Craig Lewis
In a normal setup, where radosgw-agent runs all the time, it will delete the objects and buckets fairly quickly after they're deleted in the primary zone. If you shut down radosgw-agent, then nothing will update in the secondary cluster. Once you re-enable radosgw-agent, it will eventually proces

Re: [ceph-users] Ceph counters

2014-10-20 Thread Mark Nelson
On 10/20/2014 08:22 PM, Craig Lewis wrote: I've just started on this myself.. I started with https://ceph.com/docs/v0.80/dev/perf_counters/ I'm currently monitoring the latency, using the (to pick one example) [op_w_latency][sum] and [op_w_latency][avgcount]. Both values are counters, so they

Re: [ceph-users] Ceph counters

2014-10-20 Thread Craig Lewis
I've just started on this myself. I started with https://ceph.com/docs/v0.80/dev/perf_counters/ I'm currently monitoring the latency, using (to pick one example) [op_w_latency][sum] and [op_w_latency][avgcount]. Both values are counters, so they only increase with time. The lifetime averag
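
A minimal sketch of that sampling approach, assuming an OSD admin socket at the default path and jq installed (neither is from the mail); the average latency over the interval is (sum2 - sum1) / (avgcount2 - avgcount1):

    # Sample op_w_latency twice from an OSD admin socket (socket path and OSD id are examples)
    SOCK=/var/run/ceph/ceph-osd.0.asok
    ceph --admin-daemon $SOCK perf dump > sample1.json
    sleep 60
    ceph --admin-daemon $SOCK perf dump > sample2.json
    S1=$(jq '.osd.op_w_latency.sum' sample1.json); C1=$(jq '.osd.op_w_latency.avgcount' sample1.json)
    S2=$(jq '.osd.op_w_latency.sum' sample2.json); C2=$(jq '.osd.op_w_latency.avgcount' sample2.json)
    # Average write latency (in seconds) over the 60s interval, guarding against zero new ops
    [ "$C1" != "$C2" ] && echo "scale=6; ($S2 - $S1) / ($C2 - $C1)" | bc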

Re: [ceph-users] RADOS pool snaps and RBD

2014-10-20 Thread Shu, Xinxin
comments inline. -Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Shu, Xinxin Sent: Tuesday, October 21, 2014 9:13 AM To: Xavier Trilla; ceph-users@lists.ceph.com Subject: Re: [ceph-users] RADOS pool snaps and RBD -Original Message- Fr

Re: [ceph-users] Use case: one-way RADOS "replication" between two clusters by time period

2014-10-20 Thread Anthony Alba
Great information, thanks. I would like to confirm that if I regularly delete older buckets off the LIVE primary system, the "extra" objects on the ARCHIVE secondaries are ignored during replication. I.e. it does not behave like rsync -avz --delete LIVE/ ARCHIVE/ Rather it behaves more like rs

Re: [ceph-users] radosGW balancer best practices

2014-10-20 Thread Craig Lewis
I'm using my existing HAProxy server to also balance my RadosGW nodes. I'm not going to run into bandwidth problems on that link any time soon, but I'll split RadosGW off onto its own HAProxy instance when it does become congested. I have a smaller cluster, only 5 nodes. I'm running mon on the
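
For reference, a minimal sketch of what such an HAProxy stanza might look like; the names, addresses and port below are assumptions, not taken from Craig's setup:

    frontend rgw_front
        bind *:80
        mode http
        default_backend rgw_back

    backend rgw_back
        mode http
        balance roundrobin
        option httpchk GET /
        server rgw1 10.0.0.11:80 check
        server rgw2 10.0.0.12:80 check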

Re: [ceph-users] RADOS pool snaps and RBD

2014-10-20 Thread Shu, Xinxin
-Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Xavier Trilla Sent: Tuesday, October 21, 2014 12:42 AM To: ceph-users@lists.ceph.com Subject: [ceph-users] RADOS pool snaps and RBD Hi, It seems Ceph doesn't allow rados pool snapshots on RBD po

Re: [ceph-users] how to resolve : start mon assert == 0

2014-10-20 Thread minchen
thank you very much, Shu, Xinxin. I just started all mons with the command "ceph-kvstore-tool /var/lib/ceph/mon/store.db set auth last_committed ver 0" on each mon node. Min Chen On 2014-10-20, "Shu, Xinxin" wrote: -Original Message- From: "Shu, Xinxin" Sent: Monday, October 20, 2014 To: minchen , ceph-users , "
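
A hedged expansion of that step (the store path below is the common default and may differ from the one quoted; back up the store and stop the mon before touching it):

    # On each monitor node, with that mon stopped; adjust the mon id and store path
    ID=b
    cp -a /var/lib/ceph/mon/ceph-$ID/store.db /var/lib/ceph/mon/ceph-$ID/store.db.bak
    ceph-kvstore-tool /var/lib/ceph/mon/ceph-$ID/store.db set auth last_committed ver 0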

Re: [ceph-users] OSD (and probably other settings) not being picked up outside of the [global] section

2014-10-20 Thread Christian Balzer
Hello, On Mon, 20 Oct 2014 17:09:57 -0700 Craig Lewis wrote: > I'm still running Emperor, but I'm not seeing that behavior. My > ceph.conf is pretty similar: Yeah, I tested things extensively with Emperor back in the day and at that time frequently verified that changes in the config file wer

Re: [ceph-users] Use case: one-way RADOS "replication" between two clusters by time period

2014-10-20 Thread Craig Lewis
RadosGW Federation can fulfill this use case: http://ceph.com/docs/master/radosgw/federated-config/ . Depending on your setup, it may or may not be "easily". To start, radosgw-agent handles the replication. It does the metadata (users and bucket) and the data (objects in a bucket). It only flow
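
A rough sketch of how the agent is typically driven, following the federated-config guide linked above; the zone names, endpoints, keys and file paths are placeholders, so verify the exact config keys against your radosgw-agent version:

    # /etc/ceph/radosgw-agent/default.conf (placeholders)
    src_zone: us-east
    source: http://rgw-east.example.com:80
    src_access_key: SOURCE-SYSTEM-USER-ACCESS-KEY
    src_secret_key: SOURCE-SYSTEM-USER-SECRET-KEY
    dest_zone: us-west
    destination: http://rgw-west.example.com:80
    dest_access_key: DEST-SYSTEM-USER-ACCESS-KEY
    dest_secret_key: DEST-SYSTEM-USER-SECRET-KEY
    log_file: /var/log/radosgw/radosgw-sync-us-east-west.log

    # then run the agent against that config
    radosgw-agent -c /etc/ceph/radosgw-agent/default.conf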

Re: [ceph-users] Ceph RBD

2014-10-20 Thread Sage Weil
Hi Fred, There is a fencing mechanism. There is work underway to wire it up to an iSCSI target (LIO in this case), but I think that isn't needed to simply run ocfs2 (or similar) directly on top of an RBD device. Honestly I'm not quite sure how that would glue together. sage On Mon, 20 Oct

Re: [ceph-users] urgent- object unfound

2014-10-20 Thread Craig Lewis
It's probably a bit late now, but did you get the issue resolved? If not, why is OSD.49 down? I'd start by trying to get all of your OSDs back UP and IN. It may take a little while to unblock the requests. Recovery doesn't appear to prioritize blocked PGs, so it might take a while for recovery t

Re: [ceph-users] CRUSH depends on host + OSD?

2014-10-20 Thread Craig Lewis
It's part of the way the CRUSH hashing works. Any change to the CRUSH map causes the algorithm to change slightly. BTW, it's safer to remove OSDs and hosts by first marking the OSDs OUT while they are still UP (ceph osd out OSDID). That will trigger the remapping, while keeping the OSDs in the pool so you have
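
A sketch of that removal sequence for a single OSD (id 12 is an example); waiting for the cluster to go back to active+clean between marking OUT and removing is the important part:

    ceph osd out 12                  # mark OUT while the OSD stays UP, so it can help drain its PGs
    # wait until "ceph -s" reports active+clean again
    sudo service ceph stop osd.12    # or stop it via your init system
    ceph osd crush remove osd.12
    ceph auth del osd.12
    ceph osd rm 12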

Re: [ceph-users] OSD (and probably other settings) not being picked up outside of the [global] section

2014-10-20 Thread Craig Lewis
I'm still running Emperor, but I'm not seeing that behavior. My ceph.conf is pretty similar: [global] mon initial members = ceph0 mon host = 10.129.0.6:6789, 10.129.0.7:6789, 10.129.0.8:6789 cluster network = 10.130.0.0/16 osd pool default flag hashpspool = true osd pool default min size
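
One way to verify what a running daemon actually loaded from ceph.conf is the admin socket; the socket paths below are the usual defaults (adjust the OSD id and mon name):

    ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | grep osd_pool_default_min_size
    ceph --admin-daemon /var/run/ceph/ceph-mon.ceph0.asok config show | grep mon_initial_members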

Re: [ceph-users] why the erasure code pool not support random write?

2014-10-20 Thread Lionel Bouton
Hi, On 21/10/2014 01:10, 池信泽 wrote: > Thanks. > >Another reason is the checksum in the attr of the object used for deep > scrub in EC pools should be computed when modifying the object. When > supporting random write, we should calculate the whole object for > checksum, even if there is a bit m

Re: [ceph-users] Ceph RBD

2014-10-20 Thread Fred Yang
Sage, Even with a cluster file system, it will still need a fencing mechanism to allow a SCSI device to be shared by multiple hosts. What kind of SCSI reservation does RBD currently support? Fred Sent from my Samsung Galaxy S3 On Oct 20, 2014 4:42 PM, "Sage Weil" wrote: > On Mon, 20 Oct 2014, Dianis Dimoglo wr

Re: [ceph-users] why the erasure code pool not support random write?

2014-10-20 Thread 池信泽
Thanks. Another reason is the checksum in the attr of the object used for deep scrub in EC pools should be computed when modifying the object. When supporting random write, we should calculate the whole object for checksum, even if there is a bit modified. If only supporting append write, we can ge

Re: [ceph-users] recovery process stops

2014-10-20 Thread Craig Lewis
I've been in a state where reweight-by-utilization was deadlocked (not the daemons, but the remap scheduling). After successive osd reweight commands, two OSDs wanted to swap PGs, but they were both toofull. I ended up temporarily increasing mon_osd_nearfull_ratio to 0.87. That removed the imped
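
For reference, two ways that ratio is commonly raised at runtime on releases of this vintage (verify against your version, and remember to lower it again once recovery completes):

    ceph pg set_nearfull_ratio 0.87
    # or inject the option into each monitor:
    ceph tell mon.a injectargs '--mon-osd-nearfull-ratio 0.87'   # repeat per mon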

Re: [ceph-users] recovery process stops

2014-10-20 Thread Harald Rößler
yes, tomorrow I will get the replacement for the failed disk; getting a new node with many disks will take a few days. No other ideas? Harald Rößler > On 20.10.2014 at 16:45, Wido den Hollander wrote: > > On 10/20/2014 04:43 PM, Harald Rößler wrote: >> Yes, I had some OSDs which were near full

Re: [ceph-users] Ceph RBD

2014-10-20 Thread Sage Weil
On Mon, 20 Oct 2014, Dianis Dimoglo wrote: > I installed ceph two nodes, 2 mon 2 osd in xfs, also used the RBD and > mount the pool on two different ceph host and when I write data through > one of the hosts at the other I do not see the data, what's wrong? Although the RBD disk can be shared, t

Re: [ceph-users] Ceph OSD very slow startup

2014-10-20 Thread Gregory Farnum
On Mon, Oct 20, 2014 at 8:25 AM, Lionel Bouton wrote: > Hi, > > More information on our Btrfs tests. > > Le 14/10/2014 19:53, Lionel Bouton a écrit : > > > > Current plan: wait at least a week to study 3.17.0 behavior and upgrade the > 3.12.21 nodes to 3.17.0 if all goes well. > > > 3.17.0 and 3.1

Re: [ceph-users] recovery process stops

2014-10-20 Thread Leszek Master
You can set a lower weight on the full OSDs, or try changing the osd_near_full_ratio parameter in your cluster from 85 to, for example, 89. But I don't know what can go wrong when you do that. 2014-10-20 17:12 GMT+02:00 Wido den Hollander : > On 10/20/2014 05:10 PM, Harald Rößler wrote: > > yes, tomorrow
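
The per-OSD override looks like this (the id and weight are examples; this override is a 0-1 value and is separate from the CRUSH weight):

    ceph osd reweight 12 0.85   # push some of osd.12's PGs elsewhere without changing its CRUSH weight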

Re: [ceph-users] why the erasure code pool not support random write?

2014-10-20 Thread Lionel Bouton
On 20/10/2014 16:39, Wido den Hollander wrote: > On 10/20/2014 03:25 PM, 池信泽 wrote: >> hi, cephers: >> >> When I look into the ceph source code, I found the erasure code pool >> not support >> the random write, it only support the append write. Why? Is that random >> write of is erasure co

[ceph-users] RADOS pool snaps and RBD

2014-10-20 Thread Xavier Trilla
Hi, It seems Ceph doesn't allow rados pool snapshots on RBD pools which have or had RBD snapshots. They only work on RBD pools which never had a RBD snapshot. So, basically this works: rados mkpool test-pool 1024 1024 replicated rbd -p test-pool create --size=102400 test-image ceph osd pool mk
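
A reproduction sketch of the behaviour described (pool and image names are taken from the mail; the snapshot names are placeholders):

    rados mkpool test-pool 1024 1024 replicated
    rbd -p test-pool create --size=102400 test-image
    ceph osd pool mksnap test-pool pool-snap-1        # pool snapshot still works at this point
    rbd snap create test-pool/test-image@rbd-snap-1   # first RBD (self-managed) snapshot
    rbd snap rm test-pool/test-image@rbd-snap-1
    ceph osd pool mksnap test-pool pool-snap-2        # refused: the pool now uses self-managed snaps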

Re: [ceph-users] Performance doesn't scale well on a full ssd cluster.

2014-10-20 Thread Mark Nelson
On 10/20/2014 09:28 PM, Mark Wu wrote: 2014-10-20 21:04 GMT+08:00 Mark Nelson : On 10/20/2014 06:27 AM, Mark Wu wrote: Test result Update: Number of Hosts, Maximum single volume IOPS, Maximum aggregated IOPS, SSD Disk I

Re: [ceph-users] recovery process stops

2014-10-20 Thread Harald Rößler
Yes, I agree 100%, but actually every disk has a maximum of 86% usage; there should be a way to recover the cluster. Setting the near-full ratio higher than 85% should only be a short-term solution. New disks with higher capacity are already ordered; I just don't like a degraded situation, for a week o

Re: [ceph-users] why the erasure code pool not support random write?

2014-10-20 Thread Loic Dachary
Hi 池信泽, On 20/10/2014 06:25, 池信泽 wrote: > hi, cephers: > > When I look into the ceph source code, I found the erasure code pool > not support > the random write, it only support the append write. Why? The main reason is because it is complicated. The second reason is that it has a signif

Re: [ceph-users] Ceph OSD very slow startup

2014-10-20 Thread Lionel Bouton
Hi, More information on our Btrfs tests. On 14/10/2014 19:53, Lionel Bouton wrote: > > > Current plan: wait at least a week to study 3.17.0 behavior and > upgrade the 3.12.21 nodes to 3.17.0 if all goes well. > 3.17.0 and 3.17.1 have a bug which remounts Btrfs filesystems read-only (no corrup

Re: [ceph-users] why the erasure code pool not support random write?

2014-10-20 Thread Wido den Hollander
On 10/20/2014 03:25 PM, 池信泽 wrote: > hi, cephers: > > When I look into the ceph source code, I found the erasure code pool > not support > the random write, it only support the append write. Why? Is that random > write of is erasure code high cost and the performance of the deep scrub is > v

Re: [ceph-users] recovery process stops

2014-10-20 Thread Wido den Hollander
On 10/20/2014 05:10 PM, Harald Rößler wrote: > yes, tomorrow I will get the replacement of the failed disk, to get a new > node with many disk will take a few days. > No other idea? > If the disks are all full, then, no. Sorry to say this, but it came down to poor capacity management. Never le

Re: [ceph-users] recovery process stops

2014-10-20 Thread Wido den Hollander
On 10/20/2014 04:43 PM, Harald Rößler wrote: > Yes, I had some OSD which was near full, after that I tried to fix the > problem with "ceph osd reweight-by-utilization", but this does not help. > After that I set the near full ratio to 88% with the idea that the remapping > would fix the issue. A

Re: [ceph-users] Performance doesn't scale well on a full ssd cluster.

2014-10-20 Thread Mark Wu
2014-10-20 21:04 GMT+08:00 Mark Nelson : > On 10/20/2014 06:27 AM, Mark Wu wrote: > >> Test result Update: >> >> >> Number of Hosts Maximum single volume IOPS Maximum aggregated IOPS >> SSD Disk IOPS SSD Disk Utilization >> >> 7 14k 45k 9800+ >> 90%

Re: [ceph-users] recovery process stops

2014-10-20 Thread Harald Rößler
Yes, I had some OSD which was near full, after that I tried to fix the problem with "ceph osd reweight-by-utilization", but this does not help. After that I set the near full ratio to 88% with the idea that the remapping would fix the issue. Also a restart of the OSD doesn’t help. At the same ti

Re: [ceph-users] why the erasure code pool not support random write?

2014-10-20 Thread Gregory Farnum
This is a common constraint in many erasure coding storage systems. It arises because random writes turn into a read-modify-write cycle (in order to redo the parity calculations). So we simply disallow them in EC pools, which works fine for the target use cases right now. -Greg On Monday, October 2

Re: [ceph-users] real beginner question

2014-10-20 Thread Dan Geist
Hi, Ranju. Are you talking about setting up Ceph Monitors and OSD nodes on VMs for the purposes of learning, or adding a Ceph storage cluster to an existing KVM-based infrastructure that's using local storage/NFS/iSCSI for block storage now? - If the former, this is pretty easy. Although perfor

[ceph-users] why the erasure code pool not support random write?

2014-10-20 Thread 池信泽
hi, cephers: When I look into the ceph source code, I found that the erasure code pool does not support random write; it only supports append write. Why? Is it that random write is high-cost for erasure code and the performance of deep scrub would be very poor? Thanks.

Re: [ceph-users] Performance doesn't scale well on a full ssd cluster.

2014-10-20 Thread Mark Nelson
On 10/20/2014 06:27 AM, Mark Wu wrote: Test result Update: Number of Hosts, Maximum single volume IOPS, Maximum aggregated IOPS, SSD Disk IOPS, SSD Disk Utilization: 7, 14k, 45k, 9800+, 90%; 8, 21k, 50k

Re: [ceph-users] recovery process stops

2014-10-20 Thread Leszek Master
I think it's because you have OSDs that are too full, as in the warning message. I had a similar problem recently and I did: ceph osd reweight-by-utilization But first read what this command does. It solved the problem for me. 2014-10-20 14:45 GMT+02:00 Harald Rößler : > Dear All > > I have in them moment a iss
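
For reference, the command also takes an optional utilization threshold (percent of the cluster average); OSDs above it get their override weight reduced:

    ceph osd reweight-by-utilization        # uses the default threshold (120 on releases of this era)
    ceph osd reweight-by-utilization 110    # more aggressive: reweight OSDs above 110% of average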

Re: [ceph-users] recovery process stops

2014-10-20 Thread Wido den Hollander
On 10/20/2014 02:45 PM, Harald Rößler wrote: > Dear All > > I have in them moment a issue with my cluster. The recovery process stops. > See this: 2 active+degraded+remapped+backfill_toofull 156 pgs backfill_toofull You have one or more OSDs which are too full and that causes recovery to stop.

Re: [ceph-users] How to calculate file size when mount a block device from rbd image

2014-10-20 Thread Benedikt Fraunhofer
Hi Mika, 2014-10-20 11:16 GMT+02:00 Vickie CH : > 2.Use dd command to create a 1.2T file. > #dd if=/dev/zero of=/mnt/ceph-mount/test12T bs=1M count=12288000 I think you're off by one "zero": 12288000 / 1024 / 1024 ≈ 11, meaning you're instructing it to create an ~11TB file on a 1.5T volume. Cheers
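
I.e. for a ~1.2T file the count needs one zero fewer, something like:

    dd if=/dev/zero of=/mnt/ceph-mount/test12T bs=1M count=1228800   # 1228800 MiB = 1200 GiB, roughly 1.2T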

[ceph-users] recovery process stops

2014-10-20 Thread Harald Rößler
Dear All I have at the moment an issue with my cluster: the recovery process stops. ceph -s health HEALTH_WARN 188 pgs backfill; 156 pgs backfill_toofull; 4 pgs backfilling; 55 pgs degraded; 49 pgs recovery_wait; 297 pgs stuck unclean; recovery 111487/1488290 degraded (7.491%) monmap e2:

Re: [ceph-users] Performance doesn't scale well on a full ssd cluster.

2014-10-20 Thread Mark Wu
Test result Update: Number of Hosts, Maximum single volume IOPS, Maximum aggregated IOPS, SSD Disk IOPS, SSD Disk Utilization: 7, 14k, 45k, 9800+, 90%; 8

Re: [ceph-users] real beginner question

2014-10-20 Thread Ashish Chandra
Hi Ranju, Is it possible to set up ceph in an already virtualized environment? Yes, obviously: you can try out all the features of Ceph in a virtualized environment. In fact it is the easiest and recommended way of playing with Ceph. The Ceph docs list the way to do this; it should take hardly any tim

Re: [ceph-users] Reweight a host

2014-10-20 Thread Lei Dong
I've never seen this before. The weight of the host is commented out because its weight is the sum of the weights in the following lines starting with "item". Can you attach your crush map? Did you manually change it? On 10/20/14, 6:06 PM, "Erik Logtenberg" wrote: >I don't think so, check this out: > ># i

Re: [ceph-users] Reweight a host

2014-10-20 Thread Erik Logtenberg
I don't think so, check this out:

# id    weight     type name              up/down  reweight
-6      3.05       root ssd
-7      0.04999        host ceph-01-ssd
11      0.04999            osd.11         up       1
-8      1              host ceph-02-ssd
12      0.04999            osd.12         up       1
-9

[ceph-users] slow requests - what is causing them?

2014-10-20 Thread Andrei Mikhailovsky
Hello cephers, I've been testing flashcache and enhanceio block device caching for the OSDs and I've noticed I have started getting slow requests. The caching type that I use is read-only, so all writes bypass the caching SSDs and go directly to the OSDs, just like it used to be before

Re: [ceph-users] How to calculate file size when mount a block device from rbd image

2014-10-20 Thread Wido den Hollander
On 10/20/2014 11:16 AM, Vickie CH wrote: > Hello all, > I have a question about how to calculate file size when mount a block > device from rbd image . > [Cluster information:] > 1.The cluster with 1 mon and 6 osds. Every osd is 1T. Total spaces is 5556G. > 2.rbd pool:replicated size 2 min_size 1.

[ceph-users] real beginner question

2014-10-20 Thread Ranju Upadhyay
Hi list, This is a real newbie question (and hopefully the right list to ask it on!). Is it possible to set up Ceph in an already virtualized environment? I.e. we have a scenario here where we have virtual machines (as opposed to individual physical machines) with Ubuntu on them. We are tryin

[ceph-users] Few questions.

2014-10-20 Thread Leszek Master
1) If I want to use a cache tier, should I use it with SSD journaling, or can I get better performance using more SSD GB for the cache tier? 2) I've got a cluster made of 26x900GB SAS disks with SSD journaling. The placement group count I've got is 1024. When I add a new OSD to the cluster, my VMs get IO errors and got

[ceph-users] How to calculate file size when mount a block device from rbd image

2014-10-20 Thread Vickie CH
Hello all, I have a question about how to calculate file size when mounting a block device from an rbd image. [Cluster information:] 1. The cluster has 1 mon and 6 osds. Every osd is 1T. Total space is 5556G. 2. rbd pool: replicated size 2, min_size 1, num = 128. Except for the rbd pool, the other pools are empty. [St

Re: [ceph-users] how to resolve : start mon assert == 0

2014-10-20 Thread Shu, Xinxin
Please refer to http://tracker.ceph.com/issues/8851 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of minchen Sent: Monday, October 20, 2014 3:42 PM To: ceph-users; ceph-de...@vger.kernel.org Subject: [ceph-users] how to resolve : start mon assert == 0 Hello , all when i r

Re: [ceph-users] Same rbd mount from multiple servers

2014-10-20 Thread Mihály Árva-Tóth
Hi Sean, Thank you for your quick response! Okay I see, is there any preferred clustered FS in this case? OCFS2, GFS? Thanks, Mihaly 2014-10-20 10:36 GMT+02:00 Sean Redmond : > Hi Mihaly, > > > > To my understanding you cannot mount an ext4 file system on more than one > server at the same tim

Re: [ceph-users] Same rbd mount from multiple servers

2014-10-20 Thread Sean Redmond
Hi Mihaly, To my understanding you cannot mount an ext4 file system on more than one server at the same time; you would need to look at using a clustered file system. Thanks From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Mihály Árva-Tóth Sent: 20 October 2014 09:34 To:

[ceph-users] Same rbd mount from multiple servers

2014-10-20 Thread Mihály Árva-Tóth
Hello, I made a 2GB RBD on Ceph and mounted it on three separate servers. I followed this: http://ceph.com/docs/master/start/quick-rbd/ Setup, mkfs (ext4) and mount all finished successfully, but every node sees what looks like a different rbd volume. :-o If I copy one 100 MB file on the test1 node I don't see

[ceph-users] how to resolve : start mon assert == 0

2014-10-20 Thread minchen
Hello all, when I restart any mon in the mon cluster {mon.a, mon.b, mon.c} after killing all mons (cephx disabled), an exception occurs as follows: # ceph-mon -i b mon/AuthMonitor.cc: In function 'virtual void AuthMonitor::update_from_paxos(bool*)' thread 7fc801c78780 time 2014-10-20 15:29:3
