Re: [ceph-users] Bluestore + erasure coding memory usage

2016-11-18 Thread bobobo1...@gmail.com
Just to update, this is still an issue as of the latest Git commit (64bcf92e87f9fbb3045de49b7deb53aca1989123). On Fri, Nov 11, 2016 at 1:31 PM, bobobo1...@gmail.com wrote: > Here's another: http://termbin.com/smnm > > On Fri, Nov 11, 2016 at 1:28 PM, Sage Weil

Re: [ceph-users] Ceph Down on Cluster

2016-11-18 Thread Goncalo Borges
Hello Bruno, I am not understanding your outputs. On the first 'ceph -s' it says one mon is down, but your 'ceph health detail' does not report it further. On your crush map I count 7 OSDs (0,1,2,3,4,6,7) but 'ceph -s' says only 6 are active. Can you send the output of 'ceph osd tree', 'ceph osd df'
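
For reference, the diagnostics requested here are plain CLI calls; a minimal sketch, run on any node with an admin keyring:

    ceph osd tree        # bucket/OSD hierarchy with up/down and in/out state
    ceph osd df          # per-OSD utilisation and PG counts
    ceph health detail   # expands HEALTH_WARN/ERR into per-item messages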

Re: [ceph-users] Ceph Down on Cluster

2016-11-18 Thread Bruno Silva
Hi, thanks.

# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable straw_calc_version 1
# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 device5

Re: [ceph-users] how possible is that ceph cluster crash

2016-11-18 Thread Samuel Just
Many reasons: 1) You will eventually get a DC-wide power event anyway, at which point probably most of the OSDs will have hopelessly corrupted internal xfs structures (yes, I have seen this happen to a poor soul with a DC with redundant power). 2) Even in the case of a single rack/node power

Re: [ceph-users] how possible is that ceph cluster crash

2016-11-18 Thread Nick Fisk
Yes, because these things happen: http://www.theregister.co.uk/2016/11/15/memset_power_cut_service_interruption/ We had customers who had kit in this DC. To use your analogy, it's like crossing the road at traffic lights but not checking cars have stopped. You might be OK 99% of the time, but

Re: [ceph-users] how possible is that ceph cluster crash

2016-11-18 Thread Brian ::
This is like your mother telling you not to cross the road when you were 4 years of age, but not telling you it was because you could be flattened by a car :) Can you expand on your answer? If you are in a DC with AB power, redundant UPS, dual feed from the electric company, onsite generators, dual

Re: [ceph-users] I want to submit a PR - Can someone guide me

2016-11-18 Thread Shinobu Kinjo
On Sat, Nov 19, 2016 at 6:59 AM, Brad Hubbard wrote: > +ceph-devel > > On Fri, Nov 18, 2016 at 8:45 PM, Nick Fisk wrote: >> Hi All, >> >> I want to submit a PR to include fix in this tracker bug, as I have just >> realised I've been experiencing it. >> >>

Re: [ceph-users] I want to submit a PR - Can someone guide me

2016-11-18 Thread Brad Hubbard
+ceph-devel On Fri, Nov 18, 2016 at 8:45 PM, Nick Fisk wrote: > Hi All, > > I want to submit a PR to include fix in this tracker bug, as I have just > realised I've been experiencing it. > > http://tracker.ceph.com/issues/9860 > > I understand that I would also need to update

Re: [ceph-users] "Lost" buckets on radosgw

2016-11-18 Thread Yehuda Sadeh-Weinraub
On Fri, Nov 18, 2016 at 1:14 PM, Jeffrey McDonald wrote: > Hi, > > MSI has an erasure coded ceph pool accessible by the radosgw interface. > We recently upgraded to Jewel from Hammer. Several days ago, we > experienced issues with a couple of the rados gateway servers and >

[ceph-users] "Lost" buckets on radosgw

2016-11-18 Thread Jeffrey McDonald
Hi, MSI has an erasure-coded Ceph pool accessible via the radosgw interface. We recently upgraded to Jewel from Hammer. Several days ago, we experienced issues with a couple of the rados gateway servers and inadvertently deployed older Hammer versions of the radosgw instances. This
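
A few read-only radosgw-admin calls are useful for checking whether bucket metadata survived a mixed-version deployment like this; a sketch, with BUCKET as a placeholder name:

    radosgw-admin metadata list bucket           # bucket entrypoints known to the zone
    radosgw-admin metadata get bucket:BUCKET     # entrypoint record (owner, bucket_id)
    radosgw-admin bucket stats --bucket=BUCKET   # index stats; errors out if the index is unreachable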

Re: [ceph-users] how possible is that ceph cluster crash

2016-11-18 Thread Craig Chi
Hi Nick and other Cephers, Thanks for your reply. > 2) Config Errors > This can be an easy one to say you are safe from. But I would say most outages and data loss incidents I have seen on the mailing lists have been due to poor hardware choice or configuring options such as size=2, min_size=1
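
The pool options named above can be checked and raised at runtime; a sketch, with POOL as a placeholder:

    ceph osd pool get POOL size        # number of replicas
    ceph osd pool get POOL min_size    # replicas required before accepting I/O
    ceph osd pool set POOL size 3
    ceph osd pool set POOL min_size 2  # note: raising size triggers backfill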

Re: [ceph-users] ceph-mon not starting on system startup (Ubuntu 16.04 / systemd)

2016-11-18 Thread Matthew Vernon
Hi, On 15/11/16 11:55, Craig Chi wrote: > You can try to manually fix this by adding the > /lib/systemd/system/ceph-mon.target file, which contains: > and then execute the following command to tell systemd to start this > target on bootup > systemctl enable ceph-mon.target This worked a
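
The target file referenced above is tiny; roughly what the Ubuntu packages ship (a sketch, not the verbatim packaged unit):

    cat > /lib/systemd/system/ceph-mon.target <<'EOF'
    [Unit]
    Description=ceph target allowing to start/stop all ceph-mon@.service instances at once
    PartOf=ceph.target
    [Install]
    WantedBy=multi-user.target ceph.target
    EOF
    systemctl enable ceph-mon.target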

Re: [ceph-users] Intel P3700 SSD for journals

2016-11-18 Thread Alan Johnson
We use the 800GB version as journal devices with up to a 1:18 ratio and have had good experiences, with no bottleneck on the journal side. These also feature good endurance characteristics. I would think that higher capacities are hard to justify as journals. -Original Message- From:

Re: [ceph-users] ceph cluster having blocke requests very frequently

2016-11-18 Thread Thomas Danan
I often read that small IO writes and RBD work better with a bigger filestore_max_sync_interval than the default value. The default is 5 sec and I saw many posts saying they are using 30 sec. Also the slow request symptom is often linked to this parameter. My journals are 10GB (collocated with
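
For anyone wanting to try the larger interval, it can be injected at runtime and persisted in ceph.conf; a sketch:

    # runtime, all OSDs (reverts on restart):
    ceph tell osd.* injectargs '--filestore_max_sync_interval 30'
    # persistent, in ceph.conf under [osd]:
    #   filestore max sync interval = 30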

Re: [ceph-users] Configuring Ceph RadosGW with SLA based rados pools

2016-11-18 Thread Stéphane DUGRAVOT
- On 4 Nov 16, at 21:17, Andrey Ptashnik wrote: > Hello Ceph team! > I’m trying to create different pools in Ceph in order to have different tiers > (some are fast, small and expensive and others are plain big and cheap), so > certain users will be tied to one pool

Re: [ceph-users] backup of radosgw config

2016-11-18 Thread Stéphane DUGRAVOT
- On 3 Nov 16, at 5:18, Thomas wrote: > Hi guys, Hi Thomas, This is a question I also asked myself ... Maybe something like: radosgw-admin zonegroup get; radosgw-admin zone get. And for each user: radosgw-admin metadata get user:uid. Anyone? Stephane. >
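
A sketch of that suggestion as a script, assuming jq is available to parse the JSON user list (Jewel command names; on Hammer the zonegroup commands were 'region' instead):

    radosgw-admin zonegroup get > zonegroup.json
    radosgw-admin zone get > zone.json
    for uid in $(radosgw-admin metadata list user | jq -r '.[]'); do
        radosgw-admin metadata get "user:$uid" > "user-$uid.json"
    done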

[ceph-users] Ceph Infrastructure Downtime

2016-11-18 Thread Patrick McGarry
Hey Cephers, Due to Dreamhost shutting down the old DreamCompute cluster in their US-East 1 region, we are in the process of beginning the migration of Ceph infrastructure. We will need to move download.ceph.com, tracker.ceph.com, and docs.ceph.com to their US-East 2 region. The current plan is

Re: [ceph-users] Antw: Re: Best practices for extending a ceph cluster with minimal client impact data movement

2016-11-18 Thread Martin Palma
> I was wondering how exactly you accomplish that? > Can you do this with a "ceph-deploy create" with "noin" or "noup" flags > set, or does one need to follow the manual steps of adding an osd? You can do it either way (manual or with ceph-deploy). Here are the steps using ceph-deploy: 1. Add
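
A sketch of the flag dance around those ceph-deploy steps (hostname and device are placeholders):

    ceph osd set noin                       # new OSDs start up but stay 'out'
    ceph-deploy osd create node1:/dev/sdb   # repeat per disk
    ceph osd unset noin
    ceph osd in osd.7                       # bring each new OSD in when ready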

Re: [ceph-users] Intel P3700 SSD for journals

2016-11-18 Thread William Josefsson
Yes Nick, you're right, I can now see on page 16 here www.intel.com/content/www/xa/en/solid-state-drives/ssd-dc-p3700-spec.html that there is a difference in durability. However, I think 7.3 PBW isn't much worse than the Intel S3610, which is much slower. thx will 400GB: 7.3 PBW 800GB: 14.6 PBW (10 drive
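
For anyone weighing those endurance figures, the daily write budget works out roughly as follows (a back-of-envelope sketch over the drive's 5-year warranty):

    # 7.3 PBW ≈ 7300 TB written over 5 years:
    echo '7300 / (5 * 365)' | bc -l   # ≈ 4 TB of journal writes per day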

Re: [ceph-users] Intel P3700 SSD for journals

2016-11-18 Thread Nick Fisk
I'm using the 400GB models as a journal for 12 drives. I know this is probably pushing it a little bit, but it seems to work fine. I'm guessing the reason may be related to the TBW figure being higher on the more expensive models; maybe they don't want to have to replace worn NVMe's on warranty?

Re: [ceph-users] ceph mon eating lots of memory after upgrade0.94.2 to 0.94.9

2016-11-18 Thread William Josefsson
Hi Corin. We run the latest Hammer on CentOS 7.2, with 3 mons, and have not seen this problem. I'm not sure if there are any other possible differences between the healthy nodes and the one that has excessive consumption of memory? thx will On Fri, Nov 18, 2016 at 6:35 PM, Corin Langosch

Re: [ceph-users] rgw print continue and civetweb

2016-11-18 Thread William Josefsson
Thanks Yehuda and Brian. I'm not sure if you have ever seen this error with radosgw (latest Hammer, CentOS 7), or can advise whether this is a critical error? Appreciate any hints here. thx will 2016-11-12 13:49:08.905114 7fbba7fff700 20 RGWUserStatsCache: sync user=myuserid1 2016-11-12

Re: [ceph-users] ceph mon eating lots of memory after upgrade0.94.2 to 0.94.9

2016-11-18 Thread David Turner
We've had this for a while. We just monitor memory usage and restart the mon services when 1 or more reach 80%. Sent from my iPhone > On Nov 18, 2016, at 3:35 AM, Corin Langosch > wrote: > > Hi, > > about 2 weeks ago I upgraded a rather small cluster from ceph

[ceph-users] Intel P3700 SSD for journals

2016-11-18 Thread William Josefsson
Hi list, I wonder if there is anyone who has experience with Intel P3700 SSD drives as journals, and can share their experience? I was thinking of using the P3700 SSD 400GB as journal in my ceph deployment. It is benchmarked on Sébastien Han's SSD page as well. However a vendor I spoke to didn't

Re: [ceph-users] Down OSDs blocking read requests.

2016-11-18 Thread John Spray
On Fri, Nov 18, 2016 at 1:04 PM, Iain Buclaw wrote: > On 18 November 2016 at 13:14, John Spray wrote: >> On Fri, Nov 18, 2016 at 11:53 AM, Iain Buclaw wrote: >>> Hi, >>> >>> Follow up from the suggestion to use any of the following

Re: [ceph-users] Down OSDs blocking read requests.

2016-11-18 Thread Iain Buclaw
On 18 November 2016 at 13:14, John Spray wrote: > On Fri, Nov 18, 2016 at 11:53 AM, Iain Buclaw wrote: >> Hi, >> >> Follow up from the suggestion to use any of the following options: >> >> - client_mount_timeout >> - rados_mon_op_timeout >> -

Re: [ceph-users] Down OSDs blocking read requests.

2016-11-18 Thread John Spray
On Fri, Nov 18, 2016 at 11:53 AM, Iain Buclaw wrote: > Hi, > > Follow up from the suggestion to use any of the following options: > > - client_mount_timeout > - rados_mon_op_timeout > - rados_osd_op_timeout > > To mitigate the waiting time being blocked on requests. Is there >

[ceph-users] Down OSDs blocking read requests.

2016-11-18 Thread Iain Buclaw
Hi, Follow up from the suggestion to use any of the following options: - client_mount_timeout - rados_mon_op_timeout - rados_osd_op_timeout To mitigate the waiting time being blocked on requests. Is there really no other way around this? If two OSDs go down that between them have the both
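
For completeness, the three options named above are client-side settings; a sketch of where they go (values in seconds; the per-invocation override form assumes the generic config-flag parsing the Ceph CLIs use):

    # ceph.conf, [client] section:
    #   client mount timeout  = 30
    #   rados mon op timeout  = 30
    #   rados osd op timeout  = 30
    # or per invocation:
    rados -p POOL get OBJECT out.bin --rados-osd-op-timeout 30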

Re: [ceph-users] ceph cluster having blocke requests very frequently

2016-11-18 Thread Thomas Danan
Hi Nick, Here are some logs. The system is in the IST TZ and I have filtered the logs to get only the last 2 hours, during which we can observe the issue. In that particular case, the issue is illustrated with the following OSDs. Primary: ID:607 PID:2962227 HOST:10.137.81.18 Secondary1: ID:528 PID:3721728
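
When chasing a specific slow OSD like osd.607 above, the admin socket shows exactly which ops are stuck; a sketch, run on that OSD's host:

    ceph daemon osd.607 dump_ops_in_flight   # currently blocked ops and how long they have waited
    ceph daemon osd.607 dump_historic_ops    # recent slow ops with per-stage timings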

Re: [ceph-users] After OSD Flap - FAILED assert(oi.version == i->first)

2016-11-18 Thread Nick Fisk
Hi Sam, Updated with some more info. > -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > Samuel Just > Sent: 17 November 2016 19:02 > To: Nick Fisk > Cc: Ceph Users > Subject: Re: [ceph-users]

[ceph-users] I want to submit a PR - Can someone guide me

2016-11-18 Thread Nick Fisk
Hi All, I want to submit a PR to include a fix for this tracker bug, as I have just realised I've been experiencing it. http://tracker.ceph.com/issues/9860 I understand that I would also need to update the debian/ceph-osd.* to get the file copied, however I'm not quite sure where this new file
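
The mechanics of getting such a change in are the standard GitHub flow; a sketch (the branch name and fork remote are placeholders, and Ceph requires a Signed-off-by line, hence commit -s):

    git clone https://github.com/ceph/ceph.git && cd ceph
    git checkout -b wip-fix-9860
    # ...edit, add the new file, update debian/ceph-osd.*...
    git commit -s -a -m 'osd: fix ... (fixes #9860)'
    git push YOUR_FORK wip-fix-9860   # then open a PR against ceph/ceph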

Re: [ceph-users] index-sharding on existing bucket ?

2016-11-18 Thread Orit Wasserman
Hi, We have support for an offline bucket resharding admin command: https://github.com/ceph/ceph/pull/11230. It will be available in Jewel 10.2.5. Orit On Thu, Nov 17, 2016 at 9:11 PM, Yoann Moulin wrote: > Hello, > > is it possible to shard the index of existing buckets
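
Once 10.2.5 is out, the new command looks roughly like this (a sketch; BUCKET is a placeholder, and writes to the bucket should be quiesced while the offline reshard runs):

    radosgw-admin bucket reshard --bucket=BUCKET --num-shards=64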

[ceph-users] ceph mon eating lots of memory after upgrade0.94.2 to 0.94.9

2016-11-18 Thread Corin Langosch
Hi, about 2 weeks ago I upgraded a rather small cluster from ceph 0.94.2 to 0.94.9. The upgrade went fine, the cluster is running stable. But I just noticed that one monitor is already eating 20 GB of memory, growing slowly over time. The other 2 mons look fine. The disk space used by the
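
If the build uses tcmalloc, the mon's heap can be inspected and trimmed without a restart, which helps tell a real leak from memory that was freed but never returned to the OS; a sketch, with mon.a as a placeholder id:

    ceph tell mon.a heap stats     # tcmalloc view of in-use vs. freed-but-held memory
    ceph tell mon.a heap release   # return freed pages to the OS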

Re: [ceph-users] Register ceph daemons on initctl

2016-11-18 Thread Jaemyoun Lee
Thanks! I solved it with the ceph-osd command. So... there is no script to install Upstart, is there? Jae On Fri, Nov 18, 2016 at 3:26 PM 钟佳佳 wrote: > if you built from git repo tag v10.2.3, > refer to the links below from ceph.com >
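
On the install-script question: the Upstart job files live in the source tree and are normally installed by the distro packaging rather than by make install; a sketch, assuming a source checkout at ./ceph:

    sudo cp ceph/src/upstart/ceph-osd.conf /etc/init/
    sudo start ceph-osd id=0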