Re: [ceph-users] (no subject)

2016-07-12 Thread Anand Bhat
Use qemu-img convert to convert from one format to another. Regards, Anand On Mon, Jul 11, 2016 at 9:37 PM, Gaurav Goyal wrote: > Thanks! > > I need to create a VM whose qcow2 image file is 6.7 GB but whose raw image is > 600 GB, which is too big. > Is there a way that I
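A minimal sketch of the conversion Anand is pointing at, using qemu-img's convert subcommand (file, pool, and image names here are placeholders, not taken from the thread):

  # qemu-img convert -p -f qcow2 -O raw vm-image.qcow2 vm-image.raw
  # qemu-img convert -p -f qcow2 -O raw vm-image.qcow2 rbd:volumes/vm-image

The first form writes a raw file to local disk; the second (assuming qemu was built with rbd support) streams the raw data straight into an RBD image and avoids staging the full 600 GB raw file.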

Re: [ceph-users] cephfs change metadata pool?

2016-07-12 Thread Christian Balzer
Hello, On Tue, 12 Jul 2016 20:57:00 -0500 Di Zhang wrote: > I am using 10G infiniband for cluster network and 1G ethernet for public. Hmm, very unbalanced, but I guess that's HW you already had. > Because I don't have enough slots on the node, I am using three files on > the OS drive (SSD)

Re: [ceph-users] Cache Tier configuration

2016-07-12 Thread Christian Balzer
Hello, On Tue, 12 Jul 2016 11:01:30 +0200 Mateusz Skała wrote: > Thank You for replay. Answers below. > > > -Original Message- > > From: Christian Balzer [mailto:ch...@gol.com] > > Sent: Tuesday, July 12, 2016 3:37 AM > > To: ceph-users@lists.ceph.com > > Cc: Mateusz Skała

[ceph-users] anybody looking for ceph jobs?

2016-07-12 Thread Ken Peng
Is there anybody looking for a job related to dev/ops on Ceph? If so, we (a NASDAQ-listed company) can provide one. Please PM me for details. Thanks.

Re: [ceph-users] cephfs change metadata pool?

2016-07-12 Thread Di Zhang
I am using 10G infiniband for cluster network and 1G ethernet for public. Because I don't have enough slots on the node, I am using three files on the OS drive (SSD) for journaling, which really improved things but did not entirely solve the problem. I am quite happy with the current IOPS, which range
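For reference, a journal-on-file setup like the one Di Zhang describes is usually wired up per OSD in ceph.conf; a hedged sketch, with the path and OSD id invented for illustration:

  [osd.3]
  osd journal = /var/lib/ceph/journal/osd.3.journal
  osd journal size = 5120

Each OSD needs its own journal file, and the directory holding it must exist on the SSD before the OSD is started.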

Re: [ceph-users] cephfs change metadata pool?

2016-07-12 Thread Christian Balzer
Hello, On Tue, 12 Jul 2016 19:54:38 -0500 Di Zhang wrote: > It's a 5 nodes cluster. Each node has 3 OSDs. I set pg_num = 512 for both > cephfs_data and cephfs_metadata. I experienced some slow/blocked requests > issues when I was using hammer 0.94.x and prior. So I was thinking if the > pg_num

Re: [ceph-users] cephfs change metadata pool?

2016-07-12 Thread Di Zhang
It's a 5 nodes cluster. Each node has 3 OSDs. I set pg_num = 512 for both cephfs_data and cephfs_metadata. I experienced some slow/blocked requests issues when I was using hammer 0.94.x and prior. So I was thinking if the pg_num is too large for metadata. I just upgraded the cluster to Jewel

Re: [ceph-users] multiple journals on SSD

2016-07-12 Thread Christian Balzer
Hello, did you actually read my full reply last week, the in-line parts, not just the top bit? http://www.spinics.net/lists/ceph-users/msg29266.html On Tue, 12 Jul 2016 16:16:09 +0300 George Shuklin wrote: > Yes, linear io speed was concern during benchmark. I can not predict how > much

Re: [ceph-users] SSD Journal

2016-07-12 Thread Christian Balzer
Hello, On Tue, 12 Jul 2016 19:14:14 +0200 (CEST) Wido den Hollander wrote: > > > On 12 July 2016 at 15:31 Ashley Merrick wrote: > > > > > > Hello, > > > > Looking at final stages of planning / setup for a CEPH Cluster. > > > > Per a Storage node looking @ > > > >

Re: [ceph-users] Quick short survey which SSDs

2016-07-12 Thread Christian Balzer
Hello Warren, On Tue, 12 Jul 2016 21:09:16 + Warren Wang - ISD wrote: > Our testing so far shows that it's a pretty good drive. We use it for the > actual backing OSD, but the journal is on NVMe. The raw results indicate > that it's a reasonable journal too, if you need to colocate, but

Re: [ceph-users] cephfs change metadata pool?

2016-07-12 Thread Gregory Farnum
I'm not at all sure that rados cppool actually captures everything (it might). Doug has been working on some similar stuff for disaster recovery testing and can probably walk you through moving over. But just how large *is* your metadata pool in relation to others? Having a too-large pool doesn't
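Greg's question about the metadata pool's size relative to the other pools can be answered directly from the cluster; either of the following prints per-pool usage and object counts:

  # ceph df detail
  # rados df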

Re: [ceph-users] Advice on increasing pgs

2016-07-12 Thread Robin Percy
Thanks for the clarification Christian. Good to know about the potential increase in OSD usage. As you said, given how much available capacity we have, we're betting on the distribution not getting much worse. But we'll look at re-weighting if things go sideways. Cheers, Robin On Mon, Jul 11,

[ceph-users] cephfs change metadata pool?

2016-07-12 Thread Di Zhang
Hi, Is there any way to change the metadata pool for a cephfs without losing any existing data? I know how to clone the metadata pool using rados cppool. But the filesystem still links to the original metadata pool no matter what you name it. The motivation here is to decrease the pg_num
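For context, the copy Di Zhang refers to looks roughly like the sketch below (pool names and the pg count are assumptions); note that, as pointed out elsewhere in the thread, rados cppool may not capture everything, and CephFS will keep referencing the original metadata pool by ID regardless of what the copy is named:

  # ceph osd pool create cephfs_metadata_copy 64
  # rados cppool cephfs_metadata cephfs_metadata_copy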

Re: [ceph-users] Emergency! Production cluster is down

2016-07-12 Thread Chandrasekhar Reddy
Hi Wido, Thank you for helping out. It worked like a charm. I followed these steps: http://docs.ceph.com/docs/master/rados/operations/add-or-rm-mons/#removing-monitors Can you help by sharing any good docs that deal with backups? Thanks, Chandra. On Tue, Jul 12, 2016 at 10:37 PM, Chandrasekhar

Re: [ceph-users] Quick short survey which SSDs

2016-07-12 Thread Warren Wang - ISD
Our testing so far shows that it's a pretty good drive. We use it for the actual backing OSD, but the journal is on NVMe. The raw results indicate that it's a reasonable journal too, if you need to colocate, but you'll exhaust write performance pretty quickly depending on your workload. We also

[ceph-users] setting crushmap while creating pool fails

2016-07-12 Thread Oliver Dzombic
Hi, I have a crushmap which looks like: http://pastebin.com/YC9FdTUd I issue: # ceph osd pool create vmware1 64 cold-storage-rule pool 'vmware1' created I would expect the pool to have ruleset 2. # ceph osd pool ls detail pool 10 'vmware1' replicated size 3 min_size 2 crush_ruleset 1
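If the pool really did land on the wrong ruleset, it can be changed after creation; a minimal sketch, assuming ruleset 2 is the intended cold-storage rule from the pasted crushmap:

  # ceph osd pool set vmware1 crush_ruleset 2

(On releases from Luminous onward the key is called crush_rule instead of crush_ruleset.)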

Re: [ceph-users] multiple journals on SSD

2016-07-12 Thread Udo Lembke
Hi Vincent, On 12.07.2016 15:03, Vincent Godin wrote: > Hello. > > I've been testing the Intel 3500 as a journal store for a few HDD-based OSDs. I > stumbled on issues with multiple partitions (>4) and UDEV (sda5, sda6, etc. > sometimes do not appear after partition creation). And I'm thinking that >

Re: [ceph-users] ceph + vmware

2016-07-12 Thread Oliver Dzombic
Hi Jack, thank you! What has reliability to do with rbd_cache = true ? I mean aside of the fact, that if a host powers down, the "flying" data are lost. Are there any special limitations / issues with rbd_cache = true and iscsi tgt ? -- Mit freundlichen Gruessen / Best regards Oliver
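For context, the librbd cache Oliver is asking about is controlled on the client (here, the tgt host) via ceph.conf; a sketch of commonly used settings, not a recommendation for this particular iSCSI setup:

  [client]
  rbd cache = true
  rbd cache writethrough until flush = true
  rbd cache size = 33554432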

[ceph-users] Setting rados_mon_op_timeout/rados_osd_op_timeout with RGW

2016-07-12 Thread Wido den Hollander
Hi, Currently when something goes wrong with the backing Ceph cluster an RGW will block forever because RADOS operations block indefinitely. Using 'rados_osd_op_timeout' and 'rados_mon_op_timeout' we can tune this behaviour by setting timeouts to 30 seconds. [0] By doing so we can give
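A sketch of what that would look like in ceph.conf for an RGW instance (the client section name is a placeholder; 30 seconds is the value proposed above):

  [client.radosgw.gateway]
  rados osd op timeout = 30
  rados mon op timeout = 30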

Re: [ceph-users] Object creation in librbd

2016-07-12 Thread Mansour Shafaei Moghaddam
Thank you both a lot. On Tue, Jul 12, 2016 at 7:56 AM, Jason Dillaman wrote: > All the various types of IO operations against backing objects are all > confined within AioObjectRequest.cc. At a high-level, the IO will follow > this path: > > fio -> librbd AIO API -> librbd

Re: [ceph-users] SSD Journal

2016-07-12 Thread Wido den Hollander
> On 12 July 2016 at 15:31 Ashley Merrick wrote: > > > Hello, > > Looking at final stages of planning / setup for a CEPH Cluster. > > Per a Storage node looking @ > > 2 x SSD OS / Journal > 10 x SATA Disk > > Will have a small Raid 1 Partition for the OS, however

Re: [ceph-users] Realistic Ceph Client OS

2016-07-12 Thread Mike Jacobacci
Thank you! firefly looks like it will work. > On Jul 12, 2016, at 10:04 AM, Sean Redmond wrote: > > Hi, > > If your clients can support the firefly tunable you can set firefly: > > 'ceph osd crush tunables firefly' > > Or you can hide the warning: > > mon warn on

Re: [ceph-users] Realistic Ceph Client OS

2016-07-12 Thread Wido den Hollander
> On 12 July 2016 at 18:37 Mike Jacobacci wrote: > > > Hi All, > > Is mounting rbd only really supported in Ubuntu? All of our servers are > CentOS 7 or RedHat 7 and since they are at Kernel 3.10, I can’t get rbd > working. Even if I use legacy tunables or set tunables

Re: [ceph-users] Emergency! Production cluster is down

2016-07-12 Thread Chandrasekhar Reddy
Thanks Wido.. I will give it a try. Thanks, Chandra On Tue, Jul 12, 2016 at 10:35 PM, Wido den Hollander <w...@42on.com> wrote: > On 12 July 2016 at 19:00 Chandrasekhar Reddy wrote: > > > Thanks for quick reply.. > > Should I need to remove

Re: [ceph-users] Emergency! Production cluster is down

2016-07-12 Thread Wido den Hollander
> On 12 July 2016 at 19:00 Chandrasekhar Reddy wrote: > > > Thanks for quick reply.. > > Should I need to remove cephx in osd nodes also?? > Disable all cephx on all nodes in the ceph.conf. See:
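A sketch of the emergency change being described, disabling cephx cluster-wide in ceph.conf (it has to be pushed to every node and the daemons restarted, and it should be reverted once the cluster is healthy again):

  [global]
  auth cluster required = none
  auth service required = none
  auth client required = none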

Re: [ceph-users] Realistic Ceph Client OS

2016-07-12 Thread Sean Redmond
Hi, If your clients can support the firefly tunable you can set firefly: 'ceph osd crush tunables firefly' Or you can hide the warning: mon warn on legacy crush tunables = false in [mon] section in ceph.conf. Thanks On Tue, Jul 12, 2016 at 5:59 PM, Mike Jacobacci wrote: >

Re: [ceph-users] Realistic Ceph Client OS

2016-07-12 Thread Mike Jacobacci
Thanks, Can I ignore this warning then? health HEALTH_WARN crush map has legacy tunables (require bobtail, min is firefly) Cheers, Mike > On Jul 12, 2016, at 9:57 AM, Sean Redmond wrote: > > Hi, > > Take a look at the docs here >

Re: [ceph-users] Emergency! Production cluster is down

2016-07-12 Thread Chandrasekhar Reddy
Thanks for quick reply.. Should I need to remove cephx in osd nodes also?? Thanks, Chandra On Tue, Jul 12, 2016 at 10:22 PM, Oliver Dzombic <i...@ip-interactive.de> wrote: Hi, fast aid: remove cephx authentication. -- Mit freundlichen Gruessen / Best regards

Re: [ceph-users] Realistic Ceph Client OS

2016-07-12 Thread Sean Redmond
Hi, Take a look at the docs here ( http://docs.ceph.com/docs/jewel/rados/operations/crush-map/#tunables) for details on what tunables work with which kernel version, and how you can change them, e.g.: 'ceph osd crush tunables bobtail' You could use a mainline kernel from here

Re: [ceph-users] Emergency! Production cluster is down

2016-07-12 Thread Oliver Dzombic
Hi, fast aid: remove cephx authentication. -- Mit freundlichen Gruessen / Best regards Oliver Dzombic IP-Interactive mailto:i...@ip-interactive.de

[ceph-users] Emergency! Production cluster is down

2016-07-12 Thread Chandrasekhar Reddy
Hi Guys, need help. I had 3 monitor nodes and 2 went down (disk got corrupted). After some time even the 3rd monitor went unresponsive, so I rebooted the 3rd node. It came up but ceph is not working, so I tried to remove the 2 failed monitors from the ceph.conf file and restarted the mon and osd.
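For the record, the usual way out of this situation is to let the single surviving monitor form quorum on its own by removing the dead monitors from its monmap; a hedged sketch, with mon1 standing in for the survivor and mon2/mon3 for the failed ones (stop the surviving mon daemon first):

  # ceph-mon -i mon1 --extract-monmap /tmp/monmap
  # monmaptool /tmp/monmap --rm mon2
  # monmaptool /tmp/monmap --rm mon3
  # ceph-mon -i mon1 --inject-monmap /tmp/monmap

This matches the procedure in the add-or-rm-mons documentation that is linked later in the thread.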

Re: [ceph-users] Realistic Ceph Client OS

2016-07-12 Thread Mike Jacobacci
Hi Sean, Thanks for the quick response, this is what I see in dmesg: set mismatch, my 102b84a842a42 < server's 40102b84a842a42, missing 400 How can I set the tunable low enough? And what does that mean for performance? Cheers, Mike > On Jul 12, 2016, at 9:43 AM, Sean Redmond

Re: [ceph-users] Realistic Ceph Client OS

2016-07-12 Thread Sean Redmond
Hi, It should work for you with kernel 3.10 as long as tunables are set low enough - Do you see anything in 'dmesg'? Thanks On Tue, Jul 12, 2016 at 5:37 PM, Mike Jacobacci wrote: > Hi All, > > Is mounting rbd only really supported in Ubuntu? All of our servers are >

[ceph-users] Realistic Ceph Client OS

2016-07-12 Thread Mike Jacobacci
Hi All, Is mounting rbd only really supported in Ubuntu? All of our servers are CentOS 7 or RedHat 7 and since they are at Kernel 3.10, I can’t get rbd working. Even if I use legacy tunables or set tunables to 1/2/3… Nothing will mount the block device. The only answer I can find to get

Re: [ceph-users] Object creation in librbd

2016-07-12 Thread Jason Dillaman
All the various types of IO operations against backing objects are all confined within AioObjectRequest.cc. At a high-level, the IO will follow this path: fio -> librbd AIO API -> librbd AioImageRequestWQ [1] -> librbd AioImageRequest (-> osdc ObjectCacher [2] -> librbd LibrbdWriteback [3]) ->

Re: [ceph-users] Can't remove /var/lib/ceph/osd/ceph-53 dir

2016-07-12 Thread William Josefsson
Yes, I can # umount /dev/sdf4 without error; however # rm -rf /var/lib/ceph/osd/ceph-53 still fails with 'Device or resource busy'. After trying many commands, what I finally did was: [root@cnode-1 ~]# systemctl stop systemd-udevd.service Warning: Stopping systemd-udevd.service, but it can
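Before stopping udev it is usually worth checking what still holds the directory or its former mount open; a minimal sketch using the path from this thread:

  # fuser -vm /var/lib/ceph/osd/ceph-53
  # lsof +D /var/lib/ceph/osd/ceph-53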

[ceph-users] osd inside LXC

2016-07-12 Thread Guillaume Comte
Hi, I am currently defining a storage architecture based on ceph, and I wish to know whether I have misunderstood anything. I plan to deploy one OSD per free hard drive on each server, with each OSD inside an LXC container. Then, I wish to turn the server itself into an rbd

[ceph-users] SSD Journal

2016-07-12 Thread Ashley Merrick
Hello, Looking at final stages of planning / setup for a CEPH Cluster. Per a Storage node looking @ 2 x SSD OS / Journal 10 x SATA Disk Will have a small Raid 1 Partition for the OS, however not sure if it is best to do: 5 x journal per SSD, or 10 x journal on a RAID 1 of two SSDs. Is the
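For what it's worth, a sketch of how a journal on a separate SSD is usually wired up with ceph-disk in this era (device names are placeholders; when the journal argument is a whole device, ceph-disk carves a new journal partition out of it):

  # ceph-disk prepare /dev/sdc /dev/sda
  # ceph-disk activate /dev/sdc1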

Re: [ceph-users] multiple journals on SSD

2016-07-12 Thread George Shuklin
Yes, linear IO speed was a concern during the benchmark. I cannot predict how much linear IO would be generated by clients (compared to IOPS), so we are going to balance HDD-OSDs per SSD according to real usage. If users would generate too much random IO, we will raise the HDD/SSD ratio, if they would

Re: [ceph-users] multiple journals on SSD

2016-07-12 Thread Guillaume Comte
2016-07-12 15:03 GMT+02:00 Vincent Godin : > Hello. > > I've been testing the Intel 3500 as a journal store for a few HDD-based OSDs. I > stumbled on issues with multiple partitions (>4) and UDEV (sda5, sda6, etc. > sometimes do not appear after partition creation). And I'm thinking

Re: [ceph-users] multiple journals on SSD

2016-07-12 Thread Vincent Godin
Hello. I've been testing the Intel 3500 as a journal store for a few HDD-based OSDs. I stumbled on issues with multiple partitions (>4) and UDEV (sda5, sda6, etc. sometimes do not appear after partition creation). And I'm thinking that partitions are not that useful for OSD management, because Linux does not allow
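A hedged sketch of the usual workaround when a freshly created journal partition does not show up under /dev (device name and partition layout are placeholders):

  # partprobe /dev/sda
  # udevadm settle --timeout=10
  # ls /dev/sda*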

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-12 Thread Goncalo Borges
Hi All... Thank you for continuing to follow this already very long thread. Pat and Greg are correct in their assumption regarding the 10gb virtual memory footprint I see for ceph-fuse process in our cluster with 12 core (24 because of hyperthreading) machines and 96 gb of RAM. The source is

Re: [ceph-users] Can't remove /var/lib/ceph/osd/ceph-53 dir

2016-07-12 Thread Pisal, Ranjit Dnyaneshwar
Try umount /dev/ - this should unmount the directory. After this, a host restart may be needed for the changes to take effect. Best Regards, Ranjit From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of William Josefsson Sent: Tuesday, July 12, 2016 4:15 PM To: ceph-users@lists.ceph.com

Re: [ceph-users] Advice on meaty CRUSH map update

2016-07-12 Thread Christian Balzer
On Tue, 12 Jul 2016 12:50:26 +0200 (CEST) Wido den Hollander wrote: > > > On 12 July 2016 at 12:35 Simon Murray > > wrote: > > > > > > Hi all. > > > > I'm about to perform a rather large reorganization of our cluster and > > thought I'd get some insight

Re: [ceph-users] Antw: Re: Flood of 'failed to encode map X with expected crc' on 1800 OSD cluster after upgrade

2016-07-12 Thread Wido den Hollander
> On 12 July 2016 at 12:18 Steffen Weißgerber wrote: > > > > > >>> Christian Balzer wrote on Tuesday, 12 July 2016 at > >>> 08:47: > > > Hello, > > > > On Tue, 12 Jul 2016 08:39:16 +0200 (CEST) Wido den Hollander wrote: > > > >> Hi, > >> > >>

Re: [ceph-users] Advice on meaty CRUSH map update

2016-07-12 Thread Wido den Hollander
> On 12 July 2016 at 12:35 Simon Murray > wrote: > > > Hi all. > > I'm about to perform a rather large reorganization of our cluster and > thought I'd get some insight from the community before going any further. > > The current state we have (logically) is

[ceph-users] Can't remove /var/lib/ceph/osd/ceph-53 dir

2016-07-12 Thread William Josefsson
Hi Cephers, I have a problem removing the /var/lib/ceph/osd/ceph-53 dir that was used by OSD.53, which I have since removed. The way that I removed the OSD: 1. ceph osd out 53 2. sudo service ceph stop osd.53 3. ceph osd crush remove osd.53 4. ceph auth del osd.53 5. ceph osd rm 53 6. sudo umount

[ceph-users] Advice on meaty CRUSH map update

2016-07-12 Thread Simon Murray
Hi all. I'm about to perform a rather large reorganization of our cluster and thought I'd get some insight from the community before going any further. The current state we have (logically) is two trees, one for spinning rust, one for SSD. Chassis are the current failure domain, and are all

[ceph-users] Antw: Re: Flood of 'failed to encode map X with expected crc' on 1800 OSD cluster after upgrade

2016-07-12 Thread Steffen Weißgerber
>>> Christian Balzer wrote on Tuesday, 12 July 2016 at >>> 08:47: > Hello, > > On Tue, 12 Jul 2016 08:39:16 +0200 (CEST) Wido den Hollander wrote: > >> Hi, >> >> I am upgrading a 1800 OSD cluster from Hammer 0.94.5 to 0.94.7 prior to > going to Jewel and while doing so

Re: [ceph-users] Fwd: Ceph OSD suicide himself

2016-07-12 Thread Lionel Bouton
Hi, On 12/07/2016 02:51, Brad Hubbard wrote: > [...] This is probably a fragmentation problem: typical rbd access patterns cause heavy BTRFS fragmentation. >>> To the extent that operations take over 120 seconds to complete? Really? >> Yes, really. I had these too. By default

Re: [ceph-users] Cache Tier configuration

2016-07-12 Thread Mateusz Skała
Thank You for replay. Answers below. > -Original Message- > From: Christian Balzer [mailto:ch...@gol.com] > Sent: Tuesday, July 12, 2016 3:37 AM > To: ceph-users@lists.ceph.com > Cc: Mateusz Skała > Subject: Re: [ceph-users] Cache Tier configuration > > >

Re: [ceph-users] ceph master build fails on src/gmock, workaround?

2016-07-12 Thread Brad Hubbard
This was resolved in http://tracker.ceph.com/issues/16646 On Sun, Jul 10, 2016 at 5:09 PM, Brad Hubbard wrote: > On Sat, Jul 09, 2016 at 10:43:52AM +, Kevan Rehm wrote: >> Greetings, >> >> I cloned the master branch of ceph at https://github.com/ceph/ceph.git >> onto a

Re: [ceph-users] Object creation in librbd

2016-07-12 Thread Wido den Hollander
> On 12 July 2016 at 0:28 Mansour Shafaei Moghaddam wrote: > > > Can anyone explain or at least refer to the lines of the code in librbd by > which objects are created? I need to know the relation between objects and > fio's iodepth... Objects are not created
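A quick way to observe the point being made here, namely that RBD backing objects are created lazily as data is written rather than when the image is created (pool and image names are placeholders; assumes a default format 2 image on Jewel and an otherwise empty test pool):

  # rbd create --size 1024 rbd/lazytest
  # rados -p rbd ls | grep -c rbd_data
  # rbd bench-write rbd/lazytest --io-total 16777216
  # rados -p rbd ls | grep -c rbd_data

The first count is 0 right after creation; only after the benchmark writes do rbd_data.* objects show up in the pool.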

Re: [ceph-users] Flood of 'failed to encode map X with expected crc' on 1800 OSD cluster after upgrade

2016-07-12 Thread Wido den Hollander
> On 12 July 2016 at 8:47 Christian Balzer wrote: > > > > Hello, > > On Tue, 12 Jul 2016 08:39:16 +0200 (CEST) Wido den Hollander wrote: > > > Hi, > > > > I am upgrading a 1800 OSD cluster from Hammer 0.94.5 to 0.94.7 prior to > > going to Jewel and while doing so I see

Re: [ceph-users] Flood of 'failed to encode map X with expected crc' on 1800 OSD cluster after upgrade

2016-07-12 Thread Christian Balzer
Hello, On Tue, 12 Jul 2016 08:39:16 +0200 (CEST) Wido den Hollander wrote: > Hi, > > I am upgrading a 1800 OSD cluster from Hammer 0.94.5 to 0.94.7 prior to going > to Jewel and while doing so I see the monitors being flooded with these > messages: > Google is your friend (and so is the

[ceph-users] Flood of 'failed to encode map X with expected crc' on 1800 OSD cluster after upgrade

2016-07-12 Thread Wido den Hollander
Hi, I am upgrading a 1800 OSD cluster from Hammer 0.94.5 to 0.94.7 prior to going to Jewel and while doing so I see the monitors being flooded with these messages: 2016-07-12 08:28:12.919748 osd.1200 [WRN] failed to encode map e130549 with expected crc 2016-07-12 08:28:12.921943 osd.1338

Re: [ceph-users] Advice on increasing pgs

2016-07-12 Thread Christian Balzer
Hello, On Tue, 12 Jul 2016 03:43:41 + Robin Percy wrote: > First off, thanks for the great response David. > Yes, that was a very good writeup. > If I understand correctly, you're saying there are two distinct costs to > consider: peering, and backfilling. The backfilling cost is a
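For readers following the thread, the increase itself is a two-step pool change, and is commonly done in modest increments precisely to spread the peering and backfilling cost being discussed (pool name and target value are placeholders):

  # ceph osd pool set rbd pg_num 1024
  # ceph osd pool set rbd pgp_num 1024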