Re: [ceph-users] New to Ceph - osd autostart problem

2016-07-14 Thread Dirk Laurenz
Hello George, I did what you suggested, but it didn't help... no autostart - I have to start them manually. root@cephosd01:~# sgdisk -i 1 /dev/sdb Partition GUID code: 4FBD7E29-9D25-41B8-AFD0-062C0CEFF05D (Unknown) Partition unique GUID: 48B7EC4E-A582-4B84-B823-8C3A36D9BB0A First sector:
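
The type GUID in the output above appears to be the standard ceph-disk "OSD data" type code, so the partition itself looks correctly tagged; a sketch of things worth trying to get activation going (assuming the OSD was prepared with ceph-disk and its data partition is /dev/sdb1):

    # Re-trigger the udev add events that normally start the OSD at boot
    udevadm trigger --subsystem-match=block --action=add
    # Activate by hand - either a single data partition ...
    ceph-disk activate /dev/sdb1
    # ... or everything ceph-disk finds tagged as a Ceph OSD
    ceph-disk activate-all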

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-14 Thread Goncalo Borges
Thanks Zheng... Now that we have identified the exact context in which the segfault appears (only on AMD 62XX), I think it should be easier to understand in which situations the crash appears. My current compilation is ongoing and I will then test it. If it fails, I will recompile including

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-14 Thread Brad Hubbard
On Fri, Jul 15, 2016 at 11:19:12AM +0800, Yan, Zheng wrote: > On Fri, Jul 15, 2016 at 9:35 AM, Goncalo Borges > wrote: > > So, we are hoping that compiling 10.2.2 on an Intel processor without the > > AVX extensions will solve our problem. > > > > Does this make
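
A quick way to check whether a given host's CPU advertises the AVX extensions discussed in this thread (a sketch; it simply greps the flags the kernel reports):

    # List the unique AVX-related CPU flags; empty output means the CPU has no AVX support
    grep -o 'avx[^ ]*' /proc/cpuinfo | sort -u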

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-14 Thread Yan, Zheng
On Fri, Jul 15, 2016 at 9:35 AM, Goncalo Borges wrote: > Hi All... > > I've seen that Zheng, Brad, Pat and Greg already updated or made some > comments on the bug issue. Zheng also proposes a simple patch. However, I do > have a bit more information. We do think we

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-14 Thread Brad Hubbard
On Fri, Jul 15, 2016 at 11:35:10AM +1000, Goncalo Borges wrote: > Hi All... > > I've seen that Zheng, Brad, Pat and Greg already updated or made some > comments on the bug issue. Zheng also proposes a simple patch. However, I do > have a bit more information. We do think we have identified the

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-14 Thread Goncalo Borges
Hi All... I've seen that Zheng, Brad, Pat and Greg already updated or made some comments on the bug issue. Zheng also proposes a simple patch. However, I do have a bit more information. We do think we have identified the source of the problem and that we can correct it. Therefore, I would

Re: [ceph-users] Lessons learned upgrading Hammer -> Jewel

2016-07-14 Thread 席智勇
Good job, thank you for sharing, Wido~ it's very useful~ 2016-07-14 14:33 GMT+08:00 Wido den Hollander : > To add, the RGWs upgraded just fine as well. > > No regions in use here (yet!), so that upgraded as it should. > > Wido > > > On 13 July 2016 at 16:56, Wido den

Re: [ceph-users] Terrible RBD performance with Jewel

2016-07-14 Thread Adrian Saul
I would suggest caution with "filestore_odsync_write" - it's fine on good SSDs, but on poor SSDs or spinning disks it will kill performance. From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Somnath Roy Sent: Friday, 15 July 2016 3:12 AM To: Garg, Pankaj;

[ceph-users] Qemu with customized librbd/librados

2016-07-14 Thread ZHOU Yuan
Hi list, I ran into an issue customizing librbd (linked with jemalloc) against the stock qemu in Ubuntu Trusty. Stock qemu depends on librbd1 and librados2 (0.80.x). These two libraries are installed at /usr/lib/x86_64-linux-gnu/lib{rbd,rados}.so. The path is included in
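
One way to check which librbd/librados the stock qemu binary actually resolves at run time, and to point it at a custom build instead, is via the dynamic linker (a sketch; /opt/ceph is a hypothetical install prefix):

    # Show which librbd/librados the distro qemu picks up
    ldd /usr/bin/qemu-system-x86_64 | grep -E 'librbd|librados'
    # Put a custom build first on the loader path and check again
    export LD_LIBRARY_PATH=/opt/ceph/lib:$LD_LIBRARY_PATH
    ldd /usr/bin/qemu-system-x86_64 | grep -E 'librbd|librados'

A system-wide alternative is to drop the custom library directory into /etc/ld.so.conf.d/ and run ldconfig, so libvirt-launched guests pick it up as well.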

[ceph-users] Slow request on node reboot

2016-07-14 Thread Luis Ramirez
Hi, I've a cluster with 3 MON nodes and 5 OSD nodes. If I reboot one of the OSD nodes I get slow requests waiting for active. 2016-07-14 19:39:07.996942 osd.33 10.255.128.32:6824/7404 888 : cluster [WRN] slow request 60.627789 seconds old, received at 2016-07-14 19:38:07.369009:
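
A common way to soften a planned OSD node reboot is to stop the cluster from marking its OSDs out while it is down (a sketch; this avoids backfill, though requests to PGs whose acting OSDs were on the rebooted node still pause briefly during peering):

    ceph osd set noout      # before rebooting the OSD node
    # ... reboot the node and wait until its OSDs are back up/in (check with "ceph -s") ...
    ceph osd unset noout    # afterwards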

Re: [ceph-users] osd inside LXC

2016-07-14 Thread Guillaume Comte
Thanks for all your answers. Today people dedicate servers to act as Ceph OSD nodes, which serve the data stored on them to other dedicated servers that run applications or VMs - can we think about squashing the two into one? On 14 Jul 2016 at 18:15, "Daniel Gryniewicz" wrote: >

Re: [ceph-users] setting crushmap while creating pool fails

2016-07-14 Thread Oliver Dzombic
Hi, thanks for the suggestion. I tried it out. No effect. My ceph.conf looks like: [osd] osd_pool_default_crush_replicated_ruleset = 2 osd_pool_default_size = 2 osd_pool_default_min_size = 1 The complete: http://pastebin.com/sG4cPYCY But the config is completely ignored. If I run # ceph
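
Note that the osd_pool_default_* options are consumed by the monitors when a pool is created, so they are normally placed under [global] (or [mon]) rather than [osd]. Independently of the defaults, the ruleset can be applied explicitly after creating the pool; a sketch for a pre-Luminous release such as Jewel (the pool name is an example):

    ceph osd pool create mypool 128 128 replicated
    ceph osd pool set mypool crush_ruleset 2
    ceph osd pool get mypool crush_ruleset    # verify the pool now uses ruleset 2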

Re: [ceph-users] Ceph RBD object-map and discard in VM

2016-07-14 Thread Jason Dillaman
I would probably be able to resolve the issue fairly quickly if it would be possible for you to provide an RBD replay trace from a slow and a fast mkfs.xfs test run and attach it to the tracker ticket I just opened for this issue [1]. You can follow the instructions here [2] but would only need to

Re: [ceph-users] Ceph RBD object-map and discard in VM

2016-07-14 Thread Vaibhav Bhembre
We have been observing similar behavior. Usually it is the case where we create a new rbd image, expose it to the guest and perform any operation that issues discard to the device. A typical command that's first run on a given device is mkfs, usually with discard on. # time mkfs.xfs -s
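
A quick way to confirm that the discard pass is what dominates the mkfs time on such an image is to repeat the run with the initial discard skipped (a sketch; /dev/rbd0 is an example device and -f will wipe it):

    time mkfs.xfs -f /dev/rbd0        # normal run, discards the whole (sparse) image first
    time mkfs.xfs -f -K /dev/rbd0     # -K skips the discard at mkfs time, for comparison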

Re: [ceph-users] Terrible RBD performance with Jewel

2016-07-14 Thread Garg, Pankaj
Disregard the last msg. Still getting long 0 IOPS periods. From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Garg, Pankaj Sent: Thursday, July 14, 2016 10:05 AM To: Somnath Roy; ceph-users@lists.ceph.com Subject: Re: [ceph-users] Terrible RBD performance with Jewel

Re: [ceph-users] Terrible RBD performance with Jewel

2016-07-14 Thread Somnath Roy
Try increasing the following to, say, 10: osd_op_num_shards = 10 filestore_fd_cache_size = 128. I hope you introduced the following after I told you, so it shouldn't be the cause, it seems (?): filestore_odsync_write = true. Also, comment out the following: filestore_wbthrottle_enable = false
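
Collected into ceph.conf, the suggestion above would look roughly like this under [osd] (values are the ones proposed in this thread; the OSDs need a restart to pick them up):

    [osd]
    osd_op_num_shards = 10
    filestore_fd_cache_size = 128
    # filestore_wbthrottle_enable = false   <- commented out, i.e. wbthrottle back to its default
    # filestore_odsync_write = true         <- see the caution above: only on SSDs that handle O_DSYNC well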

Re: [ceph-users] Terrible RBD performance with Jewel

2016-07-14 Thread Garg, Pankaj
Something in this section is causing all the 0 IOPS issues. I have not been able to nail it down yet. (I did comment out the filestore_max_inline_xattr_size entries, and the problem still exists). If I take out the whole [osd] section, I was able to get rid of IOPS staying at 0 for long periods of

Re: [ceph-users] osd inside LXC

2016-07-14 Thread Daniel Gryniewicz
This is fairly standard for container deployment: one app per container instance. This is how we're deploying docker in our upstream ceph-docker / ceph-ansible as well. Daniel On 07/13/2016 08:41 PM, Łukasz Jagiełło wrote: Hi, Just wonder why you want each OSD inside separate LXC

Re: [ceph-users] SSD Journal

2016-07-14 Thread Christian Balzer
Hello, On Thu, 14 Jul 2016 13:37:54 +0200 Steffen Weißgerber wrote: > > > >>> Christian Balzer wrote on Thursday, 14 July 2016 at > 05:05: > > Hello, > > > Hello, > > > > On Wed, 13 Jul 2016 09:34:35 + Ashley Merrick wrote: > > > >> Hello, > >> > >> Looking at

Re: [ceph-users] Slow requests on cluster.

2016-07-14 Thread Jaroslaw Owsiewski
I think the first symptoms of our problems occurred when we posted this issue: http://tracker.ceph.com/issues/15727 Regards -- Jarek -- Jarosław Owsiewski 2016-07-14 15:43 GMT+02:00 Jaroslaw Owsiewski < jaroslaw.owsiew...@allegrogroup.com>: > 2016-07-14 15:26 GMT+02:00 Luis Periquito

Re: [ceph-users] Slow requests on cluster.

2016-07-14 Thread Jaroslaw Owsiewski
2016-07-14 15:26 GMT+02:00 Luis Periquito : > Hi Jaroslaw, > > several things are springing up to mind. I'm assuming the cluster is > healthy (other than the slow requests), right? > > Yes. > From the (little) information you send it seems the pools are > replicated with

Re: [ceph-users] Slow requests on cluster.

2016-07-14 Thread Luis Periquito
Hi Jaroslaw, several things are springing to mind. I'm assuming the cluster is healthy (other than the slow requests), right? From the (little) information you sent it seems the pools are replicated with size 3, is that correct? Are there any long running delete processes? They usually have
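
To narrow down where the slow requests sit, the cluster-wide view can be compared with the per-OSD one (a sketch; osd.12 is a placeholder for whichever OSD the health output names):

    ceph health detail | grep 'slow request'   # which OSDs are reporting slow requests
    ceph daemon osd.12 dump_historic_ops       # run on the node hosting that OSD: slowest recent ops
    ceph osd perf                              # commit/apply latency per OSD, spots overloaded journals/disks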

[ceph-users] Slow requests on cluster.

2016-07-14 Thread Jaroslaw Owsiewski
Hi, we have a problem with drastic performance slowdowns on a cluster. We use radosgw with the S3 protocol. Our configuration: 153 OSDs, SAS 1.2TB, with journals on SSD disks (ratio 4:1) - no problems with networking, no hardware issues, etc. Output from "ceph df": GLOBAL: SIZE AVAIL RAW

Re: [ceph-users] 40Gb fileserver/NIC suggestions

2016-07-14 Thread Götz Reinicke - IT Koordinator
On 13.07.16 at 17:08, c...@jack.fr.eu.org wrote: > I am using these for other stuff: > http://www.supermicro.com/products/accessories/addon/AOC-STG-b4S.cfm > > If you want NIC, also think of the "network side": SFP+ switches are very > common, 40G is less common, 25G is really new (= really few

Re: [ceph-users] 40Gb fileserver/NIC suggestions

2016-07-14 Thread Götz Reinicke - IT Koordinator
On 13.07.16 at 17:44, David wrote: > Aside from the 10GbE vs 40GbE question, if you're planning to export > an RBD image over smb/nfs I think you are going to struggle to reach > anywhere near 1GB/s in a single threaded read. This is because even > with readahead cranked right up you're still
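
For reference, readahead on a mapped RBD device can be raised like this (a sketch; /dev/rbd0 and the 4 MB value are examples to tune against the image's object size):

    blockdev --getra /dev/rbd0                 # current readahead, in 512-byte sectors
    blockdev --setra 8192 /dev/rbd0            # 8192 sectors = 4 MB, keeps more data in flight per read
    cat /sys/block/rbd0/queue/read_ahead_kb    # the equivalent sysfs knob, in KB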

Re: [ceph-users] Terrible RBD performance with Jewel

2016-07-14 Thread Nick Fisk
I've seen something similar if you are using RBD caching, I found that if you can fill the RBD cache faster than it can flush you get these stalls. I increased the size of the cache and also the flush threshold and this solved the problem. I didn't spend much time looking into it, but it seemed
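
The cache size and flush thresholds Nick mentions live in the client configuration; a sketch with illustrative values (the Jewel defaults are roughly 32 MB cache / 24 MB max dirty):

    [client]
    rbd_cache = true
    rbd_cache_size = 134217728          # 128 MB instead of the 32 MB default
    rbd_cache_max_dirty = 100663296     # 96 MB, so flushing starts well before the cache fills
    rbd_cache_target_dirty = 67108864   # 64 MB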

[ceph-users] fail to add mon in a way of ceph-deploy or manually

2016-07-14 Thread 朱 彤
Using ceph-deploy: I have ceph-node1 as admin and mon, and I would like to add another mon, ceph-node2. On ceph-node1: ceph-deploy mon create ceph-node2 ceph-deploy mon add ceph-node2 The first command warns: [ceph-node2][WARNIN] ceph-node2 is not defined in `mon initial members`
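
One way to clear that warning is to add the new monitor to ceph.conf on the admin node first, push the updated config, and then add the mon (a sketch; for an existing cluster only "mon add" is normally needed, "mon create" is for bootstrapping the initial monitors):

    # In ceph.conf under [global] on ceph-node1:
    #   mon_initial_members = ceph-node1, ceph-node2
    #   mon_host = <ip-of-ceph-node1>, <ip-of-ceph-node2>
    ceph-deploy --overwrite-conf config push ceph-node1 ceph-node2
    ceph-deploy mon add ceph-node2
    ceph quorum_status --format json-pretty   # verify the new mon joined the quorum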

[ceph-users] cephfs-journal-tool lead to data missing and show up

2016-07-14 Thread txm
I am a user of CephFS. Recently I ran into a problem using cephfs-journal-tool. Some strange things happened, described below. 1. After using cephfs-journal-tool and cephfs-table-tool (I ran into the "negative object nums" issue, so I tried these tools to repair the CephFS), I remount
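
For anyone following the same path: the CephFS disaster-recovery documentation suggests keeping a copy of the MDS journal before modifying it, since a journal reset discards metadata updates that were only in the journal, which can make recently created or modified files appear to be missing. A sketch of that sequence:

    cephfs-journal-tool journal export backup.journal.bin   # back up the journal first
    cephfs-journal-tool event recover_dentries summary      # write recoverable entries back to the metadata pool
    cephfs-journal-tool journal reset                       # only then, if still necessary, reset the journal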

Re: [ceph-users] Lessons learned upgrading Hammer -> Jewel

2016-07-14 Thread Wido den Hollander
To add, the RGWs upgraded just fine as well. No regions in use here (yet!), so that upgraded as it should. Wido > On 13 July 2016 at 16:56, Wido den Hollander wrote: > > > Hello, > > The last 3 days I worked at a customer with a 1800 OSD cluster which had to > be upgraded