Re: [ceph-users] Enabling Jumbo Frames on ceph cluser

2017-08-11 Thread w...@42on.com


> On 11 Aug 2017 at 20:22, Sameer Tiwari wrote:
> 
> Hi,
> 
> We ran a test with 1500 MTU and 9000MTU on a small ceph test cluster (3mon + 
> 10 hosts with 2 SSD each, one for journal and one for data) and found minimal 
> ~10% perf improvements.
> 
> We tested with FIO for 4K, 8K and 64K block sizes, using RBD directly.
> 
> Anyone else have any experience with this?
> 

I saw good results as well, though not 10% per se. However, with a 9000-byte MTU 
you send roughly 1/6 of the packets per second over the wire, so I always 
recommend enabling it.
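
For what it's worth, a minimal sketch of enabling and verifying jumbo frames end 
to end (the interface name and peer host are placeholders; every switch port in 
the path has to allow the larger MTU as well):

  # Raise the MTU on the cluster-facing NIC (and make it persistent in your
  # distro's network configuration)
  ip link set dev eth1 mtu 9000

  # Verify the whole path passes 9000-byte frames without fragmentation:
  # 8972 = 9000 - 20 (IPv4 header) - 8 (ICMP header)
  ping -M do -s 8972 <other-osd-host>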

Wido

> Thanks,
> Sameer
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Change Partition Schema on OSD Possible?

2017-01-16 Thread w...@42on.com


> On 17 Jan 2017 at 05:31, Hauke Homburg wrote:
> 
> On 16.01.2017 at 12:24, Wido den Hollander wrote:
>>> On 14 January 2017 at 14:58, Hauke Homburg wrote:
>>> 
>>> 
>>> On 14.01.2017 at 12:59, Wido den Hollander wrote:
> On 14 January 2017 at 11:05, Hauke Homburg wrote:
> 
> 
> Hello,
> 
> In our Ceph Cluster are our HDD in the OSD with 50% DATA in GPT
> Partitions configured. Can we change this Schema to have more Data 
> Storage?
> 
 How do you mean?
 
> Our HDD are 5TB so i hope to have more Space when i change the GPT
> bigger from 2TB to 3 oder 4 TB.
> 
 On a 5TB disks only 50% is used for data? What is the other 50% being used 
 for?
>>> I think for Journal. We worked with cephdeploy an with
>>> data-path:journal-path on a Device.
>> Hmm, that's weird. ceph-deploy uses a 5GB partition by default for the 
>> journal.
>> 
>> Are you sure about that? Can you post a partition scheme of a disk and a 'df 
>> -h' output?
> sgdisk -p /dev/sdg
> Disk /dev/sdg: 11721045168 sectors, 5.5 TiB
> Logical sector size: 512 bytes
> Disk identifier (GUID): BFC047BB-75D7-4F18-B8A6-0C538454FA43
> Partition table holds up to 128 entries
> First usable sector is 34, last usable sector is 11721045134
> Partitions will be aligned on 2048-sector boundaries
> Total free space is 2014 sectors (1007.0 KiB)
> 
> Number  Start (sector)    End (sector)  Size     Code  Name
>      1        10487808    11721045134   5.5 TiB        ceph data
>      2            2048       10487807   5.0 GiB        ceph journal
> 

Looks good. 5GB journal and rest for the OSD's data. Nothing wrong there.
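
For reference, if a data partition ever does need growing, a rough sketch of the 
resize-and-xfs_growfs approach mentioned further down in the quoted thread (the 
OSD id and partition numbers are examples based on the sgdisk output above; the 
type GUID shown is the one ceph-disk normally uses for data partitions, verify 
with --info first; test this on a single OSD before touching the rest):

  # Stop the OSD and unmount its data partition
  stop ceph-osd id=3            # or: systemctl stop ceph-osd@3
  umount /var/lib/ceph/osd/ceph-3

  # Recreate the partition with the SAME start sector and a larger end
  # (0 = grow to the end of the disk)
  sgdisk --info=1 /dev/sdg      # note the current start sector, GUID and name
  sgdisk --delete=1 /dev/sdg
  sgdisk --new=1:10487808:0 --typecode=1:4fbd7e29-9d25-41b8-afd0-062c0ceff05d \
         --change-name=1:"ceph data" /dev/sdg
  partprobe /dev/sdg

  # Mount again and grow XFS into the new space
  mount /dev/sdg1 /var/lib/ceph/osd/ceph-3
  xfs_growfs /var/lib/ceph/osd/ceph-3
  start ceph-osd id=3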

Wido

>> 
>> Wido
>> 
> Can we modify the Partitions without install reinstall the Server?
> 
 Sure! Just like changing any other GPT partition. Don't forget to resize 
 XFS afterwards with xfs_growfs.
 
 However, test this on one OSD/disk first before doing it on all.
 
 Wido
 
> Whats the best Way to do this? Boot the Node with a Rescue CD and change
> the Partition with gparted, and boot the Server again?
> 
> Thanks for help
> 
> Regards
> 
> Hauke
> 
> -- 
> www.w3-creative.de
> 
> www.westchat.de
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>> 
>>> -- 
>>> www.w3-creative.de
>>> 
>>> www.westchat.de
>>> 
> 
> 
> -- 
> www.w3-creative.de
> 
> www.westchat.de
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS

2017-01-16 Thread w...@42on.com


> On 17 Jan 2017 at 03:47, Tu Holmes wrote:
> 
> I could use either one. I'm just trying to get a feel for how stable the 
> technology is in general. 

Stable. Multiple customers of mine run it in production with the kernel client 
and serious load on it. No major problems.

Wido

>> On Mon, Jan 16, 2017 at 3:19 PM Sean Redmond  wrote:
>> What's your use case? Do you plan on using kernel or fuse clients? 
>> 
>> On 16 Jan 2017 23:03, "Tu Holmes"  wrote:
>> So what's the consensus on CephFS?
>> 
>> Is it ready for prime time or not?
>> 
>> //Tu
>> 
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Recover VM Images from Dead Cluster

2016-12-24 Thread w...@42on.com


> On 24 Dec 2016 at 17:20, L. Bader <ceph-us...@lbader.de> wrote:
> 
> Do you have any references on this?
> 
> I searched for something like this quite a lot and did not find anything...
> 

No, I saw it somewhere on the ML I think, but I am not sure.

I just know it is in development or on a todo list somewhere.
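
For anyone attempting the same, a rough sketch of the assembly step described in 
the quoted message below (the image id, mount path and filename pattern are 
assumptions; it presumes format-2 "udata" objects, the default 4 MB object size, 
and leaves missing objects as sparse zero ranges):

  IMAGE_ID=1f2a3b4c5d6e                       # hypothetical image id
  OUT=/recovery/${IMAGE_ID}.raw
  OBJ_SIZE=$((4 * 1024 * 1024))

  find /mnt/osds -name "*udata.${IMAGE_ID}.*__head_*" | while read -r obj; do
      # The hex object number sits between the image id and "__head"
      num_hex=$(basename "$obj" | sed -E "s/.*udata\.${IMAGE_ID}\.([0-9a-fA-F]+)__.*/\1/")
      offset=$((16#${num_hex}))
      # Write each 4 MB object at its own offset; gaps stay sparse (zeros)
      dd if="$obj" of="$OUT" bs=${OBJ_SIZE} seek=${offset} conv=notrunc status=none
  done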

> 
>> On 24.12.2016 14:55, w...@42on.com wrote:
>> 
>>> On 24 Dec 2016 at 14:47, L. Bader <ceph-us...@lbader.de> wrote:
>>> 
>>> Hello,
>>> 
>>> I have a problem with our (dead) Ceph-Cluster: The configuration seems to 
>>> be gone (deleted / overwritten) and all monitors are gone as well. However, 
>>> we do not have (up-to-date) backups for all VMs (used with Proxmox) and we 
>>> would like to recover them from "raw" OSDs only (we have all OSDs mounted 
>>> on one Storage Server). Restoring the cluster itself seems impossible.
>>> 
>> Work is on its way, IIRC, to restore MONs from OSD data.
>> 
>> You might want to search for that, the tracker or Github might help.
>> 
>>> To recover the VM images I tried to write a simple tool that:
>>> 1) searches all OSDs for udata files
>>> 2) Sorts them by Image ID
>>> 3) Sorts them by "position" / offset
>>> 4) Assembles the 4MB blocks to a single file using dd
>>> 
>>> (See: https://gitlab.lbader.de/kryptur/ceph-recovery/tree/master )
>>> 
>>> However, for many (nearly all) images there are missing blocks (empty parts 
>>> I guess). So I created a 4MB block of null bytes for each missing part.
>>> 
>>> The problem is that the created Image is not usable. fdisk detects 
>>> partitions correctly, but we cannot access the data in any way.
>>> 
>>> Is there another way to recover the data without having any (working) ceph 
>>> tools?
>>> 
>>> Greetings and Merry Christmas :)
>>> 
>>> Lennart
>>> 
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Recover VM Images from Dead Cluster

2016-12-24 Thread w...@42on.com


> On 24 Dec 2016 at 14:47, L. Bader wrote:
> 
> Hello,
> 
> I have a problem with our (dead) Ceph-Cluster: The configuration seems to be 
> gone (deleted / overwritten) and all monitors are gone as well. However, we do 
> not have (up-to-date) backups for all VMs (used with Proxmox) and we would 
> like to recover them from "raw" OSDs only (we have all OSDs mounted on one 
> Storage Server). Restoring the cluster itself seems impossible.
> 

Work is on its way, IIRC, to restore MONs from OSD data.

You might want to search for that, the tracker or Github might help.

> To recover the VM images I tried to write a simple tool that:
> 1) searches all OSDs for udata files
> 2) Sorts them by Image ID
> 3) Sorts them by "position" / offset
> 4) Assembles the 4MB blocks to a single file using dd
> 
> (See: https://gitlab.lbader.de/kryptur/ceph-recovery/tree/master )
> 
> However, for many (nearly all) images there are missing blocks (empty parts I 
> guess). So I created a 4MB block of null bytes for each missing part.
> 
> The problem is that the created Image is not usable. fdisk detects partitions 
> correctly, but we cannot access the data in any way.
> 
> Is there another way to recover the data without having any (working) ceph 
> tools?
> 
> Greetings and Merry Christmas :)
> 
> Lennart
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Monitors stores not trimming after upgrade from Dumpling to Hammer

2016-11-03 Thread w...@42on.com


> On 3 Nov 2016 at 19:13, Joao Eduardo Luis <j...@suse.de> wrote:
> 
>> On 11/03/2016 05:52 PM, w...@42on.com wrote:
>> 
>> 
>>>> On 3 Nov 2016 at 16:44, Joao Eduardo Luis <j...@suse.de> wrote:
>>>> 
>>>>> On 11/03/2016 01:24 PM, Wido den Hollander wrote:
>>>>> 
>>>>> On 3 November 2016 at 13:09, Joao Eduardo Luis <j...@suse.de> wrote:
>>>>> 
>>>>> 
>>>>>> On 11/03/2016 09:40 AM, Wido den Hollander wrote:
>>>>>> root@mon3:/var/lib/ceph/mon# ceph-monstore-tool ceph-mon3 dump-keys|awk 
>>>>>> '{print $1}'|uniq -c
>>>>>>96 auth
>>>>>>  1143 logm
>>>>>> 3 mdsmap
>>>>>> 1 mkfs
>>>>>> 1 mon_sync
>>>>>> 6 monitor
>>>>>> 3 monmap
>>>>>>  1158 osdmap
>>>>>> 358364 paxos
>>>>>>   656 pgmap
>>>>>> 6 pgmap_meta
>>>>>>   168 pgmap_osd
>>>>>>  6144 pgmap_pg
>>>>>> root@mon3:/var/lib/ceph/mon#
>>>>>> 
>>>>>> So there are 358k Paxos entries in the Mon store.
>>>>>> 
>>>>>> Any suggestions on how to trim those from the MON store(s)?
>>>>> 
>>>>> Can you check the value of paxos:first_committed in the store?
>>>>> 
>>>> 
>>>> Here you go:
>>>> 
>>>> root@mon3:~# ceph-monstore-tool /var/lib/ceph/mon/ceph-mon3 show-versions 
>>>> --map-type paxos
>>>> first committed:174349108
>>>> last  committed:174349609
>>>> root@mon3:~#
>>>> 
>>>> Doesn't seem like a lot of keys in there?
>>> 
>>> I have this annoying feeling that this relates to some really old bug I 
>>> can't really recall the ticket number, but in essence was not trimming the 
>>> maps even though first_committed was updated.
>>> 
>>> The one way out of this I can think of is to change the value of 
>>> 'first_committed' in the store to the very first paxos epoch you have. You 
>>> will have to use 'ceph_kvstore_tool' to do that.
>>> 
>> 
>> Is one mon enough? So stop that mon, change the value and start it again?
>> 
>> Or do I need to do this on a mon which will become primary?
> 
> Sorry for not being clear. This needs to be done on the one that will become 
> the leader.
> 

Ok! Np

> Personally, I don't like this solution one bit, but I can't see any other way 
> without a patched monitor, or maybe ceph_monstore_tool.
> 
> If you are willing to wait till tomorrow, I'll be happy to kludge a 
> sanitation feature onto ceph_monstore_tool that will clean those versions for 
> you (latency being due to coding + testing + building).
> 

Yes, no rush. I will wait. Thanks!
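
For reference, a hedged sketch of checking the store before and afterwards 
(the paths and monitor id match the output quoted above; taking a backup before 
any manual store surgery goes without saying):

  MON_STORE=/var/lib/ceph/mon/ceph-mon3
  cp -a ${MON_STORE} ${MON_STORE}.bak                            # backup first

  ceph-monstore-tool ${MON_STORE} show-versions --map-type paxos
  ceph-monstore-tool ${MON_STORE} dump-keys | awk '{print $1}' | uniq -c
  du -sh ${MON_STORE}/store.db                                   # on-disk size

  # Once the paxos range has actually been trimmed, a manual compaction
  # reclaims the disk space:
  ceph tell mon.mon3 compact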

Wido

>  -Joao
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Monitors stores not trimming after upgrade from Dumpling to Hammer

2016-11-03 Thread w...@42on.com


> On 3 Nov 2016 at 16:44, Joao Eduardo Luis wrote:
> 
>> On 11/03/2016 01:24 PM, Wido den Hollander wrote:
>> 
>>> On 3 November 2016 at 13:09, Joao Eduardo Luis wrote:
>>> 
>>> 
 On 11/03/2016 09:40 AM, Wido den Hollander wrote:
 root@mon3:/var/lib/ceph/mon# ceph-monstore-tool ceph-mon3 dump-keys|awk 
 '{print $1}'|uniq -c
 96 auth
   1143 logm
  3 mdsmap
  1 mkfs
  1 mon_sync
  6 monitor
  3 monmap
   1158 osdmap
 358364 paxos
656 pgmap
  6 pgmap_meta
168 pgmap_osd
   6144 pgmap_pg
 root@mon3:/var/lib/ceph/mon#
 
 So there are 358k Paxos entries in the Mon store.
 
 Any suggestions on how to trim those from the MON store(s)?
>>> 
>>> Can you check the value of paxos:first_committed in the store?
>>> 
>> 
>> Here you go:
>> 
>> root@mon3:~# ceph-monstore-tool /var/lib/ceph/mon/ceph-mon3 show-versions 
>> --map-type paxos
>> first committed:174349108
>> last  committed:174349609
>> root@mon3:~#
>> 
>> Doesn't seem like a lot of keys in there?
> 
> I have this annoying feeling that this relates to some really old bug I can't 
> really recall the ticket number, but in essence was not trimming the maps 
> even though first_committed was updated.
> 
> The one way out of this I can think of is to change the value of 
> 'first_committed' in the store to the very first paxos epoch you have. You 
> will have to use 'ceph_kvstore_tool' to do that.
> 

Is one mon enough? So stop that mon, change the value and start it again?

Or do I need to do this on a mon which will become primary?

Wido

> This approach presumes your leader also has the same number of maps (because, 
> otherwise, the monitor would not sync bajillions of useless maps), and that 
> the version trim will then also be committed by the peons.
> 
>  -Joao
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Deep scrubbing causes severe I/O stalling

2016-10-28 Thread w...@42on.com


> On 28 Oct 2016 at 11:52, Kees Meijs wrote:
> 
> Hi Cephers,
> 
> Using Ceph 0.94.9-1trusty we noticed severe I/O stalling during deep 
> scrubbing (vanilla parameters used in regards to scrubbing). I'm aware this 
> has been discussed before, but I'd like to share the parameters we're going 
> to evaluate:
> 
> osd_scrub_begin_hour 1
> osd_scrub_end_hour 7

I don't like this personally. Your cluster should be capable of doing a deep 
scrub at any moment. If it can't, it will also not be able to handle a node 
failure during peak times.

> osd_scrub_min_interval 259200
> osd_scrub_max_interval 1814400
> osd_scrub_chunk_max 5
> osd_scrub_sleep .1
You can try to bump osd_scrub_sleep even higher.
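
A hedged example of trying that at runtime before committing it to ceph.conf 
(the values are a starting point, not a recommendation):

  # Inject on all OSDs and watch client latency during the next deep scrub
  ceph tell osd.* injectargs '--osd_scrub_sleep 0.2'
  ceph tell osd.* injectargs '--osd_scrub_chunk_max 5'

  # Persist under [osd] in ceph.conf once you are happy:
  #   osd_scrub_sleep = 0.2
  #   osd_scrub_chunk_max = 5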

Wido

> osd_deep_scrub_interval 1814400
> osd_deep_scrub_stride 1048576
> Anyway, thoughts on the matter or specific parameter advice is more than 
> welcome.
> 
> Cheers,
> Kees
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph all NVME Cluster sequential read speed

2016-08-18 Thread w...@42on.com


> On 18 Aug 2016 at 10:15, nick wrote:
> 
> Hi,
> we are currently building a new ceph cluster with only NVME devices. One Node 
> consists of 4x Intel P3600 2TB devices. Journal and filestore are on the same 
> device. Each server has a 10 core CPU and uses 10 GBit ethernet NICs for 
> public and ceph storage traffic. We are currently testing with 4 nodes 
> overall. 
> 
> The cluster will be used only for virtual machine images via RBD. The pools 
> are replicated (no EC).
> 
> Although we are pretty happy with the single threaded write performance, the 
> single threaded (iodepth=1) sequential read performance is a bit 
> disappointing.
> 
> We are testing with fio and the rbd engine. After creating a 10GB RBD image, 
> we 
> use the following fio params to test:
> """
> [global]
> invalidate=1
> ioengine=rbd
> iodepth=1
> ramp_time=2
> size=2G
> bs=4k
> direct=1
> buffered=0
> """
> 
> For a 4k workload we are reaching 1382 IOPS. Testing one NVME device directly 
> (with psync engine and iodepth of 1) we can reach up to 84176 IOPS. This is a 
> big difference.
> 

Network latency is a big factor as well. Keep in mind the Ceph OSDs also have to 
process each I/O.

For example, with a round-trip network latency of 0.200 ms you can do at most 
1,000 ms / 0.200 ms = 5,000 IOPS at queue depth 1, and that is before the OSD or 
any other layer does any work.


> I already read that the read_ahead setting might improve the situation, 
> although this would only be true when using buffered reads, right?
> 
> Does anyone have other suggestions to get better serial read performance?
> 

You might want to disable all logging and look at AsyncMessenger. Disabling 
cephx might help, but that is not very safe to do.
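
A hedged sketch of what that could look like in ceph.conf (option names as used 
around Hammer/Jewel; whether AsyncMessenger is considered stable depends on your 
release, so test it before rolling it out):

  [global]
  ms_type = async            # AsyncMessenger instead of SimpleMessenger
  debug_ms = 0/0             # disable messenger logging
  debug_osd = 0/0
  debug_filestore = 0/0
  debug_journal = 0/0
  debug_auth = 0/0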

Wido

> Cheers
> Nick
> 
> -- 
> Sebastian Nickel
> Nine Internet Solutions AG, Albisriederstr. 243a, CH-8047 Zuerich
> Tel +41 44 637 40 00 | Support +41 44 637 40 40 | www.nine.ch
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS quota

2016-08-14 Thread w...@42on.com


> On 14 Aug 2016 at 08:55, Willi Fehler wrote:
> 
> Hello guys,
> 
> my cluster is running on the latest Ceph version. My cluster and my client 
> are running on CentOS 7.2.
> 
> ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
> 
> My Client is using CephFS, I'm not using Fuse. My fstab:
> 
> linsrv001,linsrv002,linsrv003:/ /mnt/cephfs ceph 
> noatime,dirstat,_netdev,name=cephfs,secretfile=/etc/ceph/cephfs.secret 0 0

See http://docs.ceph.com/docs/master/cephfs/quota/

The kernel client hasn't implemented quotas yet.
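
A hedged sketch of mounting with the userspace client instead, so the quota is 
actually enforced (monitor names and the client name follow the fstab above; the 
keyring path and the --client-quota switch mentioned in the reply quoted below 
are assumptions about your setup):

  # Unmount the kernel mount first, then:
  ceph-fuse -m linsrv001,linsrv002,linsrv003 \
            --id cephfs -k /etc/ceph/ceph.client.cephfs.keyring \
            --client-quota /mnt/cephfs

  setfattr -n ceph.quota.max_bytes -v 10737418240 /mnt/cephfs/quota   # 10 GiB
  getfattr -n ceph.quota.max_bytes /mnt/cephfs/quota                  # verify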

Wido

> Regards - Willi
> 
> 
>> On 13.08.16 at 13:58, Goncalo Borges wrote:
>> Hi Willi
>> If you are using ceph-fuse, to enable quota, you need to pass 
>> "--client-quota" option in the mount operation.
>> Cheers
>> Goncalo
>> 
>> 
>> From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of Willi 
>> Fehler [willi.feh...@t-online.de]
>> Sent: 13 August 2016 17:23
>> To: ceph-users
>> Subject: [ceph-users] CephFS quota
>> 
>> Hello,
>> 
>> I'm trying to use CephFS quotas. On my client I've created a
>> subdirectory in my CephFS mountpoint and used the following command from
>> the documentation.
>> 
>> setfattr -n ceph.quota.max_bytes -v 1 /mnt/cephfs/quota
>> 
>> But if I create files bigger than my quota nothing happens. Do I need a
>> mount option to use Quotas?
>> 
>> Regards - Willi
>> 
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS quota

2016-08-13 Thread w...@42on.com


> On 13 Aug 2016 at 09:24, Willi Fehler wrote:
> 
> Hello,
> 
> I'm trying to use CephFS quotas. On my client I've created a subdirectory in 
> my CephFS mountpoint and used the following command from the documentation.
> 
> setfattr -n ceph.quota.max_bytes -v 1 /mnt/cephfs/quota
> 
> But if I create files bigger than my quota nothing happens. Do I need a mount 
> option to use Quotas?
> 

What version is the client? CephFS quotas rely on the client to support them as 
well.
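
A few hedged commands to check that on the client (the mount point is an 
example):

  uname -r                     # kernel version, if this is a kernel mount
  mount | grep /mnt/cephfs     # "type ceph" = kernel client, "fuse.ceph-fuse" = ceph-fuse
  ceph-fuse --version          # userspace client version, if installed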

Wido

> Regards - Willi
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] what happen to the OSDs if the OS disk dies?

2016-08-13 Thread w...@42on.com


> On 13 Aug 2016 at 08:58, Georgios Dimitrakakis wrote:
> 
> 
>>> On 13 Aug 2016 at 03:19, Bill Sharer wrote:
>>> 
>>> If all the system disk does is handle the o/s (ie osd journals are
>>> on dedicated or osd drives as well), no problem. Just rebuild the
>>> system and copy the ceph.conf back in when you re-install ceph.
>>> Keep a spare copy of your original fstab to keep your osd filesystem
>>> mounts straight.
>> 
>> With systems deployed with ceph-disk/ceph-deploy you no longer need an
>> fstab. Udev handles it.
>> 
>>> Just keep in mind that you are down 11 osds while that system drive
>>> gets rebuilt though. It's safer to do 10 osds and then have a
>>> mirror set for the system disk.
>> 
>> In the years that I have run Ceph I have rarely seen OS disks fail. Why
>> bother? Ceph is designed for failure.
>> 
>> I would not sacrifice an OSD slot for an OS disk. Also, let's say an
>> additional OS disk is €100.
>> 
>> If you put that disk in 20 machines that's €2,000. For that money
>> you can even buy an additional chassis.
>> 
>> No, I would run on a single OS disk. It fails? Let it fail. Re-install
>> and you're good again.
>> 
>> Ceph makes sure the data is safe.
>> 
> 
> Wido,
> 
> can you elaborate a little bit more on this? How does CEPH achieve that? Is 
> it by redundant MONs?
> 

No, Ceph replicates over hosts by default. So you can lose a host and the 
other ones will still have copies.
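
This is what the stock replicated CRUSH rule looks like on a Hammer/Jewel-era 
cluster (names and numbers may differ on yours); the "type host" step is what 
puts the replicas on different hosts:

  rule replicated_ruleset {
          ruleset 0
          type replicated
          min_size 1
          max_size 10
          step take default
          step chooseleaf firstn 0 type host
          step emit
  }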


> To my understanding the OSD mapping is needed to have the cluster back. In 
> our setup (I assume in others as well) that is stored in the OS 
> disk. Furthermore, our MONs are running on the same host as OSDs. So if the OS 
> disk fails not only we loose the OSD host but we also loose the MON node. Is 
> there another way to be protected by such a failure besides additional MONs?
> 

Aha, MON on the OSD host. I never recommend that. Try to use dedicated machines 
with a good SSD for MONs.

Technically you can run the MON on the OSD nodes, but I always try to avoid it. 
It just isn't practical when stuff really goes wrong.

Wido

> We recently had a problem where a user accidentally deleted a volume. Of 
> course this has nothing to do with OS disk failure itself but we've been in 
> the loop to start looking for other possible failures on our system that 
> could jeopardize data and this thread got my attention.
> 
> 
> Warmest regards,
> 
> George
> 
> 
>> Wido
>> 
>> Bill Sharer
>> 
>>> On 08/12/2016 03:33 PM, Ronny Aasen wrote:
>>> 
 On 12.08.2016 13:41, Félix Barbeira wrote:
 
 Hi,
 
 I'm planning to make a ceph cluster but I have a serious doubt. At
 this moment we have ~10 servers DELL R730xd with 12x4TB SATA
 disks. The official ceph docs says:
 
 "We recommend using a dedicated drive for the operating system and
 software, and one drive for each Ceph OSD Daemon you run on the
 host."
 
 I could use for example 1 disk for the OS and 11 for OSD data. In
 the operating system I would run 11 daemons to control the OSDs.
 But...what happen to the cluster if the disk with the OS fails??
 maybe the cluster thinks that 11 OSD failed and try to replicate
 all that data over the cluster...that sounds no good.
 
 Should I use 2 disks for the OS making a RAID1? in this case I'm
 "wasting" 8TB only for ~10GB that the OS needs.
 
 In all the docs that i've been reading says ceph has no unique
 single point of failure, so I think that this scenario must have a
 optimal solution, maybe somebody could help me.
 
 Thanks in advance.
 
 --
 
 Félix Barbeira.
>>> if you do not have dedicated slots on the back for OS disks, then i
>>> would recomend using SATADOM flash modules directly into a SATA port
>>> internal in the machine. Saves you 2 slots for osd's and they are
>>> quite reliable. you could even use 2 sd cards if your machine have
>>> the internal SD slot
>>> 
>>> 
>> http://www.dell.com/downloads/global/products/pedge/en/poweredge-idsdm-whitepaper-en.pdf
>>> [1]
>>> 
>>> kind regards
>>> Ronny Aasen
>>> 
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com [2]
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com [3]
>>> 
>> 
>> 
>> Links:
>> --
>> [1]
>> http://www.dell.com/downloads/global/products/pedge/en/poweredge-idsdm-whitepaper-en.pdf
>> [2] mailto:ceph-users@lists.ceph.com
>> [3] http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> [4] mailto:bsha...@sharerland.com
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] what happen to the OSDs if the OS disk dies?

2016-08-13 Thread w...@42on.com


> On 13 Aug 2016 at 03:19, Bill Sharer wrote:
> 
> If all the system disk does is handle the o/s (ie osd journals are on 
> dedicated or osd drives as well), no problem.  Just rebuild the system and 
> copy the ceph.conf back in when you re-install ceph.  Keep a spare copy of 
> your original fstab to keep your osd filesystem mounts straight.
> 

With systems deployed with ceph-disk/ceph-deploy you no longer need an fstab. 
Udev handles it.

> Just keep in mind that you are down 11 osds while that system drive gets 
> rebuilt though.  It's safer to do 10 osds and then have a mirror set for the 
> system disk.
> 

In the years that I have run Ceph I have rarely seen OS disks fail. Why bother? 
Ceph is designed for failure.

I would not sacrifice an OSD slot for an OS disk. Also, let's say an additional 
OS disk is €100.

If you put that disk in 20 machines that's €2,000. For that money you can even 
buy an additional chassis.

No, I would run on a single OS disk. It fails? Let it fail. Re-install and 
you're good again.

Ceph makes sure the data is safe.
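
With ceph-disk/udev deployed OSDs the re-install is mostly mechanical. A hedged 
sketch (it assumes the OSD data disks themselves were untouched and that you 
restore ceph.conf and the bootstrap-osd keyring first):

  # After reinstalling the OS: install the ceph packages, put ceph.conf back in
  # /etc/ceph and the bootstrap key back in /var/lib/ceph/bootstrap-osd, then:
  ceph-disk list              # shows the ceph data/journal partitions it finds
  ceph-disk activate-all      # mounts and starts all detected OSDs again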

Wido

> Bill Sharer
> 
> 
>> On 08/12/2016 03:33 PM, Ronny Aasen wrote:
>>> On 12.08.2016 13:41, Félix Barbeira wrote:
>>> Hi,
>>> 
>>> I'm planning to make a ceph cluster but I have a serious doubt. At this 
>>> moment we have ~10 servers DELL R730xd with 12x4TB SATA disks. The official 
>>> ceph docs says:
>>> 
>>> "We recommend using a dedicated drive for the operating system and 
>>> software, and one drive for each Ceph OSD Daemon you run on the host."
>>> 
>>> I could use for example 1 disk for the OS and 11 for OSD data. In the 
>>> operating system I would run 11 daemons to control the OSDs. But...what 
>>> happen to the cluster if the disk with the OS fails?? maybe the cluster 
>>> thinks that 11 OSD failed and try to replicate all that data over the 
>>> cluster...that sounds no good.
>>> 
>>> Should I use 2 disks for the OS making a RAID1? in this case I'm "wasting" 
>>> 8TB only for ~10GB that the OS needs.
>>> 
>>> In all the docs that i've been reading says ceph has no unique single point 
>>> of failure, so I think that this scenario must have a optimal solution, 
>>> maybe somebody could help me.
>>> 
>>> Thanks in advance.
>>> 
>>> -- 
>>> Félix Barbeira.
>>> 
>> if you do not have dedicated slots on the back for OS disks, then i would 
>> recomend using SATADOM flash modules directly into a SATA port internal in 
>> the machine. Saves you 2 slots for osd's and they are quite reliable. you 
>> could even use 2 sd cards if your machine have the internal SD slot 
>> 
>> http://www.dell.com/downloads/global/products/pedge/en/poweredge-idsdm-whitepaper-en.pdf
>> 
>> kind regards
>> Ronny Aasen
>> 
>> 
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Uncompactable Monitor Store at 69GB -- Re: Cluster in warn state, not sure what to do next.

2016-07-21 Thread w...@42on.com


> On 21 Jul 2016 at 21:06, Salwasser, Zac wrote:
> 
> Thanks for the response!  Long story short, there’s one specific osd in my 
> cluster that is responsible, according to the dump command, for the two pg’s 
> that are still down.
>  
> I wiped the osd data directory and recreated that osd a couple of days ago, 
> but it is still stuck in the “booting” state.  Any ideas how I can 
> investigate that particular osd further?
>  
>  

Did you re-use the old UUID of the OSD as set in the OSDMap? Otherwise it will 
stay in booting.

However, wiping an OSD like that is usually not a good idea.
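
A hedged way to check that (osd id 12 and the paths are just examples):

  ceph osd dump | grep '^osd\.12 '       # last column shows the UUID the OSDMap expects
  cat /var/lib/ceph/osd/ceph-12/fsid     # UUID of the recreated data directory

  # If they differ the OSD stays in "booting". Either recreate it with the
  # expected UUID (ceph osd create <uuid> <id> before running --mkfs), or remove
  # it cleanly first (ceph osd crush remove osd.12; ceph auth del osd.12;
  # ceph osd rm 12) and add it back as a brand new OSD.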

Wido

>  
> From: Gregory Farnum 
> Date: Thursday, July 21, 2016 at 3:01 PM
> To: "Salwasser, Zac" 
> Cc: "ceph-users@lists.ceph.com" , "Heller, Chris" 
> 
> Subject: Re: [ceph-users] Uncompactable Monitor Store at 69GB -- Re: Cluster 
> in warn state, not sure what to do next.
>  
> On Thu, Jul 21, 2016 at 11:54 AM, Salwasser, Zac  wrote:
> Rephrasing for brevity – I have a monitor store that is 69GB and won’t
> compact any further on restart or with ‘tell compact’.  Has anyone dealt
> with this before?
>  
> The monitor can't trim OSD maps over a period where PGs are unclean;
> you'll likely find that's where all the space has gone. You need to
> resolve your down PGs.
> -Greg
>  
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] server download.ceph.com seems down

2016-06-25 Thread w...@42on.com
Try one of the mirrors; see the Ceph docs.

eu.ceph.com, etc.
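
For APT-based systems that is a one-line change (release and distro codenames 
below are placeholders, adjust them to what you actually run):

  echo "deb http://eu.ceph.com/debian-jewel/ trusty main" > /etc/apt/sources.list.d/ceph.list
  apt-get update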

Wido

> On 25 Jun 2016 at 09:29, Jeronimo Romero wrote:
> 
> This always happens on server migration night :)
> 
> 
> 
> Sent from my T-Mobile 4G LTE Device
> 
> 
>  Original message 
> From: "Brian ::"  
> Date: 06/25/2016 3:08 AM (GMT-05:00) 
> To: Jeronimo Romero  
> Cc: ceph-users@lists.ceph.com 
> Subject: Re: [ceph-users] server download.ceph.com seems down 
> 
> Same here
> 
> [Connecting to download.ceph.com (173.236.253.173)]
> 
> On Sat, Jun 25, 2016 at 4:50 AM, Jeronimo Romero  wrote:
> > Dear ceph overlords. It seems that the ceph download server is down.
> >
> >
> >
> >
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Deprecating ext4 support

2016-04-12 Thread w...@42on.com


> On 12 Apr 2016 at 23:09, Nick Fisk wrote:
> 
> Jan,
> 
> I would like to echo Sage's response here. It seems you only want a subset
> of what Ceph offers, whereas RADOS is designed to offer a whole lot more,
> which requires a lot more intelligence at the lower levels.
> 

I fully agree with your e-mail. I think the Ceph devs have earned their respect 
over the years and they know what they are talking about.

For years I have been wondering why there even was a POSIX filesystem 
underneath Ceph.

> I must say I have found your attitude to both Sage and the Ceph project as a
> whole over the last few emails quite disrespectful. I spend a lot of my time
> trying to sell the benefits of open source, which centre on the openness of
> the idea/code and not around the fact that you can get it for free. One of
> the things that I like about open source is the constructive, albeit
> sometimes abrupt, constructive criticism that results in a better product.
> Simply shouting that Ceph is slow and it's because devs don't understand
> filesystems is not constructive.
> 
> I've just come back from an expo at ExCel London where many providers are
> passionately talking about Ceph. There seems to be a lot of big money
> sloshing about for something that is inherently "wrong"
> 
> Sage and the core Ceph team seem like  very clever people to me and I trust
> that over the years of development, that if they have decided that standard
> FS's are not the ideal backing store for Ceph, that this is probably correct
> decision. However I am also aware that the human condition "Can't see the
> wood for the trees" is everywhere and I'm sure if you have any clever
> insights into filesystem behaviour, the Ceph Dev team would be more than
> open to suggestions.
> 
> Personally I wish I could contribute more to the project as I feel that I
> (any my company) get more from Ceph than we put in, but it strikes a nerve
> when there is such negative criticism for what effectively is a free
> product.
> 
> Yes, I also suffer from the problem of slow sync writes, but the benefit of
> being able to shift 1U servers around a Rack/DC compared to a SAS tethered
> 4U jbod somewhat outweighs that as well as several other advanatages. A new
> cluster that we are deploying has several hardware choices which go a long
> way to improve this performance as well. Coupled with the coming Bluestore,
> the future looks bright.
> 
>> -Original Message-
>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
>> Sage Weil
>> Sent: 12 April 2016 21:48
>> To: Jan Schermer 
>> Cc: ceph-devel ; ceph-users > us...@ceph.com>; ceph-maintain...@ceph.com
>> Subject: Re: [ceph-users] Deprecating ext4 support
>> 
>>> On Tue, 12 Apr 2016, Jan Schermer wrote:
>>> Still the answer to most of your points from me is "but who needs that?"
>>> Who needs to have exactly the same data in two separate objects
>>> (replicas)? Ceph needs it because "consistency"?, but the app (VM
>>> filesystem) is fine with whatever version because the flush didn't
>>> happen (if it did the contents would be the same).
>> 
>> If you want replicated VM store that isn't picky about consistency, try
>> Sheepdog.  Or your mdraid over iSCSI proposal.
>> 
>> We care about these things because VMs are just one of many users of
>> rados, and because even if we could get away with being sloppy in some (or
>> even most) cases with VMs, we need the strong consistency to build other
>> features people want, like RBD journaling for multi-site async
> replication.
>> 
>> Then there's the CephFS MDS, RGW, and a pile of out-of-tree users that
>> chose rados for a reason.
>> 
>> And we want to make sense of an inconsistency when we find one on scrub.
>> (Does it mean the disk is returning bad data, or we just crashed during a
> write
>> a while back?)
>> 
>> ...
>> 
>> Cheers-
>> sage
>> 
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph mirrors wanted!

2016-02-07 Thread w...@42on.com

> On 7 Feb 2016 at 12:23, Tomáš Kukrál wrote:
> 
> Hi,
> We can build new mirror in Czech republic,
> 
> Would it help even there are mirrors already in Netherlands and
> (Sweden)?
> 

Yes! The more mirrors, the better. This way we make sure people can always 
fetch Ceph when needed.

Can you provide native IPv4 and IPv6?
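
For anyone wondering what running a mirror boils down to: roughly the sketch 
below (the rsync module name and the target path are assumptions; the 
mirror-ceph.sh script linked in the quoted mail below wraps this up properly):

  # Sync from a nearby mirror a few times a day (e.g. from cron) and serve the
  # result over both HTTP and rsync.
  rsync -avrt --delete eu.ceph.com::ceph /var/www/html/ceph/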

Wido


> tom
> 
> 
>> On 01-30 15:01, Wido den Hollander wrote:
>> Hi,
>> 
>> My PR was merged with a script to mirror Ceph properly:
>> https://github.com/ceph/ceph/tree/master/mirroring
>> 
>> Currently there are 3 (official) locations where you can get Ceph:
>> 
>> - download.ceph.com (Dreamhost, US)
>> - eu.ceph.com (PCextreme, Netherlands)
>> - au.ceph.com (Digital Pacific, Australia)
>> 
>> I'm looking for more mirrors to become official mirrors so we can easily
>> distribute Ceph.
>> 
>> Mirrors do go down and it's always nice to have a mirror local to you.
>> 
>> I'd like to have one or more mirrors in Asia, Africa and/or South
>> America if possible. Anyone able to host there? Other locations are
>> welcome as well!
>> 
>> A few things which are required:
>> 
>> - 1Gbit connection or more
>> - Native IPv4 and IPv6
>> - HTTP access
>> - rsync access
>> - 2TB of storage or more
>> - Monitoring of the mirror/source
>> 
>> You can easily mirror Ceph yourself with this script I wrote:
>> https://github.com/ceph/ceph/blob/master/mirroring/mirror-ceph.sh
>> 
>> eu.ceph.com and au.ceph.com use it to sync from download.ceph.com. If
>> you want to mirror Ceph locally, please pick a mirror local to you.
>> 
>> Please refer to these guidelines:
>> https://github.com/ceph/ceph/tree/master/mirroring#guidelines
>> 
>> -- 
>> Wido den Hollander
>> 42on B.V.
>> Ceph trainer and consultant
>> 
>> Phone: +31 (0)20 700 9902
>> Skype: contact42on
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph mirrors wanted!

2016-02-07 Thread w...@42on.com


> On 7 Feb 2016 at 12:56, Oliver Dzombic wrote:
> 
> Hi,
> 
> we can offer a mirror located in Frankfurt/Main Germany.
> 
> If wanted, please let me know.
> 

Also very welcome, this way we cover Europe properly.

Do you have native IPv4 and IPv6 available?

If so, I'll add you to the mirrors list so we can set up CZ, US-East, DE and SE 
at once.

Wido


> -- 
> Mit freundlichen Gruessen / Best regards
> 
> Oliver Dzombic
> IP-Interactive
> 
> mailto:i...@ip-interactive.de
> 
> Anschrift:
> 
> IP Interactive UG ( haftungsbeschraenkt )
> Zum Sonnenberg 1-3
> 63571 Gelnhausen
> 
> HRB 93402 beim Amtsgericht Hanau
> Geschäftsführung: Oliver Dzombic
> 
> Steuer Nr.: 35 236 3622 1
> UST ID: DE274086107
> 
> 
>> On 30.01.2016 at 15:14, Wido den Hollander wrote:
>> Hi,
>> 
>> My PR was merged with a script to mirror Ceph properly:
>> https://github.com/ceph/ceph/tree/master/mirroring
>> 
>> Currently there are 3 (official) locations where you can get Ceph:
>> 
>> - download.ceph.com (Dreamhost, US)
>> - eu.ceph.com (PCextreme, Netherlands)
>> - au.ceph.com (Digital Pacific, Australia)
>> 
>> I'm looking for more mirrors to become official mirrors so we can easily
>> distribute Ceph.
>> 
>> Mirrors do go down and it's always nice to have a mirror local to you.
>> 
>> I'd like to have one or more mirrors in Asia, Africa and/or South
>> America if possible. Anyone able to host there? Other locations are
>> welcome as well!
>> 
>> A few things which are required:
>> 
>> - 1Gbit connection or more
>> - Native IPv4 and IPv6
>> - HTTP access
>> - rsync access
>> - 2TB of storage or more
>> - Monitoring of the mirror/source
>> 
>> You can easily mirror Ceph yourself with this script I wrote:
>> https://github.com/ceph/ceph/blob/master/mirroring/mirror-ceph.sh
>> 
>> eu.ceph.com and au.ceph.com use it to sync from download.ceph.com. If
>> you want to mirror Ceph locally, please pick a mirror local to you.
>> 
>> Please refer to these guidelines:
>> https://github.com/ceph/ceph/tree/master/mirroring#guidelines
>> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com