Re: [ceph-users] Consumer-grade SSD in Ceph

2020-01-03 Thread Eneko Lacunza

I'm sure you also know the following, but just in case:
- Intel SATA D3-S4610 (I think they're out of stock right now)
- Intel SATA D3-S4510 (I see stock of these right now)

On 27/12/19 at 17:56, vita...@yourcmc.ru wrote:
SATA: Micron 5100-5200-5300, Seagate Nytro 1351/1551 (don't forget to 
disable their cache with hdparm -W 0)


NVMe: Intel P4500, Micron 9300
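
Regarding the hdparm note above, a minimal sketch of what disabling the
volatile write cache looks like (assuming /dev/sdb is the SSD in question;
the setting does not survive a power cycle, so reapply it at boot, e.g.
from a udev rule):

# turn the drive's volatile write cache off, then verify
hdparm -W 0 /dev/sdb
hdparm -W /dev/sdb     # should report: write-caching = 0 (off)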


Thanks for all the replies. In summary: consumer-grade SSD is a no-go.

What is an alternative to the SM863a? It is quite hard to get these,
since they are out of stock.



--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943569206
Astigarragako bidea 2, 2º izq. oficina 11; 20180 Oiartzun (Gipuzkoa)
www.binovo.es

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Consumer-grade SSD in Ceph

2019-12-22 Thread Eneko Lacunza

Hi Sinan,

Just to reiterate: don't do this. Consumer SSDs will destroy your
enterprise SSDs' performance.


Our office cluster is made of consumer-grade servers: cheap gaming
motherboards, memory, Ryzen processors, desktop HDDs. But the SSD drives
are enterprise; we had awful experiences with consumer SSDs (some perform
worse than HDDs with Ceph).


Cheers
Eneko

On 19/12/19 at 20:20, Sinan Polat wrote:

Hi all,

Thanks for the replies. I am not worried about their lifetime. We will be
adding only 1 SSD disk per physical server. All existing SSDs are enterprise
drives. If the added consumer-grade disk fails, no problem.

I am more curious about their I/O performance. I do not want a 50% drop
in performance.

So, does anyone have experience with the 860 EVO or Crucial MX500 in a Ceph setup?

Thanks!


On 19 Dec 2019 at 19:18, Mark Nelson wrote:

The way I try to look at this is:


1) How much more do the enterprise grade drives cost?

2) What are the benefits? (Faster performance, longer life, etc)

3) How much does it cost to deal with downtime, diagnose issues, and replace 
malfunctioning hardware?


My personal take is that enterprise drives are usually worth it. There may be
consumer-grade drives worth considering in very specific scenarios, if they
still have power loss protection and high write durability.  Even when
I was in academia years ago with very limited budgets, we got burned by
consumer-grade SSDs to the point where we had to replace them all.  You have to
be very careful and know exactly what you are buying.


Mark



On 12/19/19 12:04 PM, jes...@krogh.cc wrote:
I don't think “usually” is good enough in a production setup.



Sent from myMail for iOS


Thursday, 19 December 2019, 12.09 +0100 from Виталий Филиппов 
:

Usually it doesn't, it only harms performance and probably SSD
lifetime
too

> I would not be running ceph on SSDs without power loss protection. It
> delivers a potential data loss scenario


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943569206
Astigarragako bidea 2, 2º izq. oficina 11; 20180 Oiartzun (Gipuzkoa)
www.binovo.es

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Single threaded IOPS on SSD pool.

2019-06-05 Thread Eneko Lacunza

Hi,

On 5/6/19 at 16:53, vita...@yourcmc.ru wrote:

Ok, average network latency from VM to OSD's ~0.4ms.


It's rather bad, you can improve the latency by 0.3ms just by 
upgrading the network.



Single-threaded performance is ~500-600 IOPS, or an average latency of 1.6ms.
Is that comparable to what others are seeing?


Good "reference" numbers are 0.5ms for reads (~2000 iops) and 1ms for 
writes (~1000 iops).


I confirm that the most powerful thing to do is disabling CPU 
powersave (governor=performance + cpupower -D 0). You usually get 2x 
single thread iops at once.
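
A rough sketch of what that advice translates to in practice (assuming the
cpupower utility is installed; the idle-state command is presumably the
"cpupower idle-set" form of the -D 0 mentioned above):

# pin the frequency governor to performance on all cores
for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
    echo performance > "$g"
done
# disallow deep C-states (idle states with wakeup latency > 0 us)
cpupower idle-set -D 0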


We have a small cluster with 4 OSD hosts, each with 1 Intel SSDSC2KB019T8
SSD (D3-S4510 1.8T), connected with a 10G network (shared with VMs; not a
busy cluster). Volumes are replica 3:


Network latency from one node to the other 3:
10 packets transmitted, 10 received, 0% packet loss, time 9166ms
rtt min/avg/max/mdev = 0.042/0.064/0.088/0.013 ms

10 packets transmitted, 10 received, 0% packet loss, time 9190ms
rtt min/avg/max/mdev = 0.047/0.072/0.110/0.017 ms

10 packets transmitted, 10 received, 0% packet loss, time 9219ms
rtt min/avg/max/mdev = 0.061/0.078/0.099/0.011 ms

Your fio test on a 4-core VM:

$ fio fio-job-randr.ini
test: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 
4096B-4096B, ioengine=libaio, iodepth=1

fio-3.12
Starting 1 process
test: Laying out IO file (1 file / 1024MiB)
Jobs: 1 (f=1): [r(1)][100.0%][r=10.3MiB/s][r=2636 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=4056: Wed Jun  5 17:14:33 2019
  Description  : [fio random 4k reads]
  read: IOPS=2386, BW=9544KiB/s (9773kB/s)(559MiB/60001msec)
    slat (nsec): min=0, max=616576, avg=10847.27, stdev=3253.55
    clat (nsec): min=0, max=10346k, avg=406536.60, stdev=145643.92
 lat (nsec): min=0, max=10354k, avg=417653.11, stdev=145740.26
    clat percentiles (usec):
 |  1.00th=[   37],  5.00th=[  202], 10.00th=[  258], 20.00th=[ 318],
 | 30.00th=[  351], 40.00th=[  383], 50.00th=[  416], 60.00th=[ 445],
 | 70.00th=[  474], 80.00th=[  502], 90.00th=[  545], 95.00th=[ 578],
 | 99.00th=[  701], 99.50th=[  742], 99.90th=[ 1004], 99.95th=[ 1500],
 | 99.99th=[ 3752]
   bw (  KiB/s): min=    0, max=10640, per=100.00%, avg=9544.13, 
stdev=486.02, samples=120
   iops    : min=    0, max= 2660, avg=2386.03, stdev=121.50, 
samples=120

  lat (usec)   : 2=0.01%, 50=2.94%, 100=0.17%, 250=6.20%, 500=70.34%
  lat (usec)   : 750=19.92%, 1000=0.33%
  lat (msec)   : 2=0.07%, 4=0.03%, 10=0.01%, 20=0.01%
  cpu  : usr=1.01%, sys=3.44%, ctx=143387, majf=0, minf=16
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 
>=64=0.0%
 submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
>=64=0.0%
 complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
>=64=0.0%

 issued rwts: total=143163,0,0,0 short=0,0,0,0 dropped=0,0,0,0
 latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=9544KiB/s (9773kB/s), 9544KiB/s-9544KiB/s 
(9773kB/s-9773kB/s), io=559MiB (586MB), run=60001-60001msec


Disk stats (read/write):
    dm-0: ios=154244/120, merge=0/0, ticks=63120/12, in_queue=63128, 
util=96.98%, aggrios=154244/58, aggrmerge=0/62, aggrticks=63401/40, 
aggrin_queue=62800, aggrutil=96.42%
  sda: ios=154244/58, merge=0/62, ticks=63401/40, in_queue=62800, 
util=96.42%



So if I read correctly, that is about 2500 IOPS read. I see governor=performance
(out of the box on Proxmox VE, I think). We haven't touched cpupower, at least
not beyond what our distribution (Proxmox VE) does.


For reference, the same test with random write (KVM disk cache is 
write-back):


$ fio fio-job-randw.ini
test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 
4096B-4096B, ioengine=libaio, iodepth=1

fio-3.12
Starting 1 process
Jobs: 1 (f=1): [w(1)][100.0%][w=35.5MiB/s][w=9077 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=4278: Wed Jun  5 17:35:51 2019
  Description  : [fio random 4k writes]
  write: IOPS=9809, BW=38.3MiB/s (40.2MB/s)(2299MiB/60001msec); 0 zone 
resets

    slat (nsec): min=0, max=856527, avg=13669.16, stdev=5257.21
    clat (nsec): min=0, max=256305k, avg=86123.12, stdev=913448.71
 lat (nsec): min=0, max=256328k, avg=100145.33, stdev=913512.45
    clat percentiles (usec):
 |  1.00th=[   37],  5.00th=[   41], 10.00th=[   46], 20.00th=[   54],
 | 30.00th=[   60], 40.00th=[   65], 50.00th=[   71], 60.00th=[   78],
 | 70.00th=[   86], 80.00th=[   96], 90.00th=[  119], 95.00th=[ 151],
 | 99.00th=[  251], 99.50th=[  297], 99.90th=[  586], 99.95th=[ 857],
 | 99.99th=[ 4490]
   bw (  KiB/s): min=    0, max=52392, per=100.00%, avg=39243.27, 
stdev=3553.88, samples=119
   iops    : min=    0, max=13098, avg=9810.81, stdev=888.47, 
samples=119

  lat (nsec)   : 1000=0.01%
  lat (usec)   : 2=0.02%, 4=0.01%, 10=0.01%, 20=0.01%, 50=15.44%
  lat (usec)   : 100=67.16%, 250=16.36%, 500=0.90%, 750=0.06%, 1000=0.03%
  lat 

Re: [ceph-users] Intel D3-S4610 performance

2019-03-13 Thread Eneko Lacunza

Hi Kai,

On 12/3/19 at 9:13, Kai Wembacher wrote:


Hi everyone,

I have an Intel D3-S4610 SSD with 1.92 TB here for testing and get 
some pretty bad numbers when running the fio benchmark suggested by 
Sébastien Han 
(http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/):


Intel D3-S4610 1.92 TB

--numjobs=1 write: IOPS=3860, BW=15.1MiB/s (15.8MB/s)(905MiB/60001msec)

--numjobs=2 write: IOPS=7138, BW=27.9MiB/s (29.2MB/s)(1673MiB/60001msec)

--numjobs=4 write: IOPS=12.5k, BW=48.7MiB/s (51.0MB/s)(2919MiB/60002msec)

Compared to our current Samsung SM863 SSDs the Intel one is about 6x 
slower.
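
For anyone wanting to reproduce these numbers, the benchmark from that blog
post is essentially a single-threaded 4k synchronous write test; a sketch
(destructive, so only point it at an unused device, and adjust --numjobs
for the 2- and 4-job runs):

fio --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k \
    --numjobs=1 --iodepth=1 --runtime=60 --time_based \
    --group_reporting --name=journal-test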


Has someone here tested this SSD and can give me some values for 
comparison?




We don't have D3-S4610 drives, but are in the process of deploying 4 
D3-S4510 1.92TB for OSD purposes. I can test them if that helps?


Cheers
Eneko

--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943569206
Astigarraga bidea 2, 2º izq. oficina 11; 20180 Oiartzun (Gipuzkoa)
www.binovo.es

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Blocked ops after change from filestore on HDD to bluestore on SDD

2019-02-27 Thread Eneko Lacunza

Hi Uwe,

We tried to use a Samsung 840 Pro SSD as an OSD some time ago and it was a 
no-go; it wasn't that performance was bad, it just didn't work for the 
kind of use an OSD makes of the disk. Any HDD was better than it (the disk was 
healthy and had been used in a software RAID-1 for a couple of years).


I suggest you first check that your Samsung 860 Pro disks work well for 
Ceph. Also, how much RAM do your hosts have?
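
If it helps, these are the commands we usually reach for when hunting
blocked ops (the OSD id is just an example; the second one runs on the
host where that OSD lives):

ceph health detail                      # names the OSDs with slow/blocked requests
ceph daemon osd.3 dump_ops_in_flight    # shows where each in-flight op is stuck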


Cheers

On 26/2/19 at 22:01, Uwe Sauter wrote:

Hi,

TL;DR: In my Ceph clusters I replaced all OSDs from HDDs of several 
brands and models with Samsung 860 Pro SSDs and used the opportunity 
to switch from filestore to bluestore. Now I'm seeing blocked ops in 
Ceph and file system freezes inside VMs. Any suggestions?



I have two Proxmox clusters for virtualization which use Ceph on HDDs 
as backend storage for VMs. About half a year ago I had to increase 
the pool size and used the occasion to switch from filestore to 
bluestore. That was when trouble started. Both clusters showed blocked 
ops that caused freezes inside VMs which needed a reboot to function 
properly again. I wasn't able to identify the cause of the blocking 
ops but I blamed the low performance of the HDDs. It was also the time 
when patches for Spectre/Meltdown were released. Kernel 4.13.x didn't 
show the behavior while kernel 4.15.x did. After several weeks of 
debugging the workaround was to go back to filestore.


Today I replaced all HDDs with brand new Samsung 860 Pro SSDs and 
switched to bluestore again (on one cluster). And… the blocked ops 
reappeared. I am out of ideas about the cause.


Any idea why bluestore is so much more demanding on the storage 
devices compared to filestore?


Before switching back to filestore do you have any suggestions for 
debugging? Anything special to check for in the network?


The clusters are both connected via 10GbE (MTU 9000) and are only 
lightly loaded (15 VMs on the first, 6 VMs on the second). Each host 
has 3 SSDs and 64GB memory.


"rados bench" gives decent results for 4M block size but 4K block size 
triggers blocked ops (and only finishes after I restart the OSD with 
the blocked ops). Results below.



Thanks,

Uwe




Results from "rados bench" runs with 4K block size when the cluster 
didn't block:


root@px-hotel-cluster:~# rados bench -p scbench 60 write -b 4K -t 16 
--no-cleanup

hints = 1
Maintaining 16 concurrent writes of 4096 bytes to objects of size 4096 
for up to 60 seconds or 0 objects

Object prefix: benchmark_data_px-hotel-cluster_3814550
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)  avg lat(s)
    0       0         0         0         0         0            -           0
    1      16      2338      2322   9.06888   9.07031    0.0068972   0.0068597
    2      16      4631      4615   9.01238   8.95703    0.0076618  0.00692027
    3      16      6936      6920   9.00928   9.00391    0.0066511  0.00692966
    4      16      9173      9157   8.94133   8.73828   0.00416256  0.00698071
    5      16     11535     11519   8.99821   9.22656   0.00799875  0.00693842
    6      16     13892     13876   9.03287   9.20703   0.00688782  0.00691459
    7      15     16173     16158   9.01578   8.91406   0.00791589  0.00692736
    8      16     18406     18390   8.97854   8.71875   0.00745151  0.00695723
    9      16     20681     20665   8.96822   8.88672    0.0072881  0.00696475
   10      16     23037     23021   8.99163   9.20312   0.00728763   0.0069473
   11      16     24261     24245   8.60882   4.78125   0.00502342  0.00725673
   12      16     25420     25404   8.26863   4.52734   0.00443917  0.00750865
   13      16     27347     27331   8.21154   7.52734   0.00670819  0.00760455
   14      16     28750     28734   8.01642   5.48047   0.00617038  0.00779322
   15      16     30222     30206    7.8653      5.75   0.00700398  0.00794209
   16      16     32180     32164    7.8517   7.64844   0.00704785   0.0079573
   17      16     34527     34511   7.92907   9.16797   0.00582831  0.00788017
   18      15     36969     36954   8.01868   9.54297   0.00635168  0.00779228
   19      16     39059     39043   8.02609   8.16016   0.00622597  0.00778436
2019-02-26 21:55:41.623245 min lat: 0.00337595 max lat: 0.431158 avg lat: 0.00779143
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)  avg lat(s)
   20      16     41079     41063   8.01928   7.89062   0.00649895  0.00779143
   21      16     43076     43060   8.00878   7.80078   0.00726145  0.00780128
   22      16     45433     45417   8.06321   9.20703   0.00455727  0.00774944
   23      16     47763     47747   8.10832   9.10156   0.00582818  0.00770599
   24      16     50079     50063   8.14738   9.04688    0.0051125  0.00766894
   25      16     52477     52461   8.19614   9.36719   0.00537575  0.00762343
   26      16     54895     54879   8.24415   9.44531   0.00573134  0.00757909
   27      16     57276     57260   8.28325   9.30078   0.00576683  0.00754383
   28      16     59487     59471   8.29585   8.63672

Re: [ceph-users] Low traffic Ceph cluster with consumer SSD.

2018-11-26 Thread Eneko Lacunza

Hi,

On 25/11/18 at 18:23, Виталий Филиппов wrote:
Ok... That's better than previous thread with file download where the 
topic starter suffered from normal only-metadata-journaled fs... 
Thanks for the link, it would be interesting to repeat similar tests. 
Although I suspect it shouldn't be that bad... at least not all 
desktop SSDs are that broken - for example 
https://engineering.nordeus.com/power-failure-testing-with-ssds/ says 
Samsung 840 Pro is OK.


Only that ceph performance for that SSD model is very very bad. We had 
one of those repurposed for ceph and had to run to buy an Intel 
enterprise SSD drive to replace it.


Don't even try :)

Cheers
Eneko

--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943569206
Astigarraga bidea 2, 2º izq. oficina 11; 20180 Oiartzun (Gipuzkoa)
www.binovo.es

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Proxmox with EMC VNXe 3200

2018-06-25 Thread Eneko Lacunza

Hi all,

We're planning the migration of a VMware 5.5 cluster, backed by an EMC 
VNXe 3200 storage appliance, to Proxmox.


The VNXe has about 3 years of warranty left and half the disks 
unprovisioned, so the current plan is to use the same VNXe for Proxmox 
storage. After the warranty expires we'll most probably go Ceph, but that's 
some years in the future.


The VNXe seems to support both iSCSI and NFS (CIFS too, but that is really 
outside my tech tastes). I guess the best option performance-wise would be 
iSCSI, but I like the simplicity of NFS. Any idea what the performance 
impact of this choice (NFS vs. iSCSI) could be?


Has anyone had any experience with this kind of storage appliances?

Thanks a lot
Eneko

--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943569206
Astigarraga bidea 2, 2º izq. oficina 11; 20180 Oiartzun (Gipuzkoa)
www.binovo.es

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Mimic on Debian 9 Stretch

2018-06-19 Thread Eneko Lacunza

Hi Fabian,

Hope your arm is doing well :)


unless such a backport is created and tested fairly well (and we will
spend some more time investigating this internally despite the caveats
above), our plan B will probably involve:
- building Luminous for Buster to ease the upgrade from Stretch+Luminous
(upgrading both base distro release and Ceph major version in one go
did not work out in the past)
- keeping our Stretch-based release on Luminous even once Luminous is EoL 
upstream
- strongly recommending to those of our users that rely on Ceph to
upgrade to our (future/next) Buster-based release (which will likely
get Mimic or Nautilus as default Ceph version, depending on whether
the Ceph release schedule holds or not)
- hope this whole story does not repeat itself too often because of the
inherent misalignment between Ceph and Debian release cycles

especially the second and third point will irritate some of our users,
but sometimes life only hands you lemons.
We're responsible for about 6 small clusters of Proxmox + Ceph; I think 
this is the path to take.


Use the time to "extend" Luminous support, maybe you can do this 
together with others, maybe even with some support from Ceph upstream. I 
think it should be less work than the gcc backport just for a few months.


Just skip Mimic like you did with non LTS releases in the past. It's 
also less work for the Proxmox admins, as we'll be able to skip a Ceph 
upgrade easily.


Cheers
Eneko

--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943569206
Astigarraga bidea 2, 2º izq. oficina 11; 20180 Oiartzun (Gipuzkoa)
www.binovo.es

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Hawk-M4E SSD disks for journal

2018-01-05 Thread Eneko Lacunza
Hi all,

We're in the process of deploying a new Proxmox/Ceph cluster. We had
planned to use S3710 disks for system+journals, but our provider (Dell) is
telling us that they're EOL and the only alternative they offer is some
"mixed use" Hawk-M4E in sizes 200GB/400GB.

I really can't find reliable info on those disks online.

Did anyone try them and can comment whether they perform well or not?

Thanks a lot
Eneko
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Small cluster for VMs hosting

2017-11-07 Thread Eneko Lacunza

Hi Gandalf,

On 07/11/17 at 14:16, Gandalf Corvotempesta wrote:

Hi to all,
I've been away from Ceph for a couple of years (CephFS was still unstable).

I would like to test it again; some questions for a production cluster 
for VM hosting:


1. Is CephFS stable?

Yes.
2. Can I spin up a 3 nodes cluster with mons, MDS and osds on the same 
machine?

Yes.

3. Hardware suggestions?

Depends on your load. :-)
4. How can I understand the ceph health status output, in detail? 
I've not seen any docs about this.
I think it is quite self-explanatory once you know how Ceph works. Don't 
run Ceph if you don't understand it :) Heck, don't plan a Ceph deployment 
before understanding it either (reading this list can help too; look at 
the archives).
5. How can I know if the cluster is fully synced, or if any background 
operation (scrubbing, replication, ...) is running?

Looking at the health status output.
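
Concretely, assuming an admin keyring on the node:

ceph -s              # overall state plus any recovery/backfill/scrub activity
ceph health detail   # expands warnings into per-PG / per-OSD detail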
6. Is 10G Ethernet mandatory? Currently I only have 4 gigabit NICs (2 
for public traffic, 2 for cluster traffic)

It is not mandatory; I administer 4 3-node clusters and all have 1gbit NICs.

Cheers
Eneko

--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943569206
Astigarraga bidea 2, 2º izq. oficina 11; 20180 Oiartzun (Gipuzkoa)
www.binovo.es

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Sharing SSD journals and SSD drive choice

2017-05-02 Thread Eneko Lacunza
 -

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-April/000798.html

Regards,
Jens Dueholm Christensen
Rambøll Survey IT

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Adam Carheden
Sent: Wednesday, April 26, 2017 5:54 PM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Sharing SSD journals and SSD drive choice

Thanks everyone for the replies.

I will be avoiding TLC drives, it was just something easy to benchmark
with existing equipment. I hadn't thought of unscrupulous data durability
lies or performance suddenly tanking in unpredictable ways. I guess it
all comes down to trusting the vendor, since it would be expensive in
time and $$ to test for such things.

Any thoughts on multiple Intel 35XX vs a single 36XX/37XX? All
have "DC"
prefixes and are listed in the Data Center section of their
marketing
pages, so I assume they'll all have the same quality
underlying NAND.

--
Adam Carheden


On 04/26/2017 09:20 AM, Chris Apsey wrote:
> Adam,
>
> Before we deployed our cluster, we did extensive testing on
all kinds of
> SSDs, from consumer-grade TLC SATA all the way to Enterprise
PCI-E NVME
> Drives.  We ended up going with a ratio of 1x Intel P3608
PCI-E 1.6 TB
> to 12x HGST 10TB SAS3 HDDs.  It provided the best
> price/performance/density balance for us overall.  As a frame of
> reference, we have 384 OSDs spread across 16 nodes.
>
> A few (anecdotal) notes:
>
> 1. Consumer SSDs have unpredictable performance under load;
write
> latency can go from normal to unusable with almost no warning.
> Enterprise drives generally show much less load sensitivity.
> 2. Write endurance; while it may appear that having several
> consumer-grade SSDs backing a smaller number of OSDs will
yield better
> longevity than an enterprise grade SSD backing a larger
number of OSDs,
> the reality is that enterprise drives that use SLC or eMLC
are generally
> an order of magnitude more reliable when all is said and done.
> 3. Power Loss protection (PLP).  Consumer drives generally
don't do well
> when power is suddenly lost.  Yes, we should all have UPS,
etc., but
> things happen.  Enterprise drives are much more tolerant of
> environmental failures.  Recovering from misplaced objects
while also
> attempting to serve clients is no fun.
>
>
>
>
>
> ---
> v/r
>
> Chris Apsey
> bitskr...@bitskrieg.net
> https://www.bitskrieg.net
>
> On 2017-04-26 10:53, Adam Carheden wrote:
>> What I'm trying to get from the list is /why/ the
"enterprise" drives
>> are important. Performance? Reliability? Something else?
>>
>> The Intel was the only one I was seriously considering. The
others were
>> just ones I had for other purposes, so I thought I'd see
how they fared
>> in benchmarks.
>>
>> The Intel was the clear winner, but my tests did show that
throughput
>> tanked with more threads. Hypothetically, if I was throwing
16 OSDs at
>> it, all with osd op threads = 2, do the benchmarks below
not show that
>> the Hynix would be a better choice (at least for performance)?
>>
>> Also, 4 x Intel DC S3520 costs as much as 1 x Intel DC
S3610. Obviously
>> the single drive leaves more bays free for OSD disks, but
is there any
>> other reason a single S3610 is preferable to 4 S3520s?
Wouldn't 4xS3520s
>> mean:
>>
>> a) fewer OSDs go down if the SSD fails
>>
>> b) better throughput (I'm speculating that the S3610 isn't
4 times
>> faster than the S3520)
>>
>> c) load spread across 4 SATA channels (I suppose this
doesn't really
>> matter since the drives can't throttle the SATA bus).
>>
>>
>> --
>> Adam Carheden
>>
>> On 04/2

Re: [ceph-users] Sharing SSD journals and SSD drive choice

2017-04-26 Thread Eneko Lacunza

Adam,

What David said before about SSD drives is very important. I will put it 
another way: use enterprise-grade SSD drives, not consumer-grade. 
Also, pay attention to endurance.


The only suitable drive for Ceph I see in your tests is the SSDSC2BB150G7, 
and probably it isn't even the most suitable SATA SSD disk from Intel; 
better use the S3610 or S3710 series.


Cheers
Eneko

On 25/04/17 at 21:02, Adam Carheden wrote:

On 04/25/2017 11:57 AM, David wrote:

On 19 Apr 2017 18:01, "Adam Carheden" wrote:

 Does anyone know if XFS uses a single thread to write to it's journal?


You probably know this but just to avoid any confusion, the journal in
this context isn't the metadata journaling in XFS, it's a separate
journal written to by the OSD daemons

Ha! I didn't know that.


I think the number of threads per OSD is controlled by the 'osd op
threads' setting which defaults to 2

So the ideal (for performance) CEPH cluster would be one SSD per HDD
with 'osd op threads' set to whatever value fio shows as the optimal
number of threads for that drive then?


I would avoid the SanDisk and Hynix. The s3500 isn't too bad. Perhaps
consider going up to a 37xx and putting more OSDs on it. Of course with
the caveat that you'll lose more OSDs if it goes down.

Why would you avoid the SanDisk and Hynix? Reliability (I think those
two are both TLC)? Brand trust? If it's my benchmarks in my previous
email, why not the Hynix? It's slower than the Intel, but sort of
decent, at least compared to the SanDisk.

My final numbers are below, including an older Samsung Evo (MLC I think)
which did horribly, though not as bad as the SanDisk. The Seagate is a
10kRPM SAS "spinny" drive I tested as a control/SSD-to-HDD comparison.

     SanDisk SDSSDA240G, fio  1 jobs:   7.0 MB/s (5 trials)
     SanDisk SDSSDA240G, fio  2 jobs:   7.6 MB/s (5 trials)
     SanDisk SDSSDA240G, fio  4 jobs:   7.5 MB/s (5 trials)
     SanDisk SDSSDA240G, fio  8 jobs:   7.6 MB/s (5 trials)
     SanDisk SDSSDA240G, fio 16 jobs:   7.6 MB/s (5 trials)
     SanDisk SDSSDA240G, fio 32 jobs:   7.6 MB/s (5 trials)
     SanDisk SDSSDA240G, fio 64 jobs:   7.6 MB/s (5 trials)
HFS250G32TND-N1A2A 3P10, fio  1 jobs:   4.2 MB/s (5 trials)
HFS250G32TND-N1A2A 3P10, fio  2 jobs:   0.6 MB/s (5 trials)
HFS250G32TND-N1A2A 3P10, fio  4 jobs:   7.5 MB/s (5 trials)
HFS250G32TND-N1A2A 3P10, fio  8 jobs:  17.6 MB/s (5 trials)
HFS250G32TND-N1A2A 3P10, fio 16 jobs:  32.4 MB/s (5 trials)
HFS250G32TND-N1A2A 3P10, fio 32 jobs:  64.4 MB/s (5 trials)
HFS250G32TND-N1A2A 3P10, fio 64 jobs:  71.6 MB/s (5 trials)
            SAMSUNG SSD, fio  1 jobs:   2.2 MB/s (5 trials)
            SAMSUNG SSD, fio  2 jobs:   3.9 MB/s (5 trials)
            SAMSUNG SSD, fio  4 jobs:   7.1 MB/s (5 trials)
            SAMSUNG SSD, fio  8 jobs:  12.0 MB/s (5 trials)
            SAMSUNG SSD, fio 16 jobs:  18.3 MB/s (5 trials)
            SAMSUNG SSD, fio 32 jobs:  25.4 MB/s (5 trials)
            SAMSUNG SSD, fio 64 jobs:  26.5 MB/s (5 trials)
    INTEL SSDSC2BB150G7, fio  1 jobs:  91.2 MB/s (5 trials)
    INTEL SSDSC2BB150G7, fio  2 jobs: 132.4 MB/s (5 trials)
    INTEL SSDSC2BB150G7, fio  4 jobs: 138.2 MB/s (5 trials)
    INTEL SSDSC2BB150G7, fio  8 jobs: 116.9 MB/s (5 trials)
    INTEL SSDSC2BB150G7, fio 16 jobs:  61.8 MB/s (5 trials)
    INTEL SSDSC2BB150G7, fio 32 jobs:  22.7 MB/s (5 trials)
    INTEL SSDSC2BB150G7, fio 64 jobs:  16.9 MB/s (5 trials)
    SEAGATE ST9300603SS, fio  1 jobs:   0.7 MB/s (5 trials)
    SEAGATE ST9300603SS, fio  2 jobs:   0.9 MB/s (5 trials)
    SEAGATE ST9300603SS, fio  4 jobs:   1.6 MB/s (5 trials)
    SEAGATE ST9300603SS, fio  8 jobs:   2.0 MB/s (5 trials)
    SEAGATE ST9300603SS, fio 16 jobs:   4.6 MB/s (5 trials)
    SEAGATE ST9300603SS, fio 32 jobs:   6.9 MB/s (5 trials)
    SEAGATE ST9300603SS, fio 64 jobs:   0.6 MB/s (5 trials)

For those who come across this and are looking for drives for purposes
other than CEPH, those are all sequential write numbers with caching
disabled, a very CEPH-journal-specific test. The SanDisk held its own
against the Intel using some benchmarks on Windows that didn't disable
caching. It may very well be a perfectly good drive for other purposes.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943493611
  943324914
Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
www.binovo.es

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Bluestore

2017-03-15 Thread Eneko Lacunza

Hi Michal,

On 14/03/17 at 23:45, Michał Chybowski wrote:


I'm going to set up a small cluster (5 nodes with 3 MONs, 2 - 4 HDDs 
per node) to test if Ceph at such a small scale is going to perform well 
enough to put it into a production environment (or does it perform well 
only if there are tens of OSDs, etc.).
Are there any "do's" and "don'ts" regarding OSD storage type 
(bluestore / xfs / ext4 / btrfs), the correct 
"journal-to-storage-drive-size" ratio and monitor placement in very 
limited space (dedicated machines just for MONs are not an option)?


You don't tell us what this cluster will be used for. I have had several 
tiny Ceph clusters (3 nodes) in production for some years now; the Ceph 
nodes usually do mon+OSD+virtualization.


They perform quite well for their use case (VMs only use heavy I/O 
rarely), but I have always built the clusters with SSDs for journals. I 
have seen better performance with this setup than with some entry-level EMC 
disk enclosures; I always thought this was a misconfiguration problem on 
the other enclosure provider's side though! :)


Cheers
Eneko


--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943493611
  943324914
Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
www.binovo.es

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS PG calculation

2017-03-10 Thread Eneko Lacunza

Hi Martin,

Take a look at
http://ceph.com/pgcalc/
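
For reference, the rule of thumb behind that calculator is roughly 100 PGs
per OSD, divided by the replica count, rounded to a power of two and then
split across your pools by expected data share; a quick sketch with made-up
numbers (40 OSDs, replica 3):

# (osds * 100) / pool_size, rounded up to the next power of two
python3 -c 'import math; pg = 40*100/3; print(2**math.ceil(math.log2(pg)))'   # -> 2048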

Cheers
Eneko

On 10/03/17 at 09:54, Martin Wittwer wrote:

Hi List

I am creating a POC cluster with CephFS as a backend for our backup
infrastructure. The backups are rsyncs of whole servers.
I have 4 OSD nodes with 10 4TB disks and 2 SSDs for journaling per node.

My question is now how to calculate the PG count for that scenario? Is
there a way to calculate how many PGs the data/metadata pool needs or
are there any recommendations?

Best




--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943493611
  943324914
Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
www.binovo.es

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Recovery ceph cluster down OS corruption

2017-02-24 Thread Eneko Lacunza

Hi Iban,

Is the monitor data safe? If it is, just install jewel in other servers 
and plug in the OSD disks, it should work.
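
For jewel OSDs that were created with ceph-disk, that usually amounts to
something like the following on the reinstalled host (assuming a ceph.conf
pointing at the surviving monitors is already in place):

ceph-disk activate-all    # mounts and starts every prepared OSD found on the plugged-in disks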


On 24/02/17 at 14:41, Iban Cabrillo wrote:

Hi,
  We have a serious issue. We have a mini cluster (jewel version) with 
two servers (Dell RX730), with 16 bays and the OS installed on dual 8 GB 
SD cards, but this configuration is working really, really badly.



  The replication is 2, but yesterday one server crashed and this 
morning the other one. This is not the first time, but on previous occasions 
we had one server up and the data could be replicated without any trouble, 
reinstalling the OSD server completely.


  As far as I understand, Ceph data and metadata are still on the bays (data on 
SATA and metadata on 2 fast SSDs); I think only the OS installed on the SD 
cards is corrupted.


  Is there any way to solve this situation?
  Any idea would be great!!

Regards, I


--

Iban Cabrillo Bartolome
Instituto de Fisica de Cantabria (IFCA)
Santander, Spain
Tel: +34942200969
PGP PUBLIC KEY: 
http://pgp.mit.edu/pks/lookup?op=get=0xD9DF0B3D6C8C08AC


Bertrand Russell: "The problem with the world is that the stupid are sure 
of everything and the intelligent are full of doubts."




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943493611
  943324914
Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
www.binovo.es

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Release schedule and notes.

2016-11-24 Thread Eneko Lacunza

Hi,

On 24/11/16 at 12:09, Stephen Harker wrote:

Hi All,

This morning I went looking for information on the Ceph release 
timelines and so on and was directed to this page by Google:


http://docs.ceph.com/docs/jewel/releases/

but this doesn't seem to have been updated for a long time. Is there 
somewhere else I should be looking?

Here:
http://docs.ceph.com/docs/master/releases/

:-)



Additionally, I tried to find information on the Hammer release that I 
see as current for Debian Wheezy LTS:


0.94.9-1~bpo70+1

but there seems to be nothing here either:

http://docs.ceph.com/docs/jewel/release-notes/

the latest Hammer release mentioned is 0.94.6

Has this information been moved elsewhere, or just not been updated recently?

Thanks! :)

Kind regards,

Stephen




--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943493611
  943324914
Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
www.binovo.es

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] KVM / Ceph performance problems

2016-11-22 Thread Eneko Lacunza

Hi Michiel,

How are you configuring VM disks on Proxmox? What type (virtio, scsi, 
ide) and what cache setting?
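
For comparison, this is the kind of disk line that usually behaves well for
us with RBD (from /etc/pve/qemu-server/<vmid>.conf; the storage name and
size are just examples):

virtio0: ceph-rbd:vm-101-disk-1,cache=writeback,size=32G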



On 23/11/16 at 07:53, M. Piscaer wrote:

Hi,

I have a little performance problem with KVM and Ceph.

I'm using Proxmox 4.3-10/7230e60f, with KVM version
pve-qemu-kvm_2.7.0-8. Ceph is on version jewel 10.2.3 on both the
cluster as the client (ceph-common).

The systems are connected to the network via a 4x bond with a total
of 4 Gb/s.

Within a guest:
- when I do a write I get about 10 MB/s.
- Also, when I try to do a write within the guest but directly to
Ceph, I get the same speed.
- But when I mount a Ceph object on the Proxmox host I get about 110MB/s.

The guest is connected to interface vmbr160 → bond0.160 → bond0.

This bridge vmbr160 has an IP address in the same subnet as the Ceph
cluster, with MTU 9000.

The KVM block device is a virtio device.

What can I do to solve this problem?

Kind regards,

Michiel Piscaer
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943493611
  943324914
Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
www.binovo.es

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Migrating files from ceph fs from cluster a to cluster b without low downtime

2016-06-07 Thread Eneko Lacunza

On 06/06/16 at 20:53, Oliver Dzombic wrote:

Hi,

thank you for your suggestion.

Rsync will copy the whole file again if the size is different.

Since we are talking about raw image files of virtual servers, rsync is not an option.

We need something that will copy just the deltas inside a file.

Something like lvmsync (which only works with LVM).

So I am looking for a tool that can do that at the file level.


Have you tried rsync --inplace? It works quite well for us, no whole 
file copying. We use it for raw VM disk files.
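
A minimal sketch of the kind of invocation we use (paths and host are just
examples; --inplace updates changed blocks inside the existing destination
file instead of rewriting the whole image):

rsync -av --inplace /var/lib/vz/images/100/vm-100-disk-1.raw \
    root@cluster-b:/var/lib/vz/images/100/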


--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943493611
  943324914
Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
www.binovo.es

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd resize option

2016-05-12 Thread Eneko Lacunza

You have to shrink the FS before the RBD block device! Now your FS is corrupt! :)
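
For the archives, a sketch of the order that works (ext4 example; pool,
image name and sizes are made up; the FS must be unmounted, and the rbd
command runs wherever you have the rbd CLI and an admin keyring):

e2fsck -f /dev/vdb                                  # FS must be clean first
resize2fs /dev/vdb 9G                               # shrink the FS below the target
rbd resize --size 10240 rbd/myimage --allow-shrink  # then shrink the image (MB)
resize2fs /dev/vdb                                  # grow the FS to fill the image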

On 12/05/16 at 15:41, M Ranga Swami Reddy wrote:

Used "resize2fs" and its working for resize to higher number (ie from
10G -> 20G) or so...
If I tried to resize the lower numbers (ie from 10G -> 5G), its
failied...with below message:
===
ubuntu@swami-resize-test-vm:/$ sudo resize2fs /dev/vdb

sudo: unable to resolve host swami-resize-test-vm

resize2fs 1.42.9 (4-Feb-2014)

Please run 'e2fsck -f /dev/vdb' first.


ubuntu@swami-resize-test-vm:/$ sudo e2fsck -f /dev/vdb

sudo: unable to resolve host swami-resize-test-vm

e2fsck 1.42.9 (4-Feb-2014)

The filesystem size (according to the superblock) is 52428800 blocks

The physical size of the device is 13107200 blocks

Either the superblock or the partition table is likely to be corrupt!

Abort?

On Thu, May 12, 2016 at 6:37 PM, Eneko Lacunza <elacu...@binovo.es> wrote:

Swami,

You must resize (reduce) a filesystem before shrinking a partition/disk.
Please search online how to do so with your specific filesystem/partitions.

On 12/05/16 at 15:00, M Ranga Swami Reddy wrote:


I have not done any FS shrink before "rbd resize". Please let me know what to
do with the FS shrink before "rbd resize".

Thanks
Swami

On Thu, May 12, 2016 at 4:34 PM, Eneko Lacunza <elacu...@binovo.es> wrote:

Did you shrink the FS to be smaller than the target rbd size before doing
"rbd resize"?

On 12/05/16 at 12:33, M Ranga Swami Reddy wrote:


When I used "rbd resize" option for size shrink,  the image/volume
lost its fs sectors and asking for "fs" not found...
I have used  "mkf" option, then all data lost in it? This happens with
shrink option...


Thanks
Swami

On Wed, May 11, 2016 at 5:28 PM, Christian Balzer <ch...@gol.com> wrote:

Hello,

On Wed, 11 May 2016 13:33:44 +0200 (CEST) Alexandre DERUMIER wrote:


but the fstrim can used with in mount partition...But I wanted to as
cloud admin...

if you use qemu, you can launch fstrim  through guest-agent


This of course assumes that qemu/kvm is using a disk method that allows
for TRIM.

And nobody in their right mind uses IDE (performance), while
virtio-scsi
isn't the default or even supported with some cloud stacks.

And of course that the VM in question runs Linux and has fstrim
installed.

Otherwise solid advise, I agree.

Christian



http://dustymabe.com/2013/06/26/enabling-qemu-guest-agent-and-fstrim-again/

- Mail original -
De: "M Ranga Swami Reddy" <swamire...@gmail.com>
À: "Wido den Hollander" <w...@42on.com>
Cc: "ceph-users" <ceph-us...@ceph.com>
Envoyé: Mercredi 11 Mai 2016 13:16:27
Objet: Re: [ceph-users] rbd resize option

Thank you.

but fstrim can be used within a mounted partition... But I wanted to do it as
cloud admin...
I have a few users with a high volume size (i.e. capacity) allotted, but
only 5% of the capacity used. So I wanted to reduce the size to 10% of the
size using the rbd resize command. But in this process, if a
customer's volume has more than 10% data, then I may end up with data
loss...

Thanks
Swami

On Wed, May 11, 2016 at 1:17 PM, Wido den Hollander <w...@42on.com>
wrote:

Op 11 mei 2016 om 8:38 schreef M Ranga Swami Reddy
<swamire...@gmail.com>:


Hello,
I wanted to resize an image using the 'rbd resize' option, but it should
not cause data loss.
For example: I have an image of 100 GB (thin provisioned), and this
image has only 10GB of data. Here I wanted to resize this image to
11GB, so that the 10GB of data is safe and the image is resized.

Can I do the above resize safely?


No, you can't. You need to resize the filesystem and partitions
inside
the RBD image to something below 11GB before you can do this.

Still, make sure you have backups!

Also, why shrink? If you can, run a fstrim on the image, that might
reclaimed unused space on the Ceph cluster.


If I try to resize to 5GB, does rbd throw an error saying that
your data is going to be lost, something like that???

Any inputs here are appriciated.

Thanks
Swami
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


--
Christian BalzerNetwork/Systems Engineer
ch...@gol.com   Global OnLine Japan/Rakuten Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/cep

Re: [ceph-users] rbd resize option

2016-05-12 Thread Eneko Lacunza
Did you shrink the FS to be smaller than the target rbd size before 
doing "rbd resize"?


On 12/05/16 at 12:33, M Ranga Swami Reddy wrote:

When I used "rbd resize" option for size shrink,  the image/volume
lost its fs sectors and asking for "fs" not found...
I have used  "mkf" option, then all data lost in it? This happens with
shrink option...


Thanks
Swami

On Wed, May 11, 2016 at 5:28 PM, Christian Balzer  wrote:

Hello,

On Wed, 11 May 2016 13:33:44 +0200 (CEST) Alexandre DERUMIER wrote:


but the fstrim can used with in mount partition...But I wanted to as
cloud admin...

if you use qemu, you can launch fstrim  through guest-agent


This of course assumes that qemu/kvm is using a disk method that allows
for TRIM.

And nobody in their right mind uses IDE (performance), while virtio-scsi
isn't the default or even supported with some cloud stacks.

And of course that the VM in question runs Linux and has fstrim installed.

Otherwise solid advise, I agree.

Christian

http://dustymabe.com/2013/06/26/enabling-qemu-guest-agent-and-fstrim-again/

- Mail original -
De: "M Ranga Swami Reddy" 
À: "Wido den Hollander" 
Cc: "ceph-users" 
Envoyé: Mercredi 11 Mai 2016 13:16:27
Objet: Re: [ceph-users] rbd resize option

Thank you.

but fstrim can be used within a mounted partition... But I wanted to do it as
cloud admin...
I have a few users with a high volume size (i.e. capacity) allotted, but
only 5% of the capacity used. So I wanted to reduce the size to 10% of the
size using the rbd resize command. But in this process, if a
customer's volume has more than 10% data, then I may end up with data
loss...

Thanks
Swami

On Wed, May 11, 2016 at 1:17 PM, Wido den Hollander 
wrote:

Op 11 mei 2016 om 8:38 schreef M Ranga Swami Reddy
:


Hello,
I wanted to resize an image using the 'rbd resize' option, but it should
not cause data loss.
For example: I have an image of 100 GB (thin provisioned), and this
image has only 10GB of data. Here I wanted to resize this image to
11GB, so that the 10GB of data is safe and the image is resized.

Can I do the above resize safely?


No, you can't. You need to resize the filesystem and partitions inside
the RBD image to something below 11GB before you can do this.

Still, make sure you have backups!

Also, why shrink? If you can, run a fstrim on the image, that might
reclaimed unused space on the Ceph cluster.


If I try to resize to 5GB, does rbd throw an error saying that
your data is going to be lost, something like that???

Any inputs here are appriciated.

Thanks
Swami
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


--
Christian BalzerNetwork/Systems Engineer
ch...@gol.com   Global OnLine Japan/Rakuten Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943493611
  943324914
Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
www.binovo.es

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Adding new disk/OSD to ceph cluster

2016-04-11 Thread Eneko Lacunza

Hi Mad,

On 09/04/16 at 14:39, Mad Th wrote:

We have a 3-node Proxmox/Ceph cluster ... each with 4 x 4TB disks


Are you using 3-way replication? I guess you are. :)
1) If we want to add more disks , what are the things that we need to 
be careful about?



Will the following steps automatically add it to ceph.conf?
ceph-disk zap /dev/sd[X]
pveceph createosd /dev/sd[X] -journal_dev /dev/sd[Y]
where X is new disk and Y is the journal disk.

Yes, this is the same as adding it from web GUI.


2) Is it safe to run a different number of OSDs in the cluster, say one 
server with 5 OSDs and the other two servers with 4 OSDs? Though we 
plan to add one OSD to each server.


It is safe as long as none of your nodes' OSDs are near-full. If you're 
asking this because you're adding a new OSD to each node, step by step: 
yes, it is safe.
Be prepared for data moving around when you add new disks (performance 
will suffer unless you have tuned some parameters in ceph.conf).
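
The parameters usually meant here are the recovery/backfill throttles; a
hedged example of adjusting them at runtime (values are illustrative,
revert them once the rebalance is done):

ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'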


3) How do we safely add the new OSD to an existing storage pool?
New OSDs will be used automatically by existing Ceph pools unless you 
have changed the CRUSH map.


Cheers
Eneko

--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943493611
  943324914
Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
www.binovo.es

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Typical architecture in RDB mode - Number of servers explained ?

2016-01-28 Thread Eneko Lacunza

Hi,

On 28/01/16 at 13:53, Gaetan SLONGO wrote:

Dear Ceph users,

We are currently working on Ceph (RBD mode only). The technology is 
currently in "preview" state in our lab. We are currently diving into 
Ceph design... We know it requires at least 3 nodes (OSDs+monitors 
inside) to work properly. But we would like to know if it makes sense 
to use 4 nodes? I've heard this is not a good idea because all of the 
capacity of the 4 servers won't be available?

Can someone confirm?
There's no problem using 4 servers for OSDs; just don't put a monitor on 
one of the nodes. Always keep an odd number of monitors (3 or 5).


Monitors don't need to be on an OSD node, and in fact for medium and 
large clusters it is recommended to have dedicated nodes for them.


Cheers
Eneko

--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943575997
  943493611
Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
www.binovo.es

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Upgrading Ceph

2016-01-27 Thread Eneko Lacunza

Hi,

On 27/01/16 at 15:00, Vlad Blando wrote:


I have a production Ceph Cluster
- 3 nodes
- 3 mons, one on each node
- 9 OSD @ 4TB per node
- using ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6)

Now I want to upgrade it to Hammer. I saw the documentation on 
upgrading and it looks straightforward, but I want to know from those who 
have tried upgrading a production environment: any precautions, 
caveats, or preparation that I need to take before doing it?



Our migration on 3 Proxmox nodes with 3x3 OSD disks went really smoothly. :)

We were running the latest Firefly; I suggest you first upgrade to the latest 
Firefly too.


Cheers
Eneko

--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943575997
  943493611
Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
www.binovo.es

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD journals killed by VMs generating 500 IOPs (4kB) non-stop for a month, seemingly because of a syslog-ng bug

2015-11-23 Thread Eneko Lacunza

Hi Mart,

On 23/11/15 at 10:29, Mart van Santen wrote:



On 11/22/2015 10:01 PM, Robert LeBlanc wrote:

There have been numerous reports on the mailing list of the Samsung EVOs and
Pros failing far before their expected wear. This is most likely due
to the 'uncommon' workload of Ceph and the controllers of those drives
not really being designed to handle the continuous direct sync writes
that Ceph does. Because of this they can fail without warning
(controller failure rather than MLC failure).


I'm new to the mailing list and I'm scanning the archive currently.  
And I'm getting a sense of the quality of the Samsung Evo disks. If I 
understand correctly, it is at least advised to put DC-grade journals 
in front of them to save them a bit from failure. For example Intel 
750s.

I don't think Intel 750's are DC grade. I don't have any of them though.


However, is there experience of when the Evos fail in the Ceph 
scenario? For example, if wear leveling according to SMART is at about 40%, 
is it time to replace your disks? Or is it just random? Actually we are 
using mostly Crucial drives (M550, MX200s); there is not a lot about 
them on the list. Do other people use them, and what's their experience 
so far? I expect about the same quality as the Samsung Evos, but I'm 
not sure if that is the correct conclusion.
My experience with the Samsung 840 Pro is that they can't be used for Ceph 
at all. In the case of the Crucial M550, they are slow and have little endurance 
for Ceph use, but I have used them and they seemed reliable during their warranty 
lifetime (we retired them for performance reasons).




About SSD failure in general, do they normally fail hard, or do they 
just get unbearably slow? We measure/graph the disks' 'busy' 
performance, and use that as an indicator of whether a disk is getting slow. 
Is this a sensible approach?


Just don't do it. Use DC SSDs, like intel S3xxx, or Samsung DC Pro, or 
something like that. You will save a lot of time and effort, and 
possibly also money.


Cheers
Eneko

--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943575997
  943493611
Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
www.binovo.es

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] IO scheduler osd_disk_thread_ioprio_class

2015-06-23 Thread Eneko Lacunza

Hi Jan,

What SSD model?

I've seen SSDs usually work quite well but suddenly give totally awful 
performance for some time (not those 8K IOPS you see, though).


I think there was some kind of firmware process involved, I had to 
replace the drive with a serious DC one.


On 23/06/15 at 14:07, Jan Schermer wrote:

Yes, but that’s a separate issue :-)
Some drives are just slow (100 IOPS) for synchronous writes with no other load.
The drives I’m testing have ~8K IOPS when not under load - having them drop to 
10 IOPS is a huge problem. If it’s indeed a CFQ problem (as I suspect) then no 
matter what drive you have you will have problems.

Jan


On 23 Jun 2015, at 14:03, Dan van der Ster d...@vanderster.com wrote:

Oh sorry, I had missed that. Indeed that is surprising. Did you read
the recent thread (SSD IO performance) discussing the relevance of
O_DSYNC performance for the journal?

Cheers, Dan

On Tue, Jun 23, 2015 at 1:54 PM, Jan Schermer j...@schermer.cz wrote:

I only use SSDs, which is why I’m so surprised at the CFQ behaviour - the drive 
can sustain tens of thousand of reads per second, thousands of writes - yet 
saturating it with reads drops the writes to 10 IOPS - that’s mind boggling to 
me.

Jan


On 23 Jun 2015, at 13:43, Dan van der Ster d...@vanderster.com wrote:

On Tue, Jun 23, 2015 at 1:37 PM, Jan Schermer j...@schermer.cz wrote:

Yes, I use the same drive

one partition for journal
other for xfs with filestore

I am seeing slow requests when backfills are occurring - backfills hit the 
filestore but slow requests are (most probably) writes going to the journal - 
10 IOPS is just too few for anything.


My Ceph version is dumpling - that explains the integers.
So it’s possible it doesn’t work at all?

I thought that bug was fixed. You can check if it worked by using
iotop -b -n1 and looking for threads with the idle priority.


Bad news about the backfills not being in the disk thread, I might have to use 
deadline after all.

If your experience follows the same paths of most users, eventually
deep scrubs will cause latency issues and you'll switch back to cfq
plus ionicing the disk thread.
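
For reference, the knobs named in this thread's subject can be set at
runtime like this (only meaningful with the cfq scheduler; the values are
just the common 'idle' example):

ceph tell osd.* injectargs '--osd-disk-thread-ioprio-class idle --osd-disk-thread-ioprio-priority 7'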

Are you using Ceph RBD or object storage? If RBD, eventually you'll
find that you need to put the journals on an SSD.

Cheers, Dan

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943575997
  943493611
Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
www.binovo.es

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Best setup for SSD

2015-06-02 Thread Eneko Lacunza

Hi,

On 02/06/15 16:18, Mark Nelson wrote:

On 06/02/2015 09:02 AM, Phil Schwarz wrote:

On 02/06/2015 15:33, Eneko Lacunza wrote:

Hi,

On 02/06/15 15:26, Phil Schwarz wrote:

On 02/06/15 14:51, Phil Schwarz wrote:
i'm gonna have to setup a 4-nodes Ceph(Proxmox+Ceph in fact) 
cluster.


-1 node is a little HP Microserver N54L with 1X opteron + 2SSD+ 
3X 4TB

SATA
It'll be used as OSD+Mon server only.

Are these SSDs Intel S3700 too? What amount of RAM?

Yes, All DCS3700, for the four nodes.
16GB of RAM on this node.

This should be enough for 3 OSDs I think, I used to have a Dell
T20/Intel G3230 with 2x1TB OSDs with only 4 GB running OK.

Cheers
Eneko


Yes, indeed.
My main problem is doing something not advised...
Running VMs on Ceph nodes...
No choice, but it seems that I'll have to do that.
Hope I won't peg the CPU too quickly...


I'm doing it in 3 different Proxmox clusters. They're not very busy 
clusters, but it works very well.
You might want to consider using cgroups or some other mechanism to 
segment what runs on what cores.  While not ideal, dedicating 2-3 of 
the cores to ceph and leaving the other(s) for VMs might be a 
reasonable way to go.



I think this may be a must if you set up a dedicated SSD pool.
A single DC S3700 should suffice for journals for 4 OSDs.  I wouldn't 
recommend using the other one for a cache tier unless you have a very 
highly skewed hot/cold workload.  Perhaps instead make a dedicated SSD 
pool that could be used for high IOPS workloads. In fact you might 
consider skipping SSD journals and just making a dedicated SSD pool 
with all of the SSDs depending on how much write workload your main 
pool sees and if you could make good use of a dedicated SSD pool.
Be warned that running SSD-based and HDD-based OSDs in the same server is not 
recommended. If you need the storage capacity, I'd stick to the 
journals-on-SSDs plan.


Cheers
Eneko

--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943575997
  943493611
Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
www.binovo.es

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Recommendations for a driver situation

2015-06-02 Thread Eneko Lacunza

Hi,

On 02/06/15 14:18, Pontus Lindgren wrote:

We have recently acquired new servers for a new ceph cluster and we want to run 
Debian on those servers. Unfortunately drivers needed for the raid controller 
are only available in newer kernels than what Debian Wheezy provides.

We need to run the dumpling release of Ceph.

Since the Ceph repo does not have packages for Debian Jessie I see 3 
alternatives for us:
1. Wait for the Ceph repo to add packages for Debian Jessie.
Number 1 is not really an option for us. But, is there an approximate ETA on 
this?
Why is this the case? At least Alexandre Derumier is working on this: 
(check an email from him in this list on 12th May)


http://odisoweb1.odiso.net/ceph-jessie/


2. Run Debian Wheezy with backported drivers.


I haven't used them lately, but the Linux kernel in wheezy-backports is 3.16;
is that enough?


What kernel version do you require for the drivers?
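(In case it helps, pulling the 3.16 kernel from wheezy-backports is roughly
the following; adjust the mirror to your local one:)

# echo "deb http://http.debian.net/debian wheezy-backports main" > /etc/apt/sources.list.d/backports.list
# apt-get update
# apt-get -t wheezy-backports install linux-image-amd64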

Cheers
Eneko

--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943575997
  943493611
Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
www.binovo.es

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Best setup for SSD

2015-06-02 Thread Eneko Lacunza

Hi,

On 02/06/15 14:51, Phil Schwarz wrote:

I'm going to have to set up a 4-node Ceph (Proxmox+Ceph, in fact) cluster.

- 1 node is a little HP Microserver N54L with 1x Opteron + 2 SSDs + 3x 4TB SATA.
It'll be used as an OSD+Mon server only.

Are these SSDs Intel S3700 too? What amount of RAM?

- 3 nodes are setup upon Dell 730+ 1xXeon 2603, 48 GB RAM, 1x 1TB SAS
for OS , 4x 4TB SATA for OSD and 2x DCS3700 200GB intel SSD

I can't change the hardware, especially the poor cpu...

Everything will be connected through Intel X520+Netgear XS708E, as 10GBE
storage network.

This cluster will support VM (mostly KVM) upon the 3 R730 nodes.
I'm already aware of the CPU pegging all the time...But can't change it
for the moment.
The VM will be Filesharing servers, poor usage services (DNS,DHCP,AD or
OpenLDAP).
One Proxy cache (Squid) will be used upon a 100Mb Optical fiber with
500+ clients.


My question is :
Is it recommended to set up the 2 SSDs as:
one SSD as journal for 2 (up to 3 in the future) OSDs each,
or
one SSD as journal for all 4 (up to 6 in the future) OSDs and the
remaining SSD as a cache tier for the previous SSD+4-OSD pool?
I haven't used cache tiering myself, but others have not reported much
benefit from it (if any) at all; at least that is my understanding.


So I think it would be better to use both SSDs for journals. Using two
instead of only one probably won't help performance, but it will lessen the
impact of an SSD failure. Also, the consensus seems to be 3-4 OSDs per
journal SSD, so it will help when you expand to 6 OSDs.
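For reference, putting a journal on the SSD is just a matter of passing the
journal device to ceph-disk, which carves out a new journal partition by
itself; a sketch with made-up device names:

# ceph-disk prepare /dev/sdb /dev/sdf
(data disk first, journal device second)
# ceph-disk activate /dev/sdb1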

The SSDs should be rock solid enough to handle both the bandwidth and the
lifetime, given the low amount of data that will be written to them (a few
hundred GB per day as a rule of thumb).
If they are all Intel S3700s you're on the safe side unless you have lots of
writes. Anyway, I suggest you monitor the SMART values.
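On the Intel DC drives the interesting SMART attributes are the wearout and
total-writes counters; something like this (attribute names can differ a bit
between models and firmwares):

# smartctl -A /dev/sdf | egrep -i 'Wearout|Wear_Leveling|Total_LBAs_Written|Reallocated'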


Cheers
Eneko


--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943575997
  943493611
Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
www.binovo.es

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Best setup for SSD

2015-06-02 Thread Eneko Lacunza

Hi,

On 02/06/15 15:26, Phil Schwarz wrote:

On 02/06/15 14:51, Phil Schwarz wrote:

I'm going to have to set up a 4-node Ceph (Proxmox+Ceph, in fact) cluster.

- 1 node is a little HP Microserver N54L with 1x Opteron + 2 SSDs + 3x 4TB
SATA.
It'll be used as an OSD+Mon server only.

Are these SSDs Intel S3700 too? What amount of RAM?

Yes, All DCS3700, for the four nodes.
16GB of RAM on this node.
This should be enough for 3 OSDs, I think; I used to have a Dell
T20/Intel G3230 with 2x1TB OSDs and only 4 GB of RAM running OK.


Cheers
Eneko

--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943575997
  943493611
Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
www.binovo.es

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Replacing OSD disks with SSD journal - journal disk space use

2015-05-26 Thread Eneko Lacunza

Hi,

It's firefly 0.80.9, so if the improvement is in Hammer I haven't seen it.

Will check back when I upgrade the cluster.

Thanks
Eneko

On 26/05/15 17:45, Robert LeBlanc wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

What version of Ceph are you using? I seem to remember an enhancement
of ceph-disk for Hammer that is more aggressive in reusing previous
partition.
- 
Robert LeBlanc
GPG Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Mon, May 25, 2015 at 4:22 AM, Eneko Lacunza  wrote:

Hi all,

We have a firefly ceph cluster (using Proxmox VE, but I don't think this is
relevant), and found an OSD disk was having quite a high number of errors as
reported by SMART, and also quite high wait times as reported by munin, so we
decided to replace it.

What I have done is down/out the OSD, then remove it (removing its partitions),
replace the disk and create a new OSD, which was created with the same ID as
the removed one (as I was hoping not to change the CRUSH map).

So everything has worked as expected, except one minor non-issue:
- The original OSD journal was on a separate SSD disk, which had partitions #1
and #2 (journals of 2 OSDs).
- The original journal partition (#1) was removed.
- A new partition has been created as #1, but has been assigned space after
the last existing partition. So there is now a hole of 5GB at the beginning of
the SSD disk. Proxmox uses ceph-disk prepare for this; I have seen in the docs
(http://ceph.com/docs/master/man/8/ceph-disk/) that ceph-disk prepare
creates a new partition on the journal block device.

What I'm afraid of is that, given enough OSD replacements, Proxmox won't find
free space for new journals on that SSD disk, although there would be plenty
at the beginning.

Maybe the journal-partition creation could be improved so that it can also
detect free space at the beginning and between existing partitions?

Cheers
Eneko

--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943575997
   943493611
Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
www.binovo.es

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

-BEGIN PGP SIGNATURE-
Version: Mailvelope v0.13.1
Comment: https://www.mailvelope.com

wsFcBAEBCAAQBQJVZJUvCRDmVDuy+mK58QAAZ+4QAIr27ymAPpOZPr9JUVWZ
M8avNyddIiJpG/S2pP91UyxAzrgAy+mGbVQG0istpo98QKjT9UNxi/ySe64c
OxmIHb1tp40nyMtWFnv3W0Iw1iiScTxp2hWc2KSubbibFS6YY4ACRmTysBh+
Curdo9TG9h6k4zSbQ1gAInuMCh6NIoxUMnNatkyju5UgxpGYKg9iN8Ddt+wX
H/YC3yKLnwuqIkYBWsMpQCNpry2RZYWTUF9tRiuGTJg5lnIuU572sXRCpXkZ
NGcVYjbOX2g16MMxohSfivxJ36PbCGsvPIde3WZz0RDP7xmeJnEanR3Zw9mC
Td80pyVkuu28lRJ/UYWwTRkd0PECNejYaGvBN6LjidbZE2nejTz31Pl0DGuZ
9zlCyNFQDvUAcrKgIB0iE0qgNNzGgtmfgq+dvcu5+uFY0FLev8s7SZWCVcMf
UUwGe+UldfDo9w5g2vo89jMFvG+SIA7Pmk3ZsSvt1NzQCAYABRsb4MXUwNJ8
k/S8ZgtNr1GcDeTSH+C+SqOdGS4i+AXVr3+r01Jw+9CbIWerI9aFZ8iBifUf
Amhz0DCqFe4m4ZHNp1HSaGaHtc1DZYiqaRggQ73FeIfGnyheNllJXx9hlJJF
ioLHk84XoiRn4KgdATF6XXIi1lk7zp0KyvyIxpGX958Q8qqPc5AbVDg3Q8OY
f0yb
=w3HG
-END PGP SIGNATURE-





--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943575997
  943493611
Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
www.binovo.es

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Replacing OSD disks with SSD journal - journal disk space use

2015-05-25 Thread Eneko Lacunza

Hi all,

We have a firefly ceph cluster (using Proxmox VE, but I don't think this
is relevant), and found an OSD disk was having quite a high number of
errors as reported by SMART, and also quite high wait times as reported
by munin, so we decided to replace it.

What I have done is down/out the OSD, then remove it (removing its
partitions), replace the disk and create a new OSD, which was created
with the same ID as the removed one (as I was hoping not to change the
CRUSH map).

So everything has worked as expected, except one minor non-issue:
- The original OSD journal was on a separate SSD disk, which had partitions
#1 and #2 (journals of 2 OSDs).
- The original journal partition (#1) was removed.
- A new partition has been created as #1, but has been assigned space
after the last existing partition. So there is now a hole of 5GB at the
beginning of the SSD disk. Proxmox uses ceph-disk prepare for this; I have
seen in the docs (http://ceph.com/docs/master/man/8/ceph-disk/) that
ceph-disk prepare creates a new partition on the journal block device.

What I'm afraid of is that, given enough OSD replacements, Proxmox won't
find free space for new journals on that SSD disk, although there would
be plenty at the beginning.

Maybe the journal-partition creation could be improved so that it can also
detect free space at the beginning and between existing partitions?
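In the meantime, if the SSD ever runs out of room at the tail, a journal
partition can be created by hand in one of the gaps; a rough sketch (device
and partition number are made up, and the Ceph journal type GUID is from
memory, so double-check it before using it):

# sgdisk -p /dev/sdf
(shows the existing partitions and where the gaps are)
# sgdisk -n 3:2048:+5G -t 3:45b0969e-9b03-4f30-b4c6-b4b80ceff106 -c 3:"ceph journal" /dev/sdf
(creates a 5GB partition #3 in the free space at the start of the disk)
# ceph-osd -i <osd id> --mkjournal
(with the OSD stopped and its journal symlink pointed at the new partition)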


Cheers
Eneko

--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943575997
  943493611
Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
www.binovo.es

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Possible improvements for a slow write speed (excluding independent SSD journals)

2015-04-21 Thread Eneko Lacunza

Hi,

I'm just writing to stress what others have already said, because it is
very important that you take it seriously.


On 20/04/15 19:17, J-P Methot wrote:

On 4/20/2015 11:01 AM, Christian Balzer wrote:



This is similar to another thread running right now, but since our
current setup is completely different from the one described in the
other thread, I thought it may be better to start a new one.

We are running Ceph Firefly 0.80.8 (soon to be upgraded to 0.80.9). We
have 6 OSD hosts with 16 OSD each (so a total of 96 OSDs). Each OSD 
is a

Samsung SSD 840 EVO on which I can reach write speeds of roughly 400
MB/sec, plugged in jbod on a controller that can theoretically transfer
at 6gb/sec. All of that is linked to openstack compute nodes on two
bonded 10gbps links (so a max transfer rate of 20 gbps).


I sure as hell hope you're not planning to write all that much to this
cluster.
But then again you're worried about write speed, so I guess you do.
Those _consumer_ SSDs will be dropping like flies, there are a number of
threads about them here.

They also might be of the kind that don't play well with O_DSYNC, I 
can't

recall for sure right now, check the archives.
  Consumer SSDs universally tend to slow down quite a bit when not 
TRIM'ed
and/or subjected to prolonged writes, like those generated by a 
benchmark.
I see, yes it looks like these SSDs are not the best for the job. We 
will not change them for now, but if they start failing, we will 
replace them with better ones.
I tried to put a Samsung 840 Pro 256GB in a Ceph setup. It is supposed to be
quite a bit better than the EVO, right? It was total crap. Not just "not the
best for the job": TOTAL CRAP. :)


It can't give any useful write performance for a Ceph OSD. Spec-sheet numbers
don't matter for this; they don't translate to Ceph OSD performance, period.
And yes, the drive is fine and works like a charm under workstation workloads.


I suggest you at least get some Intel S3700/S3610 drives and use them for the
journals of those Samsung OSDs; I think that could help performance a lot.


Cheers
Eneko

--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943575997
  943493611
Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
www.binovo.es

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] journal placement for small office?

2015-02-09 Thread Eneko Lacunza

Hi,

The common recommendation is to use one good SSD (Intel S3700) for the
journals of every 3-4 OSDs, or otherwise to keep each OSD's journal on its
own disk. Don't put more than one journal on the same spinning disk.

Also, it is recommended to use 500GB-1TB disks, especially if you have a
1 Gbit network; otherwise, when an OSD fails, recovery time can be quite
long. Also look in the mailing list archives for some tuning of
backfilling for small Ceph clusters.
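The settings usually mentioned there go in the [osd] section of ceph.conf;
a minimal example of the kind of values people use on small clusters (the
defaults are higher, i.e. more aggressive):

[osd]
	 osd max backfills = 1
	 osd recovery max active = 1
	 osd recovery op priority = 1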


Cheers.
Eneko

On 06/02/15 16:48, pixelfairy wrote:

3 nodes, each with 2x1TB in a raid (for /) and 6x4TB for storage. all
of this will be used for block devices for kvm instances. typical
office stuff. databases, file servers, internal web servers, a couple
dozen thin clients. not using the object store or cephfs.

i was thinking about putting the journals on the root disk (this is
how my virtual cluster works, because in that version the osds are 4G
instead of 4TB), and keeping that on its current raid 1, for
resiliency but im worried about making a performance bottleneck.
tempted to swap these out with ssds. if so, how big should i get? is
1/2TB enough?

The other thought was little journal partitions on each OSD. We're doing XFS
because I don't know enough about btrfs to feel comfortable with it.
Would the performance degradation be worse?

is there a better way?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943575997
  943493611
Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
www.binovo.es

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] remote storage

2015-01-26 Thread Eneko Lacunza

Hi Robert,

I don't see any reply to your email, so I send you my thoughts.

Ceph is all about using cheap local disks to build a large, performant
and resilient storage system. Your use case with a SAN and Storwize doesn't
seem to fit Ceph very well (I'm not saying it can't be done).

Why are you planning to use Ceph with a SAN? Why not use the SAN directly?

Cheers
Eneko

On 23/01/15 12:28, Robert Duncan wrote:


Hi All,

This is my first post, I have been using Ceph OSD in OpenStack 
Icehouse as part of the Mirantis distribution with Fuel – this is my 
only experience with Ceph, so as you can imagine – it works, but I 
don’t really understand all of the technical details, I am working for 
a college in Ireland and we are planning on deploying a larger private 
cloud this year using the Kilo release of OpenStack when it matures. I 
am architecting the physical components and storage has become quite 
complex – Currently we use a Dell Equallogic Array and we have 
configured the cinder service to use the driver provided by Dell, the 
nodes in the data centre don’t have a lot of local storage. So here is 
my Ceph ignorance laid bare


1-I have enough compute nodes to run a ceph cluster, radosgw etc. as 
per http://ceph.com/docs/master/radosgw/


2-I have no available local disks * - this is the problem

3-I have a Dell Equallogic SAN and fabric (7.5k NL SAS)

4-I have access to storage as a service from our ISP – this is an IBM 
storwise V7000- I can provision block storage and mount iscsi volumes, 
it’s across town but we have a p2p layer 2 connection


The use cases will be students on a Masters in data analytics using 
OpenStack Sahara and S3 for data sets. So if I mounted remote storage 
or network attached storage would it work? Can I put Ceph directly in 
front of my Equallogic array and use ceph for cinder, glance, nova and 
S3? Has anyone any thoughts or experience on this – thanks for taking 
the time to read this and any input would be greatly appreciated.


All the best,

Rob.

disclaimer


The information contained and transmitted in this e-mail is 
confidential information, and is intended only for the named recipient 
to which it is addressed. The content of this e-mail may not have been 
sent with the authority of National College of Ireland. Any views or 
opinions presented are solely those of the author and do not 
necessarily represent those of National College of Ireland. If the 
reader of this message is not the named recipient or a person 
responsible for delivering it to the named recipient, you are notified 
that the review, dissemination, distribution, transmission, printing 
or copying, forwarding, or any other use of this message or any part 
of it, including any attachments, is strictly prohibited. If you have 
received this communication in error, please delete the e-mail and 
destroy all record of this communication. Thank you for your assistance.





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943575997
  943493611
Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
www.binovo.es

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] New firefly tiny cluster stuck unclean

2015-01-20 Thread Eneko Lacunza

Hi all,

Finally this was fixed this way:
# ceph osd pool set rbd size 1
(wait some seconds for HEALTH_OK)
# ceph osd pool set rbd size 2
(wait almost an hour for HEALTH_OK after backfilling)

I wanted to avoid this but didn't want to leave the cluster in bad state 
all night :)


I really think there's some kind of bug that sometimes prevents Ceph from
backfilling correctly; this is quite similar to another problem I reported
in December (that time it was a pool originally created with size=3 and then
changed to size=2 that never became clean).

This time the default pools were deleted and a new rbd pool was created
with size=2. This was done before adding the OSDs of one of the nodes.


Thanks
Eneko

On 20/01/15 16:23, Eneko Lacunza wrote:

Hi all,

I've just created a new ceph cluster for RBD with latest firefly:
- 3 monitors
- 2 OSD nodes, each has 1 s3700 (journals) + 2 x 3TB WD red (osd)

Network is 1gbit, different physical interfaces for public and private 
network. There's only one pool rbd, size=2. There are just 5 rbd 
devices created.


Somehow I reached the following status:
cluster 8f839a95-d5e3-4a31-981e-497f9a0e4991
 health HEALTH_WARN 16 pgs stuck unclean; recovery 2986/47638 
objects degraded (6.268%)
 monmap e3: 3 mons at 
{0=172.16.1.3:6789/0,1=172.16.1.1:6789/0,2=172.16.1.2:6789/0}, 
election epoch 10, quorum 0,1,2 1,2,0

 osdmap e38: 4 osds: 4 up, 4 in
  pgmap v4347: 128 pgs, 1 pools, 95232 MB data, 23819 objects
186 GB used, 10985 GB / 11171 GB avail
2986/47638 objects degraded (6.268%)
  16 active
 112 active+clean
  client io 43854 B/s wr, 10 op/s

I don't see the reason for the 16 pgs stuck unclean. Can somebody
suggest a hint?


# cat /etc/pve/ceph.conf
[global]
 auth client required = cephx
 auth cluster required = cephx
 auth service required = cephx
 auth supported = cephx
 cluster network = 172.16.2.0/24
 filestore xattr use omap = true
 fsid = 8f839a95-d5e3-4a31-981e-497f9a0e4991
 keyring = /etc/pve/priv/$cluster.$name.keyring
 osd journal size = 5120
 osd pool default min size = 1
 public network = 172.16.1.0/24

[osd]
 keyring = /var/lib/ceph/osd/ceph-$id/keyring
 osd max backfills = 1
 osd recovery max active = 1

[mon.0]
 host = proxmox3
 mon addr = 172.16.1.3:6789

[mon.1]
 host = proxmox1
 mon addr = 172.16.1.1:6789

[mon.2]
 host = proxmox2
 mon addr = 172.16.1.2:6789


# ceph pg dump_stuck
ok
pg_stat  objects  mip  degr  unf  bytes      log   disklog  state   state_stamp                 v        reported  up     up_primary  acting  acting_primary  last_scrub  scrub_stamp                 last_deep_scrub  deep_scrub_stamp
3.8      155      0    155   0    650117120  359   359      active  2015-01-20 12:44:19.545685  38'359   38:1593   [1,3]  1           [1,3]   1               0'0         2015-01-20 12:44:15.677078  0'0              2015-01-20 12:44:15.677078
3.22     217      0    217   0    910163968  987   987      active  2015-01-20 12:44:19.539596  38'987   38:1312   [3,1]  3           [3,1]   3               0'0         2015-01-20 12:44:15.676128  0'0              2015-01-20 12:44:15.676128
3.1e     179      0    179   0    750780416  3001  3001     active  2015-01-20 12:44:19.539570  38'5410  38:5961   [3,0]  3           [3,0]   3               0'0         2015-01-20 12:44:15.675939  0'0              2015-01-20 12:44:15.675939
3.62     182      0    182   0    763363328  588   588      active  2015-01-20 12:44:19.539713  38'588   38:932    [3,1]  3           [3,1]   3               0'0         2015-01-20 12:44:15.680806  0'0              2015-01-20 12:44:15.680806
3.63     170      0    170   0    713031680  340   340      active  2015-01-20 12:44:19.540329  38'340   38:512    [3,0]  3           [3,0]   3               0'0         2015-01-20 12:44:15.681099  0'0              2015-01-20 12:44:15.681099
3.18     190      0    190   0    796917760  589   589      active  2015-01-20 12:44:19.539550  38'589   38:852    [3,0]  3           [3,0]   3               0'0         2015-01-20 12:44:15.675345  0'0              2015-01-20 12:44:15.675345
3.1b     200      0    200   0    838860800  734   734      active  2015-01-20 12:44:19.539514  38'734   38:1882   [3,0]  3           [3,0]   3               0'0         2015-01-20 12:44:15.675738  0'0              2015-01-20 12:44:15.675738
3.14     185      0    185   0    775946240  393   393      active  2015-01-20 12:44:19.539492  38'393   38:965    [3,0]  3           [3,0]   3               0'0         2015-01-20 12:44:15.675138  0'0              2015-01-20 12:44:15.675138
3.10     187      0    187   0    780140560  606   606      active  2015-01-20 12:44:19.545741  38'606   38:925    [1,3]  1           [1,3]   1               0'0         2015-01-20 12:44:15.678035  0'0              2015-01-20 12:44:15.678035
3.11     186      0    186   0    780140544  301   301      active  2015-01-20 12:44:20.838550  38'301   38:686    [0,2]  0           [0,2]   0               0'0         2015-01-20 12:44:15.676908  0'0              2015-01-20 12:44:15.676908
3.12     187      0    187   0    784334848  601   601      active  2015-01-20 12:44:19.499264  38'601   38:1228   [2,0]  2           [2,0]   2               0'0         2015

Re: [ceph-users] Improving Performance with more OSD's?

2014-12-30 Thread Eneko Lacunza

Hi,

On 29/12/14 15:12, Christian Balzer wrote:

3rd Node
  - Monitor only, for quorum
- Intel Nuc
- 8GB RAM
- CPU: Celeron N2820

Uh oh, a bit weak for a monitor. Where does the OS live (on this and the
other nodes)? The leveldb (/var/lib/ceph/..) of the monitors likes it fast,
SSDs preferably.


I have a small setup with such a node (only 4 GB RAM, another 2 good 
nodes for OSD and virtualization) - it works like a charm and CPU max is 
always under 5% in the graphs. It only peaks when backups are dumped to 
its 1TB disk using NFS.

I'd prefer to use the existing third node (the Intel Nuc), but its
expansion is limited to USB3 devices. Are there USB3 external drives
with decent performance stats?


I'd advise against it.
That node doing both monitor and OSDs is not going to end well.
My experience has led me not to trust USB disks for continuous
operation; I wouldn't do this either.


Just my cents
Eneko

--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943575997
  943493611
Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
www.binovo.es

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Block and NAS Services for Non Linux OS

2014-12-30 Thread Eneko Lacunza

Hi Steven,

Welcome to the list.

On 30/12/14 11:47, Steven Sim wrote:
This is my first posting and I apologize if the content or query is 
not appropriate.


My understanding of Ceph is that the block and NAS services are provided
through specialized (albeit open source) kernel modules for Linux.


What about the other OS e.g. Solaris, AIX, Windows, ESX ...

If the solution is to use a proxy, would using the MON servers (as 
iSCSI and NAS proxies) be okay?
Virtual machines see a QEMU IDE/SCSI disk; they don't know whether it's
on Ceph, NFS, local storage, LVM, ... so it works OK for any VM guest OS.

Currently on Proxmox, qemu-kvm is the Ceph (RBD) client, not the Linux
kernel.
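For example, a guest disk on RBD ends up as a plain -drive argument to
qemu-kvm, along these lines (pool/image names are made up; Proxmox builds
this command line for you):

# qemu-system-x86_64 ... -drive file=rbd:rbd/vm-100-disk-1:id=admin:conf=/etc/ceph/ceph.conf,if=virtio,format=raw,cache=writeback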


What about performance?


It depends a lot on the setup. Do you have something on your mind? :)

Cheers
Eneko

--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943575997
  943493611
Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
www.binovo.es

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Improving Performance with more OSD's?

2014-12-30 Thread Eneko Lacunza

Hi,

On 30/12/14 11:55, Lindsay Mathieson wrote:

On Tue, 30 Dec 2014 11:26:08 AM Eneko Lacunza wrote:

I have a small setup with such a node (only 4 GB RAM, another 2 good
nodes for OSD and virtualization) - it works like a charm and CPU max is
always under 5% in the graphs. It only peaks when backups are dumped to
its 1TB disk using NFS.

Yes, CPU has not been a problem for em at all, I even occasional run a windows
VM on the NUC.

Sounds like we have very similar setups - 2 good nodes that run full OSDs,
mons and VMs, and a third smaller node for quorum.

Do you have OSDs on your third node as well?
No, I have never had a VM running on it; there are only 6 VMs in this
cluster and the other 2 nodes have plenty of RAM/CPU for them. I might
try it if one of the good nodes goes down ;)

I'd advise against it.
That node doing both monitor and OSDs is not going to end well.

My experience has led me not to trust USB disks for continuous
operation, I wouldn't do this either.

Yeah, it doesn't sound like a good idea. Pity, the NUCs are so small and quiet.

Yes. But I think the CPU would become a problem as soon as we put 1-2 
OSDs on that NUC. Maybe with a Core i3 NUC... :)


Cheers
Eneko

--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943575997
  943493611
Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
www.binovo.es

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph PG Incomplete = Cluster unusable

2014-12-30 Thread Eneko Lacunza

Hi Christian,

Have you tried to migrate the disk from the old storage (pool) to the
new one?

I think it would show the same problem, but it'd be a much easier path
to recover than the POSIX copy.

How full is your storage?

Maybe you can customize the CRUSH map so that some OSDs are left in the
bad (default) pool and other OSDs are assigned to the new pool. I think
(I'm still learning Ceph) that this will give each pool different PGs on
different OSDs; maybe this way you can overcome the issue.
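If the old PGs still answer, copying an image between pools is a one-liner;
a sketch with made-up image names, assuming enough free space in the new pool:

# rbd cp oldpool/image1 newpool/image1
(or, going through a file: rbd export oldpool/image1 img.raw ; rbd import img.raw newpool/image1)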


Cheers
Eneko

On 30/12/14 12:17, Christian Eichelmann wrote:

Hi Nico and all others who answered,

After some more trying to somehow get the pgs in a working state (I've
tried force_create_pg, which was putting them in creating state. But
that was obviously not true, since after rebooting one of the containing
osd's it went back to incomplete), I decided to save what can be saved.

I've created a new pool, created a new image there, mapped the old image
from the old pool and the new image from the new pool to a machine, to
copy data on posix level.

Unfortunately, formatting the image from the new pool hangs after some
time. So it seems that the new pool is suffering from the same problem
as the old pool. Which is totally not understandable to me.

Right now, it seems like Ceph is giving me no options to either save
some of the still intact rbd volumes, or to create a new pool along the
old one to at least enable our clients to send data to ceph again.

To tell the truth, I guess that will result in the end of our ceph
project (already running for 9 months).

Regards,
Christian

Am 29.12.2014 15:59, schrieb Nico Schottelius:

Hey Christian,

Christian Eichelmann [Mon, Dec 29, 2014 at 10:56:59AM +0100]:

[incomplete PG / RBD hanging, osd lost also not helping]

that is very interesting to hear, because we had a similar situation
with ceph 0.80.7 and had to re-create a pool, after I deleted 3 pg
directories to allow OSDs to start after the disk filled up completely.

So I am sorry not to be able to give you a good hint, but I am very
interested in seeing your problem solved, as it is a show stopper for
us, too. (*)

Cheers,

Nico

(*) We migrated from sheepdog to gluster to ceph and so far sheepdog
 seems to run much smoother. The first one is however not supported
 by opennebula directly, the second one not flexible enough to host
 our heterogeneous infrastructure (mixed disk sizes/amounts) - so we
 are using ceph at the moment.






--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943575997
  943493611
Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
www.binovo.es

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph PG Incomplete = Cluster unusable

2014-12-30 Thread Eneko Lacunza

Hi Christian,

Do the new pool's PGs also show as incomplete?

Did you notice anything remarkable in the Ceph logs while formatting the new
pool's image?


On 30/12/14 12:31, Christian Eichelmann wrote:

Hi Eneko,

I was trying an rbd cp before, but that was hanging as well. But I
couldn't find out whether the source image or the destination image was
causing the hang. That's why I decided to try a POSIX copy.

Our cluster is still nearly empty (12TB / 867TB). But as far as I
understood (if not, somebody please correct me), placement groups are
generally not shared between pools at all.

Regards,
Christian

Am 30.12.2014 12:23, schrieb Eneko Lacunza:

Hi Christian,

Have you tried to migrate the disk from the old storage (pool) to the
new one?

I think it would show the same problem, but it'd be a much easier path
to recover than the POSIX copy.

How full is your storage?

Maybe you can customize the CRUSH map so that some OSDs are left in the
bad (default) pool and other OSDs are assigned to the new pool. I think
(I'm still learning Ceph) that this will give each pool different PGs on
different OSDs; maybe this way you can overcome the issue.

Cheers
Eneko

On 30/12/14 12:17, Christian Eichelmann wrote:

Hi Nico and all others who answered,

After some more trying to somehow get the pgs in a working state (I've
tried force_create_pg, which was putting them in creating state. But
that was obviously not true, since after rebooting one of the containing
osd's it went back to incomplete), I decided to save what can be saved.

I've created a new pool, created a new image there, mapped the old image
from the old pool and the new image from the new pool to a machine, to
copy data on posix level.

Unfortunately, formatting the image from the new pool hangs after some
time. So it seems that the new pool is suffering from the same problem
as the old pool. Which is totally not understandable to me.

Right now, it seems like Ceph is giving me no options to either save
some of the still intact rbd volumes, or to create a new pool along the
old one to at least enable our clients to send data to ceph again.

To tell the truth, I guess that will result in the end of our ceph
project (already running for 9 months).

Regards,
Christian

Am 29.12.2014 15:59, schrieb Nico Schottelius:

Hey Christian,

Christian Eichelmann [Mon, Dec 29, 2014 at 10:56:59AM +0100]:

[incomplete PG / RBD hanging, osd lost also not helping]

that is very interesting to hear, because we had a similar situation
with ceph 0.80.7 and had to re-create a pool, after I deleted 3 pg
directories to allow OSDs to start after the disk filled up completely.

So I am sorry not to be able to give you a good hint, but I am very
interested in seeing your problem solved, as it is a show stopper for
us, too. (*)

Cheers,

Nico

(*) We migrated from sheepdog to gluster to ceph and so far sheepdog
  seems to run much smoother. The first one is however not supported
  by opennebula directly, the second one not flexible enough to host
  our heterogeneous infrastructure (mixed disk sizes/amounts) - so we
  are using ceph at the moment.








--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943575997
  943493611
Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
www.binovo.es

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Block and NAS Services for Non Linux OS

2014-12-30 Thread Eneko Lacunza

Hi Steven,

On 30/12/14 13:26, Steven Sim wrote:


You mentioned that machines see a QEMU IDE/SCSI disk; they don't know
whether it's on Ceph, NFS, local storage, LVM, ... so it works OK for any VM
guest OS.

But what if I want the Ceph cluster to serve a whole range of clients
in the data center, ranging from ESXi, Microsoft hypervisors, Solaris
(unvirtualized), AIX (unvirtualized), etc.?


Sorry, my mistake, I thought the message was on Proxmox VE list. :-)


In particular, I'm being asked to create a NAS and iSCSI block storage
farm able to serve not just Linux but a range of operating systems, some
virtualized, some not...


I love the distributed nature of Ceph, but using proxy nodes (or
heads) sort of goes against the distributed concept...
For virtualized guests, using a virtualization platform that supports
Ceph/RBD will do the trick.

I'm afraid you'll need proxy nodes for the rest, as pointed out by Nick
with his setup for VMware.
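As a very rough sketch of what such a proxy head can look like (names are
invented, and you would want to think about HA and caching before doing this
for real): map the image with the kernel RBD client and export the block
device over iSCSI with LIO/targetcli:

# rbd map rbd/lun0
# targetcli /backstores/block create name=lun0 dev=/dev/rbd0
# targetcli /iscsi create iqn.2015-01.com.example:ceph-proxy
# targetcli /iscsi/iqn.2015-01.com.example:ceph-proxy/tpg1/luns create /backstores/block/lun0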


Cheers
Eneko

--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943575997
  943493611
Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
www.binovo.es

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RESOLVED Re: Cluster with pgs in active (unclean) status

2014-12-16 Thread Eneko Lacunza

Hi Gregory,

Sorry for the delay getting back.

There was no activity at all on those 3 pools. Activity on the fourth 
pool was under 1 Mbps of writes.


I think I waited several hours, but I can't recall exactly; at least one
hour for sure.


Thanks
Eneko

On 11/12/14 19:32, Gregory Farnum wrote:

Was there any activity against your cluster when you reduced the size
from 3 to 2? I think maybe it was just taking time to percolate
through the system if nothing else was going on. When you reduced them
to size 1 then data needed to be deleted so everything woke up and
started processing.
-Greg

On Wed, Dec 10, 2014 at 5:27 AM, Eneko Lacunza elacu...@binovo.es wrote:

Hi all,

I fixed the issue with the following commands:
# ceph osd pool set data size 1
(wait some seconds for clean+active state of +64pgs)
# ceph osd pool set data size 2
# ceph osd pool set metadata size 1
(wait some seconds for clean+active state of +64pgs)
# ceph osd pool set metadata size 2
# ceph osd pool set rbd size 1
(wait some seconds for clean+active state of +64pgs)
# ceph osd pool set rbd size 2

This now gives me:
# ceph status
 cluster 3e91b908-2af3-4288-98a5-dbb77056ecc7
  health HEALTH_OK
  monmap e3: 3 mons at
{0=10.0.3.3:6789/0,1=10.0.3.1:6789/0,2=10.0.3.2:6789/0}, election epoch 32,
quorum 0,1,2 1,2,0
  osdmap e275: 2 osds: 2 up, 2 in
   pgmap v395557: 256 pgs, 4 pools, 194 GB data, 49820 objects
 388 GB used, 116 GB / 505 GB avail
  256 active+clean

I'm still curious whether this can be fixed without this trick?

Cheers
Eneko


On 10/12/14 13:14, Eneko Lacunza wrote:

Hi all,

I have a small ceph cluster with just 2 OSDs, latest firefly.

Default data, metadata and rbd pools were created with size=3 and
min_size=1
An additional pool rbd2 was created with size=2 and min_size=1

This would give me a warning status, saying that 64 pgs were active+clean
and 192 active+degraded. (there are 64 pg per pool).

I realized it was due to the size=3 in the three pools, so I changed that
value to 2:
# ceph osd pool set data size 2
# ceph osd pool set metadata size 2
# ceph osd pool set rbd size 2

Those 3 pools are empty. After those commands status would report 64 pgs
active+clean, and 192 pgs active, with a warning saying 192 pgs were
unclean.

I have created a rbd block with:
rbd create -p rbd --image test --size 1024

And now the status is:
# ceph status
 cluster 3e91b908-2af3-4288-98a5-dbb77056ecc7
  health HEALTH_WARN 192 pgs stuck unclean; recovery 2/99640 objects
degraded (0.002%)
  monmap e3: 3 mons at
{0=10.0.3.3:6789/0,1=10.0.3.1:6789/0,2=10.0.3.2:6789/0}, election epoch 32,
quorum 0,1,2 1,2,0
  osdmap e263: 2 osds: 2 up, 2 in
   pgmap v393763: 256 pgs, 4 pools, 194 GB data, 49820 objects
 388 GB used, 116 GB / 505 GB avail
 2/99640 objects degraded (0.002%)
  192 active
   64 active+clean

Looking to an unclean non-empty pg:
# ceph pg 2.14 query
{ state: active,
   epoch: 263,
   up: [
 0,
 1],
   acting: [
 0,
 1],
   actingbackfill: [
 0,
 1],
   info: { pgid: 2.14,
   last_update: 263'1,
   last_complete: 263'1,
   log_tail: 0'0,
   last_user_version: 1,
   last_backfill: MAX,
   purged_snaps: [],
   history: { epoch_created: 1,
   last_epoch_started: 136,
   last_epoch_clean: 136,
   last_epoch_split: 0,
   same_up_since: 135,
   same_interval_since: 135,
   same_primary_since: 11,
   last_scrub: 0'0,
   last_scrub_stamp: 2014-11-26 12:23:57.023493,
   last_deep_scrub: 0'0,
   last_deep_scrub_stamp: 2014-11-26 12:23:57.023493,
   last_clean_scrub_stamp: 0.00},
   stats: { version: 263'1,
   reported_seq: 306,
   reported_epoch: 263,
   state: active,
   last_fresh: 2014-12-10 12:53:37.766465,
   last_change: 2014-12-10 10:32:24.189000,
   last_active: 2014-12-10 12:53:37.766465,
   last_clean: 0.00,
   last_became_active: 0.00,
   last_unstale: 2014-12-10 12:53:37.766465,
   mapping_epoch: 128,
   log_start: 0'0,
   ondisk_log_start: 0'0,
   created: 1,
   last_epoch_clean: 136,
   parent: 0.0,
   parent_split_bits: 0,
   last_scrub: 0'0,
   last_scrub_stamp: 2014-11-26 12:23:57.023493,
   last_deep_scrub: 0'0,
   last_deep_scrub_stamp: 2014-11-26 12:23:57.023493,
   last_clean_scrub_stamp: 0.00,
   log_size: 1,
   ondisk_log_size: 1,
   stats_invalid: 0,
   stat_sum: { num_bytes: 112,
   num_objects: 1,
   num_object_clones: 0,
   num_object_copies: 2,
   num_objects_missing_on_primary: 0,
   num_objects_degraded: 1,
   num_objects_unfound: 0

[ceph-users] Cluster with pgs in active (unclean) status

2014-12-10 Thread Eneko Lacunza

Hi all,

I have a small ceph cluster with just 2 OSDs, latest firefly.

Default data, metadata and rbd pools were created with size=3 and min_size=1
An additional pool rbd2 was created with size=2 and min_size=1

This would give me a warning status, saying that 64 pgs were 
active+clean and 192 active+degraded. (there are 64 pg per pool).


I realized it was due to the size=3 in the three pools, so I changed 
that value to 2:

# ceph osd pool set data size 2
# ceph osd pool set metadata size 2
# ceph osd pool set rbd size 2

Those 3 pools are empty. After those commands status would report 64 pgs 
active+clean, and 192 pgs active, with a warning saying 192 pgs were 
unclean.


I have created a rbd block with:
rbd create -p rbd --image test --size 1024

And now the status is:
# ceph status
cluster 3e91b908-2af3-4288-98a5-dbb77056ecc7
 health HEALTH_WARN 192 pgs stuck unclean; recovery 2/99640 objects 
degraded (0.002%)
 monmap e3: 3 mons at 
{0=10.0.3.3:6789/0,1=10.0.3.1:6789/0,2=10.0.3.2:6789/0}, election epoch 
32, quorum 0,1,2 1,2,0

 osdmap e263: 2 osds: 2 up, 2 in
  pgmap v393763: 256 pgs, 4 pools, 194 GB data, 49820 objects
388 GB used, 116 GB / 505 GB avail
2/99640 objects degraded (0.002%)
 192 active
  64 active+clean

Looking to an unclean non-empty pg:
# ceph pg 2.14 query
{ state: active,
  epoch: 263,
  up: [
0,
1],
  acting: [
0,
1],
  actingbackfill: [
0,
1],
  info: { pgid: 2.14,
  last_update: 263'1,
  last_complete: 263'1,
  log_tail: 0'0,
  last_user_version: 1,
  last_backfill: MAX,
  purged_snaps: [],
  history: { epoch_created: 1,
  last_epoch_started: 136,
  last_epoch_clean: 136,
  last_epoch_split: 0,
  same_up_since: 135,
  same_interval_since: 135,
  same_primary_since: 11,
  last_scrub: 0'0,
  last_scrub_stamp: 2014-11-26 12:23:57.023493,
  last_deep_scrub: 0'0,
  last_deep_scrub_stamp: 2014-11-26 12:23:57.023493,
  last_clean_scrub_stamp: 0.00},
  stats: { version: 263'1,
  reported_seq: 306,
  reported_epoch: 263,
  state: active,
  last_fresh: 2014-12-10 12:53:37.766465,
  last_change: 2014-12-10 10:32:24.189000,
  last_active: 2014-12-10 12:53:37.766465,
  last_clean: 0.00,
  last_became_active: 0.00,
  last_unstale: 2014-12-10 12:53:37.766465,
  mapping_epoch: 128,
  log_start: 0'0,
  ondisk_log_start: 0'0,
  created: 1,
  last_epoch_clean: 136,
  parent: 0.0,
  parent_split_bits: 0,
  last_scrub: 0'0,
  last_scrub_stamp: 2014-11-26 12:23:57.023493,
  last_deep_scrub: 0'0,
  last_deep_scrub_stamp: 2014-11-26 12:23:57.023493,
  last_clean_scrub_stamp: 0.00,
  log_size: 1,
  ondisk_log_size: 1,
  stats_invalid: 0,
  stat_sum: { num_bytes: 112,
  num_objects: 1,
  num_object_clones: 0,
  num_object_copies: 2,
  num_objects_missing_on_primary: 0,
  num_objects_degraded: 1,
  num_objects_unfound: 0,
  num_objects_dirty: 1,
  num_whiteouts: 0,
  num_read: 0,
  num_read_kb: 0,
  num_write: 1,
  num_write_kb: 1,
  num_scrub_errors: 0,
  num_shallow_scrub_errors: 0,
  num_deep_scrub_errors: 0,
  num_objects_recovered: 0,
  num_bytes_recovered: 0,
  num_keys_recovered: 0,
  num_objects_omap: 0,
  num_objects_hit_set_archive: 0},
  stat_cat_sum: {},
  up: [
0,
1],
  acting: [
0,
1],
  up_primary: 0,
  acting_primary: 0},
  empty: 0,
  dne: 0,
  incomplete: 0,
  last_epoch_started: 136,
  hit_set_history: { current_last_update: 0'0,
  current_last_stamp: 0.00,
  current_info: { begin: 0.00,
  end: 0.00,
  version: 0'0},
  history: []}},
  peer_info: [
{ peer: 1,
  pgid: 2.14,
  last_update: 263'1,
  last_complete: 263'1,
  log_tail: 0'0,
  last_user_version: 0,
  last_backfill: MAX,
  purged_snaps: [],
  history: { epoch_created: 1,
  last_epoch_started: 136,
  last_epoch_clean: 136,
  last_epoch_split: 0,
  same_up_since: 0,
  same_interval_since: 0,
  same_primary_since: 0,
  last_scrub: 0'0,
  last_scrub_stamp: 2014-11-26 12:23:57.023493,
  last_deep_scrub: 0'0,
  last_deep_scrub_stamp: 2014-11-26 12:23:57.023493,
 

[ceph-users] RESOLVED Re: Cluster with pgs in active (unclean) status

2014-12-10 Thread Eneko Lacunza

Hi all,

I fixed the issue with the following commands:
# ceph osd pool set data size 1
(wait some seconds for clean+active state of +64pgs)
# ceph osd pool set data size 2
# ceph osd pool set metadata size 1
(wait some seconds for clean+active state of +64pgs)
# ceph osd pool set metadata size 2
# ceph osd pool set rbd size 1
(wait some seconds for clean+active state of +64pgs)
# ceph osd pool set rbd size 2

This now gives me:
# ceph status
cluster 3e91b908-2af3-4288-98a5-dbb77056ecc7
 health HEALTH_OK
 monmap e3: 3 mons at 
{0=10.0.3.3:6789/0,1=10.0.3.1:6789/0,2=10.0.3.2:6789/0}, election epoch 
32, quorum 0,1,2 1,2,0

 osdmap e275: 2 osds: 2 up, 2 in
  pgmap v395557: 256 pgs, 4 pools, 194 GB data, 49820 objects
388 GB used, 116 GB / 505 GB avail
 256 active+clean

I'm still curious whether this can be fixed without this trick?

Cheers
Eneko


On 10/12/14 13:14, Eneko Lacunza wrote:

Hi all,

I have a small ceph cluster with just 2 OSDs, latest firefly.

Default data, metadata and rbd pools were created with size=3 and 
min_size=1

An additional pool rbd2 was created with size=2 and min_size=1

This would give me a warning status, saying that 64 pgs were 
active+clean and 192 active+degraded. (there are 64 pg per pool).


I realized it was due to the size=3 in the three pools, so I changed 
that value to 2:

# ceph osd pool set data size 2
# ceph osd pool set metadata size 2
# ceph osd pool set rbd size 2

Those 3 pools are empty. After those commands status would report 64 
pgs active+clean, and 192 pgs active, with a warning saying 192 pgs 
were unclean.


I have created a rbd block with:
rbd create -p rbd --image test --size 1024

And now the status is:
# ceph status
cluster 3e91b908-2af3-4288-98a5-dbb77056ecc7
 health HEALTH_WARN 192 pgs stuck unclean; recovery 2/99640 
objects degraded (0.002%)
 monmap e3: 3 mons at 
{0=10.0.3.3:6789/0,1=10.0.3.1:6789/0,2=10.0.3.2:6789/0}, election 
epoch 32, quorum 0,1,2 1,2,0

 osdmap e263: 2 osds: 2 up, 2 in
  pgmap v393763: 256 pgs, 4 pools, 194 GB data, 49820 objects
388 GB used, 116 GB / 505 GB avail
2/99640 objects degraded (0.002%)
 192 active
  64 active+clean

Looking to an unclean non-empty pg:
# ceph pg 2.14 query
{ state: active,
  epoch: 263,
  up: [
0,
1],
  acting: [
0,
1],
  actingbackfill: [
0,
1],
  info: { pgid: 2.14,
  last_update: 263'1,
  last_complete: 263'1,
  log_tail: 0'0,
  last_user_version: 1,
  last_backfill: MAX,
  purged_snaps: [],
  history: { epoch_created: 1,
  last_epoch_started: 136,
  last_epoch_clean: 136,
  last_epoch_split: 0,
  same_up_since: 135,
  same_interval_since: 135,
  same_primary_since: 11,
  last_scrub: 0'0,
  last_scrub_stamp: 2014-11-26 12:23:57.023493,
  last_deep_scrub: 0'0,
  last_deep_scrub_stamp: 2014-11-26 12:23:57.023493,
  last_clean_scrub_stamp: 0.00},
  stats: { version: 263'1,
  reported_seq: 306,
  reported_epoch: 263,
  state: active,
  last_fresh: 2014-12-10 12:53:37.766465,
  last_change: 2014-12-10 10:32:24.189000,
  last_active: 2014-12-10 12:53:37.766465,
  last_clean: 0.00,
  last_became_active: 0.00,
  last_unstale: 2014-12-10 12:53:37.766465,
  mapping_epoch: 128,
  log_start: 0'0,
  ondisk_log_start: 0'0,
  created: 1,
  last_epoch_clean: 136,
  parent: 0.0,
  parent_split_bits: 0,
  last_scrub: 0'0,
  last_scrub_stamp: 2014-11-26 12:23:57.023493,
  last_deep_scrub: 0'0,
  last_deep_scrub_stamp: 2014-11-26 12:23:57.023493,
  last_clean_scrub_stamp: 0.00,
  log_size: 1,
  ondisk_log_size: 1,
  stats_invalid: 0,
  stat_sum: { num_bytes: 112,
  num_objects: 1,
  num_object_clones: 0,
  num_object_copies: 2,
  num_objects_missing_on_primary: 0,
  num_objects_degraded: 1,
  num_objects_unfound: 0,
  num_objects_dirty: 1,
  num_whiteouts: 0,
  num_read: 0,
  num_read_kb: 0,
  num_write: 1,
  num_write_kb: 1,
  num_scrub_errors: 0,
  num_shallow_scrub_errors: 0,
  num_deep_scrub_errors: 0,
  num_objects_recovered: 0,
  num_bytes_recovered: 0,
  num_keys_recovered: 0,
  num_objects_omap: 0,
  num_objects_hit_set_archive: 0},
  stat_cat_sum: {},
  up: [
0,
1],
  acting: [
0,
1],
  up_primary: 0,
  acting_primary: 0},
  empty: 0,
  dne: 0

[ceph-users] Suitable SSDs for journal

2014-12-04 Thread Eneko Lacunza

Hi all,

Does anyone know about a list of good and bad SSD disks for OSD journals?

I was pointed to
http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/

But I was looking for something more complete.

For example, I have a Samsung 840 Pro that gives me even worse
performance than a Crucial m550... I even thought it was dying (but that
doesn't seem to be the case).
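For anyone who wants to compare numbers, the test from that blog post boils
down to a queue-depth-1 O_DSYNC write; something like the following (it
writes to the device, so only run it on an empty disk; /dev/sdX is a
placeholder):

# fio --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting --name=journal-test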


Maybe creating a community-contributed list could be a good idea?

Regards
Eneko

--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943575997
  943493611
Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
www.binovo.es

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Suitable SSDs for journal

2014-12-04 Thread Eneko Lacunza

Thanks, will look back in the list archive.

On 04/12/14 15:47, Nick Fisk wrote:

Hi Eneko,

There have been various discussions on the list previously as to the best SSD
for journal use. All of them have pretty much come to the conclusion that the
Intel S3700 models are the best suited and in fact work out the cheapest in
terms of write durability.

Nick

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Eneko 
Lacunza
Sent: 04 December 2014 14:35
To: Ceph Users
Subject: [ceph-users] Suitable SSDs for journal

Hi all,

Does anyone know about a list of good and bad SSD disks for OSD journals?

I was pointed to
http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/

But I was looking for something more complete?

For example, I have a Samsung 840 Pro that gives me even worse performance than 
a Crucial m550... I even thought it was dying (but doesn't seem this is the 
case).

Maybe creating a community-contributed list could be a good idea?

Regards
Eneko

--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943575997
943493611
Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa) 
www.binovo.es

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com









--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943575997
  943493611
Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
www.binovo.es

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com