[ceph-users] Better way to use osd's of different size

2015-01-14 Thread Межов Игорь Александрович
Hi! We have a small production Ceph cluster based on the Firefly release. It was built from hardware we already had on site, so it is not shiny new, but it works quite well. It was started in 2014.09 as a proof of concept from 4 hosts with 3 x 1TB OSDs each: 1U dual socket Intel 54XX
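A minimal sketch of the usual way to mix OSDs of different sizes in one pool: give each OSD a CRUSH weight proportional to its raw size (conventionally 1.0 per TB). The OSD ids and sizes below are hypothetical, not taken from the post.

    #!/usr/bin/env python
    # Set CRUSH weights proportional to disk size (1.0 per TB).
    # Note: 'ceph osd crush reweight' changes the CRUSH weight (data share),
    # not the 0.0-1.0 'ceph osd reweight' override.
    import subprocess

    osd_sizes_tb = {0: 1.0, 1: 1.0, 2: 2.0, 3: 4.0}   # hypothetical: osd id -> raw size in TB

    for osd_id, size_tb in osd_sizes_tb.items():
        subprocess.check_call(
            ["ceph", "osd", "crush", "reweight", "osd.%d" % osd_id, str(size_tb)])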

[ceph-users] Re: Better way to use osd's of different size

2015-01-16 Thread Межов Игорь Александрович
me...@yuterra.ru me...@mail.ru +7 915 855 3139 +7 4742 762 909 From: Udo Lembke ulem...@polarzone.de Sent: 15 January 2015 10:41 To: Межов Игорь Александрович Cc: ceph-users@lists.ceph.com Ceph Users Subject: Re: [ceph-users] Better way to use osd's

[ceph-users] RBD volume to PG mapping

2015-04-20 Thread Межов Игорь Александрович
Hi! In case of a scrub error we get some PGs in an inconsistent state. What is the best method to check which RBD volumes are mapped into an inconsistent PG? Right now we use a long and cumbersome way to do this: - from 'ceph health detail' we take the PG numbers in inconsistent state - we check logs for
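A rough sketch of the reverse lookup asked about above, under the assumption of a Firefly-era filestore OSD with the default /var/lib/ceph/osd/ceph-<id> layout and format-1 RBD images (object names like rb.0.xxxx.yyyy.<num>): list the object files that live in the inconsistent PG on its primary OSD, extract their rb.0.* prefixes, and match them against the block_name_prefix of each image. PGID, OSD_ID and POOL are placeholders.

    #!/usr/bin/env python
    # Map one (inconsistent) PG back to the RBD images that have objects in it.
    # Assumes filestore layout and format-1 image object names; run on the OSD host.
    import os
    import re
    import subprocess

    PGID, OSD_ID, POOL = "4.2af", 12, "rbd"          # hypothetical values

    pg_dir = "/var/lib/ceph/osd/ceph-%d/current/%s_head" % (OSD_ID, PGID)
    prefixes = set()
    for root, _, files in os.walk(pg_dir):
        for name in files:
            m = re.search(r"rb\.0\.[0-9a-f]+\.[0-9a-f]+", name)
            if m:
                prefixes.add(m.group(0))

    for image in subprocess.check_output(["rbd", "-p", POOL, "ls"]).decode().split():
        info = subprocess.check_output(["rbd", "-p", POOL, "info", image]).decode()
        if any(p in info for p in prefixes):
            print("PG %s contains objects of image %s" % (PGID, image))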

[ceph-users] Re: Turning on rbd cache safely

2015-05-05 Thread Межов Игорь Александрович
From: Alexandre DERUMIER aderum...@odiso.com Sent: 5 May 2015 14:28 To: Межов Игорь Александрович Cc: ceph-users Subject: Re: [ceph-users] Turning on rbd cache safely Hi, rbd_cache is a client-only config option, so there is no need to restart the OSDs. If you set cache=writeback in libvirt

[ceph-users] Re: Turning on rbd cache safely

2015-05-05 Thread Межов Игорь Александрович
Hi! Sorry, I've found the reason for these strange results - rbd cache was enabled in the local ceph.conf on the client node I used for testing. I removed it from the config and got more sane results. On all tests direct=1 iodepth=32 ioengine=aio fio=seqwr bs=4k sync=0 cache=wb - iops=31700,bw=126Mb/s, 75%

[ceph-users] Re: Re: Turning on rbd cache safely

2015-05-05 Thread Межов Игорь Александрович
Hi! Which ceph.conf do you talk about? The one on the host server (on which the VM is running)? Yes, that ceph.conf on the client host, which is not part of a Ceph cluster (no OSD, no MON) and is used solely to run VMs with an RBD backend. Interesting, can you explain this please? I think that libvirt

[ceph-users] Turning on rbd cache safely

2015-05-05 Thread Межов Игорь Александрович
Hi! After examining our running OSD configuration through an admin socket we suddenly noticed that the rbd_cache parameter is set to false. Until that moment I had supposed that rbd cache is an entirely client-side feature and that it is enabled with the cache=writeback parameter in the libvirt VM xml definition.
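A minimal sketch of the admin-socket check mentioned above, assuming the default /var/run/ceph/*.asok socket paths: query every local daemon for its rbd_cache value. As the replies in this thread note, the value reported by an OSD socket is irrelevant for client IO; rbd_cache only matters inside the client (librbd/qemu) process.

    #!/usr/bin/env python
    # Ask every local Ceph admin socket what it thinks rbd_cache is set to.
    import glob
    import json
    import subprocess

    for asok in sorted(glob.glob("/var/run/ceph/*.asok")):
        out = subprocess.check_output(
            ["ceph", "--admin-daemon", asok, "config", "get", "rbd_cache"])
        print("%s: rbd_cache = %s" % (asok, json.loads(out.decode()).get("rbd_cache")))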

[ceph-users] Migrating CEPH to different VLAN and IP segment

2015-04-06 Thread Межов Игорь Александрович
Hi! We have a small production Ceph cluster based on the Firefly release. Right now the client and cluster networks share the same IP range and VLAN. The same network is also used for the OpenNebula instance that we use to manage our cloud. This network segment was created some time ago and has grown with

[ceph-users] How to improve latencies and per-VM performance and latencies

2015-05-19 Thread Межов Игорь Александрович
Hi! Seeking performance improvements in our cluster (Firefly 0.80.7 on Wheezy, 5 nodes, 58 OSDs), I wrote a small Python script that walks through the Ceph nodes and issues the 'perf dump' command on OSD admin sockets. It extracts *_latency tuples, calculates min/max/avg, and compares OSD perf metrics with
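A stripped-down sketch along the lines described above (not the original script): dump perf counters from the local OSD admin sockets and turn every *_latency counter into an average in milliseconds (sum / avgcount), printing the worst ones per OSD. It assumes the default admin socket path and the Firefly-era counter layout; the real script also walks the other nodes over ssh.

    #!/usr/bin/env python
    # Summarize *_latency perf counters from local OSD admin sockets.
    import glob
    import json
    import subprocess

    def collect(prefix, node, out):
        # Recursively collect {'avgcount': N, 'sum': S} latency counters.
        for key, val in node.items():
            name = prefix + "." + key if prefix else key
            if isinstance(val, dict):
                if key.endswith("_latency") and "avgcount" in val:
                    if val["avgcount"]:
                        out[name] = 1000.0 * val["sum"] / val["avgcount"]
                else:
                    collect(name, val, out)

    for asok in sorted(glob.glob("/var/run/ceph/ceph-osd.*.asok")):
        dump = json.loads(subprocess.check_output(
            ["ceph", "--admin-daemon", asok, "perf", "dump"]).decode())
        lat = {}
        collect("", dump, lat)
        for name, ms in sorted(lat.items(), key=lambda kv: -kv[1])[:5]:
            print("%s %-40s %8.2f ms" % (asok, name, ms))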

[ceph-users] Re: How to improve latencies and per-VM performance and latencies

2015-05-20 Thread Межов Игорь Александрович
slightly. PS: This is my first Python script, so suggestions and improvements are welcome ;) Megov Igor CIO, Yuterra From: Michael Kuriger mk7...@yp.com Sent: 19 May 2015 18:51 To: Межов Игорь Александрович Subject: Re: [ceph-users] How to improve

[ceph-users] Re: How to improve latencies and per-VM performance

2015-05-21 Thread Межов Игорь Александрович
To: Межов Игорь Александрович Cc: ceph-users Subject: Re: [ceph-users] How to improve latencies and per-VM performance and latencies Hi, Just to add, there's also a collectd plugin at https://github.com/rochaporto/collectd-ceph. Things to check when you have slow read performance are: *) how

[ceph-users] Re: apply/commit latency

2015-06-04 Thread Межов Игорь Александрович
Hi! My deployments have seen many different versions of Ceph. Pre 0.80.7, I've seen those numbers being pretty high. After upgrading to 0.80.7, all of a sudden, the commit latency of all OSDs dropped to 0-1 ms, and apply latency remains pretty low most of the time. We now use Ceph 0.80.7-1~bpo70+1
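A quick way to watch the numbers discussed above cluster-wide is 'ceph osd perf', which reports per-OSD commit/apply latency in milliseconds. A small sketch that prints the worst OSDs first; the JSON field names are the ones used in the firefly/hammer era and may differ in later releases.

    #!/usr/bin/env python
    # Print per-OSD commit/apply latency, worst apply latency first.
    import json
    import subprocess

    perf = json.loads(subprocess.check_output(
        ["ceph", "osd", "perf", "--format", "json"]).decode())
    for item in sorted(perf["osd_perf_infos"],
                       key=lambda i: -i["perf_stats"]["apply_latency_ms"]):
        print("osd.%-3d commit %4d ms  apply %4d ms" % (
            item["id"],
            item["perf_stats"]["commit_latency_ms"],
            item["perf_stats"]["apply_latency_ms"]))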

[ceph-users] Re: Slightly OT question - LSI SAS 2308 / 9207-8i performance

2015-06-16 Thread Межов Игорь Александрович
Hi! As far as I know: C60X = SAS2 = 3Gbps, LSI 2308 = 6Gbps, onboard SATA3 = 6Gbps (usually only 2 ports), onboard SATA2 = 3Gbps (4-6 ports). We use Intel S2600 motherboards and R2224GZ4 platforms in our Hammer evaluation instance. The C60X is connected to a 4-drive 2.5" bay: 2 small SAS drives for the OS, 2xS3700

[ceph-users] Re: Blocked requests/ops?

2015-05-27 Thread Межов Игорь Александрович
Hi! Does this make sense to you? Any other thoughts? Yes, we use: osd max backfills = 2, osd recovery max active = 2 on a 5-node, 58-OSD cluster. The duration of a full recovery of one OSD is ~4 hours. Such tuning does not harm client IO - we observe only 20-30% performance degradation. Megov Igor
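A minimal sketch of applying those two throttling values to running OSDs without a restart, via injectargs (persist them in the [osd] section of ceph.conf as well so they survive restarts):

    #!/usr/bin/env python
    # Push backfill/recovery throttling to all OSDs at runtime.
    import subprocess

    subprocess.check_call([
        "ceph", "tell", "osd.*", "injectargs",
        "--osd_max_backfills 2 --osd_recovery_max_active 2"])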

[ceph-users] Re: Re: tcmalloc use a lot of CPU

2015-08-18 Thread Межов Игорь Александрович
From: Luis Periquito periqu...@gmail.com Sent: 17 August 2015 17:15 To: Межов Игорь Александрович Cc: YeYin; ceph-users Subject: Re: [ceph-users] Re: tcmalloc use a lot of CPU How big are those OPS? Are they random? How many nodes? How many SSDs/OSDs? What are you using to make

[ceph-users] Re: tcmalloc use a lot of CPU

2015-08-17 Thread Межов Игорь Александрович
Hi! We also observe the same behavior on our test Hammer install, and I wrote about it some time ago: http://permalink.gmane.org/gmane.comp.file-systems.ceph.user/22609 Jan Schermer gave us some suggestions in that thread, but we

[ceph-users] Re: Re: CEPH cache layer. Very slow

2015-08-17 Thread Межов Игорь Александрович
Hi! 6 nodes, 70 OSDs (1/2/4 TB SATA drives). Ceph is used as an RBD backstore for VM images (~100 VMs). Megov Igor CIO, Yuterra From: Ben Hines bhi...@gmail.com Sent: 14 August 2015 21:01 To: Межов Игорь Александрович Cc: Voloshanenko Igor; ceph

[ceph-users] Re: Question

2015-08-18 Thread Межов Игорь Александрович
Hi! You can run MONs on the same hosts, though it is not recommended. The MON daemon itself is not resource hungry - 1-2 cores and 2-4 GB RAM are enough in most small installs. But there are some pitfalls: - MONs use LevelDB as a backing store and make heavy use of direct writes to ensure DB consistency. So,

[ceph-users] Re: Rename Ceph cluster

2015-08-19 Thread Межов Игорь Александрович
Hi! I think that renaming a cluster is not just a matter of renaming the config file. We tried to change the name of a test Hammer cluster created with ceph-deploy and ran into some issues. In a default install, the naming of many parts is derived from the cluster name. For example, cephx keys are stored not in

[ceph-users] Re: inconsistent pgs

2015-08-11 Thread Межов Игорь Александрович
Hi! Glad to hear your Ceph is working again! ;) BTW, it is new knowledge: how Ceph behaves with bad RAM. Do you have memory ECC errors in the logs? Linux has the EDAC module (I think it is enabled by default in Debian) which reports any machine errors happening - machine check exceptions, memory

[ceph-users] Ceph allocator and performance

2015-08-11 Thread Межов Игорь Александрович
Hi! We got some strange performance results when running a random read fio test on our test Hammer cluster. When we run fio-rbd (4k, randread, 8 jobs, QD=32, 500GB rbd image) for the first time (the page cache is cold/empty) we get ~12 kiops sustained performance. It is a quite reasonable value, as

[ceph-users] Different filesystems on OSD hosts at the same cluster

2015-08-07 Thread Межов Игорь Александрович
Hi! We are doing some performance tests on our small Hammer install: - Debian Jessie; - Ceph Hammer 0.94.2 self-built from sources (tcmalloc); - 1xE5-2670 + 128GB RAM; - 2 nodes shared with MONs, the system and MON DB are on a separate SAS mirror; - 16 OSDs on each node, SAS 10k; - 2 Intel DC S3700 200GB

[ceph-users] Re: inconsistent pgs

2015-08-07 Thread Межов Игорь Александрович
Hi! Do you have any disk errors in the dmesg output? In our practice, every time a deep scrub found an inconsistent PG, we also found a disk error that was the reason. Sometimes it was media errors (bad sectors), one time a bad SATA cable, and we also had some RAID/HBA firmware issues. But in all

[ceph-users] Re: Different filesystems on OSD hosts at the same cluster

2015-08-07 Thread Межов Игорь Александрович
Hi! No, I was indeed talking about the ext4 journals, e.g. described here: ... but the problem with persistent device names is keeping me from trying it. So you assume a 3-way setup in Ceph: the first drive for filesystem data, the second drive for the filesystem journal and the third drive for the Ceph journal?

[ceph-users] Re: inconsistent pgs

2015-08-07 Thread Межов Игорь Александрович
was written back to the lower tier pool? Megov Igor CIO, Yuterra From: Константин Сахинов sakhi...@gmail.com Sent: 7 August 2015 15:39 To: Межов Игорь Александрович; ceph-users@lists.ceph.com Subject: Re: [ceph-users] inconsistent pgs It's hard to say now. I

[ceph-users] Re: inconsistent pgs

2015-08-07 Thread Межов Игорь Александрович
Hi! When did the inconsistent PGs start to appear? Maybe after some event? A hang, a node reboot, or after reconfiguration or changing parameters? Can you say what triggers such behaviour? And, BTW, what system/kernel do you use? Megov Igor CIO, Yuterra

[ceph-users] Re: inconsistent pgs

2015-08-07 Thread Межов Игорь Александрович
journal from PID 1 [5.533922] type=1305 audit(1438754526.111:4): audit_pid=626 old=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:auditd_t:s0 res=1 Fri, 7 Aug 2015 at 14:08, Межов Игорь Александрович me...@yuterra.ru: Hi! Do you have any disk errors in dmesg

[ceph-users] Re: CEPH cache layer. Very slow

2015-08-14 Thread Межов Игорь Александрович
Hi! Of course, it isn't cheap at all, but we use Intel DC S3700 200GB for Ceph journals and DC S3700 400GB in the SSD pool: same hosts, separate root in the crushmap. The SSD pool is not yet in production; the journalling SSDs have worked under production load for 10 months. They're in good condition - no

[ceph-users] Fixing inconsistency

2015-11-16 Thread Межов Игорь Александрович
Hi! We had a hard crash on one node - it hung in an indefinite state and did not respond to network requests or even console commands. After the node restart, all OSDs successfully mounted their filesystems (ext4) and rejoined the cluster. Some time later, the scrub process found two errors. The
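The usual first step for this situation, sketched below: pick the inconsistent PGs out of 'ceph health detail' and ask their primaries to repair them. Be careful: on these releases 'ceph pg repair' copies the primary's copy over the replicas, so check first which copy is actually the good one (see the follow-ups in this thread).

    #!/usr/bin/env python
    # Find inconsistent PGs and issue 'ceph pg repair' for each of them.
    import re
    import subprocess

    health = subprocess.check_output(["ceph", "health", "detail"]).decode()
    for pgid in re.findall(r"pg (\d+\.[0-9a-f]+) is .*inconsistent", health):
        print("repairing %s" % pgid)
        subprocess.check_call(["ceph", "pg", "repair", pgid])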

[ceph-users] Re: Is Ceph appropriate for small installations?

2015-08-31 Thread Межов Игорь Александрович
Hi! >Hi, I can reach 60 iops 4k read with 3 nodes (6ssd each). It is very interesting! Can you give any details about your config? We can't get more than ~40 kiops of 4k random reads from a 2-node x 2-SSD pool. :( Under load our SSDs give ~8 kiops each, and that is far too low for Intel DC S3700

[ceph-users] Deep scrubbing OSD

2015-09-04 Thread Межов Игорь Александрович
Hi! Just one simple question: how can we see when a deep-scrub of an OSD completes, if we execute the 'ceph osd deep-scrub ' command? Megov Igor CIO, Yuterra
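One way to answer this, sketched below under the assumption of the firefly/hammer 'ceph pg dump' JSON field names: 'ceph osd deep-scrub N' queues deep-scrubs for the PGs that osd.N is primary for, so you can watch the last_deep_scrub_stamp of those PGs until they are all newer than the moment you issued the command. The OSD id is a placeholder.

    #!/usr/bin/env python
    # Show last_deep_scrub_stamp of every PG whose primary is the given OSD.
    import json
    import subprocess

    OSD = 12                                      # hypothetical osd id
    dump = json.loads(subprocess.check_output(
        ["ceph", "pg", "dump", "--format", "json"]).decode())
    for pg in dump["pg_stats"]:
        if pg["acting"] and pg["acting"][0] == OSD:
            print("%-8s last deep-scrub: %s" % (pg["pgid"], pg["last_deep_scrub_stamp"]))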

[ceph-users] Re: XFS and nobarriers on Intel SSD

2015-09-07 Thread Межов Игорь Александрович
; Sent: 7 September 2015 14:10 To: Межов Игорь Александрович Cc: ceph-us...@ceph.com; Richard Bade Subject: Re: [ceph-users] XFS and nobarriers on Intel SSD Is this based on LSI? I don't think so. 03:00.0 Serial Attached SCSI controller: Intel Corporation C606 chipset Dual 4-Port SATA

[ceph-users] Re: XFS and nobarriers on Intel SSD

2015-09-07 Thread Межов Игорь Александрович
Hi! >And for the record, _ALL_ the drives I tested are faster on Intel SAS than on LSI (2308) and often faster on a regular SATA AHCI than on their "high throughput" HBAs. But most Intel HBAs are LSI-based. They are the same chips with slightly different firmware, I think. We use RS2MB044,

[ceph-users] Re: which SSD / experiences with Samsung 843T vs. Intel s3700

2015-09-04 Thread Межов Игорь Александрович
Hi! We have worked with the Intel DC S3700 200GB. Due to budget restrictions, one SSD hosts the system volume and journals for 12 OSDs (1:12). 6 nodes, 120TB raw space. The cluster serves as RBD storage for ~100 VMs. Not a single failure in a year - all devices are healthy. The remaining resource (by SMART) is ~92%.

[ceph-users] Re: Re: which SSD / experiences with Samsung 843T vs. Intel s3700

2015-09-07 Thread Межов Игорь Александрович
Igor CIO, Yuterra From: Christian Balzer <ch...@gol.com> Sent: 5 September 2015 5:36 To: ceph-users Cc: Межов Игорь Александрович Subject: Re: [ceph-users] Re: which SSD / experiences with Samsung 843T vs. Intel s3700 Hello, On Fri, 4 Sep 2015 22:37:06 + Межов И

[ceph-users] Re: Ceph cache-pool overflow

2015-09-07 Thread Межов Игорь Александрович
Hi! Because the distribution is computed with CRUSH rules algorithmically. So, as with any other hash algorithm, the result will depend on the 'data' itself. In Ceph, the 'data' is the object name. Imagine that you have a simple plain hashtable with 17 buckets. The bucket index is computed by a simple
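A toy illustration of that point (not CRUSH itself, just the hashtable analogy): with a fixed hash function, the bucket an object lands in depends only on its name, so different sets of names fill the 17 buckets unevenly. The object names below are made up.

    #!/usr/bin/env python
    # Hash object names into 17 buckets and show the uneven fill.
    import hashlib
    from collections import Counter

    def bucket(name, buckets=17):
        # stable hash of the object name, reduced to a bucket index
        return int(hashlib.md5(name.encode()).hexdigest(), 16) % buckets

    names = ["rb.0.1234.%012x" % i for i in range(1000)]   # made-up object names
    print(Counter(bucket(n) for n in names))               # counts per bucket differ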

[ceph-users] Re: Changing monitors whilst running OpenNebula VMs

2015-09-30 Thread Межов Игорь Александрович
Hi! Yes, we did exactly the same and have practically no problems except some minor issues with recreating VMs. First, OpenNebula uses the Ceph monitors specified in a template only when creating a VM or migrating it. These template values are passed as qemu parameters when bootstrapping the VM.

[ceph-users] Re: Ceph, SSD, and NVMe

2015-10-02 Thread Межов Игорь Александрович
Hi! Yes, we run a small Hammer cluster in production. Initially it was a 6-node Firefly installation on slightly outdated hardware: - Intel 56XX platforms, - 32-48GB RAM, - 70 SATA OSDs (1TB/2TB), - SSD journals on DC S3700 200GB, - 10Gbit interconnect - ~100 VM images (RBD only) To

[ceph-users] Re: Re: How to get RBD volume to PG mapping?

2015-09-28 Thread Межов Игорь Александрович
Megov Igor CIO, Yuterra From: Ilya Dryomov <idryo...@gmail.com> Sent: 25 September 2015 18:21 To: Межов Игорь Александрович Cc: David Burley; Jan Schermer; ceph-users Subject: Re: [ceph-users] Re: How to get RBD volume to PG mapping? On Fri, Sep 25, 2015 at 5

[ceph-users] Re: How to get RBD volume to PG mapping?

2015-09-25 Thread Межов Игорь Александрович
ceph.com> on behalf of David Burley <da...@slashdotmedia.com> Sent: 25 September 2015 17:15 To: Jan Schermer Cc: ceph-users; Межов Игорь Александрович Subject: Re: [ceph-users] How to get RBD volume to PG mapping? So I had two ideas here: 1. Use find as Jan suggested. Y

[ceph-users] Re: Uneven data distribution across OSDs

2015-09-22 Thread Межов Игорь Александрович
Hi! It will be difficult to evenly distribute data with such a difference in disk sizes. You can adjust the weight of the most filled-up OSDs with the command #ceph osd reweight <osd-id> <weight>, where the new weight is a float in the range 0.0-1.0. When you lower the weight of an OSD, some PGs will move from it to another location,
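A small sketch of that reweight step, assuming a hammer-era 'ceph osd df' JSON layout (field names "nodes", "utilization", "reweight"; adjust if your release prints something else): gently lower the 0.0-1.0 override weight of the fullest OSDs, in small steps, and let the cluster rebalance in between.

    #!/usr/bin/env python
    # Lower the override weight of the two most utilized OSDs by 0.05.
    import json
    import subprocess

    df = json.loads(subprocess.check_output(
        ["ceph", "osd", "df", "--format", "json"]).decode())
    nodes = sorted(df["nodes"], key=lambda n: -n["utilization"])
    for osd in nodes[:2]:
        new_weight = max(0.5, osd["reweight"] - 0.05)
        subprocess.check_call(["ceph", "osd", "reweight",
                               str(osd["id"]), "%.2f" % new_weight])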

[ceph-users] How to get RBD volume to PG mapping?

2015-09-25 Thread Межов Игорь Александрович
Hi! Last week I wrote that one PG in our Firefly cluster is stuck in a degraded state with 2 replicas instead of 3 and does not try to backfill or recover. We tried to investigate which RBD volumes are affected. The working plan is inspired by Sebastien Han's snippet
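A sketch of the forward mapping discussed in this thread (in the spirit of the snippet referenced above, not a copy of it): take the image's block_name_prefix and object count from 'rbd info', generate the object names, and ask 'ceph osd map' which PG each one maps to. The pool and image names are placeholders; for large or sparse images it is cheaper to list existing objects with 'rados ls' and grep for the prefix instead of brute-forcing every object number.

    #!/usr/bin/env python
    # Map every object of one RBD image to its PG via 'ceph osd map'.
    import re
    import subprocess

    POOL, IMAGE = "rbd", "vm-disk-1"               # hypothetical pool/image

    info = subprocess.check_output(["rbd", "-p", POOL, "info", IMAGE]).decode()
    prefix = re.search(r"block_name_prefix:\s+(\S+)", info).group(1)
    objects = int(re.search(r"size .* in (\d+) objects", info).group(1))

    for i in range(objects):
        obj = "%s.%012x" % (prefix, i)
        out = subprocess.check_output(["ceph", "osd", "map", POOL, obj]).decode()
        print(out.strip())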

[ceph-users] Re: OSD on XFS ENOSPC at 84% data / 5% inode and inode64?

2015-11-25 Thread Межов Игорь Александрович
Hi! >After our trouble with ext4/xattr soft lockup kernel bug we started moving some of our OSD to XFS, we're using ubuntu 14.04 3.19 kernel and ceph 0.94.5. It was a rather serious bug, but there is a small patch at kernel.org https://bugzilla.kernel.org/show_bug.cgi?id=107301

Re: [ceph-users] Fixing inconsistency

2015-11-25 Thread Межов Игорь Александрович
Hi! >I think the only time we've seen this was when there was some kind of XFS corruption that accidentally extended the size of the file on disk, and the object info was correct with its shorter size. But perhaps not, in which case I've no idea how this could have happened. We use ext4

[ceph-users] Re: network failover with public/custer network - is that possible

2015-11-30 Thread Межов Игорь Александрович
Hi! Götz Reinicke wrote: >>What if one of the networks fails? e.g. just on one host or the whole network for all nodes? >>Is there some sort of auto failover to use the other network for all traffic then? >>How does that work in real life? :) Or do I have to interact by hand Alex Gorbachev

Re: [ceph-users] Fixing inconsistency

2015-11-18 Thread Межов Игорь Александрович
Hi! As for my previous message, digging through the mailing list gave me only one method to fix the inconsistency - truncate the object files in the filesystem to the size they have in the ceph metadata: http://www.spinics.net/lists/ceph-users/msg00794.html But in that issue the metadata size was bigger than the on-disk size,
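A very rough sketch of that fix for the opposite case, where the on-disk file is longer than the size recorded in the object metadata: truncate the replica's file to the expected size, then let 'ceph pg repair' re-check the PG. The PG id, object path and expected size below are placeholders - take them from the scrub error in the cluster log - and stop the OSD (or at least set noout) before touching files under its data directory.

    #!/usr/bin/env python
    # Truncate an object file to the size ceph expects, then repair the PG.
    import os
    import subprocess

    PGID = "4.2af"                                              # hypothetical
    OBJ_PATH = "/var/lib/ceph/osd/ceph-12/current/4.2af_head/rb.0.1234...__head_XXXX__4"
    EXPECTED_SIZE = 4194304                                     # bytes, from the scrub log

    if os.path.getsize(OBJ_PATH) > EXPECTED_SIZE:
        with open(OBJ_PATH, "r+b") as f:
            f.truncate(EXPECTED_SIZE)
    subprocess.check_call(["ceph", "pg", "repair", PGID])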