[ceph-users] incomplete pg, recovery some data
Hi,

After some hardware errors, one of the pgs on our backup server is 'incomplete'. I exported the pg without problems, as described here: https://ceph.com/community/incomplete-pgs-oh-my/ After removing the pg from all OSDs and importing it into one OSD, the pg is still 'incomplete'. I only want to recover some pieces of data from this rbd, so if I lose something, nothing bad happens. How can I tell Ceph to accept this pg as complete and clean?

ceph health detail
HEALTH_WARN 1 pgs incomplete; 1 pgs stuck inactive; 1 pgs stuck unclean
pg 0.109 is stuck inactive since forever, current state incomplete, last acting [9,13]
pg 0.109 is stuck unclean since forever, current state incomplete, last acting [9,13]
pg 0.109 is incomplete, acting [9,13]

ceph pg 0.109 query is in the attachment.

Regards, Mateusz

{ state: incomplete, snap_trimq: [], epoch: 6310, up: [ 9, 13], acting: [ 9, 13],
info: { pgid: 0.109, last_update: 0'0, last_complete: 0'0, log_tail: 0'0, last_user_version: 0, last_backfill: MAX, purged_snaps: [],
history: { epoch_created: 1, last_epoch_started: 4101, last_epoch_clean: 4089, last_epoch_split: 0, same_up_since: 6306, same_interval_since: 6306, same_primary_since: 6304, last_scrub: 3249'1096189, last_scrub_stamp: 2015-06-05 14:50:50.378387, last_deep_scrub: 3084'1088300, last_deep_scrub_stamp: 2015-05-31 13:56:29.394517, last_clean_scrub_stamp: 2015-06-05 14:50:50.378387},
stats: { version: 0'0, reported_seq: 7, reported_epoch: 6310, state: incomplete, last_fresh: 2015-06-18 12:54:14.562011, last_change: 2015-06-18 12:53:05.172499, last_active: 0.00, last_clean: 0.00, last_became_active: 0.00, last_unstale: 2015-06-18 12:54:14.562011, last_undegraded: 2015-06-18 12:54:14.562011, last_fullsized: 2015-06-18 12:54:14.562011, mapping_epoch: 6306, log_start: 0'0, ondisk_log_start: 0'0, created: 1, last_epoch_clean: 4089, parent: 0.0, parent_split_bits: 0, last_scrub: 3249'1096189, last_scrub_stamp: 2015-06-05 14:50:50.378387, last_deep_scrub: 3084'1088300, last_deep_scrub_stamp: 2015-05-31 13:56:29.394517, last_clean_scrub_stamp: 2015-06-05 14:50:50.378387, log_size: 0, ondisk_log_size: 0, stats_invalid: 0,
stat_sum: { num_bytes: 0, num_objects: 0, num_object_clones: 0, num_object_copies: 0, num_objects_missing_on_primary: 0, num_objects_degraded: 0, num_objects_misplaced: 0, num_objects_unfound: 0, num_objects_dirty: 0, num_whiteouts: 0, num_read: 0, num_read_kb: 0, num_write: 0, num_write_kb: 0, num_scrub_errors: 0, num_shallow_scrub_errors: 0, num_deep_scrub_errors: 0, num_objects_recovered: 0, num_bytes_recovered: 0, num_keys_recovered: 0, num_objects_omap: 0, num_objects_hit_set_archive: 0, num_bytes_hit_set_archive: 0}, stat_cat_sum: {}, up: [ 9, 13], acting: [ 9, 13], blocked_by: [], up_primary: 9, acting_primary: 9},
empty: 1, dne: 0, incomplete: 0, last_epoch_started: 0,
hit_set_history: { current_last_update: 0'0, current_last_stamp: 0.00, current_info: { begin: 0.00, end: 0.00, version: 0'0}, history: []}},
peer_info: [ { peer: 2, pgid: 0.109, last_update: 0'0, last_complete: 0'0, log_tail: 0'0, last_user_version: 0, last_backfill: MAX, purged_snaps: [], history: { epoch_created: 0, last_epoch_started: 0, last_epoch_clean: 0, last_epoch_split: 0, same_up_since: 0, same_interval_since: 0, same_primary_since: 0, last_scrub: 0'0, last_scrub_stamp: 0.00, last_deep_scrub: 0'0, last_deep_scrub_stamp: 0.00, last_clean_scrub_stamp: 0.00}, stats: { version: 0'0, reported_seq: 0, reported_epoch: 0, state: inactive, last_fresh: 0.00, last_change: 0.00, last_active: 0.00, last_clean: 0.00, last_became_active: 0.00,
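For reference, below is a rough sketch of the export/remove/import workflow from the linked article, plus the newer mark-complete operation. It assumes the surviving copy of pg 0.109 is on osd.9, that the OSD daemon is stopped while the tool runs, and default data/journal paths; on older releases the binary is called ceph_objectstore_tool, and --op mark-complete only exists in newer ones, so treat this as an illustration rather than a recipe:

# export the pg from an OSD that still holds a copy (stop the OSD first)
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-9 --journal-path /var/lib/ceph/osd/ceph-9/journal --pgid 0.109 --op export --file /root/pg0.109.export

# remove the stale copy, then import the exported pg again
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-9 --journal-path /var/lib/ceph/osd/ceph-9/journal --pgid 0.109 --op remove
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-9 --journal-path /var/lib/ceph/osd/ceph-9/journal --pgid 0.109 --op import --file /root/pg0.109.export

# on releases that have it, mark the pg complete on the acting primary, then start the OSD
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-9 --journal-path /var/lib/ceph/osd/ceph-9/journal --pgid 0.109 --op mark-complete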
Re: [ceph-users] 403-Forbidden error using radosgw
I am also having the same issue; can somebody help me out? In my case it is HTTP/1.1 404 Not Found instead. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] rbd performance issue - can't find bottleneck
Hi,

On 06/18/2015 12:54 PM, Alexandre DERUMIER wrote:
Hi, for read benchmark with fio, what is the iodepth ? my fio 4k randread results:
iodepth=1 : bw=6795.1KB/s, iops=1698
iodepth=2 : bw=14608KB/s, iops=3652
iodepth=4 : bw=32686KB/s, iops=8171
iodepth=8 : bw=76175KB/s, iops=19043
iodepth=16 : bw=173651KB/s, iops=43412
iodepth=32 : bw=336719KB/s, iops=84179

I'm trying multiple variations - from one job with iodepth=1 up to 16 jobs with iodepth=32, similar to what you do. I'm less worried about the bandwidth now, since I found out about the Intel SSD 530 problem (the dsync stuff). I'm worried about iops - when I test locally I get the expected ~40k iops on an ssd drive, but when I do it from a client I get 2-4k iops..

(This should be similar with the rados bench -t (threads) option). This is normal because of network latencies + ceph latencies. Doing more parallelism increases iops.

yes, I'm expecting that, but for now I can't get close to what I should see using an SSD as an OSD in ceph..

(doing a bench with dd = iodepth=1) I'm only using dd to test seq read/write speed.

These results are with 1 client/rbd volume. now with more fio clients (numjobs=X) I can reach up to 300k iops with 8-10 clients.

I would love to see these results in my setup :) J ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
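As a reference point for the client-side numbers being compared in this thread, a minimal fio invocation against an RBD image using fio's built-in rbd engine might look like the sketch below; the pool and image names are placeholders, and the rbd ioengine requires a reasonably recent fio build:

# 4k random read against an existing RBD image (pool "rbd", image "test" are illustrative)
fio --name=rbd-4k-randread --ioengine=rbd --clientname=admin --pool=rbd --rbdname=test --rw=randread --bs=4k --iodepth=32 --numjobs=1 --direct=1 --runtime=60 --time_based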
Re: [ceph-users] rbd performance issue - can't find bottleneck
On 06/18/2015 12:23 PM, Mark Nelson wrote:
so.. in order to increase performance, do I need to change the ssd drives?
I'm just guessing, but because your read performance is slow as well, you may have multiple issues going on. The Intel 530 being slow at O_DSYNC writes is one of them, but it's possible there is something else too. If I were in your position I think I'd try to beg/borrow/steal a single DC S3700 or even a 520 (despite its presumed lack of safety) and just see how a single-OSD cluster using it does on your setup before replacing everything.

Oh, sorry - this was my bad, I was doing different tests with different setups to find out what might be the problem. I thought that maybe the Mellanox network hardware/setup was the problem (I wouldn't know why, but I wanted to check), so I switched the servers to use 1Gbps network cards, hence the slow read results. After I switched back to the 56Gbps network, sequential read/write tests are satisfactory:

root@cf03:/ceph/tmp# dd if=/dev/zero of=test bs=100M count=100 oflag=direct
100+0 records in
100+0 records out
10485760000 bytes (10 GB) copied, 27.0479 s, 388 MB/s
root@cf03:/ceph/tmp# dd if=test of=/dev/null bs=100M iflag=direct
100+0 records in
100+0 records out
10485760000 bytes (10 GB) copied, 7.30296 s, 1.4 GB/s

and now rados bench shows:

root@cf03:~# rados -p rbd bench 30 rand
 sec Cur ops started finished avg MB/s cur MB/s last lat  avg lat
   0       0       0        0        0        0        -        0
   1      16     208      192  767.782      768 0.084049 0.0796911
   2      16     390      374  747.833      728 0.055108 0.0834168
   3      16     579      563  750.523      756 0.080945 0.0841484
   4      16     756      740  739.865      708 0.119879 0.0853113
   5      16     942      926  740.668      744 0.131534 0.085389
   6      16    1128     1112  741.207      744 0.085159 0.0857775
   7      16    1314     1298  741.587      744 0.137615 0.0857103
   8      16    1496     1480  739.877      728 0.047122 0.0858808
   9      16    1678     1662  738.548      728 0.118557 0.0860778
  10      16    1866     1850  739.882      752  0.07375 0.0861203
  11      16    2054     2038  740.974      752 0.053814 0.0860436
  12      16    2247     2231   743.55      772 0.101077 0.0857194
  13      16    2430     2414  742.652      732 0.038217 0.0856958
  14      16    2592     2576  735.886      648 0.014755 0.0864883
  15      16    2764     2748  732.688      688 0.125262 0.0870332
  16      16    2934     2918   729.39      680 0.144276 0.0873883
  17      16    3109     3093  727.655      700  0.05022 0.0876425
  18      16    3274     3258  723.892      660 0.027348 0.0880826
  19      16    3428     3412  718.209      616 0.145429 0.0888024
  20      16    3590     3574  714.695      648 0.145609 0.0892346
  21      16    3753     3737  711.704      652 0.146557  0.08958
  22      16    3914     3898  708.623      644 0.164886 0.0900086
  23      16    4077     4061  706.158      652 0.021976 0.0903442
  24      16    4243     4227  704.398      664 0.013213 0.0905628
  25      16    4409     4393  702.779      664 0.039111 0.0908182
  26      16    4576     4560  701.438      668 0.179205 0.0909782
  27      16    4744     4728  700.344      672 0.176603 0.0911509
  28      16    4924     4908  701.043      720 0.062736 0.0911056
  29      16    5107     5091  702.107      732 0.103679 0.0910063
  30      16    5294     5278  703.633      748 0.078924 0.0908063
Total time run:        30.105242
Total reads made:      5294
Read size:             4194304
Bandwidth (MB/sec):    703.399
Average Latency:       0.0909628
Max latency:           0.198346
Min latency:           0.00676

..but unfortunately fio still shows low iops - 2-4k... J -- Jacek Jarosiewicz Administrator Systemów Informatycznych SUPERMEDIA Sp. z o.o. z siedzibą w Warszawie ul. Senatorska 13/15, 00-075 Warszawa Sąd Rejonowy dla m.st.Warszawy, XII Wydział Gospodarczy Krajowego Rejestru Sądowego, nr KRS 029537; kapitał zakładowy 42.756.000 zł NIP: 957-05-49-503 Adres korespondencyjny: ul.
Jubilerska 10, 04-190 Warszawa SUPERMEDIA - http://www.supermedia.pl dostep do internetu - hosting - kolokacja - lacza - telefonia ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] radosgw did not create auth url for swift
Can you please let me know if you solved this issue? ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] best Linux distro for Ceph
Hi Shane, We (Bloomberg) have many large clusters and we currently use Ubuntu. We have just recently upgraded to Trusty (14.04). Our new super object store that we're building out is using Trusty, but we may switch to RHEL because of other departments joining in - the final decision has not been made. However, our OpenStack clusters will stay on Ubuntu. Thanks, Chris

On Wed, Jun 17, 2015 at 2:06 PM, Shane Gibson shane_gib...@symantec.com wrote:

Ok - I know this post has the potential to spread to unsavory corners of discussion about the best linux distro ... blah blah blah ... please, don't let it go there ... ! I'm seeking some input from people that have been running larger Ceph clusters ... on the order of 100s of physical servers with thousands of OSDs in them. Our primary use case is Object via Swift API integration, and adding Block store capability for both OpenStack/KVM backing VMs, as well as general use for various block store scenarios. We'd *like* to look at CephFS, and I'm heartened to see a kernel module (over the FUSE-based client), a growing user base around it, and hoping "production ready" will soon be stamped on CephFS ...

We currently deploy Ubuntu (primarily Trusty - 14.04) and CentOS 7.1. Currently we've been testing our Ceph clusters on both, but our preference as an organization is CentOS 7.1.1503 (currently). However - I see a lot of noise in the list about needing to track the more modern kernel versions, as opposed to the already dated 3.10.x that CentOS 7.1 deploys. Yes, I know RH and the community backport a lot of the newer kernel features to their kernel version ... but ... not everything gets backported.

Can someone out there with real world, larger scale Ceph cluster operational experience provide a guideline on the Linux distro they deploy/use, that works well with Ceph and is more in line with keeping up with modern kernel versions ... without crossing the line into the bleeding and painful edge versions ... ? Thank you ... ~~shane ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Best Regards, Chris Jones http://www.cloudm2.com cjo...@cloudm2.com (p) 770.655.0770 This message is intended exclusively for the individual or entity to which it is addressed. This communication may contain information that is proprietary, privileged or confidential or otherwise legally exempt from disclosure. If you are not the named addressee, you are not authorized to read, print, retain, copy or disseminate this message or any part of it. If you have received this message in error, please notify the sender immediately by e-mail and delete all copies of the message. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] OSD Journal creation ?
The journal should be a raw partition and should not have any filesystem on it. Inside your /var/lib/ceph/osd/ceph-# directory you should make a symlink to the journal partition that you are going to use for that osd.

On Thu, Jun 18, 2015 at 2:36 AM, Shane Gibson shane_gib...@symantec.com wrote:

All - I am building my first ceph cluster, and doing it the hard way, manually without the aid of ceph-deploy. I have successfully built the mon cluster and am now adding OSDs. My main question: How do I prepare the Journal prior to the prepare/activate stages of the OSD creation?

More details: Basically - all of the documentation seems to assume the journal is prepared. Do I simply create a single raw partition on a physical device, and the ceph-disk prepare... and ceph-disk activate... steps will take care of everything for the journal ... presumably based on the ceph-disk prepare ... --type filesystem setting? Or do I need to actually format it as a filesystem prior to giving it over to the Ceph OSD ???

The architecture I'm thinking of is as follows - based on the hardware I have for OSDs (currently 9 servers, each with):
RAID 0 mirror for OS hard drives (2 disks)
data disk for journal placement for 5 physical disks (4TB)
data disk for journal placement for 5 physical disks (4TB)
10 data disks as OSDs (one OSD per disk) (4TB each)

Essentially - there are 12 data disks in the node (all 4 TB 7200 rpm spinning disks). Splitting the Journal across two of them gives me a failure domain of 5 data disks + 1 journal disk in a single physical server for crush map purposes ... It also vaguely helps spread the I/O workload for the journaling activity across 2 physical disks in a chassis instead of one (since the journal disk is pretty darn slow). In this configuration I'd create 5 separate partitions on Journal Disk A and 5 on Journal Disk B ... but do they need to be formatted and mounted? Yes, we know as we go to more real production workloads, we'll want/need to change this for performance reasons - eg the Journal on SSDs ...

Any pointers on where I missed this info in the documentation would be helpful too ... I've been all over the ceph.com/docs/ site and haven't found it yet... Thanks, ~~shane ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
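To make the answer above concrete, here is a rough sketch of the two common ways to wire up a raw journal partition; the device names, partition sizes and OSD id are made up, so adapt them to the actual layout and check the ceph-disk documentation for the release in use:

# carve raw journal partitions out of journal disk A (no filesystem, no mount)
parted -s /dev/sdb mklabel gpt
parted -s /dev/sdb mkpart journal-1 0% 20%
parted -s /dev/sdb mkpart journal-2 20% 40%
# ... and so on for the remaining three partitions

# option 1: let ceph-disk wire it up - data disk first, journal partition second
ceph-disk prepare /dev/sdc /dev/sdb1
ceph-disk activate /dev/sdc1

# option 2: manual OSD setup - symlink the journal into the OSD directory and initialize it
ln -s /dev/disk/by-partuuid/<journal-partition-uuid> /var/lib/ceph/osd/ceph-12/journal
ceph-osd -i 12 --mkjournal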
[ceph-users] Hammer 0.94.2: Error when running commands on CEPH admin node
Hello Everyone, I have setup a new cluster with Ceph-hammer version (0.94.2 The install went through fine without any issues but from the admin node I am not able to execute any of the Ceph commands Error: root@ceph-main:/cephcluster# ceph auth export 2015-06-18 12:43:28.922367 7f54d286b700 -1 monclient(hunting): ERROR: missing keyring, cannot use cephx for authentication 2015-06-18 12:43:28.922375 7f54d286b700 0 librados: client.admin initialization error (2) No such file or directory Error connecting to cluster: ObjectNotFound I googled for this and only found one article relevant, but it did not solve my problem. http://t75390.file-systems-ceph-user.file-systemstalk.us/newbie-error-connecting-to-cluster-permissionerror-t75390.html Is there any other workaround or fix for this ?? Regards Teclus Dsouza Technical Architect Tech Mahindra ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Hammer 0.94.2: Error when running commands on CEPH admin node
Do you have admin keyring in /etc/ceph directory? From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Teclus Dsouza -X (teclus - TECH MAHINDRA LIM at Cisco) Sent: Thursday, June 18, 2015 10:35 PM To: ceph-users@lists.ceph.com Subject: [ceph-users] Hammer 0.94.2: Error when running commands on CEPH admin node Importance: High Hello Everyone, I have setup a new cluster with Ceph-hammer version (0.94.2 The install went through fine without any issues but from the admin node I am not able to execute any of the Ceph commands Error: root@ceph-main:/cephcluster# ceph auth export 2015-06-18 12:43:28.922367 7f54d286b700 -1 monclient(hunting): ERROR: missing keyring, cannot use cephx for authentication 2015-06-18 12:43:28.922375 7f54d286b700 0 librados: client.admin initialization error (2) No such file or directory Error connecting to cluster: ObjectNotFound I googled for this and only found one article relevant, but it did not solve my problem. http://t75390.file-systems-ceph-user.file-systemstalk.us/newbie-error-connecting-to-cluster-permissionerror-t75390.html Is there any other workaround or fix for this ?? Regards Teclus Dsouza Technical Architect Tech Mahindra ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Hammer 0.94.2: Error when running commands on CEPH admin node
Hello Naga, The keyring file is present under a folder I created for ceph. Are you saying the same needs to be copied to the /etc/ceph folder ? Regards Teclus From: B, Naga Venkata [mailto:nag...@hp.com] Sent: Thursday, June 18, 2015 10:37 PM To: Teclus Dsouza -X (teclus - TECH MAHINDRA LIM at Cisco); ceph-users@lists.ceph.com Subject: RE: [ceph-users] Hammer 0.94.2: Error when running commands on CEPH admin node Do you have admin keyring in /etc/ceph directory? From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Teclus Dsouza -X (teclus - TECH MAHINDRA LIM at Cisco) Sent: Thursday, June 18, 2015 10:35 PM To: ceph-users@lists.ceph.commailto:ceph-users@lists.ceph.com Subject: [ceph-users] Hammer 0.94.2: Error when running commands on CEPH admin node Importance: High Hello Everyone, I have setup a new cluster with Ceph-hammer version (0.94.2 The install went through fine without any issues but from the admin node I am not able to execute any of the Ceph commands Error: root@ceph-main:/cephcluster# ceph auth export 2015-06-18 12:43:28.922367 7f54d286b700 -1 monclient(hunting): ERROR: missing keyring, cannot use cephx for authentication 2015-06-18 12:43:28.922375 7f54d286b700 0 librados: client.admin initialization error (2) No such file or directory Error connecting to cluster: ObjectNotFound I googled for this and only found one article relevant, but it did not solve my problem. http://t75390.file-systems-ceph-user.file-systemstalk.us/newbie-error-connecting-to-cluster-permissionerror-t75390.html Is there any other workaround or fix for this ?? Regards Teclus Dsouza Technical Architect Tech Mahindra ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Hammer 0.94.2: Error when running commands on CEPH admin node
And also this needs the correct permission set as otherwise it will give this error. From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of B, Naga Venkata Sent: Thursday, June 18, 2015 10:07 AM To: Teclus Dsouza -X (teclus - TECH MAHINDRA LIM at Cisco); ceph-users@lists.ceph.com Subject: Re: [ceph-users] Hammer 0.94.2: Error when running commands on CEPH admin node Do you have admin keyring in /etc/ceph directory? From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Teclus Dsouza -X (teclus - TECH MAHINDRA LIM at Cisco) Sent: Thursday, June 18, 2015 10:35 PM To: ceph-users@lists.ceph.commailto:ceph-users@lists.ceph.com Subject: [ceph-users] Hammer 0.94.2: Error when running commands on CEPH admin node Importance: High Hello Everyone, I have setup a new cluster with Ceph-hammer version (0.94.2 The install went through fine without any issues but from the admin node I am not able to execute any of the Ceph commands Error: root@ceph-main:/cephcluster# ceph auth export 2015-06-18 12:43:28.922367 7f54d286b700 -1 monclient(hunting): ERROR: missing keyring, cannot use cephx for authentication 2015-06-18 12:43:28.922375 7f54d286b700 0 librados: client.admin initialization error (2) No such file or directory Error connecting to cluster: ObjectNotFound I googled for this and only found one article relevant, but it did not solve my problem. http://t75390.file-systems-ceph-user.file-systemstalk.us/newbie-error-connecting-to-cluster-permissionerror-t75390.html Is there any other workaround or fix for this ?? Regards Teclus Dsouza Technical Architect Tech Mahindra ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] intel atom erasure coded pool
Has there been any testing/feedback on using the 8-core Intel Atom C2750 with EC pools? Or any use case really? There are some enticing 1U 12x3.5" chassis out there with an Atom processor. The idea of low-power, dense, EC pool storage has a lot of appeal. We're looking to build out a pretty cold EC pool (media storage with a strong hot/cold skew) behind a large-ish NVMe cache tier. Thanks! -Reid ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] keyring getting overwritten by mon generated bootstrap-osd keyring
Dear Ceph Community, We are fetching the mon and osd bootstrap keyring values from our own encrypted data bags. We are successful in setting the mon_secret to a preset value, but fail to do so for the /var/lib/ceph/bootstrap-osd keyring. Similar to how we set mon_secret, we set osd_secret. We added log messages printing out the osd_secret in the ceph community cookbook recipe osd.rb. This value is logged correctly in our chef client log. However, after chef completes the ceph osd recipe, the /var/lib/ceph/bootstrap-osd/ceph.keyring file does not have the same value as the intended osd_secret. It is overwritten by the bootstrap-osd keyring value created during the mon recipe. Since it is being reverted, how can we set the initial /var/lib/ceph/bootstrap-osd/ceph.keyring file to start out with the correct value? We see the bootstrap-osd keyring file is being created during the mon installation, but we are not sure where and how to set the bootstrap-osd keyring value. Sincerely, Johanni B. Thunstrom ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] SSD test results with Plextor M6 Pro, HyperX Fury, Kingston V300, ADATA SP90
Hello everybody, I thought I would share the benchmarks from these four ssd's I tested (see attachment) I do still have some question: #1 *Data Set Management TRIM supported (limit 1 block) vs *Data Set Management TRIM supported (limit 8 blocks) and how this effects Ceph and also how can I test if TRIM is actually working and not corruption data. #2 are there other things I should test to compare ssd's for Ceph Journals #3 are the power loss security mechanisms on SSD relevant in Ceph when configured in a way that a full node can fully die and that a power loss of all nodes at the same time should not be possible (or has an extreme low probability) #4 how to benchmarks the OSD (disk+ssd-journal) combination so I can compare them. I got some other benchmarks question, but I will make an separate mail for them. Kind regards, Jelle de Jong #--- # Plextor M6 Pro 128G root@ceph01:~# uname -a Linux ceph01 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt11-1 (2015-05-24) x86_64 GNU/Linux root@ceph01:~# smartctl -i /dev/sdc smartctl 6.4 2014-10-07 r4002 [x86_64-linux-3.16.0-4-amd64] (local build) Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org === START OF INFORMATION SECTION === Device Model: PLEXTOR PX-128M6Pro Serial Number:P02441106228 LU WWN Device Id: 5 002303 1002de43e Add. Product Id: NC702090 Firmware Version: 1.02 User Capacity:128,035,676,160 bytes [128 GB] Sector Size: 512 bytes logical/physical Rotation Rate:Solid State Device Device is:Not in smartctl database [for details use: -P showall] ATA Version is: ATA8-ACS, ATA/ATAPI-7 T13/1532D revision 4a SATA Version is: SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s) Local Time is:Thu Jun 18 15:46:33 2015 CEST SMART support is: Available - device has SMART capability. SMART support is: Enabled root@ceph01:~# hdparm -I /dev/sdc | grep TRIM *Data Set Management TRIM supported (limit 8 blocks) root@ceph01:~# hdparm -W 0 /dev/sdc 0 fio --filename=/dev/sdc --direct=1 --sync=1 --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting --name=journal-test fio --filename=/dev/sdc --direct=1 --sync=1 --rw=write --bs=4k --numjobs=2 --iodepth=1 --runtime=60 --time_based --group_reporting --name=journal-test fio --filename=/dev/sdc --direct=1 --sync=1 --rw=write --bs=4k --numjobs=4 --iodepth=1 --runtime=60 --time_based --group_reporting --name=journal-test fio --filename=/dev/sdc --direct=1 --sync=1 --rw=write --bs=4k --numjobs=8 --iodepth=1 --runtime=60 --time_based --group_reporting --name=journal-test fio --filename=/dev/sdc --direct=1 --sync=1 --rw=write --bs=4k --numjobs=16 --iodepth=1 --runtime=60 --time_based --group_reporting --name=journal-test fio --filename=/dev/sdc --direct=1 --sync=1 --rw=write --bs=4k --numjobs=32 --iodepth=1 --runtime=60 --time_based --group_reporting --name=journal-test fio --filename=/dev/sdc --direct=1 --sync=1 --rw=write --bs=4k --numjobs=64 --iodepth=1 --runtime=60 --time_based --group_reporting --name=journal-test 1# write: io=163136KB, bw=2718.1KB/s, iops=679, runt= 60001msec 2# write: io=323768KB, bw=5396.5KB/s, iops=1349, runt= 60001msec 4# write: io=643624KB, bw=10727KB/s, iops=2681, runt= 60001msec 8# write: io=1238.3MB, bw=21132KB/s, iops=5283, runt= 60002msec 16# write: io=2218.9MB, bw=37868KB/s, iops=9466, runt= 60001msec 32# write: io=3342.7MB, bw=57045KB/s, iops=14261, runt= 60003msec 64# write: io=3149.6MB, bw=53745KB/s, iops=13436, runt= 60007msec # second run after testing the other ssd's 1# write: io=162100KB, bw=2701.7KB/s, iops=675, runt= 60001msec 2# write: 
io=321076KB, bw=5351.2KB/s, iops=1337, runt= 60001msec
4# write: io=641076KB, bw=10684KB/s, iops=2671, runt= 60001msec
8# write: io=1230.5MB, bw=20999KB/s, iops=5249, runt= 60002msec
16# write: io=2199.9MB, bw=37543KB/s, iops=9385, runt= 60002msec
32# write: io=3367.4MB, bw=57467KB/s, iops=14366, runt= 60002msec
64# write: io=3270.5MB, bw=55809KB/s, iops=13952, runt= 60006msec

root@ceph01:~# dd if=/dev/zero of=/dev/sdc bs=4k count=10000 oflag=direct,dsync
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 14.6745 s, 2.8 MB/s

root@ceph01:~# dmidecode -t system
# dmidecode 2.12
SMBIOS 2.6 present.
Handle 0x0002, DMI type 1, 27 bytes
System Information
 Manufacturer: Hewlett-Packard
 Product Name: HP Z600 Workstation
 Version:
 Serial Number: CZC0121R1J
 UUID: CD0720D9-378D-11DF-BBDA-05C40AB118A9
 Wake-up Type: Power Switch
 SKU Number: FW863AV
 Family: 103C_53335X
Handle 0x004B, DMI type 32, 11 bytes
System Boot Information
 Status: No errors detected

#---
# Kingston HyperX Fury 120G
root@ceph01:~# uname -a
Linux ceph01 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt11-1 (2015-05-24) x86_64 GNU/Linux
root@ceph01:~# smartctl -i
Re: [ceph-users] Hammer 0.94.2: Error when running commands on CEPH admin node
For the permissions use sudo chmod +r /etc/ceph/ceph.client.admin.keyring From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Teclus Dsouza -X (teclus - TECH MAHINDRA LIM at Cisco) Sent: Thursday, June 18, 2015 10:21 AM To: B, Naga Venkata; ceph-users@lists.ceph.com Subject: Re: [ceph-users] Hammer 0.94.2: Error when running commands on CEPH admin node Hello Naga, The keyring file is present under a folder I created for ceph. Are you saying the same needs to be copied to the /etc/ceph folder ? Regards Teclus From: B, Naga Venkata [mailto:nag...@hp.com] Sent: Thursday, June 18, 2015 10:37 PM To: Teclus Dsouza -X (teclus - TECH MAHINDRA LIM at Cisco); ceph-users@lists.ceph.commailto:ceph-users@lists.ceph.com Subject: RE: [ceph-users] Hammer 0.94.2: Error when running commands on CEPH admin node Do you have admin keyring in /etc/ceph directory? From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Teclus Dsouza -X (teclus - TECH MAHINDRA LIM at Cisco) Sent: Thursday, June 18, 2015 10:35 PM To: ceph-users@lists.ceph.commailto:ceph-users@lists.ceph.com Subject: [ceph-users] Hammer 0.94.2: Error when running commands on CEPH admin node Importance: High Hello Everyone, I have setup a new cluster with Ceph-hammer version (0.94.2 The install went through fine without any issues but from the admin node I am not able to execute any of the Ceph commands Error: root@ceph-main:/cephcluster# ceph auth export 2015-06-18 12:43:28.922367 7f54d286b700 -1 monclient(hunting): ERROR: missing keyring, cannot use cephx for authentication 2015-06-18 12:43:28.922375 7f54d286b700 0 librados: client.admin initialization error (2) No such file or directory Error connecting to cluster: ObjectNotFound I googled for this and only found one article relevant, but it did not solve my problem. http://t75390.file-systems-ceph-user.file-systemstalk.us/newbie-error-connecting-to-cluster-permissionerror-t75390.html Is there any other workaround or fix for this ?? Regards Teclus Dsouza Technical Architect Tech Mahindra ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
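Putting the suggestions in this thread together, the usual fix is simply to copy the admin keyring (and ceph.conf) from the deployment folder into /etc/ceph and make the keyring readable. A minimal sketch, assuming the deployment folder is the /cephcluster directory from the original post and that the keyring file there is named ceph.client.admin.keyring:

# run on the admin node
cp /cephcluster/ceph.client.admin.keyring /etc/ceph/
cp /cephcluster/ceph.conf /etc/ceph/
chmod +r /etc/ceph/ceph.client.admin.keyring
ceph auth export   # should now reach the cluster instead of failing with ObjectNotFound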
Re: [ceph-users] SSD test results with Plextor M6 Pro, HyperX Fury, Kingston V300, ADATA SP90
Hello, On Thu, 18 Jun 2015 17:48:12 +0200 Jelle de Jong wrote: Hello everybody, I thought I would share the benchmarks from these four ssd's I tested (see attachment) Neither of these are DC level SSDs of course, though the HyperX at least supposedly can handle 2.5 DWPD. Alas that info is only on the the PDF, not the web page specifications and that PDF also says not for servers, no siree. Which can mean a lot of things, the worst would be something like going _very_ slow when doing housekeeping or the likes. I do still have some question: #1 *Data Set Management TRIM supported (limit 1 block) vs *Data Set Management TRIM supported (limit 8 blocks) and how this effects Ceph and also how can I test if TRIM is actually working and not corruption data. I would not deploy any SSDs that actually require TRIM to maintain their speed or TBW endurance. And I wouldn't want Ceph to do TRIMs due to the corruption issues you already are aware of. And last but not least, TRIM makes little to no sense with Ceph journals. These are raw partitions, so Ceph would need to issue the TRIM commands. And they are constantly being overwritten, trimming them would be detrimental to the performance for sure. #2 are there other things I should test to compare ssd's for Ceph Journals TBW/$. I couldn't find the endurance data for the Plextor at all. I have a cluster with journal SSDs that experience average 2MB/s writes, so in 5 years that makes 315TB. Just shy of the 354TB the 128GB HyperX promises. First rule of engineering, overspec by at lest 100%, so the 240GB model would be a fit. If one were to use such drives in the first place. #3 are the power loss security mechanisms on SSD relevant in Ceph when configured in a way that a full node can fully die and that a power loss of all nodes at the same time should not be possible (or has an extreme low probability) A full node death is often something you can recover from much faster than a dead OSD (usually no data loss, just reboot it) and if Ceph is configured correctly (mon_osd_down_out_subtree_limit = host) with very little impact when it comes back. If your journals are hosed because of a power loss, all the associated OSDs are dead until you either recreate the journal (if possible) or in the worst case (OSD HDD also hosed) the entire OSD. That said, I personally consider total power loss scenarios in the DCs we use to be very, very unlikely as well. Others here will strongly disagree with that, based on their experience. Penultimately that doesn't stop folks from accidentally powering off or unplugging servers. And I have seen SSDs w/o power loss protection getting hosed in such scenarios while ones with it had no issues. #4 how to benchmarks the OSD (disk+ssd-journal) combination so I can compare them. There are plenty of examples in the archives, from rados bench to fio with rbd ioengine to running fio in a VM (for most people the most realistic test). Block size will have of course a dramatic impact on throughput, IOPS and CPU utilization. The fio and dd tests you did are an indication of the capabilities of those SSDs, those numbers however don't translate directly to Ceph. Also, once your SSDs are fast enough to ACK things in a timely fashion, your HDDs will become the bottleneck with persistent loads. For example in my cluster with a 2 journals per SSD (DC S3700 100GB) a fio run with 4K blocks will quickly get the CPUs sweating, the HDDs to 100% utilization and the SSDs to about 10%. 
However with 4M blocks the CPUs are nearly bored, the HDDs of course at about 100% and the SSD are going up to 40% (they are approaching their throughput/bandwidth limit of 200MB/s, not IOPS). With rados bench I can push the SSDs to 70%, which is one of the reasons I postulate that HDDs (of the 7.2K RPM SATA persuasion) won't be doing much over 80MB/s in the best case scenario when being used as OSDs. Regards, Christian I got some other benchmarks question, but I will make an separate mail for them. Kind regards, Jelle de Jong -- Christian BalzerNetwork/Systems Engineer ch...@gol.com Global OnLine Japan/Fusion Communications http://www.gol.com/ ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
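For reference, the arithmetic behind the 5-year endurance estimate in point #2 above, as a quick back-of-the-envelope check:

# 2 MB/s of journal writes sustained for 5 years, expressed in TB (decimal)
echo $(( 2 * 60 * 60 * 24 * 365 * 5 / 1000000 )) TB   # prints: 315 TB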
[ceph-users] Aug Ceph Hackathon
Hey Cephers, So it looks like we have the list of approved attendees for the Ceph Hackathon in Hillsboro, OR that Intel is being kind enough to host. http://pad.ceph.com/p/hackathon_2015-08 If you are not on that list and would like to be, please contact me as soon as possible to see if we can get you added. Thanks! -- Best Regards, Patrick McGarry Director Ceph Community || Red Hat http://ceph.com || http://community.redhat.com @scuttlemonkey || @ceph ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] CDS Jewel Details Posted
Hey cephers, The schedule and videoconference details have been added to the CDS Jewel page. http://tracker.ceph.com/projects/ceph/wiki/CDS_Jewel If you see any problems with my timezone math, or have a scheduling conflict that won't allow you to attend your blueprint session, please let me know. We don't have a ton of options for moving things around, but we can try our best to at least get the blueprint owners to their own session. Shout if you have any questions. Thanks. -- Best Regards, Patrick McGarry Director Ceph Community || Red Hat http://ceph.com || http://community.redhat.com @scuttlemonkey || @ceph ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] cephfs unmounts itself from time to time
On 15 June 2015 at 13:09, Gregory Farnum g...@gregs42.com wrote: On Mon, Jun 15, 2015 at 4:03 AM, Roland Giesler rol...@giesler.za.net wrote:

I have a small cluster of 4 machines and quite a few drives. After about 2 - 3 weeks cephfs fails. It's not properly mounted anymore in /mnt/cephfs, which of course causes the VMs running on it to fail too. In /var/log/syslog I have "/mnt/cephfs: File exists at /usr/share/perl5/PVE/Storage/DirPlugin.pm line 52" repeatedly. There doesn't seem to be anything wrong with ceph at the time.

# ceph -s
cluster 40f26838-4760-4b10-a65c-b9c1cd671f2f
 health HEALTH_WARN clock skew detected on mon.s1
 monmap e2: 2 mons at {h1=192.168.121.30:6789/0,s1=192.168.121.33:6789/0}, election epoch 312, quorum 0,1 h1,s1
 mdsmap e401: 1/1/1 up {0=s3=up:active}, 1 up:standby
 osdmap e5577: 19 osds: 19 up, 19 in
 pgmap v11191838: 384 pgs, 3 pools, 774 GB data, 455 kobjects
 1636 GB used, 9713 GB / 11358 GB avail
 384 active+clean
 client io 12240 kB/s rd, 1524 B/s wr, 24 op/s

# ceph osd tree
# id  weight  type name     up/down  reweight
-1    11.13   root default
-2    8.14      host h1
1     0.9         osd.1     up       1
3     0.9         osd.3     up       1
4     0.9         osd.4     up       1
5     0.68        osd.5     up       1
6     0.68        osd.6     up       1
7     0.68        osd.7     up       1
8     0.68        osd.8     up       1
9     0.68        osd.9     up       1
10    0.68        osd.10    up       1
11    0.68        osd.11    up       1
12    0.68        osd.12    up       1
-3    0.45      host s3
2     0.45        osd.2     up       1
-4    0.9       host s2
13    0.9         osd.13    up       1
-5    1.64      host s1
14    0.29        osd.14    up       1
0     0.27        osd.0     up       1
15    0.27        osd.15    up       1
16    0.27        osd.16    up       1
17    0.27        osd.17    up       1
18    0.27        osd.18    up       1

When I umount -l /mnt/cephfs and then mount -a after that, the ceph volume is loaded again. I can restart the VMs and all seems well. I can't find errors pertaining to cephfs in the other logs either. System information: Linux s1 2.6.32-34-pve #1 SMP Fri Dec 19 07:42:04 CET 2014 x86_64 GNU/Linux

I'm not sure what version of Linux this really is (I assume it's a vendor kernel of some kind!), but it's definitely an old one! CephFS sees pretty continuous improvements to stability and it could be any number of resolved bugs.

This is the stock standard installation of Proxmox with CephFS.

If you can't upgrade the kernel, you might try out the ceph-fuse client instead, as you can run a much newer and more up-to-date version of it, even on the old kernel.

I'm under the impression that CephFS is the filesystem implemented by ceph-fuse. Is it not?

Other than that, can you include more information about exactly what you mean when saying CephFS unmounts itself?

Everything runs fine for weeks. Then suddenly a user reports that a VM is not functioning anymore. On investigation it transpires that CephFS is not mounted anymore and the error I reported is logged. I can't see anything else wrong at this stage. ceph is running, the OSDs are all up.

thanks again Roland -Greg ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
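Following up on Greg's suggestion: ceph-fuse is a userspace client for the same filesystem the kernel client mounts, so it can be swapped in without touching the old kernel. A minimal sketch, using the monitor address and mount point from the thread above:

# mount CephFS through the userspace client instead of the kernel mount
ceph-fuse -m 192.168.121.30:6789 /mnt/cephfs
# and to unmount it later
fusermount -u /mnt/cephfs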
[ceph-users] Unexpected disk write activity with btrfs OSDs
Hi, I've just noticed an odd behaviour with the btrfs OSDs. We monitor the amount of disk writes on each device, our granularity is 10s (every 10s the monitoring system collects the total amount of sector written and write io performed since boot and computes both the B/s and IO/s). With only residual write activity on our storage network (~450kB/s total for the whole Ceph cluster, which amounts to a theoretical ~120kB/s on each OSD once replication, double writes due to journal and number of OSD are factored in) : - Disks with btrfs OSD have a spike of activity every 30s (2 intervals of 10s with nearly 0 activity, one interval with a total amount of writes of ~120MB). The averages are : 4MB/s, 100 IO/s. - Disks with xfs OSD (with journal on a separate partition but same disk) don't have these spikes of activity and the averages are far lower : 160kB/s and 5 IO/s. This is not far off what is expected from the whole cluster write activity. There's a setting of 30s on our platform : filestore max sync interval I changed it to 60s with ceph tell osd.* injectargs '--filestore-max-sync-interval 60' and the amount of writes was lowered to ~2.5MB/s. I changed it to 5s (the default) with ceph tell osd.* injectargs '--filestore-max-sync-interval 5' the amount of writes to the device rose to an average of 10MB/s (and given our sampling interval of 10s appeared constant). During these tests the activity on disks hosting XFS OSDs didn't change much. So it seems filestore syncs generate far more activity on btrfs OSDs compared to XFS OSDs (journal activity included for both). Note that autodefrag is disabled on our btrfs OSDs. We use our own scheduler which in the case of our OSD limits the amount of defragmented data to ~10MB per minute in the worst case and usually (during low write activity which was the case here) triggers a single file defragmentation every 2 minutes (which amounts to a 4MB write as we only host RBDs with the default order value). So defragmentation shouldn't be an issue here. This doesn't seem to generate too much stress when filestore max sync interval is 30s (our btrfs OSDs are faster than xfs OSDs with the same amount of data according to apply latencies) but at 5s the btrfs OSDs are far slower than our xfs OSDs with 10x the average apply latency (we didn't let this continue more than 10 minutes as it began to make some VMs wait for IOs too much). Does anyone know if this is normal and why it is happening? Best regards, Lionel ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
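For anyone trying to reproduce this, the sync interval can also be inspected and changed per OSD through the admin socket instead of injectargs, which makes it easy to compare a single btrfs OSD against a single xfs OSD; a sketch, assuming default socket paths and that /dev/sdX stands for the OSD's data disk:

# check the currently active value on one OSD (run on the OSD host)
ceph daemon osd.0 config get filestore_max_sync_interval
# change it on that OSD only
ceph daemon osd.0 config set filestore_max_sync_interval 60
# watch the resulting write pattern on the underlying device
iostat -dmx 10 /dev/sdX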
Re: [ceph-users] Unexpected disk write activity with btrfs OSDs
I just realized I forgot to add a proper context : this is with Firefly 0.80.9 and the btrfs OSDs are running on kernel 4.0.5 (this was happening with previous kernel versions according to our monitoring history), xfs OSDs run on 4.0.5 or 3.18.9. There are 23 OSDs total and 2 of them are using btrfs. On 06/18/15 23:28, Lionel Bouton wrote: Hi, I've just noticed an odd behaviour with the btrfs OSDs. We monitor the amount of disk writes on each device, our granularity is 10s (every 10s the monitoring system collects the total amount of sector written and write io performed since boot and computes both the B/s and IO/s). With only residual write activity on our storage network (~450kB/s total for the whole Ceph cluster, which amounts to a theoretical ~120kB/s on each OSD once replication, double writes due to journal and number of OSD are factored in) : - Disks with btrfs OSD have a spike of activity every 30s (2 intervals of 10s with nearly 0 activity, one interval with a total amount of writes of ~120MB). The averages are : 4MB/s, 100 IO/s. - Disks with xfs OSD (with journal on a separate partition but same disk) don't have these spikes of activity and the averages are far lower : 160kB/s and 5 IO/s. This is not far off what is expected from the whole cluster write activity. There's a setting of 30s on our platform : filestore max sync interval I changed it to 60s with ceph tell osd.* injectargs '--filestore-max-sync-interval 60' and the amount of writes was lowered to ~2.5MB/s. I changed it to 5s (the default) with ceph tell osd.* injectargs '--filestore-max-sync-interval 5' the amount of writes to the device rose to an average of 10MB/s (and given our sampling interval of 10s appeared constant). During these tests the activity on disks hosting XFS OSDs didn't change much. So it seems filestore syncs generate far more activity on btrfs OSDs compared to XFS OSDs (journal activity included for both). Note that autodefrag is disabled on our btrfs OSDs. We use our own scheduler which in the case of our OSD limits the amount of defragmented data to ~10MB per minute in the worst case and usually (during low write activity which was the case here) triggers a single file defragmentation every 2 minutes (which amounts to a 4MB write as we only host RBDs with the default order value). So defragmentation shouldn't be an issue here. This doesn't seem to generate too much stress when filestore max sync interval is 30s (our btrfs OSDs are faster than xfs OSDs with the same amount of data according to apply latencies) but at 5s the btrfs OSDs are far slower than our xfs OSDs with 10x the average apply latency (we didn't let this continue more than 10 minutes as it began to make some VMs wait for IOs too much). Does anyone know if this is normal and why it is happening? Best regards, Lionel ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Very chatty MON logs: Is this normal?
On 06/17/2015 08:30 PM, Somnath Roy wrote: However, I'd rather not set the level to 0/0, as that would disable all logging from the MONs I don't think so. All the error scenarios and stack trace (in case of crash) are supposed to be logged with log level 0. But, generally, we need the highest log level (say 20) to get all the information when something to debug. So, I doubt how beneficial it will be to enable logging for some intermediate levels. Probably, there is no guideline for these log level too which developer should follow strictly. I don't think this is documented anywhere, but for a while now we've been using roughly this approach to debug levels: -1 - errors. 0 - info you really want in the log each time it happens. 1 - info that should be outputted by default should be stuff that don't happen often and is quite important to get to the logs when it happens. 5 - important but happens a bit often not to output as 1 10 - gross majority of debug messages in the monitor 20 - debug that could impact monitor performance severely (e.g., debug from inside a loop) 30 - debug that you should not need unless you're really looking for it It is fairly common a developer will ask you for 'debug mon = 10' in order to catch all debug messages at levels 5 and 10, because those are the ones that usually pay off when tracking down issues. But given this is left pretty much to the developer's criteria, different services may use different levels of verbosity for different things, and you may need a higher debug level to get info out of some parts of the code than others. In this particular case, the message that is being outputted should, imo, be on debug level 5 instead of 1. We used to output a lot of stuff on debug level 1, but have been moving away from that; there are still artifacts though, and this is one of them. Setting 'mon debug = 0/5' should be okay. Unless you see that setting '/5' impacts your performance and/or memory consumption, you should leave that be. '0/5' means 'output only debug 0 or lower to the logs; keep the last 1000 debug level 5 or lower in memory in case of a crash'. Your logs will not be as heavily populated but, if for some reason the daemon crashes, you get quite a few of debug information to help track down the source of the problem. HTH, -Joao Thanks Regards Somnath -Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Daniel Schneller Sent: Wednesday, June 17, 2015 12:11 PM To: ceph-users@lists.ceph.com Subject: Re: [ceph-users] Very chatty MON logs: Is this normal? On 2015-06-17 18:52:51 +, Somnath Roy said: This is presently written from log level 1 onwards :-) So, only log level 0 will not log this.. Try, 'debug_mon = 0/0' in the conf file.. Yeah, once I had sent the mail I realized that 1 in the log line was the level. Had overlooked that before. However, I'd rather not set the level to 0/0, as that would disable all logging from the MONs. Now, I don't have enough knowledge on that part to say whether it is important enough to log at log level 1 , sorry :-( That would indeed be an interesting to know. Judging from the sheer amount, at least I have my doubts, because the cluster seems to be running without any issues. So I figure at least it isn't indicative of an immediate issue. Anyone with a little more definitve knowledge around? Should I create a bug ticket for this? 
Cheers, Daniel ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies). ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
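As a concrete example of the advice above, the monitor debug level can either be set in ceph.conf or injected into running monitors; a minimal sketch (on some releases the runtime change has to go through each monitor's admin socket instead):

# ceph.conf, [mon] section:
#   debug mon = 0/5
# or at runtime, without restarting the monitors:
ceph tell mon.* injectargs '--debug-mon 0/5'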
Re: [ceph-users] Interesting postmortem on SSDs from Algolia
Oh that's very good to know. Are there details posted anywhere? Mark On 06/18/2015 02:46 AM, Dan van der Ster wrote: Thanks, that's a nice article. We're pretty happy with the SSDs he lists as Good, but note that they're not totally immune to these type of issues -- indeed we've found that bcache can crash a DC S3700, and Intel confirmed it was a firmware bug. Cheers, Dan On Wed, Jun 17, 2015 at 8:36 PM, Steve Anthony sma...@lehigh.edu wrote: There's often a great deal of discussion about which SSDs to use for journals, and why some of the cheaper SSDs end up being more expensive in the long run. The recent blog post at Algoria, though not Ceph specific, provides a good illustration of exactly how insidious kernel/SSD interactions can be. Thought the list might find it interesting. https://blog.algolia.com/when-solid-state-drives-are-not-that-solid/ -Steve -- Steve Anthony LTS HPC Support Specialist Lehigh University sma...@lehigh.edu ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] rbd performance issue - can't find bottleneck
On 06/18/2015 04:49 AM, Jacek Jarosiewicz wrote: On 06/17/2015 04:19 PM, Mark Nelson wrote: SSD's are INTEL SSDSC2BW240A4 Ah, if I'm not mistaken that's the Intel 530 right? You'll want to see this thread by Stefan Priebe: https://www.mail-archive.com/ceph-users@lists.ceph.com/msg05667.html In fact it was the difference in Intel 520 and Intel 530 performance that triggered many of the different investigations that have taken place by various folks into SSD flushing behavior on ATA_CMD_FLUSH. The gist of it is that the 520 is very fast but probably not safe. The 530 is safe but not fast. The DC S3700 (and similar drives with super capacitors) are thought to be both fast and safe (though some drives like the crucual M500 and later misrepresented their power loss protection so you have to be very careful!) Yes, these are Intel 530. I did the tests described in the thread You pasted and unfortunately that's my case... I think. The dd run locally on a mounted ssd partition looks like this: [root@cf02 journal]# dd if=/dev/zero of=test bs=350k count=1 oflag=direct,dsync 1+0 records in 1+0 records out 358400 bytes (3.6 GB) copied, 211.698 s, 16.9 MB/s and when I skip the flag dsync it goes fast: [root@cf02 journal]# dd if=/dev/zero of=test bs=350k count=1 oflag=direct 1+0 records in 1+0 records out 358400 bytes (3.6 GB) copied, 9.05432 s, 396 MB/s (I used the same 350k block size as mentioned in the e-mail from the thread above) I tried disabling the dsync like this: [root@cf02 ~]# echo temporary write through /sys/class/scsi_disk/1\:0\:0\:0/cache_type [root@cf02 ~]# cat /sys/class/scsi_disk/1\:0\:0\:0/cache_type write through ..and then locally I see the speedup: [root@cf02 journal]# dd if=/dev/zero of=test bs=350k count=1 oflag=direct,dsync 1+0 records in 1+0 records out 358400 bytes (3.6 GB) copied, 10.4624 s, 343 MB/s ..but when I test it from a client I still get slow results: root@cf03:/ceph/tmp# dd if=/dev/zero of=test bs=100M count=100 oflag=direct 100+0 records in 100+0 records out 1048576 bytes (10 GB) copied, 122.482 s, 85.6 MB/s and fio gives the same 2-3k iops. after the change to SSD cache_type I tried remounting the test image, recreating it and so on - nothing helped. 
I ran rbd bench-write on it, and it's not good either: root@cf03:~# rbd bench-write t2 bench-write io_size 4096 io_threads 16 bytes 1073741824 pattern seq SEC OPS OPS/SEC BYTES/SEC 1 4221 4220.64 32195919.35 2 9628 4813.95 36286083.00 3 15288 4790.90 35714620.49 4 19610 4902.47 36626193.93 5 24844 4968.37 37296562.14 6 30488 5081.31 38112444.88 7 36152 5164.54 38601615.10 8 41479 5184.80 38860207.38 9 46971 5218.70 39181437.52 10 52219 5221.77 39322641.34 11 5 5151.36 38761566.30 12 62073 5172.71 38855021.35 13 65962 5073.95 38182880.49 14 71541 5110.02 38431536.17 15 77039 5135.85 38615125.42 16 82133 5133.31 38692578.98 17 87657 5156.24 38849948.84 18 92943 5141.03 38635464.85 19 97528 5133.03 38628548.32 20103100 5154.99 38751359.30 21108952 5188.09 38944016.94 22114511 5205.01 38999594.18 23120319 5231.17 39138227.64 24125975 5248.92 39195739.46 25131438 5257.50 39259023.06 26136883 5264.72 39344673.41 27142362 5272.66 39381638.20 elapsed:27 ops: 143789 ops/sec: 5273.01 bytes/sec: 39376124.30 rados bench gives: root@cf03:~# rados -p rbd bench 30 write --no-cleanup Maintaining 16 concurrent writes of 4194304 bytes for up to 30 seconds or 0 objects Object prefix: benchmark_data_cf03_21194 sec Cur ops started finished avg MB/s cur MB/s last lat avg lat 0 0 0 0 0 0 - 0 1 162812 47.986348 0.779211 0.48964 2 164327 53.988660 1.17958 0.775733 3 16594357.32264 0.157145 0.798348 4 167357 56.989756 0.424493 0.862553 5 168973 58.3964 0.246444 0.893064 6 16 10488 58.656960 1.67389 0.901757 7 16 120 104 59.418664 1.78324 0.935242 8 16 132 116 57.990548 1.50035 0.963947 9 16 147 131 58.212860 1.85047 0.978697 10 16 161 145 57.990856 0.133187 0.99 11 16 174 158 57.445552 1.59548 1.02264 12 16 189 173 57.657760 0.179966 1.01623 13 16 206 190 58.452668 1.93064 1.02108 14
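Since the open question in this thread is about small-block IOPS rather than bandwidth, it may also be worth running rados bench with a 4 KB object size so the numbers are comparable with the fio results; a sketch against the same pool used above:

# 4 KB writes, 16 concurrent ops, keep the objects so a random-read pass can follow
rados -p rbd bench 30 write -b 4096 -t 16 --no-cleanup
rados -p rbd bench 30 rand -t 16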
Re: [ceph-users] Interesting postmortem on SSDs from Algolia
Thanks, that's a nice article. We're pretty happy with the SSDs he lists as Good, but note that they're not totally immune to these type of issues -- indeed we've found that bcache can crash a DC S3700, and Intel confirmed it was a firmware bug. Cheers, Dan On Wed, Jun 17, 2015 at 8:36 PM, Steve Anthony sma...@lehigh.edu wrote: There's often a great deal of discussion about which SSDs to use for journals, and why some of the cheaper SSDs end up being more expensive in the long run. The recent blog post at Algoria, though not Ceph specific, provides a good illustration of exactly how insidious kernel/SSD interactions can be. Thought the list might find it interesting. https://blog.algolia.com/when-solid-state-drives-are-not-that-solid/ -Steve -- Steve Anthony LTS HPC Support Specialist Lehigh University sma...@lehigh.edu ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] OSD Journal creation ?
All - I am building my first ceph cluster, and doing it the hard way, manually without the aid of ceph-deploy. I have successfully built the mon cluster and am now adding OSDs. My main question: How do I prepare the Journal prior to the prepare/activate stages of the OSD creation? More details: Basically - all of the documentation seems to assume the journal is prepared. Do I simply create a single raw partition on a physical device and the ceph-disk prepare... and ceph-disk activate... steps will take care of everything for the journal ... presumably based on the ceph-disk prepare ... --type filesystem setting? Or do I need to actually format it as a filesystem prior to giving it over to the Ceph OSD ??? The architecture I'm thinking of is as follows - based on the hardware I have for OSDs (currenly 9 servers each with): RAID 0 mirror for OS hard drives (2 disks) data disk for journal placement for 5 physical disks (4TB) data disk for journal placement for 5 physical disks (4TB) 10 data disks as OSDs (one OSD per disk) (4TB each) Essentially - there are 12 data disks in the node (all 4 TB 7200 rpm spinning disks). Splitting the Journal across two of them gives me a failure domain of 5 data disks + 1 journal disk in a single physical server for crush map purposes ... It also vaguely helps spread the I/O workload for the journaling activity across 2 physical disks in a chassis instead of a one (since the journal disk is pretty darn slow). In this configuration I'd create 5 separate partitions on Journal Disk A and 5 on Journal Disk B ... but do they need to be formatted and mounted? Yes, we know as we go to more real production workloads, we'll want/need to change this for performance reasons - eg the Journal on SSDs ... Any pointers on where I missed this info in the documentation would be helpful too ... I've been all over the ceph.com/docs/ site and haven't found it yet... Thanks, ~~shane ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Hardware cache settings recomendation
Those are strange numbers, where are you getting them from? Test the drives directly with fio with every combination, that’s should tell you what’s happening Jan On 18 Jun 2015, at 07:52, Mateusz Skała mateusz.sk...@budikom.net wrote: Thanks for answer, I made some test, first leave dwc=enabled and caching on journal drive disabled. Latency grows from 20ms to 90ms on this drive. Next I enabled cache on journal drive and disabled all cache on data drives. Latency on data drives grows from 30 – 50ms to 1500 – 2000ms. Test made only on one osd host with P410i controller, with SATA drives ST1000LM014-1EJ1 for data and for journal SSD INTEL SSDSC2BW12. Regards, Mateusz From: Jan Schermer [mailto:j...@schermer.cz] Sent: Wednesday, June 17, 2015 9:41 AM To: Mateusz Skała Cc: ceph-users@lists.ceph.com Subject: Re: [ceph-users] Hardware cache settings recomendation Cache on top of the data drives (not journal) will not help in most cases, those writes are already buffered in the OS - so unless your OS is very light on memory and flushing constantly it will have no effect, it just adds overhead in case a flush comes. I haven’t tested this extensively with Ceph, though. Cache enabled on journal drive _could_ help if your SSD is very slow (or if you don’t have SSD for journal at all), and if it is large enough (more than the active journal size) it could prolong the life of your SSD - depending on how and when the cache starts to flush. I know from experience that write cache on Areca controller didn't flush at all until it hit a watermark (50% capacity default or something) and it will be faster than some SSDon their own. Some SSD have higher IOPS than the cache can achieve, but you likely won’t saturate that with Ceph. Another thing is write cache on the drives themselves - I’d leave that on disabled (which is probably the default) unless the drive in question has capacitors to flush the cache in case of power failure. Controllers usually have a whitelist of devices that respect flushes on which the write cache is default=enabled, but in case of for example Dell Perc you would need to have Dell original drives or enable it manually. YMMV - i’ve hit the controller cache IOPS limit in the past with cheap Dell Perc (H310 was it?) that did ~20K IOPS top on one SSD drive, while the drive itself did close to 40K. On my SSDs, disabling write cache helps latency (good for journal) bud could be troubling for the SSD lifetime. In any case I don’t think you would saturate either with Ceph, so I recommend you just test the latency with write cache enabled/disabled on the controller and pick the one that gives the best numbers this is basically how: http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/ Ceph recommended way is to use everything as passthrough (initiator/target mode) or JBOD (RAID0 with single drives on some controllers), so I’d stick with that. Jan On 17 Jun 2015, at 08:01, Mateusz Skała mateusz.sk...@budikom.net wrote: Yes, all disk are in single drive raid 0. Now cache is enabled for all drives, should I disable cache for SSD drives? Regards, Mateusz From: Tyler Bishop [mailto:tyler.bis...@beyondhosting.net] Sent: Thursday, June 11, 2015 7:30 PM To: Mateusz Skała Cc: ceph-users@lists.ceph.com Subject: Re: [ceph-users] Hardware cache settings recomendation You want write cache to disk, no write cache for SSD. I assume all of your data disk are single drive raid 0? 
Tyler Bishop Chief Executive Officer 513-299-7108 x10 tyler.bis...@beyondhosting.net If you are not the intended recipient of this transmission you are notified that disclosing, copying, distributing or taking any action in reliance on the contents of this information is strictly prohibited. From: Mateusz Skała mateusz.sk...@budikom.net To: ceph-users@lists.ceph.com Sent: Saturday, June 6, 2015 4:09:59 AM Subject: [ceph-users] Hardware cache settings recommendation Hi, Please help me with the hardware cache settings on our controllers for the best Ceph RBD performance. All Ceph hosts have one SSD drive for the journal. We are using 4 different controllers, all with BBU: • HP Smart Array P400 • HP Smart Array P410i • Dell PERC 6/i • Dell PERC H700 I have to set the cache policy; on the Dell controllers the settings are: • Read Policy o Read-Ahead (current) o No-Read-Ahead o Adaptive Read-Ahead • Write Policy o Write-Back (current) o Write-Through • Cache Policy o Cache I/O o Direct I/O (current) • Disk Cache Policy o Default (current) o Enabled o Disabled On HP controllers: • Cache Ratio (current: 25% Read / 75% Write) • Drive Write Cache o Enabled (current) o Disabled And there is one more setting
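Something like the following is what I mean by testing the drives directly with fio - a sketch only, assuming the journal SSD is /dev/sdX and holds nothing you need (it writes to the raw device), run once per cache combination:

# sequential 4k sync writes, roughly the pattern the filestore journal sends to the device
fio --name=journal-test --filename=/dev/sdX --direct=1 --sync=1 \
    --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based \
    --group_reporting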
Re: [ceph-users] Accessing Ceph from Spark
Hi Yuan, Thanks for the answer. Our main use case is to replace AWS S3 with object storage in a private cloud, very preferably with an S3-compatible API. But we also know that we want to perform some machine learning and data processing with Spark in the not-so-far future on the data residing in the object storage. The data locality feature would be very nice to have, but I was not aware that this is possible with Ceph or Swift. We do not want to use HDFS, mainly because of the cost brought in by the 3x replication factor, and we also plan to store a lot of smaller files. Best regards, Milan From: dunk...@gmail.com Date: Wed, 17 Jun 2015 23:41:48 +0800 Subject: Re: [ceph-users] Accessing Ceph from Spark To: milan.sla...@outlook.com CC: ceph-users@lists.ceph.com Hi Milan, We've done some tests here and our Hadoop can talk to RGW successfully with the SwiftFS plugin, but we haven't tried Spark yet. One thing is the data locality feature: it actually requires some special configuration of the Swift proxy-server, so RGW is not able to achieve data locality there. Could you please kindly share some deployment considerations of running Spark on Swift/Ceph? Tachyon seems more promising... Sincerely, Yuan On Wed, Jun 17, 2015 at 9:58 PM, Milan Sladky milan.sla...@outlook.com wrote: Is it possible to access Ceph from Spark as it is mentioned here for OpenStack Swift? https://spark.apache.org/docs/latest/storage-openstack-swift.html Thanks for help. Milan Sladky
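For what it's worth, the rough setup I had in mind for the S3-compatible path is to point Spark at RGW through Hadoop's S3A connector - only a sketch, untested against RGW, with a made-up endpoint, bucket and credentials, and assuming a hadoop-aws jar matching your Hadoop build is available:

# start spark-shell with the S3A connector aimed at the RGW endpoint
spark-shell \
  --packages org.apache.hadoop:hadoop-aws:2.7.3 \
  --conf spark.hadoop.fs.s3a.endpoint=http://rgw.example.com:7480 \
  --conf spark.hadoop.fs.s3a.access.key=MY_ACCESS_KEY \
  --conf spark.hadoop.fs.s3a.secret.key=MY_SECRET_KEY

# then inside the shell something like:
#   val rdd = sc.textFile("s3a://my-bucket/some/prefix/*")
#   rdd.count()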
Re: [ceph-users] rbd performance issue - can't find bottleneck
On 06/17/2015 04:19 PM, Mark Nelson wrote: SSD's are INTEL SSDSC2BW240A4 Ah, if I'm not mistaken that's the Intel 530, right? You'll want to see this thread by Stefan Priebe: https://www.mail-archive.com/ceph-users@lists.ceph.com/msg05667.html In fact it was the difference in Intel 520 and Intel 530 performance that triggered many of the different investigations that have taken place by various folks into SSD flushing behavior on ATA_CMD_FLUSH. The gist of it is that the 520 is very fast but probably not safe. The 530 is safe but not fast. The DC S3700 (and similar drives with super capacitors) are thought to be both fast and safe (though some drives like the Crucial M500 and later misrepresented their power loss protection, so you have to be very careful!) Yes, these are Intel 530. I did the tests described in the thread you pasted and unfortunately that's my case... I think. The dd run locally on a mounted ssd partition looks like this: [root@cf02 journal]# dd if=/dev/zero of=test bs=350k count=10000 oflag=direct,dsync 10000+0 records in 10000+0 records out 3584000000 bytes (3.6 GB) copied, 211.698 s, 16.9 MB/s and when I skip the dsync flag it goes fast: [root@cf02 journal]# dd if=/dev/zero of=test bs=350k count=10000 oflag=direct 10000+0 records in 10000+0 records out 3584000000 bytes (3.6 GB) copied, 9.05432 s, 396 MB/s (I used the same 350k block size as mentioned in the e-mail from the thread above) I tried disabling the dsync like this: [root@cf02 ~]# echo "temporary write through" > /sys/class/scsi_disk/1\:0\:0\:0/cache_type [root@cf02 ~]# cat /sys/class/scsi_disk/1\:0\:0\:0/cache_type write through ..and then locally I see the speedup: [root@cf02 journal]# dd if=/dev/zero of=test bs=350k count=10000 oflag=direct,dsync 10000+0 records in 10000+0 records out 3584000000 bytes (3.6 GB) copied, 10.4624 s, 343 MB/s ..but when I test it from a client I still get slow results: root@cf03:/ceph/tmp# dd if=/dev/zero of=test bs=100M count=100 oflag=direct 100+0 records in 100+0 records out 10485760000 bytes (10 GB) copied, 122.482 s, 85.6 MB/s and fio gives the same 2-3k iops. After the change to the SSD cache_type I tried remounting the test image, recreating it and so on - nothing helped.
I ran rbd bench-write on it, and it's not good either: root@cf03:~# rbd bench-write t2 bench-write io_size 4096 io_threads 16 bytes 1073741824 pattern seq SEC OPS OPS/SEC BYTES/SEC 1 4221 4220.64 32195919.35 2 9628 4813.95 36286083.00 3 15288 4790.90 35714620.49 4 19610 4902.47 36626193.93 5 24844 4968.37 37296562.14 6 30488 5081.31 38112444.88 7 36152 5164.54 38601615.10 8 41479 5184.80 38860207.38 9 46971 5218.70 39181437.52 10 52219 5221.77 39322641.34 11 56665 5151.36 38761566.30 12 62073 5172.71 38855021.35 13 65962 5073.95 38182880.49 14 71541 5110.02 38431536.17 15 77039 5135.85 38615125.42 16 82133 5133.31 38692578.98 17 87657 5156.24 38849948.84 18 92943 5141.03 38635464.85 19 97528 5133.03 38628548.32 20 103100 5154.99 38751359.30 21 108952 5188.09 38944016.94 22 114511 5205.01 38999594.18 23 120319 5231.17 39138227.64 24 125975 5248.92 39195739.46 25 131438 5257.50 39259023.06 26 136883 5264.72 39344673.41 27 142362 5272.66 39381638.20 elapsed: 27 ops: 143789 ops/sec: 5273.01 bytes/sec: 39376124.30 rados bench gives: root@cf03:~# rados -p rbd bench 30 write --no-cleanup Maintaining 16 concurrent writes of 4194304 bytes for up to 30 seconds or 0 objects Object prefix: benchmark_data_cf03_21194 sec Cur ops started finished avg MB/s cur MB/s last lat avg lat 0 0 0 0 0 0 - 0 1 16 28 12 47.9863 48 0.779211 0.48964 2 16 43 27 53.9886 60 1.17958 0.775733 3 16 59 43 57.322 64 0.157145 0.798348 4 16 73 57 56.9897 56 0.424493 0.862553 5 16 89 73 58.39 64 0.246444 0.893064 6 16 104 88 58.6569 60 1.67389 0.901757 7 16 120 104 59.4186 64 1.78324 0.935242 8 16 132 116 57.9905 48 1.50035 0.963947 9 16 147 131 58.2128 60 1.85047 0.978697 10 16 161 145 57.9908 56 0.133187 0.99 11 16 174 158 57.4455 52 1.59548 1.02264 12 16 189 173 57.6577 60 0.179966 1.01623 13 16 206 190 58.4526 68 1.93064 1.02108 14 16 221 205 58.5624 60 1.54504 1.02566 15
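For reference, the kind of fio test I could run directly against the image at a higher queue depth would look roughly like this - only a sketch, assuming fio is built with rbd support, and using the pool and image names from above:

# 4k random writes straight against the RBD image, no filesystem in the way;
# raising iodepth shows how much of the limit is per-request latency vs. the cluster
fio --name=rbd-test --ioengine=rbd --clientname=admin --pool=rbd --rbdname=t2 \
    --rw=randwrite --bs=4k --iodepth=32 --numjobs=1 --direct=1 --runtime=60 --time_based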
Re: [ceph-users] rbd performance issue - can't find bottleneck
Hi, for the read benchmark with fio, what is the iodepth? My fio 4k randread results: iodepth=1: bw=6795.1KB/s, iops=1698; iodepth=2: bw=14608KB/s, iops=3652; iodepth=4: bw=32686KB/s, iops=8171; iodepth=8: bw=76175KB/s, iops=19043; iodepth=16: bw=173651KB/s, iops=43412; iodepth=32: bw=336719KB/s, iops=84179. (This should be similar to the rados bench -t (threads) option.) This is normal because of network latencies + Ceph latencies; doing more in parallel increases iops (a bench with dd is effectively iodepth=1). These results are with 1 client/rbd volume. With more fio clients (numjobs=X) I can reach up to 300k iops with 8-10 clients. This should be the same as launching multiple rados bench instances in parallel (BTW, it would be great to have an option in rados bench to do it - a small loop to approximate this is sketched below). ----- Original Message ----- From: Jacek Jarosiewicz jjarosiew...@supermedia.pl To: Mark Nelson mnel...@redhat.com, ceph-users ceph-users@lists.ceph.com Sent: Thursday, June 18, 2015 11:49:11 Subject: Re: [ceph-users] rbd performance issue - can't find bottleneck
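Something like the small loop below could approximate launching multiple rados bench instances in parallel from one shell - just a sketch, with the pool name and run-name prefix as examples:

# start 4 concurrent rados bench writers, each with its own run name, then wait for all of them
for i in 1 2 3 4; do
  rados -p rbd bench 30 write --run-name parallel-$i --no-cleanup &
done
wait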