Re: [ceph-users] New Ceph-cluster and performance "questions"

2018-02-08 Thread Christian Balzer

Hello,

On Thu, 8 Feb 2018 10:58:43 + Patrik Martinsson wrote:

> Hi Christian, 
> 
> First of all, thanks for all the great answers and sorry for the late
> reply. 
> 
You're welcome.

> 
> On Tue, 2018-02-06 at 10:47 +0900, Christian Balzer wrote:
> > Hello,
> >   
> > > I'm not a "storage-guy" so please excuse me if I'm missing /
> > > overlooking something obvious. 
> > > 
> > > My question is in the area "what kind of performance am I to expect
> > > with this setup". We have bought servers, disks and networking for
> > > our
> > > future ceph-cluster and are now in our "testing-phase" and I simply
> > > want to understand if our numbers line up, or if we are missing
> > > something obvious. 
> > >   
> > 
> > A myriad of variables will make for a myriad of results, expected and
> > otherwise.
> > 
> > For example, you say nothing about the Ceph version, how the OSDs are
> > created (filestore, bluestore, details), OS and kernel (PTI!!)
> > version.  
> 
> Good catch, I totally forgot this. 
> 
> $ > ceph version 12.2.1-40.el7cp
> (c6d85fd953226c9e8168c9abe81f499d66cc2716) luminous (stable), deployed
> via Red Hat Ceph Storage 3 (ceph-ansible). Bluestore is enabled, and
> osd_scenario is set to collocated.
> 
Given the (rather disconcerting) number of bugs in Luminous, you probably
want to go to 12.2.2 now and .3 when released.
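A quick way to confirm what every daemon is actually running after such an
upgrade (both commands exist in Luminous):

$ > ceph versions            # per-daemon-type version summary
$ > ceph tell osd.* version  # or ask the OSDs individually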

> $ > cat /etc/redhat-release 
> Red Hat Enterprise Linux Server release 7.4 (Maipo)
> 
> $ > uname -r 
> 3.10.0-693.11.6.el7.x86_64 (PTI *not* disabled at boot)
> 
That's what I'd call an old kernel, if it weren't for the (insane level of)
RH backporting. 

As for PTI, I'd disable it on pure Ceph nodes, the logic being that if
somebody can access those in the first place you have bigger problems
already.

Make sure to run a test/benchmark before and after and let the community
here know.
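A minimal sketch of that before/after comparison on a RHEL 7 node; the nopti
boot flag and the pti_enabled debugfs knob come from Red Hat's Meltdown
guidance, so double-check them against your exact kernel, and the pool name is
just the one used later in this thread:

$ > rados bench -p ssd-bench 60 write --no-cleanup   # baseline with PTI on
$ > rados bench -p ssd-bench 60 rand
$ > grubby --update-kernel=ALL --args="nopti"        # disable PTI, then reboot
$ > reboot
$ > cat /sys/kernel/debug/x86/pti_enabled            # 0 = PTI off (RHEL-specific knob)

Then re-run the same rados bench and compare.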

> 
> 
> > > Background, 
> > > - cephmon1, DELL R730, 1 x E5-2643, 64 GB 
> > > - cephosd1-6, DELL R730, 1 x E5-2697, 64 GB  
> > 
> > Unless you're planning on having 16 SSDs per node, a CPU with less
> > and
> > faster cores would be better (see archives). 
> > 
> > In general, you will want to run atop or something similar on your
> > ceph
> > and client nodes during these tests to see where and if any resources
> > (CPU, DISK, NET) are getting stressed.  
> 
> Understood, thanks!
> 
> 
> 
> > > - each server is connected to a dedicated 50 Gbe network, with
> > > Mellanox-4 Lx cards (teamed into one interface, team0).  
> > > 
> > > In our test we only have one monitor. This will of course not be
> > > the
> > > case later on. 
> > > 
> > > Each OSD, has the following SSD's configured as pass-through (not
> > > raid
> > > 0 through the raid-controller),
> > > 
> > > - 2 x Dell 1.6TB 2.5" SATA MLC MU 6Gbs SSD (THNSF81D60CSE), only
> > > spec I
> > > can find on Dell's homepage says "Data Transfer Rate 600 Mbps"
> > > - 4 x Intel SSD DC S3700 (https://ark.intel.com/products/71916/Inte
> > > l-SS
> > > D-DC-S3700-Series-800GB-2_5in-SATA-6Gbs-25nm-MLC)  
> > 
> > When and where did you get those?
> > I wonder if they're available again, had 0 luck getting any last
> > year.  
> 
> It's actually disks that we have had "lying around", no clue where you
> could get them today. 
> 
Consider yourself lucky.

> 
> 
> > > - 3 HDD's, which is uninteresting here. At the moment I'm only
> > > interested in the performance of the SSD-pool.
> > > 
> > > Ceph-cluster is created with ceph-ansible with "default params"
> > > (ie.
> > > have not added / changed anything except the necessary). 
> > > 
> > > When ceph-cluster is up, we have 54 OSD's (36 SSD, 18HDD). 
> > > The min_size is 3 on the pool.   
> > 
> > Any reason for that?
> > It will make any OSD failure result in a cluster lockup with a size
> > of 3.
> > Unless you did set your size to 4, in which case you wrecked
> > performance.  
> 
> Hm, sorry, what I meant was size=3. Reading the documentation, I'm not
> sure I understand the difference between size and min_size. 
>
Check the archives for this, lots of pertinent and moderately recent
discussions about this. 3 and 2 (defaults) are fine for most people.
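In short: size is how many replicas Ceph keeps, min_size is how many of those
must be available before a PG accepts I/O. A minimal sketch for checking and
setting them on the benchmark pool from this thread:

$ > ceph osd pool get ssd-bench size
$ > ceph osd pool get ssd-bench min_size
$ > ceph osd pool set ssd-bench size 3      # keep 3 copies
$ > ceph osd pool set ssd-bench min_size 2  # keep serving I/O with 2 copies left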
 
> 
> 
> 
> > > Rules are created as follows, 
> > > 
> > > $ > ceph osd crush rule create-replicated ssd-rule default host ssd
> > > $ > ceph osd crush rule create-replicated hdd-rule default host hdd
> > > 
> > > Testing is done on a separate node (same nic and network though), 
> > > 
> > > $ > ceph osd pool create ssd-bench 512 512 replicated ssd-rule
> > > 
> > > $ > ceph osd pool application enable ssd-bench rbd
> > > 
> > > $ > rbd create ssd-image --size 1T --pool ssd-pool
> > > 
> > > $ > rbd map ssd-image --pool ssd-bench
> > > 
> > > $ > mkfs.xfs /dev/rbd/ssd-bench/ssd-image
> > > 
> > > $ > mount /dev/rbd/ssd-bench/ssd-image /ssd-bench
> > >   
> > 
> > Unless you're planning on using the Ceph cluster in this fashion
> > (kernel
> > mounted images), you'd be better off testing in an environment that
> > matches the use case, i.e. from a VM.

[ceph-users] degraded PGs when adding OSDs

2018-02-08 Thread Simon Ironside

Hi Everyone,

I recently added an OSD to an active+clean Jewel (10.2.3) cluster and 
was surprised to see a peak of 23% objects degraded. Surely this should 
be at or near zero and the objects should show as misplaced?


I've searched and found Chad William Seys' thread from 2015 but didn't 
see any conclusion that explains this:

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-July/003355.html

Thanks,
Simon.



[ceph-users] Question about Erasure-coding clusters and resiliency

2018-02-08 Thread Tim Gipson
Hey all,

We are trying to get an erasure coding cluster up and running but we are having 
a problem getting the cluster to remain up if we lose an OSD host.  

Currently we have 6 OSD hosts with 6 OSDs apiece. I'm trying to build an EC 
profile and a crush rule that will allow the cluster to continue running if we 
lose a host, but I seem to misunderstand how the configuration of an EC 
pool/cluster is supposed to be implemented.  I would like to be able to set 
this up to allow for 2 host failures before data loss occurs.

Here is my crush rule:

{
    "rule_id": 2,
    "rule_name": "EC_ENA",
    "ruleset": 2,
    "type": 3,
    "min_size": 6,
    "max_size": 8,
    "steps": [
        {
            "op": "take",
            "item": -1,
            "item_name": "default"
        },
        {
            "op": "choose_indep",
            "num": 4,
            "type": "host"
        },
        {
            "op": "choose_indep",
            "num": 2,
            "type": "osd"
        },
        {
            "op": "emit"
        }
    ]
}

Here is my EC profile:

crush-device-class=
crush-failure-domain=host
crush-root=default
jerasure-per-chunk-alignment=false
k=6
m=2
plugin=jerasure
technique=reed_sol_van
w=8
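
For comparison only, a sketch of a profile/pool pair where k+m is no larger
than the number of hosts, so CRUSH can place one chunk per host and m=2 gives
two lost hosts before data loss (the names, pg count and the k=4/m=2 split are
illustrative, not something from this thread):

ceph osd erasure-code-profile set ec-4-2 k=4 m=2 \
    crush-failure-domain=host crush-root=default
ceph osd erasure-code-profile get ec-4-2
ceph osd pool create ecpool 256 256 erasure ec-4-2

Whether I/O continues with hosts down also depends on the pool's min_size
(check it with "ceph osd pool get ecpool min_size").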

Any direction or help would be greatly appreciated.

Thanks,

Tim Gipson
Systems Engineer



[ceph-users] How does cache tier work in writeback mode?

2018-02-08 Thread shadow_lin
Hi list,
I am testing cache tier in writeback mode.
The test result is confusing. The write performance is worse than without a 
cache tier.

The hot storage pool is an all-ssd pool and the cold storage pool is an all-hdd 
pool. I also created an hdd-pool and an ssd-pool with the same crush rules as the 
cache tier pools, for comparison.
The pool config:
tier pool    OSDs   cap.(TB)   pg         pool       OSDs   cap.(TB)   pg
hot-pool       20        4.8   1024       ssd-pool     20        4.8   1024
cold-pool     140       1400   2048       hdd-pool    140       1400   2048


The cache tier config:
# ceph osd tier add cold-pool hot-pool
pool 'hot-pool' is now (or already was) a tier of 'cold-pool'
#
# ceph osd tier cache-mode hot-pool writeback
set cache-mode for pool 'hot-pool' to writeback
#
# ceph osd tier set-overlay cold-pool hot-pool
overlay for 'cold-pool' is now (or already was) 'hot-pool'
#
# ceph osd pool set hot-pool hit_set_type bloom
set pool 39 hit_set_type to bloom
#
# ceph osd pool set hot-pool hit_set_count 10
set pool 39 hit_set_count to 10
#
# ceph osd pool set hot-pool hit_set_period 3600
set pool 39 hit_set_period to 3600
#
# ceph osd pool set hot-pool target_max_bytes 24000
set pool 39 target_max_bytes to 24000
#
# ceph osd pool set hot-pool target_max_objects 30
set pool 39 target_max_objects to 30
#
# ceph osd pool set hot-pool cache_target_dirty_ratio 0.4
set pool 39 cache_target_dirty_ratio to 0.4
#
# ceph osd pool set hot-pool cache_target_dirty_high_ratio 0.6
set pool 39 cache_target_dirty_high_ratio to 0.6
#
# ceph osd pool set hot-pool cache_target_full_ratio 0.8
set pool 39 cache_target_full_ratio to 0.8
#
# ceph osd pool set hot-pool cache_min_flush_age 600
set pool 39 cache_min_flush_age to 600
#
# ceph osd pool set hot-pool cache_min_evict_age 1800
set pool 39 cache_min_evict_age to 1800


Write Test

cold-pool(tier)  write test for 10s
# rados bench -p cold-pool 10 write --no-cleanup
 
hdd-pool  write test for 10s
# rados bench -p hdd-pool 10 write --no-cleanup
 
ssd-pool  write test for 10s
# rados bench -p ssd-pool 10 write --no-cleanup

result:
                    tier    hdd    ssd
objects              695    737   2550
bandwidth (MB/s)     272    289   1016
avg latency (s)     0.23   0.22   0.06


Read Test

# rados bench -p cold-pool 10 seq
 
# rados bench -p cold-pool 10 rand
 
# rados bench -p hdd-pool 10 seq
 
# rados bench -p hdd-pool 10 rand
 
# rados bench -p ssd-pool 10 seq
 
# rados bench -p ssd-pool 10 rand

seq result:
                    tier    hdd    ssd
bandwidth (MB/s)     806    789   1113
avg latency (s)    0.074  0.079  0.056
 
rand result:
                    tier    hdd    ssd
bandwidth (MB/s)    1106    790   1113
avg latency (s)    0.056  0.079  0.056



My understanding is that a pool with a cache tier in writeback mode should 
perform like an all-ssd pool (the client gets an ack once the data is written to 
the hot storage) as long as the cache doesn't need to be flushed.
But in the write test, the pool with the cache tier performed worse than even the 
all-hdd pool.
I also inspected the pool stats and found only 244 objects in the hot-pool and 
695 objects in the cold-pool (the write test wrote 695 objects). With my 
settings, 695 objects shouldn't have triggered a flush.

Is there any setting or concept I have misunderstood?
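
Not an answer, but a sketch of how the tier's behaviour can be double-checked
while the bench runs (pool names as above; "ceph df detail" also shows the
per-pool dirty count for cache pools):

# ceph osd pool get hot-pool all
# ceph df detail
# rados df
# rados -p hot-pool ls | wc -l
# rados -p cold-pool ls | wc -l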




2018-02-09



lin.yunfan


Re: [ceph-users] max number of pools per cluster

2018-02-08 Thread Jamie Fargen
Aleksei-

This won't be a Ceph answer. On most virtualization platforms you will have a
type of disk called ephemeral; it is usually storage composed of disks on the
hypervisor, possibly RAID with parity, and usually not backed up. You may want
to consider running your Cassandra instances on that ephemeral storage, which
would avoid duplicating data redundancy at both the application and the storage
level for the Cassandra service. Then keep backups of your Cassandra db on the
Ceph storage. There are some benefits and drawbacks; the main benefit will
probably be a latency decrease. You will need to evaluate the hypervisors you
are running on, disk layout, etc.

-Jamie

On Thu, Feb 8, 2018 at 9:36 AM, Aleksei Gutikov 
wrote:

>
> Hi all.
>
> We use RBDs as data storage for applications.
> If the application itself can do replication (for example Cassandra),
> we want to benefit (HA) from replication at the app level.
> But we can't if all RBDs are in the same pool.
> If all RBDs are in the same pool, then all RBDs are tied to one set of
> PGs.
> And if for any reason even a single PG is damaged and, for example, stuck
> inactive, then all RBDs will be affected.
>
> The first thing that comes to mind is to create a separate pool for every RBD.
>
> I'm aware of the max number of PGs per OSD and that osd_pool_default_pg_num
> should be reasonable.
> So max number of pools == osds_num * pgs_per_osd / min_pool_pgs.
> For example 1000 osds * 300 pg per osd / 32 pgs per pool = 9375.
> If osd size 1T then average RBD size will be 100G (looks sane).
>
> So my question is: is there any theoretical limit on the number of pools per
> cluster? And, if so, what does it depend on?
>
> Thanks.
>
>
> --
>
> Best regards,
> Aleksei Gutikov
> Software Engineer | synesis.ru | Minsk. BY



-- 
Jamie Fargen
Consultant
jfar...@redhat.com
813-817-4430


Re: [ceph-users] Unable to activate OSD's

2018-02-08 Thread Cranage, Steve
The only clue I have run across so far is that the OSD daemons ceph-deploy 
attempts to create on the failing OSD server (osd3) reuse two of the OSD IDs just 
created on the last OSD server deployed (osd2). From the osd tree listing: osd1 
has osd.0, osd.1, osd.2 and osd.3. The next server, osd2, has the next 4 in the 
correct order: osd.4, osd.5, osd.6 and osd.7. The failing OSD server should have 
started with osd.8 through osd.11; instead it is reusing osd.5 and osd.6. These 
are also the only log files in /var/log/ceph on the osd3 server, and they contain 
only the following entry repeated over and over again:

2018-02-07 08:09:33.077286 7f264e6a8800  0 set uid:gid to 167:167 (ceph:ceph)
2018-02-07 08:09:33.077321 7f264e6a8800  0 ceph version 10.2.10 
(5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe), process ceph-osd, pid 4923
2018-02-07 08:09:33.077572 7f264e6a8800 -1  ** ERROR: unable to open OSD 
superblock on /var/lib/ceph/osd/ceph-5: (2) No such file or directory


The outputs from list and osd tree follow:

[osd3][DEBUG ] connected to host: osd3
[osd3][DEBUG ] detect platform information from remote host
[osd3][DEBUG ] detect machine type
[osd3][DEBUG ] find the location of an executable
[osd3][INFO  ] Running command: /usr/sbin/ceph-disk list
[osd3][INFO  ] 
[osd3][INFO  ] ceph-5
[osd3][INFO  ] 
[osd3][INFO  ] Path   /var/lib/ceph/osd/ceph-5
[osd3][INFO  ] ID 5
[osd3][INFO  ] Name   osd.5
[osd3][INFO  ] Status up
[osd3][INFO  ] Reweight   1.0
[osd3][INFO  ] 
[osd3][INFO  ] 
[osd3][INFO  ] ceph-6
[osd3][INFO  ] 
[osd3][INFO  ] Path   /var/lib/ceph/osd/ceph-6
[osd3][INFO  ] ID 6
[osd3][INFO  ] Name   osd.6
[osd3][INFO  ] Status up
[osd3][INFO  ] Reweight   1.0
[osd3][INFO  ] 


[cephuser@groot cephcluster]$ sudo ceph osd tree
ID WEIGHT  TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 1.06311 root default
-2 0.53156 host osd1
 0 0.13289 osd.0  up  1.0  1.0
 1 0.13289 osd.1  up  1.0  1.0
 2 0.13289 osd.2  up  1.0  1.0
 3 0.13289 osd.3  up  1.0  1.0
-3 0.53156 host osd2
 4 0.13289 osd.4  up  1.0  1.0
 5 0.13289 osd.5  up  1.0  1.0
 6 0.13289 osd.6  up  1.0  1.0
 7 0.13289 osd.7  up  1.0  1.0
[cephuser@groot cephcluster]$
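
A purely diagnostic sketch (not a fix): it may be worth checking what the
cluster map and osd3 itself still know about osd.5 and osd.6, in case stale
registrations or leftover data directories from an earlier prepare/activate
attempt explain the ID reuse; the paths below are the standard ones, nothing
specific to this setup:

[cephuser@groot cephcluster]$ sudo ceph osd dump | grep '^osd\.'
[cephuser@groot cephcluster]$ sudo ceph auth list | grep '^osd\.'
[root@osd3 ~]# ls -l /var/lib/ceph/osd/
[root@osd3 ~]# mount | grep /var/lib/ceph/osd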


From: ceph-users  on behalf of Андрей 

Sent: Thursday, February 8, 2018 6:40:16 AM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Unable to activate OSD's


I have the same problem.
Configuration:
4 HW servers running Debian GNU/Linux 9.3 (stretch)
Ceph Luminous 12.2.2

I have now installed ceph version 10.2.10 on these servers, and the OSDs activate fine.



Wednesday, 7 February 2018, 19:54 +03:00 from "Cranage, Steve":


Greetings ceph-users. I have been trying to build a test cluster in a KVM 
environment - something I have done successfully before, but this time I'm 
running into an issue I can't seem to get past. My Internet searches have shown 
instances of this from other users that involved either ownership problems with 
the OSD devices or partition UIDs needing to be set. Neither of these problems 
seems to be in play here.


The cluster is on centos 7, running Ceph 10.2.10. I have configured one mon, 
and 3 OSD servers with 4 disks each, and each is set to journal on a separate 
partition of an SSD, one SSD per VM. I have built this VM environment several 
times now, and recently I always have the same issue on at least one of my VM 
OSD's and I cannot seem to get any hints of where the problem lies from the 
sparse information printed to the console during the failure.


In addition to setting partition ownerships to ceph:ceph and UIDs to one of the 
values "set_data_partition" says it expects, I also zeroed out the entire 
contents of both drives and re-partitioned, but I still get the same results. 
The problem at present only occurs on one virtual server; the other 8 drives, 
split between the other 2 VM OSDs, had no issue with prepare or activate. I see 
no difference between this server's configuration or drive layout and the other 
two that run fine.


Hopefully someone can at least point me to some more fruitful log information; 
"Failed to activate" isn't very helpful by itself. There is nothing in messages 
other than clean mount/unmount messages for the OSD data device being processed 
(in this case /dev/vdb1). BTW, I have also tried to repeat the same process 
without a separate journal device (just using prepare/activate osd3:/dev/vdb1) 
and I got the same "Failed to activate" result.
[ceph-users] max number of pools per cluster

2018-02-08 Thread Aleksei Gutikov


Hi all.

We use RBDs as data storage for applications.
If the application itself can do replication (for example Cassandra),
we want to benefit (HA) from replication at the app level.
But we can't if all RBDs are in the same pool.
If all RBDs are in the same pool, then all RBDs are tied to one set of 
PGs.
And if for any reason even a single PG is damaged and, for example, stuck 
inactive, then all RBDs will be affected.


The first thing that comes to mind is to create a separate pool for every RBD.

I'm aware of the max number of PGs per OSD and that osd_pool_default_pg_num
should be reasonable.
So max number of pools == osds_num * pgs_per_osd / min_pool_pgs.
For example 1000 osds * 300 pg per osd / 32 pgs per pool = 9375.
If osd size 1T then average RBD size will be 100G (looks sane).
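
The same arithmetic as a quick shell check (numbers straight from this mail;
note that with replicated pools each PG also counts once per replica against
the per-OSD budget, which shrinks the result by the replication factor):

osds=1000; pgs_per_osd=300; pgs_per_pool=32
echo $(( osds * pgs_per_osd / pgs_per_pool ))         # 9375 pools
echo $(( osds * pgs_per_osd / (pgs_per_pool * 3) ))   # ~3125 with size=3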

So my question is: is there any theoretical limit on the number of pools per
cluster? And, if so, what does it depend on?

Thanks.


--

Best regards,
Aleksei Gutikov
Software Engineer | synesis.ru | Minsk. BY


Re: [ceph-users] Unable to activate OSD's

2018-02-08 Thread Андрей

I have the same problem.
Configuration:
4 HW servers running Debian GNU/Linux 9.3 (stretch)
Ceph Luminous 12.2.2

I have now installed ceph version 10.2.10 on these servers, and the OSDs activate fine.


>Wednesday, 7 February 2018, 19:54 +03:00 from "Cranage, Steve":
>
>Greetings ceph-users. I have been trying to build a test cluster in a KVM 
>environment - something I have done successfully before, but this time I'm 
>running into an issue I can't seem to get past. My Internet searches
> have shown instances of this from other users that involved either ownership 
> problems with the OSD devices or partition UIDs needing to be set. Neither 
> of these problems seems to be in play here.
>
>The cluster is on centos 7, running Ceph 10.2.10. I have configured one mon, 
>and 3 OSD servers with 4 disks each, and each is set to journal on a separate 
>partition of an SSD, one SSD per VM. I have built this VM environment
> several times now, and recently I always have the same issue on at least one 
> of my VM OSD's and I cannot seem to get any hints of where the problem lies 
> from the sparse information printed to the console during the failure.
>
>In addition to setting partition ownerships to ceph:ceph and UIDs to one of 
>the values "set_data_partition" says it expects, I also zeroed out the 
>entire contents of both drives and re-partitioned, but I still get the same 
>results. The problem at present only occurs on one virtual server, the other 8 
>drives split between
> the other 2 VM OSD's had no issue with prepare or activate. I see no 
> difference between this server or drive configuration vs the other two that 
> run fine.
>
>Hopefully someone can at least point me to some more fruitful log information, 
>"Failed to activate" isn't very helpful by itself. There is nothing in 
>messages other than clean mount/unmount messages for the OSD data device
> being processed (in this case /dev/vdb1). BTW, I have also tried to repeat 
> the same process without a separate journal device ( just using 
> prepare/activate osd3:/dev/vdb1) and I got the same  "Failed to activate" 
> result.
>
>
>[cephuser@groot cephcluster]$ ceph-deploy osd prepare osd3:/dev/vdb1:/dev/vdf1
>[ceph_deploy.conf][DEBUG ] found configuration file at: 
>/home/cephuser/.cephdeploy.conf
>[ceph_deploy.cli][INFO  ] Invoked (1.5.39): /bin/ceph-deploy osd prepare 
>osd3:/dev/vdb1:/dev/vdf1
>[ceph_deploy.cli][INFO  ] ceph-deploy options:
>[ceph_deploy.cli][INFO  ]  username  : None
>[ceph_deploy.cli][INFO  ]  block_db  : None
>[ceph_deploy.cli][INFO  ]  disk  : [('osd3', 
>'/dev/vdb1', '/dev/vdf1')]
>[ceph_deploy.cli][INFO  ]  dmcrypt   : False
>[ceph_deploy.cli][INFO  ]  verbose   : False
>[ceph_deploy.cli][INFO  ]  bluestore : None
>[ceph_deploy.cli][INFO  ]  block_wal : None
>[ceph_deploy.cli][INFO  ]  overwrite_conf    : False
>[ceph_deploy.cli][INFO  ]  subcommand    : prepare
>[ceph_deploy.cli][INFO  ]  dmcrypt_key_dir   : 
>/etc/ceph/dmcrypt-keys
>[ceph_deploy.cli][INFO  ]  quiet : False
>[ceph_deploy.cli][INFO  ]  cd_conf   : 
>
>[ceph_deploy.cli][INFO  ]  cluster   : ceph
>[ceph_deploy.cli][INFO  ]  fs_type   : xfs
>[ceph_deploy.cli][INFO  ]  filestore : None
>[ceph_deploy.cli][INFO  ]  func  : <function osd at 0x2a6f1b8>
>[ceph_deploy.cli][INFO  ]  ceph_conf : None
>[ceph_deploy.cli][INFO  ]  default_release   : False
>[ceph_deploy.cli][INFO  ]  zap_disk  : False
>[ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks osd3:/dev/vdb1:/dev/vdf1
>[osd3][DEBUG ] connection detected need for sudo
>[osd3][DEBUG ] connected to host: osd3 
>[osd3][DEBUG ] detect platform information from remote host
>[osd3][DEBUG ] detect machine type
>[osd3][DEBUG ] find the location of an executable
>[ceph_deploy.osd][INFO  ] Distro info: CentOS Linux 7.4.1708 Core
>[ceph_deploy.osd][DEBUG ] Deploying osd to osd3
>[osd3][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
>[ceph_deploy.osd][DEBUG ] Preparing host osd3 disk /dev/vdb1 journal /dev/vdf1 
>activate False
>[osd3][DEBUG ] find the location of an executable
>[osd3][INFO  ] Running command: sudo /usr/sbin/ceph-disk -v prepare --cluster 
>ceph --fs-type xfs -- /dev/vdb1 /dev/vdf1
>[osd3][WARNIN] command: Running command: /usr/bin/ceph-osd --cluster=ceph 
>--show-config-value=fsid
>[osd3][WARNIN] command: Running command: /usr/bin/ceph-osd 
>--check-allows-journal -i 0 --log-file $run_dir/$cluster-osd-check.log 
>--cluster ceph --setuser ceph --setgroup ceph
>[osd3][WARNIN] command: Running command: /usr/bin/ceph-osd 
>--check-wants-journal -i 0 --log-file $run_dir/$cluster-osd-check.log 
>--cluster ceph --setuser ceph 

Re: [ceph-users] Luminous/Ubuntu 16.04 kernel recommendation ?

2018-02-08 Thread Ilya Dryomov
On Thu, Feb 8, 2018 at 12:54 PM, Kevin Olbrich  wrote:
> 2018-02-08 11:20 GMT+01:00 Martin Emrich :
>>
>> I have a machine here mounting a Ceph RBD from luminous 12.2.2 locally,
>> running linux-generic-hwe-16.04 (4.13.0-32-generic).
>>
>> Works fine, except that it does not support the latest features: I had to
>> disable exclusive-lock,fast-diff,object-map,deep-flatten on the image.
>> Otherwise it runs well.
>
>
> I always thought that the latest features are built into newer kernels, are
> they available on non-HWE 4.4, HWE 4.8 or HWE 4.10?

No, some of these features haven't made it to the kernel client yet.

> Also I am researching for the OSD server side.

For the OSDs, you should be fine with pretty much any kernel supported
by your distro.

Thanks,

Ilya


Re: [ceph-users] Luminous/Ubuntu 16.04 kernel recommendation ?

2018-02-08 Thread Kevin Olbrich
2018-02-08 11:20 GMT+01:00 Martin Emrich :

> I have a machine here mounting a Ceph RBD from luminous 12.2.2 locally,
> running linux-generic-hwe-16.04 (4.13.0-32-generic).
>
> Works fine, except that it does not support the latest features: I had to
> disable exclusive-lock,fast-diff,object-map,deep-flatten on the image.
> Otherwise it runs well.
>

I always thought that the latest features are built into newer kernels, are
they available on non-HWE 4.4, HWE 4.8 or HWE 4.10?
Also I am researching for the OSD server side.

- Kevin


Re: [ceph-users] Luminous/Ubuntu 16.04 kernel recommendation ?

2018-02-08 Thread Martin Emrich

On 08.02.18 at 11:50, Ilya Dryomov wrote:

On Thu, Feb 8, 2018 at 11:20 AM, Martin Emrich
 wrote:

I have a machine here mounting a Ceph RBD from luminous 12.2.2 locally,
running linux-generic-hwe-16.04 (4.13.0-32-generic).

Works fine, except that it does not support the latest features: I had to
disable exclusive-lock,fast-diff,object-map,deep-flatten on the image.
Otherwise it runs well.

That kernel should support exclusive-lock.  It doesn't hurt to disable
exclusive-lock if you don't need it though.


Thanks, good to know... But indeed I do not need it in this case.

Cheers,

Martin


Re: [ceph-users] New Ceph-cluster and performance "questions"

2018-02-08 Thread Patrik Martinsson
Hi Christian, 

First of all, thanks for all the great answers and sorry for the late
reply. 


On Tue, 2018-02-06 at 10:47 +0900, Christian Balzer wrote:
> Hello,
> 
> > I'm not a "storage-guy" so please excuse me if I'm missing /
> > overlooking something obvious. 
> > 
> > My question is in the area "what kind of performance am I to expect
> > with this setup". We have bought servers, disks and networking for
> > our
> > future ceph-cluster and are now in our "testing-phase" and I simply
> > want to understand if our numbers line up, or if we are missing
> > something obvious. 
> > 
> 
> A myriad of variables will make for a myriad of results, expected and
> otherwise.
> 
> For example, you say nothing about the Ceph version, how the OSDs are
> created (filestore, bluestore, details), OS and kernel (PTI!!)
> version.

Good catch, I totally forgot this. 

$ > ceph version 12.2.1-40.el7cp
(c6d85fd953226c9e8168c9abe81f499d66cc2716) luminous (stable), deployed
via Red Hat Ceph Storage 3 (ceph-ansible). Bluestore is enabled, and
osd_scenario is set to collocated.

$ > cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 7.4 (Maipo)

$ > uname -r 
3.10.0-693.11.6.el7.x86_64 (PTI *not* disabled at boot)



> > Background, 
> > - cephmon1, DELL R730, 1 x E5-2643, 64 GB 
> > - cephosd1-6, DELL R730, 1 x E5-2697, 64 GB
> 
> Unless you're planning on having 16 SSDs per node, a CPU with less
> and
> faster cores would be better (see archives). 
> 
> In general, you will want to run atop or something similar on your
> ceph
> and client nodes during these tests to see where and if any resources
> (CPU, DISK, NET) are getting stressed.

Understood, thanks!



> > - each server is connected to a dedicated 50 Gbe network, with
> > Mellanox-4 Lx cards (teamed into one interface, team0).  
> > 
> > In our test we only have one monitor. This will of course not be
> > the
> > case later on. 
> > 
> > Each OSD, has the following SSD's configured as pass-through (not
> > raid
> > 0 through the raid-controller),
> > 
> > - 2 x Dell 1.6TB 2.5" SATA MLC MU 6Gbs SSD (THNSF81D60CSE), only
> > spec I
> > can find on Dell's homepage says "Data Transfer Rate 600 Mbps"
> > - 4 x Intel SSD DC S3700 (https://ark.intel.com/products/71916/Inte
> > l-SS
> > D-DC-S3700-Series-800GB-2_5in-SATA-6Gbs-25nm-MLC)
> 
> When and where did you get those?
> I wonder if they're available again, had 0 luck getting any last
> year.

It's actually disks that we have had "lying around", no clue where you
could get them today. 



> > - 3 HDD's, which is uninteresting here. At the moment I'm only
> > interested in the performance of the SSD-pool.
> > 
> > Ceph-cluster is created with ceph-ansible with "default params"
> > (ie.
> > have not added / changed anything except the necessary). 
> > 
> > When ceph-cluster is up, we have 54 OSD's (36 SSD, 18HDD). 
> > The min_size is 3 on the pool. 
> 
> Any reason for that?
> It will make any OSD failure result in a cluster lockup with a size
> of 3.
> Unless you did set your size to 4, in which case you wrecked
> performance.

Hm, sorry, what I meant was size=3. Reading the documentation, I'm not
sure I understand the difference between size and min_size. 




> > Rules are created as follows, 
> > 
> > $ > ceph osd crush rule create-replicated ssd-rule default host ssd
> > $ > ceph osd crush rule create-replicated hdd-rule default host hdd
> > 
> > Testing is done on a separate node (same nic and network though), 
> > 
> > $ > ceph osd pool create ssd-bench 512 512 replicated ssd-rule
> > 
> > $ > ceph osd pool application enable ssd-bench rbd
> > 
> > $ > rbd create ssd-image --size 1T --pool ssd-pool
> > 
> > $ > rbd map ssd-image --pool ssd-bench
> > 
> > $ > mkfs.xfs /dev/rbd/ssd-bench/ssd-image
> > 
> > $ > mount /dev/rbd/ssd-bench/ssd-image /ssd-bench
> > 
> 
> Unless you're planning on using the Ceph cluster in this fashion
> (kernel
> mounted images), you'd be better off testing in an environment that
> matches the use case, i.e. from a VM.

Gotcha, thanks!



> > Fio is then run like this, 
> > $ > 
> > actions="read randread write randwrite"
> > blocksizes="4k 128k 8m"
> > tmp_dir="/tmp/"
> > 
> > for blocksize in ${blocksizes}; do
> >   for action in ${actions}; do
> > rm -f ${tmp_dir}${action}_${blocksize}_${suffix}
> > fio --directory=/ssd-bench \
> > --time_based \ 
> > --direct=1 \
> > --rw=${action} \
> > --bs=$blocksize \
> > --size=1G \
> > --numjobs=100 \
> > --runtime=120 \
> > --group_reporting \
> > --name=testfile \
> > --output=${tmp_dir}${action}_${blocksize}_${suffix}
> >   done
> > done
> > 
> > After running this, we end up with these numbers 
> > 
> > read_4k iops : 159266 throughput : 622MB / sec
> > randread_4k iops : 151887 throughput : 593MB / sec
> > 
> 
> These are very nice numbers. 
> Too nice, in my book.
> I have a test cluster with a cache-tier 

Re: [ceph-users] Luminous/Ubuntu 16.04 kernel recommendation ?

2018-02-08 Thread Ilya Dryomov
On Thu, Feb 8, 2018 at 11:20 AM, Martin Emrich
 wrote:
> I have a machine here mounting a Ceph RBD from luminous 12.2.2 locally,
> running linux-generic-hwe-16.04 (4.13.0-32-generic).
>
> Works fine, except that it does not support the latest features: I had to
> disable exclusive-lock,fast-diff,object-map,deep-flatten on the image.
> Otherwise it runs well.

That kernel should support exclusive-lock.  It doesn't hurt to disable
exclusive-lock if you don't need it though.

Thanks,

Ilya


[ceph-users] Ceph Day Germany :)

2018-02-08 Thread Martin Emrich

Hi!

I just want to thank all organizers and speakers for the awesome Ceph 
Day at Darmstadt, Germany yesterday.


I learned of some cool stuff I'm eager to try out (NFS-Ganesha for RGW, 
openATTIC, ...). The organization and food were great, too.


Cheers,

Martin



Re: [ceph-users] Luminous/Ubuntu 16.04 kernel recommendation ?

2018-02-08 Thread Martin Emrich
I have a machine here mounting a Ceph RBD from luminous 12.2.2 locally, 
running linux-generic-hwe-16.04 (4.13.0-32-generic).


Works fine, except that it does not support the latest features: I had 
to disable exclusive-lock,fast-diff,object-map,deep-flatten on the 
image. Otherwise it runs well.
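
In case it helps others hitting the same refusal from rbd map, a sketch of the
commands involved (pool/image names are placeholders; as Ilya notes elsewhere
in this thread, the 4.13 HWE kernel does understand exclusive-lock, so that one
can stay enabled):

rbd info mypool/myimage | grep features
dmesg | tail    # krbd logs the unsupported feature bits on a failed map
rbd feature disable mypool/myimage fast-diff object-map deep-flatten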


Regards,

Martin


On 07.02.18 at 22:28, Kevin Olbrich wrote:

Would be interested as well.

- Kevin

2018-02-04 19:00 GMT+01:00 Yoann Moulin:


Hello,

What is the best kernel for Luminous on Ubuntu 16.04 ?

Is linux-image-virtual-lts-xenial still the best one ? Or
linux-virtual-hwe-16.04 will offer some improvement ?

Thanks,

--
Yoann Moulin
EPFL IC-IT









[ceph-users] best way to use rbd device in (libvirt/qemu)

2018-02-08 Thread Marc Roos

Afaik you can configure the use of an rbd device like below. 

- Am I correct in assuming that the first one is not recommended because 
it can use some caching? (I thought I noticed a speed difference between 
these 2, and assumed it was related to caching). 
- I guess in both cases the kernel module is used and the same libraries are 
used?
- Are there any other (dis)advantages?

1. via mapped rbd device

   (XML definition stripped by the list archive; see the reconstruction sketch below)

2. direct

   (XML definition stripped by the list archive)


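Since the archive stripped the XML, here is a reconstruction of the two usual
forms purely for illustration (disk path, pool/image name, monitor host and
secret UUID are all placeholders): the first attaches an already-mapped
/dev/rbd device as a plain block disk, the second has QEMU open the image
itself through librbd.

1. via a krbd-mapped device:

  <disk type='block' device='disk'>
    <driver name='qemu' type='raw' cache='none'/>
    <source dev='/dev/rbd/rbd/vm-disk1'/>
    <target dev='vda' bus='virtio'/>
  </disk>

2. direct:

  <disk type='network' device='disk'>
    <driver name='qemu' type='raw' cache='none'/>
    <auth username='libvirt'>
      <secret type='ceph' uuid='00000000-0000-0000-0000-000000000000'/>
    </auth>
    <source protocol='rbd' name='rbd/vm-disk1'>
      <host name='mon1.example.com' port='6789'/>
    </source>
    <target dev='vdb' bus='virtio'/>
  </disk>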


Re: [ceph-users] OSD Segfaults after Bluestore conversion

2018-02-08 Thread Mike O'Connor
On 7/02/2018 8:23 AM, Kyle Hutson wrote:
> We had a 26-node production ceph cluster which we upgraded to Luminous
> a little over a month ago. I added a 27th-node with Bluestore and
> didn't have any issues, so I began converting the others, one at a
> time. The first two went off pretty smoothly, but the 3rd is doing
> something strange.
>
> Initially, all the OSDs came up fine, but then some started to
> segfault. Out of curiosity more than anything else, I did reboot the
> server to see if it would get better or worse, and it pretty much
> stayed the same - 12 of the 18 OSDs did not properly come up. Of
> those, 3 again segfaulted
>
> I picked one that didn't properly come up and copied the log to where
> anybody can view it:
> http://people.beocat.ksu.edu/~kylehutson/ceph-osd.426.log
> 
>
> You can contrast that with one that is up:
> http://people.beocat.ksu.edu/~kylehutson/ceph-osd.428.log
> 
>
> (which is still showing segfaults in the logs, but seems to be
> recovering from them OK?)
>
> Any ideas?
Ideas ? yes

There is a bug which is hitting a small number of systems, and at this
time there is no solution. Issue details are at
http://tracker.ceph.com/issues/22102.

Please submit more details of your problem on the ticket.

Mike
