Re: [ceph-users] mount failed since failed to load ceph kernel module

2017-11-13 Thread Dai Xiang
On Tue, Nov 14, 2017 at 02:24:06AM +, Linh Vu wrote:
> Your kernel is way too old for CephFS Luminous. I'd use one of the newer 
> kernels from elrepo.org. :) We're on 4.12 here on RHEL 7.4.

I have updated the kernel to the newest version:
[root@d32f3a7b6eb8 ~]$ uname -a
Linux d32f3a7b6eb8 4.14.0-1.el7.elrepo.x86_64 #1 SMP Sun Nov 12 20:21:04 EST 
2017 x86_64 x86_64 x86_64 GNU/Linux
[root@d32f3a7b6eb8 ~]$ cat /etc/redhat-release 
CentOS Linux release 7.2.1511 (Core) 

But the mount still fails:
[root@d32f3a7b6eb8 ~]$ /bin/mount 172.17.0.4,172.17.0.5:/ /cephfs -t ceph -o 
name=admin,secretfile=/etc/ceph/admin.secret -v
failed to load ceph kernel module (1)
parsing options: rw,name=admin,secretfile=/etc/ceph/admin.secret
mount error 2 = No such file or directory
[root@d32f3a7b6eb8 ~]$ ll /cephfs
total 0

[root@d32f3a7b6eb8 ~]$ ceph -s
  cluster:
id: a5f1d744-35eb-4e1b-a7c7-cb9871ec559d
health: HEALTH_WARN
Reduced data availability: 128 pgs inactive
Degraded data redundancy: 128 pgs unclean
 
  services:
mon: 2 daemons, quorum d32f3a7b6eb8,1d22f2d81028
mgr: d32f3a7b6eb8(active), standbys: 1d22f2d81028
mds: cephfs-1/1/1 up  {0=1d22f2d81028=up:creating}, 1 up:standby
osd: 0 osds: 0 up, 0 in
 
  data:
pools:   2 pools, 128 pgs
objects: 0 objects, 0 bytes
usage:   0 kB used, 0 kB / 0 kB avail
pgs: 100.000% pgs unknown
 128 unknown

[root@d32f3a7b6eb8 ~]$ lsmod | grep ceph
ceph  372736  0 
libceph   315392  1 ceph
fscache65536  3 ceph,nfsv4,nfs
libcrc32c  16384  5 
libceph,nf_conntrack,xfs,dm_persistent_data,nf_nat
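One thing worth checking given the "failed to load ceph kernel module" error: a container shares the host's kernel, so `mount -t ceph` inside the container depends on the ceph module being loaded on the *host*, and on the container having mount privileges. A hedged sketch of a pre-mount check (flags are illustrative, not a definitive fix):

```shell
# On the Docker HOST, not inside the container:
lsmod | grep -q '^ceph ' || sudo modprobe ceph

# The container itself also needs privileges to perform mounts, e.g.:
#   docker run --cap-add SYS_ADMIN ...   (or --privileged)
```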


-- 
Best Regards
Dai Xiang
> 
> 
> Hi!
> 
> I got a confusing issue in Docker, as below:
> 
> After installing Ceph successfully, I want to mount CephFS, but it failed:
> 
> [root@dbffa72704e4 ~]$ /bin/mount 172.17.0.4:/ /cephfs 
> -t ceph -o name=admin,secretfile=/etc/ceph/admin.secret -v
> failed to load ceph kernel module (1)
> parsing options: rw,name=admin,secretfile=/etc/ceph/admin.secret
> mount error 5 = Input/output error
> 
> But the Ceph-related kernel modules do exist:
> 
> [root@dbffa72704e4 ~]$ lsmod | grep ceph
> ceph  327687  0
> libceph   287066  1 ceph
> dns_resolver   13140  2 nfsv4,libceph
> libcrc32c  12644  3 xfs,libceph,dm_persistent_data
> 
> Check the Ceph state (I only set a data disk for the OSD):
> 
> [root@dbffa72704e4 ~]$ ceph -s
>   cluster:
> id: 20f51975-303e-446f-903f-04e1feaff7d0
> health: HEALTH_WARN
> Reduced data availability: 128 pgs inactive
> Degraded data redundancy: 128 pgs unclean
> 
>   services:
> mon: 2 daemons, quorum dbffa72704e4,5807d12f920e
> mgr: dbffa72704e4(active), standbys: 5807d12f920e
> mds: cephfs-1/1/1 up  {0=5807d12f920e=up:creating}, 1 up:standby
> osd: 0 osds: 0 up, 0 in
> 
>   data:
> pools:   2 pools, 128 pgs
> objects: 0 objects, 0 bytes
> usage:   0 kB used, 0 kB / 0 kB avail
> pgs: 100.000% pgs unknown
>  128 unknown
> 
> [root@dbffa72704e4 ~]$ ceph version
> ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous 
> (stable)
> 
> My container is based on centos:centos7.2.1511; the kernel is
> 3.10.0-514.el7.x86_64 (on 3e0728877e22).
> 
> I saw some Ceph-related images on Docker Hub, so I think the above
> operation should be OK. Did I miss something important?
> 
> --
> Best Regards
> Dai Xiang

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] features required for live migration

2017-11-13 Thread Konstantin Shalygin

I'd like to use the live migration feature of KVM. In this scenario, what
features may be enabled in the rbd base image? and in my EV (snapshot
clone)?


You can use live migration without features. For KVM I recommend a
minimal "rbd default features = 3" (layering, striping).
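For reference, "rbd default features" is a bitmask, so 3 decomposes into layering (1) plus striping (2). A small sketch of the arithmetic, assuming the usual librbd feature-bit values (verify them against your Ceph release):

```python
# RBD feature flags form a bitmask. Bit values below follow the common
# librbd definitions -- check them against your Ceph version.
RBD_FEATURES = {
    "layering": 1,
    "striping": 2,
    "exclusive-lock": 4,
    "object-map": 8,
    "fast-diff": 16,
    "deep-flatten": 32,
    "journaling": 64,
}

def feature_mask(names):
    """Return the integer to use for 'rbd default features'."""
    return sum(RBD_FEATURES[n] for n in names)

# layering + striping = 1 + 2 = 3, as recommended above
print(feature_mask(["layering", "striping"]))  # -> 3
```

Equivalently, features can be requested per image at creation time, e.g. `rbd create --image-feature layering ...`.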



Re: [ceph-users] mount failed since failed to load ceph kernel module

2017-11-13 Thread Dai Xiang
On Tue, Nov 14, 2017 at 02:24:06AM +, Linh Vu wrote:
> Your kernel is way too old for CephFS Luminous. I'd use one of the newer 
> kernels from elrepo.org. :) We're on 4.12 here on RHEL 7.4.

There is still a question:
why can cephfs be mounted and the kernel module loaded normally on my
host (3.10.0-327.el7.x86_64)?
Does it mean the kernel under Docker must be 4.12.* or newer to enable
cephfs?

-- 
Best Regards
Dai Xiang
> 
> 
> From: ceph-users  on behalf of 
> xiang@sky-data.cn 
> Sent: Tuesday, 14 November 2017 1:13:47 PM
> To: ceph-users@lists.ceph.com
> Subject: [ceph-users] mount failed since failed to load ceph kernel module
> 
> Hi!
> 
> I got a confusing issue in Docker, as below:
> 
> After installing Ceph successfully, I want to mount CephFS, but it failed:
> 
> [root@dbffa72704e4 ~]$ /bin/mount 172.17.0.4:/ /cephfs 
> -t ceph -o name=admin,secretfile=/etc/ceph/admin.secret -v
> failed to load ceph kernel module (1)
> parsing options: rw,name=admin,secretfile=/etc/ceph/admin.secret
> mount error 5 = Input/output error
> 
> But the Ceph-related kernel modules do exist:
> 
> [root@dbffa72704e4 ~]$ lsmod | grep ceph
> ceph  327687  0
> libceph   287066  1 ceph
> dns_resolver   13140  2 nfsv4,libceph
> libcrc32c  12644  3 xfs,libceph,dm_persistent_data
> 
> Check the Ceph state (I only set a data disk for the OSD):
> 
> [root@dbffa72704e4 ~]$ ceph -s
>   cluster:
> id: 20f51975-303e-446f-903f-04e1feaff7d0
> health: HEALTH_WARN
> Reduced data availability: 128 pgs inactive
> Degraded data redundancy: 128 pgs unclean
> 
>   services:
> mon: 2 daemons, quorum dbffa72704e4,5807d12f920e
> mgr: dbffa72704e4(active), standbys: 5807d12f920e
> mds: cephfs-1/1/1 up  {0=5807d12f920e=up:creating}, 1 up:standby
> osd: 0 osds: 0 up, 0 in
> 
>   data:
> pools:   2 pools, 128 pgs
> objects: 0 objects, 0 bytes
> usage:   0 kB used, 0 kB / 0 kB avail
> pgs: 100.000% pgs unknown
>  128 unknown
> 
> [root@dbffa72704e4 ~]$ ceph version
> ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous 
> (stable)
> 
> My container is based on centos:centos7.2.1511; the kernel is
> 3.10.0-514.el7.x86_64 (on 3e0728877e22).
> 
> I saw some Ceph-related images on Docker Hub, so I think the above
> operation should be OK. Did I miss something important?
> 
> --
> Best Regards
> Dai Xiang



Re: [ceph-users] rocksdb: Corruption: missing start of fragmented record

2017-11-13 Thread Konstantin Shalygin

Which isn't released yet, yes. I could try building the development
repository if you think that has a chance of resolving the issue?

For tests - yes...
This ML thread says that 12.2.2 should be based on commit
1071fdcf73faa387d0df18489ab7b0359a0c0afb.


Re: [ceph-users] mount failed since failed to load ceph kernel module

2017-11-13 Thread Linh Vu
Your kernel is way too old for CephFS Luminous. I'd use one of the newer 
kernels from elrepo.org. :) We're on 4.12 here on RHEL 7.4.


From: ceph-users  on behalf of 
xiang@sky-data.cn 
Sent: Tuesday, 14 November 2017 1:13:47 PM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] mount failed since failed to load ceph kernel module

Hi!

I got a confusing issue in Docker, as below:

After installing Ceph successfully, I want to mount CephFS, but it failed:

[root@dbffa72704e4 ~]$ /bin/mount 172.17.0.4:/ /cephfs -t 
ceph -o name=admin,secretfile=/etc/ceph/admin.secret -v
failed to load ceph kernel module (1)
parsing options: rw,name=admin,secretfile=/etc/ceph/admin.secret
mount error 5 = Input/output error

But the Ceph-related kernel modules do exist:

[root@dbffa72704e4 ~]$ lsmod | grep ceph
ceph  327687  0
libceph   287066  1 ceph
dns_resolver   13140  2 nfsv4,libceph
libcrc32c  12644  3 xfs,libceph,dm_persistent_data

Check the Ceph state (I only set a data disk for the OSD):

[root@dbffa72704e4 ~]$ ceph -s
  cluster:
id: 20f51975-303e-446f-903f-04e1feaff7d0
health: HEALTH_WARN
Reduced data availability: 128 pgs inactive
Degraded data redundancy: 128 pgs unclean

  services:
mon: 2 daemons, quorum dbffa72704e4,5807d12f920e
mgr: dbffa72704e4(active), standbys: 5807d12f920e
mds: cephfs-1/1/1 up  {0=5807d12f920e=up:creating}, 1 up:standby
osd: 0 osds: 0 up, 0 in

  data:
pools:   2 pools, 128 pgs
objects: 0 objects, 0 bytes
usage:   0 kB used, 0 kB / 0 kB avail
pgs: 100.000% pgs unknown
 128 unknown

[root@dbffa72704e4 ~]$ ceph version
ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)

My container is based on centos:centos7.2.1511; the kernel is
3.10.0-514.el7.x86_64 (on 3e0728877e22).

I saw some Ceph-related images on Docker Hub, so I think the above
operation should be OK. Did I miss something important?

--
Best Regards
Dai Xiang


[ceph-users] mount failed since failed to load ceph kernel module

2017-11-13 Thread xiang....@sky-data.cn
Hi! 

I got a confusing issue in Docker, as below:

After installing Ceph successfully, I want to mount CephFS, but it failed:

[root@dbffa72704e4 ~]$ /bin/mount 172.17.0.4:/ /cephfs -t ceph -o 
name=admin,secretfile=/etc/ceph/admin.secret -v 
failed to load ceph kernel module (1) 
parsing options: rw,name=admin,secretfile=/etc/ceph/admin.secret 
mount error 5 = Input/output error 

But the Ceph-related kernel modules do exist:

[root@dbffa72704e4 ~]$ lsmod | grep ceph 
ceph 327687 0 
libceph 287066 1 ceph 
dns_resolver 13140 2 nfsv4,libceph 
libcrc32c 12644 3 xfs,libceph,dm_persistent_data 

Check the Ceph state (I only set a data disk for the OSD):

[root@dbffa72704e4 ~]$ ceph -s 
cluster: 
id: 20f51975-303e-446f-903f-04e1feaff7d0 
health: HEALTH_WARN 
Reduced data availability: 128 pgs inactive 
Degraded data redundancy: 128 pgs unclean 

services: 
mon: 2 daemons, quorum dbffa72704e4,5807d12f920e 
mgr: dbffa72704e4(active), standbys: 5807d12f920e 
mds: cephfs-1/1/1 up {0=5807d12f920e=up:creating}, 1 up:standby 
osd: 0 osds: 0 up, 0 in 

data: 
pools: 2 pools, 128 pgs 
objects: 0 objects, 0 bytes 
usage: 0 kB used, 0 kB / 0 kB avail 
pgs: 100.000% pgs unknown 
128 unknown 

[root@dbffa72704e4 ~]$ ceph version 
ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous 
(stable) 

My container is based on centos:centos7.2.1511; the kernel is
3.10.0-514.el7.x86_64 (on 3e0728877e22).

I saw some Ceph-related images on Docker Hub, so I think the above
operation should be OK. Did I miss something important?

-- 
Best Regards 
Dai Xiang 


[ceph-users] Incorrect pool usage statistics

2017-11-13 Thread Karun Josy
Hello,

Recently, I deleted all the disks (RBD images) from an erasure pool, 'ecpool'.
The pool is empty; however, the space usage still shows around 400GB.
What might be wrong?


$ rbd ls -l ecpool
$ ceph df

GLOBAL:
SIZE   AVAIL  RAW USED %RAW USED
19019G 16796G2223G 11.69
POOLS:
NAMEID USED   %USED MAX AVAIL OBJECTS
template 1227G  1.59 2810G   58549
vm 21  0 0 4684G   2
ecpool  33   403G  2.7910038G  388652
imagepool   34 90430M  0.62 4684G   22789



Karun Josy


Re: [ceph-users] HW Raid vs. Multiple OSD

2017-11-13 Thread Anthony D'Atri
Oscar, a few thoughts:

o I think you might have some misunderstandings about how Ceph works.  Ceph is 
best deployed as a single cluster spanning multiple servers, generally at least 
3.  Is that your plan?  It sort of sounds as though you're thinking of Ceph 
managing only the drives local to each of your converged VDI hosts, like local 
RAID would.  Ceph doesn't work that way.  Well, technically it could but 
wouldn't be a great architecture.  You would want to have at least 3 servers, 
with all of the Ceph OSDs in a single cluster.

o Re RAID0:

> Then, may I understand that your advice is a RAID0 for each 4TB? For a
> balanced configuration...
> 
> 1 osd x 1 disk of 4TB
> 1 osd x 2 disks of 2TB
> 1 osd x 4 disks of 1 TB


For performance a greater number of smaller drives is generally going to be 
best.  VDI desktops are going to be fairly latency-sensitive and you'd really 
do best with SSDs.  All those desktops thrashing a small number of HDDs is not 
going to deliver tolerable performance.

Don't use RAID at all for the OSDs.  Even if you get hardware RAID HBAs, 
configure JBOD/passthrough mode so that OSDs are deployed directly on the 
drives.  This will minimize latency as well as manifold hassles that one adds 
when wrapping drives in HBA RAID volumes.
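As a sketch of the "no RAID, one OSD per drive" layout described above (device names are placeholders; on Luminous either ceph-disk or ceph-volume can do this, so check which tool your deployment uses):

```shell
# One OSD per physical drive, no RAID volume in between.
# /dev/sd{b..e} stand in for your JBOD/passthrough devices.
for dev in /dev/sdb /dev/sdc /dev/sdd /dev/sde; do
    ceph-volume lvm create --bluestore --data "$dev"
done
```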

o Re CPU:

> The other question is about having one OSD vs 8 OSDs... will 8 OSDs
> consume more CPU than 1 OSD (RAID5)?
> 
> As I want to share compute and OSDs in the same box, the resources consumed
> by the OSDs can be a handicap.


If the CPU cycles used by Ceph are a problem, your architecture has IMHO bigger 
problems.  You need to design for a safety margin of RAM and CPU to accommodate 
spikes in usage, both by Ceph and by your desktops.  There is no way each of 
the systems you describe is going to have enough cycles for 100 desktops 
concurrently active.  You'd be allocating each of them only ~3GB of RAM -- I've 
not had to run MS Windows 10 but even with page sharing that seems awfully 
tight on RAM.

Since you mention ProLiant and 8 drives I'm going to assume you're targeting the 
DL360?  I suggest if possible considering the 10SFF models to get you more 
drive bays, ditching the optical drive.  If you can get rear bays to use to 
boot the OS from, that's better yet so you free up front panel drive bays for 
OSD use.  You want to maximize the number of drive bays available for OSD use, 
and if at all possible you want to avoid deploying the operating system's 
filesystems and OSDs on the same drives.

With the numbers you mention throughout the thread, it would seem as though you 
would end up with potentially as little as 80GB of usable space per virtual 
desktop - will that meet your needs?  One of the difficulties with converged 
architectures is that storage and compute don't necessarily scale at the same 
rate.  To that end I suggest considering 2U 25-drive-bay systems so that you 
have room to add more drives.





Re: [ceph-users] Getting errors on erasure pool writes k=2, m=1

2017-11-13 Thread Christian Wuerdig
I haven't used the rados command-line utility, but it has an "-o
object_size" option as well as "--striper" to make it use the
libradosstriper library, so I'd suggest giving these options a go.
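Sketched out, the two suggestions might look like this for the 8GB file from the report below (flag spellings should be checked against `rados --help` on your release -- this is illustrative, not verified against 12.2.1):

```shell
# Option 1: let libradosstriper split the file across many RADOS
# objects instead of writing one >4GB object:
rados --striper -p ec21 put blablablalbalblablalablalb.txt \
    /mnt/disk/blablablalbalblablalablalb.txt

# Option 2: cap the object size explicitly (value in bytes, here 4MB):
rados -p ec21 -o 4194304 put blablablalbalblablalablalb.txt \
    /mnt/disk/blablablalbalblablalablalb.txt
```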

On Mon, Nov 13, 2017 at 9:40 PM, Marc Roos  wrote:
>
> 1. I don’t think an OSD should 'crash' in such a situation.
> 2. How else should I 'rados put' an 8GB file?
>
>
>
>
>
>
> -Original Message-
> From: Christian Wuerdig [mailto:christian.wuer...@gmail.com]
> Sent: maandag 13 november 2017 0:12
> To: Marc Roos
> Cc: ceph-users
> Subject: Re: [ceph-users] Getting errors on erasure pool writes k=2, m=1
>
> As per: https://www.spinics.net/lists/ceph-devel/msg38686.html
> Bluestore has a hard 4GB object size limit
>
>
> On Sat, Nov 11, 2017 at 9:27 AM, Marc Roos 
> wrote:
>>
>> osd's are crashing when putting a (8GB) file in a erasure coded pool,
>> just before finishing. The same osd's are used for replicated pools
>> rbd/cephfs, and seem to do fine. Did I made some error is this a bug?
>> Looks similar to
>> https://www.spinics.net/lists/ceph-devel/msg38685.html
>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-September/021
>> 045.html
>>
>>
>> [@c01 ~]# date ; rados -p ec21 put  $(basename
>> "/mnt/disk/blablablalbalblablalablalb.txt")
>> blablablalbalblablalablalb.txt
>> Fri Nov 10 20:27:26 CET 2017
>>
>> [Fri Nov 10 20:33:51 2017] libceph: osd9 down [Fri Nov 10 20:33:51
>> 2017] libceph: osd9 down [Fri Nov 10 20:33:51 2017] libceph: osd0
>> 192.168.10.111:6802 socket closed (con state OPEN) [Fri Nov 10
>> 20:33:51 2017] libceph: osd0 192.168.10.111:6802 socket error on write
>
>> [Fri Nov 10 20:33:52 2017] libceph: osd0 down [Fri Nov 10 20:33:52
>> 2017] libceph: osd7 down [Fri Nov 10 20:33:55 2017] libceph: osd0 down
>
>> [Fri Nov 10 20:33:55 2017] libceph: osd7 down [Fri Nov 10 20:34:41
>> 2017] libceph: osd7 up [Fri Nov 10 20:34:41 2017] libceph: osd7 up
>> [Fri Nov 10 20:35:03 2017] libceph: osd9 up [Fri Nov 10 20:35:03 2017]
>
>> libceph: osd9 up [Fri Nov 10 20:35:47 2017] libceph: osd0 up [Fri Nov
>> 10 20:35:47 2017] libceph: osd0 up
>>
>> [@c02 ~]# rados -p ec21 stat blablablalbalblablalablalb.txt 2017-11-10
>
>> 20:39:31.296101 7f840ad45e40 -1 WARNING: the following dangerous and
>> experimental features are enabled: bluestore 2017-11-10
>> 20:39:31.296290 7f840ad45e40 -1 WARNING: the following dangerous and
>> experimental features are enabled: bluestore 2017-11-10
>> 20:39:31.331588 7f840ad45e40 -1 WARNING: the following dangerous and
>> experimental features are enabled: bluestore
>> ec21/blablablalbalblablalablalb.txt mtime 2017-11-10 20:32:52.00,
>> size 8585740288
>>
>>
>>
>> 2017-11-10 20:32:52.287503 7f933028d700  4 rocksdb: EVENT_LOG_v1
>> {"time_micros": 1510342372287484, "job": 32, "event": "flush_started",
>> "num_memtables": 1, "num_entries": 728747, "num_deletes": 363960,
>> "memory_usage": 263854696}
>> 2017-11-10 20:32:52.287509 7f933028d700  4 rocksdb:
>> [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_
>> AR
>> CH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/releas
>> e/ 12.2.1/rpm/el7/BUILD/ceph-12.2.1/src/rocksdb/db/flush_job.cc:293]
>> [default] [JOB 32] Level-0 flush table #25279: started 2017-11-10
>> 20:32:52.503311 7f933028d700  4 rocksdb: EVENT_LOG_v1
>> {"time_micros": 1510342372503293, "cf_name": "default", "job": 32,
>> "event": "table_file_creation", "file_number": 25279, "file_size":
>> 4811948, "table_properties": {"data_size": 4675796, "index_size":
>> 102865, "filter_size": 32302, "raw_key_size": 646440,
>> "raw_average_key_size": 75, "raw_value_size": 4446103,
>> "raw_average_value_size": 519, "num_data_blocks": 1180, "num_entries":
>> 8560, "filter_policy_name": "rocksdb.BuiltinBloomFilter",
>> "kDeletedKeys": "0", "kMergeOperands": "330"}} 2017-11-10
>> 20:32:52.503327 7f933028d700  4 rocksdb:
>> [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_
>> AR
>> CH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/releas
>> e/ 12.2.1/rpm/el7/BUILD/ceph-12.2.1/src/rocksdb/db/flush_job.cc:319]
>> [default] [JOB 32] Level-0 flush table #25279: 4811948 bytes OK
>> 2017-11-10 20:32:52.572413 7f933028d700  4 rocksdb:
>> [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_
>> AR
>> CH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/releas
>> e/
>> 12.2.1/rpm/el7/BUILD/ceph-12.2.1/src/rocksdb/db/db_impl_files.cc:242]
>> adding log 25276 to recycle list
>>
>> 2017-11-10 20:32:52.572422 7f933028d700  4 rocksdb: (Original Log Time
>> 2017/11/10-20:32:52.503339)
>> [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_
>> AR
>> CH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/releas
>> e/
>> 12.2.1/rpm/el7/BUILD/ceph-12.2.1/src/rocksdb/db/memtable_list.cc:360]
>> [default] Level-0 commit table #25279 started 2017-11-10
>> 20:32:52.572425 7f933028d700  4 rocksdb: (Original Log Time
>> 

Re: [ceph-users] HW Raid vs. Multiple OSD

2017-11-13 Thread Oscar Segarra
Hi Brady,

For me it is very difficult to make a PoC because servers are very expensive.

Then, may I understand that your advice is a RAID0 for each 4TB? For a
balanced configuration...

1 osd x 1 disk of 4TB
1 osd x 2 disks of 2TB
1 osd x 4 disks of 1 TB

Isn't it?

Thanks a lot



El 13 nov. 2017 18:40, "Brady Deetz"  escribió:



On Nov 13, 2017 11:17 AM, "Oscar Segarra"  wrote:

Hi Brady,

Thanks a lot again for your comments and experience.

This is a departure from what I've seen people do here. I agree that 100
VMs on 24 cores would be potentially over consolidating. But, when it comes
to your storage, you probably don't want to lose the data and shouldn't
skimp. Could you lower VMs per host to 75-80?
--> Yes, that's the reason I'm asking this... If I create a RAID5 or RAID0
with 8 disks, I will have just a single OSD process and can therefore leave
31 cores for my 100 VDIs, which I think should be enough.

Also, I notice you have no ssd storage. Are these VMs expected to be
performant at all? 100 VMs accessing 8 spinners could cause some serious
latency.
--> I'm planning to use all SSDs in my infrastructure in order to avoid IO
issues, so this might not be a problem.


My mistake, I read 8x 8TB not 1TB. There are some decent sizing
conversations on the list regarding all ssd deployments. If I were doing
this and forced to scrape a few more cores per host, I would run some tests
in different configurations. My guess is that 4x raid 0 per host will
result in a nice compromise between overhead, performance, and
consolidation ratio. But again, this is a not so advised configuration. No
matter what, before I took this into production, I'd purchase enough
hardware to do a proof of concept using a minimal configuration of 3 hosts.
Then just run benchmarks with 1x raid 6, 1x raid 0, 4x raid 0, and no raid
+ pinned osd process 2-to-1 core.

If none of that works, it's back to the drawing board for you.


Minimum cluster size should be 3 because you are making 3 replicas with
min_size 2. If you lose 1 host in a cluster of 2, you will likely lose
access to data because 2 replicas existed on the host that went down. You
will have a bad time if you run a cluster with 2 replicas.
--> Yes, depend on the VDI nodes, starting from 3.

Thanks a lot in advance for your help!



2017-11-13 18:06 GMT+01:00 Brady Deetz :

>
>
> On Nov 13, 2017 10:44 AM, "Oscar Segarra"  wrote:
>
> Hi Brady,
>
> Thanks a lot for your comments.
>
> I can't think of a reason to use raid 5 and ceph together, even in a vdi
> instance. You're going to want throughput for this use case. What you can
> do is set the affinity of those osd processes to cores not in use by the
> VMs. I do think it will need to be more than 1 core. It is recommended that
> you dedicate 1 core per osd, but you could maybe get away with collocating
> the processes. You'd just have to experiment.
> What we really need to help you is more information.
> --> If my host has 32 cores and 8 disks or 8 OSDs, and I have to pin each
> osd process to a core, I will have just 24 cores left for all my host and
> Windows guest load.
>
>
> What hardware are you planning to use?
> --> I'm planning to use a standard server as ProLiant. In my
> configuration, each ProLiant will be compute for 100 VDIs and Storage node.
> Each ProLiant will have 32 cores, 384GB RAM and a RAID1 for OS
>
>
> This is a departure from what I've seen people do here. I agree that 100
> VMs on 24 cores would be potentially over consolidating. But, when it comes
> to your storage, you probably don't want to lose the data and shouldn't
> skimp. Could you lower VMs per host to 75-80?
> Also, I notice you have no ssd storage. Are these VMs expected to be
> performant at all? 100 VMs accessing 8 spinners could cause some serious
> latency.
>
>
>
> How many osd nodes do you plan to deploy?
> --> Depends on the VDIs to deploy. If customer wants to deploy 100 VDIs
> then 2 OSD nodes will be deployed.
>
>
> Minimum cluster size should be 3 because you are making 3 replicas with
> min_size 2. If you lose 1 host in a cluster of 2, you will likely lose
> access to data because 2 replicas existed on the host that went down. You
> will have a bad time if you run a cluster with 2 replicas.
>
>
> What will the network look like?
> --> I'm planning to use 10G. I don't know if 1Gb would be enough.
>
>
> For the sake of latency alone, you want 10gbps sfp+
>
>
> Are you sure Ceph is the right solution for you?
> --> Yes, I have tested some others like gluster but looks ceph is the one
> that fits better to my solution.
>
> Have you read and do you understand the architecture docs for Ceph?
> --> Absolutely.
>
> Thanks a lot!
>
>
>
> 2017-11-13 17:27 GMT+01:00 Brady Deetz :
>
>> I can't think of a reason to use raid 5 and ceph together, even in a vdi
>> instance. You're going to want throughput for this use case. What you can
>> do is set the 

[ceph-users] Adding a monitor freezes the cluster

2017-11-13 Thread Bishoy Mikhael
Hi All,

I've tried adding 2 monitors to a 3-node cluster with 1 monitor, 1 MGR and
1 MDS.
The cluster was in a CLEAN state when it had just 1 monitor.

# ceph status

  cluster:

id: 46a122a0-8670-4935-b644-399e744c1c03

health: HEALTH_OK



  services:

mon: 1 daemons, quorum lingcod

mgr: lingcod(active)

mds: NIO-1/1/1 up  {0=lingcod=up:active}

osd: 18 osds: 18 up, 18 in



  data:

pools:   4 pools, 1700 pgs

objects: 77489 objects, 301 GB

usage:   906 GB used, 112 TB / 113 TB avail

pgs: 1700 active+clean


I've done the following on the second node in the cluster, trying to add a
monitor, but things went wrong and now the cluster is frozen; I can't even
query the cluster status.


From the node I wanted to add as a monitor, I issued the following
commands:

# scp -p ${initial_monitor_ip}:/etc/ceph/ceph.client.admin.keyring
/etc/ceph/


# ceph-authtool --create-keyring /etc/ceph/${cluster_name}.mon.keyring
--gen-key -n mon. --cap mon 'allow *'


# ceph-authtool /etc/ceph/${cluster_name}.mon.keyring --import-keyring
/etc/ceph/${cluster_name}.client.admin.keyring


# ceph auth caps client.admin osd 'allow *' mds 'allow *' mon 'allow *' mgr
'allow *'


# monmaptool --create --add ${hostname} ${ip_address} --fsid ${uuid}
/etc/ceph/monmap


# mkdir /var/lib/ceph/mon/${cluster_name}-${hostname}

# chown -R ceph:ceph /var/lib/ceph/mon/${cluster_name}-${hostname}

# chmod +r /etc/ceph/${cluster_name}.mon.keyring

# chmod +r /etc/ceph/monmap


# sudo -u ceph ceph-mon --cluster ${cluster_name} --mkfs -i ${hostname}
--monmap /etc/ceph/monmap --keyring /etc/ceph/${cluster_name}.mon.keyring
--fsid ${uuid}


# touch /var/lib/ceph/mon/${cluster_name}-${hostname}/done


# systemctl start ceph-mon@${hostname}


# ceph daemon mon.taulog add_bootstrap_peer_hint lingcod



Then, when I found that the cluster was still reporting 1 monitor, I issued
the following command on the first monitor node:
# ceph mon add taulog ${taulog_IP}
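For comparison, a hedged sketch of the usual manual procedure for adding a monitor. The key differences from the commands above are that the keyring and monmap are fetched from the existing cluster rather than generated fresh, and the new daemon should be running so that quorum can form -- going from one monitor to two means quorum requires 2 of 2:

```shell
# Run on the NEW monitor host; assumes the admin keyring is in place.
HOST=$(hostname -s)
ceph auth get mon. -o /tmp/mon.keyring     # the cluster's mon keyring
ceph mon getmap -o /tmp/monmap             # the cluster's current monmap
sudo -u ceph ceph-mon --mkfs -i "$HOST" \
    --monmap /tmp/monmap --keyring /tmp/mon.keyring
systemctl start ceph-mon@"$HOST"
```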


Regards,
Bishoy


[ceph-users] ceph-disk should wait for device file /dev/sdXX file to be created before trying to run mkfs

2017-11-13 Thread Subhachandra Chandra
Hi,

I am using ceph-ansible to deploy ceph to run as a container on VMs
running on my laptop. The VMs run CoreOS and the docker image being
installed has the tag "tag-build-master-luminous-ubuntu-16.04". The backend
is "bluestore".

  While running the "ceph-osd-prepare" stage, the installation fails while
trying to create an XFS file system on /dev/sdX1. The issue seems to be
that the device file /dev/sdX1 is not visible inside the container when this
command is run; the file becomes visible shortly afterwards. I
verified that the file did exist shortly after the command failed by
looking inside the container and on the host.

When creating an OSD node with two data drives, the command sometimes
succeeds on one or more of the drives while the others fail. When run
enough times, it succeeds on all the drives on that node.

populate_data_path_device: Creating xfs fs on /dev/sdb1
command_check_call: Running command: /sbin/mkfs -t xfs -f -i size=2048 --
/dev/sdb1
/dev/sdb1: No such file or directory

It looks like ceph-disk should wait for the device file to exist before
running populate_data_path_device. Does this seem correct, or am I hitting
some other issue?
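As a sketch of the kind of wait being described -- a hypothetical helper, not ceph-disk's actual code; since the device node is created by udev, running `udevadm settle` before mkfs may be the cleaner fix:

```python
import os
import time

def wait_for_device(path, timeout=10.0, interval=0.1):
    """Poll until a device node exists, or give up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.exists(path):
            return True
        time.sleep(interval)
    return False

# Illustrative: only run mkfs once the partition's device node appears.
if wait_for_device("/dev/sdb1", timeout=1.0):
    print("device present; safe to run: mkfs -t xfs -f -i size=2048 -- /dev/sdb1")
else:
    print("timed out waiting for /dev/sdb1")
```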

Thanks
Chandra


Re: [ceph-users] Object gateway and LDAP Auth

2017-11-13 Thread Josh Haft
Finally got back around to working on this and wanted to provide a solution
in case anyone else runs into the same problem.

I was able to reproduce the problem using s3cmd, and noticed different
calls utilized different signature versions. Doing a GET operation on '/'
seemed to use v2 while a 'make bucket' command attempted to use v4. Since
the former succeeded and the latter failed, I called s3cmd with
'--signature-v2' and now all operations work. I'm still not able to use
boto3, but it's no longer an LDAP issue.
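If the remaining boto3 trouble is also signature-related, botocore's Config can pin the signature version from the client side. A sketch under that assumption -- the endpoint and credentials are placeholders, and whether your RGW accepts only v2 for these operations needs to be confirmed on your setup:

```python
import boto3
from botocore.client import Config

# Placeholders: substitute your RGW endpoint and the radosgw-token output.
s3 = boto3.client(
    "s3",
    endpoint_url="https://rgw.example.com:8081",
    aws_access_key_id="LDAP_TOKEN_HERE",
    aws_secret_access_key="SECRET_HERE",
    config=Config(signature_version="s3"),  # "s3" = SigV2-style; "s3v4" = SigV4
)

s3.create_bucket(Bucket="foobar")  # the call that previously attempted v4
```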

Josh



On Tue, Sep 5, 2017 at 10:26 AM, Josh Haft  wrote:

> Thanks for your suggestions, Matt. ldapsearch functionality from the rados
> gw machines works fine using the same parameters specified in ceph.conf
> (uri, binddn, searchdn, ldap_secret). As expected I see network traffic
> to/from the ldap host when performing a search as well.
>
> The only configuration I have in /etc/openldap/ldap.conf is 'TLSREQCERT
> demand' and TLS_CACERTDIR pointing at the location of my certdb... is there
> something else required here for ceph-rgw or does it look elsewhere?
>
> Josh
>
>
>
>
> On Fri, Sep 1, 2017 at 11:15 PM, Matt Benjamin 
> wrote:
>
>> Hi Josh,
>>
>> I'm not certain, but you might try disabling the searchfilter to start
>> with.  If you're not seeing traffic, I would focus on verifying ldap
>> search connectivity using the same credentials, using the openldap
>> client, to rule out something low level.
>>
>> Matt
>>
>>
>> On Thu, Aug 31, 2017 at 3:33 PM, Josh  wrote:
>> > Hello!
>> >
>> > I've setup LDAP authentication on an object gateway and am attempting to
>> > create a bucket via s3 using python's boto3. It works fine using the
>> access
>> > and secret key for a radosgw user, but access is denied using a token
>> > generated via radosgw-token with the LDAP user's credentials. The user
>> does
>> > exist in the directory (I'm using Active Directory), and I am able to
>> query
>> > for that user using the creds specified in rgw_ldap_binddn and
>> > rgw_ldap_secret.
>> >
>> > I've bumped the rgw logging to 20 and can see the request come in, but
>> it
>> > ultimately gets denied:
>> > 2017-08-30 15:44:55.754721 7f4878ff9700  2 req 1:0.76:s3:PUT
>> > /foobar:create_bucket:authorizing
>> > 2017-08-30 15:44:55.754738 7f4878ff9700 10 v4 signature format = 
>> > 2017-08-30 15:44:55.754746 7f4878ff9700 10 v4 credential format =
>> > /20170830/us-east-1/s3/aws4_request
>> > 2017-08-30 15:44:55.754750 7f4878ff9700 10 access key id = 
>> > 2017-08-30 15:44:55.754755 7f4878ff9700 10 credential scope =
>> > 20170830/us-east-1/s3/aws4_request
>> > 2017-08-30 15:44:55.754769 7f4878ff9700 20 get_system_obj_state:
>> > rctx=0x7f4878ff2060 obj=default.rgw.users.keys:
>> state=0x7f48f40131a8
>> > s->prefetch_data=0
>> > 2017-08-30 15:44:55.754778 7f4878ff9700 10 cache get:
>> > name=default.rgw.users.keys+ : miss
>> > 2017-08-30 15:44:55.755312 7f4878ff9700 10 cache put:
>> > name=default.rgw.users.keys+ info.flags=0
>> > 2017-08-30 15:44:55.755321 7f4878ff9700 10 adding
>> > default.rgw.users.keys+ to cache LRU end
>> > 2017-08-30 15:44:55.755328 7f4878ff9700 10 error reading user info,
>> uid=
>> > can't authenticate
>> > 2017-08-30 15:44:55.755330 7f4878ff9700 10 failed to authorize request
>> > 2017-08-30 15:44:55.755331 7f4878ff9700 20 handler->ERRORHANDLER:
>> > err_no=-2028 new_err_no=-2028
>> > 2017-08-30 15:44:55.755393 7f4878ff9700  2 req 1:0.000747:s3:PUT
>> > /foobar:create_bucket:op status=0
>> > 2017-08-30 15:44:55.755398 7f4878ff9700  2 req 1:0.000752:s3:PUT
>> > /foobar:create_bucket:http status=403
>> > 2017-08-30 15:44:55.755402 7f4878ff9700  1 == req done
>> > req=0x7f4878ff3710 op status=0 http_status=403 ==
>> > 2017-08-30 15:44:55.755409 7f4878ff9700 20 process_request() returned
>> -2028
>> >
>> > I am also running a tcpdump on the machine while I see these log
>> messages,
>> > but strangely I see no traffic destined for my configured LDAP server.
>> > Here's some info on my setup. It seems like I'm missing something very
>> > obvious; any help would be appreciated!
>> >
>> > # rpm -q ceph-radosgw
>> > ceph-radosgw-10.2.9-0.el7.x86_64
>> >
>> > # grep rgw /etc/ceph/ceph.conf
>> > [client.rgw.hostname]
>> > rgw_frontends = civetweb port=8081s ssl_certificate=/path/to/private/key.pem
>> > debug rgw = 20
>> > rgw_s3_auth_use_ldap = true
>> > rgw_ldap_secret = "/path/to/creds/file"
>> > rgw_ldap_uri = "ldaps://hostname.domain.com:636"
>> > rgw_ldap_binddn = "CN=valid_user,OU=Accounts,DC=domain,DC=com"
>> > rgw_ldap_searchdn = "ou=Accounts,dc=domain,dc=com"
>> > rgw_ldap_dnattr = "uid"
>> > rgw_ldap_searchfilter = "objectclass=user"
>> >
>> >
>> > Thanks,
>> > Josh
>> >
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>>
>>
>>
>> --
>>
>> Matt Benjamin
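
One thing worth ruling out (an assumption, not confirmed by the log above): rgw only queries LDAP when the S3 access key the client presents is an encoded RGW LDAP token. With a plain username/password access key the request fails in the local auth path first, which would also explain seeing no LDAP traffic in tcpdump. A sketch of generating a token with radosgw-token (the credentials are placeholders):

```shell
# Generate an RGW LDAP token; the LDAP user/password below are placeholders.
export RGW_ACCESS_KEY_ID="ldapuser"
export RGW_SECRET_ACCESS_KEY="ldappassword"
radosgw-token --encode --ttype=ad   # use --ttype=ldap for non-AD servers
# The printed base64 token is then used as the S3 access key on the client
# (the secret key is ignored in that case).
```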

[ceph-users] Reuse pool id

2017-11-13 Thread Karun Josy
Hi,

Is there any way we can change or reuse a pool ID?
I had created and deleted a lot of test pools, so the IDs now look like
this:

---
$ ceph osd lspools
34 imagepool,37 cvmpool,40 testecpool,41 ecpool1,
--

Can I change them to 0, 1, 2, 3, etc.?

Karun
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Performance, and how much wiggle room there is with tunables

2017-11-13 Thread Robert Stanford
ceph osd pool create scbench 100 100
rados bench -p scbench 10 write --no-cleanup
rados bench -p scbench 10 seq
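
Two related invocations that are easy to miss (a sketch, using the same pool name as above): rados bench also has a random-read mode, and the objects left behind by --no-cleanup should eventually be removed:

```shell
# After the write/seq passes above:
rados bench -p scbench 10 rand   # random-read benchmark
rados -p scbench cleanup         # delete the objects left by --no-cleanup
```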


On Mon, Nov 13, 2017 at 1:28 AM, Rudi Ahlers  wrote:

> Would you mind telling me what rados command set you use, and share the
> output? I would like to compare it to our server as well.
>
> On Fri, Nov 10, 2017 at 6:29 AM, Robert Stanford 
> wrote:
>
>>
>>  In my cluster, rados bench shows about 1GB/s bandwidth.  I've done some
>> tuning:
>>
>> [osd]
>> osd op threads = 8
>> osd disk threads = 4
>> osd recovery max active = 7
>>
>>
>> I was hoping to get much better bandwidth.  My network can handle it, and
>> my disks are pretty fast as well.  Are there any major tunables I can play
>> with to increase what will be reported by "rados bench"?  Am I pretty much
>> stuck around the bandwidth it reported?
>>
>>  Thank you
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>
>
> --
> Kind Regards
> Rudi Ahlers
> Website: http://www.rudiahlers.co.za
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] HW Raid vs. Multiple OSD

2017-11-13 Thread Lionel Bouton
Le 13/11/2017 à 15:47, Oscar Segarra a écrit :
> Thanks Mark, Peter, 
>
> For clarification, the configuration with RAID5 is having many servers
> (2 or more) with RAID5 and CEPH on top of it. Ceph will replicate data
> between servers. Of course, each server will have just one OSD daemon
> managing a big disk.
>
> It looks like RAID5 + 1 Ceph daemon is functionally the same as 8
> Ceph daemons.

Functionally it's the same but RAID5 will kill your write performance.

For example, if you start with 3 OSD hosts and a pool size of 3, then due
to RAID5 every write on your Ceph cluster will imply, on each server, a
read on all disks minus one followed by a write on *all* the disks of the
cluster.

If you use one OSD per disk you'll have a read on one disk only and a
write on 3 disks only: you'll get approximately 8 times the IOPS for
writes (with 8 disks per server). Clever RAID5 logic can minimize this
for some I/O patterns, but it is a bet and will never be as good as what
you'll get with one disk per OSD.
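
The argument above can be put in back-of-envelope numbers. This is a rough model under stated assumptions (8 disks per host, pool size 3, and a pessimistic full-stripe RAID5 update that reads all-but-one disk and rewrites every disk), not a measured result:

```shell
# Device operations generated by one client write, under the assumptions above.
disks_per_host=8
pool_size=3

raid5_reads=$(( disks_per_host - 1 ))   # full-stripe update: read all disks minus one
raid5_writes=$disks_per_host            # ...then write every disk
raid5_ops=$(( pool_size * (raid5_reads + raid5_writes) ))

plain_ops=$(( pool_size * 1 ))          # one OSD per disk: one disk write per replica

echo "device ops per client write, RAID5 per host: $raid5_ops"   # 45
echo "device ops per client write, one OSD/disk:   $plain_ops"   # 3
echo "approx write-IOPS factor:                    ${disks_per_host}x"
```

Real small-write RAID5 controllers usually do a read-modify-write of only data+parity, so the true penalty sits somewhere between 4x and the 8x quoted above.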

Best regards,

Lionel
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] HW Raid vs. Multiple OSD

2017-11-13 Thread Oscar Segarra
Thanks Mark, Peter,

For clarification, the configuration with RAID5 is having many servers (2
or more) with RAID5 and CEPH on top of it. Ceph will replicate data between
servers. Of course, each server will have just one OSD daemon managing a
big disk.

It looks like RAID5 + 1 Ceph daemon is functionally the same as 8 Ceph
daemons.

I appreciate a lot your comments!

Oscar Segarra



2017-11-13 15:37 GMT+01:00 Marc Roos :

>
> Also keep in mind whether you want to have failover in the future. We
> were running a 2nd server and replicating the RAID arrays via DRBD.
> Expanding that storage is quite a hassle, compared to just adding a few
> OSDs.
>
>
>
> -Original Message-
> From: Oscar Segarra [mailto:oscar.sega...@gmail.com]
> Sent: maandag 13 november 2017 15:26
> To: Peter Maloney
> Cc: ceph-users
> Subject: Re: [ceph-users] HW Raid vs. Multiple OSD
>
> Hi Peter,
>
> Thanks a lot for your consideration in terms of storage consumption.
>
> The other question is about having one OSD vs. 8 OSDs... will 8 OSDs
> consume more CPU than 1 OSD (RAID5)?
>
> As I want to share compute and osd in the same box, resources consumed
> by OSD can be a handicap.
>
> Thanks a lot.
>
> 2017-11-13 12:59 GMT+01:00 Peter Maloney
> :
>
>
> Once you've replaced an OSD, you'll see it is quite simple... doing
> it for a few is not much more work (you've scripted it, right?). I don't
> see RAID as giving any benefit here at all. It's not tricky...it's
> perfectly normal operation. Just get used to ceph, and it'll be as
> normal as replacing a RAID disk. And for performance degradation, maybe
> it could be better on either... or better on ceph if you don't mind
> setting the rate to the lowest... but when the QoS functionality is
> ready, probably ceph will be much better. Also RAID will cost you more
> for hardware.
>
> And raid5 is really bad for IOPS. And ceph already replicates, so
> you will have 2 layers of redundancy... and ceph does it cluster wide,
> not just one machine. Using ceph with replication is like all your free
> space as hot spares... you could lose 2 disks on all your machines, and
> it can still run (assuming it had time to recover in between, and enough
> space). And you don't want min_size=1, and if you have 2 layers of
> redundancy, you'll be tempted to do that probably.
>
> But for some workloads, like RBD, ceph doesn't balance out the
> workload very evenly for a specific client, only many clients at once...
> raid might help solve that, but I don't see it as worth it.
>
> I would just software RAID1 the OS and mons, and mds, not the OSDs.
>
>
> On 11/13/17 12:26, Oscar Segarra wrote:
>
>
> Hi,
>
> I'm designing my infrastructure. I want to provide 8TB (8
> disks x 1TB each) of data per host just for Microsoft Windows 10 VDI. In
> each host I will have storage (ceph osd) and compute (on kvm).
>
> I'd like to hear your opinion about these two
> configurations:
>
> 1.- RAID5 with 8 disks (I will have 7TB but for me it is
> enough) + 1 OSD daemon
> 2.- 8 OSD daemons
>
> I'm a little bit worried that 8 osd daemons can affect
> performance because of all the jobs running and scrubbing.
>
> Another question is the procedure for replacing a failed
> disk. In case of a big RAID, replacement is direct. In case of many
> OSDs, the procedure is a little bit tricky.
>
>
> http://ceph.com/geen-categorie/admin-guide-replacing-a-failed-disk-in-a-
> ceph-cluster/
>
>
> What is your advice?
>
> Thanks a lot everybody in advance...
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
>
>
>
>
> --
>
> 
> Peter Maloney
> Brockmann Consult
> Max-Planck-Str. 2
> 21502 Geesthacht
> Germany
> Tel: +49 4152 889 300 
> Fax: +49 4152 889 333 
> E-mail: peter.malo...@brockmann-consult.de
> 
> Internet: http://www.brockmann-consult.de
> 
> 
>
>
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] HW Raid vs. Multiple OSD

2017-11-13 Thread Marc Roos
 
Also keep in mind whether you want to have failover in the future. We were
running a 2nd server and replicating the RAID arrays via DRBD. Expanding
that storage is quite a hassle, compared to just adding a few OSDs.



-Original Message-
From: Oscar Segarra [mailto:oscar.sega...@gmail.com] 
Sent: maandag 13 november 2017 15:26
To: Peter Maloney
Cc: ceph-users
Subject: Re: [ceph-users] HW Raid vs. Multiple OSD

Hi Peter, 

Thanks a lot for your consideration in terms of storage consumption. 

The other question is about having one OSD vs. 8 OSDs... will 8 OSDs
consume more CPU than 1 OSD (RAID5)?

As I want to share compute and osd in the same box, resources consumed 
by OSD can be a handicap.

Thanks a lot.

2017-11-13 12:59 GMT+01:00 Peter Maloney 
:


Once you've replaced an OSD, you'll see it is quite simple... doing 
it for a few is not much more work (you've scripted it, right?). I don't 
see RAID as giving any benefit here at all. It's not tricky...it's 
perfectly normal operation. Just get used to ceph, and it'll be as 
normal as replacing a RAID disk. And for performance degradation, maybe 
it could be better on either... or better on ceph if you don't mind 
setting the rate to the lowest... but when the QoS functionality is 
ready, probably ceph will be much better. Also RAID will cost you more 
for hardware.

And raid5 is really bad for IOPS. And ceph already replicates, so 
you will have 2 layers of redundancy... and ceph does it cluster wide, 
not just one machine. Using ceph with replication is like all your free 
space as hot spares... you could lose 2 disks on all your machines, and 
it can still run (assuming it had time to recover in between, and enough 
space). And you don't want min_size=1, and if you have 2 layers of 
redundancy, you'll be tempted to do that probably.

But for some workloads, like RBD, ceph doesn't balance out the 
workload very evenly for a specific client, only many clients at once... 
raid might help solve that, but I don't see it as worth it.

I would just software RAID1 the OS and mons, and mds, not the OSDs.


On 11/13/17 12:26, Oscar Segarra wrote:


Hi,  

I'm designing my infrastructure. I want to provide 8TB (8
disks x 1TB each) of data per host just for Microsoft Windows 10 VDI. In
each host I will have storage (ceph osd) and compute (on kvm).

I'd like to hear your opinion about these two configurations:

1.- RAID5 with 8 disks (I will have 7TB but for me it is
enough) + 1 OSD daemon
2.- 8 OSD daemons

I'm a little bit worried that 8 osd daemons can affect
performance because of all the jobs running and scrubbing.

Another question is the procedure for replacing a failed
disk. In case of a big RAID, replacement is direct. In case of many
OSDs, the procedure is a little bit tricky.


http://ceph.com/geen-categorie/admin-guide-replacing-a-failed-disk-in-a-ceph-cluster/
 

 


What is your advice?

Thanks a lot everybody in advance...

 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
 




-- 


Peter Maloney
Brockmann Consult
Max-Planck-Str. 2
21502 Geesthacht
Germany
Tel: +49 4152 889 300  
Fax: +49 4152 889 333  
E-mail: peter.malo...@brockmann-consult.de 
 
Internet: http://www.brockmann-consult.de 
 




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] HW Raid vs. Multiple OSD

2017-11-13 Thread Oscar Segarra
Hi Peter,

Thanks a lot for your consideration in terms of storage consumption.

The other question is about having one OSD vs. 8 OSDs... will 8 OSDs
consume more CPU than 1 OSD (RAID5)?

As I want to share compute and osd in the same box, resources consumed by
OSD can be a handicap.

Thanks a lot.

2017-11-13 12:59 GMT+01:00 Peter Maloney :

> Once you've replaced an OSD, you'll see it is quite simple... doing it for
> a few is not much more work (you've scripted it, right?). I don't see RAID
> as giving any benefit here at all. It's not tricky...it's perfectly normal
> operation. Just get used to ceph, and it'll be as normal as replacing a
> RAID disk. And for performance degradation, maybe it could be better on
> either... or better on ceph if you don't mind setting the rate to the
> lowest... but when the QoS functionality is ready, probably ceph will be
> much better. Also RAID will cost you more for hardware.
>
> And raid5 is really bad for IOPS. And ceph already replicates, so you will
> have 2 layers of redundancy... and ceph does it cluster wide, not just one
> machine. Using ceph with replication is like all your free space as hot
> spares... you could lose 2 disks on all your machines, and it can still run
> (assuming it had time to recover in between, and enough space). And you
> don't want min_size=1, and if you have 2 layers of redundancy, you'll be
> tempted to do that probably.
>
> But for some workloads, like RBD, ceph doesn't balance out the workload
> very evenly for a specific client, only many clients at once... raid might
> help solve that, but I don't see it as worth it.
>
> I would just software RAID1 the OS and mons, and mds, not the OSDs.
>
>
> On 11/13/17 12:26, Oscar Segarra wrote:
>
> Hi,
>
> I'm designing my infrastructure. I want to provide 8TB (8 disks x 1TB
> each) of data per host just for Microsoft Windows 10 VDI. In each host I
> will have storage (ceph osd) and compute (on kvm).
>
> I'd like to hear your opinion about these two configurations:
>
> 1.- RAID5 with 8 disks (I will have 7TB but for me it is enough) + 1 OSD
> daemon
> 2.- 8 OSD daemons
>
> I'm a little bit worried that 8 osd daemons can affect performance because
> of all the jobs running and scrubbing.
>
> Another question is the procedure for replacing a failed disk. In
> case of a big RAID, replacement is direct. In case of many OSDs, the
> procedure is a little bit tricky.
>
> http://ceph.com/geen-categorie/admin-guide-replacing-a-failed-disk-in-a-
> ceph-cluster/
>
> What is your advice?
>
> Thanks a lot everybody in advance...
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
> --
>
> 
> Peter Maloney
> Brockmann Consult
> Max-Planck-Str. 2
> 21502 Geesthacht
> Germany
> Tel: +49 4152 889 300
> Fax: +49 4152 889 333
> E-mail: peter.malo...@brockmann-consult.de
> Internet: http://www.brockmann-consult.de
> 
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Active+clean PGs reported many times in log

2017-11-13 Thread Matteo Dacrema
Hi,
I noticed that sometimes the monitors start to log active+clean pgs many
times in the same line. For example I have 18432 PGs and the log shows
"2136 active+clean, 28 active+clean, 2 active+clean+scrubbing+deep, 16266
active+clean;". After a minute the monitor starts to log correctly again.

Is this normal?

2017-11-13 11:05:08.876724 7fb35d17d700  0 log_channel(cluster) log [INF] : 
pgmap v99797105: 18432 pgs: 3 active+clean+scrubbing+deep, 18429 active+clean; 
59520 GB data, 129 TB used, 110 TB / 239 TB avail; 40596 kB/s rd, 89723 kB/s 
wr, 4899 op/s
2017-11-13 11:05:09.911266 7fb35d17d700  0 log_channel(cluster) log [INF] : 
pgmap v99797106: 18432 pgs: 2 active+clean+scrubbing+deep, 18430 active+clean; 
59520 GB data, 129 TB used, 110 TB / 239 TB avail; 45931 kB/s rd, 114 MB/s wr, 
6179 op/s
2017-11-13 11:05:10.751378 7fb359cfb700  0 mon.controller001@0(leader) e1 
handle_command mon_command({"prefix": "osd pool stats", "format": "json"} v 0) 
v1
2017-11-13 11:05:10.751599 7fb359cfb700  0 log_channel(audit) log [DBG] : 
from='client.? 10.16.24.127:0/547552484' entity='client.telegraf' 
cmd=[{"prefix": "osd pool stats", "format": "json"}]: dispatch
2017-11-13 11:05:10.926839 7fb35d17d700  0 log_channel(cluster) log [INF] : 
pgmap v99797107: 18432 pgs: 3 active+clean+scrubbing+deep, 18429 active+clean; 
59520 GB data, 129 TB used, 110 TB / 239 TB avail; 47617 kB/s rd, 134 MB/s wr, 
7414 op/s
2017-11-13 11:05:11.921115 7fb35d17d700  1 mon.controller001@0(leader).osd 
e120942 e120942: 216 osds: 216 up, 216 in
2017-11-13 11:05:11.926818 7fb35d17d700  0 log_channel(cluster) log [INF] : 
osdmap e120942: 216 osds: 216 up, 216 in
2017-11-13 11:05:11.984732 7fb35d17d700  0 log_channel(cluster) log [INF] : 
pgmap v99797109: 18432 pgs: 3 active+clean+scrubbing+deep, 18429 active+clean; 
59520 GB data, 129 TB used, 110 TB / 239 TB avail; 54110 kB/s rd, 115 MB/s wr, 
7827 op/s
2017-11-13 11:05:13.085799 7fb35d17d700  0 log_channel(cluster) log [INF] : 
pgmap v99797110: 18432 pgs: 973 active+clean, 12 active+clean, 3 
active+clean+scrubbing+deep, 17444 active+clean; 59520 GB data, 129 TB used, 
110 TB / 239 TB avail; 115 MB/s rd, 90498 kB/s wr, 8490 op/s
2017-11-13 11:05:14.181219 7fb35d17d700  0 log_channel(cluster) log [INF] : 
pgmap v99797111: 18432 pgs: 2136 active+clean, 28 active+clean, 2 
active+clean+scrubbing+deep, 16266 active+clean; 59520 GB data, 129 TB used, 
110 TB / 239 TB avail; 136 MB/s rd, 94461 kB/s wr, 10237 op/s
2017-11-13 11:05:15.324630 7fb35d17d700  0 log_channel(cluster) log [INF] : 
pgmap v99797112: 18432 pgs: 3179 active+clean, 44 active+clean, 2 
active+clean+scrubbing+deep, 15207 active+clean; 59519 GB data, 129 TB used, 
110 TB / 239 TB avail; 184 MB/s rd, 81743 kB/s wr, 13786 op/s
2017-11-13 11:05:16.381452 7fb35d17d700  0 log_channel(cluster) log [INF] : 
pgmap v99797113: 18432 pgs: 3600 active+clean, 52 active+clean, 2 
active+clean+scrubbing+deep, 14778 active+clean; 59518 GB data, 129 TB used, 
110 TB / 239 TB avail; 208 MB/s rd, 77342 kB/s wr, 14382 op/s
2017-11-13 11:05:17.272757 7fb3570f2700  1 leveldb: Level-0 table #26314650: 
started
2017-11-13 11:05:17.390808 7fb3570f2700  1 leveldb: Level-0 table #26314650: 
18281928 bytes OK
2017-11-13 11:05:17.392636 7fb3570f2700  1 leveldb: Delete type=0 #26314647

2017-11-13 11:05:17.397516 7fb3570f2700  1 leveldb: Manual compaction at 
level-0 from 'pgmap\x0099796362' @ 72057594037927935 : 1 .. 'pgmap\x0099796613' 
@ 0 : 0; will stop at 'pgmap_pg\x006.ff' @ 29468156273 : 1


Thank you
Matteo

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] No ops on some OSD

2017-11-13 Thread Marc Roos
 
Very very nice, Thanks! Is there a heavy penalty to pay for enabling 
this? 



-Original Message-
From: John Spray [mailto:jsp...@redhat.com] 
Sent: maandag 13 november 2017 11:48
To: Marc Roos
Cc: iswaradrmwn; ceph-users
Subject: Re: [ceph-users] No ops on some OSD

On Sun, Nov 12, 2017 at 2:56 PM, Marc Roos  
wrote:
>
> [@c03 ~]# ceph osd status
> 2017-11-12 15:54:13.164823 7f478a6ad700 -1 WARNING: the following 
> dangerous and experimental features are enabled: bluestore
> 2017-11-12 15:54:13.211219 7f478a6ad700 -1 WARNING: the following 
> dangerous and experimental features are enabled: bluestore
> no valid command found; 10 closest matches:
> osd map   {}
> osd lspools {}
> osd count-metadata 
> osd versions
> osd find 
> osd metadata {}
> osd getmaxosd
> osd ls-tree {} {}
> osd getmap {}
> osd getcrushmap {}
> Error EINVAL: invalid command

The "osd status" command comes from the ceph-mgr module called "status" 
-- this is enabled by default but it's possible that it got switched off 
on your system?  Check your ceph-mgr logs and whether it's in "ceph mgr 
module ls" (or try enabling with "ceph mgr module enable status")

John

>
>
>
> -Original Message-
> From: I Gede Iswara Darmawan [mailto:iswaradr...@gmail.com]
> Sent: zondag 12 november 2017 2:17
> Cc: ceph-users
> Subject: Re: [ceph-users] No ops on some OSD
>
> Still the same syntax (ceph osd status)
>
> Thanks
>
> Regards,
>
> I Gede Iswara Darmawan
>
> Information System - School of Industrial and System Engineering
>
> Telkom University
>
> P / SMS / WA : 081 322 070719
>
> E : iswaradr...@gmail.com / iswaradr...@live.com
>
>
> On Sat, Nov 4, 2017 at 6:11 PM, Marc Roos 
> wrote:
>
>
>
>
> What is the new syntax for "ceph osd status" for luminous?
>
>
>
>
>
> -Original Message-
> From: I Gede Iswara Darmawan [mailto:iswaradr...@gmail.com]
> Sent: donderdag 2 november 2017 6:19
> To: ceph-users@lists.ceph.com
> Subject: [ceph-users] No ops on some OSD
>
> Hello,
>
> I want to ask about my problem. There's some OSD that dont 
> have any load
> (indicated with No ops on that OSD).
>
> Hereby I attached the ceph osd status result :
> https://pastebin.com/fFLcCbpk . Look at OSD 17,61 and 72. 
> There's no
> load or operation happened at that OSD. How to fix this?
>
> Thank you
> Regards,
>
> I Gede Iswara Darmawan
>
> Information System - School of Industrial and System 
> Engineering
>
> Telkom University
>
> P / SMS / WA : 081 322 070719
>
> E : iswaradr...@gmail.com / iswaradr...@live.com
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] CRUSH - adding device class to existing rule without causing complete rebalance

2017-11-13 Thread Patrick Fruh
Hi everyone,

I only have a single rule in my crushmap and only OSDs classed as hdd (after 
the luminous update):

rule replicated_ruleset {
id 0
type replicated
min_size 1
max_size 10
step take default
step chooseleaf firstn 0 type host
step emit
}

Since luminous added device classes, I tried updating the rule to

step take default class hdd

so I could later add a rule for ssds without the existing pools trying to 
balance on those.
Since all existing OSDs are classed as hdd, I thought nothing should change
with this rule edit (the OSDs to be used are the same before and after the
change). However, after injecting the new crushmap the whole cluster started
rebalancing, so I quickly re-injected the old crushmap.

So, is there any way to limit my existing pools to HDDs without causing a 
complete rebalance (at least that's what it looked like)?
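
One way to check in advance whether an edited rule actually changes placements is crushtool's offline test mode. A sketch (file names are placeholders; rule id, replica count, and pool set must be adapted to the cluster):

```shell
# Extract, decompile, edit, recompile, and compare PG->OSD mappings offline.
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
# ... edit crushmap.txt, e.g. change "step take default" to
#     "step take default class hdd" ...
crushtool -c crushmap.txt -o crushmap-new.bin
crushtool -i crushmap.bin     --test --rule 0 --num-rep 3 --show-mappings > before.txt
crushtool -i crushmap-new.bin --test --rule 0 --num-rep 3 --show-mappings > after.txt
diff before.txt after.txt   # an empty diff means no data movement is expected
```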

Best,
Patrick
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] HW Raid vs. Multiple OSD

2017-11-13 Thread Michael

Oscar Segarra wrote:

> I'd like to hear your opinion about these two configurations:
>
> 1.- RAID5 with 8 disks (I will have 7TB but for me it is enough) + 1
> OSD daemon
> 2.- 8 OSD daemons

You mean 1 OSD daemon on top of RAID5? I don't think I'd do that. You'll
probably want redundancy at Ceph's level anyhow, and then where is the
point...?

> I'm a little bit worried that 8 osd daemons can affect performance
> because of all the jobs running and scrubbing.

If you ran RAID instead of Ceph, RAID might still perform better. But I
don't believe anything much changes for the better if you run Ceph on top
of RAID rather than on top of individual OSDs, unless your configuration
is bad. I generally don't think you have to worry that much that a
reasonably modern machine can't handle running a few extra jobs, either.


But you could certainly do some tests on your hardware to be sure.

> Another question is the procedure for replacing a failed disk.
> In case of a big RAID, replacement is direct. In case of many OSDs,
> the procedure is a little bit tricky.
>
> http://ceph.com/geen-categorie/admin-guide-replacing-a-failed-disk-in-a-ceph-cluster/

I wasn't using Ceph in 2014, but at least in my limited experience,
today the most important step is done when you add the new drive and
activate an OSD on it.


You probably still want to remove the leftovers of the old failed OSD 
for it to not clutter your list, but as far as I can tell replication 
and so on will trigger *before* you remove it. (There is a configurable 
timeout for how long an OSD can be down, after which the OSD is 
essentially treated as dead already, at which point replication and 
rebalancing starts).
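
For reference, a sketch of the usual replacement sequence on Luminous (the OSD id and device path are placeholders; on older releases ceph-disk is used instead of ceph-volume):

```shell
# Retire the failed OSD and bring up a fresh one on the new drive.
ceph osd out osd.12              # if the down/out timer has not already marked
                                 # it out (mon_osd_down_out_interval, 600s default)
systemctl stop ceph-osd@12
ceph osd crush remove osd.12     # remove the leftovers from the CRUSH map
ceph auth del osd.12
ceph osd rm 12
ceph-volume lvm create --data /dev/sdX   # the new drive becomes a fresh OSD
```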



-Michael


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] HW Raid vs. Multiple OSD

2017-11-13 Thread Peter Maloney
Once you've replaced an OSD, you'll see it is quite simple... doing it
for a few is not much more work (you've scripted it, right?). I don't
see RAID as giving any benefit here at all. It's not tricky...it's
perfectly normal operation. Just get used to ceph, and it'll be as
normal as replacing a RAID disk. And for performance degradation, maybe
it could be better on either... or better on ceph if you don't mind
setting the rate to the lowest... but when the QoS functionality is
ready, probably ceph will be much better. Also RAID will cost you more
for hardware.

And raid5 is really bad for IOPS. And ceph already replicates, so you
will have 2 layers of redundancy... and ceph does it cluster wide, not
just one machine. Using ceph with replication is like having all your free
space as hot spares... you could lose 2 disks on all your machines, and
it can still run (assuming it had time to recover in between, and enough
space). And you don't want min_size=1, and if you have 2 layers of
redundancy, you'll be tempted to do that probably.

But for some workloads, like RBD, ceph doesn't balance out the workload
very evenly for a specific client, only many clients at once... raid
might help solve that, but I don't see it as worth it.

I would just software RAID1 the OS and mons, and mds, not the OSDs.

On 11/13/17 12:26, Oscar Segarra wrote:
> Hi, 
>
> I'm designing my infrastructure. I want to provide 8TB (8 disks x 1TB
> each) of data per host just for Microsoft Windows 10 VDI. In each host
> I will have storage (ceph osd) and compute (on kvm).
>
> I'd like to hear your opinion about these two configurations:
>
> 1.- RAID5 with 8 disks (I will have 7TB but for me it is enough) + 1
> OSD daemon
> 2.- 8 OSD daemons
>
> I'm a little bit worried that 8 osd daemons can affect performance
> because of all the jobs running and scrubbing.
>
> Another question is the procedure for replacing a failed disk.
> In case of a big RAID, replacement is direct. In case of many OSDs,
> the procedure is a little bit tricky.
>
> http://ceph.com/geen-categorie/admin-guide-replacing-a-failed-disk-in-a-ceph-cluster/
>
> What is your advice?
>
> Thanks a lot everybody in advance...
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


-- 


Peter Maloney
Brockmann Consult
Max-Planck-Str. 2
21502 Geesthacht
Germany
Tel: +49 4152 889 300
Fax: +49 4152 889 333
E-mail: peter.malo...@brockmann-consult.de
Internet: http://www.brockmann-consult.de


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ubuntu upgrade Zesty => Aardvark, Implications for Ceph?

2017-11-13 Thread Ranjan Ghosh

Hi everyone,

In January, support for Ubuntu Zesty will run out and we're planning to 
upgrade our servers to Aardvark. We have a two-node-cluster (and one 
additional monitoring-only server) and we're using the packages that 
come with the distro. We have mounted CephFS on the same server with the 
kernel client in FSTab. AFAIK, Aardvark includes Ceph 12.0. What would 
happen if we used the usual "do-release-upgrade" to upgrade the servers 
one-by-one? I assume the procedure described here 
"http://ceph.com/releases/v12-2-0-luminous-released/" (section "Upgrade 
from Jewel or Kraken") probably won't work for us, because 
"do-release-upgrade" will upgrade all packages (including the ceph ones) 
at once and then reboots the machine. So we cannot really upgrade only 
the monitoring nodes. And I'd rather avoid switching to PPAs beforehand. 
So, what are the real consequences if we upgrade all servers one-by-one 
with "do-release-upgrade" and then reboot all the nodes? Is it only the 
downtime that makes this not recommended, or could we lose data? Any other 
recommendations on how to tackle this?


Thank you / BR

Ranjan



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] HW Raid vs. Multiple OSD

2017-11-13 Thread Oscar Segarra
Hi,

I'm designing my infrastructure. I want to provide 8TB (8 disks x 1TB
each) of data per host just for Microsoft Windows 10 VDI. In each host I
will have storage (ceph osd) and compute (on kvm).

I'd like to hear your opinion about these two configurations:

1.- RAID5 with 8 disks (I will have 7TB but for me it is enough) + 1 OSD
daemon
2.- 8 OSD daemons

I'm a little bit worried that 8 osd daemons can affect performance because
of all the jobs running and scrubbing.

Another question is the procedure for replacing a failed disk. In
case of a big RAID, replacement is direct. In case of many OSDs, the
procedure is a little bit tricky.

http://ceph.com/geen-categorie/admin-guide-replacing-a-failed-disk-in-a-ceph-cluster/

What is your advice?

Thanks a lot everybody in advance...
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rocksdb: Corruption: missing start of fragmented record

2017-11-13 Thread Michael

Konstantin Shalygin wrote:
> I think Christian talks about version 12.2.2, not 12.2.*

Which isn't released yet, yes. I could try building the development 
repository if you think that has a chance of resolving the issue?


Although I'd still like to know how I could theoretically get my hands 
on these rocksdb files manually, if anyone knows how to do that? I still 
have no idea how.


I also reported this as a bug last week, in case anyone has information 
or the same issue:

http://tracker.ceph.com/issues/22044
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] 答复: 答复: Where can I find the fix commit of #3370 ?

2017-11-13 Thread Ilya Dryomov
On Mon, Nov 13, 2017 at 10:53 AM, 周 威  wrote:
> Hi, Ilya
>
> The kernel version is 3.10.106.
> Part of dmesg related to ceph:
> [7349718.004905] libceph: osd297 down
> [7349718.005190] libceph: osd299 down
> [7349785.671015] libceph: osd295 down
> [7350006.357509] libceph: osd291 weight 0x0 (out)
> [7350006.357795] libceph: osd292 weight 0x0 (out)
> [7350006.358075] libceph: osd293 weight 0x0 (out)
> [7350006.358356] libceph: osd294 weight 0x0 (out)
> [7350013.312399] libceph: osd289 weight 0x0 (out)
> [7350013.312683] libceph: osd290 weight 0x0 (out)
> [7350013.312964] libceph: osd296 weight 0x0 (out)
> [7350013.313244] libceph: osd298 weight 0x0 (out)
> [7350023.322571] libceph: osd288 weight 0x0 (out)
> [7350038.338217] libceph: osd297 weight 0x0 (out)
> [7350038.338501] libceph: osd299 weight 0x0 (out)
> [7350115.364496] libceph: osd295 weight 0x0 (out)
> [7350179.683200] libceph: osd294 weight 0x1 (in)
> [7350179.683495] libceph: osd294 up
> [7350193.654197] libceph: osd293 weight 0x1 (in)
> [7350193.654486] libceph: osd297 weight 0x1 (in)
> [7350193.654769] libceph: osd293 up
> [7350193.655046] libceph: osd297 up
> [7350228.750112] libceph: osd299 weight 0x1 (in)
> [7350228.750399] libceph: osd299 up
> [7350255.739415] libceph: osd289 weight 0x1 (in)
> [7350255.739700] libceph: osd289 up
> [7350268.578031] libceph: osd288 weight 0x1 (in)
> [7350268.578315] libceph: osd288 up
> [7383411.866068] libceph: osd299 down
> [7383558.405675] libceph: osd299 up
> [7383411.866068] libceph: osd299 down
> [7383558.405675] libceph: osd299 up
> [7387106.574308] libceph: osd291 weight 0x1 (in)
> [7387106.574593] libceph: osd291 up
> [7387124.168198] libceph: osd296 weight 0x1 (in)
> [7387124.168492] libceph: osd296 up
> [7387131.732934] libceph: osd292 weight 0x1 (in)
> [7387131.733218] libceph: osd292 up
> [7387131.741277] libceph: osd290 weight 0x1 (in)
> [7387131.741558] libceph: osd290 up
> [7387149.788781] libceph: osd298 weight 0x1 (in)
> [7387149.789066] libceph: osd298 up
>
> A node of osds restarted some days before.
> And after evicting the session:
> [7679890.147116] libceph: mds0 x.x.x.x:6800 socket closed (con state OPEN)
> [7679890.491439] libceph: mds0 x.x.x.x:6800 connection reset
> [7679890.491727] libceph: reset on mds0
> [7679890.492006] ceph: mds0 closed our session
> [7679890.492286] ceph: mds0 reconnect start
> [7679910.479911] ceph: mds0 caps stale
> [7679927.886621] ceph: mds0 reconnect denied
>
> We have to restart the machine to recover it.
> I will send you an email if it happens again.

3.10.z is EOL.  I'd recommend upgrading to 4.9.z or the newly released
4.14 -- 4.14.z will be the next longterm series.
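For anyone scripting this check before an upgrade, here is a small sketch using GNU `sort -V` for the version comparison; the 4.9 floor simply mirrors the recommendation above and is an assumption you should adjust for your own environment:

```shell
#!/bin/sh
# Flag kernels older than a minimum version using GNU sort -V.
# The 4.9 floor mirrors the recommendation above; adjust as needed.
kernel_at_least() {
    # succeeds if version $1 >= version $2
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

running=$(uname -r | cut -d- -f1)
if kernel_at_least "$running" "4.9"; then
    echo "kernel $running: recent enough for CephFS Luminous"
else
    echo "kernel $running: too old, consider 4.9.z or 4.14.z"
fi
```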

Thanks,

Ilya


Re: [ceph-users] No ops on some OSD

2017-11-13 Thread John Spray
On Sun, Nov 12, 2017 at 2:56 PM, Marc Roos  wrote:
>
> [@c03 ~]# ceph osd status
> 2017-11-12 15:54:13.164823 7f478a6ad700 -1 WARNING: the following
> dangerous and experimental features are enabled: bluestore
> 2017-11-12 15:54:13.211219 7f478a6ad700 -1 WARNING: the following
> dangerous and experimental features are enabled: bluestore
> no valid command found; 10 closest matches:
> osd map <pool> <object> {<nspace>}
> osd lspools {<auid>}
> osd count-metadata <property>
> osd versions
> osd find <id>
> osd metadata {<id>}
> osd getmaxosd
> osd ls-tree {<epoch>} {<name>}
> osd getmap {<epoch>}
> osd getcrushmap {<epoch>}
> Error EINVAL: invalid command

The "osd status" command comes from the ceph-mgr module called
"status" -- this is enabled by default but it's possible that it got
switched off on your system?  Check your ceph-mgr logs and whether
it's in "ceph mgr module ls" (or try enabling with "ceph mgr module
enable status")
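A sketch of that check as a one-off script (Luminous CLI names assumed; `has_module` is a plain grep helper I am introducing here, so it can be exercised without a cluster, and the cluster commands only run when a ceph binary is actually present):

```shell
#!/bin/sh
# Check the ceph-mgr module list for a given module and enable it if
# missing. `ceph mgr module ls` prints JSON containing an
# "enabled_modules" array; a word-match grep on the quoted name is
# enough for this sketch.
has_module() {
    grep -qw "\"$1\""
}

# Only talk to the cluster if the ceph CLI is present.
if command -v ceph >/dev/null 2>&1; then
    ceph mgr module ls | has_module status || ceph mgr module enable status
fi
```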

John

>
>
>
> -Original Message-
> From: I Gede Iswara Darmawan [mailto:iswaradr...@gmail.com]
> Sent: zondag 12 november 2017 2:17
> Cc: ceph-users
> Subject: Re: [ceph-users] No ops on some OSD
>
> Still the same syntax (ceph osd status)
>
> Thanks
>
> Regards,
>
> I Gede Iswara Darmawan
>
> Information System - School of Industrial and System Engineering
>
> Telkom University
>
> P / SMS / WA : 081 322 070719
>
> E : iswaradr...@gmail.com / iswaradr...@live.com
>
>
> On Sat, Nov 4, 2017 at 6:11 PM, Marc Roos 
> wrote:
>
>
>
>
> What is the new syntax for "ceph osd status" for luminous?
>
>
>
>
>
> -Original Message-
> From: I Gede Iswara Darmawan [mailto:iswaradr...@gmail.com]
> Sent: donderdag 2 november 2017 6:19
> To: ceph-users@lists.ceph.com
> Subject: [ceph-users] No ops on some OSD
>
> Hello,
>
> I want to ask about my problem. There are some OSDs that don't have any
> load (indicated with no ops on those OSDs).
>
> Hereby I attached the ceph osd status result :
> https://pastebin.com/fFLcCbpk . Look at OSDs 17, 61 and 72. There's no
> load or operation happening on those OSDs. How can I fix this?
>
> Thank you
> Regards,
>
> I Gede Iswara Darmawan
>
> Information System - School of Industrial and System Engineering
>
> Telkom University
>
> P / SMS / WA : 081 322 070719
>
> E : iswaradr...@gmail.com / iswaradr...@live.com
>
>
>
>


Re: [ceph-users] No ops on some OSD

2017-11-13 Thread Marc Roos

 
Indeed this what I have

[@c01 ceph]# ceph --version
ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous 
(stable)

[@c01 ceph]# ceph tell osd.* version|head
osd.0: {
"version": "ceph version 12.2.1 
(3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)"
}
osd.1: {
"version": "ceph version 12.2.1 
(3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)"
}
osd.2: {
"version": "ceph version 12.2.1 
(3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)"
}
osd.3: {

[@c01 ceph]# rpm -qa | grep ceph | sort
ceph-12.2.1-0.el7.x86_64
ceph-base-12.2.1-0.el7.x86_64
ceph-common-12.2.1-0.el7.x86_64
ceph-mds-12.2.1-0.el7.x86_64
ceph-mgr-12.2.1-0.el7.x86_64
ceph-mon-12.2.1-0.el7.x86_64
ceph-osd-12.2.1-0.el7.x86_64
ceph-selinux-12.2.1-0.el7.x86_64
collectd-ceph-5.7.1-2.el7.x86_64
libcephfs2-12.2.1-0.el7.x86_64
nfs-ganesha-ceph-2.5.2-.el7.x86_64
python-cephfs-12.2.1-0.el7.x86_64



-Original Message-
From: Caspar Smit [mailto:caspars...@supernas.eu] 
Sent: maandag 13 november 2017 10:51
To: ceph-users
Subject: Re: [ceph-users] No ops on some OSD

Weird

# ceph --version
ceph version 12.2.1 (fc129ad90a65dc0b419412e77cb85ac230da42a6) luminous 
(stable)

# ceph osd status
+----+--------+-------+-------+--------+---------+--------+---------+
| id |  host  |  used | avail | wr ops | wr data | rd ops | rd data |
+----+--------+-------+-------+--------+---------+--------+---------+
| 0  | node04 |  115M | 11.6G |    0   |    0    |    0   |    0    |
+----+--------+-------+-------+--------+---------+--------+---------+

ps. output is from a single host with 1 (virtual) OSD configured but the 
command works

Try to remove that dangerous and experimental features setting from your 
ceph.conf and see if that solves it.

Caspar

2017-11-12 15:56 GMT+01:00 Marc Roos :



[@c03 ~]# ceph osd status
2017-11-12 15:54:13.164823 7f478a6ad700 -1 WARNING: the following
dangerous and experimental features are enabled: bluestore
2017-11-12 15:54:13.211219 7f478a6ad700 -1 WARNING: the following
dangerous and experimental features are enabled: bluestore
no valid command found; 10 closest matches:
osd map <pool> <object> {<nspace>}
osd lspools {<auid>}
osd count-metadata <property>
osd versions
osd find <id>
osd metadata {<id>}
osd getmaxosd
osd ls-tree {<epoch>} {<name>}
osd getmap {<epoch>}
osd getcrushmap {<epoch>}
Error EINVAL: invalid command



-Original Message-
From: I Gede Iswara Darmawan [mailto:iswaradr...@gmail.com]

Sent: zondag 12 november 2017 2:17
Cc: ceph-users
Subject: Re: [ceph-users] No ops on some OSD

Still the same syntax (ceph osd status)

Thanks

Regards,

I Gede Iswara Darmawan

Information System - School of Industrial and System Engineering

Telkom University

P / SMS / WA : 081 322 070719

E : iswaradr...@gmail.com / iswaradr...@live.com


On Sat, Nov 4, 2017 at 6:11 PM, Marc Roos 

wrote:




What is the new syntax for "ceph osd status" for luminous?





-Original Message-
From: I Gede Iswara Darmawan [mailto:iswaradr...@gmail.com]
Sent: donderdag 2 november 2017 6:19
To: ceph-users@lists.ceph.com
Subject: [ceph-users] No ops on some OSD

Hello,

I want to ask about my problem. There are some OSDs that don't have any
load (indicated with no ops on those OSDs).

Hereby I attached the ceph osd status result:
https://pastebin.com/fFLcCbpk . Look at OSDs 17, 61 and 72. There's no
load or operation happening on those OSDs. How can I fix this?

Thank you
Regards,

I Gede Iswara Darmawan

Information System - School of Industrial and System 
Engineering

Telkom University

P / SMS / WA : 081 322 070719

E : iswaradr...@gmail.com / iswaradr...@live.com










[ceph-users] Re: Re: Where can I find the fix commit of #3370 ?

2017-11-13 Thread 周 威
Hi, Ilya

The kernel version is 3.10.106.
Part of dmesg related to ceph:
[7349718.004905] libceph: osd297 down
[7349718.005190] libceph: osd299 down
[7349785.671015] libceph: osd295 down
[7350006.357509] libceph: osd291 weight 0x0 (out)
[7350006.357795] libceph: osd292 weight 0x0 (out)
[7350006.358075] libceph: osd293 weight 0x0 (out)
[7350006.358356] libceph: osd294 weight 0x0 (out)
[7350013.312399] libceph: osd289 weight 0x0 (out)
[7350013.312683] libceph: osd290 weight 0x0 (out)
[7350013.312964] libceph: osd296 weight 0x0 (out)
[7350013.313244] libceph: osd298 weight 0x0 (out)
[7350023.322571] libceph: osd288 weight 0x0 (out)
[7350038.338217] libceph: osd297 weight 0x0 (out)
[7350038.338501] libceph: osd299 weight 0x0 (out)
[7350115.364496] libceph: osd295 weight 0x0 (out)
[7350179.683200] libceph: osd294 weight 0x1 (in)
[7350179.683495] libceph: osd294 up
[7350193.654197] libceph: osd293 weight 0x1 (in)
[7350193.654486] libceph: osd297 weight 0x1 (in)
[7350193.654769] libceph: osd293 up
[7350193.655046] libceph: osd297 up
[7350228.750112] libceph: osd299 weight 0x1 (in)
[7350228.750399] libceph: osd299 up
[7350255.739415] libceph: osd289 weight 0x1 (in)
[7350255.739700] libceph: osd289 up
[7350268.578031] libceph: osd288 weight 0x1 (in)
[7350268.578315] libceph: osd288 up
[7383411.866068] libceph: osd299 down
[7383558.405675] libceph: osd299 up
[7383411.866068] libceph: osd299 down
[7383558.405675] libceph: osd299 up
[7387106.574308] libceph: osd291 weight 0x1 (in)
[7387106.574593] libceph: osd291 up
[7387124.168198] libceph: osd296 weight 0x1 (in)
[7387124.168492] libceph: osd296 up
[7387131.732934] libceph: osd292 weight 0x1 (in)
[7387131.733218] libceph: osd292 up
[7387131.741277] libceph: osd290 weight 0x1 (in)
[7387131.741558] libceph: osd290 up
[7387149.788781] libceph: osd298 weight 0x1 (in)
[7387149.789066] libceph: osd298 up

A node of osds restarted some days before.
And after evicting the session:
[7679890.147116] libceph: mds0 x.x.x.x:6800 socket closed (con state OPEN)
[7679890.491439] libceph: mds0 x.x.x.x:6800 connection reset
[7679890.491727] libceph: reset on mds0
[7679890.492006] ceph: mds0 closed our session
[7679890.492286] ceph: mds0 reconnect start
[7679910.479911] ceph: mds0 caps stale
[7679927.886621] ceph: mds0 reconnect denied

We have to restart the machine to recover it.
I will send you an email if it happens again.

Thanks for your reply.

-Original Message-
From: Ilya Dryomov [mailto:idryo...@gmail.com] 
Sent: 2017-11-13 17:30
To: 周 威 
Cc: ceph-users@lists.ceph.com
Subject: Re: Re: [ceph-users] Where can I find the fix commit of #3370 ?

On Mon, Nov 13, 2017 at 10:18 AM, 周 威  wrote:
> Hi, Ilya
>
> I'm using the kernel of CentOS 7, should be 3.10. I checked the patch, 
> and it appears in my kernel source.
> We got the same stack as #3370; the process is hung in sleep_on_page_killable.
> The debugfs ceph/osdc file shows there is a read request waiting for a response, 
> while the command `ceph daemon osd.x ops` shows nothing.
> Evicting the session from the mds does not help.
> The version of the ceph cluster is 10.2.9.

I don't think it's related to that ticket.

Which version of centos 7?  Can you provide dmesg?

Is it reproducible?  A debug ms = 1 log for that OSD would help with narrowing 
this down.

Thanks,

Ilya


Re: [ceph-users] No ops on some OSD

2017-11-13 Thread Caspar Smit
Weird

# ceph --version
ceph version 12.2.1 (fc129ad90a65dc0b419412e77cb85ac230da42a6) luminous
(stable)

# ceph osd status
+----+--------+-------+-------+--------+---------+--------+---------+
| id |  host  |  used | avail | wr ops | wr data | rd ops | rd data |
+----+--------+-------+-------+--------+---------+--------+---------+
| 0  | node04 |  115M | 11.6G |    0   |    0    |    0   |    0    |
+----+--------+-------+-------+--------+---------+--------+---------+

ps. output is from a single host with 1 (virtual) OSD configured but the
command works

Try to remove that dangerous and experimental features setting from your
ceph.conf and see if that solves it.

Caspar

2017-11-12 15:56 GMT+01:00 Marc Roos :

>
> [@c03 ~]# ceph osd status
> 2017-11-12 15:54:13.164823 7f478a6ad700 -1 WARNING: the following
> dangerous and experimental features are enabled: bluestore
> 2017-11-12 15:54:13.211219 7f478a6ad700 -1 WARNING: the following
> dangerous and experimental features are enabled: bluestore
> no valid command found; 10 closest matches:
> osd map <pool> <object> {<nspace>}
> osd lspools {<auid>}
> osd count-metadata <property>
> osd versions
> osd find <id>
> osd metadata {<id>}
> osd getmaxosd
> osd ls-tree {<epoch>} {<name>}
> osd getmap {<epoch>}
> osd getcrushmap {<epoch>}
> Error EINVAL: invalid command
>
>
>
> -Original Message-
> From: I Gede Iswara Darmawan [mailto:iswaradr...@gmail.com]
> Sent: zondag 12 november 2017 2:17
> Cc: ceph-users
> Subject: Re: [ceph-users] No ops on some OSD
>
> Still the same syntax (ceph osd status)
>
> Thanks
>
> Regards,
>
> I Gede Iswara Darmawan
>
> Information System - School of Industrial and System Engineering
>
> Telkom University
>
> P / SMS / WA : 081 322 070719
>
> E : iswaradr...@gmail.com / iswaradr...@live.com
>
>
> On Sat, Nov 4, 2017 at 6:11 PM, Marc Roos 
> wrote:
>
>
>
>
> What is the new syntax for "ceph osd status" for luminous?
>
>
>
>
>
> -Original Message-
> From: I Gede Iswara Darmawan [mailto:iswaradr...@gmail.com]
> Sent: donderdag 2 november 2017 6:19
> To: ceph-users@lists.ceph.com
> Subject: [ceph-users] No ops on some OSD
>
> Hello,
>
> I want to ask about my problem. There are some OSDs that don't have any
> load (indicated with no ops on those OSDs).
>
> Hereby I attached the ceph osd status result :
> https://pastebin.com/fFLcCbpk . Look at OSDs 17, 61 and 72. There's no
> load or operation happening on those OSDs. How can I fix this?
>
> Thank you
> Regards,
>
> I Gede Iswara Darmawan
>
> Information System - School of Industrial and System Engineering
>
> Telkom University
>
> P / SMS / WA : 081 322 070719
>
> E : iswaradr...@gmail.com / iswaradr...@live.com
>
>
>
>


Re: [ceph-users] Re: Where can I find the fix commit of #3370 ?

2017-11-13 Thread Ilya Dryomov
On Mon, Nov 13, 2017 at 10:18 AM, 周 威  wrote:
> Hi, Ilya
>
> I'm using the kernel of CentOS 7, should be 3.10.
> I checked the patch, and it appears in my kernel source.
> We got the same stack as #3370; the process is hung in sleep_on_page_killable.
> The debugfs ceph/osdc file shows there is a read request waiting for a response, 
> while the command `ceph daemon osd.x ops` shows nothing.
> Evicting the session from the mds does not help.
> The version of the ceph cluster is 10.2.9.

I don't think it's related to that ticket.

Which version of centos 7?  Can you provide dmesg?

Is it reproducible?  A debug ms = 1 log for that OSD would help with
narrowing this down.
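Turning that logging on for a single OSD at runtime can be sketched as below. The `run` wrapper is a helper of mine that only executes commands whose binary exists (and otherwise prints them), so the sketch is safe outside a cluster; `osd.0` is a placeholder id:

```shell
#!/bin/sh
# Raise messenger debugging on one OSD while reproducing the hang,
# then restore it. run() executes the command if its binary exists
# and otherwise just prints what it would do.
run() {
    if command -v "$1" >/dev/null 2>&1; then "$@"; else echo "would run: $*"; fi
}

OSD_ID=${OSD_ID:-0}
run ceph tell "osd.$OSD_ID" injectargs '--debug_ms 1'
# ... reproduce the stuck read here, then turn the logging back down:
run ceph tell "osd.$OSD_ID" injectargs '--debug_ms 0'
```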

Thanks,

Ilya


[ceph-users] Re: Where can I find the fix commit of #3370 ?

2017-11-13 Thread 周 威
Hi, Ilya

I'm using the kernel of CentOS 7, should be 3.10.
I checked the patch, and it appears in my kernel source.
We got the same stack as #3370; the process is hung in sleep_on_page_killable.
The debugfs ceph/osdc file shows there is a read request waiting for a response, while 
the command `ceph daemon osd.x ops` shows nothing.
Evicting the session from the mds does not help.
The version of ceph cluster is 10.2.9.
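For reference, the per-mount request list mentioned above lives under debugfs, typically at `/sys/kernel/debug/ceph/<fsid.clientid>/osdc`. A sketch that dumps it for every mounted client instance; the debugfs root is a parameter so the helper can be exercised against a fake tree:

```shell
#!/bin/sh
# Dump the kernel client's in-flight OSD requests from debugfs.
# An argument overrides the debugfs root, which keeps the helper
# testable without a real CephFS mount.
dump_osdc() {
    root=${1:-/sys/kernel/debug/ceph}
    found=1
    for d in "$root"/*/; do
        [ -f "${d}osdc" ] || continue
        found=0
        echo "== ${d}osdc"
        cat "${d}osdc"
    done
    return $found
}

dump_osdc || echo "no ceph debugfs entries (is debugfs mounted?)"
```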

Thanks for the reply.

-Original Message-
From: Ilya Dryomov [mailto:idryo...@gmail.com] 
Sent: 2017-11-13 16:59
To: ? ? 
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Where can I find the fix commit of #3370 ?

On Mon, Nov 13, 2017 at 7:45 AM, ? ?  wrote:
> I met the same issue as http://tracker.ceph.com/issues/3370 ,
>
> But I can’t find the commit id of 
> 2978257c56935878f8a756c6cb169b569e99bb91 , Can someone help me?

I updated the ticket.  It's very old though, which kernel are you running?

Thanks,

Ilya


Re: [ceph-users] Getting errors on erasure pool writes k=2, m=1

2017-11-13 Thread Marc Roos
 

I have been asking myself (and here) the same question. I think it is 
because of having this in ceph.conf:
enable experimental unrecoverable data corrupting features = bluestore
But I am not sure whether I can remove this, or have to replace it with 
something else.

ceph-12.2.1-0.el7.x86_64
ceph-base-12.2.1-0.el7.x86_64
ceph-common-12.2.1-0.el7.x86_64
ceph-mds-12.2.1-0.el7.x86_64
ceph-mgr-12.2.1-0.el7.x86_64
ceph-mon-12.2.1-0.el7.x86_64
ceph-osd-12.2.1-0.el7.x86_64
ceph-selinux-12.2.1-0.el7.x86_64
collectd-ceph-5.7.1-2.el7.x86_64
libcephfs2-12.2.1-0.el7.x86_64
nfs-ganesha-ceph-2.5.2-.el7.x86_64
python-cephfs-12.2.1-0.el7.x86_64




-Original Message-
From: Caspar Smit [mailto:caspars...@supernas.eu] 
Sent: maandag 13 november 2017 9:58
To: ceph-users
Subject: Re: [ceph-users] Getting errors on erasure pool writes k=2, m=1

Hi,

Why would Ceph 12.2.1 give you this message:

2017-11-10 20:39:31.296101 7f840ad45e40 -1 WARNING: the following 
dangerous and experimental features are enabled: bluestore



Or is that a leftover warning message from an old client?

Kind regards,
Caspar


2017-11-10 21:27 GMT+01:00 Marc Roos :



osd's are crashing when putting a (8GB) file in an erasure coded pool,
just before finishing. The same osd's are used for replicated pools
rbd/cephfs, and seem to do fine. Did I make some error, or is this a bug?
Looks similar to
https://www.spinics.net/lists/ceph-devel/msg38685.html
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-September/021045.html

[@c01 ~]# date ; rados -p ec21 put  $(basename
"/mnt/disk/blablablalbalblablalablalb.txt")
blablablalbalblablalablalb.txt
Fri Nov 10 20:27:26 CET 2017

[Fri Nov 10 20:33:51 2017] libceph: osd9 down
[Fri Nov 10 20:33:51 2017] libceph: osd9 down
[Fri Nov 10 20:33:51 2017] libceph: osd0 192.168.10.111:6802 socket
closed (con state OPEN)
[Fri Nov 10 20:33:51 2017] libceph: osd0 192.168.10.111:6802 socket
error on write
[Fri Nov 10 20:33:52 2017] libceph: osd0 down
[Fri Nov 10 20:33:52 2017] libceph: osd7 down
[Fri Nov 10 20:33:55 2017] libceph: osd0 down
[Fri Nov 10 20:33:55 2017] libceph: osd7 down
[Fri Nov 10 20:34:41 2017] libceph: osd7 up
[Fri Nov 10 20:34:41 2017] libceph: osd7 up
[Fri Nov 10 20:35:03 2017] libceph: osd9 up
[Fri Nov 10 20:35:03 2017] libceph: osd9 up
[Fri Nov 10 20:35:47 2017] libceph: osd0 up
[Fri Nov 10 20:35:47 2017] libceph: osd0 up

[@c02 ~]# rados -p ec21 stat blablablalbalblablalablalb.txt
2017-11-10 20:39:31.296101 7f840ad45e40 -1 WARNING: the following
dangerous and experimental features are enabled: bluestore
2017-11-10 20:39:31.296290 7f840ad45e40 -1 WARNING: the following
dangerous and experimental features are enabled: bluestore
2017-11-10 20:39:31.331588 7f840ad45e40 -1 WARNING: the following
dangerous and experimental features are enabled: bluestore
ec21/blablablalbalblablalablalb.txt mtime 2017-11-10 
20:32:52.00,
size 8585740288



2017-11-10 20:32:52.287503 7f933028d700  4 rocksdb: EVENT_LOG_v1
{"time_micros": 1510342372287484, "job": 32, "event": 
"flush_started",
"num_memtables": 1, "num_entries": 728747, "num_deletes": 363960,
"memory_usage": 263854696}
2017-11-10 20:32:52.287509 7f933028d700  4 rocksdb:
[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILAB
LE_AR
CH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/rel
ease/
12.2.1/rpm/el7/BUILD/ceph-12.2.1/src/rocksdb/db/flush_job.cc:293]
[default] [JOB 32] Level-0 flush table #25279: started
2017-11-10 20:32:52.503311 7f933028d700  4 rocksdb: EVENT_LOG_v1
{"time_micros": 1510342372503293, "cf_name": "default", "job": 32,
"event": "table_file_creation", "file_number": 25279, "file_size":
4811948, "table_properties": {"data_size": 4675796, "index_size":
102865, "filter_size": 32302, "raw_key_size": 646440,
"raw_average_key_size": 75, "raw_value_size": 4446103,
"raw_average_value_size": 519, "num_data_blocks": 1180, 
"num_entries":
8560, "filter_policy_name": "rocksdb.BuiltinBloomFilter",
"kDeletedKeys": "0", "kMergeOperands": "330"}}
2017-11-10 20:32:52.503327 7f933028d700  4 rocksdb:
[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILAB
LE_AR
CH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/rel
ease/
12.2.1/rpm/el7/BUILD/ceph-12.2.1/src/rocksdb/db/flush_job.cc:319]
[default] [JOB 32] Level-0 flush 

[ceph-users] force scrubbing

2017-11-13 Thread Kenneth Waegeman

Hi all,


Is there a way to force scrub a pg of an erasure coded pool?

I tried `ceph pg deep-scrub 5.4c7`, but after a week it still hasn't 
scrubbed the pg (the last scrub timestamp has not changed).
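One way to see whether the request ever took effect is to watch the PG's `last_deep_scrub_stamp` (from `ceph pg <pgid> query`) before and after issuing it. The extraction helper below is plain text processing, the cluster commands are guarded so the sketch runs anywhere, and the PG id is just taken from the message. If scrubs never start, scheduler settings such as `osd_max_scrubs` and the scrub time windows are also worth checking.

```shell
#!/bin/sh
# Extract last_deep_scrub_stamp from `ceph pg <pgid> query` JSON.
deep_scrub_stamp() {
    sed -n 's/.*"last_deep_scrub_stamp": "\([^"]*\)".*/\1/p' | head -n1
}

PG=${PG:-5.4c7}   # PG id taken from the message above
if command -v ceph >/dev/null 2>&1; then
    before=$(ceph pg "$PG" query | deep_scrub_stamp)
    ceph pg deep-scrub "$PG"
    echo "last_deep_scrub_stamp before request: $before"
    echo "re-run 'ceph pg $PG query' later and compare the stamp"
fi
```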


Thanks!


Kenneth



Re: [ceph-users] Where can I find the fix commit of #3370 ?

2017-11-13 Thread Ilya Dryomov
On Mon, Nov 13, 2017 at 7:45 AM, ? ?  wrote:
> I met the same issue as http://tracker.ceph.com/issues/3370 ,
>
> But I can’t find the commit id of 2978257c56935878f8a756c6cb169b569e99bb91 ,
> Can someone help me?

I updated the ticket.  It's very old though, which kernel are you
running?

Thanks,

Ilya


Re: [ceph-users] Getting errors on erasure pool writes k=2, m=1

2017-11-13 Thread Caspar Smit
Hi,

Why would Ceph 12.2.1 give you this message:

2017-11-10 20:39:31.296101 7f840ad45e40 -1 WARNING: the following
dangerous and experimental features are enabled: bluestore

Or is that a leftover warning message from an old client?

Kind regards,
Caspar

2017-11-10 21:27 GMT+01:00 Marc Roos :

>
> osd's are crashing when putting a (8GB) file in an erasure coded pool,
> just before finishing. The same osd's are used for replicated pools
> rbd/cephfs, and seem to do fine. Did I make some error, or is this a bug?
> Looks similar to
> https://www.spinics.net/lists/ceph-devel/msg38685.html
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/
> 2017-September/021045.html
>
>
> [@c01 ~]# date ; rados -p ec21 put  $(basename
> "/mnt/disk/blablablalbalblablalablalb.txt")
> blablablalbalblablalablalb.txt
> Fri Nov 10 20:27:26 CET 2017
>
> [Fri Nov 10 20:33:51 2017] libceph: osd9 down
> [Fri Nov 10 20:33:51 2017] libceph: osd9 down
> [Fri Nov 10 20:33:51 2017] libceph: osd0 192.168.10.111:6802 socket
> closed (con state OPEN)
> [Fri Nov 10 20:33:51 2017] libceph: osd0 192.168.10.111:6802 socket
> error on write
> [Fri Nov 10 20:33:52 2017] libceph: osd0 down
> [Fri Nov 10 20:33:52 2017] libceph: osd7 down
> [Fri Nov 10 20:33:55 2017] libceph: osd0 down
> [Fri Nov 10 20:33:55 2017] libceph: osd7 down
> [Fri Nov 10 20:34:41 2017] libceph: osd7 up
> [Fri Nov 10 20:34:41 2017] libceph: osd7 up
> [Fri Nov 10 20:35:03 2017] libceph: osd9 up
> [Fri Nov 10 20:35:03 2017] libceph: osd9 up
> [Fri Nov 10 20:35:47 2017] libceph: osd0 up
> [Fri Nov 10 20:35:47 2017] libceph: osd0 up
>
> [@c02 ~]# rados -p ec21 stat blablablalbalblablalablalb.txt
> 2017-11-10 20:39:31.296101 7f840ad45e40 -1 WARNING: the following
> dangerous and experimental features are enabled: bluestore
> 2017-11-10 20:39:31.296290 7f840ad45e40 -1 WARNING: the following
> dangerous and experimental features are enabled: bluestore
> 2017-11-10 20:39:31.331588 7f840ad45e40 -1 WARNING: the following
> dangerous and experimental features are enabled: bluestore
> ec21/blablablalbalblablalablalb.txt mtime 2017-11-10 20:32:52.00,
> size 8585740288
>
>
>
> 2017-11-10 20:32:52.287503 7f933028d700  4 rocksdb: EVENT_LOG_v1
> {"time_micros": 1510342372287484, "job": 32, "event": "flush_started",
> "num_memtables": 1, "num_entries": 728747, "num_deletes": 363960,
> "memory_usage": 263854696}
> 2017-11-10 20:32:52.287509 7f933028d700  4 rocksdb:
> [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_AR
> CH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/
> 12.2.1/rpm/el7/BUILD/ceph-12.2.1/src/rocksdb/db/flush_job.cc:293]
> [default] [JOB 32] Level-0 flush table #25279: started
> 2017-11-10 20:32:52.503311 7f933028d700  4 rocksdb: EVENT_LOG_v1
> {"time_micros": 1510342372503293, "cf_name": "default", "job": 32,
> "event": "table_file_creation", "file_number": 25279, "file_size":
> 4811948, "table_properties": {"data_size": 4675796, "index_size":
> 102865, "filter_size": 32302, "raw_key_size": 646440,
> "raw_average_key_size": 75, "raw_value_size": 4446103,
> "raw_average_value_size": 519, "num_data_blocks": 1180, "num_entries":
> 8560, "filter_policy_name": "rocksdb.BuiltinBloomFilter",
> "kDeletedKeys": "0", "kMergeOperands": "330"}}
> 2017-11-10 20:32:52.503327 7f933028d700  4 rocksdb:
> [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_AR
> CH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/
> 12.2.1/rpm/el7/BUILD/ceph-12.2.1/src/rocksdb/db/flush_job.cc:319]
> [default] [JOB 32] Level-0 flush table #25279: 4811948 bytes OK
> 2017-11-10 20:32:52.572413 7f933028d700  4 rocksdb:
> [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_AR
> CH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/
> 12.2.1/rpm/el7/BUILD/ceph-12.2.1/src/rocksdb/db/db_impl_files.cc:242]
> adding log 25276 to recycle list
>
> 2017-11-10 20:32:52.572422 7f933028d700  4 rocksdb: (Original Log Time
> 2017/11/10-20:32:52.503339)
> [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_AR
> CH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/
> 12.2.1/rpm/el7/BUILD/ceph-12.2.1/src/rocksdb/db/memtable_list.cc:360]
> [default] Level-0 commit table #25279 started
> 2017-11-10 20:32:52.572425 7f933028d700  4 rocksdb: (Original Log Time
> 2017/11/10-20:32:52.572312)
> [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_AR
> CH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/
> 12.2.1/rpm/el7/BUILD/ceph-12.2.1/src/rocksdb/db/memtable_list.cc:383]
> [default] Level-0 commit table #25279: memtable #1 done
> 2017-11-10 20:32:52.572428 7f933028d700  4 rocksdb: (Original Log Time
> 2017/11/10-20:32:52.572328) EVENT_LOG_v1 {"time_micros":
> 1510342372572321, "job": 32, "event": "flush_finished", "lsm_state": [4,
> 4, 36, 140, 0, 0, 0], "immutable_memtables": 0}
> 2017-11-10 20:32:52.572430 7f933028d700 

Re: [ceph-users] Getting errors on erasure pool writes k=2, m=1

2017-11-13 Thread Marc Roos
 
1. I don't think an OSD should 'crash' in such a situation. 
2. How else should I 'rados put' an 8GB file?
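Given that hard cap, one client-side workaround (not a fix for the crash itself) is to split the file into sub-4GB chunks and store each chunk as its own object. The helper below is a sketch: the `ec21` pool name follows the thread, the filenames are illustrative, and the chunk size is a parameter so the split/reassemble roundtrip can be tested on a tiny file:

```shell
#!/bin/sh
# Split a large file into chunks below BlueStore's 4GB object limit
# and upload each chunk as its own RADOS object. Reassembly is the
# reverse: get the parts in suffix order and concatenate them.
put_chunked() {  # usage: put_chunked <file> <pool> [chunk-size]
    file=$1 pool=$2 chunk=${3:-1G}
    split -b "$chunk" -d "$file" "$file.part." || return 1
    for p in "$file.part."*; do
        # only talk to the cluster when the rados CLI is available
        if command -v rados >/dev/null 2>&1; then
            rados -p "$pool" put "$(basename "$p")" "$p"
        fi
    done
}

# e.g. put_chunked /mnt/disk/blablablalbalblablalablalb.txt ec21
```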






-Original Message-
From: Christian Wuerdig [mailto:christian.wuer...@gmail.com] 
Sent: maandag 13 november 2017 0:12
To: Marc Roos
Cc: ceph-users
Subject: Re: [ceph-users] Getting errors on erasure pool writes k=2, m=1

As per: https://www.spinics.net/lists/ceph-devel/msg38686.html
Bluestore has a hard 4GB object size limit


On Sat, Nov 11, 2017 at 9:27 AM, Marc Roos  
wrote:
>
> osd's are crashing when putting a (8GB) file in an erasure coded pool, 
> just before finishing. The same osd's are used for replicated pools 
> rbd/cephfs, and seem to do fine. Did I make some error, or is this a bug?
> Looks similar to
> https://www.spinics.net/lists/ceph-devel/msg38685.html
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-September/021
> 045.html
>
>
> [@c01 ~]# date ; rados -p ec21 put  $(basename
> "/mnt/disk/blablablalbalblablalablalb.txt")
> blablablalbalblablalablalb.txt
> Fri Nov 10 20:27:26 CET 2017
>
> [Fri Nov 10 20:33:51 2017] libceph: osd9 down
> [Fri Nov 10 20:33:51 2017] libceph: osd9 down
> [Fri Nov 10 20:33:51 2017] libceph: osd0 192.168.10.111:6802 socket closed (con state OPEN)
> [Fri Nov 10 20:33:51 2017] libceph: osd0 192.168.10.111:6802 socket error on write
> [Fri Nov 10 20:33:52 2017] libceph: osd0 down
> [Fri Nov 10 20:33:52 2017] libceph: osd7 down
> [Fri Nov 10 20:33:55 2017] libceph: osd0 down
> [Fri Nov 10 20:33:55 2017] libceph: osd7 down
> [Fri Nov 10 20:34:41 2017] libceph: osd7 up
> [Fri Nov 10 20:34:41 2017] libceph: osd7 up
> [Fri Nov 10 20:35:03 2017] libceph: osd9 up
> [Fri Nov 10 20:35:03 2017] libceph: osd9 up
> [Fri Nov 10 20:35:47 2017] libceph: osd0 up
> [Fri Nov 10 20:35:47 2017] libceph: osd0 up
>
> [@c02 ~]# rados -p ec21 stat blablablalbalblablalablalb.txt 2017-11-10 

> 20:39:31.296101 7f840ad45e40 -1 WARNING: the following dangerous and 
> experimental features are enabled: bluestore 2017-11-10 
> 20:39:31.296290 7f840ad45e40 -1 WARNING: the following dangerous and 
> experimental features are enabled: bluestore 2017-11-10 
> 20:39:31.331588 7f840ad45e40 -1 WARNING: the following dangerous and 
> experimental features are enabled: bluestore 
> ec21/blablablalbalblablalablalb.txt mtime 2017-11-10 20:32:52.00, 
> size 8585740288
>
>
>
> 2017-11-10 20:32:52.287503 7f933028d700  4 rocksdb: EVENT_LOG_v1
> {"time_micros": 1510342372287484, "job": 32, "event": "flush_started",
> "num_memtables": 1, "num_entries": 728747, "num_deletes": 363960,
> "memory_usage": 263854696}
> 2017-11-10 20:32:52.287509 7f933028d700  4 rocksdb:
> [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_
> AR 
> CH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/releas
> e/ 12.2.1/rpm/el7/BUILD/ceph-12.2.1/src/rocksdb/db/flush_job.cc:293]
> [default] [JOB 32] Level-0 flush table #25279: started 2017-11-10 
> 20:32:52.503311 7f933028d700  4 rocksdb: EVENT_LOG_v1
> {"time_micros": 1510342372503293, "cf_name": "default", "job": 32,
> "event": "table_file_creation", "file_number": 25279, "file_size":
> 4811948, "table_properties": {"data_size": 4675796, "index_size":
> 102865, "filter_size": 32302, "raw_key_size": 646440,
> "raw_average_key_size": 75, "raw_value_size": 4446103,
> "raw_average_value_size": 519, "num_data_blocks": 1180, "num_entries":
> 8560, "filter_policy_name": "rocksdb.BuiltinBloomFilter",
> "kDeletedKeys": "0", "kMergeOperands": "330"}} 2017-11-10 
> 20:32:52.503327 7f933028d700  4 rocksdb:
> [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_
> AR 
> CH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/releas
> e/ 12.2.1/rpm/el7/BUILD/ceph-12.2.1/src/rocksdb/db/flush_job.cc:319]
> [default] [JOB 32] Level-0 flush table #25279: 4811948 bytes OK 
> 2017-11-10 20:32:52.572413 7f933028d700  4 rocksdb:
> [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_
> AR 
> CH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/releas
> e/ 
> 12.2.1/rpm/el7/BUILD/ceph-12.2.1/src/rocksdb/db/db_impl_files.cc:242]
> adding log 25276 to recycle list
>
> 2017-11-10 20:32:52.572422 7f933028d700  4 rocksdb: (Original Log Time
> 2017/11/10-20:32:52.503339)
> [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_
> AR 
> CH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/releas
> e/ 
> 12.2.1/rpm/el7/BUILD/ceph-12.2.1/src/rocksdb/db/memtable_list.cc:360]
> [default] Level-0 commit table #25279 started 2017-11-10 
> 20:32:52.572425 7f933028d700  4 rocksdb: (Original Log Time
> 2017/11/10-20:32:52.572312)
> [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_
> AR 
> CH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/releas
> e/ 
> 12.2.1/rpm/el7/BUILD/ceph-12.2.1/src/rocksdb/db/memtable_list.cc:383]
> [default] Level-0 commit table #25279: memtable #1 done 2017-11-10 
> 20:32:52.572428 7f933028d700  

Re: [ceph-users] Cluster hang (deep scrub bug? "waiting for scrub")

2017-11-13 Thread Matteo Dacrema
I've seen that only once, and noticed that there's a bug fixed in 10.2.10 
(http://tracker.ceph.com/issues/20041).
Yes, I use snapshots.

As far as I can see, in my case the PG had been scrubbing for 20 days, but I only 
have 7 days of logs, so I'm not able to identify the affected PG.



> On 10 Nov 2017, at 14:05, Peter Maloney 
>  wrote:
> 
> I have often seen a problem where a single osd in an eternal deep scrub
> will hang any client trying to connect. Stopping or restarting that
> single OSD fixes the problem.
> 
> Do you use snapshots?
> 
> Here's what the scrub bug looks like (where that many seconds is 14 hours):
> 
>> ceph daemon "osd.$osd_number" dump_blocked_ops
> 
>>  {
>>  "description": "osd_op(client.6480719.0:2000419292 4.a27969ae
>> rbd_data.46820b238e1f29.aa70 [set-alloc-hint object_size
>> 524288 write_size 524288,write 0~4096] snapc 16ec0=[16ec0]
>> ack+ondisk+write+known_if_redirected e148441)",
>>  "initiated_at": "2017-09-12 20:04:27.987814",
>>  "age": 49315.666393,
>>  "duration": 49315.668515,
>>  "type_data": [
>>  "delayed",
>>  {
>>  "client": "client.6480719",
>>  "tid": 2000419292
>>  },
>>  [
>>  {
>>  "time": "2017-09-12 20:04:27.987814",
>>  "event": "initiated"
>>  },
>>  {
>>  "time": "2017-09-12 20:04:27.987862",
>>  "event": "queued_for_pg"
>>  },
>>  {
>>  "time": "2017-09-12 20:04:28.004142",
>>  "event": "reached_pg"
>>  },
>>  {
>>  "time": "2017-09-12 20:04:28.004219",
>>  "event": "waiting for scrub"
>>  }
>>  ]
>>  ]
>>  }
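A dump like the one above can be post-processed to flag ops that are stuck behind a scrub. A minimal sketch, assuming the Jewel-era `dump_blocked_ops` layout pasted above (an `ops` list, with `type_data` whose third element is the event list):

```python
# Sketch: flag blocked ops whose most recent event is "waiting for scrub",
# from `ceph daemon osd.N dump_blocked_ops` JSON. The layout ("ops" list;
# type_data = [flag, client-info, event-list]) is assumed from the
# Jewel-era dump pasted above -- check it against your release.

def ops_stuck_on_scrub(dump, min_age_s=3600.0):
    """Return (age_seconds, description) for long-blocked scrub waiters."""
    stuck = []
    for op in dump.get("ops", []):
        events = op["type_data"][2]  # list of {"time": ..., "event": ...}
        if not events or events[-1]["event"] != "waiting for scrub":
            continue
        if op["age"] >= min_age_s:
            stuck.append((op["age"], op["description"]))
    # oldest blocked op first
    return sorted(stuck, reverse=True)

# Usage (hypothetical): run per OSD; any hit names the OSD worth restarting.
#   import json, subprocess
#   dump = json.loads(subprocess.check_output(
#       ["ceph", "daemon", "osd.12", "dump_blocked_ops"]))
#   for age, desc in ops_stuck_on_scrub(dump):
#       print(int(age), desc)
```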
> 
> 
> 
> 
> 
> 
> On 11/09/17 17:20, Matteo Dacrema wrote:
>> Update:  I noticed that there was a pg that remained scrubbing from the 
>> first day I found the issue to when I reboot the node and problem 
>> disappeared.
>> Can this cause the behaviour I described before?
>> 
>> 
>>> On 9 Nov 2017, at 15:55, Matteo Dacrema  wrote:
>>> 
>>> Hi all,
>>> 
>>> I’ve experienced a strange issue with my cluster.
>>> The cluster is composed of 10 HDD nodes with 20 disks + 4 journals each, 
>>> plus 4 SSD nodes with 5 SSDs each.
>>> All the nodes sit behind 3 monitors and are split across 2 different crush maps.
>>> The whole cluster is on 10.2.7.
>>> 
>>> About 20 days ago I started to notice that long backups hang with "task 
>>> jbd2/vdc1-8:555 blocked for more than 120 seconds” on the HDD crush map.
>>> A few days ago another VM started to show high iowait without doing any iops, 
>>> also on the HDD crush map.
>>> 
>>> Today about a hundred VMs weren’t able to read/write from many volumes, all 
>>> of them on the HDD crush map. Ceph health was ok and no significant log entries 
>>> were found.
>>> Not all the VMs experienced this problem, and meanwhile the iops on the 
>>> journals and HDDs were very low even though I was still able to do significant 
>>> iops on the working VMs.
>>> 
>>> After two hours of debugging I decided to reboot one of the OSD nodes and the 
>>> cluster started to respond again. Now the OSD node is back in the cluster and 
>>> the problem has disappeared.
>>> 
>>> Can someone help me to understand what happened?
>>> I see strange entries in the log files like:
>>> 
>>> accept replacing existing (lossy) channel (new one lossy=1)
>>> fault with nothing to send, going to standby
>>> leveldb manual compact 
>>> 
>>> I can share all the logs that can help to identify the issue.
>>> 
>>> Thank you.
>>> Regards,
>>> 
>>> Matteo
>>> 
>>> 
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>> 
>>> --
>>> This message has been scanned by Libra ESVA and found to be clean.
>>> Follow the link below to report it as spam: 
>>> http://mx01.enter.it/cgi-bin/learn-msg.cgi?id=12EAC4481A.A6F60
>>> 
>>> 
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
> -- 
> 
> 
> Peter Maloney
> Brockmann Consult
> Max-Planck-Str. 2
> 21502 Geesthacht
> Germany
> Tel: +49 4152 889 300
> Fax: +49 4152 889 333
> E-mail: peter.malo...@brockmann-consult.de
> Internet: http://www.brockmann-consult.de
> 
> 
> 