Re: [ceph-users] Re: How's cephfs going?

2017-07-17 Thread Brady Deetz
We have a cephfs data pool with 52.8M files stored in 140.7M objects. That
translates to a metadata pool size of 34.6MB across 1.5M objects.
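
For anyone who wants to compare: per-pool object counts and usage like the above
can be read straight off the cluster. A quick sketch (output columns vary a
little by release):

$ ceph df detail    # per-pool USED and OBJECTS, including the cephfs data and metadata pools
$ rados df          # similar per-pool numbers from the RADOS side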

On Jul 18, 2017 12:54 AM, "Blair Bethwaite" 
wrote:

> We are a data-intensive university, with an increasingly large fleet
> of scientific instruments capturing various types of data (mostly
> imaging of one kind or another). That data typically needs to be
> stored, protected, managed, shared, connected/moved to specialised
> compute for analysis. Given the large variety of use-cases we are
> being somewhat more circumspect it our CephFS adoption and really only
> dipping toes in the water, ultimately hoping it will become a
> long-term default NAS choice from Luminous onwards.
>
> On 18 July 2017 at 15:21, Brady Deetz  wrote:
> > All of that said, you could also consider using rbd and zfs or whatever
> filesystem you like. That would allow you to gain the benefits of scaleout
> while still getting a feature rich fs. But, there are some down sides to
> that architecture too.
>
> We do this today (KVMs with a couple of large RBDs attached via
> librbd+QEMU/KVM), but the throughput able to be achieved this way is
> nothing like native CephFS - adding more RBDs doesn't seem to help
> increase overall throughput. Also, if you have NFS clients you will
> absolutely need SSD ZIL. And of course you then have a single point of
> failure and downtime for regular updates etc.
>
> In terms of small file performance I'm interested to hear about
> experiences with in-line file storage on the MDS.
>
> Also, while we're talking about CephFS - what size metadata pools are
> people seeing on their production systems with 10s-100s millions of
> files?
>


Re: [ceph-users] Re: How's cephfs going?

2017-07-17 Thread Blair Bethwaite
We are a data-intensive university, with an increasingly large fleet
of scientific instruments capturing various types of data (mostly
imaging of one kind or another). That data typically needs to be
stored, protected, managed, shared, connected/moved to specialised
compute for analysis. Given the large variety of use-cases we are
being somewhat more circumspect in our CephFS adoption and really only
dipping toes in the water, ultimately hoping it will become a
long-term default NAS choice from Luminous onwards.

On 18 July 2017 at 15:21, Brady Deetz  wrote:
> All of that said, you could also consider using rbd and zfs or whatever 
> filesystem you like. That would allow you to gain the benefits of scaleout 
> while still getting a feature rich fs. But, there are some down sides to that 
> architecture too.

We do this today (KVMs with a couple of large RBDs attached via
librbd+QEMU/KVM), but the throughput able to be achieved this way is
nothing like native CephFS - adding more RBDs doesn't seem to help
increase overall throughput. Also, if you have NFS clients you will
absolutely need SSD ZIL. And of course you then have a single point of
failure and downtime for regular updates etc.

In terms of small file performance I'm interested to hear about
experiences with in-line file storage on the MDS.

Also, while we're talking about CephFS - what size metadata pools are
people seeing on their production systems with 10s-100s millions of
files?


Re: [ceph-users] Re: How's cephfs going?

2017-07-17 Thread Brady Deetz
No problem. We are a functional MRI research institute with a fairly
mixed workload, but I can tell you that we see 60+ Gbps of throughput when
multiple clients are reading sequentially from large files (1+ GB) with 1-4MB
block sizes. IO involving small files and small block sizes is not very
good. SSDs would help a lot with small IO, but our hardware architecture is
not designed for that, and we don't care too much about throughput when a
person opens a spreadsheet.

One of the greatest benefits we've gained from CephFS, one we didn't expect
to be as consequential as it was, is the xattrs, specifically ceph.dir.*. We
use this feature to track usage, and it has dramatically reduced the number
of metadata operations we perform while trying to determine statistics
about a directory.
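
For anyone new to them, the ceph.dir.* recursive statistics are exposed as
virtual xattrs on every CephFS directory. A minimal sketch, assuming a kernel
or FUSE mount at /mnt/cephfs (the path is an example):

$ getfattr -n ceph.dir.rentries /mnt/cephfs/some/dir   # recursive count of entries under the tree
$ getfattr -n ceph.dir.rfiles   /mnt/cephfs/some/dir   # recursive file count
$ getfattr -n ceph.dir.rbytes   /mnt/cephfs/some/dir   # recursive byte count

Each of these is a single attribute read instead of a full tree walk with
du/find, which is where the reduction in metadata operations comes from.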

But, we very much miss the ability to perform nightly snapshots. I think
snapshots are supposed to be marked stable soon, but for now it is my
understanding that they are still not listed as stable. The xattrs have
indirectly facilitated this, but it isn't as convenient as a filesystem
snapshot.

All of that said, you could also consider using RBD and ZFS (or whatever
filesystem you like). That would allow you to gain the benefits of scale-out
while still getting a feature-rich filesystem. But there are some downsides to
that architecture too.
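
For reference, a minimal sketch of that RBD-backed alternative (pool/image
names and size are examples, and it assumes the image features are compatible
with your kernel's rbd client):

$ rbd create rbd/nas_disk1 --size 102400   # 100 GiB image in the 'rbd' pool
$ rbd map rbd/nas_disk1                    # returns a block device, e.g. /dev/rbd0
$ mkfs.xfs /dev/rbd0                       # or build a ZFS pool from several mapped images
$ mount /dev/rbd0 /mnt/nas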

On Jul 17, 2017 10:21 PM, "许雪寒"  wrote:

Thanks, sir☺
You have really been a lot of help☺

May I ask what kind of business you are using CephFS for? What's the IO
pattern? :-)

If answering this would involve any business secrets, I completely understand if
you don't answer. :-)

Thanks again :-)

From: Brady Deetz [mailto:bde...@gmail.com]
Sent: July 18, 2017 8:01
To: 许雪寒
Cc: ceph-users
Subject: Re: [ceph-users] How's cephfs going?

I feel that the correct answer to this question is: it depends.

I've been running a 1.75PB Jewel-based cephfs cluster in production for
about 2 years at Laureate Institute for Brain Research. Before that we
had a good 6-8 month planning and evaluation phase. I'm running with
active/standby dedicated mds servers, 3x dedicated mons, and 12 osd nodes
with 24 disks in each server. Every group of 12 disks has journals mapped
to 1x Intel P3700. Each osd node has dual 40Gbps Ethernet bonded with LACP.
In our evaluation we did find that the rumors are true: your CPU choice
will influence performance.

Here's why my answer is "it depends." If you expect to get the same
complete feature set as you do with Isilon, ScaleIO, Gluster, or other
more established scale-out systems, it is not production ready. But in
terms of stability, it is. Over the course of the past 2 years I've
triggered 1 mds bug that put my filesystem into read-only mode. That bug
was patched in 8 hours thanks to this community. Also, that bug was triggered
by a stupid mistake on my part that the application did not validate before
the action was performed.

If you have a couple of people with a strong background in Linux,
networking, and architecture, I'd say Ceph may be a good fit for you. If
not, maybe not.

On Jul 16, 2017 9:59 PM, "许雪寒"  wrote:
Hi, everyone.

We intend to use the Jewel version of cephfs; however, we don't know its
status. Is it production ready in Jewel? Does it still have lots of bugs?
Is it a major focus of current ceph development? And who is using
cephfs now?



Re: [ceph-users] Long OSD restart after upgrade to 10.2.9

2017-07-17 Thread Anton Dmitriev
My cluster stores more than 1.5 billion objects in RGW; I don't use
cephfs. The bucket index pool is stored on a separate SSD placement. But
compaction occurs on all OSDs, including those which don't contain bucket
indexes. After restarting every OSD 5 times, nothing changed; each of them
compacts again and again.


As an example, here is the omap dir size on one of the OSDs that doesn't
contain bucket indexes:


root@storage01:/var/lib/ceph/osd/ceph-0/current/omap$ ls -l | wc -l
1455
root@storage01:/var/lib/ceph/osd/ceph-0/current/omap$ du -hd1
2,8G

Not so big at first look.
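
A rough sketch to get the same figure for every OSD on a host at once (paths
follow the default layout shown above):

$ for d in /var/lib/ceph/osd/ceph-*/current/omap; do du -sh "$d"; done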

On 17.07.2017 22:03, Josh Durgin wrote:

Both of you are seeing leveldb perform compaction when the osd starts
up. This can take a while for large amounts of omap data (created by
things like cephfs directory metadata or rgw bucket indexes).

The 'leveldb_compact_on_mount' option wasn't changed in 10.2.9, but 
leveldb will compact automatically if there is enough work to do.


Does restarting an OSD affected by this with 10.2.9 again after it's
completed compaction still have these symptoms?

Josh

On 07/17/2017 05:57 AM, Lincoln Bryant wrote:

Hi Anton,

We observe something similar on our OSDs going from 10.2.7 to 10.2.9 
(see thread "some OSDs stuck down after 10.2.7 -> 10.2.9 update"). 
Some of our OSDs are not working at all on 10.2.9 or die with suicide 
timeouts. Those that come up/in take a very long time to boot up. 
Seems to not affect every OSD in our case though.


--Lincoln

On 7/17/2017 1:29 AM, Anton Dmitriev wrote:
During start it consumes ~90% CPU, strace shows, that OSD process 
doing something with LevelDB.

Compact is disabled:
r...@storage07.main01.ceph.apps.prod.int.grcc:~$ cat 
/etc/ceph/ceph.conf | grep compact

#leveldb_compact_on_mount = true

But with debug_leveldb=20 I see, that compaction is running, but why?

2017-07-17 09:27:37.394008 7f4ed2293700  1 leveldb: Compacting 1@1 + 
12@2 files
2017-07-17 09:27:37.593890 7f4ed2293700  1 leveldb: Generated table 
#76778: 277817 keys, 2125970 bytes
2017-07-17 09:27:37.718954 7f4ed2293700  1 leveldb: Generated table 
#76779: 221451 keys, 2124338 bytes
2017-07-17 09:27:37.777362 7f4ed2293700  1 leveldb: Generated table 
#76780: 63755 keys, 809913 bytes
2017-07-17 09:27:37.919094 7f4ed2293700  1 leveldb: Generated table 
#76781: 231475 keys, 2026376 bytes
2017-07-17 09:27:38.035906 7f4ed2293700  1 leveldb: Generated table 
#76782: 190956 keys, 1573332 bytes
2017-07-17 09:27:38.127597 7f4ed2293700  1 leveldb: Generated table 
#76783: 148675 keys, 1260956 bytes
2017-07-17 09:27:38.286183 7f4ed2293700  1 leveldb: Generated table 
#76784: 294105 keys, 2123438 bytes
2017-07-17 09:27:38.469562 7f4ed2293700  1 leveldb: Generated table 
#76785: 299617 keys, 2124267 bytes
2017-07-17 09:27:38.619666 7f4ed2293700  1 leveldb: Generated table 
#76786: 277305 keys, 2124936 bytes
2017-07-17 09:27:38.711423 7f4ed2293700  1 leveldb: Generated table 
#76787: 110536 keys, 951545 bytes
2017-07-17 09:27:38.869917 7f4ed2293700  1 leveldb: Generated table 
#76788: 296199 keys, 2123506 bytes
2017-07-17 09:27:39.028395 7f4ed2293700  1 leveldb: Generated table 
#76789: 248634 keys, 2096715 bytes
2017-07-17 09:27:39.028414 7f4ed2293700  1 leveldb: Compacted 1@1 + 
12@2 files => 21465292 bytes
2017-07-17 09:27:39.053288 7f4ed2293700  1 leveldb: compacted to: 
files[ 0 0 48 549 948 0 0 ]
2017-07-17 09:27:39.054014 7f4ed2293700  1 leveldb: Delete type=2 
#76741


Strace:

open("/var/lib/ceph/osd/ceph-195/current/omap/043788.ldb", O_RDONLY) 
= 18
stat("/var/lib/ceph/osd/ceph-195/current/omap/043788.ldb", 
{st_mode=S_IFREG|0644, st_size=2154394, ...}) = 0

mmap(NULL, 2154394, PROT_READ, MAP_SHARED, 18, 0) = 0x7f96a67a
close(18)   = 0
brk(0x55d15664) = 0x55d15664
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN R

Re: [ceph-users] updating the documentation

2017-07-17 Thread Dan Mick
On 07/12/2017 11:29 AM, Sage Weil wrote:
> We have a fair-sized list of documentation items to update for the 
> luminous release.  The other day when I starting looking through what is 
> there now, though, I was also immediately struck by how out of date much 
> of the content is.  In addition to addressing the immediate updates for 
> luminous, I think we also need a systematic review of the current docs 
> (including the information structure) and a coordinated effort to make 
> updates and revisions.
> 
> First question is, of course: is anyone is interested in helping 
> coordinate this effort?
> 
> In the meantime, we can also avoid making the problem worse by requiring 
> that all pull requests include any relevant documentation updates.  This 
> means (1) helping educate contributors that doc updates are needed, (2) 
> helping maintainers and reviewers remember that doc updates are part of 
> the merge criteria (it will likely take a bit of time before this is 
> second nature), and (3) generally inducing developers to become aware of 
> the documentation that exists so that they know what needs to be updated 
> when they make a change.

As a reminder, there is a 'needs-doc' tag.  I don't think it's seen much
use and I'm aware of its problems.  Nothing beats discipline.


-- 
Dan Mick
Red Hat, Inc.
Ceph docs: http://ceph.com/docs


Re: [ceph-users] XFS attempt to access beyond end of device

2017-07-17 Thread Blair Bethwaite
Brilliant, thanks Marcus. We have just (noticed we've) hit this too,
and it looks like your script will fix it (will test and report
back...).

On 18 July 2017 at 14:08, Marcus Furlong  wrote:
> [ 92.938882] XFS (sdi1): Mounting V5 Filesystem
> [ 93.065393] XFS (sdi1): Ending clean mount
> [ 93.175299] attempt to access beyond end of device
> [ 93.175304] sdi1: rw=0, want=19134412768, limit=19134412767
>
> This shows that the error occurs when trying to access sector 1913441278 of
> Partition 1, which we can see from the above, doesn't exist.


I think you mean 19134412768.


-- 
Cheers,
~Blairo


Re: [ceph-users] XFS attempt to access beyond end of device

2017-07-17 Thread Marcus Furlong
On 22 March 2017 at 05:51, Dan van der Ster  wrote:
> On Wed, Mar 22, 2017 at 8:24 AM, Marcus Furlong 
wrote:
>> Hi,
>>
>> I'm experiencing the same issue as outlined in this post:
>>
>>
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-September/013330.html
>>
>> I have also deployed this jewel cluster using ceph-deploy.
>>
>> This is the message I see at boot (happens for all drives, on all OSD
nodes):
>>
>> [ 92.938882] XFS (sdi1): Mounting V5 Filesystem
>> [ 93.065393] XFS (sdi1): Ending clean mount
>> [ 93.175299] attempt to access beyond end of device
>> [ 93.175304] sdi1: rw=0, want=19134412768, limit=19134412767
>>
>> and again while the cluster is in operation:
>>
>> [429280.254400] attempt to access beyond end of device
>> [429280.254412] sdi1: rw=0, want=19134412768, limit=19134412767
>>
>
> We see these as well, and I'm also curious what's causing it. Perhaps
> sgdisk is doing something wrong when creating the ceph-data partition?

Apologies for reviving an old thread, but I figured out what happened and
never documented it, so I thought an update might be useful.

The disk layout I've ascertained is as follows:

sector 0 = protective MBR (or empty)
sectors 1 to 33 = GPT (33 sectors)
sectors 34 to 2047 = free (as confirmed by sgdisk -f -E)
sectors 2048 to 19134414814 (19134412767 sectors: Data Partition 1)
sectors 19134414815 to 19134414847 (33 sectors: GPT backup data)

And the error:

[ 92.938882] XFS (sdi1): Mounting V5 Filesystem
[ 93.065393] XFS (sdi1): Ending clean mount
[ 93.175299] attempt to access beyond end of device
[ 93.175304] sdi1: rw=0, want=19134412768, limit=19134412767

This shows that the error occurs when trying to access sector 19134412768 of
partition 1 which, as we can see from the above, doesn't exist.

I noticed that the file system size is 3.5KiB less than the size of the
partition, and the XFS block size is 4KiB.

EMDS = 19134412767 * 512  = 9796819336704  <- actual partition size
CDS  = 9567206383 * 1024  = 9796819336192  (512 bytes less than EMDS)
       <- oddly, /proc/partitions reports 512 bytes less, because it uses 1024-byte units
FSS  = 2391801595 * 4096  = 9796819333120  (3072 bytes less than CDS)  <- filesystem

It turns out that if I create a partition whose size is a multiple of the block
size of the XFS filesystem, then the error does not occur, i.e. there is no error
when the filesystem starts _and_ ends on a partition boundary.

When that is the case, e.g. as follows, there is no issue. This partition
is 7 sectors smaller than the one referenced above.

# sgdisk --new=0:2048:19134414807 -- /dev/sdi
Creating new GPT entries.
The operation has completed successfully.

# sgdisk -p /dev/sdi
Disk /dev/sdf: 19134414848 sectors, 8.9 TiB
Logical sector size: 512 bytes
Disk identifier (GUID): 3E61A8BA-838A-4D7E-BB8E-293972EB45AE
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 19134414814
Partitions will be aligned on 2048-sector boundaries
Total free space is 2021 sectors (1010.5 KiB)

When the end of the partition is not aligned to the 4KiB blocks used by
XFS, the error occurs. This explains why the defaults from parted work
correctly, as the 1MiB "padding" is 4K-aligned.

This non-alignment happens because ceph-deploy uses sgdisk, and sgdisk
seems to align the start of the partition with 2048-sector boundaries, but
_not_ the end of the partition, when used with the -L parameter.
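
A quick sanity check for this condition, assuming 512-byte logical sectors as
in the layout above (device name is an example):

$ sectors=$(blockdev --getsz /dev/sdi1)   # partition size in 512-byte sectors
$ echo $((sectors % 8))                   # 0 means the partition is a whole number of 4 KiB blocks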

The fix was to recreate the partition table, and reduce the unused sectors
down to the max filesystem size:

https://gist.github.com/furlongm/292aefa930f40dc03f21693d1fc19f35

In my testing, I could only reproduce this with XFS, not with other
filesystems. It can be reproduced on smaller XFS filesystems but seems to
take more time.

Cheers,
Marcus.
-- 
Marcus Furlong


[ceph-users] Re: How's cephfs going?

2017-07-17 Thread 许雪寒
Thanks, sir☺
You have really been a lot of help☺

May I ask what kind of business you are using CephFS for? What's the IO
pattern? :-)

If answering this would involve any business secrets, I completely understand if you
don't answer. :-)

Thanks again :-)

From: Brady Deetz [mailto:bde...@gmail.com]
Sent: July 18, 2017 8:01
To: 许雪寒
Cc: ceph-users
Subject: Re: [ceph-users] How's cephfs going?

I feel that the correct answer to this question is: it depends. 

I've been running a 1.75PB Jewel-based cephfs cluster in production for about
2 years at Laureate Institute for Brain Research. Before that we had a good 6-8
month planning and evaluation phase. I'm running with active/standby dedicated
mds servers, 3x dedicated mons, and 12 osd nodes with 24 disks in each server.
Every group of 12 disks has journals mapped to 1x Intel P3700. Each osd node
has dual 40Gbps Ethernet bonded with LACP. In our evaluation we did find that
the rumors are true: your CPU choice will influence performance.

Here's why my answer is "it depends." If you expect to get the same complete
feature set as you do with Isilon, ScaleIO, Gluster, or other more established
scale-out systems, it is not production ready. But in terms of stability, it
is. Over the course of the past 2 years I've triggered 1 mds bug that put my
filesystem into read-only mode. That bug was patched in 8 hours thanks to this
community. Also, that bug was triggered by a stupid mistake on my part that the
application did not validate before the action was performed.

If you have a couple of people with a strong background in Linux, networking, 
and architecture, I'd say Ceph may be a good fit for you. If not, maybe not. 

On Jul 16, 2017 9:59 PM, "许雪寒"  wrote:
Hi, everyone.
 
We intend to use the Jewel version of cephfs; however, we don't know its status. Is
it production ready in Jewel? Does it still have lots of bugs? Is it a major
focus of current ceph development? And who is using cephfs now?



[ceph-users] Re: How's cephfs going?

2017-07-17 Thread 许雪寒
Hi, thanks for the advice:-)

By the way, may I ask what kind of business you are using cephFS for? What's 
the IO pattern of that business? And which version of ceph are you using? If 
this involves any business secret, it's really understandable not to answer:-)

Thanks again for the help:-)

-Original Message-
From: Deepak Naidu [mailto:dna...@nvidia.com]
Sent: July 18, 2017 6:59
To: Blair Bethwaite; 许雪寒
Cc: ceph-users@lists.ceph.com
Subject: RE: [ceph-users] How's cephfs going?

Based on my experience, it's really stable and yes, it is production ready. Most
use cases for CephFS depend on what you're trying to achieve. A few pieces of feedback:

1) The kernel client is nice/stable and can achieve higher bandwidth if you have
a 40G or higher network.
2) ceph-fuse is very slow, as the writes are cached in your client RAM,
regardless of direct IO.
3) Look out for BlueStore for the long term. This holds for Ceph in general, not
just CephFS.
4) If you want a per-folder namespace (for lack of a better word), you need to ensure
you're running the latest kernel or backport the fixes to your running kernel.
5) Larger IO block sizes will provide faster throughput. It would not be great for
smaller IO blocks.
6) Use SSDs for the CephFS metadata pool (it really helps); this is based on my
experience, folks can debate. I believe eBay has a writeup where they didn't
see any advantage to using SSDs.
7) Look up the experimental features below:
http://docs.ceph.com/docs/master/cephfs/experimental-features/?highlight=experimental

--
Deepak


-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Blair 
Bethwaite
Sent: Sunday, July 16, 2017 8:14 PM
To: 许雪寒
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] How's cephfs going?

It works and can reasonably be called "production ready". However in Jewel 
there are still some features (e.g. directory sharding, multi active MDS, and 
some security constraints) that may limit widespread usage. Also note that 
userspace client support in e.g. nfs-ganesha and samba is a mixed bag across 
distros and you may find yourself having to resort to re-exporting ceph-fuse or 
kernel mounts in order to provide those gateway services. We haven't tried 
Luminous CephFS yet as still waiting for the first full (non-RC) release to 
drop, but things seem very positive there...

On 17 July 2017 at 12:59, 许雪寒  wrote:
> Hi, everyone.
>
>
>
> We intend to use cephfs of Jewel version, however, we don’t know its status.
> Is it production ready in Jewel? Does it still have lots of bugs? Is 
> it a major effort of the current ceph development? And who are using cephfs 
> now?
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



--
Cheers,
~Blairo


Re: [ceph-users] Systemd dependency cycle in Luminous

2017-07-17 Thread Michael Andersen
Thanks for pointing me towards that! You saved me a lot of stress

On Jul 17, 2017 4:39 PM, "Tim Serong"  wrote:

> On 07/17/2017 11:22 AM, Michael Andersen wrote:
> > Hi all
> >
> > I recently upgraded two separate ceph clusters from Jewel to Luminous.
> > (OS is Ubuntu xenial) Everything went smoothly except on one of the
> > monitors in each cluster I had a problem shutting down/starting up. It
> > seems the systemd dependencies are messed up. I get:
> >
> > systemd[1]: ceph-osd.target: Found ordering cycle on
> ceph-osd.target/start
> > systemd[1]: ceph-osd.target: Found dependency on ceph-osd@16.service
> /start
> > systemd[1]: ceph-osd.target: Found dependency on ceph-mon.target/start
> > systemd[1]: ceph-osd.target: Found dependency on ceph.target/start
> > systemd[1]: ceph-osd.target: Found dependency on ceph-osd.target/start
> >
> > Has anyone seen this? I ignored the first time this happened (and fixed
> > it by uninstalling, purging and reinstalling ceph on that one node) but
> > now it has happened while upgrading a completely different cluster and
> > this one would be quite a pain to uninstall/reinstall ceph on. Any ideas?
>
> I hit the same thing on SUSE Linux, but it should be fixed now, by
> https://github.com/ceph/ceph/pull/15835/commits/357dfa5954.  This went
> into the Luminous branch on July 3, so if your Luminous build is older
> than that, you won't have this fix yet.  See the above commit message
> for the full description, but TL;DR: having a MONs colocated with OSDs
> will sometimes (but not every time) confuse systemd, due to the various
> target files specifying dependencies between each other, without
> specifying explicit ordering.
>
> Regards,
>
> Tim
> --
> Tim Serong
> Senior Clustering Engineer
> SUSE
> tser...@suse.com
>


Re: [ceph-users] How's cephfs going?

2017-07-17 Thread Brady Deetz
I feel that the correct answer to this question is: it depends.

I've been running a 1.75PB Jewel-based cephfs cluster in production for
about 2 years at Laureate Institute for Brain Research. Before that we
had a good 6-8 month planning and evaluation phase. I'm running with
active/standby dedicated mds servers, 3x dedicated mons, and 12 osd nodes
with 24 disks in each server. Every group of 12 disks has journals mapped
to 1x Intel P3700. Each osd node has dual 40Gbps Ethernet bonded with LACP.
In our evaluation we did find that the rumors are true: your CPU choice
will influence performance.
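
For anyone wanting to see the same kind of layout on their own cluster, the
stock status commands are enough (nothing here is specific to this deployment):

$ ceph mds stat   # which MDS is active and which is standby
$ ceph osd tree   # OSD hosts and the OSDs on each
$ ceph -s         # mon quorum and overall health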

Here's why my answer is "it depends." If you expect to get the same
complete feature set as you do with Isilon, ScaleIO, Gluster, or other
more established scale-out systems, it is not production ready. But in
terms of stability, it is. Over the course of the past 2 years I've
triggered 1 mds bug that put my filesystem into read-only mode. That bug
was patched in 8 hours thanks to this community. Also, that bug was triggered
by a stupid mistake on my part that the application did not validate before
the action was performed.

If you have a couple of people with a strong background in Linux,
networking, and architecture, I'd say Ceph may be a good fit for you. If
not, maybe not.

On Jul 16, 2017 9:59 PM, "许雪寒"  wrote:

> Hi, everyone.
>
>
>
> We intend to use cephfs of Jewel version, however, we don’t know its
> status. Is it production ready in Jewel? Does it still have lots of bugs?
> Is it a major effort of the current ceph development? And who are using
> cephfs now?
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>


Re: [ceph-users] Systemd dependency cycle in Luminous

2017-07-17 Thread Tim Serong
On 07/17/2017 11:22 AM, Michael Andersen wrote:
> Hi all
> 
> I recently upgraded two separate ceph clusters from Jewel to Luminous.
> (OS is Ubuntu xenial) Everything went smoothly except on one of the
> monitors in each cluster I had a problem shutting down/starting up. It
> seems the systemd dependencies are messed up. I get:
> 
> systemd[1]: ceph-osd.target: Found ordering cycle on ceph-osd.target/start
> systemd[1]: ceph-osd.target: Found dependency on ceph-osd@16.service/start
> systemd[1]: ceph-osd.target: Found dependency on ceph-mon.target/start
> systemd[1]: ceph-osd.target: Found dependency on ceph.target/start
> systemd[1]: ceph-osd.target: Found dependency on ceph-osd.target/start
> 
> Has anyone seen this? I ignored the first time this happened (and fixed
> it by uninstalling, purging and reinstalling ceph on that one node) but
> now it has happened while upgrading a completely different cluster and
> this one would be quite a pain to uninstall/reinstall ceph on. Any ideas?

I hit the same thing on SUSE Linux, but it should be fixed now, by
https://github.com/ceph/ceph/pull/15835/commits/357dfa5954.  This went
into the Luminous branch on July 3, so if your Luminous build is older
than that, you won't have this fix yet.  See the above commit message
for the full description, but TL;DR: having MONs colocated with OSDs
will sometimes (but not every time) confuse systemd, due to the various
target files specifying dependencies between each other, without
specifying explicit ordering.
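
If you want to see the relationships systemd is complaining about on an
affected node, plain systemctl shows the declared dependencies and ordering
(nothing Ceph-specific here):

$ systemctl show -p Requires -p Wants -p After -p Before ceph.target ceph-mon.target ceph-osd.target
$ systemctl list-dependencies ceph-osd.target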

Regards,

Tim
-- 
Tim Serong
Senior Clustering Engineer
SUSE
tser...@suse.com


Re: [ceph-users] Yet another performance tuning for CephFS

2017-07-17 Thread David Turner
What are your pool settings? That can affect your read/write speeds as much
as anything in the ceph.conf file.

On Mon, Jul 17, 2017, 4:55 PM Gencer Genç  wrote:

> I don't think so.
>
> Because I tried one thing a few minutes ago. I opened 4 ssh channel and
> run rsync command and copy bigfile to different targets in cephfs at the
> same time. Then i looked into network graphs and i see numbers up to
> 1.09 gb/s. But why single copy/rsync cannot exceed 200mb/s? What
> prevents it im really wonder this.
>
> Gencer.
>
>
> -Original Message-
> From: Patrick Donnelly [mailto:pdonn...@redhat.com]
> Sent: Monday, July 17, 2017 23:21
> To: gen...@gencgiyen.com
> Cc: Ceph Users 
> Subject: Re: [ceph-users] Yet another performance tuning for CephFS
>
> On Mon, Jul 17, 2017 at 1:08 PM,   wrote:
> > But lets try another. Lets say i have a file in my server which is 5GB.
> If i
> > do this:
> >
> > $ rsync ./bigfile /mnt/cephfs/targetfile --progress
> >
> > Then i see max. 200 mb/s. I think it is still slow :/ Is this an
> expected?
>
> Perhaps that is the bandwidth limit of your local device rsync is reading
> from?
>
> --
> Patrick Donnelly
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


Re: [ceph-users] Bucket policies in Luminous

2017-07-17 Thread Graham Allan
Thanks for the update. I saw there was a set of new 12.1.1 packages
today, so I updated to these (they appear to contain the update below)
rather than building my own radosgw.


I'm not sure what changed; I no longer get a crash, but I don't seem to be able
to set any policy now.


my sample policy:

% cat s3policy
{
  "Version": "2012-10-17",
  "Statement": [
{
  "Effect": "Allow",
  "Principal": {"AWS": ["arn:aws:iam:::user/gta2"]},
  "Action": "s3:ListBucket",
  "Resource": ["arn:aws:s3:::gta/*"]
}
  ]
}

but...

% s3cmd setpolicy s3policy s3://gta
ERROR: S3 error: 400 (InvalidArgument)

I have "debug rgw = 20" but nothing revealing in the logs.

Do you see anything obviously wrong in my policy file?

Thanks,

Graham

On 07/12/2017 11:27 PM, Pritha Srivastava wrote:


- Original Message -

From: "Adam C. Emerson" 
To: "Graham Allan" 
Cc: "Ceph Users" 
Sent: Thursday, July 13, 2017 1:23:27 AM
Subject: Re: [ceph-users] Bucket policies in Luminous

Graham Allan Wrote:

I thought I'd try out the new bucket policy support in Luminous. My goal
was simply to permit access on a bucket to another user.

[snip]

Thanks for any ideas,


It's probably the 'blank' tenant. I'll make up a test case to exercise
this and come up with a patch for it. Sorry about the trouble.



The fix in this PR: https://github.com/ceph/ceph/pull/15997 should help.

Thanks,
Pritha


--
Senior Software Engineer   Red Hat Storage, Ann Arbor, MI, US
IRC: Aemerson@{RedHat, OFTC}
0x80F7544B90EDBFB9 E707 86BA 0C1B 62CC 152C  7C12 80F7 544B 90ED BFB9



--
Graham Allan
Minnesota Supercomputing Institute - g...@umn.edu


Re: [ceph-users] How's cephfs going?

2017-07-17 Thread Deepak Naidu
Based on my experience, it's really stable and yes, it is production ready. Most
use cases for CephFS depend on what you're trying to achieve. A few pieces of feedback:

1) The kernel client is nice/stable and can achieve higher bandwidth if you have
a 40G or higher network.
2) ceph-fuse is very slow, as the writes are cached in your client RAM,
regardless of direct IO.
3) Look out for BlueStore for the long term. This holds for Ceph in general, not
just CephFS.
4) If you want a per-folder namespace (for lack of a better word), you need to ensure
you're running the latest kernel or backport the fixes to your running kernel
(see the sketch after this list).
5) Larger IO block sizes will provide faster throughput. It would not be great for
smaller IO blocks.
6) Use SSDs for the CephFS metadata pool (it really helps); this is based on my
experience, folks can debate. I believe eBay has a writeup where they didn't
see any advantage to using SSDs.
7) Look up the experimental features below:
http://docs.ceph.com/docs/master/cephfs/experimental-features/?highlight=experimental
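
Regarding point 4: if "per-folder namespace" means path-restricted client
capabilities, a minimal sketch looks like the following (client name, path and
pool are examples, and the exact cap syntax varies a little between releases;
the client-side support is the kernel requirement mentioned above):

$ ceph auth get-or-create client.proj1 mon 'allow r' \
    mds 'allow rw path=/projects/proj1' \
    osd 'allow rw pool=cephfs_data'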

--
Deepak


-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Blair 
Bethwaite
Sent: Sunday, July 16, 2017 8:14 PM
To: 许雪寒
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] How's cephfs going?

It works and can reasonably be called "production ready". However in Jewel 
there are still some features (e.g. directory sharding, multi active MDS, and 
some security constraints) that may limit widespread usage. Also note that 
userspace client support in e.g. nfs-ganesha and samba is a mixed bag across 
distros and you may find yourself having to resort to re-exporting ceph-fuse or 
kernel mounts in order to provide those gateway services. We haven't tried 
Luminous CephFS yet as still waiting for the first full (non-RC) release to 
drop, but things seem very positive there...

On 17 July 2017 at 12:59, 许雪寒  wrote:
> Hi, everyone.
>
>
>
> We intend to use cephfs of Jewel version, however, we don’t know its status.
> Is it production ready in Jewel? Does it still have lots of bugs? Is 
> it a major effort of the current ceph development? And who are using cephfs 
> now?
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



--
Cheers,
~Blairo


Re: [ceph-users] Yet another performance tuning for CephFS

2017-07-17 Thread Gencer Genç
I don't think so.

Because I tried one thing a few minutes ago. I opened 4 ssh channels and
ran the rsync command, copying the big file to different targets in cephfs at the
same time. Then I looked at the network graphs and saw numbers up to
1.09 GB/s. But why can't a single copy/rsync exceed 200 MB/s? I really
wonder what prevents it.
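
One way to reproduce that single-stream vs. parallel gap with a standard tool
instead of rsync; a sketch, assuming fio is installed and the mount point is
/mnt/cephfs:

$ fio --name=single   --directory=/mnt/cephfs --rw=write --bs=4M --size=2G --numjobs=1 --ioengine=libaio --direct=1 --group_reporting
$ fio --name=parallel --directory=/mnt/cephfs --rw=write --bs=4M --size=2G --numjobs=4 --ioengine=libaio --direct=1 --group_reporting

If numjobs=4 scales to several times the numjobs=1 figure, that would point at
per-stream latency rather than raw bandwidth as the limit.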

Gencer.


-Original Message-
From: Patrick Donnelly [mailto:pdonn...@redhat.com] 
Sent: Monday, July 17, 2017 23:21
To: gen...@gencgiyen.com
Cc: Ceph Users 
Subject: Re: [ceph-users] Yet another performance tuning for CephFS

On Mon, Jul 17, 2017 at 1:08 PM,   wrote:
> But lets try another. Lets say i have a file in my server which is 5GB. If i
> do this:
>
> $ rsync ./bigfile /mnt/cephfs/targetfile --progress
>
> Then i see max. 200 mb/s. I think it is still slow :/ Is this an expected?

Perhaps that is the bandwidth limit of your local device rsync is reading from?

-- 
Patrick Donnelly



Re: [ceph-users] Yet another performance tuning for CephFS

2017-07-17 Thread gencer

I have a separate 10GbE network for ceph and another for public.

No, they are not NVMe, unfortunately.

Do you know of any test command I can try to see if this is the max
read speed from rsync?


Because I tried one thing a few minutes ago. I opened 4 ssh channels and
ran the rsync command, copying the big file to different targets in cephfs at the
same time. Then I looked at the network graphs and saw numbers up to
1.09 GB/s. But why can't a single copy/rsync exceed 200 MB/s? I really
wonder what prevents it.


Gencer.

On 2017-07-17 23:24, Peter Maloney wrote:

You should have a separate public and cluster network. And journal or
wal/db performance is important... are the devices fast NVMe?

On 07/17/17 21:31, gen...@gencgiyen.com wrote:


Hi,

I located and applied almost every different tuning setting/config
over the internet. I couldn’t manage to speed up my speed one byte
further. It is always same speed whatever I do.

I was on jewel, now I tried BlueStore on Luminous. Still exact same
speed I gain from cephfs.

It doesn’t matter if I disable debug log, or remove [osd] section
as below and re-add as below (see .conf). Results are exactly the
same. Not a single byte is gained from those tunings. I also did
tuning for kernel (sysctl.conf).

Basics:

I have 2 nodes with 10 OSD each and each OSD is 3TB SATA drive. Each
node has 24 cores and 64GB of RAM. Ceph nodes are connected via
10GbE NIC. No FUSE used. But tried that too. Same results.

$ dd if=/dev/zero of=/mnt/c/testfile bs=100M count=10 oflag=direct

10+0 records in

10+0 records out

1048576000 bytes (1.0 GB, 1000 MiB) copied, 5.77219 s, 182 MB/s

182MB/s. This is the best speed i get so far. Usually 170~MB/s. Hm..
I get much much much higher speeds on different filesystems. Even
with glusterfs. Is there anything I can do or try?

Read speed is also around 180-220MB/s but not higher.

This is What I am using on ceph.conf:

[global]

fsid = d7163667-f8c5-466b-88df-8747b26c91df

mon_initial_members = server1

mon_host = 192.168.0.1

auth_cluster_required = cephx

auth_service_required = cephx

auth_client_required = cephx

osd mount options = rw,noexec,nodev,noatime,nodiratime,nobarrier

osd mount options xfs = rw,noexec,nodev,noatime,nodiratime,nobarrier


osd_mkfs_type = xfs

osd pool default size = 2

enable experimental unrecoverable data corrupting features =
bluestore rocksdb

bluestore fsck on mount = true

rbd readahead disable after bytes = 0

rbd readahead max bytes = 4194304

log to syslog = false

debug_lockdep = 0/0

debug_context = 0/0

debug_crush = 0/0

debug_buffer = 0/0

debug_timer = 0/0

debug_filer = 0/0

debug_objecter = 0/0

debug_rados = 0/0

debug_rbd = 0/0

debug_journaler = 0/0

debug_objectcatcher = 0/0

debug_client = 0/0

debug_osd = 0/0

debug_optracker = 0/0

debug_objclass = 0/0

debug_filestore = 0/0

debug_journal = 0/0

debug_ms = 0/0

debug_monc = 0/0

debug_tp = 0/0

debug_auth = 0/0

debug_finisher = 0/0

debug_heartbeatmap = 0/0

debug_perfcounter = 0/0

debug_asok = 0/0

debug_throttle = 0/0

debug_mon = 0/0

debug_paxos = 0/0

debug_rgw = 0/0

[osd]

osd max write size = 512

osd client message size cap = 2147483648

osd mount options xfs = rw,noexec,nodev,noatime,nodiratime,nobarrier


filestore xattr use omap = true

osd_op_threads = 8

osd disk threads = 4

osd map cache size = 1024

filestore_queue_max_ops = 25000

filestore_queue_max_bytes = 10485760

filestore_queue_committing_max_ops = 5000

filestore_queue_committing_max_bytes = 1048576

journal_max_write_entries = 1000

journal_queue_max_ops = 3000

journal_max_write_bytes = 1048576000

journal_queue_max_bytes = 1048576000

filestore_max_sync_interval = 15

filestore_merge_threshold = 20

filestore_split_multiple = 2

osd_enable_op_tracker = false

filestore_wbthrottle_enable = false

osd_client_message_size_cap = 0

osd_client_message_cap = 0

filestore_fd_cache_size = 64

filestore_fd_cache_shards = 32

filestore_op_threads = 12

As I stated above, it doesn’t matter if I have this [osd] section
or not. Results are same.

I am open to all suggestions.

Thanks,

Gencer.



Re: [ceph-users] Yet another performance tuning for CephFS

2017-07-17 Thread Peter Maloney
You should have a separate public and cluster network. And journal or
wal/db performance is important... are the devices fast NVMe?


On 07/17/17 21:31, gen...@gencgiyen.com wrote:
>
> Hi,
>
>  
>
> I located and applied almost every different tuning setting/config
> over the internet. I couldn’t manage to speed up my speed one byte
> further. It is always same speed whatever I do.
>
>  
>
> I was on jewel, now I tried BlueStore on Luminous. Still exact same
> speed I gain from cephfs.
>
>  
>
> It doesn’t matter if I disable debug log, or remove [osd] section as
> below and re-add as below (see .conf). Results are exactly the same.
> Not a single byte is gained from those tunings. I also did tuning for
> kernel (sysctl.conf).
>
>  
>
> Basics:
>
>  
>
> I have 2 nodes with 10 OSD each and each OSD is 3TB SATA drive. Each
> node has 24 cores and 64GB of RAM. Ceph nodes are connected via 10GbE
> NIC. No FUSE used. But tried that too. Same results.
>
>  
>
> $ dd if=/dev/zero of=/mnt/c/testfile bs=100M count=10 oflag=direct
>
> 10+0 records in
>
> 10+0 records out
>
> 1048576000 bytes (1.0 GB, 1000 MiB) copied, 5.77219 s, 182 MB/s
>
>  
>
> 182MB/s. This is the best speed i get so far. Usually 170~MB/s. Hm.. I
> get much much much higher speeds on different filesystems. Even with
> glusterfs. Is there anything I can do or try?
>
>  
>
> Read speed is also around 180-220MB/s but not higher.
>
>  
>
> This is What I am using on ceph.conf:
>
>  
>
> [global]
>
> fsid = d7163667-f8c5-466b-88df-8747b26c91df
>
> mon_initial_members = server1
>
> mon_host = 192.168.0.1
>
> auth_cluster_required = cephx
>
> auth_service_required = cephx
>
> auth_client_required = cephx
>
>  
>
> osd mount options = rw,noexec,nodev,noatime,nodiratime,nobarrier
>
> osd mount options xfs = rw,noexec,nodev,noatime,nodiratime,nobarrier
>
> osd_mkfs_type = xfs
>
>  
>
> osd pool default size = 2
>
> enable experimental unrecoverable data corrupting features = bluestore
> rocksdb
>
> bluestore fsck on mount = true
>
> rbd readahead disable after bytes = 0
>
> rbd readahead max bytes = 4194304
>
>  
>
> log to syslog = false
>
> debug_lockdep = 0/0
>
> debug_context = 0/0
>
> debug_crush = 0/0
>
> debug_buffer = 0/0
>
> debug_timer = 0/0
>
> debug_filer = 0/0
>
> debug_objecter = 0/0
>
> debug_rados = 0/0
>
> debug_rbd = 0/0
>
> debug_journaler = 0/0
>
> debug_objectcatcher = 0/0
>
> debug_client = 0/0
>
> debug_osd = 0/0
>
> debug_optracker = 0/0
>
> debug_objclass = 0/0
>
> debug_filestore = 0/0
>
> debug_journal = 0/0
>
> debug_ms = 0/0
>
> debug_monc = 0/0
>
> debug_tp = 0/0
>
> debug_auth = 0/0
>
> debug_finisher = 0/0
>
> debug_heartbeatmap = 0/0
>
> debug_perfcounter = 0/0
>
> debug_asok = 0/0
>
> debug_throttle = 0/0
>
> debug_mon = 0/0
>
> debug_paxos = 0/0
>
> debug_rgw = 0/0
>
>  
>
>  
>
> [osd]
>
> osd max write size = 512
>
> osd client message size cap = 2147483648
>
> osd mount options xfs = rw,noexec,nodev,noatime,nodiratime,nobarrier
>
> filestore xattr use omap = true
>
> osd_op_threads = 8
>
> osd disk threads = 4
>
> osd map cache size = 1024
>
> filestore_queue_max_ops = 25000
>
> filestore_queue_max_bytes = 10485760
>
> filestore_queue_committing_max_ops = 5000
>
> filestore_queue_committing_max_bytes = 1048576
>
> journal_max_write_entries = 1000
>
> journal_queue_max_ops = 3000
>
> journal_max_write_bytes = 1048576000
>
> journal_queue_max_bytes = 1048576000
>
> filestore_max_sync_interval = 15
>
> filestore_merge_threshold = 20
>
> filestore_split_multiple = 2
>
> osd_enable_op_tracker = false
>
> filestore_wbthrottle_enable = false
>
> osd_client_message_size_cap = 0
>
> osd_client_message_cap = 0
>
> filestore_fd_cache_size = 64
>
> filestore_fd_cache_shards = 32
>
> filestore_op_threads = 12
>
>  
>
>  
>
> As I stated above, it doesn’t matter if I have this [osd] section or
> not. Results are same.
>
>  
>
> I am open to all suggestions.
>
>  
>
> Thanks,
>
> Gencer.
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




Re: [ceph-users] Yet another performance tuning for CephFS

2017-07-17 Thread Patrick Donnelly
On Mon, Jul 17, 2017 at 1:08 PM,   wrote:
> But lets try another. Lets say i have a file in my server which is 5GB. If i
> do this:
>
> $ rsync ./bigfile /mnt/cephfs/targetfile --progress
>
> Then i see max. 200 mb/s. I think it is still slow :/ Is this an expected?

Perhaps that is the bandwidth limit of your local device rsync is reading from?

-- 
Patrick Donnelly


Re: [ceph-users] iSCSI production ready?

2017-07-17 Thread Alvaro Soto
Thanks Jason,

Regarding the second part, never mind; now I see that the solution is to use the
TCMU daemon. I was thinking of an out-of-the-box iSCSI endpoint directly
from Ceph. Sorry, I don't have too much expertise in this area.

Best.


On Jul 17, 2017 6:54 AM, "Jason Dillaman"  wrote:

On Sat, Jul 15, 2017 at 11:01 PM, Alvaro Soto  wrote:
> Hi guys,
> does anyone know any news about in what release iSCSI interface is going
> to be production ready, if not yet?

There are several flavors of RBD iSCSI implementations that are in-use
by the community. We are working to solidify the integration with LIO
TCMU (via tcmu-runner) right now for Luminous [1].

> I mean without the use of a gateway, like a different endpoint connector
> to a CEPH cluster.

I'm not sure what you mean here.

> Thanks in advance.
> Best.
>
> --
>
> ATTE. Alvaro Soto Escobar
>
> --
> Great people talk about ideas,
> average people talk about things,
> small people talk ... about other people.
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


[1] https://github.com/ceph/ceph/pull/16182

--
Jason


Re: [ceph-users] Yet another performance tuning for CephFS

2017-07-17 Thread gencer

Hi Patrick.

Thank you for the prompt response.

I added the ceph.conf file, but I think you missed it.

These are the configs I tuned (I also disabled debug logs in the global
section). Correct me if I have misunderstood you on this.


By the way, before I give you the config I want to answer on sync IO: yes, if I
remove oflag then it goes to 1.1 GB/s. Very fast indeed.


But let's try another test. Let's say I have a file on my server which is 5GB.
If I do this:


$ rsync ./bigfile /mnt/cephfs/targetfile --progress

Then I see max. 200 MB/s. I think it is still slow :/ Is this
expected?


Am I doing something wrong here?

Anyway, here are the OSD configs I tried to tune.

[osd]

osd max write size = 512

osd client message size cap = 2147483648

osd mount options xfs = rw,noexec,nodev,noatime,nodiratime,nobarrier

filestore xattr use omap = true

osd_op_threads = 8

osd disk threads = 4

osd map cache size = 1024

filestore_queue_max_ops = 25000

filestore_queue_max_bytes = 10485760

filestore_queue_committing_max_ops = 5000

filestore_queue_committing_max_bytes = 1048576

journal_max_write_entries = 1000

journal_queue_max_ops = 3000

journal_max_write_bytes = 1048576000

journal_queue_max_bytes = 1048576000

filestore_max_sync_interval = 15

filestore_merge_threshold = 20

filestore_split_multiple = 2

osd_enable_op_tracker = false

filestore_wbthrottle_enable = false

osd_client_message_size_cap = 0

osd_client_message_cap = 0

filestore_fd_cache_size = 64

filestore_fd_cache_shards = 32

filestore_op_threads = 12






On 2017-07-17 22:41, Patrick Donnelly wrote:

Hi Gencer,

On Mon, Jul 17, 2017 at 12:31 PM,   wrote:
I located and applied almost every different tuning setting/config 
over the
internet. I couldn’t manage to speed up my speed one byte further. It 
is

always same speed whatever I do.


I believe you're frustrated but this type of information isn't really
helpful. Instead tell us which config settings you've tried tuning.

I have 2 nodes with 10 OSD each and each OSD is 3TB SATA drive. Each 
node
has 24 cores and 64GB of RAM. Ceph nodes are connected via 10GbE NIC. 
No

FUSE used. But tried that too. Same results.



$ dd if=/dev/zero of=/mnt/c/testfile bs=100M count=10 oflag=direct


This looks like your problem: don't use oflag=direct. That will cause
CephFS to do synchronous I/O at great cost to performance in order to
avoid buffering by the client.



Re: [ceph-users] Yet another performance tuning for CephFS

2017-07-17 Thread Patrick Donnelly
Hi Gencer,

On Mon, Jul 17, 2017 at 12:31 PM,   wrote:
> I located and applied almost every different tuning setting/config over the
> internet. I couldn’t manage to speed up my speed one byte further. It is
> always same speed whatever I do.

I believe you're frustrated but this type of information isn't really
helpful. Instead tell us which config settings you've tried tuning.

> I have 2 nodes with 10 OSD each and each OSD is 3TB SATA drive. Each node
> has 24 cores and 64GB of RAM. Ceph nodes are connected via 10GbE NIC. No
> FUSE used. But tried that too. Same results.
>
>
>
> $ dd if=/dev/zero of=/mnt/c/testfile bs=100M count=10 oflag=direct

This looks like your problem: don't use oflag=direct. That will cause
CephFS to do synchronous I/O at great cost to performance in order to
avoid buffering by the client.
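
If you still want a figure that isn't inflated by the page cache, a common
middle ground is to drop oflag=direct but force a flush before dd reports (a
sketch, reusing the same test file path):

$ dd if=/dev/zero of=/mnt/c/testfile bs=100M count=10 conv=fdatasync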

-- 
Patrick Donnelly


[ceph-users] Yet another performance tuning for CephFS

2017-07-17 Thread gencer
Hi,

 

I located and applied almost every different tuning setting/config from around the
internet. I couldn't manage to speed things up by a single byte. It is
always the same speed whatever I do.

 

I was on Jewel; now I tried BlueStore on Luminous. Still the exact same speed I
get from cephfs.

 

It doesn't matter if I disable debug logging, or remove the [osd] section as below
and re-add it as below (see .conf). Results are exactly the same. Not a single
byte is gained from those tunings. I also tuned the kernel
(sysctl.conf).

 

Basics:

 

I have 2 nodes with 10 OSD each and each OSD is 3TB SATA drive. Each node
has 24 cores and 64GB of RAM. Ceph nodes are connected via 10GbE NIC. No
FUSE used. But tried that too. Same results.

 

$ dd if=/dev/zero of=/mnt/c/testfile bs=100M count=10 oflag=direct

10+0 records in

10+0 records out

1048576000 bytes (1.0 GB, 1000 MiB) copied, 5.77219 s, 182 MB/s

 

182MB/s. This is the best speed I get so far; usually it's ~170MB/s. Hmm.. I get
much, much higher speeds on different filesystems, even with glusterfs.
Is there anything I can do or try?

 

Read speed is also around 180-220MB/s but not higher.
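
One thing worth ruling out is whether the ceiling is CephFS itself or the
underlying RADOS cluster. A sketch with the stock benchmark (pool name is an
example; the write phase leaves objects behind that need cleaning up):

$ rados bench -p cephfs_data 30 write --no-cleanup
$ rados bench -p cephfs_data 30 seq
$ rados -p cephfs_data cleanup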

 

This is What I am using on ceph.conf:

 

[global]

fsid = d7163667-f8c5-466b-88df-8747b26c91df

mon_initial_members = server1

mon_host = 192.168.0.1

auth_cluster_required = cephx

auth_service_required = cephx

auth_client_required = cephx

 

osd mount options = rw,noexec,nodev,noatime,nodiratime,nobarrier

osd mount options xfs = rw,noexec,nodev,noatime,nodiratime,nobarrier

osd_mkfs_type = xfs

 

osd pool default size = 2

enable experimental unrecoverable data corrupting features = bluestore
rocksdb

bluestore fsck on mount = true

rbd readahead disable after bytes = 0

rbd readahead max bytes = 4194304

 

log to syslog = false

debug_lockdep = 0/0

debug_context = 0/0

debug_crush = 0/0

debug_buffer = 0/0

debug_timer = 0/0

debug_filer = 0/0

debug_objecter = 0/0

debug_rados = 0/0

debug_rbd = 0/0

debug_journaler = 0/0

debug_objectcatcher = 0/0

debug_client = 0/0

debug_osd = 0/0

debug_optracker = 0/0

debug_objclass = 0/0

debug_filestore = 0/0

debug_journal = 0/0

debug_ms = 0/0

debug_monc = 0/0

debug_tp = 0/0

debug_auth = 0/0

debug_finisher = 0/0

debug_heartbeatmap = 0/0

debug_perfcounter = 0/0

debug_asok = 0/0

debug_throttle = 0/0

debug_mon = 0/0

debug_paxos = 0/0

debug_rgw = 0/0

 

 

[osd]

osd max write size = 512

osd client message size cap = 2147483648

osd mount options xfs = rw,noexec,nodev,noatime,nodiratime,nobarrier

filestore xattr use omap = true

osd_op_threads = 8

osd disk threads = 4

osd map cache size = 1024

filestore_queue_max_ops = 25000

filestore_queue_max_bytes = 10485760

filestore_queue_committing_max_ops = 5000

filestore_queue_committing_max_bytes = 1048576

journal_max_write_entries = 1000

journal_queue_max_ops = 3000

journal_max_write_bytes = 1048576000

journal_queue_max_bytes = 1048576000

filestore_max_sync_interval = 15

filestore_merge_threshold = 20

filestore_split_multiple = 2

osd_enable_op_tracker = false

filestore_wbthrottle_enable = false

osd_client_message_size_cap = 0

osd_client_message_cap = 0

filestore_fd_cache_size = 64

filestore_fd_cache_shards = 32

filestore_op_threads = 12

 

 

As I stated above, it doesn't matter if I have this [osd] section or not.
Results are the same.

 

I am open to all suggestions.

 

Thanks,

Gencer.



Re: [ceph-users] Long OSD restart after upgrade to 10.2.9

2017-07-17 Thread Josh Durgin

Both of you are seeing leveldb perform compaction when the osd starts
up. This can take a while for large amounts of omap data (created by
things like cephfs directory metadata or rgw bucket indexes).

The 'leveldb_compact_on_mount' option wasn't changed in 10.2.9, but 
leveldb will compact automatically if there is enough work to do.
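
If you would rather pay that cost at a time of your choosing, one approach (a
sketch, using the option already quoted in this thread; the OSD id is an
example) is to enable compaction on mount for a single planned restart and
watch the omap directory shrink:

# temporarily, in the [osd] section of ceph.conf:
#   leveldb compact on mount = true
$ systemctl restart ceph-osd@12
$ du -sh /var/lib/ceph/osd/ceph-12/current/omap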


Does restarting an OSD affected by this with 10.2.9 again after it's
completed compaction still have these symptoms?

Josh

On 07/17/2017 05:57 AM, Lincoln Bryant wrote:

Hi Anton,

We observe something similar on our OSDs going from 10.2.7 to 10.2.9 
(see thread "some OSDs stuck down after 10.2.7 -> 10.2.9 update"). Some 
of our OSDs are not working at all on 10.2.9 or die with suicide 
timeouts. Those that come up/in take a very long time to boot up. Seems 
to not affect every OSD in our case though.


--Lincoln

On 7/17/2017 1:29 AM, Anton Dmitriev wrote:
During start it consumes ~90% CPU, strace shows, that OSD process 
doing something with LevelDB.

Compact is disabled:
r...@storage07.main01.ceph.apps.prod.int.grcc:~$ cat 
/etc/ceph/ceph.conf | grep compact

#leveldb_compact_on_mount = true

But with debug_leveldb=20 I see, that compaction is running, but why?

2017-07-17 09:27:37.394008 7f4ed2293700  1 leveldb: Compacting 1@1 + 
12@2 files
2017-07-17 09:27:37.593890 7f4ed2293700  1 leveldb: Generated table 
#76778: 277817 keys, 2125970 bytes
2017-07-17 09:27:37.718954 7f4ed2293700  1 leveldb: Generated table 
#76779: 221451 keys, 2124338 bytes
2017-07-17 09:27:37.777362 7f4ed2293700  1 leveldb: Generated table 
#76780: 63755 keys, 809913 bytes
2017-07-17 09:27:37.919094 7f4ed2293700  1 leveldb: Generated table 
#76781: 231475 keys, 2026376 bytes
2017-07-17 09:27:38.035906 7f4ed2293700  1 leveldb: Generated table 
#76782: 190956 keys, 1573332 bytes
2017-07-17 09:27:38.127597 7f4ed2293700  1 leveldb: Generated table 
#76783: 148675 keys, 1260956 bytes
2017-07-17 09:27:38.286183 7f4ed2293700  1 leveldb: Generated table 
#76784: 294105 keys, 2123438 bytes
2017-07-17 09:27:38.469562 7f4ed2293700  1 leveldb: Generated table 
#76785: 299617 keys, 2124267 bytes
2017-07-17 09:27:38.619666 7f4ed2293700  1 leveldb: Generated table 
#76786: 277305 keys, 2124936 bytes
2017-07-17 09:27:38.711423 7f4ed2293700  1 leveldb: Generated table 
#76787: 110536 keys, 951545 bytes
2017-07-17 09:27:38.869917 7f4ed2293700  1 leveldb: Generated table 
#76788: 296199 keys, 2123506 bytes
2017-07-17 09:27:39.028395 7f4ed2293700  1 leveldb: Generated table 
#76789: 248634 keys, 2096715 bytes
2017-07-17 09:27:39.028414 7f4ed2293700  1 leveldb: Compacted 1@1 + 
12@2 files => 21465292 bytes
2017-07-17 09:27:39.053288 7f4ed2293700  1 leveldb: compacted to: 
files[ 0 0 48 549 948 0 0 ]

2017-07-17 09:27:39.054014 7f4ed2293700  1 leveldb: Delete type=2 #76741

Strace:

open("/var/lib/ceph/osd/ceph-195/current/omap/043788.ldb", O_RDONLY) = 18
stat("/var/lib/ceph/osd/ceph-195/current/omap/043788.ldb", 
{st_mode=S_IFREG|0644, st_size=2154394, ...}) = 0

mmap(NULL, 2154394, PROT_READ, MAP_SHARED, 18, 0) = 0x7f96a67a
close(18)   = 0
brk(0x55d15664) = 0x55d15664
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_si

Re: [ceph-users] Ceph (Luminous) shows total_space wrong

2017-07-17 Thread gencer
Update!

Yeah, that was the problem. I zapped the disks (purged them) and re-created them
according to the official documentation. Now everything is OK.

I can see all disks and total sizes properly.

Let's see if this brings any performance improvements compared to the
previous standard schema (using Jewel).

Thanks!,
Gencer.

-Original Message-
From: Wido den Hollander [mailto:w...@42on.com] 
Sent: Monday, July 17, 2017 6:17 PM
To: ceph-users@lists.ceph.com; gen...@gencgiyen.com
Subject: RE: [ceph-users] Ceph (Luminous) shows total_space wrong


> On 17 July 2017 at 17:03, gen...@gencgiyen.com wrote:
> 
> 
> I used this methods:
> 
> $ ceph-deploy osd prepare sr-09-01-18:/dev/sdb1 sr-10-01-18:/dev/sdb1 
>  (one from 09th server one from 10th server..)
> 
> and then;
> 
> $ ceph-deploy osd activate sr-09-01-18:/dev/sdb1 sr-10-01-18:/dev/sdb1 ...
> 

You should use a whole disk, not a partition. So /dev/sdb without the '1'  at 
the end.

> This is my second creation for ceph cluster. At first I used bluestore. This 
> time i did not use bluestore (also removed from conf file). Still seen as 
> 200GB.
> 
> How can I make sure BlueStore is disabled (even if i not put any command).
> 

Just use BlueStore with Luminous as all testing is welcome! But in this case 
you invoked the command with the wrong parameters.

Wido

> -Gencer.
> 
> -Original Message-
> From: Wido den Hollander [mailto:w...@42on.com]
> Sent: Monday, July 17, 2017 5:57 PM
> To: ceph-users@lists.ceph.com; gen...@gencgiyen.com
> Subject: RE: [ceph-users] Ceph (Luminous) shows total_space wrong
> 
> 
> > On 17 July 2017 at 16:41, gen...@gencgiyen.com wrote:
> > 
> > 
> > Hi Wido,
> > 
> > Each disk is 3TB SATA (2.8TB seen) but what I got is this:
> > 
> > First let me gave you df -h:
> > 
> > /dev/sdb1   2.8T  754M  2.8T   1% /var/lib/ceph/osd/ceph-0
> > /dev/sdc1   2.8T  753M  2.8T   1% /var/lib/ceph/osd/ceph-2
> > /dev/sdd1   2.8T  752M  2.8T   1% /var/lib/ceph/osd/ceph-4
> > /dev/sde1   2.8T  752M  2.8T   1% /var/lib/ceph/osd/ceph-6
> > /dev/sdf1   2.8T  753M  2.8T   1% /var/lib/ceph/osd/ceph-8
> > /dev/sdg1   2.8T  752M  2.8T   1% /var/lib/ceph/osd/ceph-10
> > /dev/sdh1   2.8T  751M  2.8T   1% /var/lib/ceph/osd/ceph-12
> > /dev/sdi1   2.8T  751M  2.8T   1% /var/lib/ceph/osd/ceph-14
> > /dev/sdj1   2.8T  751M  2.8T   1% /var/lib/ceph/osd/ceph-16
> > /dev/sdk1   2.8T  751M  2.8T   1% /var/lib/ceph/osd/ceph-18
> > 
> > 
> > Then here is my results from ceph df commands:
> > 
> > ceph df
> > 
> > GLOBAL:
> > SIZE AVAIL RAW USED %RAW USED
> > 200G  179G   21381M 10.44
> > POOLS:
> > NAMEID USED %USED MAX AVAIL OBJECTS
> > rbd 0 0 086579M   0
> > cephfs_data 1 0 086579M   0
> > cephfs_metadata 2  2488 086579M  21
> > 
> 
> Ok, that's odd. But I think these disks are using BlueStore since that's what 
> Luminous defaults to.
> 
> The partitions seem to be mixed up, so can you check on how you created the 
> OSDs? Was that with ceph-disk? If so, what additional arguments did you use?
> 
> Wido
> 
> > ceph osd df
> > ID WEIGHT  REWEIGHT SIZE   USEAVAIL %USE  VAR  PGS
> >  0 0.00980  1.0 10240M  1070M 9170M 10.45 1.00 173
> >  2 0.00980  1.0 10240M  1069M 9170M 10.45 1.00 150
> >  4 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 148
> >  6 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 167
> >  8 0.00980  1.0 10240M  1069M 9171M 10.44 1.00 166
> > 10 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 171
> > 12 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 160
> > 14 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 179
> > 16 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 182
> > 18 0.00980  1.0 10240M  1069M 9170M 10.44 1.00 168
> >  1 0.00980  1.0 10240M  1069M 9170M 10.45 1.00 167
> >  3 0.00980  1.0 10240M  1069M 9170M 10.45 1.00 156
> >  5 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 152
> >  7 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 158
> >  9 0.00980  1.0 10240M  1069M 9170M 10.44 1.00 174
> > 11 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 153
> > 13 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 179
> > 15 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 186
> > 17 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 185
> > 19 0.00980  1.0 10240M  1067M 9172M 10.43 1.00 154
> >   TOTAL   200G 21381M  179G 10.44
> > MIN/MAX VAR: 1.00/1.00  STDDEV: 0.00
> > 
> > 
> > -Gencer.
> > 
> > -Original Message-
> > From: Wido den Hollander [mailto:w...@42on.com]
> > Sent: Monday, July 17, 2017 4:57 PM
> > To: ceph-users@lists.ceph.com; gen...@gencgiyen.com
> > Subject: Re: [ceph-users] Ceph (Luminous) shows total_space wrong
> > 
> > 
> > > Op 17 juli 2017 om 15:49 schreef gen...@gencgiyen.co

Re: [ceph-users] missing feature 400000000000000 ?

2017-07-17 Thread Richard Hesketh
Correct me if I'm wrong, but I understand rbd-nbd is a userland client for 
mapping RBDs to local block devices (like "rbd map" in the kernel client), not 
a client for mounting the cephfs filesystem which is what Riccardo is using?

Rich

On 17/07/17 12:48, Massimiliano Cuttini wrote:
> Hi Riccardo,
> 
> using ceph-fuse will add extra layer.
> Consider to use instead ceph-nbd which is a porting to use network device 
> blocks.
> This should be faster and allow you to use latest tunables (which it's 
> better).
> 
> Il 17/07/2017 10:56, Riccardo Murri ha scritto:
>> Thanks a lot to all!  Both the suggestion to use "ceph osd tunables
>> hammer" and to use "ceph-fuse" instead solved the issue.
>>
>> Riccardo



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph (Luminous) shows total_space wrong

2017-07-17 Thread gencer
When I use /dev/sdb or /dev/sdc (the whole disk) I get errors like this:

ceph_disk.main.FilesystemTypeError: Cannot discover filesystem type: device 
/dev/sdb: Line is truncated:
  RuntimeError: command returned non-zero exit status: 1
  RuntimeError: Failed to execute command: /usr/sbin/ceph-disk -v activate 
--mark-init systemd --mount /dev/sdb

Are you sure that we need to remove the "1" at the end?

Can you point me to any doc for this? Ceph's own documentation also
shows sdb1, sdc1, ...

If you have any sample, I will be very happy :)
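
(For illustration, a sketch of the whole-disk form Wido describes, reusing the
hostnames from earlier in this thread; the exact subcommand set varies by
ceph-deploy version:)

$ ceph-deploy disk zap sr-09-01-18:/dev/sdb sr-10-01-18:/dev/sdb
$ ceph-deploy osd create sr-09-01-18:/dev/sdb sr-10-01-18:/dev/sdb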

-Gencer.

-Original Message-
From: Wido den Hollander [mailto:w...@42on.com] 
Sent: Monday, July 17, 2017 6:17 PM
To: ceph-users@lists.ceph.com; gen...@gencgiyen.com
Subject: RE: [ceph-users] Ceph (Luminous) shows total_space wrong


> On 17 July 2017 at 17:03, gen...@gencgiyen.com wrote:
> 
> 
> I used this methods:
> 
> $ ceph-deploy osd prepare sr-09-01-18:/dev/sdb1 sr-10-01-18:/dev/sdb1 
>  (one from 09th server one from 10th server..)
> 
> and then;
> 
> $ ceph-deploy osd activate sr-09-01-18:/dev/sdb1 sr-10-01-18:/dev/sdb1 ...
> 

You should use a whole disk, not a partition. So /dev/sdb without the '1'  at 
the end.

> This is my second creation for ceph cluster. At first I used bluestore. This 
> time i did not use bluestore (also removed from conf file). Still seen as 
> 200GB.
> 
> How can I make sure BlueStore is disabled (even if i not put any command).
> 

Just use BlueStore with Luminous as all testing is welcome! But in this case 
you invoked the command with the wrong parameters.

Wido

> -Gencer.
> 
> -Original Message-
> From: Wido den Hollander [mailto:w...@42on.com]
> Sent: Monday, July 17, 2017 5:57 PM
> To: ceph-users@lists.ceph.com; gen...@gencgiyen.com
> Subject: RE: [ceph-users] Ceph (Luminous) shows total_space wrong
> 
> 
> > On 17 July 2017 at 16:41, gen...@gencgiyen.com wrote:
> > 
> > 
> > Hi Wido,
> > 
> > Each disk is 3TB SATA (2.8TB seen) but what I got is this:
> > 
> > First let me gave you df -h:
> > 
> > /dev/sdb1   2.8T  754M  2.8T   1% /var/lib/ceph/osd/ceph-0
> > /dev/sdc1   2.8T  753M  2.8T   1% /var/lib/ceph/osd/ceph-2
> > /dev/sdd1   2.8T  752M  2.8T   1% /var/lib/ceph/osd/ceph-4
> > /dev/sde1   2.8T  752M  2.8T   1% /var/lib/ceph/osd/ceph-6
> > /dev/sdf1   2.8T  753M  2.8T   1% /var/lib/ceph/osd/ceph-8
> > /dev/sdg1   2.8T  752M  2.8T   1% /var/lib/ceph/osd/ceph-10
> > /dev/sdh1   2.8T  751M  2.8T   1% /var/lib/ceph/osd/ceph-12
> > /dev/sdi1   2.8T  751M  2.8T   1% /var/lib/ceph/osd/ceph-14
> > /dev/sdj1   2.8T  751M  2.8T   1% /var/lib/ceph/osd/ceph-16
> > /dev/sdk1   2.8T  751M  2.8T   1% /var/lib/ceph/osd/ceph-18
> > 
> > 
> > Then here is my results from ceph df commands:
> > 
> > ceph df
> > 
> > GLOBAL:
> > SIZE AVAIL RAW USED %RAW USED
> > 200G  179G   21381M 10.44
> > POOLS:
> > NAMEID USED %USED MAX AVAIL OBJECTS
> > rbd 0 0 086579M   0
> > cephfs_data 1 0 086579M   0
> > cephfs_metadata 2  2488 086579M  21
> > 
> 
> Ok, that's odd. But I think these disks are using BlueStore since that's what 
> Luminous defaults to.
> 
> The partitions seem to be mixed up, so can you check on how you created the 
> OSDs? Was that with ceph-disk? If so, what additional arguments did you use?
> 
> Wido
> 
> > ceph osd df
> > ID WEIGHT  REWEIGHT SIZE   USEAVAIL %USE  VAR  PGS
> >  0 0.00980  1.0 10240M  1070M 9170M 10.45 1.00 173
> >  2 0.00980  1.0 10240M  1069M 9170M 10.45 1.00 150
> >  4 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 148
> >  6 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 167
> >  8 0.00980  1.0 10240M  1069M 9171M 10.44 1.00 166
> > 10 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 171
> > 12 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 160
> > 14 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 179
> > 16 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 182
> > 18 0.00980  1.0 10240M  1069M 9170M 10.44 1.00 168
> >  1 0.00980  1.0 10240M  1069M 9170M 10.45 1.00 167
> >  3 0.00980  1.0 10240M  1069M 9170M 10.45 1.00 156
> >  5 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 152
> >  7 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 158
> >  9 0.00980  1.0 10240M  1069M 9170M 10.44 1.00 174
> > 11 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 153
> > 13 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 179
> > 15 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 186
> > 17 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 185
> > 19 0.00980  1.0 10240M  1067M 9172M 10.43 1.00 154
> >   TOTAL   200G 21381M  179G 10.44
> > MIN/MAX VAR: 1.00/1.00  STDDEV: 0.00
> > 
> > 
> > -Gencer.
> > 
> > -Original Message-
> > From: Wido den Hollander [mailto:w...@

Re: [ceph-users] Ceph (Luminous) shows total_space wrong

2017-07-17 Thread Wido den Hollander

> On 17 July 2017 at 17:03, gen...@gencgiyen.com wrote:
> 
> 
> I used this methods:
> 
> $ ceph-deploy osd prepare sr-09-01-18:/dev/sdb1 sr-10-01-18:/dev/sdb1  
> (one from 09th server one from 10th server..)
> 
> and then;
> 
> $ ceph-deploy osd activate sr-09-01-18:/dev/sdb1 sr-10-01-18:/dev/sdb1 ...
> 

You should use a whole disk, not a partition. So /dev/sdb without the '1'  at 
the end.

> This is my second creation for ceph cluster. At first I used bluestore. This 
> time i did not use bluestore (also removed from conf file). Still seen as 
> 200GB.
> 
> How can I make sure BlueStore is disabled (even if i not put any command).
> 

Just use BlueStore with Luminous as all testing is welcome! But in this case 
you invoked the command with the wrong parameters.
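
(If one wants to be explicit about the backend rather than rely on the Luminous
default, ceph-disk accepts a flag for it; a sketch, run per disk on each OSD host:)

$ ceph-disk prepare --bluestore /dev/sdb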

Wido

> -Gencer.
> 
> -Original Message-
> From: Wido den Hollander [mailto:w...@42on.com] 
> Sent: Monday, July 17, 2017 5:57 PM
> To: ceph-users@lists.ceph.com; gen...@gencgiyen.com
> Subject: RE: [ceph-users] Ceph (Luminous) shows total_space wrong
> 
> 
> > On 17 July 2017 at 16:41, gen...@gencgiyen.com wrote:
> > 
> > 
> > Hi Wido,
> > 
> > Each disk is 3TB SATA (2.8TB seen) but what I got is this:
> > 
> > First let me gave you df -h:
> > 
> > /dev/sdb1   2.8T  754M  2.8T   1% /var/lib/ceph/osd/ceph-0
> > /dev/sdc1   2.8T  753M  2.8T   1% /var/lib/ceph/osd/ceph-2
> > /dev/sdd1   2.8T  752M  2.8T   1% /var/lib/ceph/osd/ceph-4
> > /dev/sde1   2.8T  752M  2.8T   1% /var/lib/ceph/osd/ceph-6
> > /dev/sdf1   2.8T  753M  2.8T   1% /var/lib/ceph/osd/ceph-8
> > /dev/sdg1   2.8T  752M  2.8T   1% /var/lib/ceph/osd/ceph-10
> > /dev/sdh1   2.8T  751M  2.8T   1% /var/lib/ceph/osd/ceph-12
> > /dev/sdi1   2.8T  751M  2.8T   1% /var/lib/ceph/osd/ceph-14
> > /dev/sdj1   2.8T  751M  2.8T   1% /var/lib/ceph/osd/ceph-16
> > /dev/sdk1   2.8T  751M  2.8T   1% /var/lib/ceph/osd/ceph-18
> > 
> > 
> > Then here is my results from ceph df commands:
> > 
> > ceph df
> > 
> > GLOBAL:
> > SIZE AVAIL RAW USED %RAW USED
> > 200G  179G   21381M 10.44
> > POOLS:
> > NAMEID USED %USED MAX AVAIL OBJECTS
> > rbd 0 0 086579M   0
> > cephfs_data 1 0 086579M   0
> > cephfs_metadata 2  2488 086579M  21
> > 
> 
> Ok, that's odd. But I think these disks are using BlueStore since that's what 
> Luminous defaults to.
> 
> The partitions seem to be mixed up, so can you check on how you created the 
> OSDs? Was that with ceph-disk? If so, what additional arguments did you use?
> 
> Wido
> 
> > ceph osd df
> > ID WEIGHT  REWEIGHT SIZE   USEAVAIL %USE  VAR  PGS
> >  0 0.00980  1.0 10240M  1070M 9170M 10.45 1.00 173
> >  2 0.00980  1.0 10240M  1069M 9170M 10.45 1.00 150
> >  4 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 148
> >  6 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 167
> >  8 0.00980  1.0 10240M  1069M 9171M 10.44 1.00 166
> > 10 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 171
> > 12 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 160
> > 14 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 179
> > 16 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 182
> > 18 0.00980  1.0 10240M  1069M 9170M 10.44 1.00 168
> >  1 0.00980  1.0 10240M  1069M 9170M 10.45 1.00 167
> >  3 0.00980  1.0 10240M  1069M 9170M 10.45 1.00 156
> >  5 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 152
> >  7 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 158
> >  9 0.00980  1.0 10240M  1069M 9170M 10.44 1.00 174
> > 11 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 153
> > 13 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 179
> > 15 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 186
> > 17 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 185
> > 19 0.00980  1.0 10240M  1067M 9172M 10.43 1.00 154
> >   TOTAL   200G 21381M  179G 10.44
> > MIN/MAX VAR: 1.00/1.00  STDDEV: 0.00
> > 
> > 
> > -Gencer.
> > 
> > -Original Message-
> > From: Wido den Hollander [mailto:w...@42on.com]
> > Sent: Monday, July 17, 2017 4:57 PM
> > To: ceph-users@lists.ceph.com; gen...@gencgiyen.com
> > Subject: Re: [ceph-users] Ceph (Luminous) shows total_space wrong
> > 
> > 
> > > On 17 July 2017 at 15:49, gen...@gencgiyen.com wrote:
> > > 
> > > 
> > > Hi,
> > > 
> > >  
> > > 
> > > I successfully managed to work with ceph jewel. Want to try luminous.
> > > 
> > >  
> > > 
> > > I also set experimental bluestore while creating osds. Problem is, I 
> > > have 20x3TB hdd in two nodes and i would expect 55TB usable (as on
> > > jewel) on luminous but i see 200GB. Ceph thinks I have only 200GB 
> > > space available in total. I see all osds are up and in.
> > > 
> > >  
> > > 
> > > 20 osd up; 20 osd in. 0 down.
> > > 
> > >  
> > > 
> > > Ceph -s shows HEALTH_OK. I have only

Re: [ceph-users] Ceph (Luminous) shows total_space wrong

2017-07-17 Thread gencer
Also, one more thing: if I want to use BlueStore, how do I let it know that I
have more space? Do I need to specify a size at any point?
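
(A sketch of how one could check what the OSDs were actually built with; it is
assumed here that 'ceph osd metadata' reports the objectstore backend and that a
BlueStore OSD keeps its block file or device link under the OSD data directory:)

$ ceph osd metadata 0 | grep -E 'osd_objectstore|bluestore'
$ ls -lh /var/lib/ceph/osd/ceph-0/block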

-Gencer.

-Original Message-
From: gen...@gencgiyen.com [mailto:gen...@gencgiyen.com] 
Sent: Monday, July 17, 2017 6:04 PM
To: 'Wido den Hollander' ; 'ceph-users@lists.ceph.com' 

Subject: RE: [ceph-users] Ceph (Luminous) shows total_space wrong

I used this methods:

$ ceph-deploy osd prepare sr-09-01-18:/dev/sdb1 sr-10-01-18:/dev/sdb1  (one 
from 09th server one from 10th server..)

and then;

$ ceph-deploy osd activate sr-09-01-18:/dev/sdb1 sr-10-01-18:/dev/sdb1 ...

This is my second creation for ceph cluster. At first I used bluestore. This 
time i did not use bluestore (also removed from conf file). Still seen as 200GB.

How can I make sure BlueStore is disabled (even if i not put any command).

-Gencer.

-Original Message-
From: Wido den Hollander [mailto:w...@42on.com] 
Sent: Monday, July 17, 2017 5:57 PM
To: ceph-users@lists.ceph.com; gen...@gencgiyen.com
Subject: RE: [ceph-users] Ceph (Luminous) shows total_space wrong


> On 17 July 2017 at 16:41, gen...@gencgiyen.com wrote:
> 
> 
> Hi Wido,
> 
> Each disk is 3TB SATA (2.8TB seen) but what I got is this:
> 
> First let me gave you df -h:
> 
> /dev/sdb1   2.8T  754M  2.8T   1% /var/lib/ceph/osd/ceph-0
> /dev/sdc1   2.8T  753M  2.8T   1% /var/lib/ceph/osd/ceph-2
> /dev/sdd1   2.8T  752M  2.8T   1% /var/lib/ceph/osd/ceph-4
> /dev/sde1   2.8T  752M  2.8T   1% /var/lib/ceph/osd/ceph-6
> /dev/sdf1   2.8T  753M  2.8T   1% /var/lib/ceph/osd/ceph-8
> /dev/sdg1   2.8T  752M  2.8T   1% /var/lib/ceph/osd/ceph-10
> /dev/sdh1   2.8T  751M  2.8T   1% /var/lib/ceph/osd/ceph-12
> /dev/sdi1   2.8T  751M  2.8T   1% /var/lib/ceph/osd/ceph-14
> /dev/sdj1   2.8T  751M  2.8T   1% /var/lib/ceph/osd/ceph-16
> /dev/sdk1   2.8T  751M  2.8T   1% /var/lib/ceph/osd/ceph-18
> 
> 
> Then here is my results from ceph df commands:
> 
> ceph df
> 
> GLOBAL:
> SIZE AVAIL RAW USED %RAW USED
> 200G  179G   21381M 10.44
> POOLS:
> NAMEID USED %USED MAX AVAIL OBJECTS
> rbd 0 0 086579M   0
> cephfs_data 1 0 086579M   0
> cephfs_metadata 2  2488 086579M  21
> 

Ok, that's odd. But I think these disks are using BlueStore since that's what 
Luminous defaults to.

The partitions seem to be mixed up, so can you check on how you created the 
OSDs? Was that with ceph-disk? If so, what additional arguments did you use?

Wido

> ceph osd df
> ID WEIGHT  REWEIGHT SIZE   USEAVAIL %USE  VAR  PGS
>  0 0.00980  1.0 10240M  1070M 9170M 10.45 1.00 173
>  2 0.00980  1.0 10240M  1069M 9170M 10.45 1.00 150
>  4 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 148
>  6 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 167
>  8 0.00980  1.0 10240M  1069M 9171M 10.44 1.00 166
> 10 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 171
> 12 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 160
> 14 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 179
> 16 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 182
> 18 0.00980  1.0 10240M  1069M 9170M 10.44 1.00 168
>  1 0.00980  1.0 10240M  1069M 9170M 10.45 1.00 167
>  3 0.00980  1.0 10240M  1069M 9170M 10.45 1.00 156
>  5 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 152
>  7 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 158
>  9 0.00980  1.0 10240M  1069M 9170M 10.44 1.00 174
> 11 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 153
> 13 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 179
> 15 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 186
> 17 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 185
> 19 0.00980  1.0 10240M  1067M 9172M 10.43 1.00 154
>   TOTAL   200G 21381M  179G 10.44
> MIN/MAX VAR: 1.00/1.00  STDDEV: 0.00
> 
> 
> -Gencer.
> 
> -Original Message-
> From: Wido den Hollander [mailto:w...@42on.com]
> Sent: Monday, July 17, 2017 4:57 PM
> To: ceph-users@lists.ceph.com; gen...@gencgiyen.com
> Subject: Re: [ceph-users] Ceph (Luminous) shows total_space wrong
> 
> 
> > On 17 July 2017 at 15:49, gen...@gencgiyen.com wrote:
> > 
> > 
> > Hi,
> > 
> >  
> > 
> > I successfully managed to work with ceph jewel. Want to try luminous.
> > 
> >  
> > 
> > I also set experimental bluestore while creating osds. Problem is, I 
> > have 20x3TB hdd in two nodes and i would expect 55TB usable (as on
> > jewel) on luminous but i see 200GB. Ceph thinks I have only 200GB 
> > space available in total. I see all osds are up and in.
> > 
> >  
> > 
> > 20 osd up; 20 osd in. 0 down.
> > 
> >  
> > 
> > Ceph -s shows HEALTH_OK. I have only one monitor and one mds. 
> > (1/1/1) and it is up:active.
> > 
> >  
> > 
> > ceph osd tree gave me all OSDs in nodes are up and results are 
> >

Re: [ceph-users] Ceph (Luminous) shows total_space wrong

2017-07-17 Thread gencer
I used these methods:

$ ceph-deploy osd prepare sr-09-01-18:/dev/sdb1 sr-10-01-18:/dev/sdb1  (one 
from 09th server one from 10th server..)

and then;

$ ceph-deploy osd activate sr-09-01-18:/dev/sdb1 sr-10-01-18:/dev/sdb1 ...

This is my second creation of the Ceph cluster. At first I used BlueStore. This
time I did not use BlueStore (I also removed it from the conf file). It is still seen as 200GB.

How can I make sure BlueStore is disabled (even though I did not pass any option for it)?

-Gencer.

-Original Message-
From: Wido den Hollander [mailto:w...@42on.com] 
Sent: Monday, July 17, 2017 5:57 PM
To: ceph-users@lists.ceph.com; gen...@gencgiyen.com
Subject: RE: [ceph-users] Ceph (Luminous) shows total_space wrong


> On 17 July 2017 at 16:41, gen...@gencgiyen.com wrote:
> 
> 
> Hi Wido,
> 
> Each disk is 3TB SATA (2.8TB seen) but what I got is this:
> 
> First let me gave you df -h:
> 
> /dev/sdb1   2.8T  754M  2.8T   1% /var/lib/ceph/osd/ceph-0
> /dev/sdc1   2.8T  753M  2.8T   1% /var/lib/ceph/osd/ceph-2
> /dev/sdd1   2.8T  752M  2.8T   1% /var/lib/ceph/osd/ceph-4
> /dev/sde1   2.8T  752M  2.8T   1% /var/lib/ceph/osd/ceph-6
> /dev/sdf1   2.8T  753M  2.8T   1% /var/lib/ceph/osd/ceph-8
> /dev/sdg1   2.8T  752M  2.8T   1% /var/lib/ceph/osd/ceph-10
> /dev/sdh1   2.8T  751M  2.8T   1% /var/lib/ceph/osd/ceph-12
> /dev/sdi1   2.8T  751M  2.8T   1% /var/lib/ceph/osd/ceph-14
> /dev/sdj1   2.8T  751M  2.8T   1% /var/lib/ceph/osd/ceph-16
> /dev/sdk1   2.8T  751M  2.8T   1% /var/lib/ceph/osd/ceph-18
> 
> 
> Then here is my results from ceph df commands:
> 
> ceph df
> 
> GLOBAL:
> SIZE AVAIL RAW USED %RAW USED
> 200G  179G   21381M 10.44
> POOLS:
> NAMEID USED %USED MAX AVAIL OBJECTS
> rbd 0 0 086579M   0
> cephfs_data 1 0 086579M   0
> cephfs_metadata 2  2488 086579M  21
> 

Ok, that's odd. But I think these disks are using BlueStore since that's what 
Luminous defaults to.

The partitions seem to be mixed up, so can you check on how you created the 
OSDs? Was that with ceph-disk? If so, what additional arguments did you use?

Wido

> ceph osd df
> ID WEIGHT  REWEIGHT SIZE   USEAVAIL %USE  VAR  PGS
>  0 0.00980  1.0 10240M  1070M 9170M 10.45 1.00 173
>  2 0.00980  1.0 10240M  1069M 9170M 10.45 1.00 150
>  4 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 148
>  6 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 167
>  8 0.00980  1.0 10240M  1069M 9171M 10.44 1.00 166
> 10 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 171
> 12 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 160
> 14 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 179
> 16 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 182
> 18 0.00980  1.0 10240M  1069M 9170M 10.44 1.00 168
>  1 0.00980  1.0 10240M  1069M 9170M 10.45 1.00 167
>  3 0.00980  1.0 10240M  1069M 9170M 10.45 1.00 156
>  5 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 152
>  7 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 158
>  9 0.00980  1.0 10240M  1069M 9170M 10.44 1.00 174
> 11 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 153
> 13 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 179
> 15 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 186
> 17 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 185
> 19 0.00980  1.0 10240M  1067M 9172M 10.43 1.00 154
>   TOTAL   200G 21381M  179G 10.44
> MIN/MAX VAR: 1.00/1.00  STDDEV: 0.00
> 
> 
> -Gencer.
> 
> -Original Message-
> From: Wido den Hollander [mailto:w...@42on.com]
> Sent: Monday, July 17, 2017 4:57 PM
> To: ceph-users@lists.ceph.com; gen...@gencgiyen.com
> Subject: Re: [ceph-users] Ceph (Luminous) shows total_space wrong
> 
> 
> > On 17 July 2017 at 15:49, gen...@gencgiyen.com wrote:
> > 
> > 
> > Hi,
> > 
> >  
> > 
> > I successfully managed to work with ceph jewel. Want to try luminous.
> > 
> >  
> > 
> > I also set experimental bluestore while creating osds. Problem is, I 
> > have 20x3TB hdd in two nodes and i would expect 55TB usable (as on
> > jewel) on luminous but i see 200GB. Ceph thinks I have only 200GB 
> > space available in total. I see all osds are up and in.
> > 
> >  
> > 
> > 20 osd up; 20 osd in. 0 down.
> > 
> >  
> > 
> > Ceph -s shows HEALTH_OK. I have only one monitor and one mds. 
> > (1/1/1) and it is up:active.
> > 
> >  
> > 
> > ceph osd tree gave me all OSDs in nodes are up and results are 
> > 1.... I checked via df -h but all disks ahows 2.7TB. Basically 
> > something is wrong.
> > Same settings and followed schema on jewel is successful except luminous.
> > 
> 
> What do these commands show:
> 
> - ceph df
> - ceph osd df
> 
> Might be that you are looking at the wrong numbers.
> 
> Wido
> 
> >  
> > 
> > What might it be?
> > 
> >  
> > 
> > What do you need to know to sol

Re: [ceph-users] Ceph (Luminous) shows total_space wrong

2017-07-17 Thread Wido den Hollander

> On 17 July 2017 at 16:41, gen...@gencgiyen.com wrote:
> 
> 
> Hi Wido,
> 
> Each disk is 3TB SATA (2.8TB seen) but what I got is this:
> 
> First let me gave you df -h:
> 
> /dev/sdb1   2.8T  754M  2.8T   1% /var/lib/ceph/osd/ceph-0
> /dev/sdc1   2.8T  753M  2.8T   1% /var/lib/ceph/osd/ceph-2
> /dev/sdd1   2.8T  752M  2.8T   1% /var/lib/ceph/osd/ceph-4
> /dev/sde1   2.8T  752M  2.8T   1% /var/lib/ceph/osd/ceph-6
> /dev/sdf1   2.8T  753M  2.8T   1% /var/lib/ceph/osd/ceph-8
> /dev/sdg1   2.8T  752M  2.8T   1% /var/lib/ceph/osd/ceph-10
> /dev/sdh1   2.8T  751M  2.8T   1% /var/lib/ceph/osd/ceph-12
> /dev/sdi1   2.8T  751M  2.8T   1% /var/lib/ceph/osd/ceph-14
> /dev/sdj1   2.8T  751M  2.8T   1% /var/lib/ceph/osd/ceph-16
> /dev/sdk1   2.8T  751M  2.8T   1% /var/lib/ceph/osd/ceph-18
> 
> 
> Then here is my results from ceph df commands:
> 
> ceph df
> 
> GLOBAL:
> SIZE AVAIL RAW USED %RAW USED
> 200G  179G   21381M 10.44
> POOLS:
> NAMEID USED %USED MAX AVAIL OBJECTS
> rbd 0 0 086579M   0
> cephfs_data 1 0 086579M   0
> cephfs_metadata 2  2488 086579M  21
> 

Ok, that's odd. But I think these disks are using BlueStore since that's what 
Luminous defaults to.

The partitions seem to be mixed up, so can you check on how you created the 
OSDs? Was that with ceph-disk? If so, what additional arguments did you use?

Wido

> ceph osd df
> ID WEIGHT  REWEIGHT SIZE   USEAVAIL %USE  VAR  PGS
>  0 0.00980  1.0 10240M  1070M 9170M 10.45 1.00 173
>  2 0.00980  1.0 10240M  1069M 9170M 10.45 1.00 150
>  4 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 148
>  6 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 167
>  8 0.00980  1.0 10240M  1069M 9171M 10.44 1.00 166
> 10 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 171
> 12 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 160
> 14 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 179
> 16 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 182
> 18 0.00980  1.0 10240M  1069M 9170M 10.44 1.00 168
>  1 0.00980  1.0 10240M  1069M 9170M 10.45 1.00 167
>  3 0.00980  1.0 10240M  1069M 9170M 10.45 1.00 156
>  5 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 152
>  7 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 158
>  9 0.00980  1.0 10240M  1069M 9170M 10.44 1.00 174
> 11 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 153
> 13 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 179
> 15 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 186
> 17 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 185
> 19 0.00980  1.0 10240M  1067M 9172M 10.43 1.00 154
>   TOTAL   200G 21381M  179G 10.44
> MIN/MAX VAR: 1.00/1.00  STDDEV: 0.00
> 
> 
> -Gencer.
> 
> -Original Message-
> From: Wido den Hollander [mailto:w...@42on.com] 
> Sent: Monday, July 17, 2017 4:57 PM
> To: ceph-users@lists.ceph.com; gen...@gencgiyen.com
> Subject: Re: [ceph-users] Ceph (Luminous) shows total_space wrong
> 
> 
> > On 17 July 2017 at 15:49, gen...@gencgiyen.com wrote:
> > 
> > 
> > Hi,
> > 
> >  
> > 
> > I successfully managed to work with ceph jewel. Want to try luminous.
> > 
> >  
> > 
> > I also set experimental bluestore while creating osds. Problem is, I 
> > have 20x3TB hdd in two nodes and i would expect 55TB usable (as on 
> > jewel) on luminous but i see 200GB. Ceph thinks I have only 200GB 
> > space available in total. I see all osds are up and in.
> > 
> >  
> > 
> > 20 osd up; 20 osd in. 0 down.
> > 
> >  
> > 
> > Ceph -s shows HEALTH_OK. I have only one monitor and one mds. (1/1/1) 
> > and it is up:active.
> > 
> >  
> > 
> > ceph osd tree gave me all OSDs in nodes are up and results are 
> > 1.... I checked via df -h but all disks ahows 2.7TB. Basically 
> > something is wrong.
> > Same settings and followed schema on jewel is successful except luminous.
> > 
> 
> What do these commands show:
> 
> - ceph df
> - ceph osd df
> 
> Might be that you are looking at the wrong numbers.
> 
> Wido
> 
> >  
> > 
> > What might it be?
> > 
> >  
> > 
> > What do you need to know to solve this problem? Why ceph thinks I have 
> > 200GB space only?
> > 
> >  
> > 
> > Thanks,
> > 
> > Gencer.
> > 
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph (Luminous) shows total_space wrong

2017-07-17 Thread gencer
Hi Wido,

Each disk is 3TB SATA (2.8TB seen) but what I got is this:

First let me give you df -h:

/dev/sdb1   2.8T  754M  2.8T   1% /var/lib/ceph/osd/ceph-0
/dev/sdc1   2.8T  753M  2.8T   1% /var/lib/ceph/osd/ceph-2
/dev/sdd1   2.8T  752M  2.8T   1% /var/lib/ceph/osd/ceph-4
/dev/sde1   2.8T  752M  2.8T   1% /var/lib/ceph/osd/ceph-6
/dev/sdf1   2.8T  753M  2.8T   1% /var/lib/ceph/osd/ceph-8
/dev/sdg1   2.8T  752M  2.8T   1% /var/lib/ceph/osd/ceph-10
/dev/sdh1   2.8T  751M  2.8T   1% /var/lib/ceph/osd/ceph-12
/dev/sdi1   2.8T  751M  2.8T   1% /var/lib/ceph/osd/ceph-14
/dev/sdj1   2.8T  751M  2.8T   1% /var/lib/ceph/osd/ceph-16
/dev/sdk1   2.8T  751M  2.8T   1% /var/lib/ceph/osd/ceph-18


Then here is my results from ceph df commands:

ceph df

GLOBAL:
SIZE AVAIL RAW USED %RAW USED
200G  179G   21381M 10.44
POOLS:
NAMEID USED %USED MAX AVAIL OBJECTS
rbd 0 0 086579M   0
cephfs_data 1 0 086579M   0
cephfs_metadata 2  2488 086579M  21

ceph osd df
ID WEIGHT  REWEIGHT SIZE   USEAVAIL %USE  VAR  PGS
 0 0.00980  1.0 10240M  1070M 9170M 10.45 1.00 173
 2 0.00980  1.0 10240M  1069M 9170M 10.45 1.00 150
 4 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 148
 6 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 167
 8 0.00980  1.0 10240M  1069M 9171M 10.44 1.00 166
10 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 171
12 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 160
14 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 179
16 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 182
18 0.00980  1.0 10240M  1069M 9170M 10.44 1.00 168
 1 0.00980  1.0 10240M  1069M 9170M 10.45 1.00 167
 3 0.00980  1.0 10240M  1069M 9170M 10.45 1.00 156
 5 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 152
 7 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 158
 9 0.00980  1.0 10240M  1069M 9170M 10.44 1.00 174
11 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 153
13 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 179
15 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 186
17 0.00980  1.0 10240M  1068M 9171M 10.44 1.00 185
19 0.00980  1.0 10240M  1067M 9172M 10.43 1.00 154
  TOTAL   200G 21381M  179G 10.44
MIN/MAX VAR: 1.00/1.00  STDDEV: 0.00


-Gencer.

-Original Message-
From: Wido den Hollander [mailto:w...@42on.com] 
Sent: Monday, July 17, 2017 4:57 PM
To: ceph-users@lists.ceph.com; gen...@gencgiyen.com
Subject: Re: [ceph-users] Ceph (Luminous) shows total_space wrong


> On 17 July 2017 at 15:49, gen...@gencgiyen.com wrote:
> 
> 
> Hi,
> 
>  
> 
> I successfully managed to work with ceph jewel. Want to try luminous.
> 
>  
> 
> I also set experimental bluestore while creating osds. Problem is, I 
> have 20x3TB hdd in two nodes and i would expect 55TB usable (as on 
> jewel) on luminous but i see 200GB. Ceph thinks I have only 200GB 
> space available in total. I see all osds are up and in.
> 
>  
> 
> 20 osd up; 20 osd in. 0 down.
> 
>  
> 
> Ceph -s shows HEALTH_OK. I have only one monitor and one mds. (1/1/1) 
> and it is up:active.
> 
>  
> 
> ceph osd tree gave me all OSDs in nodes are up and results are 
> 1.... I checked via df -h but all disks ahows 2.7TB. Basically something 
> is wrong.
> Same settings and followed schema on jewel is successful except luminous.
> 

What do these commands show:

- ceph df
- ceph osd df

Might be that you are looking at the wrong numbers.

Wido

>  
> 
> What might it be?
> 
>  
> 
> What do you need to know to solve this problem? Why ceph thinks I have 
> 200GB space only?
> 
>  
> 
> Thanks,
> 
> Gencer.
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to force "rbd unmap"

2017-07-17 Thread Ilya Dryomov
On Thu, Jul 6, 2017 at 2:43 PM, Ilya Dryomov  wrote:
> On Thu, Jul 6, 2017 at 2:23 PM, Stanislav Kopp  wrote:
>> 2017-07-06 14:16 GMT+02:00 Ilya Dryomov :
>>> On Thu, Jul 6, 2017 at 1:28 PM, Stanislav Kopp  wrote:
 Hi,

 2017-07-05 20:31 GMT+02:00 Ilya Dryomov :
> On Wed, Jul 5, 2017 at 7:55 PM, Stanislav Kopp  wrote:
>> Hello,
>>
>> I have problem that sometimes I can't unmap rbd device, I get "sysfs
>> write failed rbd: unmap failed: (16) Device or resource busy", there
>> is no open files and "holders" directory is empty. I saw on the
>> mailling list that you can "force" unmapping the device, but I cant
>> find how does it work. "man rbd" only mentions "force" as "KERNEL RBD
>> (KRBD) OPTION", but "modinfo rbd" doesn't show this option. Did I miss
>> something?
>
> Forcing unmap on an open device is not a good idea.  I'd suggest
> looking into what's holding the device and fixing that instead.

 We use pacemaker's resource agent for rbd mount/unmount
 (https://github.com/ceph/ceph/blob/master/src/ocf/rbd.in)
 I've reproduced the failure again and now saw in ps output that there
 is still unmout fs process in D state:

 root 29320  0.0  0.0 21980 1272 ?D09:18   0:00
 umount /export/rbd1

 this explains rbd unmap problem, but strange enough I don't see this
 mount in /proc/mounts, so it looks like it was successfully unmounted,
 if I try to strace the "umount" procces it hung (the strace, with no
 output), looks like kernel problem? Do you have some tips for further
 debugging?
>>>
>>> Check /sys/kernel/debug/ceph/<cluster-fsid.client-id>/osdc.  It lists
>>> in-flight requests, that's what umount is blocked on.
>>
>> I see this in my output, but don't know what does it means honestly:
>>
>> root@nfs-test01:~# cat
>> /sys/kernel/debug/ceph/4f23f683-21e6-49f3-ae2c-c95b150b9dc6.client138566/osdc
>> REQUESTS 2 homeless 0
>> 658 osd9 0.75514984 [9,1,6]/9 [9,1,6]/9
>> rbd_data.6e28c6b8b4567. 0x400024 10'0
>> set-alloc-hint,write
>> 659 osd15 0.40f1ea02 [15,7,9]/15 [15,7,9]/15
>> rbd_data.6e28c6b8b4567.0001 0x400024 10'0
>> set-alloc-hint,write
>
> It means you have two pending writes (OSD requests), to osd9 and osd15.
> What is the output of
>
> $ ceph -s
> $ ceph pg dump pgs_brief

Stanislav and I tracked this down to a pacemaker misconfiguration:
"... the problem was wrong netmask, we use /22 for this network, but
primary interface of machine was configured with /23 and the VIP even
with /24, because of that, the VIP was often the first interface
which caused unmount problem, because it was stopped as first resource
in fail-over situation. After fixing netmask for both VIP and
machine's IP it's working without issues (at least it never fails
since)."

Thanks,

Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph mount rbd

2017-07-17 Thread lista

Dear,
 
From your last message I understood that exclusive-lock works in kernel 4.9
or higher and could help me prevent both machines from writing at once,
but that this feature is only available in kernel 4.12. Is that right?
 
I will read more about Pacemaker. In my test environment I would have used
Heartbeat, but Pacemaker seems to be the better alternative.

 
 
Thanks a Lot
Marcelo 

>>>By default, since 4.9, 
> > >> > > when the exclusive-lock feature is enabled, only a single client 
> > >> > > can write to 
> > >> > the 
> > >> > > block device at a time -
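
(Two illustrative approaches from this thread, assuming kernel 4.12+ for the
first; the pool/image name is made up and the exact option spelling may differ
between releases:)

$ rbd map -o exclusive rbd/veeamrepo     # non-cooperative exclusive lock: other hosts cannot map and write while it is held
$ rbd lock add rbd/veeamrepo node-a      # or: advisory lock managed by hand from the rbd CLI
$ rbd lock list rbd/veeamrepo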

On 14/07/2017, Nick Fisk  wrote:
> 
> 
> > -Original Message- 
> > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Jason Dillaman 
> > Sent: 14 July 2017 16:40 
> > To: li...@marcelofrota.info 
> > Cc: ceph-users  
> > Subject: Re: [ceph-users] Ceph mount rbd 
> > 
> > On Fri, Jul 14, 2017 at 9:44 AM,   wrote: 
> > > Gonzalo, 
> > > 
> > > 
> > > 
> > > You are right, i told so much about my enviroment actual and maybe i 
> > > didn't know explain my problem the better form, with ceph in the 
> > > moment, mutiple hosts clients can mount and write datas in my system 
> > > and this is one problem, because i could have filesystem corruption. 
> > > 
> > > 
> > > 
> > > Example, today, if runing the comand in two machines in the same time, 
> > > it will work. 
> > > 
> > > 
> > > 
> > > mount /dev/rbd0 /mnt/veeamrepo 
> > > 
> > > cd /mnt/veeamrepo ; touch testfile.txt 
> > > 
> > > 
> > > 
> > > I need ensure, only one machine will can execute this. 
> > > 
> > 
> > A user could do the same thing with any number of remote block devices (i.e. I could map an iSCSI target multiple times). As I said 
> > before, you can use the "exclusive" option available since kernel 4.12, roll your own solution using the advisory locks available from 
> > the rbd CLI, or just use CephFS if you want to be able to access a file system on multiple hosts. 
> 
> Pacemaker, will also prevent a RBD to be mounted multiple times, if you want to manage the fencing outside of Ceph. 
> 
> > 
> > > 
> > > Thanks a lot, 
> > > 
> > > Marcelo 
> > > 
> > > 
> > > On 14/07/2017, Gonzalo Aguilar Delgado  
> > > wrote: 
> > > 
> > > 
> > >> Hi, 
> > >> 
> > >> Why you would like to maintain copies by yourself. You replicate on 
> > >> ceph and then on different files inside ceph? Let ceph take care of counting. 
> > >> Create a pool with 3 or more copies and let ceph take care of what's 
> > >> stored and where. 
> > >> 
> > >> Best regards, 
> > >> 
> > >> 
> > >> On 13/07/17 at 17:06, li...@marcelofrota.info wrote: 
> > >> > 
> > >> > I will explain More about my system actual, in the moment i have 2 
> > >> > machines using drbd in mode master/slave and i running the 
> > >> > aplication in machine master, but existing 2 questions importants 
> > >> > in my enviroment with drbd actualy : 
> > >> > 
> > >> > 1 - If machine one is master and mounting partitions, the slave 
> > >> > don't can mount the system, Unless it happens one problem in 
> > >> > machine master, this is one mode, to prevent write in filesystem 
> > >> > incorrect 
> > >> > 
> > >> > 2 - When i write data in machine master in drbd, the drbd write 
> > >> > datas in slave machine Automatically, with this, if one problem 
> > >> > happens in node master, the machine slave have coppy the data. 
> > >> > 
> > >> > In the moment, in my enviroment testing with ceph, using the 
> > >> > version 
> > >> > 4.10 of kernel and i mount the system in two machines in the same 
> > >> > time, in production enviroment, i could serious problem with this 
> > >> > comportament. 
> > >> > 
> > >> > How can i use the ceph and Ensure that I could get these 2 
> > >> > behaviors kept in a new environment with Ceph? 
> > >> > 
> > >> > Thanks a lot, 
> > >> > 
> > >> > Marcelo 
> > >> > 
> > >> > 
> > >> > On 28/06/2017, Jason Dillaman  wrote: 
> > >> > > ... additionally, the forthcoming 4.12 kernel release will 
> > >> > > support non-cooperative exclusive locking. By default, since 4.9, 
> > >> > > when the exclusive-lock feature is enabled, only a single client 
> > >> > > can write to 
> > >> > the 
> > >> > > block device at a time -- but they will cooperatively pass the 
> > >> > > lock 
> > >> > back 
> > >> > > and forth upon write request. With the new "rbd map" option, you 
> > >> > > can 
> > >> > map a 
> > >> > > image on exactly one host and prevent other hosts from mapping 
> > >> > > the 
> > >> > image. 
> > >> > > If that host should die, the exclusive-lock will automatically 
> > >> > > become available to other hosts for mapping. 
> > >> > > 
> > >> > > Of course, I always have to ask the use-case behind mapping the 
> > >> > > same 
> > >> > image 
> > >> > > on multiple hosts. Perhaps CephFS would be a better fit if you 
> > >>

Re: [ceph-users] Ceph (Luminous) shows total_space wrong

2017-07-17 Thread Wido den Hollander

> On 17 July 2017 at 15:49, gen...@gencgiyen.com wrote:
> 
> 
> Hi,
> 
>  
> 
> I successfully managed to work with ceph jewel. Want to try luminous.
> 
>  
> 
> I also set experimental bluestore while creating osds. Problem is, I have
> 20x3TB hdd in two nodes and i would expect 55TB usable (as on jewel) on
> luminous but i see 200GB. Ceph thinks I have only 200GB space available in
> total. I see all osds are up and in.
> 
>  
> 
> 20 osd up; 20 osd in. 0 down.
> 
>  
> 
> Ceph -s shows HEALTH_OK. I have only one monitor and one mds. (1/1/1) and it
> is up:active.
> 
>  
> 
> ceph osd tree gave me all OSDs in nodes are up and results are 1.... I
> checked via df -h but all disks ahows 2.7TB. Basically something is wrong.
> Same settings and followed schema on jewel is successful except luminous.
> 

What do these commands show:

- ceph df
- ceph osd df

Might be that you are looking at the wrong numbers.

Wido

>  
> 
> What might it be?
> 
>  
> 
> What do you need to know to solve this problem? Why ceph thinks I have 200GB
> space only?
> 
>  
> 
> Thanks,
> 
> Gencer.
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph (Luminous) shows total_space wrong

2017-07-17 Thread gencer
Hi,

 

I successfully managed to work with ceph jewel. Want to try luminous.

 

I also set experimental bluestore while creating the OSDs. The problem is, I have
20x3TB HDDs in two nodes and I would expect 55TB usable (as on Jewel) on
Luminous, but I see 200GB. Ceph thinks I have only 200GB of space available in
total. I see all OSDs are up and in.

 

20 osd up; 20 osd in. 0 down.

 

Ceph -s shows HEALTH_OK. I have only one monitor and one mds. (1/1/1) and it
is up:active.

 

ceph osd tree shows all OSDs in the nodes are up and the weights are 1.... I
checked via df -h and all disks show 2.7TB. Basically something is wrong.
The same settings and schema I followed were successful on Jewel, but not on Luminous.

 

What might it be?

 

What do you need to know to solve this problem? Why does Ceph think I have only
200GB of space?

 

Thanks,

Gencer.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Problems getting nfs-ganesha with cephfs backend to work.

2017-07-17 Thread Micha Krause

Hi,


> Change Pseudo to something like /mypseudofolder

I tried this, without success, but I managed to get something working with 
version 2.5.
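
(For context, the Pseudo path lives in the ganesha EXPORT block; a minimal sketch
with the CephFS FSAL, with made-up export id and paths:)

EXPORT {
    Export_Id = 1;
    Path = /;                  # CephFS path being exported
    Pseudo = /mypseudofolder;  # where it appears in the NFSv4 pseudo filesystem
    Access_Type = RW;
    Squash = No_Root_Squash;
    FSAL {
        Name = CEPH;
    }
}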

I can mount the NFS export now, however 2 problems remain:

1. The root directory of the mount-point looks empty (ls shows no files), 
however directories
   and files can be accessed, and ls works in subdirectories.

2. I can't create devices in the nfs mount, not sure if ganesha supports this 
with other backends.


Micha Krause
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Long OSD restart after upgrade to 10.2.9

2017-07-17 Thread Lincoln Bryant

Hi Anton,

We observe something similar on our OSDs going from 10.2.7 to 10.2.9 
(see thread "some OSDs stuck down after 10.2.7 -> 10.2.9 update"). Some 
of our OSDs are not working at all on 10.2.9 or die with suicide 
timeouts. Those that come up/in take a very long time to boot up. Seems 
to not affect every OSD in our case though.


--Lincoln

On 7/17/2017 1:29 AM, Anton Dmitriev wrote:
During start it consumes ~90% CPU, strace shows, that OSD process 
doing something with LevelDB.

Compact is disabled:
r...@storage07.main01.ceph.apps.prod.int.grcc:~$ cat 
/etc/ceph/ceph.conf | grep compact

#leveldb_compact_on_mount = true

But with debug_leveldb=20 I see, that compaction is running, but why?

2017-07-17 09:27:37.394008 7f4ed2293700  1 leveldb: Compacting 1@1 + 
12@2 files
2017-07-17 09:27:37.593890 7f4ed2293700  1 leveldb: Generated table 
#76778: 277817 keys, 2125970 bytes
2017-07-17 09:27:37.718954 7f4ed2293700  1 leveldb: Generated table 
#76779: 221451 keys, 2124338 bytes
2017-07-17 09:27:37.777362 7f4ed2293700  1 leveldb: Generated table 
#76780: 63755 keys, 809913 bytes
2017-07-17 09:27:37.919094 7f4ed2293700  1 leveldb: Generated table 
#76781: 231475 keys, 2026376 bytes
2017-07-17 09:27:38.035906 7f4ed2293700  1 leveldb: Generated table 
#76782: 190956 keys, 1573332 bytes
2017-07-17 09:27:38.127597 7f4ed2293700  1 leveldb: Generated table 
#76783: 148675 keys, 1260956 bytes
2017-07-17 09:27:38.286183 7f4ed2293700  1 leveldb: Generated table 
#76784: 294105 keys, 2123438 bytes
2017-07-17 09:27:38.469562 7f4ed2293700  1 leveldb: Generated table 
#76785: 299617 keys, 2124267 bytes
2017-07-17 09:27:38.619666 7f4ed2293700  1 leveldb: Generated table 
#76786: 277305 keys, 2124936 bytes
2017-07-17 09:27:38.711423 7f4ed2293700  1 leveldb: Generated table 
#76787: 110536 keys, 951545 bytes
2017-07-17 09:27:38.869917 7f4ed2293700  1 leveldb: Generated table 
#76788: 296199 keys, 2123506 bytes
2017-07-17 09:27:39.028395 7f4ed2293700  1 leveldb: Generated table 
#76789: 248634 keys, 2096715 bytes
2017-07-17 09:27:39.028414 7f4ed2293700  1 leveldb: Compacted 1@1 + 
12@2 files => 21465292 bytes
2017-07-17 09:27:39.053288 7f4ed2293700  1 leveldb: compacted to: 
files[ 0 0 48 549 948 0 0 ]

2017-07-17 09:27:39.054014 7f4ed2293700  1 leveldb: Delete type=2 #76741

Strace:

open("/var/lib/ceph/osd/ceph-195/current/omap/043788.ldb", O_RDONLY) = 18
stat("/var/lib/ceph/osd/ceph-195/current/omap/043788.ldb", 
{st_mode=S_IFREG|0644, st_size=2154394, ...}) = 0

mmap(NULL, 2154394, PROT_READ, MAP_SHARED, 18, 0) = 0x7f96a67a
close(18)   = 0
brk(0x55d15664) = 0x55d15664
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [

Re: [ceph-users] upgrade procedure to Luminous

2017-07-17 Thread Lars Marowsky-Bree
On 2017-07-14T15:18:54, Sage Weil  wrote:

> Yes, but how many of those clusters can only upgrade by updating the 
> packages and rebooting?  Our documented procedures have always recommended 
> upgrading the packages, then restarting either mons or osds first and to 
> my recollection nobody has complained.  TBH my first encounter with the 
> "reboot on upgrade" procedure in the Linux world was with Fedora (which I 
> just recently switched to for my desktop)--and FWIW it felt very 
> anachronistic.

Admittedly, it is. This is my main reason for hoping for containers.

My main issue is not that they must be rebooted. In most cases, ceph-mon
can be restarted. My fear is that they *might* be rebooted by a failure
during that time, and it'd have been my expectation that normal
operation does not expose Ceph to such degraded scenarios. Ceph is,
after all, supposedly at least tolerant of one fault at a time.

And I'd obviously have considered upgrades a normal operation, not a
critical phase.

If one considers upgrades an operation that degrades redundancy, sure,
the current behaviour is in line.

> won't see something we haven't.  It also means, in this case, that we can 
> rip out out a ton of legacy code in luminous without having to keep 
> compatibility workarounds in place for another whole LTS cycle (a year!).  

Seriously, welcome to the world of enterprise software and customer
expectations ;-) 1 year! I wish! ;-)

> True, but this is rare, and even so the worst that can happen in this 
> case is the OSDs don't come up until the other mons are upgrade.  If the 
> admin plans to upgrade the mons in succession without lingering with 
> mixed-versions mon the worst-case downtime window is very small--and only 
> kicks in if *more than one* of the mon nodes fails (taking out OSDs in 
> more than one failure domain).

This is an interesting design philosophy in a fault tolerant distributed
system.

> > And customers don't always upgrade all nodes at once in a short period
> > (the benefit of a supposed rolling upgrade cycle), increasing the risk.
> I think they should plan to do this for the mons.  We can make a note 
> stating as much in the upgrade procedure docs?

Yes, we'll have to orchestrate this accordingly.

Upgrade all MONs; restart all MONs (while warning users that this is a
critical time period); start rebooting for the kernel/glibc updates.
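
(As a sketch of that ordering, per mon host, assuming systemd-managed daemons;
the package upgrade step obviously depends on the distro:)

# upgrade the ceph packages on this host first (apt/yum/zypper)
systemctl restart ceph-mon.target    # restart the mon, one host at a time
ceph -s                              # confirm quorum is back before the next mon host
# only once every mon runs the new version:
systemctl restart ceph-osd.target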

> Anyway, does that make sense?  Yes, it means that you can't just reboot in 
> succession if your mons are mixed with OSDs.  But this time adding that 
> restriction let us do the SnapSet and snapdir conversion in a single 
> release, which is a *huge* win and will let us rip out a bunch of ugly OSD 
> code.  We might not have a need for it next time around (and can try to 
> avoid it), but I'm guessing something will come up and it will again be a 
> hard call to make balancing between sloppy/easy upgrades vs simpler 
> code...

The next major transition probably will be from non-containerized L to
fully-containerized N(autilus?). That'll be a fascinating can of worms
anyway. But that would *really* benefit if nodes could be more easily
redeployed and not just restarting daemon processes.

Thanks, at least now we know this is intentional. That was helpful, at
least!


-- 
Architect SDS
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RBD cache being filled up in small increases instead of 4MB

2017-07-17 Thread Jason Dillaman
On Sat, Jul 15, 2017 at 8:00 PM, Ruben Rodriguez  wrote:
>
>
> On 14/07/17 18:43, Ruben Rodriguez wrote:
>> How to reproduce...
>
> I'll provide more concise details on how to test this behavior:
>
> Ceph config:
>
> [client]
> rbd readahead max bytes = 0 # we don't want forced readahead to fool us
> rbd cache = true
>
> Start a qemu vm, with a rbd image attached with virtio-scsi:
>
>   
>   
> 
>   
>   
> 
> 
> 
>   
>   
>   
>   
>   
> 
>
> Block device parameters, inside the vm:
> NAME ALIGN  MIN-IO  OPT-IO PHY-SEC LOG-SEC ROTA SCHED RQ-SIZE   RA WSAME
> sdb  0 4194304 4194304 512 5121 noop  128 40962G
>
> Collect performance statistics from librbd, using command:
>
> $ ceph --admin-daemon /var/run/ceph/ceph-client.[...].asok perf dump
>
> Note the values for:
> - rd: number of read operations done by qemu
> - rd_bytes: length of read requests done by qemu
> - cache_ops_hit: read operations hitting the cache
> - cache_ops_miss: read ops missing the cache
> - data_read: data read from the cache
> - op_r: number of objects sent by the OSD
>
> Perform one small read, not at the beginning of the image (because udev
> may have read it already), at a 4MB boundary line:
>
> dd if=/dev/sda ibs=512 count=1 skip=41943040 iflag=skip_bytes
>
> Do it again advancing 5000 bytes (to not overlap with the previous read)
> Run the perf dump command again
>
> dd if=/dev/sda ibs=512 count=1 skip=41948040 iflag=skip_bytes
> Run the perf dump command again
>
> If you compare the op_r values at each step, you should see a cache miss
> each time, and an object read each time. Same object fetched twice.
>
> IMPACT:
>
> Let's take a look at how the op_r value increases by doing some common
> operations:
>
> - Booting a vm: This operation needs (in my case) ~70MB to be read,
> which include the kernel, initrd and all files read by systemd and
> daemons, until a command prompt appears. Values read
> "rd": 2524,
> "rd_bytes": 69685248,
> "cache_ops_hit": 228,
> "cache_ops_miss": 2268,
> "cache_bytes_hit": 90353664,
> "cache_bytes_miss": 63902720,
> "data_read": 69186560,
> "op": 2295,
> "op_r": 2279,
> That is 2299 objects being fetched from the OSD to read 69MB.
>
> - Grepping inside the Linux source code (833MB) takes almost 3 minutes.
>   Values get increased to:
> "rd": 65127,
> "rd_bytes": 1081487360,
> "cache_ops_hit": 228,
> "cache_ops_miss": 64885,
> "cache_bytes_hit": 90353664,
> "cache_bytes_miss": 1075672064,
> "data_read": 1080988672,
> "op_r": 64896,
> That is over 60.000 objects fetched to read <1GB, and *0* cache hits.
> Optimized, this should take 10 seconds, and fetch ~700 objects.
>
> Is my Qemu implementation completely broken? Or is this expected? Please
> help!

I recommend watching the IO patterns via blktrace. The "60.000 objects
fetched" is a semi-misnomer -- it's saying that ~65,000 individual IO
operations were sent to the OSD. This doesn't imply that the
operations are against unique objects (i.e. there might be a lot of
ops hitting the same object). Your average IO size is at least 16K so
there must be some level of OS request merging / readahead, since
otherwise I would expect ~2 million 512-byte IO requests.
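
If it helps, a rough sketch of capturing that trace inside the guest
(assuming blktrace/blkparse are installed and /dev/sdb is the RBD-backed
disk):

  # capture and decode the trace while re-running the grep workload
  # in another shell
  blktrace -d /dev/sdb -o - | blkparse -i - > /tmp/sdb.blkparse
  # the read (R) events show the offsets and request sizes the guest
  # kernel actually issues, i.e. whether merging/readahead is happening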

>
> --
> Ruben Rodriguez | Senior Systems Administrator, Free Software Foundation
> GPG Key: 05EF 1D2F FE61 747D 1FC8 27C3 7FAC 7D26 472F 4409
> https://fsf.org | https://gnu.org
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RBD cache being filled up in small increases instead of 4MB

2017-07-17 Thread Jason Dillaman
Are you 100% positive that your files are actually stored sequentially
on the block device? I would recommend running blktrace to verify the
IO pattern from your use-case.

On Sat, Jul 15, 2017 at 5:42 PM, Ruben Rodriguez  wrote:
>
>
> On 15/07/17 09:43, Nick Fisk wrote:
>>> -Original Message-
>>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
>>> Gregory Farnum
>>> Sent: 15 July 2017 00:09
>>> To: Ruben Rodriguez 
>>> Cc: ceph-users 
>>> Subject: Re: [ceph-users] RBD cache being filled up in small increases 
>>> instead
>>> of 4MB
>>>
>>> On Fri, Jul 14, 2017 at 3:43 PM, Ruben Rodriguez  wrote:

 I'm having an issue with small sequential reads (such as searching
 through source code files, etc), and I found that multiple small reads
 within a 4MB boundary would fetch the same object from the OSD
 multiple times, as it gets inserted into the RBD cache partially.

 How to reproduce: rbd image accessed from a Qemu vm using virtio-scsi,
 writethrough cache on. Monitor with perf dump on the rbd client. The
 image is filled up with zeroes in advance. Rbd readahead is off.

 1 - Small read from a previously unread section of the disk:
 dd if=/dev/sdb ibs=512 count=1 skip=41943040 iflag=skip_bytes
 Notes: dd cannot read less than 512 bytes. The skip is arbitrary to
 avoid the beginning of the disk, which would have been read at boot.

 Expected outcomes: perf dump should show a +1 increase on values rd,
 cache_ops_miss and op_r. This happens correctly.
 It should show a 4194304 increase in data_read as a whole object is
 put into the cache. Instead it increases by 4096. (not sure why 4096, btw).

 2 - Small read from less than 4MB distance (in the example, +5000b).
 dd if=/dev/sdb ibs=512 count=1 skip=41948040 iflag=skip_bytes Expected
 outcomes: perf dump should show a +1 increase on cache_ops_hit.
 Instead cache_ops_miss increases.
 It should show a 4194304 increase in data_read as a whole object is
 put into the cache. Instead it increases by 4096.
 op_r should not increase. Instead it increases by one, indicating that
 the object was fetched again.

 My tests show that this could be causing a 6 to 20-fold performance
 loss in small sequential reads.

 Is it by design that the RBD cache only inserts the portion requested
 by the client instead of the whole last object fetched? Could it be a
 tunable in any of my layers (fs, block device, qemu, rbd...) that is
 preventing this?
>>>
>>> I don't know the exact readahead default values in that stack, but there's 
>>> no
>>> general reason to think RBD (or any Ceph component) will read a whole
>>> object at a time. In this case, you're asking for 512 bytes and it appears 
>>> to
>>> have turned that into a 4KB read (probably the virtual block size in use?),
>>> which seems pretty reasonable — if you were asking for 512 bytes out of
>>> every 4MB and it was reading 4MB each time, you'd probably be wondering
>>> why you were only getting 1/8192 the expected bandwidth. ;) -Greg
>>
>> I think the general readahead logic might be a bit more advanced in the 
>> Linux kernel vs using readahead from the librbd client.
>
> Yes, the problems I'm having should be corrected by the vm kernel
> issuing larger reads, but I'm failing to get that to happen.
>
>> The kernel will watch how successful each readahead is and scale as
>> necessary. You might want to try upping the read_ahead_kb for the block
>> device in the VM. Something between 4MB and 32MB works well for RBDs, but
>> make sure you have a 4.x kernel, as some fixes to the readahead max size
>> were introduced there and I'm not sure if they ever got backported.
>
> I'm using kernel 4.4 and 4.8. I have readahead, min_io_size,
> optimum_io_size and max_sectors_kb set to 4MB. It helps in some use
> cases, like fio or dd tests, but not with real world tests like cp,
> grep, tar on a large pool of small files.
>
> From all I can tell, optimal read performance would happen when the vm
> kernel reads in 4MB increases _every_ _time_. I can force that with an
> ugly hack (putting the files inside a formatted big file, mounted as
> loop) and gives a 20-fold performance gain. But that is just silly...
>
> I documented that experiment on this thread:
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-June/018924.html
>
>> Unless you tell the rbd client to not disable readahead after reading the 
>> 1st x number of bytes (rbd readahead disable after bytes=0), it will stop 
>> reading ahead and will only cache exactly what is requested by the client.
>
> I realized that, so as a proof of concept I made some changes to the
> readahead mechanism. I force it on, make it trigger every time, and made
> the max and min readahead size be 4MB. This way I ensure whole objects
> get into the cache, and I get a 6-fold performance gain reading small files.
>
> This is just a

Re: [ceph-users] RBD cache being filled up in small increases instead of 4MB

2017-07-17 Thread Jason Dillaman
On Sat, Jul 15, 2017 at 5:35 PM, Ruben Rodriguez  wrote:
>
>
> On 15/07/17 15:33, Jason Dillaman wrote:
>> On Sat, Jul 15, 2017 at 9:43 AM, Nick Fisk  wrote:
>>> Unless you tell the rbd client to not disable readahead after reading the 
>>> 1st x number of bytes (rbd readahead disable after bytes=0), it will stop 
>>> reading ahead and will only cache exactly what is requested by the client.
>>
>> The default is to disable librbd readahead caching after reading 50MB
>> -- since we expect the OS to take over and do a much better job.
>
> I understand having the expectation that the client would do the right
> thing, but from all I can tell it is not the case. I've run out of ways
> to try to make virtio-scsi (or any other driver) *always* read in 4MB
> increments. "minimum_io_size" seems to be ignored.
> BTW I just sent this patch to Qemu (and I'm open to any suggestions on
> that side!): https://bugs.launchpad.net/qemu/+bug/1600563
>
> But this expectation you mention still has a problem: if you would only
> put in the RBD cache what the OS specifically requested, the chances of
> that data being requested twice would be pretty low, since the OS page
> cache would take care of it better than the RBD cache anyway. So why
> bother having a read cache if it doesn't fetch anything extra?

You are correct that the read cache is of little value -- the OS will
always do a better job than we can at caching the necessary data. The
main use-case for the read-cache, in general, is to just service the
readahead or in the cases where the librbd client application isn't
providing its own cache (e.g. direct IO).

> Incidentally, if the RBD cache were to include the whole object instead
> of just the requested portion, RBD readahead would be unnecessary.

Not necessarily since readahead can fetch the next N objects when it
detects a sequential read, with the impact of slowing down all read IO
for the other (vast majority) of IO requests.
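
For reference, the librbd readahead options being discussed look roughly
like this in ceph.conf (the values shown are the documented defaults as I
understand them, purely illustrative):

  [client]
  rbd readahead trigger requests = 10          # sequential reads needed to trigger readahead
  rbd readahead max bytes = 524288             # max size of one readahead request (0 disables readahead)
  rbd readahead disable after bytes = 52428800 # stop reading ahead after this many bytes (0 = never disable)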

> --
> Ruben Rodriguez | Senior Systems Administrator, Free Software Foundation
> GPG Key: 05EF 1D2F FE61 747D 1FC8 27C3 7FAC 7D26 472F 4409
> https://fsf.org | https://gnu.org



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] iSCSI production ready?

2017-07-17 Thread Jason Dillaman
On Sat, Jul 15, 2017 at 11:01 PM, Alvaro Soto  wrote:
> Hi guys,
> does anyone know in which release the iSCSI interface is going to be
> production ready, if it isn't already?

There are several flavors of RBD iSCSI implementations that are in use
by the community. We are working to solidify the integration with LIO
TCMU (via tcmu-runner) right now for Luminous [1].

> I mean without the use of a gateway, like a different endpoint connector to
> a CEPH cluster.

I'm not sure what you mean here.

> Thanks in advance.
> Best.
>
> --
>
> ATTE. Alvaro Soto Escobar
>
> --
> Great people talk about ideas,
> average people talk about things,
> small people talk ... about other people.
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


[1] https://github.com/ceph/ceph/pull/16182

-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] missing feature 400000000000000 ?

2017-07-17 Thread Massimiliano Cuttini

Hi Riccardo,

using ceph-fuse will add an extra layer.
Consider using rbd-nbd instead, which maps images through the kernel's
network block device (NBD) driver.
This should be faster and lets you use the latest tunables (which is
better).




On 17/07/2017 10:56, Riccardo Murri wrote:

Thanks a lot to all!  Both the suggestion to use "ceph osd tunables
hammer" and to use "ceph-fuse" instead solved the issue.

Riccardo
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ANN: ElastiCluster to deploy CephFS

2017-07-17 Thread Riccardo Murri
Hello,

I would just like to let you know that ElastiCluster [1], a command-line
tool to create and configure compute clusters on various IaaS clouds
(OpenStack, AWS, GCE, and anything supported by Apache LibCloud), now
supports CephFS as a shared cluster filesystem [2].

Although ElastiCluster's main purpose is to deploy temporary compute
clusters on IaaS clouds (with cluster filesystems being mainly a way to
share / store data to be processed), it is totally possible to install
just the filesystem part, e.g., for testing.

I would love any feedback on how to better configure CephFS by default
for compute cluster users, or in general on DO's and DON'Ts for Ceph on
cloud-based VMs.

Cheers,
Riccardo

P.S. Having been able to pass from "0 Ceph experience" to writing a
working playbook for CephFS is a testimony of the great documentation
and the helpfulness of this mailing list. Thank you so much! Ceph is
awesome and so is its community!

[1]: http://elasticluster.readthedocs.io/
[2]: http://elasticluster.readthedocs.io/en/latest/playbooks.html#cephfs


--
Riccardo Murri
http://www.s3it.uzh.ch/about/team/#Riccardo.Murri

S3IT: Services and Support for Science IT
University of Zurich
Winterthurerstrasse 190, CH-8057 Zürich (Switzerland)

Tel: +41 44 635 4208
Fax: +41 44 635 6888
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Any recommendations for CephFS metadata/data pool sizing?

2017-07-17 Thread Riccardo Murri
(David Turner, Mon, Jul 03, 2017 at 03:12:28PM +:)
> I would also recommend keeping each pool at a power-of-two number of PGs.  So
> with the 512 PGs example, do 512 PGs for the data pool and 64 PGs for the
> metadata pool.

Thanks for all the suggestions!

Eventually I went with a 1:7 metadata:data split as a default for
testing the FS.
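
Concretely, the power-of-two split suggested above comes out to something
like this (pool names and PG counts are just an example, not a sizing
recommendation):

  ceph osd pool create cephfs_data 512 512
  ceph osd pool create cephfs_metadata 64 64
  ceph fs new cephfs cephfs_metadata cephfs_data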

Thanks,
Riccardo

-- 
Riccardo Murri / Email: riccardo.mu...@gmail.com / Tel.: +41 77 458 98 32
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] What caps are necessary for FUSE-mounts of the FS?

2017-07-17 Thread Riccardo Murri
Hi John, all,

(John Spray, Thu, Jun 29, 2017 at 12:50:47PM +0100:)
> On Thu, Jun 29, 2017 at 11:42 AM, Riccardo Murri
>  wrote:
> > The documentation at  states:
> >
> > """
> > Before mounting a Ceph File System in User Space (FUSE), ensure that
> > the client host has a copy of the Ceph configuration file and a
> > keyring with CAPS for the Ceph metadata server.
> > """
> >
> > Now, I have two questions:
> >
> > 1. What capabilities should be given to the FS user in the keyring?
> > Would this be creating a keyring file with the minimally-required caps
> > to mount the FS read+write?
> >
> > ceph-authtool --create-keyring ceph.fs.keyring --gen-key
> > --caps mds 'allow rwx'
> 
> Docs are here:
> http://docs.ceph.com/docs/master/cephfs/client-auth/

Thanks, this was useful.

It might be worth adding a link to this page from the "Mount Ceph FS
using FUSE" page, as the "client auth" page you mentioned comes later in
the table of contents / page hierarchy of "Using CephFS".
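
For the record, the kind of minimal caps that page describes looks roughly
like this (client name and data pool name are placeholders):

  ceph auth get-or-create client.cephfs_user \
      mon 'allow r' \
      mds 'allow rw' \
      osd 'allow rw pool=cephfs_data' \
      -o /etc/ceph/ceph.client.cephfs_user.keyring
  ceph-fuse -n client.cephfs_user /mnt/cephfs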

Riccardo

-- 
Riccardo Murri / Email: riccardo.mu...@gmail.com / Tel.: +41 77 458 98 32
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cluster network question

2017-07-17 Thread Laszlo Budai

Hi David,

thank you for the answer.

It seems that, when using a dedicated cluster network, the monitors also need
to be connected to that network, otherwise ceph-deploy fails:

# ceph-deploy new --public-network 10.1.1.0/24 --cluster-network=10.3.3.0/24 
monitor{1,2,3}
...
[2017-07-17 10:59:29,301][ceph_deploy][ERROR ] RuntimeError: subnet 
(10.3.3.0/24) is not valid for any of the ips found [u'10.1.1.5', 
u'192.168.100.5']
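
For reference, the corresponding ceph.conf section that ceph-deploy would
write looks roughly like this (a sketch reusing the subnets from the command
above):

  [global]
  public network  = 10.1.1.0/24
  cluster network = 10.3.3.0/24
  # per the earlier replies, only the OSDs actually use the cluster network;
  # the check above is ceph-deploy insisting the mon host has an IP in it too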

Kind regards,
Laszlo


On 14.07.2017 19:39, David Turner wrote:

Only the osds use the dedicated cluster network.  Ping the mons and mds 
services on the network will do nothing.


On Fri, Jul 14, 2017, 11:39 AM Laszlo Budai <las...@componentsoft.eu> wrote:

Dear all,

I'm reading the docs at 
http://docs.ceph.com/docs/master/rados/configuration/network-config-ref/ 
regarding the cluster network and I wonder which nodes are connected to the 
dedicated cluster network?

The diagram on the mentioned page only shows the OSDs connected to the cluster
network, while the text says: "To support two networks, each Ceph Node will need to
have more than one NIC." - which would mean that OSD + MON + MDS all should be
connected to the dedicated cluster network. Which one is correct? Can I have the
dedicated cluster network only for the OSDs, while the MONs are only connected to
the public net?

Thank you!
Laszlo
___
ceph-users mailing list
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Problems getting nfs-ganesha with cephfs backend to work.

2017-07-17 Thread Ricardo Dias
Hi,

Not sure if using the root path in Pseudo is valid.

Change Pseudo to something like /mypseudofolder

And see if that solves the problem.
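
I.e. something like this (a sketch based on your export block, secret
elided):

  EXPORT
  {
      Export_ID = 1;
      Path = /;
      Pseudo = /mypseudofolder;
      Access_Type = RW;
      Protocols = 3;
      Transports = TCP;
      FSAL {
          Name = CEPH;
          User_Id = "test-cephfs";
          Secret_Access_Key = "***";
      }
      CLIENT {
          Clients = client-fqdn;
          Access_Type = RO;
      }
  }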

Ricardo

> On 17 Jul 2017, at 09:45, Micha Krause  wrote:
> 
> Hi,
> 
> I'm trying to get nfs-ganesha to work with Ceph as the FSAL backend.
> 
> I'm using version 2.4.5; this is my ganesha.conf:
> 
> EXPORT
> {
>Export_ID=1;
>Path = /;
>Pseudo = /;
>Access_Type = RW;
>Protocols = 3;
>Transports = TCP;
>FSAL {
>Name = CEPH;
>User_Id = "test-cephfs";
>Secret_Access_Key = "***";
>}
>CLIENT {
>Clients = client-fqdn;
>Access_Type = RO;
>}
> }
> 
> Here are some log lines from ganesha indicating that there is a problem:
> 
> 17/07/2017 10:20:49 : epoch 596c7360 : ngw02 : ganesha.nfsd-16430[main] 
> load_fsal :NFS STARTUP :DEBUG :Loading FSAL CEPH with 
> /usr/lib/x86_64-linux-gnu/ganesha/libfsalceph.so
> 17/07/2017 10:20:49 : epoch 596c7360 : ngw02 : ganesha.nfsd-16430[main] init 
> :FSAL :DEBUG :Ceph module registering.
> 17/07/2017 10:20:49 : epoch 596c7360 : ngw02 : ganesha.nfsd-16430[main] 
> init_config :FSAL :DEBUG :Ceph module setup.
> 17/07/2017 10:20:49 : epoch 596c7360 : ngw02 : ganesha.nfsd-16430[main] 
> create_export :FSAL :CRIT :Unable to mount Ceph cluster for /.
> 17/07/2017 10:20:49 : epoch 596c7360 : ngw02 : ganesha.nfsd-16430[main] 
> mdcache_fsal_create_export :FSAL :MAJ :Failed to call create_export on 
> underlying FSAL Ceph
> 17/07/2017 10:20:49 : epoch 596c7360 : ngw02 : ganesha.nfsd-16430[main] 
> fsal_put :FSAL :INFO :FSAL Ceph now unused
> 17/07/2017 10:20:49 : epoch 596c7360 : ngw02 : ganesha.nfsd-16430[main] 
> fsal_cfg_commit :CONFIG :CRIT :Could not create export for (/) to (/)
> 17/07/2017 10:20:49 : epoch 596c7360 : ngw02 : ganesha.nfsd-16430[main] 
> build_default_root :CONFIG :DEBUG :Allocating Pseudo root export
> 17/07/2017 10:20:49 : epoch 596c7360 : ngw02 : ganesha.nfsd-16430[main] 
> pseudofs_create_export :FSAL :DEBUG :Created exp 0x55daf1020d80 - /
> 17/07/2017 10:20:49 : epoch 596c7360 : ngw02 : ganesha.nfsd-16430[main] 
> build_default_root :CONFIG :INFO :Export 0 (/) successfully created
> 17/07/2017 10:20:49 : epoch 596c7360 : ngw02 : ganesha.nfsd-16430[main] main 
> :NFS STARTUP :WARN :No export entries found in configuration file !!!
> 17/07/2017 10:20:49 : epoch 596c7360 : ngw02 : ganesha.nfsd-16430[main] 
> config_errs_to_log :CONFIG :CRIT :Config File (/etc/ganesha/ganesha.conf:25): 
> 1 validation errors in block FSAL
> 17/07/2017 10:20:49 : epoch 596c7360 : ngw02 : ganesha.nfsd-16430[main] 
> config_errs_to_log :CONFIG :CRIT :Config File (/etc/ganesha/ganesha.conf:25): 
> Errors processing block (FSAL)
> 17/07/2017 10:20:49 : epoch 596c7360 : ngw02 : ganesha.nfsd-16430[main] 
> config_errs_to_log :CONFIG :CRIT :Config File (/etc/ganesha/ganesha.conf:11): 
> 1 validation errors in block EXPORT
> 17/07/2017 10:20:49 : epoch 596c7360 : ngw02 : ganesha.nfsd-16430[main] 
> config_errs_to_log :CONFIG :CRIT :Config File (/etc/ganesha/ganesha.conf:11): 
> Errors processing block (EXPORT)
> 
> I have no problems mounting cephfs with the kernel-client on this machine, 
> using the same authentication data.
> 
> 
> Has anyone gotten this to work, and maybe could give me a hint on what I'm 
> doing wrong?
> 
> 
> 
> Micha Krause
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] missing feature 400000000000000 ?

2017-07-17 Thread Riccardo Murri
Thanks a lot to all!  Both the suggestion to use "ceph osd tunables
hammer" and to use "ceph-fuse" instead solved the issue.

Riccardo
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Problems getting nfs-ganesha with cephfs backend to work.

2017-07-17 Thread Micha Krause

Hi,

I'm trying to get nfs-ganesha to work with Ceph as the FSAL backend.

I'm using version 2.4.5; this is my ganesha.conf:

EXPORT
{
Export_ID=1;
Path = /;
Pseudo = /;
Access_Type = RW;
Protocols = 3;
Transports = TCP;
FSAL {
Name = CEPH;
User_Id = "test-cephfs";
Secret_Access_Key = "***";
}
CLIENT {
Clients = client-fqdn;
Access_Type = RO;
}
}

Here are some log lines from ganesha indicating that there is a problem:

17/07/2017 10:20:49 : epoch 596c7360 : ngw02 : ganesha.nfsd-16430[main] 
load_fsal :NFS STARTUP :DEBUG :Loading FSAL CEPH with 
/usr/lib/x86_64-linux-gnu/ganesha/libfsalceph.so
17/07/2017 10:20:49 : epoch 596c7360 : ngw02 : ganesha.nfsd-16430[main] init 
:FSAL :DEBUG :Ceph module registering.
17/07/2017 10:20:49 : epoch 596c7360 : ngw02 : ganesha.nfsd-16430[main] 
init_config :FSAL :DEBUG :Ceph module setup.
17/07/2017 10:20:49 : epoch 596c7360 : ngw02 : ganesha.nfsd-16430[main] 
create_export :FSAL :CRIT :Unable to mount Ceph cluster for /.
17/07/2017 10:20:49 : epoch 596c7360 : ngw02 : ganesha.nfsd-16430[main] 
mdcache_fsal_create_export :FSAL :MAJ :Failed to call create_export on 
underlying FSAL Ceph
17/07/2017 10:20:49 : epoch 596c7360 : ngw02 : ganesha.nfsd-16430[main] 
fsal_put :FSAL :INFO :FSAL Ceph now unused
17/07/2017 10:20:49 : epoch 596c7360 : ngw02 : ganesha.nfsd-16430[main] 
fsal_cfg_commit :CONFIG :CRIT :Could not create export for (/) to (/)
17/07/2017 10:20:49 : epoch 596c7360 : ngw02 : ganesha.nfsd-16430[main] 
build_default_root :CONFIG :DEBUG :Allocating Pseudo root export
17/07/2017 10:20:49 : epoch 596c7360 : ngw02 : ganesha.nfsd-16430[main] 
pseudofs_create_export :FSAL :DEBUG :Created exp 0x55daf1020d80 - /
17/07/2017 10:20:49 : epoch 596c7360 : ngw02 : ganesha.nfsd-16430[main] 
build_default_root :CONFIG :INFO :Export 0 (/) successfully created
17/07/2017 10:20:49 : epoch 596c7360 : ngw02 : ganesha.nfsd-16430[main] main 
:NFS STARTUP :WARN :No export entries found in configuration file !!!
17/07/2017 10:20:49 : epoch 596c7360 : ngw02 : ganesha.nfsd-16430[main] 
config_errs_to_log :CONFIG :CRIT :Config File (/etc/ganesha/ganesha.conf:25): 1 
validation errors in block FSAL
17/07/2017 10:20:49 : epoch 596c7360 : ngw02 : ganesha.nfsd-16430[main] 
config_errs_to_log :CONFIG :CRIT :Config File (/etc/ganesha/ganesha.conf:25): 
Errors processing block (FSAL)
17/07/2017 10:20:49 : epoch 596c7360 : ngw02 : ganesha.nfsd-16430[main] 
config_errs_to_log :CONFIG :CRIT :Config File (/etc/ganesha/ganesha.conf:11): 1 
validation errors in block EXPORT
17/07/2017 10:20:49 : epoch 596c7360 : ngw02 : ganesha.nfsd-16430[main] 
config_errs_to_log :CONFIG :CRIT :Config File (/etc/ganesha/ganesha.conf:11): 
Errors processing block (EXPORT)

I have no problems mounting cephfs with the kernel-client on this machine, 
using the same authentication data.


Has anyone gotten this to work, and maybe could give me a hint on what I'm 
doing wrong?



Micha Krause
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com