Re: [ceph-users] Reply: How does rbd preserve the consistency of WRITE requests that span across multiple objects?

2017-05-24 Thread Jason Dillaman
A userspace application should issue fsync or fdatasync calls where appropriate. On Wed, May 24, 2017 at 10:15 PM, 许雪寒 wrote: > Thanks for your reply :-) > > I've got your point. By the way, if an application opens a file WITHOUT > setting the O_DIRECT or O_SYNC flag, then it sequentially issues two o
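
As a side note, here is a minimal sketch (not from the thread; the path and sizes are made up) of what "issue fsync or fdatasync where appropriate" looks like for an ordinary buffered file descriptor, with fdatasync acting as the write barrier between two overlapping writes:

  #include <fcntl.h>
  #include <unistd.h>
  #include <cstring>

  int main() {
    // Hypothetical file on an RBD-backed filesystem; buffered I/O,
    // opened without O_DIRECT or O_SYNC.
    int fd = open("/mnt/rbd0/data.bin", O_WRONLY | O_CREAT, 0644);
    if (fd < 0) return 1;

    char buf[8192];
    std::memset(buf, 0xAA, sizeof(buf));
    if (pwrite(fd, buf, sizeof(buf), 0) < 0) return 1;     // first write

    // Write barrier: data issued above is flushed to stable storage
    // before anything issued below it.
    if (fdatasync(fd) < 0) return 1;

    std::memset(buf, 0x55, sizeof(buf));
    if (pwrite(fd, buf, sizeof(buf), 4096) < 0) return 1;  // overlapping write
    fdatasync(fd);
    close(fd);
    return 0;
  }

Without the fdatasync in the middle, nothing guarantees the order in which the two overlapping writes reach stable storage, which is exactly the reordering discussed below.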

[ceph-users] Reply: How does rbd preserve the consistency of WRITE requests that span across multiple objects?

2017-05-24 Thread 许雪寒
Thanks for your reply :-) I've got your point. By the way, if an application opens a file WITHOUT setting the O_DIRECT or O_SYNC flag, then it sequentially issues two overlapping glibc write operations to the underlying file system. As far as I understand the Linux file system, those writes might not

Re: [ceph-users] mds slow request, getattr currently failed to rdlock. Kraken with Bluestore

2017-05-24 Thread Daniel K
Yes -- the crashed server also mounted cephfs as a client, and also likely had active writes to the file when it crashed. I have the max file size set to 17,592,186,044,416 (16 TiB) -- but this file was about 5.8TB. The likely reason for the crash? The file was used as a fileio backstore for LIO, which

[ceph-users] Error EACCES: access denied

2017-05-24 Thread Ali Moeinvaziri
Hi, I'm new to Ceph and trying to set up a few test nodes. On activating the first OSD, the ceph-deploy command fails, pointing to: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring auth add osd.0 -i /var/lib/ceph/tmp/mnt.i1cISA/keyring osd a
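
If anyone else hits this EACCES, one way to narrow it down (assuming the standard bootstrap-osd key name and the default paths used by ceph-deploy) is to check that the keyring on the OSD node still matches the key the monitors have registered:

  # cat /var/lib/ceph/bootstrap-osd/ceph.keyring
  # ceph --cluster ceph auth get client.bootstrap-osd

If the two keys differ, re-running "ceph-deploy gatherkeys <mon-host>" and then retrying the activate step is worth a try.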

Re: [ceph-users] mds slow request, getattr currently failed to rdlock. Kraken with Bluestore

2017-05-24 Thread Gregory Farnum
On Wed, May 24, 2017 at 3:15 AM, John Spray wrote: > On Tue, May 23, 2017 at 11:41 PM, Daniel K wrote: >> Have a 20 OSD cluster -"my first ceph cluster" that has another 400 OSDs >> enroute. >> >> I was "beating up" on the cluster, and had been writing to a 6TB file in >> CephFS for several hours

Re: [ceph-users] cephfs file size limit of 1.1TB?

2017-05-24 Thread Gregory Farnum
On Wed, May 24, 2017 at 12:30 PM, John Spray wrote: > On Wed, May 24, 2017 at 8:17 PM, Jake Grimmett wrote: >> Hi John, >> That's great, thank you so much for the advice. >> Some of our users have massive files so this would have been a big block. >> >> Is there any particular reason for having a

Re: [ceph-users] cephfs file size limit of 1.1TB?

2017-05-24 Thread John Spray
On Wed, May 24, 2017 at 8:17 PM, Jake Grimmett wrote: > Hi John, > That's great, thank you so much for the advice. > Some of our users have massive files so this would have been a big block. > > Is there any particular reason for having a file size limit? Without the size limit, a user can create

Re: [ceph-users] cephfs file size limit of 1.1TB?

2017-05-24 Thread Jake Grimmett
Hi John, That's great, thank you so much for the advice. Some of our users have massive files so this would have been a big block. Is there any particular reason for having a file size limit? Would setting max_file_size to 0 remove all limits? Thanks again, Jake On 24 May 2017 19:45:52 BST, Jo

[ceph-users] Help build a drive reliability service!

2017-05-24 Thread Patrick McGarry
Hey cephers, Just wanted to share the genesis of a new community project that could use a few helping hands (and any amount of feedback/discussion that you might like to offer). As a bit of backstory, around 2013 the Backblaze folks started publishing statistics about hard drive reliability from

Re: [ceph-users] cephfs file size limit of 1.1TB?

2017-05-24 Thread John Spray
On Wed, May 24, 2017 at 7:41 PM, Brady Deetz wrote: > Are there any repercussions to configuring this on an existing large fs? No. It's just a limit that's enforced at the point of appending to files or setting their size; it doesn't affect how anything is stored. John > On Wed, May 24, 2017 a

Re: [ceph-users] cephfs file size limit of 1.1TB?

2017-05-24 Thread Brady Deetz
Are there any repercussions to configuring this on an existing large fs? On Wed, May 24, 2017 at 1:36 PM, John Spray wrote: > On Wed, May 24, 2017 at 7:19 PM, Jake Grimmett > wrote: > > Dear All, > > > > I've been testing out cephfs, and bumped into what appears to be an upper > > file size lim

Re: [ceph-users] cephfs file size limit of 1.1TB?

2017-05-24 Thread John Spray
On Wed, May 24, 2017 at 7:19 PM, Jake Grimmett wrote: > Dear All, > > I've been testing out cephfs, and bumped into what appears to be an upper > file size limit of ~1.1TB > > e.g.: > > [root@cephfs1 ~]# time rsync --progress -av /ssd/isilon_melis.tar > /ceph/isilon_melis.tar > sending incremental

[ceph-users] cephfs file size limit of 1.1TB?

2017-05-24 Thread Jake Grimmett
Dear All, I've been testing out cephfs, and bumped into what appears to be an upper file size limit of ~1.1TB, e.g.: [root@cephfs1 ~]# time rsync --progress -av /ssd/isilon_melis.tar /ceph/isilon_melis.tar sending incremental file list isilon_melis.tar 1099341824000  54%  237.51MB/s  1:02:05 rsy
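
For anyone else who lands here: 1099341824000 bytes is just short of 2^40 = 1099511627776, which is the default CephFS max_file_size of 1 TiB, so the transfer is almost certainly stopping at that limit rather than at a hard filesystem ceiling. Assuming the filesystem is named "cephfs", something along these lines should confirm and raise it (17592186044416 = 16 TiB, only an example value):

  # ceph fs get cephfs | grep max_file_size
  # ceph fs set cephfs max_file_size 17592186044416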

[ceph-users] Non efficient implementation of LRC?

2017-05-24 Thread Oleg Kolosov
Hi, In the minimum_to_decode function of ErasureCodeLrc.cc, I suspect an inefficient implementation. More specifically, instead of reading the minimum, we read the maximum. The problem is in this case: // // Get all available chunks in that layer to recover the // missing one(s). // set_difference(i
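
To make the concern concrete, here is a rough C++ sketch (not the actual ErasureCodeLrc code, and the layer layout is simplified) of the difference between returning every surviving chunk in a layer and returning only the k chunks the layer's code actually needs to decode:

  #include <algorithm>
  #include <iterator>
  #include <set>

  // layer_chunks: all chunk ids in the layer; erasures: the lost chunk ids.
  std::set<int> minimum_to_decode_sketch(const std::set<int>& layer_chunks,
                                         const std::set<int>& erasures,
                                         unsigned k) {
    // Every surviving chunk in the layer -- what "read the maximum" means.
    std::set<int> available;
    std::set_difference(layer_chunks.begin(), layer_chunks.end(),
                        erasures.begin(), erasures.end(),
                        std::inserter(available, available.end()));

    // If k chunks are enough to decode, stop after k of them instead of
    // scheduling reads for every available chunk.
    std::set<int> minimum;
    for (int id : available) {
      if (minimum.size() == k)
        break;
      minimum.insert(id);
    }
    return minimum;
  }

Whether only k chunks really suffice depends on the code configured for that layer, so this is only meant to illustrate the minimum-versus-maximum distinction being raised.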

Re: [ceph-users] mds slow request, getattr currently failed to rdlock. Kraken with Bluestore

2017-05-24 Thread Daniel K
Networking is 10Gig. I notice recovery IO is wildly variable; I assume that's normal. There is very little load, as this is yet to go into production; I was "seeing what it would handle" at the time it broke. I checked this morning and the slow request had gone, and I could access the blocked file again. A

Re: [ceph-users] Jewel upgrade and feature set mismatch

2017-05-24 Thread Ilya Dryomov
On Wed, May 24, 2017 at 4:27 PM, Shain Miley wrote: > Hi, > > Thanks for all your help so far...very useful information indeed. > > > Here is the debug output from the file you referenced below: > > > root@rbd1:/sys/kernel/debug/ceph/504b5794-34bd-44e7-a8c3-0494cf800c23.client67751889# > cat osdc

Re: [ceph-users] How does rbd preserve the consistency of WRITE requests that span across multiple objects?

2017-05-24 Thread Jason Dillaman
Just like a regular block device, reordering is permitted between write barriers/flushes. For example, if I had an HDD with 512-byte sectors and I attempted to write 4K, there is no guarantee what the disk will look like if you had a crash mid-write or if you concurrently issued an overlapping write
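
The same point applies at the RADOS level: a write that crosses an RBD object boundary turns into independent ops to different OSDs, each atomic within its own object but unordered with respect to the others until the client flushes. A small sketch (assuming the default 4 MiB object size; the offsets are made up) of how a single write maps onto multiple objects:

  #include <cstdint>
  #include <cstdio>

  int main() {
    const std::uint64_t object_size = 4ull << 20;   // assumed librbd default: 4 MiB
    const std::uint64_t off = 4193280, len = 8192;  // 8 KiB write straddling a boundary

    // Each object touched by [off, off+len) gets its own OSD op; that op is
    // atomic within its object, but nothing orders ops across objects unless
    // the client issues a flush (write barrier) in between.
    const std::uint64_t first = off / object_size;
    const std::uint64_t last  = (off + len - 1) / object_size;
    std::printf("write [%llu, %llu) spans objects %llu..%llu\n",
                (unsigned long long)off, (unsigned long long)(off + len),
                (unsigned long long)first, (unsigned long long)last);
    return 0;
  }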

Re: [ceph-users] Large OSD omap directories (LevelDBs)

2017-05-24 Thread george.vasilakakos
Hi Greg, > This does sound weird, but I also notice that in your earlier email you > seemed to have only ~5k PGs across ~1400 OSDs, which is a pretty > low number. You may just have a truly horrible PG balance; can you share > more details (eg ceph osd df)? Our distribution is pretty bad, we're

Re: [ceph-users] Jewel upgrade and feature set mismatch

2017-05-24 Thread Shain Miley
Hi, Thanks for all your help so far...very useful information indeed. Here is the debug output from the file you referenced below: root@rbd1:/sys/kernel/debug/ceph/504b5794-34bd-44e7-a8c3-0494cf800c23.client67751889# cat osdc 2311  osd144  3.1347f3bc  rb.0.25f2ab0.238e1f29.

Re: [ceph-users] Jewel upgrade and feature set mismatch

2017-05-24 Thread Ilya Dryomov
On Wed, May 24, 2017 at 1:47 PM, Shain Miley wrote: > Hello, > We just upgraded from Hammer to Jewel, and after the cluster once again > reported a healthy state I set the crush tunables to ‘optimal’ (from > legacy). > 12 hours later and the cluster is almost done with the pg remapping under > the

Re: [ceph-users] Internalls of RGW data store

2017-05-24 Thread Jens Rosenboom
2017-05-24 6:26 GMT+00:00 Anton Dmitriev : > Hi > > Correct me if I am wrong: when uploading a file to RGW, it is split into > stripe units, and these stripe units are mapped to RADOS objects. These RADOS > objects are files on the OSD filestore. Yes, see this blog post that explains this in a bit more det
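
For a concrete look at the mapping, the head object's manifest can be dumped and the backing RADOS objects listed directly. The bucket/object names below are placeholders and the pool name is the Jewel-era default, so treat the exact invocation as an assumption:

  # radosgw-admin object stat --bucket=mybucket --object=myfile.bin
  # rados -p default.rgw.buckets.data ls

The first command prints the manifest, i.e. how the upload was cut into stripe units; the second lists the raw objects in the data pool, whose names carry the bucket marker prefix shown in the manifest.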

[ceph-users] Jewel upgrade and feature set mismatch

2017-05-24 Thread Shain Miley
Hello, We just upgraded from Hammer to Jewel, and after the cluster once again reported a healthy state I set the crush tunables to ‘optimal’ (from legacy). 12 hours later and the cluster is almost done with the pg remapping under the new rules. The issue I am having is the server where we mount
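
In case it helps others with the same symptom: when a kernel RBD/CephFS client is too old for the new tunables, the failed map attempt normally leaves a "feature set mismatch" line in the kernel log, so a rough first check (exact feature bits vary by kernel version) is:

  # dmesg | grep -i 'feature set mismatch'     # on the client that can no longer map
  # ceph osd crush show-tunables               # on the cluster, to see the active profile

If upgrading the client kernel isn't an option, dropping the tunables back to a profile the kernel understands (for example "ceph osd crush tunables hammer") is the usual workaround.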

Re: [ceph-users] mds slow request, getattr currently failed to rdlock. Kraken with Bluestore

2017-05-24 Thread John Spray
On Tue, May 23, 2017 at 11:41 PM, Daniel K wrote: > Have a 20 OSD cluster -"my first ceph cluster" that has another 400 OSDs > enroute. > > I was "beating up" on the cluster, and had been writing to a 6TB file in > CephFS for several hours, during which I changed the crushmap to better > match my

[ceph-users] Bug in OSD Maps

2017-05-24 Thread Stuart Harland
Hello, I think I'm running into the bug described at http://tracker.ceph.com/issues/14213 for Hammer. However, I'm running the latest version of Jewel (10.2.7), although I'm in the middle of upgrading the cluster (from 10.2.5). At first it was on a couple of nodes, but now it seems to be mor