Re: [ceph-users] CephFS unexplained writes

2015-05-07 Thread Gregory Farnum
Sam? This looks to be the HashIndex::SUBDIR_ATTR, but I don't know
exactly what it's for nor why it would be getting constantly created
and removed on a pure read workload...
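
For what it's worth, the attribute can be read back directly off the FileStore PG directory with getfattr (from the attr package). A minimal inspection sketch, run as root on the OSD host, using the directory from the strace output quoted below (the value is a small binary blob, so hex output is easier to read):

# getfattr -d -e hex \
    /var/lib/ceph/osd/ceph-10/current/4.1es1_head/DIR_E/DIR_1/DIR_D/DIR_3

By default getfattr dumps the user.* namespace, which is where user.cephos.phash.contents lives, so no extra filtering should be needed.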

On Thu, May 7, 2015 at 2:55 PM, Erik Logtenberg e...@logtenberg.eu wrote:
 It does sound contradictory: why would read operations in cephfs result
 in writes to disk? But they do. I upgraded to Hammer last week and I am
 still seeing this.

 The setup is as follows:

 EC-pool on hdd's for data
 replicated pool on ssd's for data-cache
 replicated pool on ssd's for meta-data

 Now whenever I start doing heavy reads on cephfs, I see intense bursts
 of write operations on the hdd's. The reads I'm doing are things like
 reading a large file (streaming a video), or running a big rsync job
 with --dry-run (so it just checks meta-data). No clue why that would
 have any effect on the hdd's, but it does.

 Now, to further figure out what's going on, I tried using lsof, atop,
 iotop, but those tools don't provide the necessary information. In lsof
 I just see a whole bunch of files opened at any time, but it doesn't
 change much during these tests.
 In atop and iotop I can clearly see that the hdd's are doing a lot of
 writes when I'm reading in cephfs, but those tools can't tell me what
 those writes are.

 So I tried strace, which can trace file operations and attach to running
 processes.
 # strace -f -e trace=file -p 5076
 This gave me an idea of what was going on. 5076 is the process id of the
 osd for one of the hdd's. I saw mostly stat's and open's, but those are
 all reads, not writes. Of course btrfs can cause writes when doing reads
 (atime), but I have the osd mounted with noatime.
 The only write operations that I saw a lot of are these:

 [pid  5350]
 getxattr("/var/lib/ceph/osd/ceph-10/current/4.1es1_head/DIR_E/DIR_1/DIR_D/DIR_3",
 "user.cephos.phash.contents", "\1Q\0\0\0\0\0\0\0\0\0\0\0\4\0\0", 1024) = 17
 [pid  5350]
 setxattr("/var/lib/ceph/osd/ceph-10/current/4.1es1_head/DIR_E/DIR_1/DIR_D/DIR_3",
 "user.cephos.phash.contents", "\1R\0\0\0\0\0\0\0\0\0\0\0\4\0\0", 17, 0) = 0
 [pid  5350]
 removexattr("/var/lib/ceph/osd/ceph-10/current/4.1es1_head/DIR_E/DIR_1/DIR_D/DIR_3",
 "user.cephos.phash.contents@1") = -1 ENODATA (No data available)

 So it appears that the osd's aren't writing actual data to disk, but
 metadata in the form of xattr's. Can anyone explain what this setting
 and removing of xattr's could be for?

 Kind regards,

 Erik.


 On 03/16/2015 10:44 PM, Gregory Farnum wrote:
 The information you're giving sounds a little contradictory, but my
 guess is that you're seeing the impacts of object promotion and
 flushing. You can sample the operations the OSDs are doing at any
 given time by running ops_in_progress (or similar, I forget exact
 phrasing) command on the OSD admin socket. I'm not sure if rados df
 is going to report cache movement activity or not.

 That though would mostly be written to the SSDs, not the hard drives —
 although the hard drives could still get metadata updates written when
 objects are flushed. What data exactly are you seeing that's leading
 you to believe writes are happening against these drives? What is the
 exact CephFS and cache pool configuration?
 -Greg

 On Mon, Mar 16, 2015 at 2:36 PM, Erik Logtenberg e...@logtenberg.eu wrote:
 Hi,

 I forgot to mention: while I am seeing these writes in iotop and
 /proc/diskstats for the hdd's, I am -not- seeing any writes in rados
 df for the pool residing on these disks. There is only one pool active
 on the hdd's and according to rados df it is getting zero writes when
 I'm just reading big files from cephfs.

 So apparently the osd's are doing some non-trivial amount of writing on
 their own behalf. What could it be?

 Thanks,

 Erik.


 On 03/16/2015 10:26 PM, Erik Logtenberg wrote:
 Hi,

 I am getting relatively bad performance from cephfs. I use a replicated
 cache pool on ssd in front of an erasure coded pool on rotating media.

 When reading big files (streaming video), I see a lot of disk i/o,
 especially writes. I have no clue what could cause these writes. The
 writes are going to the hdd's and they stop when I stop reading.

 I mounted everything with noatime and nodiratime so it shouldn't be
 that. On a related note, the Cephfs metadata is stored on ssd too, so
 metadata-related changes shouldn't hit the hdd's anyway I think.

 Any thoughts? How can I get more information about what ceph is doing?
 Using iotop I only see that the osd processes are busy but it doesn't
 give many hints as to what they are doing.

 Thanks,

 Erik.

Re: [ceph-users] CephFS unexplained writes

2015-05-07 Thread Erik Logtenberg
It does sound contradictory: why would read operations in cephfs result
in writes to disk? But they do. I upgraded to Hammer last week and I am
still seeing this.

The setup is as follows:

EC-pool on hdd's for data
replicated pool on ssd's for data-cache
replicated pool on ssd's for meta-data

Now whenever I start doing heavy reads on cephfs, I see intense bursts
of write operations on the hdd's. The reads I'm doing are things like
reading a large file (streaming a video), or running a big rsync job
with --dry-run (so it just checks meta-data). No clue why that would
have any effect on the hdd's, but it does.

Now, to further figure out what's going on, I tried using lsof, atop,
iotop, but those tools don't provide the necessary information. In lsof
I just see a whole bunch of files opened at any time, but it doesn't
change much during these tests.
In atop and iotop I can clearly see that the hdd's are doing a lot of
writes when I'm reading in cephfs, but those tools can't tell me what
those writes are.

So I tried strace, which can trace file operations and attach to running
processes.
# strace -f -e trace=file -p 5076
This gave me an idea of what was going on. 5076 is the process id of the
osd for one of the hdd's. I saw mostly stat's and open's, but those are
all reads, not writes. Of course btrfs can cause writes when doing reads
(atime), but I have the osd mounted with noatime.
The only write operations that I saw a lot of are these:

[pid  5350]
getxattr("/var/lib/ceph/osd/ceph-10/current/4.1es1_head/DIR_E/DIR_1/DIR_D/DIR_3",
"user.cephos.phash.contents", "\1Q\0\0\0\0\0\0\0\0\0\0\0\4\0\0", 1024) = 17
[pid  5350]
setxattr("/var/lib/ceph/osd/ceph-10/current/4.1es1_head/DIR_E/DIR_1/DIR_D/DIR_3",
"user.cephos.phash.contents", "\1R\0\0\0\0\0\0\0\0\0\0\0\4\0\0", 17, 0) = 0
[pid  5350]
removexattr("/var/lib/ceph/osd/ceph-10/current/4.1es1_head/DIR_E/DIR_1/DIR_D/DIR_3",
"user.cephos.phash.contents@1") = -1 ENODATA (No data available)
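
To quantify how much of the OSD's write activity those calls actually account for, strace's summary mode can be pointed at the same process and restricted to the relevant syscalls (a sketch; 5076 is the pid from above, and syscall names can differ slightly per architecture and strace version):

# strace -c -f -e trace=getxattr,setxattr,removexattr,write,pwrite64,fsync,fdatasync -p 5076

Let it run during a read test and then interrupt it with Ctrl-C; the summary table lists calls, errors and time per syscall, which should show whether the setxattr/removexattr traffic dominates or whether real data writes are also happening.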

So it appears that the osd's aren't writing actual data to disk, but
metadata in the form of xattr's. Can anyone explain what this setting
and removing of xattr's could be for?

Kind regards,

Erik.
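
Another way to see what the OSD itself is writing is to raise the FileStore debug level for a short while and watch the OSD log during a read test. A sketch, assuming osd.10 and the default log location, and remembering to turn the level back down afterwards because the log grows quickly:

# ceph tell osd.10 injectargs '--debug-filestore 10'
# tail -f /var/log/ceph/ceph-osd.10.log
# ceph tell osd.10 injectargs '--debug-filestore 1'

At that level the log should name the individual FileStore operations (setattrs, omap updates, journal writes) being issued while cephfs is only being read; raising it further gives more detail at the cost of a much noisier log.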


On 03/16/2015 10:44 PM, Gregory Farnum wrote:
 The information you're giving sounds a little contradictory, but my
 guess is that you're seeing the impacts of object promotion and
 flushing. You can sample the operations the OSDs are doing at any
 given time by running the ops_in_progress (or similar, I forget the exact
 phrasing) command on the OSD admin socket. I'm not sure if rados df
 is going to report cache movement activity or not.
 
 That though would mostly be written to the SSDs, not the hard drives —
 although the hard drives could still get metadata updates written when
 objects are flushed. What data exactly are you seeing that's leading
 you to believe writes are happening against these drives? What is the
 exact CephFS and cache pool configuration?
 -Greg
 
 On Mon, Mar 16, 2015 at 2:36 PM, Erik Logtenberg e...@logtenberg.eu wrote:
 Hi,

 I forgot to mention: while I am seeing these writes in iotop and
 /proc/diskstats for the hdd's, I am -not- seeing any writes in rados
 df for the pool residing on these disks. There is only one pool active
 on the hdd's and according to rados df it is getting zero writes when
 I'm just reading big files from cephfs.

 So apparently the osd's are doing some non-trivial amount of writing on
 their own behalf. What could it be?

 Thanks,

 Erik.


 On 03/16/2015 10:26 PM, Erik Logtenberg wrote:
 Hi,

 I am getting relatively bad performance from cephfs. I use a replicated
 cache pool on ssd in front of an erasure coded pool on rotating media.

 When reading big files (streaming video), I see a lot of disk i/o,
 especially writes. I have no clue what could cause these writes. The
 writes are going to the hdd's and they stop when I stop reading.

 I mounted everything with noatime and nodiratime so it shouldn't be
 that. On a related note, the Cephfs metadata is stored on ssd too, so
 metadata-related changes shouldn't hit the hdd's anyway I think.

 Any thoughts? How can I get more information about what ceph is doing?
 Using iotop I only see that the osd processes are busy but it doesn't
 give many hints as to what they are doing.

 Thanks,

 Erik.


Re: [ceph-users] CephFS unexplained writes

2015-03-16 Thread Gregory Farnum
The information you're giving sounds a little contradictory, but my
guess is that you're seeing the impacts of object promotion and
flushing. You can sample the operations the OSDs are doing at any
given time by running the ops_in_progress (or similar, I forget the exact
phrasing) command on the OSD admin socket. I'm not sure if rados df
is going to report cache movement activity or not.
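
The admin-socket command meant here is most likely dump_ops_in_flight. A minimal sketch, assuming default socket paths and using osd.10 purely as an example id, run on the host carrying that OSD:

# ceph daemon osd.10 dump_ops_in_flight

or, talking to the socket file directly:

# ceph --admin-daemon /var/run/ceph/ceph-osd.10.asok dump_ops_in_flight

The same socket exposes perf counters; if the running version has the cache-tier counters (tier_promote, tier_flush and friends), filtering for them gives a rough picture of promotion/flush activity:

# ceph daemon osd.10 perf dump | python -m json.tool | grep -i tier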

That though would mostly be written to the SSDs, not the hard drives —
although the hard drives could still get metadata updates written when
objects are flushed. What data exactly are you seeing that's leading
you to believe writes are happening against these drives? What is the
exact CephFS and cache pool configuration?
-Greg

On Mon, Mar 16, 2015 at 2:36 PM, Erik Logtenberg e...@logtenberg.eu wrote:
 Hi,

 I forgot to mention: while I am seeing these writes in iotop and
 /proc/diskstats for the hdd's, I am -not- seeing any writes in rados
 df for the pool residing on these disks. There is only one pool active
 on the hdd's and according to rados df it is getting zero writes when
 I'm just reading big files from cephfs.

 So apparently the osd's are doing some non-trivial amount of writing on
 their own behalf. What could it be?

 Thanks,

 Erik.


 On 03/16/2015 10:26 PM, Erik Logtenberg wrote:
 Hi,

 I am getting relatively bad performance from cephfs. I use a replicated
 cache pool on ssd in front of an erasure coded pool on rotating media.

 When reading big files (streaming video), I see a lot of disk i/o,
 especially writes. I have no clue what could cause these writes. The
 writes are going to the hdd's and they stop when I stop reading.

 I mounted everything with noatime and nodiratime so it shouldn't be
 that. On a related note, the Cephfs metadata is stored on ssd too, so
 metadata-related changes shouldn't hit the hdd's anyway I think.

 Any thoughts? How can I get more information about what ceph is doing?
 Using iotop I only see that the osd processes are busy but it doesn't
 give many hints as to what they are doing.

 Thanks,

 Erik.


[ceph-users] CephFS unexplained writes

2015-03-16 Thread Erik Logtenberg
Hi,

I am getting relatively bad performance from cephfs. I use a replicated
cache pool on ssd in front of an erasure coded pool on rotating media.

When reading big files (streaming video), I see a lot of disk i/o,
especially writes. I have no clue what could cause these writes. The
writes are going to the hdd's and they stop when I stop reading.

I mounted everything with noatime and nodiratime so it shouldn't be
that. On a related note, the Cephfs metadata is stored on ssd too, so
metadata-related changes shouldn't hit the hdd's anyway I think.

Any thoughts? How can I get more information about what ceph is doing?
Using iotop I only see that the osd processes are busy but it doesn't
give many hints as to what they are doing.

Thanks,

Erik.
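
For watching this from the Ceph side rather than from iotop, two commands are worth a look (a sketch; both should be available on current releases): ceph -w streams cluster-wide client I/O rates, and ceph osd pool stats breaks client I/O down per pool.

# ceph -w
# watch -n 5 'ceph osd pool stats'

Both only count client-visible operations, so internal OSD housekeeping (journal writes, xattr updates and the like) will not show up there even while it hits the disks.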


Re: [ceph-users] CephFS unexplained writes

2015-03-16 Thread Erik Logtenberg
Hi,

I forgot to mention: while I am seeing these writes in iotop and
/proc/diskstats for the hdd's, I am -not- seeing any writes in rados
df for the pool residing on these disks. There is only one pool active
on the hdd's and according to rados df it is getting zero writes when
I'm just reading big files from cephfs.

So apparently the osd's are doing some non-trivial amount of writing on
their own behalf. What could it be?

Thanks,

Erik.
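
One way to narrow this down further is to take the iotop approach to the thread level with pidstat from sysstat. A sketch, using 5076 only as a placeholder for the pid of the ceph-osd process backing one of the hdd's, sampling every 5 seconds:

# pidstat -d -t -p 5076 5

The per-thread kB_wr/s column shows which threads inside the OSD are generating the writes that never appear in rados df, which also makes it easier to attach strace to exactly the right thread.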


On 03/16/2015 10:26 PM, Erik Logtenberg wrote:
 Hi,
 
 I am getting relatively bad performance from cephfs. I use a replicated
 cache pool on ssd in front of an erasure coded pool on rotating media.
 
 When reading big files (streaming video), I see a lot of disk i/o,
 especially writes. I have no clue what could cause these writes. The
 writes are going to the hdd's and they stop when I stop reading.
 
 I mounted everything with noatime and nodiratime so it shouldn't be
 that. On a related note, the Cephfs metadata is stored on ssd too, so
 metadata-related changes shouldn't hit the hdd's anyway I think.
 
 Any thoughts? How can I get more information about what ceph is doing?
 Using iotop I only see that the osd processes are busy but it doesn't
 give many hints as to what they are doing.
 
 Thanks,
 
 Erik.