Re: [ceph-users] CephFS unexplained writes
Sam? This looks to be the HashIndex::SUBDIR_ATTR, but I don't know exactly what it's for, nor why it would be getting constantly created and removed on a pure read workload...

On Thu, May 7, 2015 at 2:55 PM, Erik Logtenberg e...@logtenberg.eu wrote:

> The only write operations that I saw a lot of are these:
>
> [pid 5350] getxattr("/var/lib/ceph/osd/ceph-10/current/4.1es1_head/DIR_E/DIR_1/DIR_D/DIR_3", "user.cephos.phash.contents", "\1Q\0\0\0\0\0\0\0\0\0\0\0\4\0\0", 1024) = 17
> [pid 5350] setxattr("/var/lib/ceph/osd/ceph-10/current/4.1es1_head/DIR_E/DIR_1/DIR_D/DIR_3", "user.cephos.phash.contents", "\1R\0\0\0\0\0\0\0\0\0\0\0\4\0\0", 17, 0) = 0
> [pid 5350] removexattr("/var/lib/ceph/osd/ceph-10/current/4.1es1_head/DIR_E/DIR_1/DIR_D/DIR_3", "user.cephos.phash.contents@1") = -1 ENODATA (No data available)
>
> So it appears that the OSDs aren't writing actual data to disk, but metadata in the form of xattrs. Can anyone explain what this setting and removing of xattrs could be for?
>
> Kind regards,
>
> Erik.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] CephFS unexplained writes
It does sound contradictory: why would read operations in CephFS result in writes to disk? But they do. I upgraded to Hammer last week and I am still seeing this. The setup is as follows:

- an EC pool on HDDs for data
- a replicated pool on SSDs for the data cache
- a replicated pool on SSDs for metadata

Whenever I start doing heavy reads on CephFS, I see intense bursts of write operations on the HDDs. The reads I'm doing are things like reading a large file (streaming a video) or running a big rsync job with --dry-run (so it only checks metadata). I have no clue why that would have any effect on the HDDs, but it does.

To figure out what's going on, I tried lsof, atop, and iotop, but those tools don't provide the necessary information. In lsof I just see a whole bunch of open files, and that doesn't change much during these tests. In atop and iotop I can clearly see that the HDDs are doing a lot of writes while I'm reading from CephFS, but neither tool can tell me what those writes are. So I turned to strace, which can trace file operations and attach to running processes:

# strace -f -e trace=file -p 5076

This gave me an idea of what was going on (5076 is the process id of the OSD for one of the HDDs). I saw mostly stat and open calls, but those are all reads, not writes. Of course btrfs can cause writes while doing reads (atime), but I have the OSD mounted with noatime.

The only write operations that I saw a lot of are these:

[pid 5350] getxattr("/var/lib/ceph/osd/ceph-10/current/4.1es1_head/DIR_E/DIR_1/DIR_D/DIR_3", "user.cephos.phash.contents", "\1Q\0\0\0\0\0\0\0\0\0\0\0\4\0\0", 1024) = 17
[pid 5350] setxattr("/var/lib/ceph/osd/ceph-10/current/4.1es1_head/DIR_E/DIR_1/DIR_D/DIR_3", "user.cephos.phash.contents", "\1R\0\0\0\0\0\0\0\0\0\0\0\4\0\0", 17, 0) = 0
[pid 5350] removexattr("/var/lib/ceph/osd/ceph-10/current/4.1es1_head/DIR_E/DIR_1/DIR_D/DIR_3", "user.cephos.phash.contents@1") = -1 ENODATA (No data available)

So it appears that the OSDs aren't writing actual data to disk, but metadata in the form of xattrs. Can anyone explain what this setting and removing of xattrs could be for?

Kind regards,

Erik.
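To quantify a capture like the one above, it helps to tally syscalls per path instead of eyeballing the scroll. A minimal sketch (the sample lines are abbreviated from the trace in this thread, with the xattr value buffers shortened to "..."):

```python
import re
from collections import Counter

# Abbreviated sample modeled on the strace capture in this thread.
SAMPLE = """\
[pid  5350] getxattr("/var/lib/ceph/osd/ceph-10/current/4.1es1_head/DIR_E/DIR_1/DIR_D/DIR_3", "user.cephos.phash.contents", "...", 1024) = 17
[pid  5350] setxattr("/var/lib/ceph/osd/ceph-10/current/4.1es1_head/DIR_E/DIR_1/DIR_D/DIR_3", "user.cephos.phash.contents", "...", 17, 0) = 0
[pid  5350] removexattr("/var/lib/ceph/osd/ceph-10/current/4.1es1_head/DIR_E/DIR_1/DIR_D/DIR_3", "user.cephos.phash.contents@1") = -1 ENODATA (No data available)
"""

# Default strace layout: optional "[pid N]" prefix, then syscall("first arg", ...).
LINE = re.compile(r'(?:\[pid\s+\d+\]\s+)?(\w+)\("([^"]+)"')

def tally(trace_text):
    """Count (syscall, path) pairs in strace file-trace output."""
    counts = Counter()
    for line in trace_text.splitlines():
        m = LINE.match(line)
        if m:
            counts[(m.group(1), m.group(2))] += 1
    return counts

if __name__ == "__main__":
    for (call, path), n in tally(SAMPLE).most_common():
        print(f"{n:4d}  {call:12s} {path}")
```

Feeding it a real capture (strace -f -e trace=file -p <pid> 2> trace.log, then tally(open("trace.log").read())) makes it obvious when one directory's xattr churn dominates the write traffic.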
Re: [ceph-users] CephFS unexplained writes
The information you're giving sounds a little contradictory, but my guess is that you're seeing the impact of object promotion and flushing. You can sample the operations the OSDs are doing at any given time by running the ops_in_progress (or similar; I forget the exact phrasing) command on the OSD admin socket. I'm not sure whether rados df reports cache movement activity or not. That would mostly be written to the SSDs, not the hard drives, although the hard drives could still get metadata updates written when objects are flushed.

What data exactly are you seeing that leads you to believe writes are happening against these drives? What is the exact CephFS and cache pool configuration?

-Greg
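The admin-socket command Greg is thinking of is dump_ops_in_progress (the exact name and output fields vary by release). Once you have its JSON, a small filter highlights long-running ops; the sample below is an illustrative shape only, not verbatim OSD output:

```python
import json

# Hypothetical, simplified dump_ops_in_progress output for illustration;
# real field names and descriptions differ by Ceph version.
SAMPLE = json.loads("""
{
  "num_ops": 2,
  "ops": [
    {"description": "osd_op(client.4123 4.1e [read 0~4194304])", "age": 0.004},
    {"description": "osd_op(client.4123 4.1e [writefull 0~4194304])", "age": 1.732}
  ]
}
""")

def slow_ops(dump, min_age=0.5):
    """Return descriptions of in-flight ops older than min_age seconds."""
    return [op["description"] for op in dump.get("ops", [])
            if op.get("age", 0) > min_age]

if __name__ == "__main__":
    for desc in slow_ops(SAMPLE):
        print(desc)
```

On a live cluster you would capture the input with something like `ceph daemon osd.<id> dump_ops_in_progress` on the OSD host and pipe the JSON into a filter like this.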
[ceph-users] CephFS unexplained writes
Hi,

I am getting relatively bad performance from CephFS. I use a replicated cache pool on SSD in front of an erasure-coded pool on rotating media. When reading big files (streaming video), I see a lot of disk I/O, especially writes. I have no clue what could cause these writes. They go to the HDDs, and they stop when I stop reading. I mounted everything with noatime and nodiratime, so it shouldn't be that. On a related note, the CephFS metadata is stored on SSD too, so metadata-related changes shouldn't hit the HDDs anyway, I think.

Any thoughts? How can I get more information about what Ceph is doing? Using iotop I only see that the OSD processes are busy, but it doesn't give many hints as to what they are doing.

Thanks,

Erik.
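One low-level complement to iotop here: iotop's per-process numbers come from /proc/<pid>/io, and reading that file directly shows whether an OSD process is actually issuing writes (write_bytes) rather than merely touching the page cache. A sketch that parses that format (the sample payload below is made up for illustration):

```python
# Typical /proc/<pid>/io payload; values here are invented sample data.
SAMPLE = """\
rchar: 3002112
wchar: 281616
syscr: 1201
syscw: 98
read_bytes: 1048576
write_bytes: 409600
cancelled_write_bytes: 0
"""

def parse_proc_io(text):
    """Parse 'key: value' lines from /proc/<pid>/io into a dict of ints."""
    out = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        out[key.strip()] = int(value)
    return out

if __name__ == "__main__":
    stats = parse_proc_io(SAMPLE)
    print(f"write_bytes={stats['write_bytes']}")
```

On a live system, sample parse_proc_io(open(f"/proc/{pid}/io").read()) twice and diff write_bytes to get a per-OSD write rate.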
Re: [ceph-users] CephFS unexplained writes
Hi,

I forgot to mention: while I am seeing these writes in iotop and in /proc/diskstats for the HDDs, I am -not- seeing any writes in rados df for the pool residing on these disks. There is only one pool active on the HDDs, and according to rados df it gets zero writes while I'm just reading big files from CephFS. So apparently the OSDs are doing some non-trivial amount of writing on their own behalf. What could it be?

Thanks,

Erik.
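The /proc/diskstats counters mentioned above are cumulative, so turning them into a write rate means sampling twice and diffing. Per the kernel's iostats documentation, after the major number, minor number, and device name, the tenth field is sectors written (always 512-byte units). A sketch, with invented sample lines:

```python
# Two samples of the same /proc/diskstats line, taken some interval apart
# (values invented for illustration).
SAMPLE_T0 = "   8       0 sda 12000 30 500000 9000 4000 12 128000 7000 0 5000 16000"
SAMPLE_T1 = "   8       0 sda 12000 30 500000 9000 4900 12 160000 7600 0 5200 16700"

def sectors_written(diskstats_line):
    """Extract (device name, sectors-written counter) from one line."""
    fields = diskstats_line.split()
    # fields[2] is the device name; fields[9] is sectors written (512-byte units).
    return fields[2], int(fields[9])

def write_delta_bytes(line_t0, line_t1):
    """Bytes written to a device between two samples of its diskstats line."""
    dev0, s0 = sectors_written(line_t0)
    dev1, s1 = sectors_written(line_t1)
    assert dev0 == dev1, "samples must be for the same device"
    return (s1 - s0) * 512

if __name__ == "__main__":
    print(write_delta_bytes(SAMPLE_T0, SAMPLE_T1))  # bytes written between samples
```

Comparing this per-HDD byte count against rados df for the same interval makes the "writes without pool writes" discrepancy concrete.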