[ceph-users] Why is the extent size reported by "rbd diff" much larger than what "du" reports for the object file on the OSD's machine?
Hi, everyone.

Recently I needed to check the real size of an rbd image. I followed the instructions at http://ceph.com/planet/rbd-image-real-size/ and got the following result:

    [xuxuehan@localhost ~]$ rbd diff xxh_pool/clone_test_img2
    Offset    Length   Type
    0         4194304  data
    4194304   4190208  data
    8388608   4182016  data
    12582912  4194304  data
    16777216  4194304  data
    20971520  4194304  data
    25165824  4186112  data
    29360128  4190208  data
    33554432  4194304  data
    37748736  4190208  data
    41943040  4194304  data
    46137344  4186112  data
    50331648  4186112  data
    54525952  4194304  data
    58720256  4190208  data
    62914560  4194304  data

However, I checked the file size of the object "rbd_data.1bfad6b8b4567.0001", which belongs to clone_test_img2, and the result is as follows:

    [xuxuehan@hdp2384 ~]$ du /home/ceph/software/ceph/var/lib/ceph/osd/ceph-2/current/1.154_head/rbd\\udata.1bfad6b8b4567.0001*
    2020    /home/ceph/software/ceph/var/lib/ceph/osd/ceph-2/current/1.154_head/rbd\udata.1bfad6b8b4567.0001__head_A2511954__1

As shown above, each extent reported by "rbd diff" is about 4MB, while the real disk space usage reported by "du" for the object file is only about 2MB. Why are they so different? Please help me, thanks :-)

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
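The two figures above can be put side by side with a short script. This is only a sketch of the arithmetic: it assumes the "rbd diff" output has been pasted in as text, and that the "du" figure is in 1 KiB blocks (the GNU du default, which the post does not confirm). The function name total_extent_bytes is made up for this example.

```python
# Sum the Length column of the `rbd diff` output quoted in the message and
# convert the `du` figure for one object file to bytes.

RBD_DIFF_OUTPUT = """\
Offset    Length   Type
0         4194304  data
4194304   4190208  data
8388608   4182016  data
12582912  4194304  data
16777216  4194304  data
20971520  4194304  data
25165824  4186112  data
29360128  4190208  data
33554432  4194304  data
37748736  4190208  data
41943040  4194304  data
46137344  4186112  data
50331648  4186112  data
54525952  4194304  data
58720256  4190208  data
62914560  4194304  data
"""

DU_BLOCKS = 2020       # value printed by du for the single object file
DU_BLOCK_SIZE = 1024   # assumption: du counts 1 KiB blocks


def total_extent_bytes(text):
    """Add up the Length column, skipping the header line."""
    total = 0
    for line in text.splitlines():
        fields = line.split()
        # Data rows look like "<offset> <length> <type>".
        if len(fields) != 3 or not fields[0].isdigit():
            continue
        total += int(fields[1])
    return total


total = total_extent_bytes(RBD_DIFF_OUTPUT)
du_bytes = DU_BLOCKS * DU_BLOCK_SIZE
print("rbd diff total:", total, "bytes over 16 extents")
print("du for one object:", du_bytes, "bytes")
```

Note that the two numbers measure different things: "rbd diff" lists byte ranges of the image, while "du" counts blocks actually allocated for one object file on the OSD's file system.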
[ceph-users] What is replay_version used for?
Hi, everyone. What is Objecter::Op::replay_version used for? Thanks :-)
[ceph-users] How to know the address of ceph clients from OSD?
Hi, everyone. Sometimes I need to know the IP addresses of the Ceph clients that are connected at a given time. Is there any way to list those IP addresses from within the Ceph cluster? I'm using Ceph RBD with KVM servers. Thank you :-)
[ceph-users] How to know the ceph client's ip address?
Hi, everyone. Sometimes I need to know the IP addresses of the Ceph clients that are connected at a given time. Is there any way to list those IP addresses from within the Ceph cluster? Thank you :-)
[ceph-users] Assertion "needs_recovery" fails when balance_read reaches a replica OSD where the target object is not recovered yet.
Hi, everyone.

In our online system, some OSDs keep failing with the following error:

    2016-10-25 19:00:00.626567 7f9a63bff700 -1 error_msg osd/ReplicatedPG.cc: In function 'void ReplicatedPG::wait_for_unreadable_object(const hobject_t&, OpRequestRef)' thread 7f9a63bff700 time 2016-10-25 19:00:00.624499
    osd/ReplicatedPG.cc: 387: FAILED assert(needs_recovery)

    ceph version 0.94.5-12-g83f56a1 (83f56a1c84e3dbd95a4c394335a7b1dc926dd1c4)
    1: (ReplicatedPG::wait_for_unreadable_object(hobject_t const&, std::tr1::shared_ptr<OpRequest>)+0x3f5) [0x8b5a65]
    2: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>&)+0x5e9) [0x8f0c79]
    3: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x4e3) [0x87fdc3]
    4: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x178) [0x66b3f8]
    5: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x59e) [0x66f8ee]
    6: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x795) [0xa76d85]
    7: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xa7a610]
    8: /lib64/libpthread.so.0() [0x3471407a51]
    9: (clone()+0x6d) [0x34710e893d]
    NOTE: a copy of the executable, or `objdump -rdS executable` is needed to interpret this.

Our version of Ceph is 0.94.5. After reading the source code and analyzing our online scenarios, we came up with the following conjecture: under a large number of "balance_reads", the OSDs can become so busy that they cannot send heartbeats in time. The monitors then wrongly mark them down, which makes other OSDs go through the peering and recovery process. During that process, on the replica OSDs, the assertion "needs_recovery" at ReplicatedPG.cc:387 is very likely to fail.

To confirm this guess, we ran some targeted tests:

- If I add code that makes the recovery of an object wait for the ops of type "CEPH_MSG_OSD_OP" targeting that object to finish, the assertion "needs_recovery" at ReplicatedPG.cc:387 always fails.
- If, on the other hand, I make the ops of type "CEPH_MSG_OSD_OP" targeting an object wait for the corresponding recovery to finish, the assertion is not triggered.

Can we conclude that the cause of the assertion failure is what we suspected?

Also, the purpose of the failed assertion seems to be to make sure that "missing_loc.needs_recovery_map" does contain the unreadable object. However, "missing_loc.needs_recovery_map" seems to be always empty on replica OSDs. Can we fix this problem simply by bypassing the assertion on replicas, in some way like this:

    if ( is_primary() ){
        bool needs_recovery = missing_loc.needs_recovery(soid, );
        assert(needs_recovery);
    }

I've also submitted a new issue: BUG #18021. Please help me. Thank you :-)
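The situation described above can be illustrated with a toy model. This is not Ceph code: ToyPG, its fields and the guard_on_primary flag are hypothetical names invented for this sketch, and it only mirrors the claim in the message that the needs-recovery map is empty on replicas. Whether skipping the check on replicas is actually safe is the open question of the post, and the sketch does not settle it.

```python
# Toy model of the assertion discussed above (not Ceph code).

class ToyPG:
    def __init__(self, primary, needs_recovery_map):
        self.primary = primary
        # Stand-in for missing_loc.needs_recovery_map.
        self.needs_recovery_map = set(needs_recovery_map)

    def is_primary(self):
        return self.primary

    def wait_for_unreadable_object(self, soid, guard_on_primary):
        # Unguarded: always check, as in the failing assertion.
        # Guarded: only check on the primary, as proposed in the message.
        if not guard_on_primary or self.is_primary():
            if soid not in self.needs_recovery_map:
                raise AssertionError("needs_recovery")
        return "queued"


# A replica whose needs-recovery map is empty, as observed in the message.
replica = ToyPG(primary=False, needs_recovery_map=[])

try:
    replica.wait_for_unreadable_object("obj1", guard_on_primary=False)
    unguarded_result = "queued"
except AssertionError:
    unguarded_result = "assert failed"

guarded_result = replica.wait_for_unreadable_object("obj1", guard_on_primary=True)
print("unguarded:", unguarded_result)
print("guarded:", guarded_result)
```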
[ceph-users] Question about last_backfill
Hi, everyone.

In the PGLog::merge_log method, a pg log entry in "olog" is inserted into the current PGLog's "missing" structure only when its "version" is larger than the current PGLog's head and the "soid" of its target object is less than the current pg info's last_backfill.

What does "last_backfill" in pg_info_t mean? Is it the max object id that the pg possessed after the last recovery_backfill process? If so, why are only objects with "soid" less than "last_backfill" considered missing? What if new objects were created by the current OSD and then modified by other OSDs while the current OSD was "out" or "down"?

I'm really confused about this, please help me, thank you :-)
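The filter described in the first paragraph can be written out as a small toy model. This is not the Ceph implementation: LogEntry and entries_marked_missing are hypothetical names, plain strings and integers stand in for hobject_t and eversion_t, and the comparison is strict only because the message says "less than". A common reading, offered here as an assumption, is that objects beyond last_backfill have not been copied to this OSD yet and will arrive through backfill anyway, so they do not need to be tracked as missing.

```python
# Toy model of the merge_log filter described in the message (not Ceph code).
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    soid: str      # object id; string order stands in for hobject_t order
    version: int   # stand-in for eversion_t


def entries_marked_missing(olog, local_head, last_backfill):
    """Return {soid: newest version} for the entries the rule marks missing."""
    missing = {}
    for entry in olog:
        if entry.version <= local_head:
            continue  # already covered by the local log
        if not entry.soid < last_backfill:
            continue  # beyond the backfill boundary, skipped by the rule
        missing[entry.soid] = max(entry.version, missing.get(entry.soid, 0))
    return missing


olog = [
    LogEntry("a", 5),    # older than the local head: skipped
    LogEntry("b", 11),   # newer and below last_backfill: missing
    LogEntry("m", 12),   # newer and below last_backfill: missing
    LogEntry("z", 13),   # newer but beyond last_backfill: skipped
]
result = entries_marked_missing(olog, local_head=10, last_backfill="n")
print(result)
```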
[ceph-users] Question about PG class
Hi, everyone. What do the fields actingbackfill, want_acting and backfill_targets of the PG class mean? Thank you :-)
[ceph-users] Question about writing a program that transfer snapshot diffs between ceph clusters
Hi, everyone.

I'm trying to write a program based on the librbd API that transfers snapshot diffs between Ceph clusters without the temporary storage that is required when using the "rbd export-diff" and "rbd import-diff" pair.

I found that the configuration object "g_conf" and the Ceph context object "g_ceph_context" are global variables that are used almost everywhere in the source code. What I need to do in the first place is to construct two or more configuration objects, each corresponding to one Ceph cluster, and make the operations intended for a cluster use the corresponding configuration object.

How can I accomplish this task? Or is it just not viable? Thank you :-)
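One possible shape for such a program, sketched in Python rather than against the C++ internals. The core loop below only needs a list of (offset, length, exists) extents and three callbacks, so it can be tried with in-memory buffers. The open_image helper is an untested assumption: it relies on the Ceph Python bindings (rados and rbd) being installed and on each rados.Rados(conffile=...) handle carrying its own configuration, which would avoid touching the globals mentioned in the message. All function names here are made up for illustration.

```python
# Sketch: stream changed extents from a source image to a destination image
# without an intermediate diff file.

def copy_extents(extents, read_src, write_dst, discard_dst):
    """Apply (offset, length, exists) extents; return the bytes copied."""
    copied = 0
    for offset, length, exists in extents:
        if exists:
            write_dst(offset, read_src(offset, length))
            copied += length
        else:
            # The range was discarded in the source; mirror that.
            discard_dst(offset, length)
    return copied


def open_image(conffile, pool, image_name, snapshot=None):
    """Assumption: one cluster handle per ceph.conf, via the Python bindings."""
    import rados  # imported lazily so the demo below runs without Ceph
    import rbd
    cluster = rados.Rados(conffile=conffile)
    cluster.connect()
    ioctx = cluster.open_ioctx(pool)
    return cluster, ioctx, rbd.Image(ioctx, image_name, snapshot=snapshot)


# In-memory demonstration with byte buffers standing in for the two images.
src = bytearray(b"AAAABBBBCCCCDDDD")
dst = bytearray(b"AAAAxxxxCCCCyyyy")
extents = [(4, 4, True), (12, 4, False)]


def read_src(offset, length):
    return bytes(src[offset:offset + length])


def write_dst(offset, data):
    dst[offset:offset + len(data)] = data


def discard_dst(offset, length):
    dst[offset:offset + length] = bytes(length)


copied = copy_extents(extents, read_src, write_dst, discard_dst)
print(copied, bytes(dst))
```

With real images, the extent list would come from the diff between two snapshots of the source image, and the callbacks would wrap reads on the source handle and writes or discards on the destination handle.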
[ceph-users] log file owner not right
Hi, everyone.

Recently I deployed a Ceph cluster manually. I found that, after I start the OSDs through "/etc/init.d/ceph -a start osd", the size of the log file "ceph-osd.log" is 0, and its owner is not "ceph", which I configured in /etc/ceph/ceph.conf, but the user who actually ran the /etc/init.d/ceph script.

I read the /etc/init.d/ceph script and found that the command "ceph-conf" is run by the current user with the arguments "-n $type.$id", which makes it create a ceph-osd.log owned by the current user. How should I deal with this problem? Thank you :-)
Re: [ceph-users] Question about OSDSuperblock
Sorry, sir. I don't quite follow you. I agree that the OSD must get the current map to know whom to contact so that it can catch up. But it looks to me that the OSD gets the current map through get_map(superblock.current_epoch), where the content of superblock.current_epoch is read from disk by OSD::read_superblock and hasn't been updated by a monitor at boot time. That means it is not the real current epoch but an old one. How can the OSD get the current map using an old epoch?

On xxhdx1985126 <xxhdx1985...@163.com>, Oct 23, 2016 3:13 AM wrote:

Sorry, sir. I don't quite follow you. I agree that the OSD must get the current map to know whom to contact so that it can catch up. But it looks to me that the OSD gets the current map through get_map(superblock.current_epoch), where superblock.current_epoch is read from disk by OSD::read_superblock at boot time and hasn't been updated by a monitor, which means it is not the real current epoch. How can the OSD get the current map using an old epoch?

Sent from my Mi phone

On David Turner <david.tur...@storagecraft.com>, Oct 23, 2016 12:34 AM wrote:

The osd needs to know where it thought data was, in particular so it knows what it has. Then it gets the current map so it knows who to talk to so it can catch back up.

Sent from my iPhone

On Oct 22, 2016, at 7:12 AM, xxhdx1985126 <xxhdx1985...@163.com> wrote:

Hi, everyone. I'm trying to read the source code that boots an OSD instance, and I found something that really confuses me. In the OSD::init() method, the OSD reads the OSDSuperblock by calling OSD::read_superblock(), and then it tries to get the "current" map: "osdmap = get_map(superblock.current_epoch)". The OSD then uses this osdmap to calculate the acting and up set of each pg. I really don't understand this! Since the OSDSuperblock is read from the disk, superblock.current_epoch should be an old epoch, recorded by the last OSD instance that ran on this directory. Why use an old "current_epoch" to calculate the acting and up set of each pg? Please help me, thank you :-)

David Turner | Cloud Operations Engineer | StorageCraft Technology Corporation
[ceph-users] Question about OSDSuperblock
Hi, everyone.

I'm trying to read the source code that boots an OSD instance, and I found something that really confuses me. In the OSD::init() method, the OSD reads the OSDSuperblock by calling OSD::read_superblock(), and then it tries to get the "current" map: "osdmap = get_map(superblock.current_epoch)". The OSD then uses this osdmap to calculate the acting and up set of each pg.

I really don't understand this! Since the OSDSuperblock is read from the disk, superblock.current_epoch should be an old epoch, recorded by the last OSD instance that ran on this directory. Why use an old "current_epoch" to calculate the acting and up set of each pg? Please help me, thank you :-)
[ceph-users] Does marking OSD "down" trigger "AdvMap" event in other OSD?
Hi, everyone. If one OSD's state changes from up to down (by "kill -i", for example), will an "AdvMap" event be triggered on the other related OSDs?
[ceph-users] Fw: PGs go "incomplete" after setting min_size
Sorry, I forgot to mention that the PGs assigned to the killed OSD were still writable after I raised min_size from 1 to 2 but before I restarted the killed OSD.

Forwarded message
From: "xxhdx1985126" <xxhdx1985...@163.com>
Date: 2016-10-09 18:08:45
To: "ceph-us...@ceph.com" <ceph-us...@ceph.com>
Subject: PG go "incomplete" after setting min_size

Hi, everyone. I'm a Ceph newbie and I'm running some tests to understand Ceph's behavior. The following situation really confused me: I first killed an OSD, which reduced the size of the acting set of some PGs to 1. Then I raised min_size from 1 to 2, after which I started the killed OSD again. At that point, all the PGs previously assigned to the killed OSD went "incomplete". My cluster contains only 2 hosts, each running 10 osds, and I configured it so that the replicas of pgs are assigned to both osds. Is it supposed to be this way? What is the rationale behind this? Thank you :-)
[ceph-users] PGs go "incomplete" after setting min_size
Hi, everyone. I'm a Ceph newbie and I'm running some tests to understand Ceph's behavior. The following situation really confused me: I first killed an OSD, which reduced the size of the acting set of some PGs to 1. Then I raised min_size from 1 to 2, after which I started the killed OSD again. At that point, all the PGs previously assigned to the killed OSD went "incomplete". My cluster contains only 2 hosts, each running 10 osds, and I configured it so that the replicas of pgs are assigned to both osds. Is it supposed to be this way? What is the rationale behind this? Thank you :-)
[ceph-users] Is it possible to recover the data of which all replicas are lost?
Hi, everyone. I've got a problem here. Due to some mistaken operations, I deleted all three replicas of my data. Is there any way to recover it? This is a very urgent problem. Please help me, thanks.
[ceph-users] Does the journal of a single OSD roll itself automatically?
Hi, everyone. After a file system synchronization, does the OSD delete the journal entries that correspond to operations before the synchronization point?
[ceph-users] How is RBD image implemented?
Hi, everyone. On the "Block Storage" page, I found this: "Ceph RBD interfaces with the same Ceph object storage system that provides the librados interface and the Ceph FS file system, and it stores block device images as objects." Does this literally mean that an RBD image is implemented as an object, not as a file, on Ceph? If so, wouldn't that be a problem when creating a very large image? Thank you :-)
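As a rough illustration of what "stores block device images as objects" can mean in practice: an image is usually split across many fixed-size objects rather than kept in a single one, so a large image simply maps to more objects. The sketch below assumes 4 MiB objects (which matches the 4194304-byte extents in the "rbd diff" output earlier in this archive) and a data object name of the form prefix plus a 16-digit hexadecimal index. Both are assumptions to be checked against the actual image, and the function names and the "rbd_data.EXAMPLE" prefix are made up for this example.

```python
# Sketch: map a byte offset in an image to the object that would hold it.

OBJECT_SIZE = 4 * 1024 * 1024  # assumption: 4 MiB objects


def object_for_offset(offset, prefix):
    """Return (object name, offset inside that object)."""
    index = offset // OBJECT_SIZE
    return "%s.%016x" % (prefix, index), offset % OBJECT_SIZE


def objects_needed(image_size):
    """Upper bound on the number of objects for an image of this size."""
    return -(-image_size // OBJECT_SIZE)  # ceiling division


name, inner = object_for_offset(3 * OBJECT_SIZE + 5, "rbd_data.EXAMPLE")
print(name, inner)
print(objects_needed(1 << 40), "objects at most for a 1 TiB image")
```

Under these assumptions, objects that were never written need not exist at all, so the count above is an upper bound rather than the space consumed when the image is created.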
Re: [ceph-users] What file system does ceph use for an individual OSD, is it still EBOFS?
Thanks, sir :-)

At 2016-09-19 13:00:18, "Ian Colle" <ico...@redhat.com> wrote:

Some use xfs, others btrfs, and still others use (gasp) zfs and ext4. Upstream automated testing currently only runs on xfs, if that gives you a sense of the community's comfort level, but there are strong advocates for each of the others I listed initially. Caveat emptor.

Ian

On Sunday, September 18, 2016, xxhdx1985126 <xxhdx1985...@163.com> wrote:

Hi, everyone. I'm a Ceph newbie. According to Sage A. Weil's paper, Ceph was using EBOFS as the file system for its OSDs. However, I looked into the source code of Ceph and could hardly find any EBOFS code. Is Ceph still using EBOFS, or has it switched to other types of file system for a single OSD? Thank you :-)

--
Ian R. Colle
Global Director of Software Engineering
Red Hat, Inc.
[ceph-users] What file system does ceph use for an individual OSD, is it still EBOFS?
Hi, everyone. I'm a Ceph newbie. According to Sage A. Weil's paper, Ceph was using EBOFS as the file system for its OSDs. However, I looked into the source code of Ceph and could hardly find any EBOFS code. Is Ceph still using EBOFS, or has it switched to other types of file system for a single OSD? Thank you :-)