Re: [ceph-users] understanding PG count for a file
Hi,

> Is it that all objects of a file will be stored on only 2 OSDs (in case the replication count is 2)?

How big is this file? Small files will not be split.

Micha Krause
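For context: a larger file is striped across many RADOS objects (4 MB each by default), named after the file's inode and stripe number, and each object maps to its own PG and therefore its own set of OSDs. The placement of an individual object can be checked like this (pool and object names are only examples):

  # first two 4 MB stripes of a hypothetical file with inode 0x10000000001
  ceph osd map cephfs_data 10000000001.00000000
  ceph osd map cephfs_data 10000000001.00000001

Each line prints the PG the object falls into and the acting set of OSDs holding it.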
[ceph-users] Running 12.2.5 without problems, should I upgrade to 12.2.7 or wait for 12.2.8?
Hi,

I'm running 12.2.5 and I have no problems at the moment. However, my servers report daily that they want to upgrade to 12.2.7. Is this safe, or should I wait for 12.2.8? Are there any predictions when the 12.2.8 release will be available?

Micha Krause
Re: [ceph-users] Converting to dynamic bucket resharding in Luminous
Hi,

> I have a Jewel Ceph cluster with RGW index sharding enabled. I've configured the index to have 128 shards. I am upgrading to Luminous. What will happen if I enable dynamic bucket index resharding in ceph.conf? Will it maintain my 128 shards (the buckets are currently empty), and will it split them (to 256, and beyond) when they get full enough?

Yes, it will. However, I would not recommend enabling dynamic resharding: I had some problems with it, like resharding loops where large buckets failed to reshard and it tried resharding them over and over again. I also had problems deleting some buckets that had gone through multiple reshards, because of missing objects (maybe objects were deleted during a dynamic reshard, and this was not recorded in the indexes). So for the time being I have disabled dynamic resharding again.

Micha Krause
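For reference, a sketch of how to switch dynamic resharding off and reshard by hand instead (the rgw section name is an example; the option exists since Luminous):

  [client.rgw.myhost]
  rgw_dynamic_resharding = false

  # reshard a bucket manually during a maintenance window:
  radosgw-admin bucket reshard --bucket=my-bucket --num-shards=256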
[ceph-users] Cache Tiering not flushing and evicting due to missing scrub
Hi,

Increasing pg_num for a cache pool gives you a warning that the pool must be scrubbed afterwards. It turns out that if you ignore this, flushing and evicting will not work. You really should do something like this:

  for pg in $(ceph pg dump | awk '$1 ~ /^[0-9]+\./ { print $1 }'); do
      ceph pg scrub $pg
  done

After just a few seconds my pool started flushing and evicting again.

Micha Krause
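On recent releases you can avoid parsing the full pg dump and scope the loop to just the cache pool; something along these lines should be equivalent (pool name is an example):

  # list only the PGs of the cache pool (skipping the header line) and scrub each one
  for pg in $(ceph pg ls-by-pool my-cache-pool | awk 'NR > 1 { print $1 }'); do
      ceph pg scrub "$pg"
  done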
Re: [ceph-users] radosgw: can't delete bucket
Hi,

I finally managed to delete the bucket. I wrote a script that reads the omap keys from the bucket index and deletes every key without a matching object in the data pool. Not sure if this has any negative repercussions, but after the script had deleted thousands of keys from the index, I was able to delete the bucket with:

  radosgw-admin --id radosgw.rgw bucket rm --bucket my-bucket --bypass-gc --purge-objects

I have started an "orphans find" process to clean up any remaining objects.

I suspect that automatic resharding somehow led to this problem; maybe there is something wrong with the handling of multiparts. I had a handful of normal objects, but 800k multipart objects, and the index had only 2 shards (it should be around 16, I think). Radosgw frequently triggered resharding of this index, but the new index also only had 2 shards. Maybe something goes wrong when multipart uploads are aborted/completed during resharding?

Micha Krause

On 04.04.2018 16:14, Micha Krause wrote:
> Hi,
>
> I have a bucket with multiple broken multipart uploads, which can't be aborted. radosgw-admin bucket check shows thousands of _multipart_ objects; unfortunately --fix and --check-objects don't change anything.
>
> I decided to get rid of the bucket completely, but even this command:
>
>   radosgw-admin --id radosgw.rgw bucket rm --bucket my-bucket --inconsistent-index --yes-i-really-mean-it --bypass-gc --purge-objects
>
> is not able to do it, and shows:
>
>   7fe7cdceab80 -1 ERROR: unable to remove bucket(2) No such file or directory
>
> --debug-ms=1 shows that the error occurs after:
>
>   2018-04-04 15:59:58.884901 7f6f500c5700 1 -- 10.210.64.16:0/772895100 <== osd.458 10.210.34.34:6804/2673 1 osd_op_reply(13121 default.193458319.2__multipart_my/object.2~p8vFLOvgFKMGOSLeEc8WGT2Bgw5ehg9.meta [omap-get-vals] v0'0 uv0 ondisk = -2 ((2) No such file or directory)) v8 346+0+0 (2461758253 0 0) 0x7f6f48268ba0 con 0x5626ec47ee20
>
> And indeed, when I try to get the object with the rados get command, it also shows "No such file or directory". I tried to rados put -p .rgw.buckets default.193458319.2__multipart_my/object.2~p8vFLOvgFKMGOSLeEc8WGT2Bgw5ehg9.meta emptyfile in its place, but the error stays the same.
>
> Any ideas how I can get rid of my bucket?
>
> Micha Krause
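The original script isn't shown; a rough sketch of the approach (pool names are the defaults, the bucket marker is taken from the log above, and the key-to-object mapping shown is the simple case — multipart index entries carry extra prefixes, so treat this as an illustration, not something to run blindly):

  #!/bin/bash
  # sketch: remove bucket index omap keys that have no matching data object
  INDEX_POOL=.rgw.buckets.index
  DATA_POOL=.rgw.buckets
  MARKER=default.193458319.2     # bucket marker, e.g. from 'radosgw-admin bucket stats'

  # iterate over all index shard objects of the bucket
  for idx in $(rados -p $INDEX_POOL ls | grep "^\.dir\.${MARKER}"); do
      rados -p $INDEX_POOL listomapkeys "$idx" | while read -r key; do
          # data objects are named "<marker>_<key>" in the simple case
          if ! rados -p $DATA_POOL stat "${MARKER}_${key}" >/dev/null 2>&1; then
              echo "stale index entry: $key"
              rados -p $INDEX_POOL rmomapkey "$idx" "$key"
          fi
      done
  done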
[ceph-users] radosgw: can't delete bucket
Hi,

I have a bucket with multiple broken multipart uploads, which can't be aborted. radosgw-admin bucket check shows thousands of _multipart_ objects; unfortunately the --fix and --check-objects options don't change anything.

I decided to get rid of the bucket completely, but even this command:

  radosgw-admin --id radosgw.rgw bucket rm --bucket my-bucket --inconsistent-index --yes-i-really-mean-it --bypass-gc --purge-objects

is not able to do it, and shows:

  7fe7cdceab80 -1 ERROR: unable to remove bucket(2) No such file or directory

--debug-ms=1 shows that the error occurs after:

  2018-04-04 15:59:58.884901 7f6f500c5700 1 -- 10.210.64.16:0/772895100 <== osd.458 10.210.34.34:6804/2673 1 osd_op_reply(13121 default.193458319.2__multipart_my/object.2~p8vFLOvgFKMGOSLeEc8WGT2Bgw5ehg9.meta [omap-get-vals] v0'0 uv0 ondisk = -2 ((2) No such file or directory)) v8 346+0+0 (2461758253 0 0) 0x7f6f48268ba0 con 0x5626ec47ee20

And indeed, when I try to get the object with the rados get command, it also shows "No such file or directory". I tried to

  rados put -p .rgw.buckets default.193458319.2__multipart_my/object.2~p8vFLOvgFKMGOSLeEc8WGT2Bgw5ehg9.meta emptyfile

in its place, but the error stays the same. Any ideas how I can get rid of my bucket?

Micha Krause
Re: [ceph-users] wrong stretch package dependencies (was Luminous v12.2.3 released)
Hi,

> agreed. but the packages built for stretch do depend on the library

I had a wrong Debian version in my sources list :-( Thanks for looking into it.

Micha Krause
Re: [ceph-users] Luminous v12.2.3 released
Hi,

Debian packages for stretch have broken dependencies:

  The following packages have unmet dependencies:
   ceph-common : Depends: libleveldb1 but it is not installable
                 Depends: libsnappy1 but it is not installable
   ceph-mon    : Depends: libleveldb1 but it is not installable
                 Depends: libsnappy1 but it is not installable
   ceph-osd    : Depends: libleveldb1 but it is not installable
                 Depends: libsnappy1 but it is not installable
   ceph-base   : Depends: libleveldb1 but it is not installable
                 Depends: libsnappy1 but it is not installable

https://packages.debian.org/search?keywords=libsnappy1&searchon=names&suite=all&section=all
https://packages.debian.org/search?suite=all&section=all&arch=any&searchon=names&keywords=libleveldb1

They should probably depend on the 1v5 packages, as they did in version 12.2.2.

Micha Krause
[ceph-users] radosgw: Huge Performance impact during dynamic bucket index resharding
Hi,

Radosgw decided to reshard a bucket with 25 million objects from 256 to 512 shards. Resharding took about 1 hour, and during this time all buckets on the cluster had a huge performance drop: "GET" requests for small objects (in other buckets) took multiple seconds.

Are there any configuration options to reduce this impact, or to limit resharding to a maximum of 256 shards?

Micha Krause
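For reference, short of disabling dynamic resharding altogether, the trigger threshold can be raised so that resharding fires later and less often (option exists since Luminous; section name and value are examples):

  [client.rgw.myhost]
  # reshard only once a shard holds this many objects (default is 100000)
  rgw_max_objs_per_shard = 200000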
[ceph-users] Separation of public/cluster networks
Hi,

I've built a few clusters with separated public/cluster networks, but I'm wondering if this is really the way to go. http://docs.ceph.com/docs/jewel/rados/configuration/network-config-ref states 2 reasons:

1. There is more traffic in the backend, which could cause latencies in the public network. But is a low-latency public network really an advantage if my cluster network has high latency?

2. Security: evil users could cause damage in the cluster net. But couldn't you cause the same kind of damage, or even more, in the public network?

On the other hand, if one host loses its cluster network, it will report random OSDs down over the remaining public net. (Yes, I know about the "mon osd min down reporters" workaround.)

Advantages of a single, shared network:

1. Hosts with network problems that can't reach other OSDs also can't reach the mon, so our mon server doesn't get conflicting information.

2. Given the same overall network bandwidth, OSDs can use a bigger part of the bandwidth for backend traffic.

3. KISS principle.

So if my server has 4 x 10Gbit/s network, should I really split them into 2 x 20Gbit/s (cluster/public), or am I better off using 1 x 40Gbit/s (shared)?

Micha Krause
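For reference, the split under discussion is what these two ceph.conf options configure (subnets are examples):

  [global]
  public network  = 10.0.0.0/24
  cluster network = 10.0.1.0/24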
Re: [ceph-users] MDS crashes shortly after startup while trying to purge stray files.
Hi,

> Did you edit the code before trying Luminous?

Yes, I'm still on Jewel.

> I also noticed from your original mail that it appears you're using
> multiple active metadata servers? If so, that's not stable in Jewel.
> You may have tripped on one of many bugs fixed in Luminous for that
> configuration.

No, I'm using an active/backup configuration.

Micha Krause
Re: [ceph-users] MDS crashes shortly after startup while trying to purge stray files.
Hi,

I had a chance to catch John Spray at the Ceph Day, and he suggested that I try to reproduce this bug in Luminous. To fix my immediate problem, we discussed 2 ideas:

1. Manually edit the metadata. Unfortunately, I was not able to find any information on how the metadata is structured :-(

2. Edit the code to set the link count to 0 if it is negative:

  diff --git a/src/mds/StrayManager.cc b/src/mds/StrayManager.cc
  index 9e53907..2ca1449 100644
  --- a/src/mds/StrayManager.cc
  +++ b/src/mds/StrayManager.cc
  @@ -553,6 +553,10 @@ bool StrayManager::__eval_stray(CDentry *dn, bool delay)
       logger->set(l_mdc_num_strays_delayed, num_strays_delayed);
     }

  +  if (in->inode.nlink < 0) {
  +    in->inode.nlink = 0;
  +  }
  +
     // purge?
     if (in->inode.nlink == 0) {
       // past snaprealm parents imply snapped dentry remote links.
  diff --git a/src/xxHash b/src/xxHash
  --- a/src/xxHash
  +++ b/src/xxHash
  @@ -1 +1 @@

I'm not sure if this works. The patched mds no longer crashes; however, I expected that this value:

  root@mds02:~ # ceph daemonperf mds.1
  -mds-- --mds_server-- ---objecter--- -mds_cache- ---mds_log----
  rlat inos caps|hsr hcs hcr |writ read actv|recd recy stry purg|segs evts subm|
    0  100k   0 |  0   0   0 |   0    0    0|   0    0 625k    0|  30  25k    0

should go down, but it stays at 625k. Unfortunately, I don't have another system to compare against. After I started the patched mds once, I reverted back to an unpatched mds, and it also stopped crashing, so I guess it did "fix" something.

A question just out of curiosity: I tried to log these events with something like

  dout(10) << "Fixed negative inode count";

or

  derr << "Fixed negative inode count";

but my compiler yelled at me for trying this.

Micha Krause
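For anyone retracing this, the stray counters can also be read directly from the admin socket instead of daemonperf (socket path as used elsewhere in this thread; a sketch):

  ceph --admin-daemon /var/run/ceph/ceph-mds.1.asok perf dump | python -m json.tool | grep -i stray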
Re: [ceph-users] MDS crashes shortly after startup while trying to purge stray files.
Hi,

Looking at the code, and running with debug mds = 10, it looks like I have an inode with a negative link count:

  -2> 2017-09-14 13:28:39.249399 7f3919616700 10 mds.0.cache.strays eval_stray [dentry #100/stray7/17aa2f6 [2,head] auth (dversion lock) pv=0 v=23058565 inode=0x7f394b7e0730 0x7f3945a96270]
  -1> 2017-09-14 13:28:39.249445 7f3919616700 10 mds.0.cache.strays inode is [inode 17aa2f6 [2,head] ~mds0/stray7/17aa2f6 auth v23057120 s=4476488 nl=-1 n(v0 b4476488 1=1+0) (iversion lock) 0x7f394b7e

I guess "nl" stands for number of links. The code in StrayManager.cc checks for:

  if (in->inode.nlink == 0) {
    ...
  } else {
    eval_remote_stray(dn, NULL);
  }

  void StrayManager::eval_remote_stray(CDentry *stray_dn, CDentry *remote_dn)
  {
    ...
    assert(stray_in->inode.nlink >= 1);
    ...
  }

So if my link count is indeed -1, ceph will die here. The question is: how can I get rid of this inode?

Micha Krause
[ceph-users] MDS crashes shortly after startup while trying to purge stray files.
Hi,

I was deleting a lot of hard-linked files when "something" happened. Now my mds starts for a few seconds, writes a lot of these lines:

  -43> 2017-09-06 13:51:43.396588 7f9047b21700 10 log_client will send 2017-09-06 13:51:40.531563 mds.0 10.210.32.12:6802/2735447218 4963 : cluster [ERR] loaded dup inode 17d6511 [2,head] v17234443 at ~mds0/stray8/17d6511, but inode 17d6511.head v17500983 already exists at ~mds0/stray7/17d6511

And finally this:

  -3> 2017-09-06 13:51:43.396762 7f9047b21700 10 monclient: _send_mon_message to mon.2 at 10.210.34.11:6789/0
  -2> 2017-09-06 13:51:43.396770 7f9047b21700 1 -- 10.210.32.12:6802/2735447218 --> 10.210.34.11:6789/0 -- log(1000 entries from seq 4003 at 2017-09-06 13:51:38.718139) v1 -- ?+0 0x7f905c5d5d40 con 0x7f905902c600
  -1> 2017-09-06 13:51:43.399561 7f9047b21700 1 -- 10.210.32.12:6802/2735447218 <== mon.2 10.210.34.11:6789/0 26 mdsbeacon(152160002/0 up:active seq 8 v47532) v7 126+0+0 (20071477 0 0) 0x7f90591b2080 con 0x7f905902c600
   0> 2017-09-06 13:51:43.401125 7f9043b19700 -1 *** Caught signal (Aborted) ** in thread 7f9043b19700 thread_name:mds_rank_progr

  ceph version 10.2.9 (2ee413f77150c0f375ff6f10edd6c8f9c7d060d0)
  1: (()+0x5087b7) [0x7f904ed547b7]
  2: (()+0xf890) [0x7f904e156890]
  3: (gsignal()+0x37) [0x7f904c5e1067]
  4: (abort()+0x148) [0x7f904c5e2448]
  5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x256) [0x7f904ee5e386]
  6: (StrayManager::eval_remote_stray(CDentry*, CDentry*)+0x492) [0x7f904ebaad12]
  7: (StrayManager::__eval_stray(CDentry*, bool)+0x5f5) [0x7f904ebaefd5]
  8: (StrayManager::eval_stray(CDentry*, bool)+0x1e) [0x7f904ebaf7ae]
  9: (MDCache::scan_stray_dir(dirfrag_t)+0x165) [0x7f904eb04145]
  10: (MDCache::populate_mydir()+0x7fc) [0x7f904eb73acc]
  11: (MDCache::open_root()+0xef) [0x7f904eb7447f]
  12: (MDSInternalContextBase::complete(int)+0x203) [0x7f904ecad5c3]
  13: (MDSRank::_advance_queues()+0x382) [0x7f904ea689e2]
  14: (MDSRank::ProgressThread::entry()+0x4a) [0x7f904ea68e6a]
  15: (()+0x8064) [0x7f904e14f064]
  16: (clone()+0x6d) [0x7f904c69462d]
  NOTE: a copy of the executable, or `objdump -rdS ` is needed to interpret this.

  --- logging levels ---
    0/ 5 none
    0/ 1 lockdep
    0/ 1 context
    1/ 1 crush
    1/ 5 mds
    1/ 5 mds_balancer
    1/ 5 mds_locker
    1/ 5 mds_log
    1/ 5 mds_log_expire
    1/ 5 mds_migrator
    0/ 1 buffer
    0/ 1 timer
    0/ 1 filer
    0/ 1 striper
    0/ 1 objecter
    0/ 5 rados
    0/ 5 rbd
    0/ 5 rbd_mirror
    0/ 5 rbd_replay
    0/ 5 journaler
    0/ 5 objectcacher
    0/ 5 client
    0/ 5 osd
    0/ 5 optracker
    0/ 5 objclass
    1/ 3 filestore
    1/ 3 journal
    0/ 5 ms
    1/ 5 mon
    0/10 monc
    1/ 5 paxos
    0/ 5 tp
    1/ 5 auth
    1/ 5 crypto
    1/ 1 finisher
    1/ 5 heartbeatmap
    1/ 5 perfcounter
    1/ 5 rgw
    1/10 civetweb
    1/ 5 javaclient
    1/ 5 asok
    1/ 1 throttle
    0/ 0 refs
    1/ 5 xio
    1/ 5 compressor
    1/ 5 newstore
    1/ 5 bluestore
    1/ 5 bluefs
    1/ 3 bdev
    1/ 5 kstore
    4/ 5 rocksdb
    4/ 5 leveldb
    1/ 5 kinetic
    1/ 5 fuse
   99/99 (syslog threshold)
   -1/-1 (stderr threshold)
   max_recent 1
   max_new 1000
   log_file /var/log/ceph/ceph-mds.0.log
  --- end dump of recent events ---

Looking at daemonperf, it seems the mds crashes when trying to write something:

  root@mds01:~ # /etc/init.d/ceph restart
  [ ok ] Restarting ceph (via systemctl): ceph.service.
  root@mds01:~ # ceph daemonperf mds.0
  ---objecter---
  writ read actv
     0    0    0
     0    0    0
     0    0    0
     6   12    0
     0    0    0
     0    0    0
     0    0    0
     0    3    1
     0    1    1
     0    0    0
     0    1    0
     0    1    1
     0    1    1
     0    1    1
     0    1    1
     0    0    0
     0    1    0
     0    1    0
     0    1    1
     0    0    0
    64    0    0
  Traceback (most recent call last):
    File "/usr/bin/ceph", line 948, in <module>
      retval = main()
    File "/usr/bin/ceph", line 638, in main
      DaemonWatcher(sockpath).run(interval, count)
    File "/usr/lib/python2.7/dist-packages/ceph_daemon.py", line 265, in run
      dump = json.loads(admin_socket(self.asok_path, ["perf", "dump"]))
    File "/usr/lib/python2.7/dist-packages/ceph_daemon.py", line 60, in admin_socket
      raise RuntimeError('exception getting command descriptions: ' + str(e))
  RuntimeError: exception getting command descriptions: [Errno 111] Connection refused

And indeed, I am able to prevent the crash by running

  root@mds02:~ # ceph --admin-daemon /var/run/ceph/ceph-mds.1.asok force_readonly

during startup of the mds.

Any advice on how to repair the filesystem? I already tried this without success: http://docs.ceph.com/docs/jewel/cephfs/disaster-recovery/

Ceph version used is Jewel 10.2.9.

Micha Krause
Re: [ceph-users] Problems getting nfs-ganesha with cephfs backend to work.
Hi,

> Ganesha version 2.5.0.1 from the nfs-ganesha repo hosted on download.ceph.com

I didn't know about that repo, and compiled ganesha myself. The developers in the #ganesha IRC channel pointed me to the libcephfs version: after recompiling ganesha with a Kraken libcephfs instead of a Jewel version, both errors went away. I'm sure a compiled version from the repo you mention would have worked out of the box.

Micha Krause
Re: [ceph-users] Problems getting nfs-ganesha with cephfs backend to work.
Hi,

> Change Pseudo to something like /mypseudofolder

I tried this without success, but I managed to get something working with version 2.5. I can mount the NFS export now; however, 2 problems remain:

1. The root directory of the mount point looks empty (ls shows no files); however, directories and files can be accessed, and ls works in subdirectories.

2. I can't create devices in the NFS mount. I'm not sure if ganesha supports this with other backends.

Micha Krause
[ceph-users] Problems getting nfs-ganesha with cephfs backend to work.
Hi,

I'm trying to get nfs-ganesha to work with ceph as the FSAL backend. I'm using version 2.4.5, and this is my ganesha.conf:

  EXPORT
  {
      Export_ID = 1;
      Path = /;
      Pseudo = /;
      Access_Type = RW;
      Protocols = 3;
      Transports = TCP;

      FSAL {
          Name = CEPH;
          User_Id = "test-cephfs";
          Secret_Access_Key = "***";
      }

      CLIENT {
          Clients = client-fqdn;
          Access_Type = RO;
      }
  }

Here are some log lines from ganesha indicating that there is a problem:

  17/07/2017 10:20:49 : epoch 596c7360 : ngw02 : ganesha.nfsd-16430[main] load_fsal :NFS STARTUP :DEBUG :Loading FSAL CEPH with /usr/lib/x86_64-linux-gnu/ganesha/libfsalceph.so
  17/07/2017 10:20:49 : epoch 596c7360 : ngw02 : ganesha.nfsd-16430[main] init :FSAL :DEBUG :Ceph module registering.
  17/07/2017 10:20:49 : epoch 596c7360 : ngw02 : ganesha.nfsd-16430[main] init_config :FSAL :DEBUG :Ceph module setup.
  17/07/2017 10:20:49 : epoch 596c7360 : ngw02 : ganesha.nfsd-16430[main] create_export :FSAL :CRIT :Unable to mount Ceph cluster for /.
  17/07/2017 10:20:49 : epoch 596c7360 : ngw02 : ganesha.nfsd-16430[main] mdcache_fsal_create_export :FSAL :MAJ :Failed to call create_export on underlying FSAL Ceph
  17/07/2017 10:20:49 : epoch 596c7360 : ngw02 : ganesha.nfsd-16430[main] fsal_put :FSAL :INFO :FSAL Ceph now unused
  17/07/2017 10:20:49 : epoch 596c7360 : ngw02 : ganesha.nfsd-16430[main] fsal_cfg_commit :CONFIG :CRIT :Could not create export for (/) to (/)
  17/07/2017 10:20:49 : epoch 596c7360 : ngw02 : ganesha.nfsd-16430[main] build_default_root :CONFIG :DEBUG :Allocating Pseudo root export
  17/07/2017 10:20:49 : epoch 596c7360 : ngw02 : ganesha.nfsd-16430[main] pseudofs_create_export :FSAL :DEBUG :Created exp 0x55daf1020d80 - /
  17/07/2017 10:20:49 : epoch 596c7360 : ngw02 : ganesha.nfsd-16430[main] build_default_root :CONFIG :INFO :Export 0 (/) successfully created
  17/07/2017 10:20:49 : epoch 596c7360 : ngw02 : ganesha.nfsd-16430[main] main :NFS STARTUP :WARN :No export entries found in configuration file !!!
  17/07/2017 10:20:49 : epoch 596c7360 : ngw02 : ganesha.nfsd-16430[main] config_errs_to_log :CONFIG :CRIT :Config File (/etc/ganesha/ganesha.conf:25): 1 validation errors in block FSAL
  17/07/2017 10:20:49 : epoch 596c7360 : ngw02 : ganesha.nfsd-16430[main] config_errs_to_log :CONFIG :CRIT :Config File (/etc/ganesha/ganesha.conf:25): Errors processing block (FSAL)
  17/07/2017 10:20:49 : epoch 596c7360 : ngw02 : ganesha.nfsd-16430[main] config_errs_to_log :CONFIG :CRIT :Config File (/etc/ganesha/ganesha.conf:11): 1 validation errors in block EXPORT
  17/07/2017 10:20:49 : epoch 596c7360 : ngw02 : ganesha.nfsd-16430[main] config_errs_to_log :CONFIG :CRIT :Config File (/etc/ganesha/ganesha.conf:11): Errors processing block (EXPORT)

I have no problems mounting cephfs with the kernel client on this machine, using the same authentication data. Has anyone gotten this to work, and maybe could give me a hint on what I'm doing wrong?

Micha Krause
Re: [ceph-users] 10.2.4 Jewel released
Hi,

> If you haven't already installed the previous branch, please try wip-msgr-jewel-fix2 instead. That's a cleaner and more precise solution to the real problem. :)

Any predictions when this fix will hit the Debian repositories?

Micha Krause
Re: [ceph-users] 5 pgs of 712 stuck in active+remapped
Hi,

> Ah, thanks Micha, that makes sense. I'll see if I can dig up another server to build an OSD server on. Sadly, XenServer is not tolerant of new kernels.
> Do you happen to know if there is a dkms package of RBD anywhere? I might be able to build the latest RBD against the 3.10 kernel that comes with XenServer 7.

I don't know. Is XenServer really using the kernel rbd, and not librbd? Just want to make sure you aren't looking at the wrong thing to update.

Micha Krause
Re: [ceph-users] 5 pgs of 712 stuck in active+remapped
Hi,

As far as I know, this is exactly the problem why the new tunables were introduced: if you use 3 replicas with only 3 hosts, crush sometimes doesn't find a solution to place all pgs. If you are really stuck with bobtail tunables, I can think of 2 possible workarounds:

1. Add another osd server.

2. Bad idea, but it could work: build your crush rule manually, e.g. put all primary pgs on host ceph1, the first copy on host ceph2 and the second copy on host ceph3 (see the sketch after the quoted message below).

Micha Krause

On 08.07.2016 05:47, Nathanial Byrnes wrote:
> Hello,
>
> I've got a Jewel cluster (3 nodes, 15 OSDs) running with bobtail tunables (my XenServer cluster uses 3.10 as the kernel and there's no upgrading that). I started the cluster out on Hammer, upgraded to Jewel, discovered that optimal tunables would not work, and then set the tunables back to bobtail. Once the re-balancing completed, I was stuck with 1 pg in active+remapped. Repair didn't fix the pg. I then upped the number of pgs from 328 to 712 (oddly, I asked for 512 but ended up with 712...), and now I have 5 pgs stuck in active+remapped. I also tried re-weighting the pgs a couple of times, but no change.
>
> Here is my osd tree:
>
>   ID WEIGHT TYPE NAME       UP/DOWN REWEIGHT PRIMARY-AFFINITY
>   -1 15.0   root default
>   -2  5.0       host ceph1
>    0  1.0           osd.0        up  0.95001              1.0
>    1  1.0           osd.1        up  1.0                  1.0
>    2  1.0           osd.2        up  1.0                  1.0
>    3  1.0           osd.3        up  0.90002              1.0
>    4  1.0           osd.4        up  1.0                  1.0
>   -3  5.0       host ceph3
>   10  1.0           osd.10       up  1.0                  1.0
>   11  1.0           osd.11       up  1.0                  1.0
>   12  1.0           osd.12       up  1.0                  1.0
>   13  1.0           osd.13       up  1.0                  1.0
>   14  1.0           osd.14       up  1.0                  1.0
>   -4  5.0       host ceph2
>    5  1.0           osd.5        up  1.0                  1.0
>    6  1.0           osd.6        up  1.0                  1.0
>    7  1.0           osd.7        up  1.0                  1.0
>    8  1.0           osd.8        up  1.0                  1.0
>    9  1.0           osd.9        up  1.0                  1.0
>
> Any suggestions on how to troubleshoot or repair this?
>
> Thanks and Regards,
> Nate
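For illustration, workaround 2 could look roughly like this in the decompiled crushmap (host names taken from the osd tree above; a hand-written sketch, untested, and note it permanently pins the primary role to ceph1):

  rule manual_3hosts {
      ruleset 1
      type replicated
      min_size 3
      max_size 3
      # primary copy: one OSD under ceph1
      step take ceph1
      step choose firstn 1 type osd
      step emit
      # first replica: one OSD under ceph2
      step take ceph2
      step choose firstn 1 type osd
      step emit
      # second replica: one OSD under ceph3
      step take ceph3
      step choose firstn 1 type osd
      step emit
  }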
Re: [ceph-users] Ceph cluster upgrade
Hi,

> Is this the way to go? I would like as little performance degradation while rebalancing as possible. Please advise if I need to take into account certain preparations.

Set these in your ceph.conf beforehand:

  osd recovery op priority = 1
  osd max backfills = 1

I would also suggest creating a new crush rule instead of modifying your existing one. This enables you to change the rule on a per-pool basis:

  ceph osd pool set <pool> crush_ruleset <num>

Then start with your smallest pool, and see how it goes.

Micha Krause
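The same values can also be pushed into a running cluster without restarting the OSDs (a sketch; injected values are lost on the next daemon restart, so keep the ceph.conf entries as well):

  ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-op-priority 1'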
Re: [ceph-users] Can't create bucket (ERROR: endpoints not configured for upstream zone)
Hi,

I think I found a solution for my problem; here are my findings. This bug can be easily reproduced in a test environment:

1. Delete all rgw-related pools.
2. Start an Infernalis radosgw to initialize them again.
3. Create a user.
4. The user creates a bucket.
5. Upgrade radosgw to Jewel.
6. The user creates a bucket -> fail.

I found this scary script from Yehuda: https://raw.githubusercontent.com/yehudasa/ceph/wip-fix-default-zone/src/fix-zone which needs to be modified according to http://www.spinics.net/lists/ceph-users/msg27957.html. After the modification, a lot of the script becomes obsolete (in my opinion), and it can be rewritten to this (less scary):

  #!/bin/sh
  set -x

  RADOSGW_ADMIN=radosgw-admin

  echo "Exercise initialization code"
  $RADOSGW_ADMIN user info --uid=foo # exercise init code (???)

  echo "Get default zonegroup"
  $RADOSGW_ADMIN zonegroup get --rgw-zonegroup=default | \
      sed 's/"id":.*/"id": "default",/g' | \
      sed 's/"master_zone.*/"master_zone": "default",/g' > default-zg.json

  echo "Get default zone"
  $RADOSGW_ADMIN zone get --zone-id=default > default-zone.json

  echo "Creating realm"
  $RADOSGW_ADMIN realm create --rgw-realm=myrealm

  echo "Creating default zonegroup"
  $RADOSGW_ADMIN zonegroup set --rgw-zonegroup=default < default-zg.json

  echo "Creating default zone"
  $RADOSGW_ADMIN zone set --rgw-zone=default < default-zone.json

  echo "Setting default zonegroup to 'default'"
  $RADOSGW_ADMIN zonegroup default --rgw-zonegroup=default

  echo "Setting default zone to 'default'"
  $RADOSGW_ADMIN zone default --rgw-zone=default

My plan to do this in production is now:

1. Stop all rados-gateways.
2. Upgrade the rados-gateways to Jewel.
3. Run the less scary script.
4. Start the rados-gateways.

This whole thing is a serious problem; there should at least be a clear notice in the Jewel release notes about it. I was lucky to catch this in my test cluster; I'm sure a lot of people will run into this in production.

Micha Krause

On 05.07.2016 09:30, Micha Krause wrote:
> *bump*
>
> On 01.07.2016 13:00, Micha Krause wrote:
>> Hi,
>>
>>> In Infernalis there was this command: radosgw-admin regions list
>>> But this is missing in Jewel.
>>
>> OK, I just found out that this was renamed to zonegroup list:
>>
>>   root@rgw01:~ # radosgw-admin --id radosgw.rgw zonegroup list
>>   read_default_id : -2
>>   {
>>       "default_info": "",
>>       "zonegroups": [
>>           "default"
>>       ]
>>   }
>>
>> This looks to me like there is indeed only one zonegroup or region configured.
>>
>> Micha Krause
Re: [ceph-users] Can't create bucket (ERROR: endpoints not configured for upstream zone)
*bump*

On 01.07.2016 13:00, Micha Krause wrote:
> Hi,
>
>> In Infernalis there was this command: radosgw-admin regions list
>> But this is missing in Jewel.
>
> OK, I just found out that this was renamed to zonegroup list:
>
>   root@rgw01:~ # radosgw-admin --id radosgw.rgw zonegroup list
>   read_default_id : -2
>   {
>       "default_info": "",
>       "zonegroups": [
>           "default"
>       ]
>   }
>
> This looks to me like there is indeed only one zonegroup or region configured.
>
> Micha Krause
Re: [ceph-users] Can't create bucket (ERROR: endpoints not configured for upstream zone)
Hi,

> In Infernalis there was this command: radosgw-admin regions list
> But this is missing in Jewel.

OK, I just found out that this was renamed to zonegroup list:

  root@rgw01:~ # radosgw-admin --id radosgw.rgw zonegroup list
  read_default_id : -2
  {
      "default_info": "",
      "zonegroups": [
          "default"
      ]
  }

This looks to me like there is indeed only one zonegroup or region configured.

Micha Krause
Re: [ceph-users] Can't create bucket (ERROR: endpoints not configured for upstream zone)
Hi,

> See this thread, https://www.mail-archive.com/ceph-users@lists.ceph.com/msg23852.html

Yes, I found this as well, but I don't think I have configured more than one region. I never touched any region settings, and I have to admit I wouldn't even know how to check which regions I have. In Infernalis there was this command:

  radosgw-admin regions list

But this is missing in Jewel.

Micha Krause
[ceph-users] Can't create bucket (ERROR: endpoints not configured for upstream zone)
Hi,

If I try to create a bucket (using s3cmd), I'm getting this error:

  WARNING: 500 (UnknownError): The rados-gateway server says: ERROR: endpoints not configured for upstream zone

The servers were updated to Jewel, but I'm not sure the error wasn't there before.

Micha Krause
Re: [ceph-users] s3cmd with RGW
Hi,

> However, while creating a bucket using s3cmd mb s3://buck gives this error message:
>
>   DEBUG: ConnMan.get(): creating new connection: http://buck.s3.amazonaws.com:7480
>   ERROR: [Errno 110] Connection timed out
>
> Can anyone show a forward path to check this further?

Not sure if all of these settings are necessary, but I have set these variables in .s3cfg to our radosgw servers:

  cloudfront_host = rgw.noris.net
  host_base = rgw.noris.net
  host_bucket = %(bucket)s.rgw.noris.net
  simpledb_host = rgw.noris.net

Also check your DNS settings; you should have a wildcard DNS record for your base:

  micha@micha:~$ host *.rgw.noris.net
  *.rgw.noris.net has address 62.128.8.6
  *.rgw.noris.net has address 62.128.8.7
  *.rgw.noris.net has IPv6 address 2001:780:6::6
  *.rgw.noris.net has IPv6 address 2001:780:6::7

Micha Krause
Re: [ceph-users] Trying to understand the contents of .rgw.buckets.index
Hi,

> The index is stored in the omap of the object, which you can list with
> the 'rados' command.
>
> So it's not data inside the RADOS object, but in the omap key/value store.

Thank you very much:

  rados -p .rgw.buckets.index listomapkeys .dir.default.55059808.22 | wc -l
  2228777

So this data is then stored in the omap directory on my OSD as .sst files? Is there a way to correlate a rados object with a specific .sst (leveldb?) file?

Micha Krause
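For context, on a filestore OSD the omap store lives inside the OSD's data directory (the path below is the default layout; the osd id is an example). Because leveldb compacts keys from many objects into shared .sst files, there should be no one-to-one mapping from a rados object to a specific .sst file:

  # the per-object omap keys are all mixed into the same leveldb store
  ls -lh /var/lib/ceph/osd/ceph-0/current/omap/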
[ceph-users] Trying to understand the contents of .rgw.buckets.index
Hi,

I'm having problems listing the contents of an s3 bucket with ~2M objects. I already found the new bucket index sharding feature, but I'm interested in how these indexes are stored. My index pool shows no space used, and all objects have 0 B:

  root@mon01:~ # rados df -p .rgw.buckets.index
  pool name            KB  objects  clones  degraded  unfound  rd  rd KB  wr  wr KB
  .rgw.buckets.index   0   6200     0       28177514336051356       228972310

Why would sharding a 0 B object make any difference?

Micha Krause
Re: [ceph-users] Frequent Crashes on rbd to nfs gateway Server
Hi,

>>> That's strange. 3.13 is way before any changes that could have had any such effect. Can you by any chance try with older kernels to see where it starts misbehaving for you? 3.12? 3.10? 3.8?

> If I have to compile kernels anyway I will test 3.16.3 as well :-/.

Debian has released kernel 3.16.3 a few days ago, so I tested it, and it also hangs. osdc shows:

  3628783 osd21 37.b0474ca0 rb.0.167614e.2ae8944a.00038b0d ??? write
  3628784 osd21 37.b0474ca0 rb.0.167614e.2ae8944a.00038b0d ??? write
  3628785 osd21 37.b0474ca0 rb.0.167614e.2ae8944a.00038b0d ??? write
  3628788 osd25 37.63cdc62f rb.0.1676178.2ae8944a.00017ff5 read
  3628789 osd26 37.6bc418b7 rb.0.1689cf4.238e1f29.0288 ??? write
  3628790 osd19 37.256efded rb.0.1689cf4.238e1f29.029a ??? write
  3628791 osd26 37.268309ac rb.0.1689cf4.238e1f29.029b ??? write
  3628792 osd22 37.b3b9e9bd rb.0.1689cf4.238e1f29.029d ??? write
  3628793 osd21 37.4eec30a0 rb.0.1689cf4.238e1f29.1fff ??? write
  3628794 osd19 37.8a60bdc7 rb.0.1689cf4.238e1f29.20d4 ??? write
  3628795 osd18 37.c904eca5 rb.0.1689cf4.238e1f29.20d5 ??? write
  3628796 osd18 37.b71458a6 rb.0.1689cf4.238e1f29.5212 ??? write
  3628797 osd29 37.5e179f39 rb.0.1345def.2ae8944a.000c8011 ??? write
  3628798 osd23 37.3e6fc4fb rb.0.1345def.2ae8944a.000c8013 ??? write
  3628799 osd26 37.8467da30 rb.0.1345def.2ae8944a.000c8957 ??? write
  3628800 osd16 37.d720935f rb.0.1345def.2ae8944a.000c895a ??? write
  3628801 osd28 37.7925c9e4 rb.0.1345def.2ae8944a.000c8a79 ??? write
  3628802 osd20 37.bcdf4c74 rb.0.1345def.2ae8944a.000c8a7c ??? write
  3628803 osd26 37.ebc514ac rb.0.1345def.2ae8944a.000dc031 ??? write
  3628804 osd27 37.77bd6435 rb.0.1345def.2ae8944a.000dd74e ??? write
  3628805 osd28 37.b724973a rb.0.1345def.2ae8944a.000dd751 ??? write
  3628806 osd26 37.14b308b0 rb.0.1345def.2ae8944a.000efff4 ??? write
  3628807 osd23 37.adf44248 rb.0.1345def.2ae8944a.000f001a ??? write
  3628808 osd26 37.629f422c rb.0.1345def.2ae8944a.000f0ccf ??? write
  3628809 osd20 37.90d5ce6b rb.0.1345def.2ae8944a.000f0cd3 ??? write
  3628810 osd27 37.11918e9b rb.0.1345def.2ae8944a.00103ff3 ??? write
  3628811 osd29 37.286d56b9 rb.0.1345def.2ae8944a.00104023 ??? write
  3628812 osd28 37.191f0724 rb.0.1345def.2ae8944a.0010459f ??? write
  3628813 osd18 37.69e6111d rb.0.1345def.2ae8944a.001045a1 ??? write

This output does not change. dmesg shows the hung-task stuff again, but no rbd-related lines.

Micha Krause
Re: [ceph-users] Frequent Crashes on rbd to nfs gateway Server
Hi,

>> That's strange. 3.13 is way before any changes that could have had any such effect. Can you by any chance try with older kernels to see where it starts misbehaving for you? 3.12? 3.10? 3.8?

> my crush tunables are set to bobtail, so I can't go below 3.9, I will try 3.12 tomorrow and report back.

OK, I have tested 3.12.9 and it also hangs. I have no other pre-built kernels to test :-(. If I have to compile kernels anyway, I will test 3.16.3 as well :-/.

Micha Krause
Re: [ceph-users] Frequent Crashes on rbd to nfs gateway Server
Hi,

> That's strange. 3.13 is way before any changes that could have had any such effect. Can you by any chance try with older kernels to see where it starts misbehaving for you? 3.12? 3.10? 3.8?

My crush tunables are set to bobtail, so I can't go below 3.9. I will try 3.12 tomorrow and report back.

Micha Krause
Re: [ceph-users] Frequent Crashes on rbd to nfs gateway Server
Hi,

> 3.13.0-35-generic? really? I found myself in a similar situation like yours, and making a downgrade to that version works fine;
> also you could try 3.14.9-031, it works fine for me also.

Yes, it's an Ubuntu machine. I was not able to reproduce the problem there, but the workload is quite different from the nfs gateway server running on Debian. On the gateway I have tested 3.13.10 and 3.14.12, and about 30 min after I/O starts, rbd hangs.

Micha Krause
Re: [ceph-users] Frequent Crashes on rbd to nfs gateway Server
Hi,

> things work fine on kernel 3.13.0-35

I can reproduce this on 3.13.10, and I had it once on 3.13.0-35 as well.

Micha Krause
Re: [ceph-users] Frequent Crashes on rbd to nfs gateway Server
Hi,

> Well, these don't point at rbd at all. Are you seeing *any* progress when this happens? Could it be that things just get very slow and don't actually hang? Can you try watching the sysfs osdc file for a while to see if requests are going through or not? (/sys/kernel/debug/ceph/./osdc)

At least for 10 minutes nothing happened here.

Micha Krause
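For anyone following along, one way to keep an eye on that file (the directory under debugfs is named after the client instance, hence the glob; a sketch):

  # watch the in-flight rbd/rados requests refresh every second
  watch -n1 cat /sys/kernel/debug/ceph/*/osdc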
Re: [ceph-users] Frequent Crashes on rbd to nfs gateway Server
Hi,

I was able to get a dmesg output from the CentOS machine with kernel 3.16:

  kworker/3:2:9521 blocked for more than 120 seconds.
        Not tainted 3.16.2-1.el6.elrepo.x86_64 #1
  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  kworker/3:2   D 0003   0  9521   2 0x0080
  Workqueue: events handle_timeout [libceph]
   8801228cfcd8 0046 0003 8801228cc010
   00014400 00014400 8800ba01c250 880234ed3070
   8800baf237c8 8800baf237cc 8800ba01c250
  Call Trace:
   [] schedule+0x29/0x70
   [] schedule_preempt_disabled+0xe/0x10
   [] __mutex_lock_slowpath+0xdb/0x1d0
   [] mutex_lock+0x23/0x40
   [] handle_timeout+0x63/0x1c0 [libceph]
   [] process_one_work+0x17c/0x420
   [] worker_thread+0x123/0x420
   [] ? maybe_create_worker+0x180/0x180
   [] kthread+0xce/0xf0
   [] ? kthread_freezable_should_stop+0x70/0x70
   [] ret_from_fork+0x7c/0xb0
   [] ? kthread_freezable_should_stop+0x70/0x70
  INFO: task kworker/3:1:62 blocked for more than 120 seconds.
        Not tainted 3.16.2-1.el6.elrepo.x86_64 #1
  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  kworker/3:1   D 0003   0   62   2 0x
  Workqueue: events handle_osds_timeout [libceph]
   880037907ce8 0046 880037904010 00014400
   00014400 880232389130 880234ed3070 8101d833
   8800baf237c8 8800baf237cc 880232389130
  Call Trace:
   [] ? native_sched_clock+0x33/0xd0
   [] schedule+0x29/0x70
   [] schedule_preempt_disabled+0xe/0x10
   [] __mutex_lock_slowpath+0xdb/0x1d0
   [] ? put_prev_entity+0x2f/0x400
   [] mutex_lock+0x23/0x40
   [] handle_osds_timeout+0x53/0x120 [libceph]
   [] process_one_work+0x17c/0x420
   [] worker_thread+0x123/0x420
   [] ? maybe_create_worker+0x180/0x180
   [] kthread+0xce/0xf0
   [] ? kthread_freezable_should_stop+0x70/0x70
   [] ret_from_fork+0x7c/0xb0
   [] ? kthread_freezable_should_stop+0x70/0x70
  INFO: task kworker/u8:0:9486 blocked for more than 120 seconds.
        Not tainted 3.16.2-1.el6.elrepo.x86_64 #1
  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  kworker/u8:0  D 0002   0  9486   2 0x0080
  Workqueue: writeback bdi_writeback_workfn (flush-253:7)
   8802337cf368 0046 ae5d42c1 8802337cc010
   00014400 00014400 880232554fb0 8800ba4be210
   8802337cc010 8800ba5579b8 880232fa0250 880232554fb0
  Call Trace:
   [] schedule+0x29/0x70
   [] schedule_preempt_disabled+0xe/0x10
   [] __mutex_lock_slowpath+0x1c2/0x1d0
   [] mutex_lock+0x23/0x40
   [] ceph_con_send+0x4d/0x150 [libceph]
   [] __send_queued+0x134/0x180 [libceph]
   [] __ceph_osdc_start_request+0x5b/0xb0 [libceph]
   [] ceph_osdc_start_request+0x51/0x80 [libceph]
   [] rbd_img_obj_request_submit+0xb0/0x110 [rbd]
   [] rbd_img_request_submit+0x49/0x60 [rbd]
   [] rbd_request_fn+0x248/0x2b0 [rbd]
   [] __blk_run_queue+0x37/0x50
   [] queue_unplugged+0x4e/0xb0
   [] blk_flush_plug_list+0x15e/0x200
   [] io_schedule+0x75/0xd0
   [] get_request+0x167/0x340
   [] ? bit_waitqueue+0xe0/0xe0
   [] ? elv_merge+0xeb/0xf0
   [] blk_queue_bio+0xc8/0x340
   [] generic_make_request+0xc0/0x100
   [] submit_bio+0x80/0x170
   [] ? bio_alloc_bioset+0xa1/0x1e0
   [] _submit_bh+0x146/0x220
   [] submit_bh+0x10/0x20
   [] __block_write_full_page.clone.0+0x1a3/0x340
   [] ? I_BDEV+0x10/0x10
   [] ? I_BDEV+0x10/0x10
   [] block_write_full_page+0xc6/0x100
   [] blkdev_writepage+0x18/0x20
   [] __writepage+0x17/0x50
   [] write_cache_pages+0x244/0x510
   [] ? set_page_dirty+0x60/0x60
   [] generic_writepages+0x51/0x80
   [] do_writepages+0x20/0x40
   [] __writeback_single_inode+0x49/0x230
   [] ? wake_up_bit+0x2f/0x40
   [] writeback_sb_inodes+0x279/0x390
   [] ? put_super+0x25/0x40
   [] __writeback_inodes_wb+0x9e/0xd0
   [] wb_writeback+0x1fb/0x2c0
   [] wb_do_writeback+0x100/0x1f0
   [] bdi_writeback_workfn+0x70/0x210
   [] process_one_work+0x17c/0x420
   [] worker_thread+0x123/0x420
   [] ? maybe_create_worker+0x180/0x180
   [] kthread+0xce/0xf0
   [] ? kthread_freezable_should_stop+0x70/0x70
   [] ret_from_fork+0x7c/0xb0
   [] ? kthread_freezable_should_stop+0x70/0x70

Micha Krause

On 23.09.2014 15:37, Micha Krause wrote:
> bump
>
> I have observed this crash on Ubuntu with kernel 3.13 and CentOS with 3.16 as well now. rbd hangs, and iostat shows something similar to the output below.
>
> Micha Krause
>
> On 19.09.2014 09:22, Micha Krause wrote:
>> Hi,
>>
>>> I have build an NFS Server based on Sebastiens Blog Post here: http://www.sebastien-han.fr/blog/2012/07/06/nfs-over-rbd/
>>
>> I'm using kernel 3.14-0.bpo.1-amd64 on Debian wheezy; the host is a VM on VMware. Using rsync, I'm writing data via nfs from one client to this server. The NFS server crashes multiple times per day; I can't even log in to the server then. After a reset, there is no kernel log about the crash, so I guess something is blocking all I/Os.
Re: [ceph-users] Frequent Crashes on rbd to nfs gateway Server
bump

I have observed this crash on Ubuntu with kernel 3.13 and CentOS with 3.16 as well now. rbd hangs, and iostat shows something similar to the output below.

Micha Krause

On 19.09.2014 09:22, Micha Krause wrote:
> Hi,
>
>> I have build an NFS Server based on Sebastiens Blog Post here: http://www.sebastien-han.fr/blog/2012/07/06/nfs-over-rbd/
>
> I'm using kernel 3.14-0.bpo.1-amd64 on Debian wheezy; the host is a VM on VMware. Using rsync, I'm writing data via nfs from one client to this server. The NFS server crashes multiple times per day; I can't even log in to the server then. After a reset, there is no kernel log about the crash, so I guess something is blocking all I/Os.
>
> OK, it seems that I just can't get a shell, but I can run commands via ssh directly. I was able to get the following information:
>
> dmesg:
>
>   [18102.981064] INFO: task nfsd:2769 blocked for more than 120 seconds.
>   [18102.981112]       Not tainted 3.14-0.bpo.1-amd64 #1
>   [18102.981150] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>   [18102.981216] nfsd  D 88003fc14340  0  2769  2 0x
>   [18102.981218]  88003bac6e20 0046 88003d47ada0
>   [18102.981219]  00014340 88003ce31fd8 00014340 88003bac6e20
>   [18102.981221]  88003ce31728 8800029539f0 7fff 7fff
>   [18102.981223] Call Trace:
>   [18102.981225]  [] ? schedule_timeout+0x1ed/0x250
>   [18102.981231]  [] ? _xfs_buf_find+0xd2/0x280 [xfs]
>   [18102.981234]  [] ? kmem_cache_alloc+0x1bc/0x1f0
>   [18102.981236]  [] ? __down_common+0x97/0xea
>   [18102.981241]  [] ? _xfs_buf_find+0xea/0x280 [xfs]
>   [18102.981243]  [] ? down+0x37/0x40
>   [18102.981247]  [] ? xfs_buf_lock+0x32/0xf0 [xfs]
>   [18102.981252]  [] ? _xfs_buf_find+0xea/0x280 [xfs]
>   [18102.981257]  [] ? xfs_buf_get_map+0x35/0x1a0 [xfs]
>   [18102.981263]  [] ? xfs_buf_read_map+0x33/0x130 [xfs]
>   [18102.981269]  [] ? xfs_trans_read_buf_map+0x34a/0x4f0 [xfs]
>   [18102.981275]  [] ? xfs_imap_to_bp+0x69/0xf0 [xfs]
>   [18102.981281]  [] ? xfs_iread+0x7d/0x3f0 [xfs]
>   [18102.981284]  [] ? make_kgid+0x9/0x10
>   [18102.981286]  [] ? inode_init_always+0x10e/0x1d0
>   [18102.981292]  [] ? xfs_iget+0x2ba/0x810 [xfs]
>   [18102.981298]  [] ? xfs_ialloc+0xe6/0x740 [xfs]
>   [18102.981305]  [] ? kmem_zone_alloc+0x6e/0xf0 [xfs]
>   [18102.981311]  [] ? xfs_dir_ialloc+0x83/0x300 [xfs]
>   [18102.981317]  [] ? xfs_trans_reserve+0x213/0x220 [xfs]
>   [18102.981323]  [] ? xfs_create+0x4fe/0x720 [xfs]
>   [18102.981329]  [] ? xfs_vn_mknod+0xd2/0x200 [xfs]
>   [18102.981331]  [] ? vfs_create+0xe4/0x160
>   [18102.981335]  [] ? do_nfsd_create+0x53e/0x610 [nfsd]
>   [18102.981339]  [] ? nfsd3_proc_create+0x16d/0x250 [nfsd]
>   [18102.981342]  [] ? nfsd_dispatch+0xe4/0x230 [nfsd]
>   [18102.981347]  [] ? svc_process_common+0x354/0x690 [sunrpc]
>   [18102.981349]  [] ? try_to_wake_up+0x280/0x280
>   [18102.981353]  [] ? svc_process+0x10b/0x160 [sunrpc]
>   [18102.981359]  [] ? nfsd+0xb7/0x130 [nfsd]
>   [18102.981363]  [] ? nfsd_destroy+0x70/0x70 [nfsd]
>   [18102.981365]  [] ? kthread+0xbc/0xe0
>   [18102.981367]  [] ? flush_kthread_worker+0xa0/0xa0
>   [18102.981369]  [] ? ret_from_fork+0x7c/0xb0
>   [18102.981371]  [] ? flush_kthread_worker+0xa0/0xa0
>
> iostat:
>
>   avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
>              0.00   0.00     1.00    99.00    0.00   0.00
>
>   Device:    rrqm/s wrqm/s  r/s  w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm  %util
>   sda          0.00   0.00 0.00 0.00  0.00  0.00     0.00     0.00  0.00    0.00    0.00  0.00   0.00
>   dm-0         0.00   0.00 0.00 0.00  0.00  0.00     0.00     0.00  0.00    0.00    0.00  0.00   0.00
>   dm-1         0.00   0.00 0.00 0.00  0.00  0.00     0.00     0.00  0.00    0.00    0.00  0.00   0.00
>   dm-2         0.00   0.00 0.00 0.00  0.00  0.00     0.00     0.00  0.00    0.00    0.00  0.00   0.00
>   dm-3         0.00   0.00 0.00 0.00  0.00  0.00     0.00     0.00  0.00    0.00    0.00  0.00   0.00
>   dm-4         0.00   0.00 0.00 0.00  0.00  0.00     0.00     0.00  0.00    0.00    0.00  0.00   0.00
>   rbd0         0.00   0.00 0.00 0.00  0.00  0.00     0.00    46.00  0.00    0.00    0.00  0.00 100.00
>   rbd1         0.00   0.00 0.00 0.00  0.00  0.00     0.00    12.00  0.00    0.00    0.00  0.00 100.00
>   rbd2         0.00   0.00 0.00 0.00  0.00  0.00     0.00   136.00  0.00    0.00    0.00  0.00 100.00
>   rbd3         0.00   0.00 0.00 0.00  0.00  0.00     0.00     0.00  0.00    0.00    0.00  0.00   0.00
>   rbd4         0.00   0.00 0.00 0.00  0.00  0.00     0.00    11.00  0.00    0.00    0.00  0.00 100.00
>   rbd5         0.00   0.00 0.00 0.00  0.00  0.00     0.00    57.00  0.00    0.00    0.00  0.00 100.00
>   emcpowerig   0.00   0.00 0.00 0.00  0.00  0.00
Re: [ceph-users] Frequent Crashes on rbd to nfs gateway Server
  582453 osd26 37.ae80f3ac rb.0.165171e.238e1f29.0003a13b write
  582454 osd26 37.ae80f3ac rb.0.165171e.238e1f29.0003a13b write
  582455 osd15 37.fc3cadcd rb.0.165171e.238e1f29.0003a13c write
  582456 osd15 37.fc3cadcd rb.0.165171e.238e1f29.0003a13c write
  582457 osd19 37.dcc1f244 rb.0.165171e.238e1f29.0003a13d write
  582458 osd28 37.c5ce907a rb.0.165171e.238e1f29.0003a13e write
  582459 osd28 37.c5ce907a rb.0.165171e.238e1f29.0003a13e write
  582460 osd18 37.d7371b26 rb.0.165171e.238e1f29.0003a13f write
  582461 osd18 37.89ec9be5 rb.0.165171e.238e1f29.0003a140 write
  582462 osd4  37.5032c82f rb.0.165171e.238e1f29.0003bfe2 write
  582463 osd4  37.5032c82f rb.0.165171e.238e1f29.0003bfe2 write
  582464 osd26 37.54a4fd50 rb.0.165171e.238e1f29.0003bfe9 write
  582465 osd23 37.2929897b rb.0.165171e.238e1f29.0003c136 write
  582466 osd23 37.2929897b rb.0.165171e.238e1f29.0003c136 write
  582467 osd20 37.b9aff419 rb.0.165171e.238e1f29.0003dfe1 write
  582468 osd20 37.b9aff419 rb.0.165171e.238e1f29.0003dfe1 write
  582469 osd24 37.685a8638 rb.0.165171e.238e1f29.0003e08c write
  582470 osd26 37.adfd8b12 rb.0.165171e.238e1f29.0003e14a write
  582471 osd2  37.a67386a2 rb.0.165171e.238e1f29.0003e14b write
  582472 osd18 37.688ac754 rb.0.165171e.238e1f29.0002802c write
  582473 osd15 37.d2bed74d rb.0.1676160.238e1f29.0002000d write
  582474 osd23 37.9c9a1a8f rb.0.165171e.238e1f29.00020002 write

  epoch 39226
  flags
  pg_pool 0 pg_num 256 / 255
  pg_pool 1 pg_num 128 / 127
  pg_pool 4 pg_num 32 / 31
  pg_pool 19 pg_num 512 / 511
  pg_pool 25 pg_num 8 / 7
  pg_pool 27 pg_num 1 / 0
  pg_pool 28 pg_num 1 / 0
  pg_pool 29 pg_num 1 / 0
  pg_pool 30 pg_num 1 / 0
  pg_pool 31 pg_num 1 / 0
  pg_pool 32 pg_num 1 / 0
  pg_pool 33 pg_num 1 / 0
  pg_pool 34 pg_num 1 / 0
  pg_pool 35 pg_num 2 / 1
  pg_pool 36 pg_num 1 / 0
  pg_pool 37 pg_num 64 / 63
  pg_pool 40 pg_num 2 / 1
  pg_pool 41 pg_num 1 / 0

  osd0   10.210.33.22:6815  100%  (exists, up)
  osd1   10.210.33.22:6800  100%  (exists, up)
  osd2   10.210.33.22:6805  100%  (exists, up)
  osd3   10.210.32.22:6800    0%  (doesn't exist)
  osd4   10.210.33.22:6810  100%  (exists, up)
  osd5   10.210.33.22:6820  100%  (doesn't exist)
  osd6   10.210.33.22:6805  100%  (doesn't exist)
  osd7   10.210.33.22:6825  100%  (doesn't exist)
  osd8   10.210.33.22:6830  100%  (doesn't exist)
  osd9   10.210.33.22:6835  100%  (doesn't exist)
  osd10  10.210.33.22:6805  100%  (doesn't exist)
  osd11  10.210.33.22:6845  100%  (doesn't exist)
  osd12  10.210.33.22:6850  100%  (doesn't exist)
  osd13  10.210.33.22:6855  100%  (doesn't exist)
  osd14  10.210.33.22:6860  100%  (doesn't exist)
  osd15  10.210.32.23:6800  100%  (exists, up)
  osd16  10.210.32.23:6807  100%  (exists, up)
  osd17  10.210.32.23:6801  100%  (exists, up)
  osd18  10.210.32.23:6816  100%  (exists, up)
  osd19  10.210.32.23:6812  100%  (exists, up)
  osd20  10.210.34.21:6800  100%  (exists, up)
  osd21  10.210.34.21:6804  100%  (exists, up)
  osd22  10.210.34.21:6809  100%  (exists, up)
  osd23  10.210.34.21:6814  100%  (exists, up)
  osd24  10.210.34.21:6819  100%  (exists, up)
  osd25  10.210.33.21:6800  100%  (exists, up)
  osd26  10.210.33.21:6805  100%  (exists, up)
  osd27  10.210.33.21:6810  100%  (exists, up)
  osd28  10.210.33.21:6815  100%  (exists, up)
  osd29  10.210.33.21:6820  100%  (exists, up)
  osd30  10.210.33.22:6865  100%  (doesn't exist)
  osd31  10.210.33.22:6870    0%  (doesn't exist)
  osd32  10.210.33.21:6800    0%  (doesn't exist)

I don't know how to interpret this; the "doesn't exist" lines are correct, as these OSDs were removed. Why are they still known to the rbd client? The OSDs were removed before the client was booted.

Micha Krause
[ceph-users] Frequent Crashes on rbd to nfs gateway Server
Hi,

I have built an NFS server based on Sebastien's blog post here: http://www.sebastien-han.fr/blog/2012/07/06/nfs-over-rbd/

I'm using kernel 3.14-0.bpo.1-amd64 on Debian wheezy; the host is a VM on VMware. Using rsync, I'm writing data via nfs from one client to this server. The NFS server crashes multiple times per day; I can't even log in to the server then. After a reset, there is no kernel log about the crash, so I guess something is blocking all I/Os.

Any ideas on how to debug this?

Micha Krause
Re: [ceph-users] Can't export cephfs via nfs
Hi,

> Have you confirmed that if you unmount cephfs on /srv/micha the NFS export works?

Yes. I'm probably hitting this bug: http://tracker.ceph.com/issues/7750

Micha Krause
Re: [ceph-users] Can't export cephfs via nfs
Hi,

> I'm trying to build a cephfs-to-nfs gateway, but somehow I can't mount the share if it is backed by cephfs:
>
>   mount ngw01.ceph:/srv/micha /mnt/tmp/
>   mount.nfs: Connection timed out
>
> cephfs mount on the gateway:
>
>   10.210.32.11:6789:/ngw on /srv type ceph (rw,relatime,name=cephfs-ngw,secret=,nodcache,nofsc,acl)

This is probably my problem; it works if I export the cephfs root :-( : http://tracker.ceph.com/issues/7750

Micha Krause
Re: [ceph-users] Can't export cephfs via nfs
Hi,

any ideas?

Micha Krause

On 11.08.2014 16:34, Micha Krause wrote:
> Hi,
>
> I'm trying to build a cephfs-to-nfs gateway, but somehow I can't mount the share if it is backed by cephfs:
>
>   mount ngw01.ceph:/srv/micha /mnt/tmp/
>   mount.nfs: Connection timed out
>
> cephfs mount on the gateway:
>
>   10.210.32.11:6789:/ngw on /srv type ceph (rw,relatime,name=cephfs-ngw,secret=,nodcache,nofsc,acl)
>
> /etc/exports:
>
>   /srv/micha 10.6.6.137(rw,no_root_squash,async)
>   /etc       10.6.6.137(rw,no_root_squash,async)
>
> I can mount the /etc export with no problem.
>
>   uname -a
>   Linux ngw01 3.14-0.bpo.1-amd64 #1 SMP Debian 3.14.12-1~bpo70+1 (2014-07-13) x86_64 GNU/Linux
>
> I'm using the nfs-kernel-server.
>
> Micha Krause
Re: [ceph-users] Can't export cephfs via nfs
Hi,

> The NFS crossmnt option can help you.

Thanks for the suggestion. I tried it, but it makes no difference.

Micha Krause
[ceph-users] Can't export cephfs via nfs
Hi,

I'm trying to build a cephfs-to-nfs gateway, but somehow I can't mount the share if it is backed by cephfs:

  mount ngw01.ceph:/srv/micha /mnt/tmp/
  mount.nfs: Connection timed out

cephfs mount on the gateway:

  10.210.32.11:6789:/ngw on /srv type ceph (rw,relatime,name=cephfs-ngw,secret=,nodcache,nofsc,acl)

/etc/exports:

  /srv/micha 10.6.6.137(rw,no_root_squash,async)
  /etc       10.6.6.137(rw,no_root_squash,async)

I can mount the /etc export with no problem.

  uname -a
  Linux ngw01 3.14-0.bpo.1-amd64 #1 SMP Debian 3.14.12-1~bpo70+1 (2014-07-13) x86_64 GNU/Linux

I'm using the nfs-kernel-server.

Micha Krause
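A common tweak when exporting a subtree of a network filesystem over the kernel NFS server is to give the export an explicit filesystem id (a sketch; whether it helps here depends on the kernel-side bug referenced later in this thread):

  /srv/micha 10.6.6.137(rw,no_root_squash,async,fsid=101)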
Re: [ceph-users] Difference between "ceph osd reweight" and "ceph osd crush reweight"
Hi,

> "ceph osd crush reweight" sets the CRUSH weight of the OSD. This weight is an arbitrary value (generally the size of the disk in TB or something) and controls how much data the system tries to allocate to the OSD.
>
> "ceph osd reweight" sets an override weight on the OSD. This value is in the range 0 to 1, and forces CRUSH to re-place (1-weight) of the data that would otherwise live on this drive. It does *not* change the weights assigned to the buckets above the OSD, and is a corrective measure in case the normal CRUSH distribution isn't working out quite right. (For instance, if one of your OSDs is at 90% and the others are at 50%, you could reduce this weight to try and compensate for it.)

Thanks. So if I have some older OSDs, and I want them to receive less data/iops than the other nodes, I would use "ceph osd crush reweight"?

Micha Krause
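For reference, the two commands side by side (OSD id and weights are example values):

  # permanent CRUSH weight, in the same units as the rest of the tree (usually TB)
  ceph osd crush reweight osd.3 1.0

  # temporary override in the range 0..1, applied on top of the CRUSH weight
  ceph osd reweight 3 0.9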
[ceph-users] Difference between "ceph osd reweight" and "ceph osd crush reweight"
Hi,

could someone explain to me what the difference is between

  ceph osd reweight

and

  ceph osd crush reweight

Micha Krause
[ceph-users] cephfs snapshots : mkdir: cannot create directory `.snap/test': Operation not permitted
Hi,

I'm playing around with cephfs; everything works fine except creating snapshots:

  # mkdir .snap/test
  mkdir: cannot create directory `.snap/test': Operation not permitted

Client kernel version: 3.14
Ceph cluster version: 0.80.1

I tried it on 2 different clients, both Debian: one with jessie, one with wheezy + backports kernel. Is there some config option to enable snapshots, or is this a bug?

Micha Krause
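For reference: since Firefly, cephfs snapshots are disabled by default and have to be enabled explicitly before mkdir in .snap works — a sketch of the enable step (snapshots were considered experimental at the time):

  ceph mds set allow_new_snaps true --yes-i-really-mean-it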
Re: [ceph-users] radosgw setting public ACLs fails.
Hi,

>> So how does AWS S3 handle public access to objects?
>
> You have to explicitly set a public ACL on each object.

OK, but this also does not work with radosgw + s3cmd:

  s3cmd setacl -P s3://test/fstab
  ERROR: S3 error: 403 (AccessDenied):

Micha Krause
Re: [ceph-users] radosgw setting public ACLs fails.
Hi,

> Note this breaks AWS S3 compatibility and is why it is configurable.

So how does AWS S3 handle public access to objects?

Micha Krause
Re: [ceph-users] radosgw setting public ACLs fails.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,

No solution so far, but I also asked in IRC, and linuxkidd told me they were looking for a workaround.

Micha Krause

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iEYEARECAAYFAlKVrncACgkQfAR45tA28LhUqQCeMcR430bhaYFncB2/NFTcJIM1
zmcAoICqWwjkMfNjP2yolxBeKI0IvDgJ
=rNFL
-----END PGP SIGNATURE-----
[ceph-users] radosgw setting public ACLs fails.
Hi,

I'm trying to set public ACLs on an object, so that I can access the object via a web browser, unfortunately without success:

  s3cmd setacl --acl-public s3://test/hosts
  ERROR: S3 error: 403 (AccessDenied):

The radosgw log says:

  x-amz-date:Fri, 08 Nov 2013 12:56:55 +0000
  /test/hosts?acl
  2013-11-08 13:56:55.090604 7fe3314c6700 15 calculated digest=K6fFJdBvy1YXZw0kqZ7qt6sRkzk=
  2013-11-08 13:56:55.090606 7fe3314c6700 15 auth_sign=K6fFJdBvy1YXZw0kqZ7qt6sRkzk=
  2013-11-08 13:56:55.090607 7fe3314c6700 15 compare=0
  2013-11-08 13:56:55.090610 7fe3314c6700 2 req 60:0.000290:s3:PUT /hosts:put_acls:reading permissions
  2013-11-08 13:56:55.090621 7fe3314c6700 20 get_obj_state: rctx=0xf32a50 obj=.rgw:test state=0xf21888 s->prefetch_data=0
  2013-11-08 13:56:55.090630 7fe3314c6700 10 moving .rgw+test to cache LRU end
  2013-11-08 13:56:55.090632 7fe3314c6700 10 cache get: name=.rgw+test : hit
  2013-11-08 13:56:55.090635 7fe3314c6700 20 get_obj_state: s->obj_tag was set empty
  2013-11-08 13:56:55.090637 7fe3314c6700 20 Read xattr: user.rgw.idtag
  2013-11-08 13:56:55.090639 7fe3314c6700 20 Read xattr: user.rgw.manifest
  2013-11-08 13:56:55.090641 7fe3314c6700 10 moving .rgw+test to cache LRU end
  2013-11-08 13:56:55.090642 7fe3314c6700 10 cache get: name=.rgw+test : hit
  2013-11-08 13:56:55.090650 7fe3314c6700 20 rgw_get_bucket_info: bucket instance: test(@{i=.rgw.buckets.index}.rgw.buckets[default.4212.2])
  2013-11-08 13:56:55.090654 7fe3314c6700 20 reading from .rgw:.bucket.meta.test:default.4212.2
  2013-11-08 13:56:55.090659 7fe3314c6700 20 get_obj_state: rctx=0xf32a50 obj=.rgw:.bucket.meta.test:default.4212.2 state=0xf39678 s->prefetch_data=0
  2013-11-08 13:56:55.090663 7fe3314c6700 10 moving .rgw+.bucket.meta.test:default.4212.2 to cache LRU end
  2013-11-08 13:56:55.090665 7fe3314c6700 10 cache get: name=.rgw+.bucket.meta.test:default.4212.2 : hit
  2013-11-08 13:56:55.090668 7fe3314c6700 20 get_obj_state: s->obj_tag was set empty
  2013-11-08 13:56:55.090670 7fe3314c6700 20 Read xattr: user.rgw.acl
  2013-11-08 13:56:55.090671 7fe3314c6700 20 Read xattr: user.rgw.idtag
  2013-11-08 13:56:55.090672 7fe3314c6700 20 Read xattr: user.rgw.manifest
  2013-11-08 13:56:55.090674 7fe3314c6700 10 moving .rgw+.bucket.meta.test:default.4212.2 to cache LRU end
  2013-11-08 13:56:55.090676 7fe3314c6700 10 cache get: name=.rgw+.bucket.meta.test:default.4212.2 : hit
  2013-11-08 13:56:55.090690 7fe3314c6700 15 Read AccessControlPolicyhttp://s3.amazonaws.com/doc/2006-03-01/";>testTesthttp://www.w3.org/2001/XMLSchema-instance"; xsi:type="CanonicalUser">testTestFULL_CONTROL
  2013-11-08 13:56:55.090702 7fe3314c6700 20 get_obj_state: rctx=0xf32a50 obj=test:hosts state=0xf633e8 s->prefetch_data=0
  2013-11-08 13:56:55.093871 7fe3314c6700 10 manifest: total_size = 156
  2013-11-08 13:56:55.093875 7fe3314c6700 10 manifest: ofs=0 loc=test:hosts
  2013-11-08 13:56:55.093876 7fe3314c6700 20 get_obj_state: setting s->obj_tag to default.4212.50
  2013-11-08 13:56:55.093882 7fe3314c6700 15 Read AccessControlPolicyhttp://s3.amazonaws.com/doc/2006-03-01/";>testTesthttp://www.w3.org/2001/XMLSchema-instance"; xsi:type="CanonicalUser">testTestFULL_CONTROL
  2013-11-08 13:56:55.093889 7fe3314c6700 2 req 60:0.003568:s3:PUT /hosts:put_acls:verifying op mask
  2013-11-08 13:56:55.093894 7fe3314c6700 20 required_mask= 2 user.op_mask=7
  2013-11-08 13:56:55.093896 7fe3314c6700 2 req 60:0.003576:s3:PUT /hosts:put_acls:verifying op permissions
  2013-11-08 13:56:55.093900 7fe3314c6700 5 Searching permissions for uid=test mask=56
  2013-11-08 13:56:55.093903 7fe3314c6700 5 Found permission: 15
  2013-11-08 13:56:55.093905 7fe3314c6700 5 Searching permissions for group=1 mask=56
  2013-11-08 13:56:55.093907 7fe3314c6700 5 Permissions for group not found
  2013-11-08 13:56:55.093909 7fe3314c6700 5 Getting permissions id=test owner=test perm=8
  2013-11-08 13:56:55.093912 7fe3314c6700 10 uid=test requested perm (type)=8, policy perm=8, user_perm_mask=15, acl perm=8
  2013-11-08 13:56:55.093914 7fe3314c6700 2 req 60:0.003593:s3:PUT /hosts:put_acls:verifying op params
  2013-11-08 13:56:55.093916 7fe3314c6700 2 req 60:0.003596:s3:PUT /hosts:put_acls:executing
  2013-11-08 13:56:55.093938 7fe3314c6700 15 read len=343 data=http://s3.amazonaws.com/doc/2006-03-01/";>http://www.w3.org/2001/XMLSchema-instance"; xsi:type="Group">http://acs.amazonaws.com/groups/global/AllUsersREAD
  2013-11-08 13:56:55.094007 7fe3314c6700 15 Old AccessControlPolicyhttp://s3.amazonaws.com/doc/2006-03-01/";>http://www.w3.org/2001/XMLSchema-instance"; xsi:type="Group">http://acs.amazonaws.com/groups/global/AllUsersREAD
  2013-11-08 13:56:55.094066 7fe3314c6700 2 req 60:0.003745:s3:PUT /hosts:put_acls:http status=403
  2013-11-08 13:56:55.094209 7fe3314c6700 1 == req done req=0xf68e20 http_status=403 ==
  2013-11-08 13:57:03.324082 7fe35d922700 2 RGWDataChangesLog::ChangesR