Re: [ceph-users] Ensure Hammer client compatibility

2018-08-20 Thread Lincoln Bryant
Hi Kees,

What interfaces do your Hammer clients need? If you're looking at
CephFS, we have had reasonable success moving our older clients (EL6)
to NFS Ganesha with the Ceph FSAL.
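
Once the cluster is on Luminous, the client-compatibility floor can also be
checked and pinned along these lines (a sketch only; please verify against the
release notes for your exact version):

    ceph osd crush tunables hammer                  # keep CRUSH tunables decodable by Hammer clients
    ceph osd set-require-min-compat-client hammer   # refuse features older clients cannot speak
    ceph features                                   # show which releases the connected clients report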

--Lincoln

On Mon, 2018-08-20 at 12:22 +0200, Kees Meijs wrote:
> Good afternoon Cephers,
> 
> While I'm fixing our upgrade-semi-broken cluster (see thread Upgrade
> to Infernalis: failed to pick suitable auth object) I'm wondering
> about ensuring client compatibility.
> 
> My end goal is BlueStore (i.e. running Luminous) and unfortunately
> I'm obliged to offer Hammer client compatibility.
> 
> Any pointers on how to ensure this configuration-wise?
> 
> Thanks!
> 
> Regards,
> Kees
> 
> -- 
> https://nefos.nl/contact 
> 
> Nefos IT bv
> Ambachtsweg 25 (industrienummer 4217)
> 5627 BZ Eindhoven
> Nederland
> 
> KvK 66494931
> 
> Available on Monday, Tuesday, Wednesday and Friday


Re: [ceph-users] OSD servers swapping despite having free memory capacity

2018-01-23 Thread Lincoln Bryant
Hi Sam,

What happens if you just disable swap altogether? i.e., with `swapoff
-a`
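
Roughly (a sketch, assuming swap is configured in /etc/fstab on these hosts):

    swapoff -a                              # drop swap immediately
    sed -i '/ swap / s/^/#/' /etc/fstab     # keep it off across reboots
    sysctl -w vm.swappiness=0               # hint the kernel away from swap (you already did this)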

--Lincoln

On Tue, 2018-01-23 at 19:54 +, Samuel Taylor Liston wrote:
> We have a 9-node cluster (16 x 8TB OSDs per node) running Jewel on CentOS
> 7.4.  The OSDs are configured with encryption.  The cluster is
> accessed via two RGWs and there are 3 mon servers.  The data
> pool is using 6+3 erasure coding.
> 
> About 2 weeks ago I found two of the nine servers wedged and had to
> hard power cycle them to get them back.  In this hard reboot, 22
> OSDs came back with either corrupted encryption or corrupted data
> partitions.  These OSDs were removed and recreated, and the resulting
> rebalance moved along just fine for about a week.  At the end of that
> week, two different nodes became unresponsive, complaining of page
> allocation failures.  This is when I realized the nodes were heavily
> into swap.  These nodes were configured with 64GB of RAM as a cost
> saving, going against the 1GB-per-1TB recommendation.  We have since
> doubled the RAM in each of the nodes, giving each of them more
> than the 1GB-per-1TB ratio.
> 
> The issue I am running into is that these nodes are still swapping a
> lot, and over time become unresponsive or throw page allocation
> failures.  As an example, “free” will show 15GB of RAM usage (out of
> 128GB) and 32GB of swap.  I have configured swappiness to 0 and
> also turned up vm.min_free_kbytes to 4GB to try to keep the
> kernel happy, and yet I am still filling up swap.  It only occurs
> when the OSDs have mounted partitions and ceph-osd daemons active.
> 
> Anyone have an idea where this swap usage might be coming from? 
> Thanks for any insight,
> 
> Sam Liston (sam.lis...@utah.edu)
> 
> Center for High Performance Computing
> 155 S. 1452 E. Rm 405
> Salt Lake City, Utah 84112 (801)232-6932
> 
> 
> 
> 


Re: [ceph-users] cephfs degraded on ceph luminous 12.2.2

2018-01-08 Thread Lincoln Bryant
Hi Alessandro,

What is the state of your PGs? Inactive PGs have blocked CephFS
recovery on our cluster before. I'd try to clear any blocked ops and
see if the MDSes recover.
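
A rough sketch of what I'd look at (the osd id is a placeholder):

    ceph pg dump_stuck inactive               # any PGs that never went active?
    ceph health detail                        # which OSDs hold the blocked requests
    ceph daemon osd.<id> dump_ops_in_flight   # on the OSD host, via the admin socket
    ceph daemon osd.<id> dump_blocked_ops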

--Lincoln

On Mon, 2018-01-08 at 17:21 +0100, Alessandro De Salvo wrote:
> Hi,
> 
> I'm running on ceph luminous 12.2.2 and my cephfs suddenly degraded.
> 
> I have 2 active mds instances and 1 standby. All the active
> instances 
> are now in replay state and show the same error in the logs:
> 
> 
>  mds1 
> 
> 2018-01-08 16:04:15.765637 7fc2e92451c0  0 ceph version 12.2.2 
> (cf0baba3b47f9427c6c97e2144b094b7e5ba) luminous (stable),
> process 
> (unknown), pid 164
> starting mds.mds1 at -
> 2018-01-08 16:04:15.785849 7fc2e92451c0  0 pidfile_write: ignore
> empty 
> --pid-file
> 2018-01-08 16:04:20.168178 7fc2e1ee1700  1 mds.mds1 handle_mds_map
> standby
> 2018-01-08 16:04:20.278424 7fc2e1ee1700  1 mds.1.20635 handle_mds_map
> i 
> am now mds.1.20635
> 2018-01-08 16:04:20.278432 7fc2e1ee1700  1 mds.1.20635
> handle_mds_map 
> state change up:boot --> up:replay
> 2018-01-08 16:04:20.278443 7fc2e1ee1700  1 mds.1.20635 replay_start
> 2018-01-08 16:04:20.278449 7fc2e1ee1700  1 mds.1.20635  recovery set
> is 0
> 2018-01-08 16:04:20.278458 7fc2e1ee1700  1 mds.1.20635  waiting for 
> osdmap 21467 (which blacklists prior instance)
> 
> 
>  mds2 
> 
> 2018-01-08 16:04:16.870459 7fd8456201c0  0 ceph version 12.2.2 
> (cf0baba3b47f9427c6c97e2144b094b7e5ba) luminous (stable),
> process 
> (unknown), pid 295
> starting mds.mds2 at -
> 2018-01-08 16:04:16.881616 7fd8456201c0  0 pidfile_write: ignore
> empty 
> --pid-file
> 2018-01-08 16:04:21.274543 7fd83e2bc700  1 mds.mds2 handle_mds_map
> standby
> 2018-01-08 16:04:21.314438 7fd83e2bc700  1 mds.0.20637 handle_mds_map
> i 
> am now mds.0.20637
> 2018-01-08 16:04:21.314459 7fd83e2bc700  1 mds.0.20637
> handle_mds_map 
> state change up:boot --> up:replay
> 2018-01-08 16:04:21.314479 7fd83e2bc700  1 mds.0.20637 replay_start
> 2018-01-08 16:04:21.314492 7fd83e2bc700  1 mds.0.20637  recovery set
> is 1
> 2018-01-08 16:04:21.314517 7fd83e2bc700  1 mds.0.20637  waiting for 
> osdmap 21467 (which blacklists prior instance)
> 2018-01-08 16:04:21.393307 7fd837aaf700  0 mds.0.cache creating
> system 
> inode with ino:0x100
> 2018-01-08 16:04:21.397246 7fd837aaf700  0 mds.0.cache creating
> system 
> inode with ino:0x1
> 
> The cluster is recovering as we are changing some of the osds, and
> there 
> are a few slow/stuck requests, but I'm not sure if this is the cause,
> as 
> there is apparently no data loss (until now).
> 
> How can I force the MDSes to quit the replay state?
> 
> Thanks for any help,
> 
> 
>  Alessandro
> 
> 


Re: [ceph-users] who is using nfs-ganesha and cephfs?

2017-11-08 Thread Lincoln Bryant
Hi Sage,

We have been running the Ganesha FSAL for a while (as far back as Hammer / 
Ganesha 2.2.0), primarily for uid/gid squashing.

Things are basically OK for our application, but we've seen the following 
weirdness*:
- Sometimes there are duplicated entries when directories are listed. 
Same filename, same inode, just shows up twice in 'ls'.
- There can be a considerable latency between new files added to CephFS 
and those files becoming visible on our NFS clients. I understand this might be 
related to dentry caching. 
- Occasionally, the Ganesha FSAL seems to max out at 100,000 caps 
claimed which don't get released until the MDS is restarted.

*note: these issues are with Ganesha 2.2.0 and Hammer/Jewel, and have perhaps 
since been fixed upstream. 

(We've recently updated to Luminous / Ganesha 2.5.2, and will be happy to 
complain if any issues show up :))

Cheers,
Lincoln

> On Nov 8, 2017, at 3:41 PM, Sage Weil  wrote:
> 
> Who is running nfs-ganesha's FSAL to export CephFS?  What has your 
> experience been?
> 
> (We are working on building proper testing and support for this into 
> Mimic, but the ganesha FSAL has been around for years.)
> 
> Thanks!
> sage
> 


Re: [ceph-users] Inconsistent PG won't repair

2017-10-20 Thread Lincoln Bryant
Hi Rich,

Is the object inconsistent and 0-bytes on all OSDs?

We ran into a similar issue on Jewel, where an object was empty across the 
board but had inconsistent metadata. Ultimately it was resolved by doing a 
"rados get" and then a "rados put" on the object. *However* that was a last 
ditch effort after I couldn't get any other repair option to work, and I have 
no idea if that will cause any issues down the road :)
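
The get/put dance was roughly the following (a sketch; the pool, object name
and pg id are placeholders, and I'd only try it as a last resort):

    rados -p <pool> get <object-name> /tmp/obj    # pull the object as the cluster sees it
    rados -p <pool> put <object-name> /tmp/obj    # write it back, refreshing the object info
    ceph pg repair <pgid>                         # then re-check with a repair/deep-scrub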

--Lincoln

> On Oct 20, 2017, at 10:16 AM, Richard Bade  wrote:
> 
> Hi Everyone,
> In our cluster running 0.94.10 we had a pg pop up as inconsistent
> during scrub. Previously when this has happened running ceph pg repair
> [pg_num] has resolved the problem. This time the repair runs but it
> remains inconsistent.
> ~$ ceph health detail
> HEALTH_ERR 1 pgs inconsistent; 2 scrub errors; noout flag(s) set
> pg 3.f05 is active+clean+inconsistent, acting [171,23,131]
> 1 scrub errors
> 
> The error in the logs is:
> cstor01 ceph-mon: osd.171 10.233.202.21:6816/12694 45 : deep-scrub
> 3.f05 3/68ab5f05/rbd_data.19cdf512ae8944a.0001bb56/snapdir
> expected clone 3/68ab5f05/rbd_data.19cdf512ae8944a.0001bb56/148d2
> 
> Now, I've tried several things to resolve this. I've tried stopping
> each of the osd's in turn and running a repair. I've located the rbd
> image and removed it to empty out the object. The object is now zero
> bytes but still inconsistent. I've tried stopping each osd, removing
> the object and starting the osd again. It correctly identifies the
> object as missing and repair works to fix this but it still remains
> inconsistent.
> I've run out of ideas.
> The object is now zero bytes:
> ~$ find /var/lib/ceph/osd/ceph-23/current/3.f05_head/ -name
> "*19cdf512ae8944a.0001bb56*" -ls
> 537598582  0 -rw-r--r--   1 root root0 Oct 21
> 03:54 
> /var/lib/ceph/osd/ceph-23/current/3.f05_head/DIR_5/DIR_0/DIR_F/DIR_5/DIR_B/rbd\\udata.19cdf512ae8944a.0001bb56__snapdir_68AB5F05__3
> 
> How can I resolve this? Is there some way to remove the empty object
> completely? I saw reference to ceph-objectstore-tool which has some
> options to remove-clone-metadata but I don't know how to use this.
> Will using this to remove the mentioned 148d2 expected clone resolve
> this? Or would this do the opposite as it would seem that it can't
> find that clone?
> Documentation on this tool is sparse.
> 
> Any help here would be appreciated.
> 
> Regards,
> Rich


Re: [ceph-users] upgrade Hammer>Jewel>Luminous OSD fail to start

2017-09-12 Thread Lincoln Bryant
Did you set the sortbitwise flag, fix OSD ownership (or use the "setuser match 
path" option) and such after upgrading from Hammer to Jewel? I am not sure if 
that matters here, but it might help if you elaborate on your upgrade process a 
bit.
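
For reference, the usual Hammer -> Jewel follow-ups look roughly like this (a
sketch; ids and paths are placeholders):

    ceph osd set sortbitwise                         # required before moving past Jewel
    chown -R ceph:ceph /var/lib/ceph/osd/ceph-<id>   # Jewel+ daemons run as user "ceph"
    # ...or keep running as root by adding to ceph.conf:
    #   setuser match path = /var/lib/ceph/$type/$cluster-$id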

--Lincoln

> On Sep 12, 2017, at 2:22 PM, kevin parrikar  wrote:
> 
> Can someone please help me with this? I have no idea how to bring the 
> cluster back to an operational state.
> 
> Thanks,
> Kev
> 
> On Tue, Sep 12, 2017 at 11:12 AM, kevin parrikar  
> wrote:
> Hello all,
> I am trying to upgrade a small test setup with one monitor and one OSD node, 
> which is on the Hammer release.
> 
> 
> I updated from Hammer to Jewel using package update commands and things were 
> working.
> However, after updating from Jewel to Luminous, I am facing issues with OSDs 
> failing to start.
> 
> I upgraded packages on both nodes, and "ceph mon versions" shows the upgrade 
> was successful:
> 
>  ceph mon versions
> {
> "ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous 
> (rc)": 1
> }
> 
> but "ceph osd versions" returns an empty string:
> 
> ceph osd versions
> {}
> 
> 
> dpkg --list | grep ceph
> ii  ceph            12.2.0-1trusty  amd64  distributed storage and file system
> ii  ceph-base       12.2.0-1trusty  amd64  common ceph daemon libraries and management tools
> ii  ceph-common     12.2.0-1trusty  amd64  common utilities to mount and interact with a ceph storage cluster
> ii  ceph-deploy     1.5.38          all    Ceph-deploy is an easy to use configuration tool
> ii  ceph-mgr        12.2.0-1trusty  amd64  manager for the ceph distributed storage system
> ii  ceph-mon        12.2.0-1trusty  amd64  monitor server for the ceph storage system
> ii  ceph-osd        12.2.0-1trusty  amd64  OSD server for the ceph storage system
> ii  libcephfs1      10.2.9-1trusty  amd64  Ceph distributed file system client library
> ii  libcephfs2      12.2.0-1trusty  amd64  Ceph distributed file system client library
> ii  python-cephfs   12.2.0-1trusty  amd64  Python 2 libraries for the Ceph libcephfs library
> 
> from OSD log:
> 
> 2017-09-12 05:38:10.618023 7fc307a10d00  0 set uid:gid to 64045:64045 
> (ceph:ceph)
> 2017-09-12 05:38:10.618618 7fc307a10d00  0 ceph version 12.2.0 
> (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc), process (unknown), 
> pid 21513
> 2017-09-12 05:38:10.624473 7fc307a10d00  0 pidfile_write: ignore empty 
> --pid-file
> 2017-09-12 05:38:10.633099 7fc307a10d00  0 load: jerasure load: lrc load: isa
> 2017-09-12 05:38:10.633657 7fc307a10d00  0 
> filestore(/var/lib/ceph/osd/ceph-0) backend xfs (magic 0x58465342)
> 2017-09-12 05:38:10.635164 7fc307a10d00  0 
> filestore(/var/lib/ceph/osd/ceph-0) backend xfs (magic 0x58465342)
> 2017-09-12 05:38:10.637503 7fc307a10d00  0 
> genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: FIEMAP 
> ioctl is disabled via 'filestore fiemap' config option
> 2017-09-12 05:38:10.637833 7fc307a10d00  0 
> genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: 
> SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
> 2017-09-12 05:38:10.637923 7fc307a10d00  0 
> genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: splice() 
> is disabled via 'filestore splice' config option
> 2017-09-12 05:38:10.639047 7fc307a10d00  0 
> genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: syncfs(2) 
> syscall fully supported (by glibc and kernel)
> 2017-09-12 05:38:10.639501 7fc307a10d00  0 
> xfsfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_feature: extsize is 
> disabled by conf
> 2017-09-12 05:38:10.640417 7fc307a10d00  0 
> filestore(/var/lib/ceph/osd/ceph-0) start omap initiation
> 2017-09-12 05:38:10.640842 7fc307a10d00  1 leveldb: Recovering log #102
> 2017-09-12 05:38:10.642690 7fc307a10d00  1 leveldb: Delete type=0 #102
> 
> 2017-09-12 05:38:10.643128 7fc307a10d00  1 leveldb: Delete type=3 #101
> 
> 2017-09-12 05:38:10.649616 7fc307a10d00  0 
> filestore(/var/lib/ceph/osd/ceph-0) mount(1758): enabling WRITEAHEAD journal 
> mode: checkpoint is not enabled
> 2017-09-12 05:38:10.654071 7fc307a10d00 -1 journal FileJournal::_open: 
> disabling aio for non-block journal.  Use journal_force_aio to force use of 
> aio anyway
> 2017-09-12 05:38:10.654590 7fc307a10d00  1 journal _open 
> /var/lib/ceph/osd/ceph-0/journal fd 28: 2147483648 

Re: [ceph-users] Inconsistent pgs with size_mismatch_oi

2017-08-08 Thread Lincoln Bryant
Hi all, 

Apologies for necromancing an old thread, but I was wondering if anyone had any 
more thoughts on this. We're running v10.2.9 now and still have 3 PGs 
exhibiting this behavior in our cache pool after scrubs, deep-scrubs, and 
repair attempts. Some more information below.
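
For completeness, the per-PG attempts were along these lines (a sketch, using
the pg id from the output below):

    ceph pg deep-scrub 36.14f0
    ceph pg repair 36.14f0
    rados list-inconsistent-obj 36.14f0 | jq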

Thanks much,
Lincoln


[1]

# rados list-inconsistent-obj 36.14f0 | jq
{
  "epoch": 820795,
  "inconsistents": [
{
  "object": {
"name": "1002378e2a6.0001",
"nspace": "",
"locator": "",
"snap": "head",
"version": 2251698
  },
  "errors": [],
  "union_shard_errors": [
"size_mismatch_oi"
  ],
  "selected_object_info": 
"36:0f29a1d4:::1002378e2a6.0001:head(737930'2208087 
client.36346283.1:5757188 dirty
 s 4136960 uv 2251698 alloc_hint [0 0])",
  "shards": [
{
  "osd": 173,
  "errors": [
"size_mismatch_oi"
  ],
  "size": 0
},
{
  "osd": 242,
  "errors": [
"size_mismatch_oi"
  ],
  "size": 0
},
{
  "osd": 295,
  "errors": [
"size_mismatch_oi"
  ],
  "size": 0
}
  ]
}
  ]
}

2017-08-08 13:26:23.243245 7fafac78a700 -1 log_channel(cluster) log [ERR] : 
36.2c85 shard 212 missing 36:a13626c6:::1002378e9a9.0001:head
2017-08-08 13:26:23.243250 7fafac78a700 -1 log_channel(cluster) log [ERR] : 
36.2c85 shard 295: soid 36:a13626c6:::1002378e9a9.0001:head size 0 != size 
4173824 from auth oi 36:a13626c6:::1002378e9a9.0001:head(737930'2123468 
client.36346283.1:5782375 dirty s 4173824 uv 2164627 alloc_hint [0 0])
2017-08-08 13:26:23.243253 7fafac78a700 -1 log_channel(cluster) log [ERR] : 
36.2c85 shard 353 missing 36:a13626c6:::1002378e9a9.0001:head
2017-08-08 13:26:23.243255 7fafac78a700 -1 log_channel(cluster) log [ERR] : 
36.2c85 soid 36:a13626c6:::1002378e9a9.0001:head: failed to pick suitable 
auth object
2017-08-08 13:26:23.243362 7fafac78a700 -1 log_channel(cluster) log [ERR] : 
scrub 36.2c85 36:a13626c6:::1002378e9a9.0001:head on disk size (0) does not 
match object info size (4173824) adjusted for ondisk to (4173824)
2017-08-08 13:26:34.310237 7fafac78a700 -1 log_channel(cluster) log [ERR] : 
36.2c85 scrub 4 errors

> On May 15, 2017, at 5:28 PM, Gregory Farnum <gfar...@redhat.com> wrote:
> 
> 
> 
> On Mon, May 15, 2017 at 3:19 PM Lincoln Bryant <linco...@uchicago.edu> wrote:
> Hi Greg,
> 
> Curiously, some of these scrub errors went away on their own. The example pg 
> in the original post is now active+clean, and nothing interesting in the logs:
> 
> # zgrep "36.277b" ceph-osd.244*gz
> ceph-osd.244.log-20170510.gz:2017-05-09 06:56:40.739855 7f0184623700  0 
> log_channel(cluster) log [INF] : 36.277b scrub starts
> ceph-osd.244.log-20170510.gz:2017-05-09 06:58:01.872484 7f0186e28700  0 
> log_channel(cluster) log [INF] : 36.277b scrub ok
> ceph-osd.244.log-20170511.gz:2017-05-10 20:40:47.536974 7f0186e28700  0 
> log_channel(cluster) log [INF] : 36.277b scrub starts
> ceph-osd.244.log-20170511.gz:2017-05-10 20:41:38.399614 7f0184623700  0 
> log_channel(cluster) log [INF] : 36.277b scrub ok
> ceph-osd.244.log-20170514.gz:2017-05-13 20:49:47.063789 7f0186e28700  0 
> log_channel(cluster) log [INF] : 36.277b scrub starts
> ceph-osd.244.log-20170514.gz:2017-05-13 20:50:42.085718 7f0186e28700  0 
> log_channel(cluster) log [INF] : 36.277b scrub ok
> ceph-osd.244.log-20170515.gz:2017-05-15 00:10:39.417578 7f0184623700  0 
> log_channel(cluster) log [INF] : 36.277b scrub starts
> ceph-osd.244.log-20170515.gz:2017-05-15 00:11:26.189777 7f0186e28700  0 
> log_channel(cluster) log [INF] : 36.277b scrub ok
> 
> (No matches in the logs for osd 175 and osd 297  — perhaps already rotated 
> away?)
> 
> Other PGs still exhibit this behavior though:
> 
> # rados list-inconsistent-obj 36.2953 | jq .
> {
>   "epoch": 737940,
>   "inconsistents": [
> {
>   "object": {
> "name": "1002378da6c.0001",
> "nspace": "",
> "locator": "",
> "snap": "head",
> "version": 2213621
>   },
>   "errors": [],
>   "union_shard_errors": [
> "size_mismatch_oi"
>   ],
>   "selected_object_info": 
> "36:ca95a23b:::1002378da6c.0001:head(737930'2177823 
> clie

Re: [ceph-users] Long OSD restart after upgrade to 10.2.9

2017-07-17 Thread Lincoln Bryant

Hi Anton,

We observe something similar on our OSDs going from 10.2.7 to 10.2.9 
(see thread "some OSDs stuck down after 10.2.7 -> 10.2.9 update"). Some 
of our OSDs are not working at all on 10.2.9 or die with suicide 
timeouts. Those that come up/in take a very long time to boot up. Seems 
to not affect every OSD in our case though.
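
A few ways we poked at a slow-starting OSD (a sketch; the pid and id are
placeholders, and the admin socket only answers once the daemon is far enough
along):

    top -H -p <osd-pid>                                   # which threads are busy
    strace -f -p <osd-pid>                                # what it is doing (LevelDB reads, in our case)
    ceph daemon osd.<id> config show | grep -i leveldb    # effective compaction settings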


--Lincoln

On 7/17/2017 1:29 AM, Anton Dmitriev wrote:
During start it consumes ~90% CPU; strace shows that the OSD process is 
doing something with LevelDB.

Compact is disabled:
r...@storage07.main01.ceph.apps.prod.int.grcc:~$ cat 
/etc/ceph/ceph.conf | grep compact

#leveldb_compact_on_mount = true

But with debug_leveldb=20 I see that compaction is running. Why?

2017-07-17 09:27:37.394008 7f4ed2293700  1 leveldb: Compacting 1@1 + 
12@2 files
2017-07-17 09:27:37.593890 7f4ed2293700  1 leveldb: Generated table 
#76778: 277817 keys, 2125970 bytes
2017-07-17 09:27:37.718954 7f4ed2293700  1 leveldb: Generated table 
#76779: 221451 keys, 2124338 bytes
2017-07-17 09:27:37.777362 7f4ed2293700  1 leveldb: Generated table 
#76780: 63755 keys, 809913 bytes
2017-07-17 09:27:37.919094 7f4ed2293700  1 leveldb: Generated table 
#76781: 231475 keys, 2026376 bytes
2017-07-17 09:27:38.035906 7f4ed2293700  1 leveldb: Generated table 
#76782: 190956 keys, 1573332 bytes
2017-07-17 09:27:38.127597 7f4ed2293700  1 leveldb: Generated table 
#76783: 148675 keys, 1260956 bytes
2017-07-17 09:27:38.286183 7f4ed2293700  1 leveldb: Generated table 
#76784: 294105 keys, 2123438 bytes
2017-07-17 09:27:38.469562 7f4ed2293700  1 leveldb: Generated table 
#76785: 299617 keys, 2124267 bytes
2017-07-17 09:27:38.619666 7f4ed2293700  1 leveldb: Generated table 
#76786: 277305 keys, 2124936 bytes
2017-07-17 09:27:38.711423 7f4ed2293700  1 leveldb: Generated table 
#76787: 110536 keys, 951545 bytes
2017-07-17 09:27:38.869917 7f4ed2293700  1 leveldb: Generated table 
#76788: 296199 keys, 2123506 bytes
2017-07-17 09:27:39.028395 7f4ed2293700  1 leveldb: Generated table 
#76789: 248634 keys, 2096715 bytes
2017-07-17 09:27:39.028414 7f4ed2293700  1 leveldb: Compacted 1@1 + 
12@2 files => 21465292 bytes
2017-07-17 09:27:39.053288 7f4ed2293700  1 leveldb: compacted to: 
files[ 0 0 48 549 948 0 0 ]

2017-07-17 09:27:39.054014 7f4ed2293700  1 leveldb: Delete type=2 #76741

Strace:

open("/var/lib/ceph/osd/ceph-195/current/omap/043788.ldb", O_RDONLY) = 18
stat("/var/lib/ceph/osd/ceph-195/current/omap/043788.ldb", 
{st_mode=S_IFREG|0644, st_size=2154394, ...}) = 0

mmap(NULL, 2154394, PROT_READ, MAP_SHARED, 18, 0) = 0x7f96a67a
close(18)   = 0
brk(0x55d15664) = 0x55d15664
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, 

[ceph-users] some OSDs stuck down after 10.2.7 -> 10.2.9 update

2017-07-15 Thread Lincoln Bryant

Hi all,

After updating to 10.2.9, some of our SSD-based OSDs get put into "down" 
state and die as in [1].


After bringing these OSDs back up, they sit at 100% CPU utilization and 
never become up/in. From the log I see (from [2]):
heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f1cfad0d700' had 
timed out after 1

before they ultimately crash.

Stracing them, I see them chewing on omaps for a while and then they 
seem to do nothing, but CPU utilization is still quite high.


I downgraded (inadvisable, I know) these OSDs to 10.2.7 and they come 
back happily.  I tried setting debug_osd = 20, debug_filestore = 20, 
debug_ms = 20, debug_auth = 20, debug_leveldb = 20 but it didn't seem 
like there was any additional information in the logs.
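
(For reference, those debug values can also be applied at runtime, roughly like
this; a sketch only, the OSD has to be reachable and the id is a placeholder:

    ceph tell osd.<id> injectargs '--debug_osd 20/20 --debug_filestore 20/20 --debug_leveldb 20/20'

or the same values via "ceph daemon osd.<id> config set ..." on the OSD host.)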


Does anyone have any clues how to debug this further? I'm a bit worried 
about running a mix of 10.2.7 and 10.2.9 OSDs in my pool.


For what it's worth, the SSD OSDs in this CRUSH root are serving CephFS 
metadata. Other OSDs (spinners in EC and replicated pools) are 
completely OK as far as I can tell. All hosts are EL7.


Thanks,
Lincoln

[1]
-8> 2017-07-15 13:21:51.959502 7f9d23a2a700  1 -- 
192.170.226.253:0/2474101 <== osd.456 192.170.226.250:6807/3547149 1293 
 osd_ping(ping_reply e818277 stamp 2017-07-15 13:21:51.958432) v2 
 47+0+0 (584190599 0 0) 0x7f9dd6a93000

 con 0x7f9dcf4d2300
-7> 2017-07-15 13:21:51.959578 7f9d2b26b700  1 -- 
192.170.226.253:0/2474101 <== osd.461 192.170.226.255:6814/4575940 1295 
 osd_ping(ping_reply e818277 stamp 2017-07-15 13:21:51.958432) v2 
 47+0+0 (584190599 0 0) 0x7f9d9a1c9200

 con 0x7f9dc38fff80
-6> 2017-07-15 13:21:51.959597 7f9d2b46d700  1 -- 
192.170.226.253:0/2474101 <== osd.460 192.170.226.254:6851/2545858 1290 
 osd_ping(ping_reply e818277 stamp 2017-07-15 13:21:51.958432) v2 
 47+0+0 (584190599 0 0) 0x7f9d9a1c7600

 con 0x7f9dc3900a00
-5> 2017-07-15 13:21:51.959612 7f9d1e14f700  1 -- 
192.170.226.253:0/2474101 <== osd.434 192.170.226.242:6803/3058582 1293 
 osd_ping(ping_reply e818277 stamp 2017-07-15 13:21:51.958432) v2 
 47+0+0 (584190599 0 0) 0x7f9dc78c0800

 con 0x7f9d7aebae80
-4> 2017-07-15 13:21:51.959650 7f9d19792700  1 -- 
192.170.226.253:0/2474101 <== osd.437 192.170.226.245:6818/2299326 1277 
 osd_ping(ping_reply e818277 stamp 2017-07-15 13:21:51.958432) v2 
 47+0+0 (584190599 0 0) 0x7f9dc78c0200

 con 0x7f9dd0c0ba80
-3> 2017-07-15 13:21:51.959666 7f9d5d940700  1 -- 
192.170.226.253:0/2474101 <== osd.460 192.170.226.254:6849/2545858 1290 
 osd_ping(ping_reply e818277 stamp 2017-07-15 13:21:51.958432) v2 
 47+0+0 (584190599 0 0) 0x7f9d9a1c8200

 con 0x7f9dc38ff500
-2> 2017-07-15 13:21:52.085120 7f9d659a2700  1 heartbeat_map 
is_healthy 'OSD::osd_op_tp thread 0x7f9ce0504700' had timed out after 15
-1> 2017-07-15 13:21:52.085130 7f9d659a2700  1 heartbeat_map 
is_healthy 'OSD::osd_op_tp thread 0x7f9ce0504700' had suicide timed out 
after 150
 0> 2017-07-15 13:21:52.108248 7f9d659a2700 -1 
common/HeartbeatMap.cc: In function 'bool 
ceph::HeartbeatMap::_check(const ceph::heartbeat_handle_d*, const char*, 
time_t)' thread 7f9d659a2700 time 2017-07-15 13:21:52.085137

common/HeartbeatMap.cc: 86: FAILED assert(0 == "hit suicide timeout")

 ceph version 10.2.9 (2ee413f77150c0f375ff6f10edd6c8f9c7d060d0)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x85) [0x7f9d6bb0f4a5]
 2: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d const*, char 
const*, long)+0x2e1) [0x7f9d6ba4e541]

 3: (ceph::HeartbeatMap::is_healthy()+0xde) [0x7f9d6ba4ed9e]
 4: (ceph::HeartbeatMap::check_touch_file()+0x2c) [0x7f9d6ba4f57c]
 5: (CephContextServiceThread::entry()+0x15b) [0x7f9d6bb2724b]
 6: (()+0x7dc5) [0x7f9d69a26dc5]
 7: (clone()+0x6d) [0x7f9d680b173d]
 NOTE: a copy of the executable, or `objdump -rdS ` is 
needed to interpret this.





[2]

2017-07-15 14:35:23.730434 7f1d98bde800  0 ceph version 10.2.9 
(2ee413f77150c0f375ff6f10edd6c8f9c7d060d0), process ceph-osd, pid 2559209
2017-07-15 14:35:23.731923 7f1d98bde800  0 pidfile_write: ignore empty 
--pid-file
2017-07-15 14:35:23.772858 7f1d98bde800  0 
filestore(/var/lib/ceph/osd/ceph-459) backend xfs (magic 0x58465342)
2017-07-15 14:35:23.773367 7f1d98bde800  0 
genericfilestorebackend(/var/lib/ceph/osd/ceph-459) detect_features: 
FIEMAP ioctl is disabled via 'filestore fiemap' config option
2017-07-15 14:35:23.773374 7f1d98bde800  0 
genericfilestorebackend(/var/lib/ceph/osd/ceph-459) detect_features: 
SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
2017-07-15 14:35:23.773393 7f1d98bde800  0 
genericfilestorebackend(/var/lib/ceph/osd/ceph-459) detect_features: 
splice is supported
2017-07-15 14:35:24.148987 7f1d98bde800  0 
genericfilestorebackend(/var/lib/ceph/osd/ceph-459) detect_features: 
syncfs(2) syscall fully supported (by glibc and kernel)
2017-07-15 14:35:24.149090 7f1d98bde800  0 
xfsfilestorebackend(/var/lib/ceph/osd/ceph-459) 

Re: [ceph-users] Inconsistent pgs with size_mismatch_oi

2017-05-15 Thread Lincoln Bryant
Hi Greg,

Curiously, some of these scrub errors went away on their own. The example pg in 
the original post is now active+clean, and nothing interesting in the logs:

# zgrep "36.277b" ceph-osd.244*gz
ceph-osd.244.log-20170510.gz:2017-05-09 06:56:40.739855 7f0184623700  0 
log_channel(cluster) log [INF] : 36.277b scrub starts
ceph-osd.244.log-20170510.gz:2017-05-09 06:58:01.872484 7f0186e28700  0 
log_channel(cluster) log [INF] : 36.277b scrub ok
ceph-osd.244.log-20170511.gz:2017-05-10 20:40:47.536974 7f0186e28700  0 
log_channel(cluster) log [INF] : 36.277b scrub starts
ceph-osd.244.log-20170511.gz:2017-05-10 20:41:38.399614 7f0184623700  0 
log_channel(cluster) log [INF] : 36.277b scrub ok
ceph-osd.244.log-20170514.gz:2017-05-13 20:49:47.063789 7f0186e28700  0 
log_channel(cluster) log [INF] : 36.277b scrub starts
ceph-osd.244.log-20170514.gz:2017-05-13 20:50:42.085718 7f0186e28700  0 
log_channel(cluster) log [INF] : 36.277b scrub ok
ceph-osd.244.log-20170515.gz:2017-05-15 00:10:39.417578 7f0184623700  0 
log_channel(cluster) log [INF] : 36.277b scrub starts
ceph-osd.244.log-20170515.gz:2017-05-15 00:11:26.189777 7f0186e28700  0 
log_channel(cluster) log [INF] : 36.277b scrub ok

(No matches in the logs for osd 175 and osd 297  — perhaps already rotated 
away?)

Other PGs still exhibit this behavior though:

# rados list-inconsistent-obj 36.2953 | jq .
{
  "epoch": 737940,
  "inconsistents": [
{
  "object": {
"name": "1002378da6c.0001",
"nspace": "",
"locator": "",
"snap": "head",
"version": 2213621
  },
  "errors": [],
  "union_shard_errors": [
"size_mismatch_oi"
  ],
  "selected_object_info": 
"36:ca95a23b:::1002378da6c.0001:head(737930'2177823 
client.36346283.1:5635626 dirty s 4067328 uv 2213621)",
  "shards": [
{
  "osd": 113,
  "errors": [
"size_mismatch_oi"
  ],
  "size": 0
},
{
  "osd": 123,
  "errors": [
"size_mismatch_oi"
  ],
  "size": 0
},
{
  "osd": 173,
  "errors": [
"size_mismatch_oi"
  ],
  "size": 0
}
  ]
}
  ]
}

Perhaps new data being written to this pg cleared things up? 

The only other data point that I can add is that, due to some tweaking of the 
cache tier size before this happened, the cache tier was reporting near full / 
full in `ceph -s` for a brief amount of time (maybe <1hr ?). 

Thanks for looking into this.

--Lincoln

> On May 15, 2017, at 4:50 PM, Gregory Farnum <gfar...@redhat.com> wrote:
> 
> On Mon, May 1, 2017 at 9:28 AM, Lincoln Bryant <linco...@uchicago.edu> wrote:
>> Hi all,
>> 
>> I’ve run across a peculiar issue on 10.2.7. On my 3x replicated cache 
>> tiering cache pool, routine scrubbing suddenly found a bunch of PGs with 
>> size_mismatch_oi errors. From the “rados list-inconsistent-pg tool”[1], I 
>> see that all OSDs are reporting size 0 for a particular pg. I’ve checked 
>> this pg on disk, and it is indeed 0 bytes:
>>-rw-r--r--  1 root root0 Apr 29 06:12 
>> 100235614fe.0005__head_6E9A677B__24
>> 
>> I’ve tried re-issuing a scrub, which informs me that the object info size 
>> (2994176) doesn’t match the on-disk size (0) (see [2]). I’ve tried a repair 
>> operation as well to no avail.
>> 
>> For what it’s worth, this particular cluster is currently migrating several 
>> disks from one CRUSH root to another, and there is a nightly cache 
>> flush/eviction script that is lowering the cache_target_*_ratios before 
>> raising them again in the morning.
>> 
>> This issue is currently affecting ~10 PGs in my cache pool. Any ideas how to 
>> proceed here?
> 
> Did anything come from this? It's tickling my brain (especially with
> the cache pool) but I'm not seeing anything relevant when I search my
> email.
> 
>> 
>> Thanks,
>> Lincoln
>> 
>> [1]:
>> {
>>  "epoch": 721312,
>>  "inconsistents": [
>>{
>>  "object": {
>>"name": "100235614fe.0005",
>>"nspace": "",
>>"locator": "",
>>"snap": "head",
>>"version": 2233551
>>  },
>>  "errors": [],
>>  "union_shard_errors": [
>>"size_m

[ceph-users] Inconsistent pgs with size_mismatch_oi

2017-05-01 Thread Lincoln Bryant
Hi all,

I’ve run across a peculiar issue on 10.2.7. On my 3x replicated cache tiering 
cache pool, routine scrubbing suddenly found a bunch of PGs with 
size_mismatch_oi errors. From the “rados list-inconsistent-pg tool”[1], I see 
that all OSDs are reporting size 0 for a particular pg. I’ve checked this pg on 
disk, and it is indeed 0 bytes:
-rw-r--r--  1 root root0 Apr 29 06:12 
100235614fe.0005__head_6E9A677B__24

I’ve tried re-issuing a scrub, which informs me that the object info size 
(2994176) doesn’t match the on-disk size (0) (see [2]). I’ve tried a repair 
operation as well to no avail. 

For what it’s worth, this particular cluster is currently migrating several 
disks from one CRUSH root to another, and there is a nightly cache 
flush/eviction script that is lowering the cache_target_*_ratios before raising 
them again in the morning. 

This issue is currently affecting ~10 PGs in my cache pool. Any ideas how to 
proceed here? 

Thanks,
Lincoln

[1]:
{
  "epoch": 721312,
  "inconsistents": [
{
  "object": {
"name": "100235614fe.0005",
"nspace": "",
"locator": "",
"snap": "head",
"version": 2233551
  },
  "errors": [],
  "union_shard_errors": [
"size_mismatch_oi"
  ],
  "selected_object_info": 
"36:dee65976:::100235614fe.0005:head(737928'2182216 
client.36346283.1:5754260 dirty s 2994176 uv 2233551)",
  "shards": [
{
  "osd": 175,
  "errors": [
"size_mismatch_oi"
  ],
  "size": 0
},
{
  "osd": 244,
  "errors": [
"size_mismatch_oi"
  ],
  "size": 0
},
{
  "osd": 297,
  "errors": [
"size_mismatch_oi"
  ],
  "size": 0
}
  ]
}
  ]
}

[2]:
2017-05-01 10:50:13.812992 7f0184623700  0 log_channel(cluster) log [INF] : 
36.277b scrub starts
2017-05-01 10:51:02.495229 7f0186e28700 -1 log_channel(cluster) log [ERR] : 
36.277b shard 175: soid 36:dee65976:::100235614fe.0005:head size 0 != size 
2994176 from auth oi 36:dee65976:::100235614fe.0005:head(737928'2182216 
client.36346283.1:5754260 dirty s 2994176 uv 2233551)
2017-05-01 10:51:02.495234 7f0186e28700 -1 log_channel(cluster) log [ERR] : 
36.277b shard 244: soid 36:dee65976:::100235614fe.0005:head size 0 != size 
2994176 from auth oi 36:dee65976:::100235614fe.0005:head(737928'2182216 
client.36346283.1:5754260 dirty s 2994176 uv 2233551)
2017-05-01 10:51:02.495326 7f0186e28700 -1 log_channel(cluster) log [ERR] : 
36.277b shard 297: soid 36:dee65976:::100235614fe.0005:head size 0 != size 
2994176 from auth oi 36:dee65976:::100235614fe.0005:head(737928'2182216 
client.36346283.1:5754260 dirty s 2994176 uv 2233551)
2017-05-01 10:51:02.495328 7f0186e28700 -1 log_channel(cluster) log [ERR] : 
36.277b soid 36:dee65976:::100235614fe.0005:head: failed to pick suitable 
auth object
2017-05-01 10:51:02.495450 7f0186e28700 -1 log_channel(cluster) log [ERR] : 
scrub 36.277b 36:dee65976:::100235614fe.0005:head on disk size (0) does not 
match object info size (2994176) adjusted for ondisk to (2994176)
2017-05-01 10:51:20.223733 7f0184623700 -1 log_channel(cluster) log [ERR] : 
36.277b scrub 4 errors



Re: [ceph-users] Unable to boot OS on cluster node

2017-03-10 Thread Lincoln Bryant
Hi Shain,

As long as you don’t nuke the OSDs or the journals, you should be OK. I think 
the keyring and such are typically stored on the OSD itself. If you have lost 
track of what physical device maps to what OSD, you can always mount the OSDs 
in a temporary spot and cat the “whoami” file.
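
A sketch of that, with a placeholder device name:

    mkdir -p /mnt/tmposd
    mount /dev/sdX1 /mnt/tmposd
    cat /mnt/tmposd/whoami      # prints the OSD id, e.g. 12
    umount /mnt/tmposd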

—Lincoln

> On Mar 10, 2017, at 11:33 AM, Shain Miley  wrote:
> 
> Hello,
> 
> We had an issue with one of our Dell 720xd servers and now the raid card 
> cannot seem to boot from the Ubuntu OS drive volume.
> 
> I would like to know...if I reload the OS...is there an easy way to get the 
> 12 OSD's disks back into the cluster without just having to remove them from 
> the cluster, wipe the drives and then re-add them?
> 
> Right now I have the 'noout' and 'nodown' flags set on the cluster so there 
> has been no data movement yet as a result of this node being down.
> 
> Thanks in advance for any help.
> 
> Shain
> 
> 
> -- 
> NPR | Shain Miley | Manager of Infrastructure, Digital Media | smi...@npr.org 
> | 202.513.3649
> 


Re: [ceph-users] Server Down?

2016-10-12 Thread Lincoln Bryant
Hi Ashwin,

Seems the website is down. From another thread: 
http://www.dreamhoststatus.com/2016/10/11/dreamcompute-us-east-1-cluster-service-disruption/
 


I’ve been using the EU mirrors in the meanwhile: http://eu.ceph.com/ 
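
On RPM-based hosts that is roughly a one-liner (a sketch, assuming the stock
/etc/yum.repos.d/ceph.repo layout; the apt sources.list edit is analogous):

    sed -i 's|download.ceph.com|eu.ceph.com|g' /etc/yum.repos.d/ceph.repo
    yum clean metadata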


—Lincoln

> On Oct 12, 2016, at 4:15 PM, Ashwin Dev  wrote:
> 
> Hi,
> 
> I've been working on deploying ceph on a cluster. Looks like some of the main 
> repositories are down today - download.ceph.com. It's been down since morning.
> 
> Any idea what's happening? When can I expect it to be up?
> 
> Thanks!
> 
> -Ashwin


Re: [ceph-users] CephFS and calculation of directory size

2016-09-12 Thread Lincoln Bryant
Are you running ‘ls’ or are you doing something like: 'getfattr -d -m 
ceph.dir.* /path/to/your/ceph/mount’ ?
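
For reference, the recursive-stat xattrs look roughly like this (a sketch; the
mount point and directory are placeholders):

    getfattr -n ceph.dir.rbytes /mnt/cephfs/somedir     # recursive size in bytes
    getfattr -n ceph.dir.rfiles /mnt/cephfs/somedir     # recursive file count
    getfattr -d -m 'ceph.dir.*' /mnt/cephfs/somedir     # dump all of them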

—Lincoln

> On Sep 12, 2016, at 1:00 PM, Ilya Moldovan  wrote:
> 
> Thanks, John
> 
> But why does listing files in a directory with about a million files take
> about 30 minutes?
> 
> Ilya Moldovan
> 
> 2016-09-08 20:59 GMT+03:00, Ilya Moldovan :
>> Hello!
>> 
>> How does CephFS calculate the directory size? As far as I know, there are two
>> implementations:
>> 
>> 1. Recursive directory traversal, as in EXT4 and NTFS.
>> 2. Calculation of the directory size by the file system driver, saved as an
>> attribute. In this case, the driver catches file additions, deletions and
>> edits on the fly and updates the size of the directory, so there is no need
>> for recursive directory traversal.
>> 
>> The directory whose size we are requesting can potentially contain
>> thousands of files at different levels of nesting.
>> 
>> Our components will query the directory size using the POSIX API. The
>> number of calls for this attribute will be high, so recursive directory
>> traversal is not suitable for us.
>> 
>> Thanks for the answers!
>> 


Re: [ceph-users] Ceph 0.94.8 Hammer released

2016-08-30 Thread Lincoln Bryant
Hi all,

We are also interested in EL6 RPMs. My understanding was that EL6 would 
continue to be supported through Hammer. 

Is there anything we can do to help?

Thanks,
Lincoln

> On Aug 29, 2016, at 11:14 AM, Alex Litvak  
> wrote:
> 
> Hammer RPMs for 0.94.8 are still not available for  EL6.  Can this please be 
> addressed ?
> 
> Thank you in advance,
> 
> On 08/27/2016 06:25 PM, alexander.v.lit...@gmail.com wrote:
>> RPMs are not available at the distro side.
>> 
>> On Fri, 26 Aug 2016 21:31:45 + (UTC), Sage Weil
>>  wrote:
>> 
>>> This Hammer point release fixes several bugs.
>>> 
>>> We recommend that all hammer v0.94.x users upgrade.
>>> 
>>> For the changelog, please see
>>> 
>>> http://docs.ceph.com/docs/master/release-notes/#v0-94-8-hammer
>>> 
>>> Getting Ceph
>>> 
>>> 
>>> * Git at git://github.com/ceph/ceph.git
>>> * Tarball at http://download.ceph.com/tarballs/ceph-0.94.8.tar.gz
>>> * For packages, see http://ceph.com/docs/master/install/get-packages
>>> * For ceph-deploy, see 
>>> http://ceph.com/docs/master/install/install-ceph-deploy
> 
> 


Re: [ceph-users] Issues with CephFS

2016-06-18 Thread Lincoln Bryant

Hi,

Are there any messages in 'dmesg'? Are you running a recent kernel on 
your client?
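
Something along these lines on the client (a sketch):

    dmesg | grep -iE 'ceph|libceph' | tail -n 50    # any mount/session errors from the kernel client
    uname -r                                        # kernel version in use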


--Lincoln

On 6/18/2016 6:25 PM, ServerPoint wrote:

Hi Adam,

Thank you !

That worked.

So now I am testing another large cluster.

This is the Ceph status (I am using a public network, so I have masked the addresses with *):
--

root@admin:~/ceph-cluster# ceph -s
cluster 56b6fb46-dc51-4577-90cb-4b3882e82f5c
 health HEALTH_OK
 monmap e1: 3 mons at 
{monitor1=64.*.*.*:6789/0,monitor2=64.*.*.*:6789/0,monitor3=64.*.*.*:6789/0}

election epoch 6, quorum 0,1,2 monitor3,monitor2,monitor1
  fsmap e4: 1/1/1 up {0=monitor1=up:active}
 osdmap e207: 43 osds: 43 up, 43 in
flags sortbitwise
  pgmap v895: 1088 pgs, 3 pools, 2504 bytes data, 20 objects
1674 MB used, 79854 GB / 79855 GB avail
1088 active+clean
-


Now, the mount just hangs. There is no error and no output; the command 
simply hangs:

--
~# mount -t ceph 64.*.*.*:6789:/ /mnt/mycephfs -o 
name=admin,secret=AQCOx2VXDjR4LhAALkE0xDeBPbRtQtMK3svuvw==


--

Can you help me figure this out?

On 6/19/2016 3:28 AM, Adam Tygart wrote:

Responses inline.

On Sat, Jun 18, 2016 at 4:53 PM, ServerPoint 
 wrote:

Hi,

I am trying to setup a Ceph cluster and mount it as CephFS

These are the steps that I followed :
-
ceph-deploy new mon
  ceph-deploy install admin mon node2 node5 node6
  ceph-deploy mon create-initial
   ceph-deploy disk zap  node2:sdb node2:sdc node2:sdd
   ceph-deploy disk zap  node5:sdb node5:sdc node5:sdd
   ceph-deploy disk zap  node6:sdb node6:sdc node6:sdd
   ceph-deploy osd prepare node2:sdb node2:sdc node2:sdd
   ceph-deploy osd prepare node5:sdb node5:sdc node5:sdd
   ceph-deploy osd prepare node6:sdb node6:sdc node6:sdd
   ceph-deploy osd activate node2:/dev/sdb1 node2:/dev/sdc1 
node2:/dev/sdd1
ceph-deploy osd activate node5:/dev/sdb1  node5:/dev/sdc1 
node5:/dev/sdd1
ceph-deploy osd activate node6:/dev/sdb1  node6:/dev/sdc1 
node6:/dev/sdd1

ceph-deploy admin admin mon node2 node5 node6

ceph-deploy mds create mon
ceph osd pool create cephfs_data 100
   ceph osd pool create cephfs_metadata 100
   ceph fs new cephfs cephfs_metadata cephfs_data
--

Health of Cluster is Ok

root@admin:~/ceph-cluster# ceph -s
 cluster 5dfaa36a-45b8-47a2-85c4-3f06f53bcd03
  health HEALTH_OK
  monmap e1: 1 mons at {mon=10.10.0.122:6789/0}

Monitor at 10.10.0.122...


 election epoch 5, quorum 0 mon
   fsmap e15: 1/1/1 up {0=mon=up:active}
  osdmap e60: 9 osds: 9 up, 9 in
 flags sortbitwise
   pgmap v252: 264 pgs, 3 pools, 2068 bytes data, 20 objects
 309 MB used, 3976 GB / 3977 GB avail
  264 active+clean
  -


I then installed ceph on another server to make it as client.
But I am getting the below error while mounting it.

root@node9:~#  mount -t ceph 10.10.0.121:6789:/ /mnt/mycephfs -o
name=admin,secret=AQDhlGVXDhnoGxAAsX7HcOxbrWpSUpSuOTNWBg==
mount: Connection timed out



Mount trying to talk to 10.10.0.121 (mds-server?). The monitors are
the initial point of contact for anything ceph. They will tell the
client where everything else lives.

I tried restarting all the services but with no success. I am stuck 
here.

Please help.

Thanks in advance!




Re: [ceph-users] Troubleshoot blocked OSDs

2016-04-28 Thread Lincoln Bryant
OK, a few more questions.

What does the load look like on the OSDs with ‘iostat’ during the rsync?

What version of Ceph? Are you using RBD, CephFS, something else? 

SSD journals or no?
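
Concretely, something like this while the rsync is running (a sketch; the osd
id is a placeholder):

    iostat -x 5                              # per-disk utilization/await on the OSD hosts
    ceph daemon osd.<id> dump_historic_ops   # recent slow ops with per-stage timings (on the OSD host)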

—Lincoln

> On Apr 28, 2016, at 2:53 PM, Andrus, Brian Contractor <bdand...@nps.edu> 
> wrote:
> 
> Lincoln,
>  
> That was the odd thing to me. Ceph health detail listed all 4 OSDs, so I 
> checked all the systems.
> I have since let it settle until it was OK again and then started again. Within 
> a couple of minutes, it started showing blocked requests, and they are indeed 
> on all 4 OSDs.
>  
> Brian Andrus
> ITACS/Research Computing
> Naval Postgraduate School
> Monterey, California
> voice: 831-656-6238
>  
>  
> From: Lincoln Bryant [mailto:linco...@uchicago.edu] 
> Sent: Thursday, April 28, 2016 12:31 PM
> To: Andrus, Brian Contractor
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Troubleshoot blocked OSDs
>  
> Hi Brian,
>  
> The first thing you can do is “ceph health detail”, which should give you 
> some more information about which OSD(s) have blocked requests.
>  
> If it’s isolated to one OSD in particular, perhaps use iostat to check 
> utilization and/or smartctl to check health. 
>  
> —Lincoln
>  
> On Apr 28, 2016, at 2:26 PM, Andrus, Brian Contractor <bdand...@nps.edu> wrote:
>  
> All,
>  
> I have a small ceph cluster with 4 OSDs and 3 MONs on 4 systems.
> I was rsyncing about 50TB of files and things get very slow. To the point I 
> stopped the rsync, but even with everything stopped, I see:
>  
> health HEALTH_WARN
> 80 requests are blocked > 32 sec
>  
> The number was as high as 218, but they seem to be draining down.
> I see no issues on any of the systems, CPU load is low, memory usage is low.
>  
> How do I go about finding why a request is blocked for so long? These have 
> been hitting >500 seconds for block time.
>  
> Brian Andrus
> ITACS/Research Computing
> Naval Postgraduate School
> Monterey, California
> voice: 831-656-6238
>  


Re: [ceph-users] Troubleshoot blocked OSDs

2016-04-28 Thread Lincoln Bryant
Hi Brian,

The first thing you can do is “ceph health detail”, which should give you some 
more information about which OSD(s) have blocked requests.

If it’s isolated to one OSD in particular, perhaps use iostat to check 
utilization and/or smartctl to check health. 
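
Roughly (a sketch; the device name is a placeholder):

    ceph health detail | grep -i blocked    # which OSDs carry the blocked requests
    iostat -x 5                             # utilization/await on the suspect host
    smartctl -a /dev/sdX                    # SMART health of the suspect disk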

—Lincoln

> On Apr 28, 2016, at 2:26 PM, Andrus, Brian Contractor  
> wrote:
> 
> All,
>  
> I have a small ceph cluster with 4 OSDs and 3 MONs on 4 systems.
> I was rsyncing about 50TB of files and things get very slow. To the point I 
> stopped the rsync, but even with everything stopped, I see:
>  
> health HEALTH_WARN
> 80 requests are blocked > 32 sec
>  
> The number was as high as 218, but they seem to be draining down.
> I see no issues on any of the systems, CPU load is low, memory usage is low.
>  
> How do I go about finding why a request is blocked for so long? These have 
> been hitting >500 seconds for block time.
>  
> Brian Andrus
> ITACS/Research Computing
> Naval Postgraduate School
> Monterey, California
> voice: 831-656-6238
>  


Re: [ceph-users] CEPHFS file or directories disappear when ls (metadata problem)

2016-03-23 Thread Lincoln Bryant
Hi, 

If you are using the kernel client, I would suggest trying something newer than 
3.10.x. I ran into this issue in the past, but it was fixed by updating my 
kernel to something newer. You may want to check the OS recommendations page as 
well: http://docs.ceph.com/docs/master/start/os-recommendations/ 


ELRepo maintains mainline RPMs for EL6 and EL7: http://elrepo.org/tiki/kernel-ml

Alternatively, you could try the FUSE client.
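
Rough sketches of both options (package and repo names per the ELRepo docs, and
the monitor address is a placeholder):

    yum --enablerepo=elrepo-kernel install kernel-ml    # newer mainline kernel, then reboot into it
    ceph-fuse -m <mon-host>:6789 /mnt/cephfs            # or mount with the FUSE client instead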

—Lincoln

> On Mar 23, 2016, at 11:12 AM, FaHui Lin  wrote:
> 
> Dear Ceph experts,
> 
> We meet a nasty problem with our CephFS from time to time:
> 
> When we try to list a directory under CephFS, some files or directories do 
> not show up. For example:
> 
> This is the complete directory content:
> # ll /cephfs/ies/home/mika
> drwxr-xr-x 1 10035 11 1559018781 Feb  2 07:43 dir-A
> drwxr-xr-x 1 10035 119061906 Apr 15  2015 dir-B
> -rw-r--r-- 1 10035 11  130750361 Aug  6  2015 file-1
> -rw-r--r-- 1 10035 11   72640608 Apr 15  2015 file-2
> 
> But sometimes we get only part of files/directories when listing, say:
> # ll /cephfs/ies/home/mika
> drwxr-xr-x 1 10035 11 1559018781 Feb  2 07:43 dir-A
> -rw-r--r-- 1 10035 11   72640608 Apr 15  2015 file-2
> Here dir-B and file-1 are missing.
> 
> We found the files themselves are still intact, since we can still see them on 
> another node mounting the same cephfs, or just at another time. So we think 
> this is a metadata problem.
> 
> One thing we found interesting(?) is that remounting cephfs or restarting the 
> MDS service will NOT help, but creating a new file under the directory may help:
> 
> # ll /cephfs/ies/home/mika
> drwxr-xr-x 1 10035 11 1559018781 Feb  2 07:43 dir-A
> -rw-r--r-- 1 10035 11   72640608 Apr 15  2015 file-2
> # touch /cephfs/ies/home/mika/file-tmp
> # ll /cephfs/ies/home/mika
> drwxr-xr-x 1 10035 11 1559018781 Feb  2 07:43 dir-A
> drwxr-xr-x 1 10035 119061906 Apr 15  2015 dir-B
> -rw-r--r-- 1 10035 11  130750361 Aug  6  2015 file-1
> -rw-r--r-- 1 10035 11   72640608 Apr 15  2015 file-2
> -rw-r--r-- 1 root  root0 Mar 23 15:34 file-tmp
> 
> 
> Strangely, when this happens, ceph cluster health usually shows HEALTH_OK, 
> and there's no significant errors in MDS or other service logs.
> 
> One thing we tried is increasing the MDS mds_cache_size to 160 
> (16x the default value), which does help to alleviate warnings like "mds0: Client 
> failing to respond to cache pressure", but still does not solve the missing 
> file metadata problem.
> 
> Here's our ceph server info:
> 
> # ceph -s
> cluster d15a2cdb-354c-4bcd-a246-23521f1a7122
>  health HEALTH_OK
>  monmap e1: 3 mons at 
> {as-ceph01=117.103.102.128:6789/0,as-ceph02=117.103.103.93:6789/0,as-ceph03=117.103.109.124:6789/0}
> election epoch 6, quorum 0,1,2 as-ceph01,as-ceph02,as-ceph03
>  mdsmap e144: 1/1/1 up {0=as-ceph02=up:active}, 1 up:standby
>  osdmap e178: 10 osds: 10 up, 10 in
> flags sortbitwise
>   pgmap v105168: 256 pgs, 4 pools, 505 GB data, 1925 kobjects
> 1083 GB used, 399 TB / 400 TB avail
>  256 active+clean
>   client io 614 B/s rd, 0 op/s
> 
> # ceph --version
> ceph version 9.2.1 (752b6a3020c3de74e07d2a8b4c5e48dab5a6b6fd)
> 
> (We also met the same problem on Hammer release)
> 
> # uname -r
> 3.10.0-327.10.1.el7.x86_64
> 
> We're using centOS7 servers.
> 
> # ceph daemon mds.as-ceph02 perf dump
> {
> "mds": {
> "request": 76066,
> "reply": 76066,
> "reply_latency": {
> "avgcount": 76066,
> "sum": 61.151796797
> },
> "forward": 0,
> "dir_fetch": 1050,
> "dir_commit": 1017,
> "dir_split": 0,
> "inode_max": 160,
> "inodes": 130657,
> "inodes_top": 110882,
> "inodes_bottom": 19775,
> "inodes_pin_tail": 0,
> "inodes_pinned": 99670,
> "inodes_expired": 0,
> "inodes_with_caps": 99606,
> "caps": 105119,
> "subtrees": 2,
> "traverse": 81583,
> "traverse_hit": 74090,
> "traverse_forward": 0,
> "traverse_discover": 0,
> "traverse_dir_fetch": 24,
> "traverse_remote_ino": 0,
> "traverse_lock": 80,
> "load_cent": 7606600,
> "q": 0,
> "exported": 0,
> "exported_inodes": 0,
> "imported": 0,
> "imported_inodes": 0
> },
> "mds_cache": {
> "num_strays": 120,
> "num_strays_purging": 0,
> "num_strays_delayed": 0,
> "num_purge_ops": 0,
> "strays_created": 17276,
> "strays_purged": 17155,
> "strays_reintegrated": 1,
> "strays_migrated": 0,
> "num_recovering_processing": 0,
> "num_recovering_enqueued": 0,
> "num_recovering_prioritized": 0,
> "recovery_started": 0,
>   

Re: [ceph-users] CEPH FS - all_squash option equivalent

2016-03-03 Thread Lincoln Bryant
Also very interested in this if there are any docs available!

--Lincoln

> On Mar 3, 2016, at 1:04 PM, Fred Rolland  wrote:
> 
> Can you share a link describing the UID squashing feature?
> 
> On Mar 3, 2016 9:02 PM, "Gregory Farnum"  wrote:
> On Wed, Mar 2, 2016 at 11:22 PM, Fred Rolland  wrote:
> > Thanks for your reply.
> >
> > Server :
> > [root@ceph-1 ~]# rpm -qa | grep ceph
> > ceph-mon-0.94.1-13.el7cp.x86_64
> 
> That would be a Hammer release. Nothing there for doing anything with
> permission checks at all.
> -Greg
> 
> > ceph-radosgw-0.94.1-13.el7cp.x86_64
> > ceph-0.94.1-13.el7cp.x86_64
> > ceph-osd-0.94.1-13.el7cp.x86_64
> > ceph-deploy-1.5.25-1.el7cp.noarch
> > ceph-common-0.94.1-13.el7cp.x86_64
> > [root@ceph-1 ~]# uname -a
> > Linux ceph-1.qa.lab.tlv.redhat.com 3.10.0-327.el7.x86_64 #1 SMP Thu Oct 29
> > 17:29:29 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux
> >
> > Client:
> > [root@RHEL7 ~]# rpm -qa | grep ceph
> > ceph-fuse-0.94.6-0.el7.x86_64
> > python-cephfs-0.94.6-0.el7.x86_64
> > libcephfs1-0.94.6-0.el7.x86_64
> > ceph-common-0.94.6-0.el7.x86_64
> > ceph-0.94.6-0.el7.x86_64
> >
> > [root@RHEL7 ~]# uname -a
> > Linux RHEL7.1Server 3.10.0-229.26.1.el7.x86_64 #1 SMP Fri Dec 11 16:53:27
> > EST 2015 x86_64 x86_64 x86_64 GNU/Linux
> >
> >
> > [root@RHEL7 ~]# su - sanlock -s /bin/bash
> > Last login: Wed Mar  2 14:06:34 IST 2016 on pts/0
> > -bash-4.2$ whoami
> > sanlock
> > -bash-4.2$ touch /rhev/data-center/mnt/ceph-1.qa.lab\:6789\:_/test
> > touch: cannot touch ‘/rhev/data-center/mnt/ceph-1.qa.lab:6789:_/test’:
> > Permission denied
> >
> >
> > [root@RHEL7 ~]# su - vdsm -s /bin/bash
> > Last login: Wed Mar  2 12:19:11 IST 2016 on pts/1
> > -bash-4.2$ touch /rhev/data-center/mnt/ceph-1.qa.lab\:6789\:_/test
> > -bash-4.2$ rm /rhev/data-center/mnt/ceph-1.qa.lab\:6789\:_/test
> > -bash-4.2$
> >
> > Permissions of directory :
> > ll
> > total 0
> > drwxr-xr-x 1 vdsm kvm 0 Mar  2 14:08 
> >
> >
> >
> > On Wed, Mar 2, 2016 at 6:25 PM, Gregory Farnum  wrote:
> >>
> >> On Wed, Mar 2, 2016 at 4:21 AM, Fred Rolland  wrote:
> >> > Hi,
> >> >
> >> > I am trying to use CEPH FS in oVirt (RHEV).
> >> > The mount is created OK, however, the hypervisor need access to the
> >> > mount
> >> > from different users (eg: vdsm, sanlock)
> >> > It seems that Sanlock user is having permissions issues.
> >> >
> >> > When using NFS, configuring the export as all_squash and defining
> >> > anonuid/anongid will solve this problem [1].
> >> >
> >> > Is there a possibility to configure in Ceph FS an equivalent to NFS
> >> > all_squash/anonuid/anongid ?
> >>
> >> What version of Ceph are you running? Newer versions have added a
> >> security model and include *some* UID squashing features, but prior to
> >> Infernalis, CephFS didn't do any security checking at all (it was all
> >> client-side in the standard VFS).
> >> -Greg
> >
> >


Re: [ceph-users] State of nfs-ganesha CEPH fsal

2015-10-28 Thread Lincoln Bryant
Hi Dennis,

We're using NFS Ganesha here as well. I can send you my configuration which is 
working but we squash users and groups down to a particular uid/gid, so it may 
not be super helpful for you.

I think files not being immediately visible is working as intended, due to 
directory caching. I _believe_ what you need to do is set the following 
(comments shamelessly stolen from the Gluster FSAL):
# If this flag is set to yes, a getattr is performed each time a readdir is done;
# if mtimes do not match, the directory is renewed. This will make the cache more
# synchronous to the FSAL, but will strongly decrease directory cache performance.
Use_Getattr_Directory_Invalidation = true;
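
For context, a bare-bones export stanza for the Ceph FSAL tends to look something 
like the sketch below. Treat it as an illustration only -- the export ID, paths and 
block layout are placeholders, and option names shift a bit between Ganesha releases:

EXPORT
{
    Export_ID = 1;                   # arbitrary, just needs to be unique
    Path = "/";                      # CephFS path to export
    Pseudo = "/cephfs";              # NFSv4 pseudo-root
    Access_Type = RW;
    Squash = No_Root_Squash;         # or All_Squash plus Anonymous_Uid/Gid
    FSAL {
        Name = CEPH;
    }
}

CACHEINODE
{
    # the coherency-vs-performance trade-off mentioned above
    Use_Getattr_Directory_Invalidation = true;
}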

Hope that helps.

Thanks,
Lincoln

> On Oct 28, 2015, at 9:08 AM, Dennis Kramer (DT)  wrote:
> 
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
> 
> Sorry for raising this topic from the dead, but i'm having the same
> issues with NFS-GANESHA /w the wrong user/group information.
> 
> Do you maybe have a working ganesha.conf? I'm assuming I might
> mis-configured something in this file. It's also nice to have some
> reference config file from a working FSAL CEPH, the sample config is
> very minimalistic.
> 
> I also have another issue with files that are not immediately visible
> in a NFS folder after another system (using the same NFS) has created
> it. There seems to be a slight delay before all system have the same
> directory listing. This can be enforced by creating a *new* file in
> this directory which will cause a refresh on this folder. Changing
> directories also helps on affected system(s).
> 
> On 07/28/2015 11:30 AM, Haomai Wang wrote:
>> On Tue, Jul 28, 2015 at 5:28 PM, Burkhard Linke 
>>  wrote:
>>> Hi,
>>> 
>>> On 07/28/2015 11:08 AM, Haomai Wang wrote:
 
 On Tue, Jul 28, 2015 at 4:47 PM, Gregory Farnum
  wrote:
> 
> On Tue, Jul 28, 2015 at 8:01 AM, Burkhard Linke 
>  wrote:
>>> 
>>> 
>>> *snipsnap*
>> 
>> Can you give some details on that issues? I'm currently
>> looking for a way to provide NFS based access to CephFS to
>> our desktop machines.
> 
> Ummm...sadly I can't; we don't appear to have any tracker
> tickets and I'm not sure where the report went to. :( I think
> it was from Haomai...
 
 My fault, I should report this to ticket.
 
 I have forgotten the details about the problem, I submit the
 infos to IRC :-(
 
 It related to the "ls" output. It will print the wrong
 user/group owner as "-1", maybe related to root squash?
>>> 
>>> Are you sure this problem is related to the CephFS FSAL? I also
>>> had a hard time setting up ganesha correctly, especially with
>>> respect to user and group mappings, especially with a kerberized
>>> setup.
>>> 
>>> I'm currently running a small test setup with one server and one
>>> client to single out the last kerberos related problems
>>> (nfs-ganesha 2.2.0 / Ceph Hammer 0.94.2 / Ubuntu 14.04).
>>> User/group listings have been OK so far. Do you remember whether
>>> the problem occurs every time or just arbitrarily?
>>> 
>> 
>> Great!
>> 
>> I'm not sure the reason. I guess it may related to nfs-ganesha
>> version or client distro version.
>> 
>>> Best regards, Burkhard 
>>> ___ ceph-users
>>> mailing list ceph-users@lists.ceph.com 
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> 
>> 
>> 
> -BEGIN PGP SIGNATURE-
> Version: GnuPG v2.0.20 (GNU/Linux)
> 
> iEYEARECAAYFAlYw1vgACgkQiJDTKUBxIRsrMACggkb1IZw7od43s9AFUMznwP6M
> hW4AoJf2O11uM0F20TQwFJKPt76YcwhW
> =PKLQ
> -END PGP SIGNATURE-
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] why was osd pool default size changed from 2 to 3.

2015-10-26 Thread Lincoln Bryant
>but because there were only two copies it had no way to tell which one was 
>correct, and when I forced it to choose it often chose wrong.

Yeah. This is a BIG problem with only running with two copies. Good luck if 
your pgs ever get inconsistent :)
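
(For anyone wanting to move an existing pool up to three copies, it's just a pool 
setting -- the pool name here is only an example:)

  ceph osd pool set rbd size 3
  ceph osd pool set rbd min_size 2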

--Lincoln

> On Oct 26, 2015, at 10:41 AM, Quentin Hartman  
> wrote:
> 
> TL;DR - Running two copies in my cluster cost me a weekend, and many more 
> hours of productive time during normal working hours. Networking problems can 
> be just as destructive as disk problems. I only run 2 copies on throwaway 
> data.
> 
> So, I have personal experience in data loss when running only two copies. I 
> had a networking problem in my ceph cluster, and it took me a long time to 
> track it down because it was an intermittent thing that caused the node with 
> the faulty connection to not only get marked out by it's peers, but also 
> caused it to incorrectly mark out other nodes. It was a mess, that I made 
> worse by trying to force recovery before I really knew what the problem was 
> since it was so elusive.
> 
> In the end, the cluster tried to do recovery on PGs that had gotten degraded, 
> but because there were only two copies it had no way to tell which one was 
> correct, and when I forced it to choose it often chose wrong. All of the data 
> was VM images, so in the end, I ended up having small bits of random 
> corruption across almost all my VMs. It took me about 40 hours of work over a 
> weekend to get things recovered (onto spare desktop machines since I still 
> hadn't found the problem and didn't trust the cluster) and rebuilt to make 
> sure that people could work on monday, and I was cleaning up little bits of 
> leftover mess for weeks. Once I finally found and repaired the problem, it 
> was another several days worth of work to get the cluster rebuilt and the VMs 
> migrated back onto it. Never will I run only two copies on things I actually 
> care about ever again, regardless of the quality of the underlying disk 
> hardware. In my case, the disks were fine all along.
> 
> QH
> 
> On Sat, Oct 24, 2015 at 8:35 AM, Christian Balzer  wrote:
> 
> 
> Hello,
> 
> There have been COUNTLESS discussions about Ceph reliability, fault
> tolerance and so forth in this very ML.
> Google is very much evil, but in this case it is your friend.
> 
> In those threads you will find several reliability calculators, some more
> flawed than others, but penultimately you do not use a replica of 2 for
> the same reasons people don't use RAID5 for anything valuable.
> 
> A replication of 2 MAY be fine with very reliable, fast and not too large
> SSDs, but that's about it.
> Spinning rust is never safe with just one copy.
> 
> Christian
> 
> On Sat, 24 Oct 2015 09:41:35 +0200 Stefan Eriksson wrote:
> 
> > > Am 23.10.2015 um 20:53 schrieb Gregory Farnum:
> > >> On Fri, Oct 23, 2015 at 8:17 AM, Stefan Eriksson 
> > wrote:
> > >>
> > >> Nothing changed to make two copies less secure. 3 copies is just so
> > >> much more secure and is the number that all the companies providing
> > >> support recommend, so we changed the default.
> > >> (If you're using it for data you care about, you should really use 3
> > copies!)
> > >> -Greg
> > >
> > > I assume that number really depends on the (number of) OSDs you have in
> > your crush rule for that pool. A replication of
> > > 2 might be ok for a pool spread over 10 osds, but not for one spread
> > > over
> > 100 osds
> > >
> > > Corin
> > >
> >
> > I'm also interested in this, what changes when you add 100+ OSDs (to
> > warrant 3 replicas instead of 2), and the reasoning as to why "the
> > companies providing support recommend 3." ?
> > Theoretically it seems secure to have two replicas.
> > If you have 100+ OSDs, I can see that maintenance will take much longer,
> > and if you use "set noout" then a single PG will be active when the other
> > replica is under maintenance.
> > But if you "crush reweight to 0" before the maintenance this would not be
> > an issue.
> > Is this the main reason?
> >
> > From what I can gather even if you add new OSDs to the cluster and the
> > balancing kicks in, it still maintains its two replicas.
> >
> > thanks.
> 
> 
> --
> Christian BalzerNetwork/Systems Engineer
> ch...@gol.com   Global OnLine Japan/Fusion Communications
> http://www.gol.com/
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS "corruption" -- Nulled bytes

2015-10-08 Thread Lincoln Bryant
Hi Sage,

Will this patch be in 0.94.4? We've got the same problem here.

-Lincoln

> On Oct 8, 2015, at 12:11 AM, Sage Weil  wrote:
> 
> On Wed, 7 Oct 2015, Adam Tygart wrote:
>> Does this patch fix files that have been corrupted in this manner?
> 
> Nope, it'll only prevent it from happening to new files (that haven't yet 
> been migrated between the cache and base tier).
> 
>> If not, or I guess even if it does, is there a way to walk the
>> metadata and data pools and find objects that are affected?
> 
> Hmm, this may actually do the trick.. find a file that appears to be 
> zeroed, and do truncate it up and then down again.  For example, of foo is 
> 100 bytes, do
> 
> truncate --size 101 foo
> truncate --size 100 foo
> 
> then unmount and remound the client and see if the content reappears.
> 
> Assuming that works (it did in my simple test) it'd be pretty easy to 
> write something that walks the tree and does the truncate trick for any 
> file whose first however many bytes are 0 (though it will mess up 
> mtime...).
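
(A rough sketch of the tree walk described above, in case it helps anyone -- 
untested, assumes a /mnt/cephfs mount point and uses "first 16 bytes all zero" as 
the heuristic for affected files; as noted, it will clobber mtimes:)

  # re-truncate any non-empty file whose first 16 bytes are all zero
  find /mnt/cephfs -type f -size +0c | while read -r f; do
      if ! head -c 16 "$f" | tr -d '\0' | grep -q .; then
          sz=$(stat -c %s "$f")
          truncate --size $((sz + 1)) "$f"
          truncate --size "$sz" "$f"
          echo "re-truncated: $f"
      fi
  done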
> 
>> Is that '_' xattr in hammer? If so, how can I access it? Doing a
>> listxattr on the inode just lists 'parent', and doing the same on the
>> parent directory's inode simply lists 'parent'.
> 
> This is the file in /var/lib/ceph/osd/ceph-NNN/current.  For example,
> 
> $ attr -l ./3.0_head/100.__head_F0B56F30__3
> Attribute "cephos.spill_out" has a 2 byte value for 
> ./3.0_head/100.__head_F0B56F30__3
> Attribute "cephos.seq" has a 23 byte value for 
> ./3.0_head/100.__head_F0B56F30__3
> Attribute "ceph._" has a 250 byte value for 
> ./3.0_head/100.__head_F0B56F30__3
> Attribute "ceph._@1" has a 5 byte value for 
> ./3.0_head/100.__head_F0B56F30__3
> Attribute "ceph.snapset" has a 31 byte value for 
> ./3.0_head/100.__head_F0B56F30__3
> 
> ...but hopefully you won't need to touch any of that ;)
> 
> sage
> 
> 
>> 
>> Thanks for your time.
>> 
>> --
>> Adam
>> 
>> 
>> On Mon, Oct 5, 2015 at 9:36 AM, Sage Weil  wrote:
>>> On Mon, 5 Oct 2015, Adam Tygart wrote:
 Okay, this has happened several more times. Always seems to be a small
 file that should be read-only (perhaps simultaneously) on many
 different clients. It is just through the cephfs interface that the
 files are corrupted, the objects in the cachepool and erasure coded
 pool are still correct. I am beginning to doubt these files are
 getting a truncation request.
>>> 
>>> This is still consistent with the #12551 bug.  The object data is correct,
>>> but the cephfs truncation metadata on the object is wrong, causing it to
>>> be implicitly zeroed out on read.  It's easily triggered by writers who
>>> use O_TRUNC on open...
>>> 
 Twice now have been different perl files, once was someones .bashrc,
 once was an input file for another application, timestamps on the
 files indicate that the files haven't been modified in weeks.
 
 Any other possibilites? Or any way to figure out what happened?
>>> 
>>> You can confirm by extracting the '_' xattr on the object (append any @1
>>> etc fragments) and feeding it to ceph-dencoder with
>>> 
>>> ceph-dencoder type object_info_t import  decode 
>>> dump_json
>>> 
>>> and confirming that truncate_seq is 0, and verifying that the truncate_seq
>>> on the read request is non-zero.. you'd need to turn up the osd logs with
>>> debug ms = 1 and look for the osd_op that looks like "read 0~$length
>>> [$truncate_seq@$truncate_size]" (with real values in there).
>>> 
>>> ...but it really sounds like you're hitting the bug.  Unfortunately
>>> the fix is not backported to hammer just yet.  You can follow
>>>http://tracker.ceph.com/issues/13034
>>> 
>>> sage
>>> 
>>> 
>>> 
 
 --
 Adam
 
 On Sun, Sep 27, 2015 at 10:44 PM, Adam Tygart  wrote:
> I've done some digging into cp and mv's semantics (from coreutils). If
> the inode is existing, the file will get truncated, then data will get
> copied in. This is definitely within the scope of the bug above.
> 
> --
> Adam
> 
> On Fri, Sep 25, 2015 at 8:08 PM, Adam Tygart  wrote:
>> It may have been. Although the timestamp on the file was almost a
>> month ago. The typical workflow for this particular file is to copy an
>> updated version overtop of it.
>> 
>> i.e. 'cp qss kstat'
>> 
>> I'm not sure if cp semantics would keep the same inode and simply
>> truncate/overwrite the contents, or if it would do an unlink and then
>> create a new file.
>> --
>> Adam
>> 
>> On Fri, Sep 25, 2015 at 8:00 PM, Ivo Jimenez  wrote:
>>> Looks like you might be experiencing this bug:
>>> 
>>>  http://tracker.ceph.com/issues/12551
>>> 
>>> Fix has been merged to master and I believe it'll be part of 
>>> infernalis. The

Re: [ceph-users] Ceph cluster NO read / write performance :: Ops are blocked

2015-09-17 Thread Lincoln Bryant
Hello again,

Well, I disabled offloads on the NIC -- didn’t work for me. I also tried 
setting net.ipv4.tcp_moderate_rcvbuf = 0 as suggested elsewhere in the thread 
to no avail.
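
(For anyone following along, "disabling offloads" here means something along these 
lines -- the interface name and the exact set of offloads are illustrative:)

  ethtool -K eth0 tso off gso off gro off lro off
  sysctl -w net.ipv4.tcp_moderate_rcvbuf=0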

Today I was watching iostat on an OSD box ('iostat -xm 5') when the cluster got 
into “slow” state:

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sdb               0.00    13.57   84.23  167.47     0.45     2.78    26.26     2.06    8.18   3.85  96.93
sdc               0.00    46.71    5.59  289.22     0.03     2.54    17.85     3.18   10.77   0.97  28.72
sdd               0.00    16.57   45.11   91.62     0.25     0.55    12.01     0.75    5.51   2.45  33.47
sde               0.00    13.57    6.99  143.31     0.03     2.53    34.97     1.99   13.27   2.12  31.86
sdf               0.00    18.76    4.99  158.48     0.10     1.09    14.88     1.26    7.69   1.24  20.26
sdg               0.00    25.55   81.64  237.52     0.44     2.89    21.36     4.14   12.99   2.58  82.22
sdh               0.00    89.42   16.17  492.42     0.09     3.81    15.69    17.12   33.66   0.73  36.95
sdi               0.00    20.16   17.76  189.62     0.10     1.67    17.46     3.45   16.63   1.57  32.55
sdj               0.00    31.54    0.00  185.23     0.00     1.91    21.15     3.33   18.00   0.03   0.62
sdk               0.00    26.15    2.40  133.33     0.01     0.84    12.79     1.07    7.87   0.85  11.58
sdl               0.00    25.55    9.38  123.95     0.05     1.15    18.44     0.50    3.74   1.58  21.10
sdm               0.00     6.39   92.61   47.11     0.47     0.26    10.65     1.27    9.07   6.92  96.73

The %util is rather high on some disks, but I’m not an expert at looking at 
iostat so I’m not sure how worrisome this is. Does anything here stand out to 
anyone? 

At the time of that iostat, Ceph was reporting a lot of blocked ops on the OSD 
associated with sde (as well as about 30 other OSDs), but it doesn’t look all 
that busy. Some simple ‘dd’ tests seem to indicate the disk is fine.

Similarly, iotop seems OK on this host:

    TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
 472477  be/4  root      0.00 B/s    5.59 M/s  0.00 %  0.57 %  ceph-osd -i 111 --pid-file /var/run/ceph/osd.111.pid -c /etc/ceph/ceph.conf --cluster ceph
 470621  be/4  root      0.00 B/s   10.09 M/s  0.00 %  0.40 %  ceph-osd -i 111 --pid-file /var/run/ceph/osd.111.pid -c /etc/ceph/ceph.conf --cluster ceph
3495447  be/4  root      0.00 B/s  272.19 K/s  0.00 %  0.36 %  ceph-osd -i 114 --pid-file /var/run/ceph/osd.114.pid -c /etc/ceph/ceph.conf --cluster ceph
3488389  be/4  root      0.00 B/s  596.80 K/s  0.00 %  0.16 %  ceph-osd -i 109 --pid-file /var/run/ceph/osd.109.pid -c /etc/ceph/ceph.conf --cluster ceph
3488060  be/4  root      0.00 B/s  600.83 K/s  0.00 %  0.15 %  ceph-osd -i 108 --pid-file /var/run/ceph/osd.108.pid -c /etc/ceph/ceph.conf --cluster ceph
3505573  be/4  root      0.00 B/s  528.25 K/s  0.00 %  0.10 %  ceph-osd -i 119 --pid-file /var/run/ceph/osd.119.pid -c /etc/ceph/ceph.conf --cluster ceph
3495434  be/4  root      0.00 B/s    2.02 K/s  0.00 %  0.10 %  ceph-osd -i 114 --pid-file /var/run/ceph/osd.114.pid -c /etc/ceph/ceph.conf --cluster ceph
3502327  be/4  root      0.00 B/s  506.07 K/s  0.00 %  0.09 %  ceph-osd -i 118 --pid-file /var/run/ceph/osd.118.pid -c /etc/ceph/ceph.conf --cluster ceph
3489100  be/4  root      0.00 B/s  106.86 K/s  0.00 %  0.09 %  ceph-osd -i 110 --pid-file /var/run/ceph/osd.110.pid -c /etc/ceph/ceph.conf --cluster ceph
3496631  be/4  root      0.00 B/s  229.85 K/s  0.00 %  0.05 %  ceph-osd -i 115 --pid-file /var/run/ceph/osd.115.pid -c /etc/ceph/ceph.conf --cluster ceph
3505561  be/4  root      0.00 B/s    2.02 K/s  0.00 %  0.03 %  ceph-osd -i 119 --pid-file /var/run/ceph/osd.119.pid -c /etc/ceph/ceph.conf --cluster ceph
3488059  be/4  root      0.00 B/s    2.02 K/s  0.00 %  0.03 %  ceph-osd -i 108 --pid-file /var/run/ceph/osd.108.pid -c /etc/ceph/ceph.conf --cluster ceph
3488391  be/4  root     46.37 K/s  431.47 K/s  0.00 %  0.02 %  ceph-osd -i 109 --pid-file /var/run/ceph/osd.109.pid -c /etc/ceph/ceph.conf --cluster ceph
3500639  be/4  root      0.00 B/s  221.78 K/s  0.00 %  0.02 %  ceph-osd -i 117 --pid-file /var/run/ceph/osd.117.pid -c /etc/ceph/ceph.conf --cluster ceph
3488392  be/4  root     34.28 K/s  185.49 K/s  0.00 %  0.02 %  ceph-osd -i 109 --pid-file /var/run/ceph/osd.109.pid -c /etc/ceph/ceph.conf --cluster ceph
3488062  be/4  root      4.03 K/s   66.54 K/s  0.00 %  0.02 %  ceph-osd -i 108 --pid-file /var/run/ceph/osd.108.pid -c /etc/ceph/ceph.conf --cluster ceph

These are all 6TB seagates in single-disk RAID 0 on a PERC H730 Mini controller.

I did try removing the disk with 20k non-medium errors, but that didn’t seem to 
help. 

Thanks for any insight!

Cheers,
Lincoln Bryant

> On Sep 9, 2015, at 1:09 PM, Lincoln Bryant <linco...@uchicago.edu> wrote:
> 
> Hi Jan,
> 
> I’ll take a loo

Re: [ceph-users] Ceph cluster NO read / write performance :: Ops are blocked

2015-09-17 Thread Lincoln Bryant
Hi Nick,

Thanks for responding. Yes, I am.

—Lincoln

> On Sep 17, 2015, at 11:53 AM, Nick Fisk <n...@fisk.me.uk> wrote:
> 
> You are getting a fair amount of reads on the disks whilst doing these 
> writes. You're not using cache tiering are you?
> 
>> -Original Message-
>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
>> Lincoln Bryant
>> Sent: 17 September 2015 17:42
>> To: ceph-users@lists.ceph.com
>> Subject: Re: [ceph-users] Ceph cluster NO read / write performance :: Ops
>> are blocked
>> 
>> Hello again,
>> 
>> Well, I disabled offloads on the NIC -- didn’t work for me. I also tried 
>> setting
>> net.ipv4.tcp_moderate_rcvbuf = 0 as suggested elsewhere in the thread to
>> no avail.
>> 
>> Today I was watching iostat on an OSD box ('iostat -xm 5') when the cluster
>> got into “slow” state:
>> 
>> Device: rrqm/s   wrqm/s r/s w/srMB/swMB/s avgrq-sz 
>> avgqu-sz
>> await  svctm  %util
>> sdb   0.0013.57   84.23  167.47 0.45 2.7826.26   
>>   2.068.18   3.85
>> 96.93
>> sdc   0.0046.715.59  289.22 0.03 2.5417.85   
>>   3.18   10.77   0.97
>> 28.72
>> sdd   0.0016.57   45.11   91.62 0.25 0.5512.01   
>>   0.755.51   2.45
>> 33.47
>> sde   0.0013.576.99  143.31 0.03 2.5334.97   
>>   1.99   13.27   2.12
>> 31.86
>> sdf   0.0018.764.99  158.48 0.10 1.0914.88   
>>   1.267.69   1.24
>> 20.26
>> sdg   0.0025.55   81.64  237.52 0.44 2.8921.36   
>>   4.14   12.99   2.58
>> 82.22
>> sdh   0.0089.42   16.17  492.42 0.09 3.8115.69   
>>  17.12   33.66   0.73
>> 36.95
>> sdi   0.0020.16   17.76  189.62 0.10 1.6717.46   
>>   3.45   16.63   1.57
>> 32.55
>> sdj   0.0031.540.00  185.23 0.00 1.9121.15   
>>   3.33   18.00   0.03
>> 0.62
>> sdk   0.0026.152.40  133.33 0.01 0.8412.79   
>>   1.077.87   0.85
>> 11.58
>> sdl   0.0025.559.38  123.95 0.05 1.1518.44   
>>   0.503.74   1.58
>> 21.10
>> sdm   0.00 6.39   92.61   47.11 0.47 0.2610.65   
>>   1.279.07   6.92
>> 96.73
>> 
>> The %util is rather high on some disks, but I’m not an expert at looking at
>> iostat so I’m not sure how worrisome this is. Does anything here stand out to
>> anyone?
>> 
>> At the time of that iostat, Ceph was reporting a lot of blocked ops on the 
>> OSD
>> associated with sde (as well as about 30 other OSDs), but it doesn’t look all
>> that busy. Some simple ‘dd’ tests seem to indicate the disk is fine.
>> 
>> Similarly, iotop seems OK on this host:
>> 
>>  TID  PRIO  USER DISK READ  DISK WRITE  SWAPIN IO>COMMAND
>> 472477 be/4 root0.00 B/s5.59 M/s  0.00 %  0.57 % ceph-osd -i 111 
>> --pid-
>> file /var/run/ceph/osd.111.pid -c /etc/ceph/ceph.conf --cluster ceph
>> 470621 be/4 root0.00 B/s   10.09 M/s  0.00 %  0.40 % ceph-osd -i 111 
>> --pid-
>> file /var/run/ceph/osd.111.pid -c /etc/ceph/ceph.conf --cluster ceph
>> 3495447 be/4 root0.00 B/s  272.19 K/s  0.00 %  0.36 % ceph-osd -i 
>> 114 --
>> pid-file /var/run/ceph/osd.114.pid -c /etc/ceph/ceph.conf --cluster ceph
>> 3488389 be/4 root 0.00 B/s  596.80 K/s  0.00 %  0.16 % ceph-osd -i 109 --
>> pid-file /var/run/ceph/osd.109.pid -c /etc/ceph/ceph.conf --cluster ceph
>> 3488060 be/4 root0.00 B/s  600.83 K/s  0.00 %  0.15 % ceph-osd -i 
>> 108 --
>> pid-file /var/run/ceph/osd.108.pid -c /etc/ceph/ceph.conf --cluster ceph
>> 3505573 be/4 root0.00 B/s  528.25 K/s  0.00 %  0.10 % ceph-osd -i 
>> 119 --
>> pid-file /var/run/ceph/osd.119.pid -c /etc/ceph/ceph.conf --cluster ceph
>> 3495434 be/4 root0.00 B/s2.02 K/s  0.00 %  0.10 % ceph-osd -i 
>> 114 --pid-
>> file /var/run/ceph/osd.114.pid -c /etc/ceph/ceph.conf --cluster ceph
>> 3502327 be/4 root0.00 B/s  506.07 K/s  0.00 %  0.09 % ceph-osd -i 
>> 118 --
>> pid-file /var/run/ceph/osd.118.pid -c /etc/ceph/ceph.conf --cluster ceph
>> 3489100 be/4 root0.00 B/s  106.86 K/s  0.00 %  0.09 % ceph-osd -i 
>> 110 --
>> pid-file /var/run/ceph/osd.110.pid -c /etc/ceph/ceph.conf --cluster ceph
>> 3496631 be/4 root0.00 B/s  229.

Re: [ceph-users] Ceph cluster NO read / write performance :: Ops are blocked

2015-09-17 Thread Lincoln Bryant
Just a small update — the blocked ops did disappear after doubling the 
target_max_bytes. We’ll see if it sticks! I’ve thought I’ve solved this blocked 
ops problem about 10 times now :)

Assuming this is the issue, is there any workaround for this problem (or is it 
working as intended)? (Should I set up a cron to run cache-try-flush-evict-all 
every night? :))
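
(If it came to that, presumably something as simple as the following crontab entry 
would do -- the pool name is a placeholder:)

  30 2 * * * rados -p cachepool cache-try-flush-evict-all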

Another curious thing is that a rolling restart of all OSDs also seems to fix 
the problem — for a time. I’m not sure how that would fit in if this is the 
problem.

—Lincoln

> On Sep 17, 2015, at 12:07 PM, Lincoln Bryant <linco...@uchicago.edu> wrote:
> 
> We have CephFS utilizing a cache tier + EC backend. The cache tier and ec 
> pool sit on the same spinners — no SSDs. Our cache tier has a 
> target_max_bytes of 5TB and the total storage is about 1PB. 
> 
> I do have a separate test pool with 3x replication and no cache tier, and I 
> still see significant performance drops and blocked ops with no/minimal 
> client I/O from CephFS. Right now I have 530 blocked ops with 20MB/s of 
> client write I/O and no active scrubs. The rados bench on my test pool looks 
> like this:
> 
>  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
>0   0 0 0 0 0 - 0
>1  319463   251.934   252   0.31017  0.217719
>2  31   10372   143.96936  0.978544  0.260631
>3  31   10372   95.9815 0 -  0.260631
>4  31   11180   79.985616   2.29218  0.476458
>5  31   11281   64.7886 42.5559   0.50213
>6  31   11281   53.9905 0 -   0.50213
>7  31   11584   47.9917 6   3.71826  0.615882
>8  31   11584   41.9928 0 -  0.615882
>9  31   1158437.327 0 -  0.615882
>   10  31   11786   34.3942   2.7   6.73678  0.794532
> 
> I’m really leaning more toward it being a weird controller/disk problem. 
> 
> As a test, I suppose I could double the target_max_bytes, just so the cache 
> tier stops evicting while client I/O is writing?
> 
> —Lincoln
> 
>> On Sep 17, 2015, at 11:59 AM, Nick Fisk <n...@fisk.me.uk> wrote:
>> 
>> Ah right, this is where it gets interesting.
>> 
>> You are probably hitting a cache full on a PG somewhere which is either 
>> making everything wait until it flushes or something like that. 
>> 
>> What cache settings have you got set?
>> 
>> I assume you have SSD's for the cache tier? Can you share the size of the 
>> pool.
>> 
>> If possible could you also create a non tiered test pool and do some 
>> benchmarks on that to rule out any issue with the hardware and OSD's.
>> 
>>> -Original Message-
>>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
>>> Lincoln Bryant
>>> Sent: 17 September 2015 17:54
>>> To: Nick Fisk <n...@fisk.me.uk>
>>> Cc: ceph-users@lists.ceph.com
>>> Subject: Re: [ceph-users] Ceph cluster NO read / write performance :: Ops
>>> are blocked
>>> 
>>> Hi Nick,
>>> 
>>> Thanks for responding. Yes, I am.
>>> 
>>> —Lincoln
>>> 
>>>> On Sep 17, 2015, at 11:53 AM, Nick Fisk <n...@fisk.me.uk> wrote:
>>>> 
>>>> You are getting a fair amount of reads on the disks whilst doing these
>>> writes. You're not using cache tiering are you?
>>>> 
>>>>> -Original Message-
>>>>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
>>>>> Of Lincoln Bryant
>>>>> Sent: 17 September 2015 17:42
>>>>> To: ceph-users@lists.ceph.com
>>>>> Subject: Re: [ceph-users] Ceph cluster NO read / write performance ::
>>>>> Ops are blocked
>>>>> 
>>>>> Hello again,
>>>>> 
>>>>> Well, I disabled offloads on the NIC -- didn’t work for me. I also
>>>>> tried setting net.ipv4.tcp_moderate_rcvbuf = 0 as suggested elsewhere
>>>>> in the thread to no avail.
>>>>> 
>>>>> Today I was watching iostat on an OSD box ('iostat -xm 5') when the
>>>>> cluster got into “slow” state:
>>>>> 
>>>>> Device: rrqm/s   wrqm/s r/s w/srMB/swMB/s 
>>>>> avgrq-sz avgqu-
>>> sz
>>>>> await  svctm  %util
>>>>> sdb   0.0013.57

Re: [ceph-users] Ceph cluster NO read / write performance :: Ops are blocked

2015-09-17 Thread Lincoln Bryant
We have CephFS utilizing a cache tier + EC backend. The cache tier and ec pool 
sit on the same spinners — no SSDs. Our cache tier has a target_max_bytes of 
5TB and the total storage is about 1PB. 

I do have a separate test pool with 3x replication and no cache tier, and I 
still see significant performance drops and blocked ops with no/minimal client 
I/O from CephFS. Right now I have 530 blocked ops with 20MB/s of client write 
I/O and no active scrubs. The rados bench on my test pool looks like this:

  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
    0       0         0         0         0         0         -         0
    1      31        94        63   251.934       252   0.31017  0.217719
    2      31       103        72   143.969        36  0.978544  0.260631
    3      31       103        72   95.9815         0         -  0.260631
    4      31       111        80   79.9856        16   2.29218  0.476458
    5      31       112        81   64.7886         4    2.5559   0.50213
    6      31       112        81   53.9905         0         -   0.50213
    7      31       115        84   47.9917         6   3.71826  0.615882
    8      31       115        84   41.9928         0         -  0.615882
    9      31       115        84    37.327         0         -  0.615882
   10      31       117        86   34.3942       2.7   6.73678  0.794532

I’m really leaning more toward it being a weird controller/disk problem. 

As a test, I suppose I could double the target_max_bytes, just so the cache 
tier stops evicting while client I/O is writing?
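
(That would just be a pool setting, presumably something like the following -- the 
pool name and byte count are illustrative, 10 TiB being double the current 5 TiB:)

  ceph osd pool set cachepool target_max_bytes 10995116277760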

—Lincoln

> On Sep 17, 2015, at 11:59 AM, Nick Fisk <n...@fisk.me.uk> wrote:
> 
> Ah right, this is where it gets interesting.
> 
> You are probably hitting a cache full on a PG somewhere which is either 
> making everything wait until it flushes or something like that. 
> 
> What cache settings have you got set?
> 
> I assume you have SSD's for the cache tier? Can you share the size of the 
> pool.
> 
> If possible could you also create a non tiered test pool and do some 
> benchmarks on that to rule out any issue with the hardware and OSD's.
> 
>> -Original Message-
>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
>> Lincoln Bryant
>> Sent: 17 September 2015 17:54
>> To: Nick Fisk <n...@fisk.me.uk>
>> Cc: ceph-users@lists.ceph.com
>> Subject: Re: [ceph-users] Ceph cluster NO read / write performance :: Ops
>> are blocked
>> 
>> Hi Nick,
>> 
>> Thanks for responding. Yes, I am.
>> 
>> —Lincoln
>> 
>>> On Sep 17, 2015, at 11:53 AM, Nick Fisk <n...@fisk.me.uk> wrote:
>>> 
>>> You are getting a fair amount of reads on the disks whilst doing these
>> writes. You're not using cache tiering are you?
>>> 
>>>> -Original Message-
>>>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
>>>> Of Lincoln Bryant
>>>> Sent: 17 September 2015 17:42
>>>> To: ceph-users@lists.ceph.com
>>>> Subject: Re: [ceph-users] Ceph cluster NO read / write performance ::
>>>> Ops are blocked
>>>> 
>>>> Hello again,
>>>> 
>>>> Well, I disabled offloads on the NIC -- didn’t work for me. I also
>>>> tried setting net.ipv4.tcp_moderate_rcvbuf = 0 as suggested elsewhere
>>>> in the thread to no avail.
>>>> 
>>>> Today I was watching iostat on an OSD box ('iostat -xm 5') when the
>>>> cluster got into “slow” state:
>>>> 
>>>> Device: rrqm/s   wrqm/s r/s w/srMB/swMB/s avgrq-sz 
>>>> avgqu-
>> sz
>>>> await  svctm  %util
>>>> sdb   0.0013.57   84.23  167.47 0.45 2.7826.26 
>>>> 2.068.18
>> 3.85
>>>> 96.93
>>>> sdc   0.0046.715.59  289.22 0.03 2.5417.85 
>>>> 3.18   10.77
>> 0.97
>>>> 28.72
>>>> sdd   0.0016.57   45.11   91.62 0.25 0.5512.01 
>>>> 0.755.51
>> 2.45
>>>> 33.47
>>>> sde   0.0013.576.99  143.31 0.03 2.5334.97 
>>>> 1.99   13.27
>> 2.12
>>>> 31.86
>>>> sdf   0.0018.764.99  158.48 0.10 1.0914.88 
>>>> 1.267.69   1.24
>>>> 20.26
>>>> sdg   0.0025.55   81.64  237.52 0.44 2.8921.36 
>>>> 4.14   12.99
>> 2.58
>>>> 82.22
>>>> sdh   0.0089.42   16.17  492.42 0.09

Re: [ceph-users] Ceph cluster NO read / write performance :: Ops are blocked

2015-09-17 Thread Lincoln Bryant
Hi Nick,

Thanks for the detailed response and insight. SSDs are indeed definitely on the 
to-buy list. 

I will certainly try to rule out any hardware issues in the meantime.

Cheers,
Lincoln

> On Sep 17, 2015, at 12:53 PM, Nick Fisk <n...@fisk.me.uk> wrote:
> 
> It's probably helped but I fear that your overall design is not going to work 
> well for you. Cache Tier + Base tier + journals on the same disks is going to 
> really hurt.
> 
> The problem when using cache tiering (especially with EC pools in future 
> releases) is that to modify a block that isn't in the cache tier you have to 
> promote it first, which often kicks another block out of the cache.
> 
> So worse case you could have for a single write
> 
> R from EC -> W to CT + jrnl W -> W actual data to CT + jrnl W -> R from CT -> 
> W to EC + jrnl W
> 
> Plus any metadata updates. Either way you looking at probably somewhere near 
> a 10x write amplification for 4MB writes, which will quickly overload your 
> disks leading to very slow performance. Smaller IO's would still cause 4MB 
> blocks to be shifted between pools. What makes it worse is that these 
> promotions/evictions tend to happen to hot PG's and not spread round the 
> whole cluster meaning that a single hot OSD can hold up writes across the 
> whole pool.
> 
> I know it's not what you want to hear, but I can't think of anything you can 
> do to help in this instance unless you are willing to get some SSD journals 
> and maybe move the Cache pool on to separate disks or SSD's. Basically try 
> and limit the amount of random IO the disks have to do.
> 
> Of course please do try and find a time to stop all IO and then run the test 
> on the test 3 way pool, to rule out any hardware/OS issues. 
> 
> 
>> -Original Message-
>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
>> Lincoln Bryant
>> Sent: 17 September 2015 18:36
>> To: Nick Fisk <n...@fisk.me.uk>
>> Cc: ceph-users@lists.ceph.com
>> Subject: Re: [ceph-users] Ceph cluster NO read / write performance :: Ops
>> are blocked
>> 
>> Just a small update — the blocked ops did disappear after doubling the
>> target_max_bytes. We’ll see if it sticks! I’ve thought I’ve solved this 
>> blocked
>> ops problem about 10 times now :)
>> 
>> Assuming this is the issue, is there any workaround for this problem (or is 
>> it
>> working as intended)? (Should I set up a cron to run 
>> cache-try-flush-evict-all
>> every night? :))
>> 
>> Another curious thing is that a rolling restart of all OSDs also seems to 
>> fix the
>> problem — for a time. I’m not sure how that would fit in if this is the
>> problem.
>> 
>> —Lincoln
>> 
>>> On Sep 17, 2015, at 12:07 PM, Lincoln Bryant <linco...@uchicago.edu>
>> wrote:
>>> 
>>> We have CephFS utilizing a cache tier + EC backend. The cache tier and ec
>> pool sit on the same spinners — no SSDs. Our cache tier has a
>> target_max_bytes of 5TB and the total storage is about 1PB.
>>> 
>>> I do have a separate test pool with 3x replication and no cache tier, and I
>> still see significant performance drops and blocked ops with no/minimal
>> client I/O from CephFS. Right now I have 530 blocked ops with 20MB/s of
>> client write I/O and no active scrubs. The rados bench on my test pool looks
>> like this:
>>> 
>>> sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
>>>   0   0 0 0 0 0 - 0
>>>   1  319463   251.934   252   0.31017  0.217719
>>>   2  31   10372   143.96936  0.978544  0.260631
>>>   3  31   10372   95.9815 0 -  0.260631
>>>   4  31   11180   79.985616   2.29218  0.476458
>>>   5  31   11281   64.7886 42.5559   0.50213
>>>   6  31   11281   53.9905 0 -   0.50213
>>>   7  31   11584   47.9917 6   3.71826  0.615882
>>>   8  31   11584   41.9928 0 -  0.615882
>>>   9  31   1158437.327 0 -  0.615882
>>>  10  31   11786   34.3942   2.7   6.73678  0.794532
>>> 
>>> I’m really leaning more toward it being a weird controller/disk problem.
>>> 
>>> As a test, I suppose I could double the target_max_bytes, just so the cache
>> tier stops evicting while client I/O is writing?
>>> 
>>> —Linco

Re: [ceph-users] Straw2 kernel version?

2015-09-10 Thread Lincoln Bryant
Hi Robert,

I believe kernel versions 4.1 and beyond support straw2.
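
(A quick way to cross-check what the cluster itself currently advertises against 
your client kernels -- the output details vary a bit by release:)

  ceph osd crush show-tunables
  uname -r        # on each kernel client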

—Lincoln

> On Sep 10, 2015, at 1:43 PM, Robert LeBlanc  wrote:
> 
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA256
> 
> Has straw2 landed in the kernel and if so which version?
> 
> Thanks,
> - 
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
> -BEGIN PGP SIGNATURE-
> Version: Mailvelope v1.0.2
> Comment: https://www.mailvelope.com
> 
> wsFcBAEBCAAQBQJV8c9fCRDmVDuy+mK58QAAKOoP/ibMriwPzqlY0ow1N36V
> OX1wg+6r3nQRyGglvKVi9cmpPrgnlTZxPVv0KRr8xocRBrPYI//hob6qEVWH
> hvaUVg5PDbgQRGi4GNWP8oY0VR7rYxjQAys3c+Mo9LSs1ZmgygIxmuNSGR1w
> g3BCHJjBnSvrQ+NzDuIsaSnxAWCQKIJgMSmlOa0Pieqq4lXJDTNAdRILDOMn
> eAuJcXZqq2Ll8axQnl8ymIRvq9aZ/TQi+q0lqJ/wgAkO/coZm/18HmMa/VI0
> 1/8rZTG0Jy4lgxny5VB1OjAZLMGnKfPyKs8bvQeksNBhMhIZVeFrZ5JHQC3f
> 4VsmAnTtDxD7RSEhlVy66kBMmdOlU6PhlSWZQ0OmLgHotX8HC9TJAq2I18yJ
> ggk4mNkpcZwTz4PagjeEtST8/s1OIEjX4e9lh5u9einFv6mCxUMWT7bQwzFd
> SImx589rjXLyZjdDtXsPZxN1G2Qi4HnlgKnkC44mx4soypo2sDFFmtv6YeWJ
> e0Nr8RvFmKhPPgc71R1po9ZTOMIh3aBfMehvsAueVE8AhBZl8lvQyatAqYES
> S7dcuhVATS4gfkEv4XWR1MVhvLDYP3l/I1H32cp5mh43BCT/DpSHvyfr0lhb
> dxBlfSY/GYLFMGxbG73DFZO3S9o85nz2vma90rsS6AGx/oJOsJYUnXKcvUzL
> Qgep
> =e4Wy
> -END PGP SIGNATURE-
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Hammer reduce recovery impact

2015-09-10 Thread Lincoln Bryant

On 9/10/2015 5:39 PM, Lionel Bouton wrote:
For example deep-scrubs were a problem on our installation when at 
times there were several going on. We implemented a scheduler that 
enforces limits on simultaneous deep-scrubs and these problems are gone.


Hi Lionel,

Out of curiosity, how many was "several" in your case?
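
(For reference, the simplest built-in knob just caps concurrent scrubs per OSD -- 
the value shown is the usual default and purely illustrative:)

  ceph tell 'osd.*' injectargs '--osd_max_scrubs 1'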

Cheers,
Lincoln
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph cluster NO read / write performance :: Ops are blocked

2015-09-09 Thread Lincoln Bryant
Hi Jan,

I’ll take a look at all of those things and report back (hopefully :))

I did try setting all of my OSDs to writethrough instead of writeback on the 
controller, which was significantly more consistent in performance (from 
1100MB/s down to 300MB/s, but still occasionally dropping to 0MB/s). Still 
plenty of blocked ops. 

I was wondering if not-so-nicely failing OSD(s) might be the cause. My 
controller (PERC H730 Mini) seems frustratingly terse with SMART information, 
but at least one disk has a “Non-medium error count” of over 20,000..
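
(In case it helps anyone else on these controllers: smartctl can usually be talked 
through the PERC with the megaraid passthrough -- the device numbers below are just 
examples, walk them upward until it errors out:)

  smartctl -a -d megaraid,0 /dev/sdb
  smartctl -a -d megaraid,1 /dev/sdb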

I’ll try disabling offloads as well. 

Thanks much for the suggestions!

Cheers,
Lincoln

> On Sep 9, 2015, at 3:59 AM, Jan Schermer <j...@schermer.cz> wrote:
> 
> Just to recapitulate - the nodes are doing "nothing" when it drops to zero? 
> Not flushing something to drives (iostat)? Not cleaning pagecache (kswapd and 
> similiar)? Not out of any type of memory (slab, min_free_kbytes)? Not network 
> link errors, no bad checksums (those are hard to spot, though)?
> 
> Unless you find something I suggest you try disabling offloads on the NICs 
> and see if the problem goes away.
> 
> Jan
> 
>> On 08 Sep 2015, at 18:26, Lincoln Bryant <linco...@uchicago.edu> wrote:
>> 
>> For whatever it’s worth, my problem has returned and is very similar to 
>> yours. Still trying to figure out what’s going on over here.
>> 
>> Performance is nice for a few seconds, then goes to 0. This is a similar 
>> setup to yours (12 OSDs per box, Scientific Linux 6, Ceph 0.94.3, etc)
>> 
>> 384  16 29520 29504   307.287  1188 0.0492006  0.208259
>> 385  16 29813 29797   309.532  1172 0.0469708  0.206731
>> 386  16 30105 30089   311.756  1168 0.0375764  0.205189
>> 387  16 30401 30385   314.009  1184  0.036142  0.203791
>> 388  16 30695 30679   316.231  1176 0.0372316  0.202355
>> 389  16 30987 30971318.42  1168 0.0660476  0.200962
>> 390  16 31282 31266   320.628  1180 0.0358611  0.199548
>> 391  16 31568 31552   322.734  1144 0.0405166  0.198132
>> 392  16 31857 31841   324.859  1156 0.0360826  0.196679
>> 393  16 32090 32074   326.404   932 0.0416869   0.19549
>> 394  16 32205 32189   326.743   460 0.0251877  0.194896
>> 395  16 32302 32286   326.897   388 0.0280574  0.194395
>> 396  16 32348 32332   326.537   184 0.0256821  0.194157
>> 397  16 32385 32369   326.087   148 0.0254342  0.193965
>> 398  16 32424 32408   325.659   156 0.0263006  0.193763
>> 399  16 32445 32429   325.05484 0.0233839  0.193655
>> 2015-09-08 11:22:31.940164 min lat: 0.0165045 max lat: 67.6184 avg lat: 
>> 0.193655
>> sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
>> 400  16 32445 32429   324.241 0 -  0.193655
>> 401  16 32445 32429   323.433 0 -  0.193655
>> 402  16 32445 32429   322.628 0 -  0.193655
>> 403  16 32445 32429   321.828 0 -  0.193655
>> 404  16 32445 32429   321.031 0 -  0.193655
>> 405  16 32445 32429   320.238 0 -  0.193655
>> 406  16 32445 32429319.45 0 -  0.193655
>> 407  16 32445 32429   318.665 0 -  0.193655
>> 
>> needless to say, very strange.
>> 
>> —Lincoln
>> 
>> 
>>> On Sep 7, 2015, at 3:35 PM, Vickey Singh <vickey.singh22...@gmail.com> 
>>> wrote:
>>> 
>>> Adding ceph-users.
>>> 
>>> On Mon, Sep 7, 2015 at 11:31 PM, Vickey Singh <vickey.singh22...@gmail.com> 
>>> wrote:
>>> 
>>> 
>>> On Mon, Sep 7, 2015 at 10:04 PM, Udo Lembke <ulem...@polarzone.de> wrote:
>>> Hi Vickey,
>>> Thanks for your time in replying to my problem.
>>> 
>>> I had the same rados bench output after changing the motherboard of the 
>>> monitor node with the lowest IP...
>>> Due to the new mainboard, I assume the hw-clock was wrong during startup. 
>>> Ceph health show no errors, but all VMs aren't able to do IO (very high 
>>> load on the VMs - but no traffic).
>>> I stopped the mon, but this don't changed anything. I had to restart all 
>>> other mons to get IO again. After that I started the first mon also (with 
>>> the right time now) and all worked fine again...
>>> 
>>> Thanks

Re: [ceph-users] Ceph cluster NO read / write performance :: Ops are blocked

2015-09-08 Thread Lincoln Bryant
For whatever it’s worth, my problem has returned and is very similar to yours. 
Still trying to figure out what’s going on over here.

Performance is nice for a few seconds, then goes to 0. This is a similar setup 
to yours (12 OSDs per box, Scientific Linux 6, Ceph 0.94.3, etc)

  384  16 29520 29504   307.287  1188 0.0492006  0.208259
  385  16 29813 29797   309.532  1172 0.0469708  0.206731
  386  16 30105 30089   311.756  1168 0.0375764  0.205189
  387  16 30401 30385   314.009  1184  0.036142  0.203791
  388  16 30695 30679   316.231  1176 0.0372316  0.202355
  389  16 30987 30971318.42  1168 0.0660476  0.200962
  390  16 31282 31266   320.628  1180 0.0358611  0.199548
  391  16 31568 31552   322.734  1144 0.0405166  0.198132
  392  16 31857 31841   324.859  1156 0.0360826  0.196679
  393  16 32090 32074   326.404   932 0.0416869   0.19549
  394  16 32205 32189   326.743   460 0.0251877  0.194896
  395  16 32302 32286   326.897   388 0.0280574  0.194395
  396  16 32348 32332   326.537   184 0.0256821  0.194157
  397  16 32385 32369   326.087   148 0.0254342  0.193965
  398  16 32424 32408   325.659   156 0.0263006  0.193763
  399  16 32445 32429   325.05484 0.0233839  0.193655
2015-09-08 11:22:31.940164 min lat: 0.0165045 max lat: 67.6184 avg lat: 0.193655
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
  400  16 32445 32429   324.241 0 -  0.193655
  401  16 32445 32429   323.433 0 -  0.193655
  402  16 32445 32429   322.628 0 -  0.193655
  403  16 32445 32429   321.828 0 -  0.193655
  404  16 32445 32429   321.031 0 -  0.193655
  405  16 32445 32429   320.238 0 -  0.193655
  406  16 32445 32429319.45 0 -  0.193655
  407  16 32445 32429   318.665 0 -  0.193655

needless to say, very strange.

—Lincoln


> On Sep 7, 2015, at 3:35 PM, Vickey Singh  wrote:
> 
> Adding ceph-users.
> 
> On Mon, Sep 7, 2015 at 11:31 PM, Vickey Singh  
> wrote:
> 
> 
> On Mon, Sep 7, 2015 at 10:04 PM, Udo Lembke  wrote:
> Hi Vickey,
> Thanks for your time in replying to my problem.
>  
> I had the same rados bench output after changing the motherboard of the 
> monitor node with the lowest IP...
> Due to the new mainboard, I assume the hw-clock was wrong during startup. 
> Ceph health show no errors, but all VMs aren't able to do IO (very high load 
> on the VMs - but no traffic).
> I stopped the mon, but this don't changed anything. I had to restart all 
> other mons to get IO again. After that I started the first mon also (with the 
> right time now) and all worked fine again...
> 
> Thanks i will try to restart all OSD / MONS and report back , if it solves my 
> problem 
> 
> Another posibility:
> Do you use journal on SSDs? Perhaps the SSDs can't write to garbage 
> collection?
> 
> No i don't have journals on SSD , they are on the same OSD disk. 
> 
> 
> 
> Udo
> 
> 
> On 07.09.2015 16:36, Vickey Singh wrote:
>> Dear Experts
>> 
>> Can someone please help me , why my cluster is not able write data.
>> 
>> See the below output  cur MB/S  is 0  and Avg MB/s is decreasing.
>> 
>> 
>> Ceph Hammer  0.94.2
>> CentOS 6 (3.10.69-1)
>> 
>> The Ceph status says OPS are blocked , i have tried checking , what all i 
>> know 
>> 
>> - System resources ( CPU , net, disk , memory )-- All normal 
>> - 10G network for public and cluster network  -- no saturation 
>> - Add disks are physically healthy 
>> - No messages in /var/log/messages OR dmesg
>> - Tried restarting OSD which are blocking operation , but no luck
>> - Tried writing through RBD  and Rados bench , both are giving same problemm
>> 
>> Please help me to fix this problem.
>> 
>> #  rados bench -p rbd 60 write
>>  Maintaining 16 concurrent writes of 4194304 bytes for up to 60 seconds or 0 
>> objects
>>  Object prefix: benchmark_data_stor1_1791844
>>sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
>>  0   0 0 0 0 0 - 0
>>  1  16   125   109   435.873   436  0.022076 0.0697864
>>  2  16   139   123   245.94856  0.246578 0.0674407
>>  3  16   139   123   163.969 0 - 0.0674407
>>  4  16   139   123   122.978 0 - 0.0674407
>>  5  16   139   12398.383 0 - 0.0674407
>>  6  16   139   123   81.9865 0 - 0.0674407
>>  7  16   

Re: [ceph-users] Ceph cluster NO read / write performance :: Ops are blocked

2015-09-07 Thread Lincoln Bryant

Hi Vickey,

I had this exact same problem last week, resolved by rebooting all of my 
OSD nodes. I have yet to figure out why it happened, though. I _suspect_ 
in my case it's due to a failing controller on a particular box I've had 
trouble with in the past.


I tried setting 'noout', stopping my OSDs one host at a time, then 
rerunning RADOS bench in between to see if I could nail down the 
problematic machine. Depending on your # of hosts, this might work for 
you. Admittedly, I got impatient with this approach though and just 
ended up restarting everything (which worked!) :)


If you have a bunch of blocked ops, you could maybe try a 'pg query' on 
the PGs involved and see if there's a common OSD with all of your 
blocked ops. In my experience, it's not necessarily the one reporting.


Anecdotally, I've had trouble with Intel 10Gb NICs and custom kernels as 
well. I've seen a NIC appear to be happy (no message in dmesg, machine 
appears to be communicating normally, etc) but when I went to iperf it, 
I was getting super pitiful performance (like KB/s). I don't know what 
kind of NICs you're using, but you may want to iperf everything just in 
case.
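
(Concretely, the sort of thing meant here -- the PG ID and peer address are 
placeholders:)

  ceph health detail | grep blocked     # shows which OSDs have slow/blocked requests
  ceph pg 2.1f query | less             # then query a PG that lives on them
  iperf -s                              # on one OSD host
  iperf -c 10.0.0.2                     # from its peers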


--Lincoln

On 9/7/2015 9:36 AM, Vickey Singh wrote:

Dear Experts

Can someone please help me , why my cluster is not able write data.

See the below output  cur MB/S  is 0  and Avg MB/s is decreasing.


Ceph Hammer  0.94.2
CentOS 6 (3.10.69-1)

The Ceph status says OPS are blocked , i have tried checking , what all i
know

- System resources ( CPU , net, disk , memory )-- All normal
- 10G network for public and cluster network  -- no saturation
- Add disks are physically healthy
- No messages in /var/log/messages OR dmesg
- Tried restarting OSD which are blocking operation , but no luck
- Tried writing through RBD  and Rados bench , both are giving same problemm

Please help me to fix this problem.

#  rados bench -p rbd 60 write
  Maintaining 16 concurrent writes of 4194304 bytes for up to 60 seconds or
0 objects
  Object prefix: benchmark_data_stor1_1791844
sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
  0   0 0 0 0 0 - 0
  1  16   125   109   435.873   436  0.022076 0.0697864
  2  16   139   123   245.948        56  0.246578 0.0674407
  3  16   139   123   163.969 0 - 0.0674407
  4  16   139   123   122.978 0 - 0.0674407
  5  16   139   12398.383 0 - 0.0674407
  6  16   139   123   81.9865 0 - 0.0674407
  7  16   139   123   70.2747 0 - 0.0674407
  8  16   139   123   61.4903 0 - 0.0674407
  9  16   139   123   54.6582 0 - 0.0674407
 10  16   139   123   49.1924 0 - 0.0674407
 11  16   139   123   44.7201 0 - 0.0674407
 12  16   139   123   40.9934 0 - 0.0674407
 13  16   139   123   37.8401 0 - 0.0674407
 14  16   139   123   35.1373 0 - 0.0674407
 15  16   139   123   32.7949 0 - 0.0674407
 16  16   139   123   30.7451 0 - 0.0674407
 17  16   139   123   28.9364 0 - 0.0674407
 18  16   139   123   27.3289 0 - 0.0674407
 19  16   139   123   25.8905 0 - 0.0674407
2015-09-07 15:54:52.694071 min lat: 0.022076 max lat: 0.46117 avg lat: 0.0674407
sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
 20  16   139   12324.596 0 - 0.0674407
 21  16   139   123   23.4247 0 - 0.0674407
 22  16   139   123 22.36 0 - 0.0674407
 23  16   139   123   21.3878 0 - 0.0674407
 24  16   139   123   20.4966 0 - 0.0674407
 25  16   139   123   19.6768 0 - 0.0674407
 26  16   139   123 18.92 0 - 0.0674407
 27  16   139   123   18.2192 0 - 0.0674407
 28  16   139   123   17.5686 0 - 0.0674407
 29  16   139   123   16.9628 0 - 0.0674407
 30  16   139   123   16.3973 0 - 0.0674407
 31  16   139   123   15.8684 0 - 0.0674407
 32  16   139   123   15.3725 0 - 0.0674407
 33  16   139   123   14.9067 0 - 0.0674407
 34  16   139   123   14.4683 0 - 0.0674407
 35  16   139   123   14.0549 0   

Re: [ceph-users] CephFS vs RBD

2015-07-22 Thread Lincoln Bryant
Hi Hadi,

AFAIK, you can’t safely mount RBD as R/W on multiple machines. You could 
re-export the RBD as NFS, but that’ll introduce a bottleneck and probably tank 
your performance gains over CephFS.

For what it’s worth, some of our RBDs are mapped to multiple machines, mounted 
read-write on one and read-only on the others. We haven’t seen any strange 
effects from that, but I seem to recall it being ill advised. 
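
(Roughly what that looks like on the read-only clients -- pool and image names are 
placeholders:)

  rbd map rbd/shared-image --read-only
  mount -o ro /dev/rbd/rbd/shared-image /mnt/shared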

—Lincoln

 On Jul 22, 2015, at 2:05 PM, Hadi Montakhabi h...@cs.uh.edu wrote:
 
 Hello Cephers,
 
 I've been experimenting with CephFS and RBD for some time now.
 From what I have seen so far, RBD outperforms CephFS by far. However, there 
 is a catch!
 RBD could be mounted on one client at a time!
 Now, assuming that we have multiple clients running some MPI code (and doing 
 some distributed I/O), all these clients need to read/write from the same 
 location and sometimes even the same file.
 Is this at all possible by using RBD, and not CephFS?
 
 Thanks,
 Hadi
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] kernel 3.18 io bottlenecks?

2015-06-24 Thread Lincoln Bryant
Hi German,

Is this with CephFS, or RBD?

Thanks,
Lincoln

 On Jun 24, 2015, at 9:44 AM, German Anders gand...@despegar.com wrote:
 
 Hi all,
 
 Is there any IO bottleneck reported on kernel 3.18.3-031803-generic? I ask since 
 I'm having a lot of iowait and the cluster is really getting slow, and actually 
 there's not much going on. I've read some time ago that there were some issues 
 with kernel 3.18, so I would like to know what the 'best' kernel to go with is; 
 I'm using Ubuntu 14.04.1 LTS and Ceph v0.82.
 
 Thanks a lot,
 
 Best regards,
 
 German
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Anyone using Ganesha with CephFS?

2015-06-22 Thread Lincoln Bryant
Hi Cephers,

Is anyone successfully using Ganesha for re-exporting CephFS as NFS?

I’ve seen some blog posts about setting it up and the basic functionality seems 
to be there. Just wondering if anyone in the community is actively using it, 
and could relate some experiences.

—Lincoln
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] EC on 1.1PB?

2015-06-19 Thread Lincoln Bryant
Hi Sean,

We have ~1PB of EC storage using Dell R730xd servers with 6TB OSDs. We've got 
our erasure coding profile set up to be k=10,m=3 which gives us a very 
reasonable chunk of the raw storage with nice resiliency.
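
(For reference, a profile along those lines is created with something like the 
following -- the profile and pool names and the failure domain are illustrative, 
and the option spelling differs slightly between releases:)

  ceph osd erasure-code-profile set ec-k10-m3 k=10 m=3 ruleset-failure-domain=host
  ceph osd pool create ecpool 4096 4096 erasure ec-k10-m3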

I found that CPU usage was significantly higher in EC, but not so much as to be 
problematic. Additionally, EC performance was about 40% of replicated pool 
performance in our testing. 

With 36-disk servers you'll probably need to make sure you do the usual kernel 
tweaks like increasing the max number of file descriptors, etc. 
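
(e.g. bumping the open-file limit in /etc/security/limits.conf -- the numbers below 
are just a common starting point, not a recommendation:)

  *   soft   nofile   131072
  *   hard   nofile   131072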

Cheers,
Lincoln

On Jun 19, 2015, at 10:36 AM, Sean wrote:

 I am looking to use Ceph using EC on a few leftover storage servers (36 disk 
 supermicro servers with dual xeon sockets and around 256Gb of ram). I did a 
 small test using one node and using the ISA library and noticed that the CPU 
 load was pretty spikey for just normal operation.
 
 Does anyone have any experience running Ceph EC on around 216 to 270 4TB 
 disks? I'm looking  to yield around 680 TB to 1PB if possible. just putting 
 my feelers out there to see if anyone else has had any experience and looking 
 for any guidance.
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] EC on 1.1PB?

2015-06-19 Thread Lincoln Bryant
We're running 12 OSDs per node, with 32 hyper-threaded CPUs available. We 
over-provisioned the CPUs because we would like to additionally run jobs from 
our batch system and isolate them via cgroups (we're a high-throughput 
computing facility). . With a total of ~13000 pgs across a few pools, I'm 
seeing about 1GB of resident memory per OSD. As far as EC plugins go, we're 
using jerasure and haven't experimented with others.

That said, in our use case we're using CephFS, so we're fronting the 
erasure-coded pool with a cache tier. The cache pool is limited to 5TB, and 
right now usage is light enough that most operations live in the cache tier and 
rarely get flushed out to the EC pool. I'm sure as we bring more users onto 
this, there will be some more tweaking to do.
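
(The tiering setup itself is only a handful of commands, roughly as follows -- the 
pool names and the byte limit are placeholders:)

  ceph osd tier add ecpool cachepool
  ceph osd tier cache-mode cachepool writeback
  ceph osd tier set-overlay ecpool cachepool
  ceph osd pool set cachepool hit_set_type bloom
  ceph osd pool set cachepool target_max_bytes 5497558138880   # ~5 TB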

As far as performance goes, you might want to read Mark Nelson's excellent 
document about EC performance under Firefly. If you search the list archives, 
he sent a mail in February titled Erasure Coding CPU Overhead Data. I can 
forward you the PDF off-list if you would like.

--Lincoln

On Jun 19, 2015, at 12:42 PM, Sean wrote:

 Thanks lincoln! May I ask how many drives you have per storage node and how 
 many threads you have available? IE are you using hyper threading and do you 
 have more than 24 disks per node in your cluster? I noticed with our 
 replicated cluster that disks == more pgs == more cpu/ram and with 24+ disks 
 this ends up causing issues in some cases. So a 3 node cluster with 70 disks 
 each is fine but scaling up to 21 and i see issues. Even with connections, 
 pids, and file descriptors turned up. Are you using just jerasure or have you 
 tried the ISA driver as well? 
 
 Sorry for bombarding you with questions I am just curious as to where the 40% 
 performance comes from.
 
 On 06/19/2015 11:05 AM, Lincoln Bryant wrote:
 Hi Sean,
 
 We have ~1PB of EC storage using Dell R730xd servers with 6TB OSDs. We've 
 got our erasure coding profile set up to be k=10,m=3 which gives us a very 
 reasonable chunk of the raw storage with nice resiliency.
 
 I found that CPU usage was significantly higher in EC, but not so much as to 
 be problematic. Additionally, EC performance was about 40% of replicated 
 pool performance in our testing. 
 
 With 36-disk servers you'll probably need to make sure you do the usual 
 kernel tweaks like increasing the max number of file descriptors, etc. 
 
 Cheers,
 Lincoln
 
 On Jun 19, 2015, at 10:36 AM, Sean wrote:
 
  I am looking to use Ceph using EC on a few leftover storage servers (36 
 disk supermicro servers with dual xeon sockets and around 256Gb of ram). I 
 did a small test using one node and using the ISA library and noticed that 
 the CPU load was pretty spikey for just normal operation.
 
 Does anyone have any experience running Ceph EC on around 216 to 270 4TB 
 disks? I'm looking  to yield around 680 TB to 1PB if possible. just putting 
 my feelers out there to see if anyone else has had any experience and 
 looking for any guidance.
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 
 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS client issue

2015-06-14 Thread Lincoln Bryant

Hi Matteo,

Are your clients using the FUSE client or the kernel client? If the 
latter, what kernel version?


--Lincoln

On 6/14/2015 10:26 AM, Matteo Dacrema wrote:

Hi all,


I'm using CephFS on Hammer and sometimes I need to reboot one or more clients 
because, as ceph -s tells me, they are failing to respond to capability release. 
After that, all clients stop responding: they can't access files or mount/umount CephFS.

I've got 1.5 million files, 2 metadata servers in an active/standby configuration 
with 8 GB of RAM, 20 clients with 2 GB of RAM each, and 2 OSD nodes with 
4 x 80 GB OSDs and 4 GB of RAM.



Here my configuration:


[global]
 fsid = 2de7b17f-0a3e-4109-b878-c035dd2f7735
 mon_initial_members = cephmds01
 mon_host = 10.29.81.161
 auth_cluster_required = cephx
 auth_service_required = cephx
 auth_client_required = cephx
 public network = 10.29.81.0/24
 tcp nodelay = true
 tcp rcvbuf = 0
 ms tcp read timeout = 600

 #Capacity
 mon osd full ratio = .95
 mon osd nearfull ratio = .85


[osd]
 osd journal size = 1024
 journal dio = true
 journal aio = true

 osd op threads = 2
 osd op thread timeout = 60
 osd disk threads = 2
 osd recovery threads = 1
 osd recovery max active = 1
 osd max backfills = 2


 # Pool
 osd pool default size = 2

 #XFS
 osd mkfs type = xfs
 osd mkfs options xfs = -f -i size=2048
 osd mount options xfs = rw,noatime,inode64,logbsize=256k,delaylog

 #FileStore Settings
 filestore xattr use omap = false
 filestore max inline xattr size = 512
 filestore max sync interval = 10
 filestore merge threshold = 40
 filestore split multiple = 8
 filestore flusher = false
 filestore queue max ops = 2000
 filestore queue max bytes = 536870912
 filestore queue committing max ops = 500
 filestore queue committing max bytes = 268435456
 filestore op threads = 2

[mds]
 max mds = 1
 mds cache size = 75
 client cache size = 2048
 mds dir commit ratio = 0.5



Here ceph -s output:


root@service-new:~# ceph -s
 cluster 2de7b17f-0a3e-4109-b878-c035dd2f7735
  health HEALTH_WARN
 mds0: Client 94102 failing to respond to cache pressure
  monmap e2: 2 mons at 
{cephmds01=10.29.81.161:6789/0,cephmds02=10.29.81.160:6789/0}
 election epoch 34, quorum 0,1 cephmds02,cephmds01
  mdsmap e79: 1/1/1 up {0=cephmds01=up:active}, 1 up:standby
  osdmap e669: 8 osds: 8 up, 8 in
   pgmap v339741: 256 pgs, 2 pools, 132 GB data, 1417 kobjects
 288 GB used, 342 GB / 631 GB avail
  256 active+clean
   client io 3091 kB/s rd, 342 op/s

Thank you.
Regards,
Matteo







___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Erasure Coding + CephFS, objects not being deleted after rm

2015-06-12 Thread Lincoln Bryant
Thanks John, Greg.

If I understand this correctly, then, doing this:
rados -p hotpool cache-flush-evict-all
should start appropriately deleting objects from the cache pool. I just started 
one up, and that seems to be working.

Otherwise, the cache's configured timeouts/limits should get those deletions 
propagated through to the cold pool naturally.
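
(For what it's worth, those timeouts/limits are just settings on the cache pool; a
minimal sketch, assuming the cache pool really is named hotpool:

ceph osd pool set hotpool cache_target_dirty_ratio 0.1
ceph osd pool set hotpool cache_min_flush_age 60
ceph osd pool set hotpool cache_min_evict_age 60

where lower values should make flushes, and with them the propagated deletes,
happen sooner.)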

Is that right?

Thanks again,
Lincoln

On Jun 12, 2015, at 1:12 PM, Gregory Farnum wrote:

 On Fri, Jun 12, 2015 at 11:07 AM, John Spray john.sp...@redhat.com wrote:
 
 Just had a go at reproducing this, and yeah, the behaviour is weird.  Our
 automated testing for cephfs doesn't include any cache tiering, so this is a
 useful exercise!
 
 With a writeback overlay cache tier pool on an EC pool, I write a bunch of
 files, then do a rados cache-flush-evict-all, then delete the files in
 cephfs.  The result is that all the objects are still present in a rados
 ls on either base or cache pool, but if I try to rm any of them I get an
 ENOENT.
 
 Then, finally, when I do another cache-flush-evict-all, now the objects are
 all finally disappearing from the df stats (base and cache pool stats
 ticking down together).
 
 So intuitively, I guess the cache tier is caching the delete-ness of the
 objects, and only later flushing that (i.e. deleting from the base pool).
 The object is still in the cache on that basis, and presumably not getting
 flushed (i.e. deleting in base pool) until usual timeouts/space limits
 apply.
 
 Yep, that's it exactly. This is expected behavior.
 
 Maybe we need something to kick delete flushes to happen much
 earlier (like, ASAP when the cluster isn't too busy doing other
 promotions/evictions).
 
 Sounds like a good RADOS feature request/blueprint that somebody in
 the community might be able to handle.
 -Greg

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Erasure Coding + CephFS, objects not being deleted after rm

2015-06-12 Thread Lincoln Bryant
Greetings experts,

I've got a test set up with CephFS configured to use an erasure coded pool + 
cache tier on 0.94.2. 

I have been writing lots of data to fill the cache to observe the behavior and 
performance when it starts evicting objects to the erasure-coded pool.

The thing I have noticed is that after deleting the files via 'rm' through my 
CephFS kernel client, the cache is emptied but the objects that were evicted to 
the EC pool stick around.

I've attached an image that demonstrates what I'm seeing.

Is this intended behavior, or have I misconfigured something?

Thanks,
Lincoln Bryant

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph mount error

2015-06-11 Thread Lincoln Bryant
Hi,

Are you using cephx? If so, does your client have the appropriate key on it? It 
looks like you have an mds set up and running from your screenshot.

Try mounting it like so:

mount -t ceph -o name=admin,secret=[your secret] 192.168.1.105:6789:/ 
/mnt/mycephfs 

--Lincoln

On Jun 7, 2015, at 10:14 AM, 张忠波 wrote:

 Hi,
 My ceph health is OK. Now I want to build a filesystem, following
 the CEPH FS QUICK START guide:
 http://ceph.com/docs/master/start/quick-cephfs/
 However, I got an error when I used the command mount -t ceph
 192.168.1.105:6789:/ /mnt/mycephfs. The error is: mount error 22 = Invalid
 argument.
 I have read the manual, and I still don't know how to solve it.
 I am looking forward to your reply!
 
 截图1.png
 
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph MDS continually respawning (hammer)

2015-05-22 Thread Lincoln Bryant
Hi Adam,

You can get the MDS to spit out more debug information like so:

# ceph mds tell 0 injectargs '--debug-mds 20 --debug-ms 1'

At least then you can see where it's at when it crashes.

--Lincoln

On May 22, 2015, at 9:33 AM, Adam Tygart wrote:

 Hello all,
 
 The ceph-mds servers in our cluster are performing a constant
 boot-replay-crash in our systems.
 
 I have enable debug logging for the mds for a restart cycle on one of
 the nodes[1].
 
 Kernel debug from cephfs client during reconnection attempts:
 [732586.352173] ceph:  mdsc delayed_work
 [732586.352178] ceph:  check_delayed_caps
 [732586.352182] ceph:  lookup_mds_session 88202f01c000 210
 [732586.352185] ceph:  mdsc get_session 88202f01c000 210 - 211
 [732586.352189] ceph:  send_renew_caps ignoring mds0 (up:replay)
 [732586.352192] ceph:  add_cap_releases 88202f01c000 mds0 extra 680
 [732586.352195] ceph:  mdsc put_session 88202f01c000 211 - 210
 [732586.352198] ceph:  mdsc delayed_work
 [732586.352200] ceph:  check_delayed_caps
 [732586.352202] ceph:  lookup_mds_session 881036cbf800 1
 [732586.352205] ceph:  mdsc get_session 881036cbf800 1 - 2
 [732586.352207] ceph:  send_renew_caps ignoring mds0 (up:replay)
 [732586.352210] ceph:  add_cap_releases 881036cbf800 mds0 extra 680
 [732586.352212] ceph:  mdsc put_session 881036cbf800 2 - 1
 [732591.357123] ceph:  mdsc delayed_work
 [732591.357128] ceph:  check_delayed_caps
 [732591.357132] ceph:  lookup_mds_session 88202f01c000 210
 [732591.357135] ceph:  mdsc get_session 88202f01c000 210 - 211
 [732591.357139] ceph:  add_cap_releases 88202f01c000 mds0 extra 680
 [732591.357142] ceph:  mdsc put_session 88202f01c000 211 - 210
 [732591.357145] ceph:  mdsc delayed_work
 [732591.357147] ceph:  check_delayed_caps
 [732591.357149] ceph:  lookup_mds_session 881036cbf800 1
 [732591.357152] ceph:  mdsc get_session 881036cbf800 1 - 2
 [732591.357154] ceph:  add_cap_releases 881036cbf800 mds0 extra 680
 [732591.357157] ceph:  mdsc put_session 881036cbf800 2 - 1
 [732596.362076] ceph:  mdsc delayed_work
 [732596.362081] ceph:  check_delayed_caps
 [732596.362084] ceph:  lookup_mds_session 88202f01c000 210
 [732596.362087] ceph:  mdsc get_session 88202f01c000 210 - 211
 [732596.362091] ceph:  add_cap_releases 88202f01c000 mds0 extra 680
 [732596.362094] ceph:  mdsc put_session 88202f01c000 211 - 210
 [732596.362097] ceph:  mdsc delayed_work
 [732596.362099] ceph:  check_delayed_caps
 [732596.362101] ceph:  lookup_mds_session 881036cbf800 1
 [732596.362104] ceph:  mdsc get_session 881036cbf800 1 - 2
 [732596.362106] ceph:  add_cap_releases 881036cbf800 mds0 extra 680
 [732596.362109] ceph:  mdsc put_session 881036cbf800 2 - 1
 
 Anybody have any debugging tips, or have any ideas on how to get an mds 
 stable?
 
 Server info: CentOS 7.1 with Ceph 0.94.1
 Client info: Gentoo, kernel cephfs. 3.19.5-gentoo
 
 I'd reboot the client, but at this point, I don't believe this is a
 client issue.
 
 [1] 
 https://drive.google.com/file/d/0B4XF1RWjuGh5WU1OZXpNb0Z1ck0/view?usp=sharing
 
 --
 Adam
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph MDS continually respawning (hammer)

2015-05-22 Thread Lincoln Bryant
I've experienced MDS issues in the past, but nothing sticks out to me in your 
logs.

Are you using a single active MDS with failover, or multiple active MDS? 

--Lincoln

On May 22, 2015, at 10:10 AM, Adam Tygart wrote:

 Thanks for the quick response.
 
 I had 'debug mds = 20' in the first log, I added 'debug ms = 1' for this one:
 https://drive.google.com/file/d/0B4XF1RWjuGh5bXFnRzE1SHF6blE/view?usp=sharing
 
 Based on these logs, it looks like heartbeat_map is_healthy 'MDS' just
 times out and then the mds gets respawned.
 
 --
 Adam
 
 On Fri, May 22, 2015 at 9:42 AM, Lincoln Bryant linco...@uchicago.edu wrote:
 Hi Adam,
 
 You can get the MDS to spit out more debug information like so:
 
# ceph mds tell 0 injectargs '--debug-mds 20 --debug-ms 1'
 
 At least then you can see where it's at when it crashes.
 
 --Lincoln
 
 On May 22, 2015, at 9:33 AM, Adam Tygart wrote:
 
 Hello all,
 
 The ceph-mds servers in our cluster are performing a constant
 boot-replay-crash in our systems.
 
 I have enable debug logging for the mds for a restart cycle on one of
 the nodes[1].
 
 Kernel debug from cephfs client during reconnection attempts:
 [732586.352173] ceph:  mdsc delayed_work
 [732586.352178] ceph:  check_delayed_caps
 [732586.352182] ceph:  lookup_mds_session 88202f01c000 210
 [732586.352185] ceph:  mdsc get_session 88202f01c000 210 - 211
 [732586.352189] ceph:  send_renew_caps ignoring mds0 (up:replay)
 [732586.352192] ceph:  add_cap_releases 88202f01c000 mds0 extra 680
 [732586.352195] ceph:  mdsc put_session 88202f01c000 211 - 210
 [732586.352198] ceph:  mdsc delayed_work
 [732586.352200] ceph:  check_delayed_caps
 [732586.352202] ceph:  lookup_mds_session 881036cbf800 1
 [732586.352205] ceph:  mdsc get_session 881036cbf800 1 - 2
 [732586.352207] ceph:  send_renew_caps ignoring mds0 (up:replay)
 [732586.352210] ceph:  add_cap_releases 881036cbf800 mds0 extra 680
 [732586.352212] ceph:  mdsc put_session 881036cbf800 2 - 1
 [732591.357123] ceph:  mdsc delayed_work
 [732591.357128] ceph:  check_delayed_caps
 [732591.357132] ceph:  lookup_mds_session 88202f01c000 210
 [732591.357135] ceph:  mdsc get_session 88202f01c000 210 - 211
 [732591.357139] ceph:  add_cap_releases 88202f01c000 mds0 extra 680
 [732591.357142] ceph:  mdsc put_session 88202f01c000 211 - 210
 [732591.357145] ceph:  mdsc delayed_work
 [732591.357147] ceph:  check_delayed_caps
 [732591.357149] ceph:  lookup_mds_session 881036cbf800 1
 [732591.357152] ceph:  mdsc get_session 881036cbf800 1 - 2
 [732591.357154] ceph:  add_cap_releases 881036cbf800 mds0 extra 680
 [732591.357157] ceph:  mdsc put_session 881036cbf800 2 - 1
 [732596.362076] ceph:  mdsc delayed_work
 [732596.362081] ceph:  check_delayed_caps
 [732596.362084] ceph:  lookup_mds_session 88202f01c000 210
 [732596.362087] ceph:  mdsc get_session 88202f01c000 210 - 211
 [732596.362091] ceph:  add_cap_releases 88202f01c000 mds0 extra 680
 [732596.362094] ceph:  mdsc put_session 88202f01c000 211 - 210
 [732596.362097] ceph:  mdsc delayed_work
 [732596.362099] ceph:  check_delayed_caps
 [732596.362101] ceph:  lookup_mds_session 881036cbf800 1
 [732596.362104] ceph:  mdsc get_session 881036cbf800 1 - 2
 [732596.362106] ceph:  add_cap_releases 881036cbf800 mds0 extra 680
 [732596.362109] ceph:  mdsc put_session 881036cbf800 2 - 1
 
 Anybody have any debugging tips, or have any ideas on how to get an mds 
 stable?
 
 Server info: CentOS 7.1 with Ceph 0.94.1
 Client info: Gentoo, kernel cephfs. 3.19.5-gentoo
 
 I'd reboot the client, but at this point, I don't believe this is a
 client issue.
 
 [1] 
 https://drive.google.com/file/d/0B4XF1RWjuGh5WU1OZXpNb0Z1ck0/view?usp=sharing
 
 --
 Adam
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph MDS continually respawning (hammer)

2015-05-22 Thread Lincoln Bryant
I notice in both logs, the last entry before the MDS restart/failover is when 
the mds is replaying the journal and gets to 
/homes/gundimed/IPD/10kb/1e-500d/DisplayLog/

2015-05-22 09:59:19.116231 7f9d930c1700 10 mds.0.journal EMetaBlob.replay for 
[2,head] had [inode 13f8e31 [...2,head] 
/homes/gundimed/IPD/10kb/1e-500d/DisplayLog/ auth v20776 f(v0 m2015-05-22 
02:34:09.00 357=357+0) n(v1 rc2015-05-22 02:34:09.00 b71340955004 
358=357+1) (iversion lock) | dirfrag=1 dirty=1 0x6ded9c8]

2015-05-22 08:04:31.993007 7f87afb2f700 10 mds.0.journal EMetaBlob.replay for 
[2,head] had [inode 13f8e31 [...2,head] 
/homes/gundimed/IPD/10kb/1e-500d/DisplayLog/ auth v20776 f(v0 m2015-05-22 
02:34:09.00 357=357+0) n(v1 rc2015-05-22 02:34:09.00 b71340955004 
358=357+1) (iversion lock) | dirfrag=1 dirty=1 0x76a59c8]

Maybe there's some problem in this part of the journal? Or maybe that's the end 
of the journal and it crashes afterwards? No idea :( Hopefully one of the devs 
can weigh in.

--Lincoln

On May 22, 2015, at 11:40 AM, Adam Tygart wrote:

 I knew I forgot to include something with my initial e-mail.
 
 Single active with failover.
 
 dumped mdsmap epoch 30608
 epoch   30608
 flags   0
 created 2015-04-02 16:15:55.209894
 modified2015-05-22 11:39:15.992774
 tableserver 0
 root0
 session_timeout 60
 session_autoclose   300
 max_file_size   17592186044416
 last_failure30606
 last_failure_osd_epoch  24298
 compat  compat={},rocompat={},incompat={1=base v0.20,2=client
 writeable ranges,3=default file layouts on dirs,4=dir inode in
 separate object,5=mds uses versioned encoding,6=dirfrag is stored in
 omap,8=no anchor table}
 max_mds 1
 in  0
 up  {0=20284976}
 failed
 stopped
 data_pools  25
 metadata_pool   27
 inline_data disabled
 20285024:   10.5.38.2:7021/32024 'hobbit02' mds.-1.0 up:standby seq 1
 20346784:   10.5.38.1:6957/223554 'hobbit01' mds.-1.0 up:standby seq 1
 20284976:   10.5.38.13:6926/66700 'hobbit13' mds.0.1696 up:replay seq 1
 
 --
 Adam
 
 On Fri, May 22, 2015 at 11:37 AM, Lincoln Bryant linco...@uchicago.edu 
 wrote:
 I've experienced MDS issues in the past, but nothing sticks out to me in 
 your logs.
 
 Are you using a single active MDS with failover, or multiple active MDS?
 
 --Lincoln
 
 On May 22, 2015, at 10:10 AM, Adam Tygart wrote:
 
 Thanks for the quick response.
 
 I had 'debug mds = 20' in the first log, I added 'debug ms = 1' for this 
 one:
 https://drive.google.com/file/d/0B4XF1RWjuGh5bXFnRzE1SHF6blE/view?usp=sharing
 
 Based on these logs, it looks like heartbeat_map is_healthy 'MDS' just
 times out and then the mds gets respawned.
 
 --
 Adam
 
 On Fri, May 22, 2015 at 9:42 AM, Lincoln Bryant linco...@uchicago.edu 
 wrote:
 Hi Adam,
 
 You can get the MDS to spit out more debug information like so:
 
   # ceph mds tell 0 injectargs '--debug-mds 20 --debug-ms 1'
 
 At least then you can see where it's at when it crashes.
 
 --Lincoln
 
 On May 22, 2015, at 9:33 AM, Adam Tygart wrote:
 
 Hello all,
 
 The ceph-mds servers in our cluster are performing a constant
 boot-replay-crash in our systems.
 
 I have enable debug logging for the mds for a restart cycle on one of
 the nodes[1].
 
 Kernel debug from cephfs client during reconnection attempts:
 [732586.352173] ceph:  mdsc delayed_work
 [732586.352178] ceph:  check_delayed_caps
 [732586.352182] ceph:  lookup_mds_session 88202f01c000 210
 [732586.352185] ceph:  mdsc get_session 88202f01c000 210 - 211
 [732586.352189] ceph:  send_renew_caps ignoring mds0 (up:replay)
 [732586.352192] ceph:  add_cap_releases 88202f01c000 mds0 extra 680
 [732586.352195] ceph:  mdsc put_session 88202f01c000 211 - 210
 [732586.352198] ceph:  mdsc delayed_work
 [732586.352200] ceph:  check_delayed_caps
 [732586.352202] ceph:  lookup_mds_session 881036cbf800 1
 [732586.352205] ceph:  mdsc get_session 881036cbf800 1 - 2
 [732586.352207] ceph:  send_renew_caps ignoring mds0 (up:replay)
 [732586.352210] ceph:  add_cap_releases 881036cbf800 mds0 extra 680
 [732586.352212] ceph:  mdsc put_session 881036cbf800 2 - 1
 [732591.357123] ceph:  mdsc delayed_work
 [732591.357128] ceph:  check_delayed_caps
 [732591.357132] ceph:  lookup_mds_session 88202f01c000 210
 [732591.357135] ceph:  mdsc get_session 88202f01c000 210 - 211
 [732591.357139] ceph:  add_cap_releases 88202f01c000 mds0 extra 680
 [732591.357142] ceph:  mdsc put_session 88202f01c000 211 - 210
 [732591.357145] ceph:  mdsc delayed_work
 [732591.357147] ceph:  check_delayed_caps
 [732591.357149] ceph:  lookup_mds_session 881036cbf800 1
 [732591.357152] ceph:  mdsc get_session 881036cbf800 1 - 2
 [732591.357154] ceph:  add_cap_releases 881036cbf800 mds0 extra 680
 [732591.357157] ceph:  mdsc put_session 881036cbf800 2 - 1
 [732596.362076] ceph:  mdsc delayed_work
 [732596.362081] ceph:  check_delayed_caps
 [732596.362084] ceph:  lookup_mds_session

Re: [ceph-users] Kernel Bug in 3.13.0-52

2015-05-13 Thread Lincoln Bryant
Hi Daniel,

There are some kernel recommendations here, although it's unclear if they only 
apply to RBD or also to CephFS.
http://ceph.com/docs/master/start/os-recommendations/

--Lincoln

On May 13, 2015, at 3:03 PM, Daniel Takatori Ohara wrote:

 Thanks, Gregory, for the answer.
 
 I will upgrade the kernel.
 
 Do you know with which kernel CephFS is stable?
 
 Thanks.
 
 
 Att.
 
 ---
 Daniel Takatori Ohara.
 System Administrator - Lab. of Bioinformatics
 Molecular Oncology Center 
 Instituto Sírio-Libanês de Ensino e Pesquisa
 Hospital Sírio-Libanês
 Phone: +55 11 3155-0200 (extension 1927)
 R: Cel. Nicolau dos Santos, 69
 São Paulo-SP. 01308-060
 http://www.bioinfo.mochsl.org.br
 
 
 On Wed, May 13, 2015 at 5:01 PM, Gregory Farnum g...@gregs42.com wrote:
 On Wed, May 13, 2015 at 12:08 PM, Daniel Takatori Ohara
 dtoh...@mochsl.org.br wrote:
  Hi,
 
  We have a small ceph cluster with 4 OSD's and 1 MDS.
 
  I run Ubuntu 14.04 with 3.13.0-52-generic in the clients, and CentOS 6.6
  with 2.6.32-504.16.2.el6.x86_64 in Servers.
 
  The version of Ceph is 0.94.1
 
  Sometimes, the CephFS freeze, and the dmesg show me the follow messages :
 
  May 13 15:53:10 blade02 kernel: [93297.784094] [ cut here
  ]
  May 13 15:53:10 blade02 kernel: [93297.784121] WARNING: CPU: 10 PID: 299 at
  /build/buildd/linux-3.13.0/fs/ceph/inode.c:701 fill_inode.isra.8+0x9ed/0xa00
  [ceph]()
  May 13 15:53:10 blade02 kernel: [93297.784129] Modules linked in: 8021q garp
  stp mrp llc nfsv3 rpcsec_gss_krb5 nfsv4 ceph libceph libcrc32c intel_rapl
  x86_pkg_temp_thermal intel_powerclamp ipmi_devintf gpi
  May 13 15:53:10 blade02 kernel: [93297.784204] CPU: 10 PID: 299 Comm:
  kworker/10:1 Tainted: GW 3.13.0-52-generic #86-Ubuntu
  May 13 15:53:10 blade02 kernel: [93297.784207] Hardware name: Dell Inc.
  PowerEdge M520/050YHY, BIOS 2.1.3 01/20/2014
  May 13 15:53:10 blade02 kernel: [93297.784221] Workqueue: ceph-msgr con_work
  [libceph]
  May 13 15:53:10 blade02 kernel: [93297.784225]  0009
  880801093a28 8172266e 
  May 13 15:53:10 blade02 kernel: [93297.784233]  880801093a60
  810677fd ffea 0036
  May 13 15:53:10 blade02 kernel: [93297.784239]  
   c9001b73f9d8 880801093a70
  May 13 15:53:10 blade02 kernel: [93297.784246] Call Trace:
  May 13 15:53:10 blade02 kernel: [93297.784257]  [8172266e]
  dump_stack+0x45/0x56
  May 13 15:53:10 blade02 kernel: [93297.784264]  [810677fd]
  warn_slowpath_common+0x7d/0xa0
  May 13 15:53:10 blade02 kernel: [93297.784269]  [810678da]
  warn_slowpath_null+0x1a/0x20
  May 13 15:53:10 blade02 kernel: [93297.784280]  [a046facd]
  fill_inode.isra.8+0x9ed/0xa00 [ceph]
  May 13 15:53:10 blade02 kernel: [93297.784290]  [a046e3cd] ?
  ceph_alloc_inode+0x1d/0x4e0 [ceph]
  May 13 15:53:10 blade02 kernel: [93297.784302]  [a04704cf]
  ceph_readdir_prepopulate+0x27f/0x6d0 [ceph]
  May 13 15:53:10 blade02 kernel: [93297.784318]  [a048a704]
  handle_reply+0x854/0xc70 [ceph]
  May 13 15:53:10 blade02 kernel: [93297.784331]  [a048c3f7]
  dispatch+0xe7/0xa90 [ceph]
  May 13 15:53:10 blade02 kernel: [93297.784342]  [a02a4a78] ?
  ceph_tcp_recvmsg+0x48/0x60 [libceph]
  May 13 15:53:10 blade02 kernel: [93297.784354]  [a02a7a9b]
  try_read+0x4ab/0x10d0 [libceph]
  May 13 15:53:10 blade02 kernel: [93297.784365]  [a02a9418] ?
  try_write+0x9a8/0xdb0 [libceph]
  May 13 15:53:10 blade02 kernel: [93297.784373]  [8101bc23] ?
  native_sched_clock+0x13/0x80
  May 13 15:53:10 blade02 kernel: [93297.784379]  [8109d585] ?
  sched_clock_cpu+0xb5/0x100
  May 13 15:53:10 blade02 kernel: [93297.784390]  [a02a98d9]
  con_work+0xb9/0x640 [libceph]
  May 13 15:53:10 blade02 kernel: [93297.784398]  [81083aa2]
  process_one_work+0x182/0x450
  May 13 15:53:10 blade02 kernel: [93297.784403]  [81084891]
  worker_thread+0x121/0x410
  May 13 15:53:10 blade02 kernel: [93297.784409]  [81084770] ?
  rescuer_thread+0x430/0x430
  May 13 15:53:10 blade02 kernel: [93297.784414]  [8108b5d2]
  kthread+0xd2/0xf0
  May 13 15:53:10 blade02 kernel: [93297.784420]  [8108b500] ?
  kthread_create_on_node+0x1c0/0x1c0
  May 13 15:53:10 blade02 kernel: [93297.784426]  [817330cc]
  ret_from_fork+0x7c/0xb0
  May 13 15:53:10 blade02 kernel: [93297.784431]  [8108b500] ?
  kthread_create_on_node+0x1c0/0x1c0
  May 13 15:53:10 blade02 kernel: [93297.784434] ---[ end trace
  05d3f5ee1f31bc67 ]---
  May 13 15:53:10 blade02 kernel: [93297.784437] ceph: fill_inode badness on
  8807f7eaa5c0
 
 I don't follow the kernel stuff too closely, but the CephFS kernel
 client is still improving quite rapidly and 3.13 is old at this point.
 You could try upgrading to something newer.
 Zheng might also know what's going on and if it's been fixed.
 -Greg
 
 

Re: [ceph-users] Kernel Bug in 3.13.0-52

2015-05-13 Thread Lincoln Bryant
With CephFS, it seems to be a safe bet to use the newest kernel available to you.

I believe you will need kernel 4.1+ if you are using Hammer CRUSH tunables 
(straw2). There have been some threads on this recently.
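
If it helps, you can see which tunables profile the cluster currently requires with:

   ceph osd crush show-tunables

and, if you need to stay compatible with older kernel clients, you can pin an older
profile, e.g. ceph osd crush tunables firefly. This is just a sketch; note that
changing tunables will shuffle data around.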

--Lincoln

On May 13, 2015, at 3:20 PM, Daniel Takatori Ohara wrote:

 Hello Lincoln, 
 
 Thanks for the answer. I will upgrade the kernel on the clients.
 
 But for version 0.94.1 (hammer), is the kernel recommendation the same? Is it 3.16?
 
 Thanks,
 
 
 Att.
 
 ---
 Daniel Takatori Ohara.
 System Administrator - Lab. of Bioinformatics
 Molecular Oncology Center 
 Instituto Sírio-Libanês de Ensino e Pesquisa
 Hospital Sírio-Libanês
 Phone: +55 11 3155-0200 (extension 1927)
 R: Cel. Nicolau dos Santos, 69
 São Paulo-SP. 01308-060
 http://www.bioinfo.mochsl.org.br
 
 
 On Wed, May 13, 2015 at 5:11 PM, Lincoln Bryant linco...@uchicago.edu wrote:
 Hi Daniel,
 
 There are some kernel recommendations here, although it's unclear if they 
 only apply to RBD or also to CephFS.
   http://ceph.com/docs/master/start/os-recommendations/
 
 --Lincoln
 
 On May 13, 2015, at 3:03 PM, Daniel Takatori Ohara wrote:
 
 Thanks, Gregory, for the answer.
 
 I will upgrade the kernel.
 
 Do you know with which kernel CephFS is stable?
 
 Thanks.
 
 
 Att.
 
 ---
 Daniel Takatori Ohara.
 System Administrator - Lab. of Bioinformatics
 Molecular Oncology Center 
 Instituto Sírio-Libanês de Ensino e Pesquisa
 Hospital Sírio-Libanês
 Phone: +55 11 3155-0200 (extension 1927)
 R: Cel. Nicolau dos Santos, 69
 São Paulo-SP. 01308-060
 http://www.bioinfo.mochsl.org.br
 
 
 On Wed, May 13, 2015 at 5:01 PM, Gregory Farnum g...@gregs42.com wrote:
 On Wed, May 13, 2015 at 12:08 PM, Daniel Takatori Ohara
 dtoh...@mochsl.org.br wrote:
  Hi,
 
  We have a small ceph cluster with 4 OSD's and 1 MDS.
 
  I run Ubuntu 14.04 with 3.13.0-52-generic in the clients, and CentOS 6.6
  with 2.6.32-504.16.2.el6.x86_64 in Servers.
 
  The version of Ceph is 0.94.1
 
  Sometimes, the CephFS freeze, and the dmesg show me the follow messages :
 
  May 13 15:53:10 blade02 kernel: [93297.784094] [ cut here
  ]
  May 13 15:53:10 blade02 kernel: [93297.784121] WARNING: CPU: 10 PID: 299 at
  /build/buildd/linux-3.13.0/fs/ceph/inode.c:701 
  fill_inode.isra.8+0x9ed/0xa00
  [ceph]()
  May 13 15:53:10 blade02 kernel: [93297.784129] Modules linked in: 8021q 
  garp
  stp mrp llc nfsv3 rpcsec_gss_krb5 nfsv4 ceph libceph libcrc32c intel_rapl
  x86_pkg_temp_thermal intel_powerclamp ipmi_devintf gpi
  May 13 15:53:10 blade02 kernel: [93297.784204] CPU: 10 PID: 299 Comm:
  kworker/10:1 Tainted: GW 3.13.0-52-generic #86-Ubuntu
  May 13 15:53:10 blade02 kernel: [93297.784207] Hardware name: Dell Inc.
  PowerEdge M520/050YHY, BIOS 2.1.3 01/20/2014
  May 13 15:53:10 blade02 kernel: [93297.784221] Workqueue: ceph-msgr 
  con_work
  [libceph]
  May 13 15:53:10 blade02 kernel: [93297.784225]  0009
  880801093a28 8172266e 
  May 13 15:53:10 blade02 kernel: [93297.784233]  880801093a60
  810677fd ffea 0036
  May 13 15:53:10 blade02 kernel: [93297.784239]  
   c9001b73f9d8 880801093a70
  May 13 15:53:10 blade02 kernel: [93297.784246] Call Trace:
  May 13 15:53:10 blade02 kernel: [93297.784257]  [8172266e]
  dump_stack+0x45/0x56
  May 13 15:53:10 blade02 kernel: [93297.784264]  [810677fd]
  warn_slowpath_common+0x7d/0xa0
  May 13 15:53:10 blade02 kernel: [93297.784269]  [810678da]
  warn_slowpath_null+0x1a/0x20
  May 13 15:53:10 blade02 kernel: [93297.784280]  [a046facd]
  fill_inode.isra.8+0x9ed/0xa00 [ceph]
  May 13 15:53:10 blade02 kernel: [93297.784290]  [a046e3cd] ?
  ceph_alloc_inode+0x1d/0x4e0 [ceph]
  May 13 15:53:10 blade02 kernel: [93297.784302]  [a04704cf]
  ceph_readdir_prepopulate+0x27f/0x6d0 [ceph]
  May 13 15:53:10 blade02 kernel: [93297.784318]  [a048a704]
  handle_reply+0x854/0xc70 [ceph]
  May 13 15:53:10 blade02 kernel: [93297.784331]  [a048c3f7]
  dispatch+0xe7/0xa90 [ceph]
  May 13 15:53:10 blade02 kernel: [93297.784342]  [a02a4a78] ?
  ceph_tcp_recvmsg+0x48/0x60 [libceph]
  May 13 15:53:10 blade02 kernel: [93297.784354]  [a02a7a9b]
  try_read+0x4ab/0x10d0 [libceph]
  May 13 15:53:10 blade02 kernel: [93297.784365]  [a02a9418] ?
  try_write+0x9a8/0xdb0 [libceph]
  May 13 15:53:10 blade02 kernel: [93297.784373]  [8101bc23] ?
  native_sched_clock+0x13/0x80
  May 13 15:53:10 blade02 kernel: [93297.784379]  [8109d585] ?
  sched_clock_cpu+0xb5/0x100
  May 13 15:53:10 blade02 kernel: [93297.784390]  [a02a98d9]
  con_work+0xb9/0x640 [libceph]
  May 13 15:53:10 blade02 kernel: [93297.784398]  [81083aa2]
  process_one_work+0x182/0x450
  May 13 15:53:10 blade02 kernel: [93297.784403]  [81084891]
  worker_thread+0x121/0x410
  May 13 15:53:10 blade02 kernel: [93297.784409]  [81084770

[ceph-users] Failing to respond to cache pressure?

2015-05-05 Thread Lincoln Bryant
Hello all,

I'm seeing some warnings regarding trimming and cache pressure. We're running 
0.94.1 on our cluster, with erasure coding + cache tiering backing our CephFS.

 health HEALTH_WARN
mds0: Behind on trimming (250/30)
mds0: Client 74135 failing to respond to cache pressure

The trimming error popped up after restarting the mds, but then went away on 
its own. However, failing to respond to cache pressure persists.

The cluster is basically idle at the moment (no reads/writes when watching ceph 
-w), so this is very confusing to me.

Is there any way to identify the hostname or IP address of client 74135, so I 
can check the client itself?
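
(The closest thing I've found so far is dumping the sessions from the MDS admin
socket on the MDS host, something like:

ceph daemon mds.<mds-name> session ls

which, if I'm reading it right, lists each session along with the client id and
address, though I'm not sure that's the intended way.)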

Thanks much,
Lincoln
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph User Teething Problems

2015-03-23 Thread Lincoln Bryant
Hi David,

I also see only the RBD pool getting created by default in 0.93.

With regards to resizing placement groups, I believe you can use:
ceph osd pool set [pool name] pg_num [new pg number]
ceph osd pool set [pool name] pgp_num [new pg number]
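
For example, to grow a hypothetical pool named rbd from 256 to 512 placement groups
(pg_num first, then pgp_num; they can only be increased):

ceph osd pool set rbd pg_num 512
ceph osd pool set rbd pgp_num 512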

Be forewarned, this will trigger data migration.

Cheers,
Lincoln

On Mar 4, 2015, at 2:27 PM, Datatone Lists wrote:

 I have been following ceph for a long time. I have yet to put it into
 service, and I keep coming back as btrfs improves and ceph reaches
 higher version numbers.
 
 I am now trying ceph 0.93 and kernel 4.0-rc1.
 
 Q1) Is it still considered that btrfs is not robust enough, and that
 xfs should be used instead? [I am trying with btrfs].
 
 I followed the manual deployment instructions on the web site 
 (http://ceph.com/docs/master/install/manual-deployment/) and I managed
 to get a monitor and several osds running and apparently working. The
 instructions fizzle out without explaining how to set up mds. I went
 back to mkcephfs and got things set up that way. The mds starts.
 
 [Please don't mention ceph-deploy]
 
 The first thing that I noticed is that (whether I set up mon and osds
 by following the manual deployment, or using mkcephfs), the correct
 default pools were not created.
 
 bash-4.3# ceph osd lspools
 0 rbd,
 bash-4.3# 
 
 I get only 'rbd' created automatically. I deleted this pool, and
 re-created data, metadata and rbd manually. When doing this, I had to
 juggle with the pg-num in order to avoid the 'too many pgs for osd' warning.
 I have three osds running at the moment, but intend to add to these
 when I have some experience of things working reliably. I am puzzled,
 because I seem to have to set the pg-num for the pool to a number that
 makes (N-pools x pg-num)/N-osds come to the right kind of number. So
 this implies that I can't really expand a set of pools by adding osds
 at a later date. 
 
 Q2) Is there any obvious reason why my default pools are not getting
 created automatically as expected?
 
 Q3) Can pg-num be modified for a pool later? (If the number of osds is 
 increased dramatically).
 
 Finally, when I try to mount cephfs, I get a mount 5 error.
 
 A mount 5 error typically occurs if a MDS server is laggy or if it
 crashed. Ensure at least one MDS is up and running, and the cluster is
 active + healthy.
 
 My mds is running, but its log is not terribly active:
 
 2015-03-04 17:47:43.177349 7f42da2c47c0  0 ceph version 0.93 
 (bebf8e9a830d998eeaab55f86bb256d4360dd3c4), process ceph-mds, pid 4110
 2015-03-04 17:47:43.182716 7f42da2c47c0 -1 mds.-1.0 log_to_monitors 
 {default=true}
 
 (This is all there is in the log).
 
 I think that a key indicator of the problem must be this from the
 monitor log:
 
 2015-03-04 16:53:20.715132 7f3cd0014700  1
 mon.ceph-mon-00@0(leader).mds e1 warning, MDS mds.?
 [2001:8b0::5fb3::1fff::9054]:6800/4036 up but filesystem
 disabled
 
 (I have obscured parts of my IP address in the log line above)
 
 Q4) Can you give me an idea of what is wrong that causes the mds to not
 play properly?
 
 I think that there are some typos on the manual deployment pages, for
 example:
 
 ceph-osd id={osd-num}
 
 This is not right. As far as I am aware it should be:
 
 ceph-osd -i {osd-num}
 
 An observation. In principle, setting things up manually is not all
 that complicated, provided that clear and unambiguous instructions are
 provided. This simple piece of documentation is very important. My view
 is that the existing manual deployment instructions get a bit confused
 and confusing when they get to the osd setup, and the mds setup is
 completely absent.
 
 For someone who knows, this would be a fairly simple and fairly quick 
 operation to review and revise this part of the documentation. I
 suspect that this part suffers from being really obvious stuff to the
 well initiated. For those of us closer to the start, this forms the
 ends of the threads that have to be picked up before the journey can be
 made.
 
 Very best regards,
 David
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PG to pool mapping?

2015-02-04 Thread Lincoln Bryant
On Feb 4, 2015, at 3:27 PM, Gregory Farnum wrote:

 On Wed, Feb 4, 2015 at 1:20 PM, Chad William Seys
 cws...@physics.wisc.edu wrote:
 Hi all,
   How do I determine which pool a PG belongs to?
   (Also, is it the case that all objects in a PG belong to one pool?)
 
 PGs are of the form 1.a2b3c4. The part prior to the period is the
 pool ID; the part following distinguishes the PG and is based on the
 hash range it covers. :)
 
 Yes, all objects in a PG belong to a single pool; they are hash ranges
 of the pool.
 -Greg
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

You can also map the pool number to the pool name with:

'ceph osd lspools'

Similarly, 'rados lspools' will print out the pools line by line.
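
As a made-up illustration of how the PG prefix maps back to a pool:

# ceph osd lspools
0 data,1 metadata,2 rbd,

so a PG named 2.525 would belong to the 'rbd' pool in that listing.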

Cheers,
Lincoln

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Testing

2015-01-28 Thread Lincoln Bryant
Hi Raj,

Sébastien Han has done some excellent Ceph benchmarking on his blog here: 
http://www.sebastien-han.fr/blog/2012/08/26/ceph-benchmarks/

Maybe that's a good place to start for your own testing?

Cheers,
Lincoln

On Jan 28, 2015, at 12:59 PM, Jeripotula, Shashiraj wrote:

 Resending. Guys, please point me to some good documentation.
  
 Thanks in advance.
  
 Regards
  
 Raj
  
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
 Jeripotula, Shashiraj
 Sent: Tuesday, January 27, 2015 10:32 AM
 To: ceph-users@lists.ceph.com
 Subject: [ceph-users] Ceph Testing
  
 Hi All,
 
 Is there good documentation on Ceph testing?
 
 I have the following setup done, but I am not able to find a good document to 
 start doing the tests.
  
  
  
  
 image001.png
 Please advise.
  
 Thanks
  
 Raj
  
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Building Ceph

2015-01-06 Thread Lincoln Bryant
Hi Pankaj,

You can search for the lib using the 'yum provides' command, which accepts 
wildcards.

[root@sl7 ~]# yum provides */lib64/libkeyutils*
Loaded plugins: langpacks
keyutils-libs-1.5.8-3.el7.x86_64 : Key utilities library
Repo: sl
Matched from:
Filename: /lib64/libkeyutils.so.1.5
Filename: /lib64/libkeyutils.so.1

Cheers,
Lincoln

On Jan 5, 2015, at 12:26 PM, Garg, Pankaj wrote:

 Hi,
 I’m trying to build Ceph on my RHEL (Scientific Linux 7 – Nitrogen), with 
 3.10.0.
 I am using the configure script and I am now stuck on “libkeyutils not found”.
 I can’t seem to find the right library for this. What Is the right yum update 
 name for this library?
 Any help appreciated.
 Thanks
 Pankaj
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph status

2015-01-06 Thread Lincoln Bryant

Hi Ajitha,

For one, it looks like you don't have enough OSDs for the number of 
replicas you have specified in the config file. What is the value of 
your 'osd pool default size' in ceph.conf? If it's 3, for example, 
then you need to have at least 3 hosts with 1 OSD each (with the default 
CRUSH rules, IIRC). Alternatively, you could reduce the replication 
level. You can see how to do that here: 
http://ceph.com/docs/master/rados/operations/pools/#set-the-number-of-object-replicas
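
For example, dropping an existing pool to 2 copies would look something like this
(the pool name here is just an example):

ceph osd pool set rbd size 2

and you may want to check that the pool's min_size still makes sense afterwards.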


The other warning indicates that your monitor VM has a nearly full disk.

Hope that helps!

Cheers,
Lincoln

On 1/6/2015 5:07 AM, Ajitha Robert wrote:

Hi all,

I have installed ceph using the ceph-deploy utility. I have created three
VMs: one for monitor+mds, and the other two VMs for OSDs. The ceph admin node is
another separate machine.


The status and health of ceph are shown below. Can you please suggest what I
can infer from the status? I am a beginner at this.

*ceph status*

   cluster 3a946c74-b16d-41bd-a5fe-41efa96f0ee9
  health HEALTH_WARN 46 pgs degraded; 18 pgs incomplete; 64 pgs stale;
46 pgs stuck degraded; 18 pgs stuck inactive; 64 pgs stuck stale; 64 pgs
stuck unclean; 46 pgs stuck undersized; 46 pgs undersized; mon.MON low disk
space
  monmap e1: 1 mons at {MON=10.184.39.66:6789/0}, election epoch 1,
quorum 0 MON
  osdmap e19: 5 osds: 2 up, 2 in
   pgmap v33: 64 pgs, 1 pools, 0 bytes data, 0 objects
 10304 MB used, 65947 MB / 76252 MB avail
   18 stale+incomplete
   46 stale+active+undersized+degraded


*ceph health*

HEALTH_WARN 46 pgs degraded; 18 pgs incomplete; 64 pgs stale; 46 pgs stuck
degraded; 18 pgs stuck inactive; 64 pgs stuck stale; 64 pgs stuck unclean;
46 pgs stuck undersized; 46 pgs undersized; mon.MON low disk space

*ceph -w*
 cluster 3a946c74-b16d-41bd-a5fe-41efa96f0ee9
  health HEALTH_WARN 46 pgs degraded; 18 pgs incomplete; 64 pgs stale;
46 pgs stuck degraded; 18 pgs stuck inactive; 64 pgs stuck stale; 64 pgs
stuck unclean; 46 pgs stuck undersized; 46 pgs undersized; mon.MON low disk
space
  monmap e1: 1 mons at {MON=10.184.39.66:6789/0}, election epoch 1,
quorum 0 MON
  osdmap e19: 5 osds: 2 up, 2 in
   pgmap v31: 64 pgs, 1 pools, 0 bytes data, 0 objects
 10305 MB used, 65947 MB / 76252 MB avail
   18 stale+incomplete
   46 stale+active+undersized+degraded

2015-01-05 20:38:53.159998 mon.0 [INF] from='client.? 10.184.39.66:0/1011909'
entity='client.bootstrap-mds' cmd='[{prefix: auth get-or-create,
entity: mds.MON, caps: [osd, allow rwx, mds, allow, mon,
allow profile mds]}]': finished


2015-01-05 20:41:42.003690 mon.0 [INF] pgmap v32: 64 pgs: 18
stale+incomplete, 46 stale+active+undersized+degraded; 0 bytes data, 10304
MB used, 65947 MB / 76252 MB avail
2015-01-05 20:41:50.100784 mon.0 [INF] pgmap v33: 64 pgs: 18
stale+incomplete, 46 stale+active+undersized+degraded; 0 bytes data, 10304
MB used, 65947 MB / 76252 MB avail





*Regards,Ajitha R*



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Slow/Hung IOs

2015-01-05 Thread Lincoln Bryant
Hi Bill,

From your log excerpt, it looks like your slow requests are happening on OSDs 
14 and 18. Is it always these two OSDs?

If you don't have a long recovery time (e.g., the cluster is just full of test 
data), maybe you could try setting OSDs 14 and 18 out and re-benching?

Alternatively I suppose you could just use bonnie++ or dd etc to write to those 
OSDs (careful to not clobber any Ceph dirs) and see how the performance looks. 
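
A rough sketch of that first test, assuming nothing else is recovering at the time:

ceph osd out 14 18
# wait for backfill to finish and the cluster to settle, then
rados -p rbd bench 60 write -t 32
ceph osd in 14 18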

Cheers,
Lincoln

On Jan 5, 2015, at 4:36 PM, Sanders, Bill wrote:

 Hi Ceph Users,
 
 We've got a Ceph cluster we've built, and we're experiencing issues with slow 
 or hung IOs, even running 'rados bench' on the OSD cluster. Things start 
 out great, ~600 MB/s, then rapidly drop off as the test waits for IOs. 
 Nothing seems to be taxed... the system just seems to be waiting. Any help 
 trying to figure out what could cause the slow IOs is appreciated.
 
 For example, 'rados -p rbd bench 60 write -t 32' takes over 900s to complete:
 
 A typical rados bench:
  Total time run: 957.458274
 Total writes made:  9251
 Write size: 4194304
 Bandwidth (MB/sec): 38.648 
 
 Stddev Bandwidth:   157.323
 Max bandwidth (MB/sec): 964
 Min bandwidth (MB/sec): 0
 Average Latency:3.21126
 Stddev Latency: 51.9546
 Max latency:910.72
 Min latency:0.04516
 
 
 According to ceph.log, we're not experiencing any OSD flapping or monitor 
 election cycles, just slow requests:
 
 # grep slow /var/log/ceph/ceph.log:
 2015-01-05 13:42:42.937678 osd.18 39.7.48.7:6803/11185 220 : [WRN] 3 slow 
 requests, 1 included below; oldest blocked for  513.611379 secs
 2015-01-05 13:42:42.937685 osd.18 39.7.48.7:6803/11185 221 : [WRN] slow 
 request 30.136429 seconds old, received at 2015-01-05 13:42:12.801205: 
 osd_op(client.92008.1:3101508 rb.0.1437.238e1f29.000f [write 
 114688~512] 3.841c0edf ondisk+write e994) v4 currently waiting for subops 
 from 3,37
 2015-01-05 13:42:49.938681 osd.18 39.7.48.7:6803/11185 222 : [WRN] 3 slow 
 requests, 1 included below; oldest blocked for  520.612372 secs
 2015-01-05 13:42:49.938688 osd.18 39.7.48.7:6803/11185 223 : [WRN] slow 
 request 480.636547 seconds old, received at 2015-01-05 13:34:49.302080: 
 osd_op(client.92008.1:3100010 rb.0.140d.238e1f29.0c77 [write 
 3622400~512] 3.d031a69f ondisk+write e994) v4 currently waiting for subops 
 from 26,37
 2015-01-05 13:43:12.941838 osd.18 39.7.48.7:6803/11185 224 : [WRN] 3 slow 
 requests, 1 included below; oldest blocked for  543.615545 secs
 2015-01-05 13:43:12.941844 osd.18 39.7.48.7:6803/11185 225 : [WRN] slow 
 request 60.140595 seconds old, received at 2015-01-05 13:42:12.801205: 
 osd_op(client.92008.1:3101508 rb.0.1437.238e1f29.000f [write 
 114688~512] 3.841c0edf ondisk+write e994) v4 currently waiting for subops 
 from 3,37
 2015-01-05 13:44:04.933440 osd.14 39.7.48.7:6818/11640 251 : [WRN] 4 slow 
 requests, 1 included below; oldest blocked for  606.941954 secs
 2015-01-05 13:44:04.933469 osd.14 39.7.48.7:6818/11640 252 : [WRN] slow 
 request 240.101138 seconds old, received at 2015-01-05 13:40:04.832272: 
 osd_op(client.92008.1:3101102 rb.0.142b.238e1f29.0010 [write 
 475136~512] 3.5e623815 ondisk+write e994) v4 currently waiting for subops 
 from 27,33
 2015-01-05 13:44:12.950805 osd.18 39.7.48.7:6803/11185 226 : [WRN] 3 slow 
 requests, 1 included below; oldest blocked for  603.624511 secs
 2015-01-05 13:44:12.950812 osd.18 39.7.48.7:6803/11185 227 : [WRN] slow 
 request 120.149561 seconds old, received at 2015-01-05 13:42:12.801205: 
 osd_op(client.92008.1:3101508 rb.0.1437.238e1f29.000f [write 
 114688~512] 3.841c0edf ondisk+write e994) v4 currently waiting for subops 
 from 3,37
 2015-01-05 13:46:12.988010 osd.18 39.7.48.7:6803/11185 228 : [WRN] 3 slow 
 requests, 1 included below; oldest blocked for  723.661722 secs
 2015-01-05 13:46:12.988017 osd.18 39.7.48.7:6803/11185 229 : [WRN] slow 
 request 240.186772 seconds old, received at 2015-01-05 13:42:12.801205: 
 osd_op(client.92008.1:3101508 rb.0.1437.238e1f29.000f [write 
 114688~512] 3.841c0edf ondisk+write e994) v4 currently waiting for subops 
 from 3,37
 2015-01-05 13:46:18.971570 osd.14 39.7.48.7:6818/11640 253 : [WRN] 4 slow 
 requests, 1 included below; oldest blocked for  740.980083 secs
 2015-01-05 13:46:18.971577 osd.14 39.7.48.7:6818/11640 254 : [WRN] slow 
 request 480.063439 seconds old, received at 2015-01-05 13:38:18.908100: 
 osd_op(client.91911.1:3113675 rb.0.13f5.238e1f29.0010 [write 
 475136~512] 3.679a939d ondisk+write e994) v4 currently waiting for subops 
 from 27,34
 2015-01-05 13:48:05.030581 osd.14 39.7.48.7:6818/11640 255 : [WRN] 4 slow 
 requests, 1 included below; oldest blocked for  847.039098 secs
 2015-01-05 13:48:05.030587 osd.14 39.7.48.7:6818/11640 256 : [WRN] slow 
 request 480.198282 seconds old, received at 2015-01-05 13:40:04.832272: 
 osd_op(client.92008.1:3101102 

[ceph-users] mds continuously crashing on Firefly

2014-11-13 Thread Lincoln Bryant
Hi Cephers,

Overnight, our MDS crashed, failing over to the standby, which also crashed! 
Upon trying to restart them this morning, I find that they no longer start and 
always seem to crash on the same file in the logs. I've pasted part of a ceph 
mds tell 0 injectargs '--debug-mds 20 --debug-ms 1' below [1].

Can anyone help me interpret this error? 

Thanks for your time,
Lincoln Bryant

[1]
-7 2014-11-13 10:52:15.064784 7fc49d8ab700  7 mds.0.locker rdlock_start  
on (ifile sync-mix) on [inode 1000258c3c8 [2,head] /stash/sys/etc/grid-mapfile 
auth v754009 ap=27+0 s=17384 n(v0 b17384 1=1+0) (ifile sync-mix) (iversion 
lock) cr={374559=0-4194304@1} 
caps={374511=pAsLsXsFr/pAsLsXsFscr/pFscr@5,374559=pAsLsXsFr/pAsxXsxFxwb@5} | 
ptrwaiter=0 request=26 lock=1 caps=1 dirty=1 waiter=1 authpin=1 0x5438900]
-6 2014-11-13 10:52:15.064794 7fc49d8ab700  7 mds.0.locker rdlock_start 
waiting on (ifile sync-mix) on [inode 1000258c3c8 [2,head] 
/stash/sys/etc/grid-mapfile auth v754009 ap=27+0 s=17384 n(v0 b17384 1=1+0) 
(ifile sync-mix) (iversion lock) cr={374559=0-4194304@1} 
caps={374511=pAsLsXsFr/pAsLsXsFscr/pFscr@5,374559=pAsLsXsFr/pAsxXsxFxwb@5} | 
ptrwaiter=0 request=26 lock=1 caps=1 dirty=1 waiter=1 authpin=1 0x5438900]
-5 2014-11-13 10:52:15.064805 7fc49d8ab700 10 mds.0.cache.ino(1000258c3c8) 
add_waiter tag 4000 0xbf71920 !ambig 1 !frozen 1 !freezing 1
-4 2014-11-13 10:52:15.064808 7fc49d8ab700 15 mds.0.cache.ino(1000258c3c8) 
taking waiter here
-3 2014-11-13 10:52:15.064810 7fc49d8ab700 10 mds.0.locker nudge_log 
(ifile sync-mix) on [inode 1000258c3c8 [2,head] /stash/sys/etc/grid-mapfile 
auth v754009 ap=27+0 s=17384 n(v0 b17384 1=1+0) (ifile sync-mix) (iversion 
lock) cr={374559=0-4194304@1} 
caps={374511=pAsLsXsFr/pAsLsXsFscr/pFscr@5,374559=pAsLsXsFr/pAsxXsxFxwb@5} | 
ptrwaiter=0 request=26 lock=1 caps=1 dirty=1 waiter=1 authpin=1 0x5438900]
-2 2014-11-13 10:52:15.064827 7fc49d8ab700  1 -- 192.170.227.116:6800/6489 
== osd.104 192.170.227.122:6812/1084 911  osd_op_reply(82611 
100022a4e3a. [tmapget 0~0] v0'0 uv78780 ondisk = 0) v6  187+0+1410 
(1370366691 0 1858920835) 0x298ffd00 con 0x5b606e0
-1 2014-11-13 10:52:15.064843 7fc49d8ab700 10 mds.0.cache.dir(100022a4e3a) 
_tmap_fetched 1410 bytes for [dir 100022a4e3a 
/stash/user/daveminh/data/DUD/ampc/AlGDock/dock/DUDE.decoy.CHB-1l2sA.0-0/ 
[2,head] auth v=0 cv=0/0 ap=1+0+0 state=1073741952 f() n() hs=0+0,ss=0+0 | 
waiter=1 authpin=1 0x3b0a040] want_dn=
 0 2014-11-13 10:52:15.066789 7fc49d8ab700 -1 *** Caught signal (Aborted) 
**
 in thread 7fc49d8ab700

 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
 1: /usr/bin/ceph-mds() [0x82f741]
 2: /lib64/libpthread.so.0() [0x371c40f710]
 3: (gsignal()+0x35) [0x371bc32635]
 4: (abort()+0x175) [0x371bc33e15]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x12d) [0x371e0bea5d]
 NOTE: a copy of the executable, or `objdump -rdS executable` is needed to 
interpret this.


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mds continuously crashing on Firefly

2014-11-13 Thread Lincoln Bryant
Hi all,

Just providing an update to this -- I started the mds daemon on a new server 
and rebooted a box with a hung CephFS mount (from the first crash) and the 
problem seems to have gone away. 

I'm still not sure why the mds was shutting down with a Caught signal, 
though. 

Cheers,
Lincoln

On Nov 13, 2014, at 11:01 AM, Lincoln Bryant wrote:

 Hi Cephers,
 
 Overnight, our MDS crashed, failing over to the standby, which also crashed! 
 Upon trying to restart them this morning, I find that they no longer start 
 and always seem to crash on the same file in the logs. I've pasted part of a 
 ceph mds tell 0 injectargs '--debug-mds 20 --debug-ms 1' below [1].
 
 Can anyone help me interpret this error? 
 
 Thanks for your time,
 Lincoln Bryant
 
 [1]
-7 2014-11-13 10:52:15.064784 7fc49d8ab700  7 mds.0.locker rdlock_start  
 on (ifile sync-mix) on [inode 1000258c3c8 [2,head] 
 /stash/sys/etc/grid-mapfile auth v754009 ap=27+0 s=17384 n(v0 b17384 1=1+0) 
 (ifile sync-mix) (iversion lock) cr={374559=0-4194304@1} 
 caps={374511=pAsLsXsFr/pAsLsXsFscr/pFscr@5,374559=pAsLsXsFr/pAsxXsxFxwb@5} | 
 ptrwaiter=0 request=26 lock=1 caps=1 dirty=1 waiter=1 authpin=1 0x5438900]
-6 2014-11-13 10:52:15.064794 7fc49d8ab700  7 mds.0.locker rdlock_start 
 waiting on (ifile sync-mix) on [inode 1000258c3c8 [2,head] 
 /stash/sys/etc/grid-mapfile auth v754009 ap=27+0 s=17384 n(v0 b17384 1=1+0) 
 (ifile sync-mix) (iversion lock) cr={374559=0-4194304@1} 
 caps={374511=pAsLsXsFr/pAsLsXsFscr/pFscr@5,374559=pAsLsXsFr/pAsxXsxFxwb@5} | 
 ptrwaiter=0 request=26 lock=1 caps=1 dirty=1 waiter=1 authpin=1 0x5438900]
-5 2014-11-13 10:52:15.064805 7fc49d8ab700 10 
 mds.0.cache.ino(1000258c3c8) add_waiter tag 4000 0xbf71920 !ambig 1 
 !frozen 1 !freezing 1
-4 2014-11-13 10:52:15.064808 7fc49d8ab700 15 
 mds.0.cache.ino(1000258c3c8) taking waiter here
-3 2014-11-13 10:52:15.064810 7fc49d8ab700 10 mds.0.locker nudge_log 
 (ifile sync-mix) on [inode 1000258c3c8 [2,head] /stash/sys/etc/grid-mapfile 
 auth v754009 ap=27+0 s=17384 n(v0 b17384 1=1+0) (ifile sync-mix) (iversion 
 lock) cr={374559=0-4194304@1} 
 caps={374511=pAsLsXsFr/pAsLsXsFscr/pFscr@5,374559=pAsLsXsFr/pAsxXsxFxwb@5} | 
 ptrwaiter=0 request=26 lock=1 caps=1 dirty=1 waiter=1 authpin=1 0x5438900]
-2 2014-11-13 10:52:15.064827 7fc49d8ab700  1 -- 
 192.170.227.116:6800/6489 == osd.104 192.170.227.122:6812/1084 911  
 osd_op_reply(82611 100022a4e3a. [tmapget 0~0] v0'0 uv78780 ondisk = 
 0) v6  187+0+1410 (1370366691 0 1858920835) 0x298ffd00 con 0x5b606e0
-1 2014-11-13 10:52:15.064843 7fc49d8ab700 10 
 mds.0.cache.dir(100022a4e3a) _tmap_fetched 1410 bytes for [dir 100022a4e3a 
 /stash/user/daveminh/data/DUD/ampc/AlGDock/dock/DUDE.decoy.CHB-1l2sA.0-0/ 
 [2,head] auth v=0 cv=0/0 ap=1+0+0 state=1073741952 f() n() hs=0+0,ss=0+0 | 
 waiter=1 authpin=1 0x3b0a040] want_dn=
 0 2014-11-13 10:52:15.066789 7fc49d8ab700 -1 *** Caught signal (Aborted) 
 **
 in thread 7fc49d8ab700
 
 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
 1: /usr/bin/ceph-mds() [0x82f741]
 2: /lib64/libpthread.so.0() [0x371c40f710]
 3: (gsignal()+0x35) [0x371bc32635]
 4: (abort()+0x175) [0x371bc33e15]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x12d) [0x371e0bea5d]
 NOTE: a copy of the executable, or `objdump -rdS executable` is needed to 
 interpret this.
 
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Troubleshooting Incomplete PGs

2014-10-28 Thread Lincoln Bryant
Hi Greg, Loic,

I think we have seen this as well (sent a mail to the list a week or so ago 
about incomplete pgs). I ended up giving up on the data and doing a 
force_create_pgs after doing a find on my OSDs and deleting the relevant pg 
dirs. If there are any logs etc you'd like to see for debugging / post-mortem, 
I'd be happy to send them along.
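
For anyone hitting this later, the rough sequence was something like the following
(destructive, and only sensible once you've written the data off; the osd id and
pgid are placeholders):

ceph osd lost <old-osd-id> --yes-i-really-mean-it
ceph pg force_create_pg <pgid>
# then, on each OSD host, locate the leftover PG directories before removing them:
find /var/lib/ceph/osd/ceph-*/current -maxdepth 1 -name '<pgid>_*'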

Cheers,
Lincoln

On Oct 28, 2014, at 4:11 PM, Gregory Farnum wrote:

 On Thu, Oct 23, 2014 at 6:41 AM, Chris Kitzmiller
 ckitzmil...@hampshire.edu wrote:
 On Oct 22, 2014, at 8:22 PM, Craig Lewis wrote:
 
 Shot in the dark: try manually deep-scrubbing the PG.  You could also try
 marking various osd's OUT, in an attempt to get the acting set to include
 osd.25 again, then do the deep-scrub again.  That probably won't help
 though, because the pg query says it probed osd.25 already... actually , it
 doesn't.  osd.25 is in probing_osds not probed_osds. The deep-scrub
 might move things along.
 
 Re-reading your original post, if you marked the slow osds OUT, but left
 them running, you should not have lost data.
 
 
 That's true. I just marked them out. I did lose osd.10 (in addition to
 out'ting those other two OSDs) so I'm not out of the woods yet.
 
 If the scrubs don't help, it's probably time to hop on IRC.
 
 
 When I issue the deep-scrub command the cluster just doesn't scrub it. Same
 for regular scrub. :(
 
 This pool was offering an RBD which I've lost my connection to and it won't
 remount so my data is totally inaccessible at the moment. Thanks for your
 help so far!
 
 It looks like you are suffering from
 http://tracker.ceph.com/issues/9752, which we've not yet seen in-house
 but have had reported a few times. I suspect that Loic (CC'ed) would
 like to discuss your cluster's history with you to try and narrow it
 down.
 -Greg
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] pgs stuck in 'incomplete' state, blocked ops, query command hangs

2014-10-21 Thread Lincoln Bryant
Hi cephers,

We have two pgs that are stuck in 'incomplete' state across two different 
pools: 
pg 2.525 is stuck inactive since forever, current state incomplete, last acting 
[55,89]
pg 0.527 is stuck inactive since forever, current state incomplete, last acting 
[55,89]
pg 0.527 is stuck unclean since forever, current state incomplete, last acting 
[55,89]
pg 2.525 is stuck unclean since forever, current state incomplete, last acting 
[55,89]
pg 0.527 is incomplete, acting [55,89]
pg 2.525 is incomplete, acting [55,89]

Basically, we ran into a problem where we had 2x replication and 2 disks on 
different machines died near-simultaneously, and my pgs were stuck in 
'down+peering'. I had to do some combination of declaring the OSDs as lost, and 
running 'force_create_pg'. I realize the data on those pgs is now lost, but I'm 
stuck as to how to get the pgs out of 'incomplete'. 

I also see many ops blocked on the primary OSD for these:
100 ops are blocked  67108.9 sec
100 ops are blocked  67108.9 sec on osd.55

However, this is a new disk. If I 'ceph osd out osd.55', the pgs move to 
another OSD and the new primary gets blocked ops. Restarting osd.55 does 
nothing. Other pgs on osd.55 seem okay.

I would attach the result of a query, but if I run 'ceph pg 2.525 query', the 
command totally hangs until I ctrl-c

ceph pg 2.525 query
^CError EINTR: problem getting command descriptions from pg.2.525

I've also tried 'ceph pg repair 2.525', which does nothing.

Any thoughts here? Are my pools totally sunk? 

Thanks,
Lincoln
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] pgs stuck in 'incomplete' state, blocked ops, query command hangs

2014-10-21 Thread Lincoln Bryant
,
  maybe_went_rw: 1,
  up: [
89],
  acting: [
89,
89,
89]},
{ first: 224178,
  last: 224210,
  maybe_went_rw: 1,
  up: [
55,
89],
  acting: [
55,
89,
55,
55]},
{ first: 224211,
  last: 224211,
  maybe_went_rw: 0,
  up: [
89],
  acting: [
89,
89,
89]},
{ first: 224212,
  last: 224289,
  maybe_went_rw: 1,
  up: [
55,
89],
  acting: [
55,
89,
55,
55]},
{ first: 224290,
  last: 224290,
  maybe_went_rw: 0,
  up: [
55],
  acting: [
55,
55,
55]}],
  probing_osds: [
24,
33,
48,
55,
89],
  down_osds_we_would_probe: [
85],
  peering_blocked_by: []},
{ name: Started,
  enter_time: 2014-10-21 11:51:08.013457}],
  agent_state: {}}}

There are some things like this in the peer info:

  up: [],
  acting: [],
  up_primary: -1,
  acting_primary: -1},


I also see things like:
  down_osds_we_would_probe: [
85],

But I don't have an OSD 85:
85  3.64osd.85  DNE

# ceph osd rm osd.85
osd.85 does not exist.
# ceph osd lost 85 --yes-i-really-mean-it
osd.85 is not down or doesn't exist

Any help would be greatly appreciated.

Thanks,
Lincoln

On Oct 21, 2014, at 9:39 AM, Lincoln Bryant wrote:

 Hi cephers,
 
 We have two pgs that are stuck in 'incomplete' state across two different 
 pools: 
 pg 2.525 is stuck inactive since forever, current state incomplete, last 
 acting [55,89]
 pg 0.527 is stuck inactive since forever, current state incomplete, last 
 acting [55,89]
 pg 0.527 is stuck unclean since forever, current state incomplete, last 
 acting [55,89]
 pg 2.525 is stuck unclean since forever, current state incomplete, last 
 acting [55,89]
 pg 0.527 is incomplete, acting [55,89]
 pg 2.525 is incomplete, acting [55,89]
 
 Basically, we ran into a problem where we had 2x replication and 2 disks on 
 different machines died near-simultaneously, and my pgs were stuck in 
 'down+peering'. I had to do some combination of declaring the OSDs as lost, 
 and running 'force_create_pg'. I realize the data on those pgs is now lost, 
 but I'm stuck as to how to get the pgs out of 'incomplete'. 
 
 I also see many ops blocked on the primary OSD for these:
 100 ops are blocked  67108.9 sec
 100 ops are blocked  67108.9 sec on osd.55
 
 However, this is a new disk. If I 'ceph osd out osd.55', the pgs move to 
 another OSD and the new primary gets blocked ops. Restarting osd.55 does 
 nothing. Other pgs on osd.55 seem okay.
 
 I would attach the result of a query, but If I run a 'ceph pg 2.525 query', 
 the command totally hangs until I ctrl-c
 
 ceph pg 2.525 query
 ^CError EINTR: problem getting command descriptions from pg.2.525
 
 I've also tried 'ceph pg repair 2.525', which does nothing.
 
 Any thoughts here? Are my pools totally sunk? 
 
 Thanks,
 Lincoln
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] pgs stuck in 'incomplete' state, blocked ops, query command hangs

2014-10-21 Thread Lincoln Bryant
  probing_osds: [24, 33, 48, 55, 89],
  down_osds_we_would_probe: [85],
  peering_blocked_by: []},
{ name: Started,
  enter_time: 2014-10-21 11:51:08.013457}],
  agent_state: {}}

Thanks,
Lincoln

On Oct 21, 2014, at 11:59 AM, Lincoln Bryant wrote:

 A small update on this, I rebooted all of the Ceph nodes and was able to then 
 query one of the misbehaving pgs.
 
 I've attached the query for pg 2.525. 
 
 incomplete-pg-query-2.525.rtf
 
 There are some things like this in the peer info:
 
  up: [],
  acting: [],
  up_primary: -1,
  acting_primary: -1},
 
 
 I also see things like:
  down_osds_we_would_probe: [
85],
 
 But I don't have an OSD 85:
  85   3.64   osd.85   DNE
 
 # ceph osd rm osd.85
 osd.85 does not exist.
 # ceph osd lost 85 --yes-i-really-mean-it
 osd.85 is not down or doesn't exist
 
 Any help would be greatly appreciated.
 
 Thanks,
 Lincoln
 
 On Oct 21, 2014, at 9:39 AM, Lincoln Bryant wrote:
 
 Hi cephers,
 
 We have two pgs that are stuck in 'incomplete' state across two different 
 pools: 
 pg 2.525 is stuck inactive since forever, current state incomplete, last 
 acting [55,89]
 pg 0.527 is stuck inactive since forever, current state incomplete, last 
 acting [55,89]
 pg 0.527 is stuck unclean since forever, current state incomplete, last 
 acting [55,89]
 pg 2.525 is stuck unclean since forever, current state incomplete, last 
 acting [55,89]
 pg 0.527 is incomplete, acting [55,89]
 pg 2.525 is incomplete, acting [55,89]
 
 Basically, we ran into a problem where we had 2x replication and 2 disks on 
 different machines died near-simultaneously, and my pgs were stuck in 
 'down+peering'. I had to do some combination of declaring the OSDs as lost, 
 and running 'force_create_pg'. I realize the data on those pgs is now lost, 
 but I'm stuck as to how to get the pgs out of 'incomplete'. 
 
 I also see many ops blocked on the primary OSD for these:
 100 ops are blocked  67108.9 sec
 100 ops are blocked  67108.9 sec on osd.55
 
 However, this is a new disk. If I 'ceph osd out osd.55', the pgs move to 
 another OSD and the new primary gets blocked ops. Restarting osd.55 does 
 nothing. Other pgs on osd.55 seem okay.
 
 I would attach the result of a query, but If I run a 'ceph pg 2.525 query', 
 the command totally hangs until I ctrl-c
 
 ceph pg 2.525 query
 ^CError EINTR: problem getting command descriptions from pg.2.525
 
 I've also tried 'ceph pg repair 2.525', which does nothing.
 
 Any thoughts here? Are my pools totally sunk? 
 
 Thanks,
 Lincoln
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Mon won't start, possibly due to corrupt disk?

2014-07-18 Thread Lincoln Bryant
Thanks Greg. Just for posterity, ceph-kvstore-tool /var/lib/ceph/mon/store.db 
set auth last_committed ver 0 did the trick and we're back to HEALTH_OK.
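
For the archives, the rough sequence (store path as in Greg's placeholder, substitute the real one; with only one mon it's worth copying the store aside first):

service ceph stop mon.a                 # init syntax will vary by distro
cp -a /var/lib/ceph/mon /root/mon-store-backup
ceph-kvstore-tool /var/lib/ceph/mon/store.db set auth last_committed ver 0
service ceph start mon.a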

Cheers,
Lincoln Bryant

On Jul 18, 2014, at 4:15 PM, Gregory Farnum wrote:

 Hmm, this log is just leaving me with more questions. Could you tar up
 the /var/lib/ceph/mon/store.db (substitute actual mon store path as
 necessary) and upload it for me? (you can use ceph-post-file to put it
 on our servers if you prefer.) Just from the log I don't have a great
 idea of what's gone wrong, but you might find that
 ceph-kvstore-tool /var/lib/ceph/mon/store.db set auth last_committed ver 0
 helps. (To be perfectly honest I'm just copying that from a similar
 report in the tracker at http://tracker.ceph.com/issues/8851, but
 that's the approach I was planning on.)
 
 Nothing has changed in the monitor that should have caused issues, but
 with two reports I'd like to at least see if we can do something to be
 a little more robust in the face of corruption!
 -Greg
 Software Engineer #42 @ http://inktank.com | http://ceph.com
 
 On Thu, Jul 17, 2014 at 1:39 PM, Lincoln Bryant linco...@uchicago.edu wrote:
 Hi all,
 
 I tried restarting my mon today, but I find that it no longer starts. 
 Whenever I try to fire up the mon, I get errors of this nature:
 
   -3 2014-07-17 15:12:32.738510 7f25b0921780 10 mon.a@-1(probing).auth 
 v1537 update_from_paxos
   -2 2014-07-17 15:12:32.738526 7f25b0921780 10 mon.a@-1(probing).auth 
 v1537 update_from_paxos version 1537 keys ver 0 latest 0
   -1 2014-07-17 15:12:32.738532 7f25b0921780 10 mon.a@-1(probing).auth 
 v1537 update_from_paxos key server version 0
0 2014-07-17 15:12:32.739836 7f25b0921780 -1 mon/AuthMonitor.cc: In 
 function 'virtual void AuthMonitor::update_from_paxos(bool*)' thread 
 7f25b0921780 time 2014-07-17 15:12:32.738549
 mon/AuthMonitor.cc: 155: FAILED assert(ret == 0)
 
 After having a conversation with Greg in IRC, it seems that the disk state 
 is corrupted. This seems to be CephX related, although we do not have CephX 
 enabled on this cluster.
 
 At Greg's request, I've attached the logs in this mail to hopefully squirrel 
 out what exactly is corrupted. I've set debug {mon,paxos,auth,keyvaluestore} 
 to 20 in ceph.conf.
 
 I'm hoping to be able to recover -- unfortunately we've made the mistake of 
 only deploying a single mon for this cluster, and there is some data I'd 
 like to preserve.
 
 Thanks for any help,
 Lincoln Bryant
 
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] issues with ceph

2014-05-09 Thread Lincoln Bryant
Hi Erik,

What happens if you try to stat one of the missing files (assuming you know 
the name of the file before you remount raw)?

I had a problem where files would disappear and reappear in CephFS, which I 
believe was fixed in kernel 3.12.

Cheers,
Lincoln

On May 9, 2014, at 9:30 AM, Aronesty, Erik wrote:

 So we were attempting to stress test a cephfs installation, and last night, 
 after copying 500GB of files, we got this:
 
 570G in the raw directory
 
 q782657@usadc-seaxd01:/mounts/ceph1/pubdata/tcga$ ls -lh
 total 32M
 -rw-rw-r-- 1 q783775 pipeline  32M May  8 10:39 
 2014-02-25T12:00:01-0800_data_manifest.tsv
 -rw-rw-r-- 1 q783775 pipeline  144 May  8 10:42 cghub.key
 drwxrwxr-x 1 q783775 pipeline 234G May  8 11:31 fastqs
 drwxrwxr-x 1 q783775 pipeline 570G May  8 13:33 raw
 -rw-rw-r-- 1 q783775 pipeline   86 May  8 11:19 readme.txt
 
 But when I ls into the raw folder, I get zero files:
 
 q782657@usadc-seaxd01:/mounts/ceph1/pubdata/tcga$ ls -lh raw
 total 0
 
 If I mount that folder again... all the files re-appear.
 
 Is this a bug that's been solved in a newer release?
 
 KERNEL:
 Linux usadc-nasea05 3.11.0-20-generic #34~precise1-Ubuntu SMP Thu Apr 3 
 17:25:07 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
 
 CEPH:
 ii  ceph  0.72.2-1precise   
 distributed storage and file system
 
 
 -- No errors that I could see on the client machine:
 
 q782657@usadc-seaxd01:/mounts/ceph1/pubdata/tcga$ dmesg | grep ceph
 [588560.047193] Key type ceph registered
 [588560.047334] libceph: loaded (mon/osd proto 15/24)
 [588560.102874] ceph: loaded (mds proto 32)
 [588560.117392] libceph: client6005 fsid f067539c-7426-47ee-afb0-7d2c6dfcbcd0
 [588560.126477] libceph: mon1 10.18.176.180:6789 session established
 
 
 -- Ceph itself looks fine.
 
 root@usadc-nasea05:~# ceph health
 HEALTH_OK
 
 root@usadc-nasea05:~# ceph quorum_status
 {election_epoch:668,quorum:[0,1,2,3],quorum_names:[usadc-nasea05,usadc-nasea06,usadc-nasea07,usadc-nasea08],quorum_leader_name:usadc-nasea05,monmap:{epoch:1,fsid:f067539c-7426-47ee-afb0-7d2c6dfcbcd0,modified:0.00,created:0.00,mons:[{rank:0,name:usadc-nasea05,addr:10.18.176.179:6789\/0},{rank:1,name:usadc-nasea06,addr:10.18.176.180:6789\/0},{rank:2,name:usadc-nasea07,addr:10.18.176.181:6789\/0},{rank:3,name:usadc-nasea08,addr:10.18.176.182:6789\/0}]}}
 
 root@usadc-nasea05:~# ceph mon dump
 dumped monmap epoch 1
 epoch 1
 fsid f067539c-7426-47ee-afb0-7d2c6dfcbcd0
 last_changed 0.00
 created 0.00
 0: 10.18.176.179:6789/0 mon.usadc-nasea05
 1: 10.18.176.180:6789/0 mon.usadc-nasea06
 2: 10.18.176.181:6789/0 mon.usadc-nasea07
 3: 10.18.176.182:6789/0 mon.usadc-nasea08
 
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] issues with ceph

2014-05-09 Thread Lincoln Bryant
FWIW, I believe the particular/similar bug I was thinking of was fixed by:
 commit 590fb51f1c (vfs: call d_op->d_prune() before unhashing dentry)

--Lincoln

On May 9, 2014, at 12:37 PM, Gregory Farnum wrote:

 I'm less current on the kernel client, so maybe there are some
 since-fixed bugs I'm forgetting, but:
 
 On Fri, May 9, 2014 at 8:55 AM, Aronesty, Erik
 earone...@expressionanalysis.com wrote:
 I can always remount and see them.
 
 But I wanted to preserve the broken state and see if I could figure out 
 why it was happening.   (strace isn't particularly revealing.)
 
 Some other things I noted was that
 
 - if I reboot the metadata server nobody seems to fail over to the hot 
 spare (everything locks up until it's back online).   I'm guessing you have 
 to manually make the spare primary, and then switch back?
 
 That shouldn't happen. What's the output of ceph -s?
 
 - if I reboot the mon that someone is mounted to, his mount locks up (even 
 if I list 4 monitors in the fstab), but other clients still work.
 
 Can you elaborate? We discovered an issue in our userspace network
 code that might have an analogous problem in the kernel, but it
 generally was only a problem if a NIC disappeared (ie, powered off)
 without coming back on.
 -Greg
 Software Engineer #42 @ http://inktank.com | http://ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph User Committee

2013-11-06 Thread Lincoln Bryant
Seems interesting to me. I've added my name to the pot :)

--Lincoln

On Nov 6, 2013, at 11:56 AM, Loic Dachary wrote:

 
 
 On 07/11/2013 01:53, ja...@peacon.co.uk wrote:
 It's a great idea... are there any requirements, to be considered?
 
 Being a Ceph user seems to be the only requirement to me. Do you have 
 something else in mind ?
 
 Cheers
 
 
 On 2013-11-06 17:35, Loic Dachary wrote:
 Hi Ceph,
 
 I would like to open a discussion about organizing a Ceph User
 Committee. We briefly discussed the idea with Ross Turk, Patrick
 McGarry and Sage Weil today during the OpenStack summit. A pad was
 created and roughly summarizes the idea:
 
 http://pad.ceph.com/p/user-committee
 
 If there is enough interest, I'm willing to devote one day a week
 working for the Ceph User Committee. And yes, that includes sitting at
 the Ceph booth during the FOSDEM :-) And interviewing Ceph users and
 describing their use cases, which I enjoy very much. But also
 contribute to a user centric roadmap, which is what ultimately matters
 for the company I work for.
 
 If you'd like to see this happen but don't have time to participate
 in this discussion, please add your name + email at the end of the
 pad.
 
 What do you think ?
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 
 
 -- 
 Loïc Dachary, Artisan Logiciel Libre
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph creating stuck inactive and unclean

2013-11-03 Thread Lincoln Bryant

Hi Juan,

Are the two OSDs that you started with on the same host? I've seen the 
same problem, which fixed itself after I added more OSDs on a separate host.
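
If adding OSDs on a second host isn't an option for your test setup, the usual single-host workaround (a sketch, assuming the default host-level CRUSH rule is what's blocking placement) is to let replicas share a host:

[global]
    osd pool default size = 2
    osd crush chooseleaf type = 0    # 0 = osd, so both replicas may land on one host

That has to be in ceph.conf before the initial CRUSH map is built; on a live cluster you'd edit the CRUSH rule instead.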


Cheers,
Lincoln

On 11/3/2013 12:09 PM, Juan Vega wrote:

Ceph Users,

I'm trying to create a cluster with 9 OSDs manually (without
ceph-deploy). I started with only 2, and will be adding more afterwards.
The problem is that the cluster never finished 'creating', even though
the OSDs are up and in:

vegaj@ServerB2-exper:~/cephcluster$ ceph -c ceph.conf -w
   cluster 31843595-d34f-4506-978a-88d44eef
   health HEALTH_WARN 192 pgs stuck inactive; 192 pgs stuck unclean
   monmap e1: 1 mons at {serverB2-exper=192.168.100.2:6789/0}, election epoch 2, quorum 0 serverB2-exper
   osdmap e12: 2 osds: 2 up, 2 in
   pgmap v759: 192 pgs: 192 creating; 0 bytes data, 2436 MB used, 206 GB / 220 GB avail
   mdsmap e1: 0/0/1 up

2013-11-03 10:02:35.221325 mon.0 [INF] pgmap v759: 192 pgs: 192 creating; 0 bytes data, 2436 MB used, 206 GB / 220 GB avail
2013-11-03 10:04:35.242818 mon.0 [INF] pgmap v760: 192 pgs: 192 creating; 0 bytes data, 2436 MB used, 206 GB / 220 GB avail

I'm not using authentication. My ceph.conf is as follows:

[global]
fsid = 31843595-d34f-4506-978a-88d44eef
mon_initial_members = serverB2-exper
mon_host = 192.168.100.2
osd_journal_size = 1024
filestore_xattr_use_omap = true
auth cluster required = none
auth service required = none
auth client required = none
auth supported = none

Am I missing something?

Thanks,

Juan Vega

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] xfsprogs not found in RHEL

2013-08-27 Thread Lincoln Bryant
Hi,

xfsprogs should be included in the EL6 base.

Perhaps run yum clean all and try again?
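
Something along these lines (assuming the stock repos are enabled; on RHEL proper xfsprogs may live in an add-on channel rather than Base):

yum clean all
yum install xfsprogs
yum install ceph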

Cheers,
Lincoln

On Aug 27, 2013, at 9:16 PM, sriram wrote:

 I am trying to install CEPH and I get the following error - 
 
 --- Package ceph.x86_64 0:0.67.2-0.el6 will be installed
 -- Processing Dependency: xfsprogs for package: ceph-0.67.2-0.el6.x86_64
 --- Package python-babel.noarch 0:0.9.4-5.1.el6 will be installed
 --- Package python-backports-ssl_match_hostname.noarch 0:3.2-0.3.a3.el6 will 
 be installed
 --- Package python-docutils.noarch 0:0.6-1.el6 will be installed
 -- Processing Dependency: python-imaging for package: 
 python-docutils-0.6-1.el6.noarch
 --- Package python-jinja2.x86_64 0:2.2.1-1.el6 will be installed
 --- Package python-pygments.noarch 0:1.1.1-1.el6 will be installed
 --- Package python-six.noarch 0:1.1.0-2.el6 will be installed
 -- Running transaction check
 --- Package ceph.x86_64 0:0.67.2-0.el6 will be installed
 -- Processing Dependency: xfsprogs for package: ceph-0.67.2-0.el6.x86_64
 --- Package python-imaging.x86_64 0:1.1.6-19.el6 will be installed
 -- Finished Dependency Resolution
 Error: Package: ceph-0.67.2-0.el6.x86_64 (ceph)
Requires: xfsprogs
 
 
 Machine Info - 
 
 Linux version 2.6.32-131.4.1.el6.x86_64 
 (mockbu...@x86-003.build.bos.redhat.com) (gcc version 4.4.5 20110214 (Red Hat 
 4.4.5-6) (GCC) ) #1 SMP Fri Jun 10 10:54:26 EDT 2011
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Trying to identify performance bottlenecks

2013-08-05 Thread Lincoln Bryant
Hi all,

I'm trying to identify the performance bottlenecks in my experimental Ceph 
cluster. A little background on my setup:
10 storage servers, each configured with:
-(2) dual-core opterons
-8 GB of RAM
-(6) 750GB disks (1 OSD per disk, 7200 RPM SATA, probably 4-5 
years old). JBOD w/ BTRFS
-1GbE
-CentOS 6.4, custom kernel 3.7.8
1 dedicated mds/mon server
-same specs at OSD nodes
(2 more dedicated mons waiting in the wings, recently reinstalled ceph)
1 front-facing node mounting CephFS, with a 10GbE connection into the 
switch stack housing the storage machines
-CentOS 6.4, custom kernel 3.7.8

Some Ceph settings:
[osd]
osd journal size = 1000
filestore xattr use omap = true

When I try to transfer files in/out via CephFS (10GbE host), I'm seeing only 
about 230MB/s at peak. First, is this what I should expect? Given 60 OSDs 
spread across 10 servers, I would have thought I'd get something closer to 
400-500 MB/s or more. I tried upping the number of placement groups to 3000 for 
my 'data' pool (following the formula here: 
http://ceph.com/docs/master/rados/operations/placement-groups/) with no 
increase in performance. I also saw no performance difference between XFS and 
BTRFS.
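
(For reference, the arithmetic behind the 3000 figure: 60 OSDs x 100 / 2 replicas = 3000; if I'm reading that page right it also suggests rounding up to a power of two, which would make it 4096.)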

I also see a lot of messages like this in the log: 
10.1.6.4:6815/30138 3518 : [WRN] slow request 30.874441 seconds old, received 
at 2013-07-31 10:52:49.721518: osd_op(client.7763.1:67060 10003ba.13d4 
[write 0~4194304] 0.102b9365 RETRY=-1 snapc 1=[] e1454) currently waiting for 
subops from [1]

Does anyone have any thoughts as to what the bottleneck may be, if there is 
one? Or, any idea what I should try to measure to determine the bottleneck?

Perhaps my disks are just that bad? :)
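
In case it helps frame answers, this is roughly what I'm planning to compare against next (pool name and OSD mount path are from my setup):

# raw RADOS write throughput from the 10GbE client, bypassing CephFS
rados -p data bench 30 write

# per-disk baseline on one OSD host, plus live utilization while a copy runs
dd if=/dev/zero of=/var/lib/ceph/osd/ceph-0/ddtest bs=4M count=256 oflag=direct
iostat -xm 5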

Cheers,
Lincoln
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com