Re: [ceph-users] Fwd: down+peering PGs, can I move PGs from one OSD to another

2018-08-03 Thread Sean Patronis
Forgive the wall of text; I shortened it a little. Here is the OSD log when I attempt to start the OSD: 2018-08-04 03:53:28.917418 7f3102aa87c0 0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-21) detect_feature: extsize is disabled by conf 2018-08-04 03:53:28.977564 7f3102aa87c0 0

[ceph-users] Inconsistent PGs every few days

2018-08-03 Thread Dimitri Roschkowski
Hi, I run a cluster with 7 OSDs. The cluster does not have much traffic on it, but every few days I get a HEALTH_ERR because of inconsistent PGs: root@Sam ~ # ceph status cluster: id: c4bfc288-8ba8-4c3a-b3a6-ed95503f50b7 health: HEALTH_ERR 3 scrub errors
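For anyone hitting the same symptom, a minimal repair sketch, assuming the usual scrub-error workflow (the PG id below is a placeholder; take the real one from ceph health detail):

    ceph health detail                                      # lists the PGs behind the scrub errors
    rados list-inconsistent-obj 2.1f --format=json-pretty   # shows which objects/replicas disagree
    ceph pg repair 2.1f                                     # asks the primary OSD to repair the PG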

Re: [ceph-users] Fwd: down+peering PGs, can I move PGs from one OSD to another

2018-08-03 Thread Sean Redmond
Hi, You can export and import PGs using ceph_objectstore_tool, but if the OSD won't start you may have trouble exporting a PG. It may be useful to share the errors you get when trying to start the OSD. Thanks On Fri, Aug 3, 2018 at 10:13 PM, Sean Patronis wrote: > > > Hi all. > > We have an
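For reference, a rough export/import sketch with ceph-objectstore-tool; the paths, PG id and OSD numbers are placeholders, and both OSD daemons must be stopped while the tool runs:

    systemctl stop ceph-osd@21
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-21 \
        --journal-path /var/lib/ceph/osd/ceph-21/journal \
        --op export --pgid 1.2a --file /root/pg.1.2a.export
    # then, on a healthy (stopped) OSD, import the PG and restart the daemon:
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-30 \
        --journal-path /var/lib/ceph/osd/ceph-30/journal \
        --op import --file /root/pg.1.2a.export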

[ceph-users] Fwd: down+peering PGs, can I move PGs from one OSD to another

2018-08-03 Thread Sean Patronis
Hi all. We have an issue with some down+peering PGs (I think); when I try to mount or access data, the requests are blocked: 114891/7509353 objects degraded (1.530%) 887 stale+active+clean 1 peering 54 active+recovery_wait 19609

Re: [ceph-users] FileStore SSD (journal) vs BlueStore SSD (DB/Wal)

2018-08-03 Thread Xavier Trilla
Hi Sam, Not having done any benchmarks myself -as we only use SSDs or NVMes- it is my understanding that on Luminous -I would not recommend upgrading production to Mimic yet, but I’m quite conservative- BlueStore is going to be slower for writes than FileStore with SSD journals. You could try dmcache,

Re: [ceph-users] RGW problems after upgrade to Luminous

2018-08-03 Thread David Turner
I came across you mentioning bucket check --fix before, but I totally forgot that I should be passing --bucket=mybucket with the command to actually do anything. I'm running this now and it seems to actually be doing something. My guess was that it was stuck in the state and now that I can clean
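For the archives, the full invocation being discussed (bucket name is a placeholder):

    radosgw-admin bucket check --fix --bucket=mybucket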

Re: [ceph-users] RGW problems after upgrade to Luminous

2018-08-03 Thread Yehuda Sadeh-Weinraub
Oh, also -- one thing that might work is running bucket check --fix on the bucket. That should overwrite the reshard status field in the bucket index. Let me know if it happens to fix the issue for you. Yehuda. On Fri, Aug 3, 2018 at 9:46 AM, Yehuda Sadeh-Weinraub wrote: > Is it actually

Re: [ceph-users] RGW problems after upgrade to Luminous

2018-08-03 Thread Yehuda Sadeh-Weinraub
Is it actually resharding, or is it just stuck in that state? On Fri, Aug 3, 2018 at 7:55 AM, David Turner wrote: > I am currently unable to write any data to this bucket in this current > state. Does anyone have any ideas for reverting to the original index > shards and cancel the reshard

Re: [ceph-users] Ceph Balancer per Pool/Crush Unit

2018-08-03 Thread Reed Dier
I suppose I may have found the solution I was unaware existed. > balancer optimize { [...]} : Run optimizer to create a > new plan So apparently you can create a plan specific to a pool (or pools). So, just to double-check this, I created two plans: plan1 with the hdd pool (and not the ssd pool);
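A sketch of that per-pool workflow, with plan and pool names as placeholders:

    ceph balancer optimize plan1 hdd-pool    # build a plan restricted to the named pool(s)
    ceph balancer show plan1                 # inspect the proposed remappings
    ceph balancer eval plan1                 # score the plan before applying it
    ceph balancer execute plan1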

Re: [ceph-users] RGW problems after upgrade to Luminous

2018-08-03 Thread David Turner
I am currently unable to write any data to this bucket in this current state. Does anyone have any ideas for reverting to the original index shards and cancel the reshard processes happening to the bucket? On Thu, Aug 2, 2018 at 12:32 PM David Turner wrote: > I upgraded my last cluster to
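One avenue, assuming the Luminous radosgw-admin reshard subcommands are available (bucket name is a placeholder):

    radosgw-admin reshard list                      # anything still queued for this bucket?
    radosgw-admin reshard status --bucket=mybucket  # resharding state recorded in the bucket index
    radosgw-admin reshard cancel --bucket=mybucket  # drop the pending reshard entry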

Re: [ceph-users] FileStore SSD (journal) vs BlueStore SSD (DB/Wal)

2018-08-03 Thread Sam Huracan
Hi, can anyone help us answer these questions? 2018-08-03 8:36 GMT+07:00 Sam Huracan : > Hi Cephers, > > We intend to upgrade our Cluster from Jewel to Luminous (or Mimic?) > > Our model is currently using OSD FileStore with SSD Journal (1 SSD for 7 > SATA 7.2K) > > My questions are: > > >

Re: [ceph-users] Error: journal specified but not allowed by osd backend

2018-08-03 Thread David Majchrzak
Thanks Eugen! I was looking into running all the commands manually, following the docs for add/remove OSD, but tried ceph-disk first. I actually made it work by changing the id part in ceph-disk (it was checking the wrong journal device, which was owned by root:root). The next part was that I
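For anyone else hitting the root:root journal ownership issue, a hedged sketch of the usual fix (device path and partition number are placeholders; the durable fix is giving the journal partition the ceph journal type GUID so the udev rules chown it at boot):

    chown ceph:ceph /dev/sdc1                                            # immediate fix, but lost on reboot
    sgdisk --typecode=1:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdc    # ceph journal partition type GUID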

Re: [ceph-users] Error: journal specified but not allowed by osd backend

2018-08-03 Thread Eugen Block
Hi, we have a full bluestore cluster and had to deal with read errors on the SSD for the block.db. Something like this helped us to recreate a pre-existing OSD without rebalancing, just refilling the PGs. I would zap the journal device and let it recreate. It's very similar to your
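Roughly, a Luminous-era sketch of rebuilding an OSD under the same id so only its own PGs are refilled (device path and OSD id are placeholders; assumes a recent ceph-volume):

    ceph osd set noout
    systemctl stop ceph-osd@21
    ceph osd destroy 21 --yes-i-really-mean-it          # keeps the id and CRUSH position
    ceph-volume lvm zap /dev/sdc
    ceph-volume lvm create --osd-id 21 --data /dev/sdc  # recreate with the same id; its PGs backfill back
    ceph osd unset noout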

Re: [ceph-users] stuck with active+undersized+degraded on Jewel after cluster maintenance

2018-08-03 Thread Pawel S
On Fri, Aug 3, 2018 at 2:07 PM Paweł Sadowsk wrote: > On 08/03/2018 01:45 PM, Pawel S wrote: > > hello! > > > > We did maintenance work (cluster shrinking) on one cluster (Jewel), > > and after shutting one of the OSDs down we found this situation where > > recovery of a PG can't be started because it is

Re: [ceph-users] Ceph MDS and hard links

2018-08-03 Thread Yan, Zheng
On Fri, Aug 3, 2018 at 8:53 PM Benjeman Meekhof wrote: > > Thanks, that's useful to know. I've pasted the output you asked for > below, thanks for taking a look. > > Here's the output of dump_mempools: > > { > "mempool": { > "by_pool": { > "bloom_filter": { >

Re: [ceph-users] Ceph MDS and hard links

2018-08-03 Thread Benjeman Meekhof
Thanks, that's useful to know. I've pasted the output you asked for below, thanks for taking a look. Here's the output of dump_mempools: { "mempool": { "by_pool": { "bloom_filter": { "items": 4806709, "bytes": 4806709 },
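Output like the above typically comes from the MDS admin socket, e.g.:

    ceph daemon mds.<name> dump_mempools
    ceph daemon mds.<name> cache status      # complementary view of cache memory usage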

Re: [ceph-users] stuck with active+undersized+degraded on Jewel after cluster maintenance

2018-08-03 Thread Paweł Sadowsk
On 08/03/2018 01:45 PM, Pawel S wrote: > hello! > > We did maintenance work (cluster shrinking) on one cluster (Jewel), > and after shutting one of the OSDs down we found this situation where > recovery of a PG can't be started because it is "querying" one of its peers. We > restarted this OSD, tried to out and

[ceph-users] stuck with active+undersized+degraded on Jewel after cluster maintenance

2018-08-03 Thread Pawel S
hello! We did maintenance work (cluster shrinking) on one cluster (Jewel), and after shutting one of the OSDs down we found this situation where recovery of a PG can't be started because it is "querying" one of its peers. We restarted this OSD, tried to out and in it. Nothing helped; finally we moved out data
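A diagnostic sketch for this kind of stuck peering; the PG id and OSD id are placeholders:

    ceph health detail | grep -E 'undersized|degraded'
    ceph pg 3.1c query     # recovery_state shows which peer the PG is stuck querying
    ceph osd down 12       # briefly mark that peer down so peering restarts; it rejoins on its own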

Re: [ceph-users] Cephfs meta data pool to ssd and measuring performance difference

2018-08-03 Thread Linh Vu
Try IOR mdtest for metadata performance. From: ceph-users on behalf of Marc Roos Sent: Friday, 3 August 2018 7:49:13 PM To: dcsysengineer Cc: ceph-users Subject: Re: [ceph-users] Cephfs meta data pool to ssd and measuring performance difference I have moved
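A minimal mdtest invocation against a CephFS mount might look like this (MPI rank count, path and per-process item count are placeholders):

    mpirun -np 8 mdtest -d /mnt/cephfs/mdtest -n 10000 -F -i 3   # file-only metadata test, 3 iterations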

Re: [ceph-users] [Jewel 10.2.11] OSD Segmentation fault

2018-08-03 Thread Alexandru Cucu
Hello, Another OSD started randomly crashing with a segmentation fault. I haven't managed to add the last 3 OSDs back to the cluster as the daemons keep crashing. --- -2> 2018-08-03 12:12:52.670076 7f12b6b15700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1533287572670073, "job": 3, "event":

Re: [ceph-users] Cephfs meta data pool to ssd and measuring performance difference

2018-08-03 Thread Marc Roos
I have moved the pool, but the strange thing is that if I do something like this: for object in `cat out`; do rados -p fs_meta get $object /dev/null ; done I do not see any activity on the SSD drives with something like dstat (checked on all nodes (sdh)) net/eth4.60-net/eth4.52
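A hedged way to double-check where the fs_meta objects actually land (the object name is a placeholder taken from the same listing):

    ceph osd map fs_meta 10000000000.00000000   # prints the PG and acting OSD set for one object
    ceph pg ls-by-pool fs_meta                  # lists the pool's PGs with their acting OSDs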

[ceph-users] Strange OSD crash starts other osd flapping

2018-08-03 Thread Daznis
Hello, Yesterday I encountered a strange OSD crash which led to cluster flapping. I had to set the nodown flag on the cluster to stop the flapping. The first OSD crashed with: 2018-08-02 17:23:23.275417 7f87ec8d7700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f8803dfb700'
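For reference, the flag in question is set and cleared like this:

    ceph osd set nodown     # stop marking flapping OSDs down while investigating
    ceph osd unset nodown   # clear it again once the cluster is stable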

Re: [ceph-users] Write operation to cephFS mount hangs

2018-08-03 Thread Eugen Block
Hi, I send the logfile in the attachment. I can find no error messages or anything problematic… I didn't see any log file attached to the email. Another question: Is there a link between the VMs that fail to write to CephFS and the hypervisors? Are all failing clients on the same

Re: [ceph-users] [Ceph-maintainers] download.ceph.com repository changes

2018-08-03 Thread Fabian Grünbichler
On Mon, Jul 30, 2018 at 11:36:55AM -0600, Ken Dreyer wrote: > On Fri, Jul 27, 2018 at 1:28 AM, Fabian Grünbichler > wrote: > > On Tue, Jul 24, 2018 at 10:38:43AM -0400, Alfredo Deza wrote: > >> Hi all, > >> > >> After the 12.2.6 release went out, we've been thinking on better ways > >> to remove